Indexed on: 08 Nov '16Published on: 08 Nov '16Published in: arXiv - Computer Science - Architecture
Convolutional neural networks (CNNs) have been widely employed in many applications such as image classification, video analysis and speech recognition. Being compute-intensive, CNN computations are mainly accelerated by GPUs with high power dissipations. Recently, studies were carried out exploiting FPGA as CNN accelerator because of its reconfigurability and energy efficiency advantage over GPU, especially when OpenCL-based high-level synthesis tools are now available providing fast verification and implementation flows. Previous OpenCL-based design only focused on creating a generic framework to identify performance-related hardware parameters, without utilizing FPGA's special capability of pipelining kernel functions to minimize memory bandwidth requirement. In this work, we propose an FPGA accelerator with a new architecture of deeply pipelined OpenCL kernels. Data reuse and task mapping techniques are also presented to improve design efficiency. The proposed schemes are verified by implementing two representative large-scale CNNs, AlexNet and VGG on Altera Stratix-V A7 FPGA. We have achieved a similar peak performance of 33.9 GOPS with a 34% resource reduction on DSP blocks compared to previous work. Our design is openly accessible and thus can be reused to explore new architectures for neural network accelerators.