RAPID Team


Efficient Compiling on CGRAs

A coarse-grained reconfigurable architecture (CGRA) is a promising solution for high-performance, energy-efficient computing because it can be reconfigured dynamically at runtime. However, effective design automation methods and high-level synthesis (HLS) theory for mapping software applications onto CGRAs have been lacking. Our research focuses on design automation methods for general-purpose CGRAs and addresses four major challenges:



(1) Exploiting application parallelism for high performance;

(2) Memory management to reduce access conflicts;

(3) Configuration context compression to reduce reconfiguration cost;

(4) Energy management for high energy efficiency.

Reconfigurable Cloud Platform

Cloud computing provides shared computer processing resources and data to computers and other devices on demand. Most cloud platforms are based on CPUs and GPUs, whose power consumption can be very high. Here we design a reconfigurable cloud platform that uses our CHAMELEON CGRA chip as an accelerator. Each CHAMELEON chip contains 4x8x8 reconfigurable processing elements (PEs) and is fabricated in 65 nm technology. We integrate two CHAMELEON chips onto an FPGA-assisted PCI-E board and insert four PCI-E boards into one server. An elastic management system is built over a five-node (1 master + 4 slaves) cluster. The computing speed scales near-linearly with the number of computing nodes, and the computing efficiency is about three orders of magnitude better than that of a Xeon CPU at a 200 MHz clock.
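The hardware hierarchy described above can be tallied with a few lines of Python. This is only an arithmetic sketch based on the figures quoted in the text; the constant and function names are illustrative and not part of any CHAMELEON software. The count stops at one server, because the text does not say which cluster nodes carry accelerator boards.

```python
# Figures quoted in the text; names are hypothetical, chosen for illustration.
PES_PER_CHIP = 4 * 8 * 8      # each CHAMELEON chip has 4x8x8 PEs
CHIPS_PER_BOARD = 2           # two chips per FPGA-assisted PCI-E board
BOARDS_PER_SERVER = 4         # four PCI-E boards per server


def chips_per_server() -> int:
    """Number of CHAMELEON chips in one fully populated server."""
    return CHIPS_PER_BOARD * BOARDS_PER_SERVER


def pes_per_server() -> int:
    """Number of reconfigurable PEs in one fully populated server."""
    return PES_PER_CHIP * chips_per_server()


if __name__ == "__main__":
    print("PEs per chip:    ", PES_PER_CHIP)
    print("Chips per server:", chips_per_server())
    print("PEs per server:  ", pes_per_server())
```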

Thinker - Reconfigurable Neural Network Processor

"Thinker" is an energy-efficient hybrid neural network (NN) processor fabricated in 65 nm technology. It has two 16x16 arrays of reconfigurable heterogeneous processing elements (PEs). To accelerate a hybrid NN, the PE arrays support on-demand partitioning and reconfiguration so that different NNs can be processed in parallel. To improve energy efficiency, each PE supports bit-width adaptive computing to match the varying bit-widths of different neural layers. Measurement results show that the processor achieves a peak throughput of 409.6 GOPS at 200 MHz and an energy efficiency of up to 5.09 TOPS/W. It outperforms the state of the art by up to 5.2x in energy efficiency.
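As a sanity check on the peak-throughput figure, the quoted numbers can be related with simple arithmetic. The sketch below uses only values stated in the text; the names are illustrative. How the resulting operations per PE per cycle map onto the PE datapath (for example, which bit-width mode reaches the peak) is not stated here, so the code does not assume it.

```python
# Figures quoted in the text; names are hypothetical, chosen for illustration.
NUM_ARRAYS = 2
ARRAY_ROWS = 16
ARRAY_COLS = 16
CLOCK_HZ = 200_000_000               # 200 MHz
PEAK_OPS_PER_SEC = 409_600_000_000   # 409.6 GOPS


def total_pes() -> int:
    """Total number of PEs across both 16x16 arrays."""
    return NUM_ARRAYS * ARRAY_ROWS * ARRAY_COLS


def ops_per_pe_per_cycle() -> float:
    """Peak operations each PE must complete per clock cycle."""
    return PEAK_OPS_PER_SEC / (total_pes() * CLOCK_HZ)


if __name__ == "__main__":
    print("Total PEs:           ", total_pes())
    print("Ops per PE per cycle:", ops_per_pe_per_cycle())
```

With 512 PEs at 200 MHz, the peak figure corresponds to four operations per PE per cycle.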
