Dr. Peng Ouyang
Coarse-grained reconfigurable architecture (CGRA) is a promising solution for high-performance, energy-efficient computing, and it can be reconfigured dynamically at runtime. However, effective design automation methods and high-level synthesis (HLS) theory for mapping software applications onto CGRAs have been lacking. Our research develops design automation methods for general-purpose CGRAs and focuses on four major challenges:
(1) Exploiting application parallelism for high performance;
(2) Memory management to reduce access conflicts;
(3) Configuration context compression to reduce reconfiguration cost;
(4) Energy management for high energy efficiency.
Cloud computing provides shared computer processing resources and data to computers and other devices on demand. Most cloud platforms are based on CPUs and GPUs, whose power consumption can be very high. We have designed a reconfigurable cloud platform that uses our CHAMELEON CGRA chip as an accelerator. Each CHAMELEON chip, fabricated in 65nm technology, contains 4x8x8 reconfigurable processing elements (PEs). Two CHAMELEON chips are integrated onto an FPGA-assisted PCI-E board, and four PCI-E boards are installed in one server. An elastic management system is built on a five-node cluster (1 master + 4 slaves). Computing speed scales nearly linearly with the number of computing nodes, and computing efficiency is about three orders of magnitude higher than that of a Xeon CPU at a 200MHz clock frequency.
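The hardware counts in the cluster description multiply out as follows. This minimal Python sketch reads "4x8x8" as a plain product (256 PEs per chip) and assumes, since the text does not say so explicitly, that only the four slave nodes carry accelerator boards and that each of them holds four boards. The `parallel_efficiency` helper is a hypothetical illustration of what "near-linear" scaling means (speedup divided by node count staying close to 1), not the authors' measurement method.

```python
# Sketch: resource counts implied by the CHAMELEON cluster description.
# Assumptions (not stated in the text): only the 4 slave nodes host
# accelerator boards, and every slave holds 4 PCI-E boards.

PES_PER_CHIP = 4 * 8 * 8   # "4x8x8" reconfigurable PEs per CHAMELEON chip
CHIPS_PER_BOARD = 2        # two chips per FPGA-assisted PCI-E board
BOARDS_PER_SERVER = 4      # four boards per server
SLAVE_NODES = 4            # 1 master + 4 slaves; assume only slaves compute


def cluster_resources(nodes: int = SLAVE_NODES) -> dict:
    """Chip and PE totals for a given number of accelerator-equipped nodes."""
    chips_per_server = CHIPS_PER_BOARD * BOARDS_PER_SERVER
    return {
        "pes_per_chip": PES_PER_CHIP,
        "chips_per_server": chips_per_server,
        "pes_per_server": chips_per_server * PES_PER_CHIP,
        "total_chips": nodes * chips_per_server,
        "total_pes": nodes * chips_per_server * PES_PER_CHIP,
    }


def parallel_efficiency(t1: float, tn: float, n: int) -> float:
    """Speedup (t1 / tn) divided by n; values near 1.0 indicate near-linear scaling."""
    return (t1 / tn) / n


print(cluster_resources())
```

Under these assumptions, `cluster_resources()` reports 256 PEs per chip, 8 chips (2048 PEs) per server, and 32 chips (8192 PEs) across the four slave servers.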
"Thinker" is an energy-efficient hybrid neural network (NN) processor fabricated in 65nm technology. It has two 16x16 arrays of reconfigurable heterogeneous processing elements (PEs). To accelerate hybrid NNs, the PE arrays support on-demand partitioning and reconfiguration, so that different NNs can be processed in parallel. To improve energy efficiency, each PE supports bit-width adaptive computing to match the varying bit-widths of different neural network layers. Measurement results show that the processor achieves a peak performance of 409.6GOPS at 200MHz and an energy efficiency of up to 5.09TOPS/W, which is up to 5.2x higher than the state of the art.
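The reported peak throughput can be related back to the array size with simple arithmetic. The sketch below divides the peak figure by the clock rate and then by the total PE count. Treating one multiply-accumulate (MAC) as two operations is an assumption, a common convention that the text does not confirm.

```python
# Sketch: what Thinker's reported peak throughput implies per PE per cycle.
# Reported figures: two 16x16 PE arrays, 409.6 GOPS peak at 200 MHz.
# Assumption (not stated in the text): one MAC counts as 2 operations.

ARRAYS = 2
PES_PER_ARRAY = 16 * 16
PEAK_GOPS = 409.6
CLOCK_MHZ = 200.0
OPS_PER_MAC = 2  # assumed convention


def ops_per_cycle(peak_gops: float, clock_mhz: float) -> float:
    """Operations completed per clock cycle at peak throughput."""
    return (peak_gops * 1e9) / (clock_mhz * 1e6)


total_pes = ARRAYS * PES_PER_ARRAY               # 512 PEs in total
per_cycle = ops_per_cycle(PEAK_GOPS, CLOCK_MHZ)  # about 2048 ops per cycle
per_pe = per_cycle / total_pes                   # about 4 ops per PE per cycle
macs_per_pe = per_pe / OPS_PER_MAC               # about 2 MACs per PE per cycle (assumed convention)

print(f"{total_pes} PEs, {per_cycle:.0f} ops/cycle, {per_pe:.1f} ops/PE/cycle")
```

The text does not say how a PE reaches this rate (for example, whether it depends on the bit-width adaptive mode), so the result is arithmetic on the published figures only, not a description of the PE datapath.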