A Low Power High Throughput Architecture for Deep Network Training
Yangjie Qi and Rasitha Fernando
Advisor: Dr. Tarek M. Taha
Introduction
• Deep learning is an important family of algorithms now used to process many kinds of data. A key step in deep learning is training, in which the algorithm learns how to classify new data. Training is very expensive in time and energy, so it is typically done only on large, powerful computers.
• We have developed a novel specialized chip that performs this training at very low power. The reduced power consumption will allow the chip to be used in a wide range of everyday devices, such as cell phones, medical devices, and robots, making them much smarter because they can adapt on their own.
The Algorithm and Core Design
[Figures: The Data Flow for Training; Proposed multicore system; SRAM-based static routing switch; Circuits for Backpropagation (C2); Circuits for Transposed Weight Update (C4)]
Formula for Back Propagation Algorithm
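$DP^{(i+1)} = Z^{(i)} W^{(i)\to(i+1)}$ (1)
$Z^{(i+1)} = f(DP^{(i+1)})$ (2)
$F^{(i+1)} = f'(DP^{(i+1)})$ (3)
$D^{(out)} = F^{(out)} \odot (Z^{(out)} - Y)$ (4)
$D^{(i)} = F^{(i)} \odot D^{(i+1)} (W^{(i)\to(i+1)})^{T}$ (5)
$\Delta W^{(i)\to(i+1)} = -\eta \left( (D^{(i+1)})^{T} Z^{(i)} \right)^{T}$ (6)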
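As a software illustration of Equations (1) to (6), the following is a minimal sketch of one training step for a single sample. It is not the chip's circuit implementation. It assumes a sigmoid activation for f (the poster does not name the activation), a row-vector convention for Z and D, and no bias terms. All function names (`matmul`, `forward`, `train_step`, `loss`) are hypothetical helpers introduced only for this example.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]

def hadamard(a, b):
    """Element-wise product of two equally shaped matrices."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sigmoid(x):
    # Assumed activation f; the poster does not specify one.
    return 1.0 / (1.0 + math.exp(-x))

def forward(z_in, weights):
    """Inference pass. zs[k] is Z for layer k; fs[k] is F for layer k + 1."""
    zs, fs = [z_in], []
    for w in weights:
        dp = matmul(zs[-1], w)                            # Eq. (1)
        z = [[sigmoid(v) for v in row] for row in dp]     # Eq. (2)
        f = [[v * (1.0 - v) for v in row] for row in z]   # Eq. (3), sigmoid derivative
        zs.append(z)
        fs.append(f)
    return zs, fs

def train_step(z_in, y, weights, eta):
    """Return updated weights after one backpropagation step."""
    zs, fs = forward(z_in, weights)
    diff = [[a - b for a, b in zip(rz, ry)] for rz, ry in zip(zs[-1], y)]
    d = hadamard(fs[-1], diff)                            # Eq. (4)
    new_weights = list(weights)
    for i in reversed(range(len(weights))):
        # Eq. (6): (D^T Z)^T, which has the same shape as W
        grad = transpose(matmul(transpose(d), zs[i]))
        d_prev = None
        if i > 0:
            # Eq. (5), computed with the weights from before the update
            d_prev = hadamard(fs[i - 1], matmul(d, transpose(weights[i])))
        new_weights[i] = [[w - eta * g for w, g in zip(rw, rg)]
                          for rw, rg in zip(weights[i], grad)]
        d = d_prev
    return new_weights

def loss(z_in, y, weights):
    """Half squared error of the network output for one sample."""
    zs, _ = forward(z_in, weights)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(zs[-1][0], y[0]))

if __name__ == "__main__":
    # Illustrative 3-2-1 network with arbitrary starting weights.
    w = [[[0.1, -0.2], [0.3, 0.4], [-0.5, 0.2]], [[0.2], [-0.3]]]
    x, y = [[1.0, 0.5, -1.0]], [[1.0]]
    print("loss before:", loss(x, y, w))
    print("loss after: ", loss(x, y, train_step(x, y, w, 0.5)))
```

The backward loop computes the error for the previous layer before overwriting the weights, so Equation (5) always sees the same weights that produced the forward pass.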
Calculation   Equation    Circuit
a             (1) + (2)   C1
b             (3)         C1
c             (2) + (4)   C1
d             (5)         C2
[Figures: Circuits for Inference Pass (C1); Circuits for Weight Update (C3); Circuits for Backpropagation and ...]
Network Mapping Strategy
[Figures: General Mapping Strategy; Proposed Mapping Strategy]
[Figure: Power Efficiency Compared with Other Designs (logarithmic axis, ticks at 1, 10, 100, 1000)]
• The system has both training and recognition (evaluation of new input) capabilities.
• The chip was about 2000x more energy efficient and about 14x faster than an NVIDIA graphics processor when learning multiple types of data.
