3 research outputs found
An End-to-End HW/SW Co-Design Methodology to Design Efficient Deep Neural Network Systems using Virtual Models
End-to-end performance estimation and measurement of deep neural network
(DNN) systems become more important with the increasing complexity of DNN
systems consisting of hardware and software components. The methodology
proposed in this paper aims at a reduced turn-around time for evaluating
different design choices for the hardware and software components of DNN
systems. This reduction is achieved by moving the performance estimation from
the implementation phase to the concept phase, employing virtual hardware
models instead of gathering measurement results from physical prototypes. Deep
learning compilers introduce hardware-specific transformations and are,
therefore, considered a part of the design flow of virtual system models to
extract end-to-end performance estimations. To validate the run-time accuracy
of the proposed methodology, a system processing the DilatedVGG DNN is
realized both as a virtual system model and as a hardware implementation. The
results show that up to 92% accuracy can be reached in predicting the
processing time of the DNN inference.
Design and Implementation of Hardware Accelerators for Neural Processing Applications
The primary motivation for this work was the need to implement hardware
accelerators for a newly proposed ANN structure called the Auto Resonance
Network (ARN) for robotic motion planning. ARN is an approximating,
feed-forward, hierarchical, and explainable network. It can be used in various
AI applications, but its application base was small. Therefore, the objective
of the research was twofold: to develop a new application using ARN and to
implement a hardware accelerator for ARN. As per the suggestions given by the
Doctoral Committee, an image recognition system using ARN has been
implemented. An accuracy of around 94% was achieved with only 2 layers of ARN.
The network also required a small training data set of about 500 images. The
publicly available MNIST dataset was used for this experiment. All the coding
was done in Python. The massive parallelism seen in ANNs presents several
challenges to CPU design. For a given functionality, e.g., multiplication,
several copies of a serial module can be realized within the same area as one
parallel module. The advantage of using serial modules rather than parallel
modules under area constraints has been discussed. One module often useful in
ANNs is a multi-operand adder. One problem in its implementation is estimating
the number of carry bits required when the number of operands changes. A
theorem to calculate the exact number of carry bits required for a
multi-operand addition is presented in the thesis, which alleviates this
problem. The main advantage of the modular approach to multi-operand addition
is the possibility of pipelined addition with low reconfiguration overhead.
This results in an overall increase in throughput for the large numbers of
additions typically seen in several DNN configurations.
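The thesis's exact theorem is not quoted in the abstract, but the carry-bit question can be illustrated with a short check of the standard bound: summing k unsigned n-bit operands needs at most ceil(log2(k)) bits beyond the operand width. The function name below is my own; this is a sketch of that bound, not the thesis's formulation.

```python
import math

def carry_bits_needed(num_operands: int, operand_bits: int) -> int:
    """Extra bits beyond the operand width needed to hold the sum of
    `num_operands` unsigned values of `operand_bits` bits each."""
    # Worst case: every operand takes its maximum value 2**operand_bits - 1.
    max_sum = num_operands * ((1 << operand_bits) - 1)
    return max(0, max_sum.bit_length() - operand_bits)

# For moderate operand counts the exact count matches the closed form
# ceil(log2(k)), which is what a reconfigurable adder would provision:
for k in range(2, 65):
    assert carry_bits_needed(k, 8) == math.ceil(math.log2(k))
```

Knowing this count exactly (rather than over-provisioning) is what lets the modular adder be re-pipelined with low overhead when the number of operands changes.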