New Proposed Methodologies for Detection of Eye Diseases in Human Beings using HDL, ModelSim, MATLAB, Python & TensorFlow w.r.t. the Bio-Medical Image Processing Point of View
This paper presents proposed methodologies for glaucoma detection using different hardware and software tools.
Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems
Deep Learning is increasingly being adopted by industry for computer vision
applications running on embedded devices. While Convolutional Neural Networks'
accuracy has achieved a mature and remarkable state, inference latency and
throughput are a major concern especially when targeting low-cost and low-power
embedded platforms. CNNs' inference latency may become a bottleneck for Deep
Learning adoption by industry, as it is a crucial specification for many
real-time processes. Furthermore, deployment of CNNs across heterogeneous
platforms presents major compatibility issues due to vendor-specific technology
and acceleration libraries. In this work, we present QS-DNN, a fully automatic
search based on Reinforcement Learning which, combined with an inference engine
optimizer, efficiently explores the design space and empirically finds
the optimal combinations of libraries and primitives to speed up the inference
of CNNs on heterogeneous embedded devices. We show that an optimized
combination can achieve a 45x speedup in inference latency on CPU compared to a
dependency-free baseline, and 2x on average on GPGPU compared to the best vendor
library. Further, we demonstrate that both the quality of results and the
time-to-solution are much better than with Random Search, achieving up to 15x
better results for a short-time search.
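The abstract does not give QS-DNN's exact formulation, so the following is a minimal, hypothetical sketch of the general idea: an epsilon-greedy reinforcement-learning agent learns per-layer Q-values from (negative) measured latency and converges on the fastest library/primitive for each CNN layer. The layer names, primitive names, and latency table are illustrative stand-ins for on-device benchmarks, not QS-DNN's actual search space.

```python
import random

# Toy search space: for each layer, pick one backend implementation.
LAYERS = ["conv1", "conv2", "fc"]
PRIMITIVES = ["lib_a", "lib_b", "direct"]

# Stand-in for on-device benchmarking: measured latency (ms) per (layer, primitive).
LATENCY = {
    ("conv1", "lib_a"): 3.0, ("conv1", "lib_b"): 1.5, ("conv1", "direct"): 6.0,
    ("conv2", "lib_a"): 2.0, ("conv2", "lib_b"): 4.5, ("conv2", "direct"): 5.0,
    ("fc", "lib_a"): 1.0,    ("fc", "lib_b"): 0.8,    ("fc", "direct"): 2.5,
}

def search(episodes=500, eps=0.2, lr=0.5, seed=0):
    rng = random.Random(seed)
    q = {key: 0.0 for key in LATENCY}          # Q-value per (layer, primitive)
    for _ in range(episodes):
        for layer in LAYERS:
            if rng.random() < eps:             # explore a random primitive
                prim = rng.choice(PRIMITIVES)
            else:                              # exploit the current best estimate
                prim = max(PRIMITIVES, key=lambda p: q[(layer, p)])
            reward = -LATENCY[(layer, prim)]   # faster execution => higher reward
            q[(layer, prim)] += lr * (reward - q[(layer, prim)])
    # Greedy policy after training: fastest primitive per layer.
    return {layer: max(PRIMITIVES, key=lambda p: q[(layer, p)]) for layer in LAYERS}

best = search()
```

In a real system the reward would come from benchmarking the candidate combination on the target embedded device rather than from a lookup table.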
A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
In recent years, the field of Deep Learning has seen many disruptive and
impactful advancements. Given the increasing complexity of deep neural
networks, the need for efficient hardware accelerators has become more and more
pressing to design heterogeneous HPC platforms. The design of Deep Learning
accelerators requires a multidisciplinary approach, combining expertise from
several areas, spanning from computer architecture to approximate computing,
computational models, and machine learning algorithms. Several methodologies
and tools have been proposed to design accelerators for Deep Learning,
including hardware-software co-design approaches, high-level synthesis methods,
specific customized compilers, and methodologies for design space exploration,
modeling, and simulation. These methodologies aim to maximize the exploitable
parallelism and minimize data movement to achieve high performance and energy
efficiency. This survey provides a holistic review of the most influential
design methodologies and EDA tools proposed in recent years to implement Deep
Learning accelerators, offering the reader a wide perspective in this rapidly
evolving field. In particular, this work complements the previous survey
proposed by the same authors in [203], which focuses on Deep Learning hardware
accelerators for heterogeneous HPC platforms.
Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture
Deep neural networks (DNNs) have been shown to outperform conventional
machine learning algorithms across a wide range of applications, e.g., image
recognition, object detection, robotics, and natural language processing.
However, the high computational complexity of DNNs often necessitates extremely
fast and efficient hardware. The problem gets worse as the size of neural
networks grows exponentially. As a result, customized hardware accelerators
have been developed to accelerate DNN processing without sacrificing model
accuracy. However, previous accelerator design studies have not fully
considered the characteristics of the target applications, which may lead to
sub-optimal architecture designs. On the other hand, new DNN models have been
developed for better accuracy, but their compatibility with the underlying
hardware accelerator is often overlooked. In this article, we propose an
application-driven framework for architectural design space exploration of DNN
accelerators. This framework is based on a hardware analytical model of
individual DNN operations. It models the accelerator design task as a
multi-dimensional optimization problem. We demonstrate that it can be
efficaciously used in application-driven accelerator architecture design. Given
a target DNN, the framework can generate efficient accelerator design solutions
with optimized performance and area. Furthermore, we explore the opportunity to
use the framework for accelerator configuration optimization across multiple
diverse DNN applications simultaneously. The framework is also capable of
improving neural network models to best fit the underlying hardware resources.
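As a concrete illustration of treating accelerator design as a multi-dimensional optimization problem driven by an analytical model, here is a hedged sketch: a brute-force sweep over processing-element (PE) count and on-chip buffer size, scored by a toy latency model under an area budget. The cost models, parameter ranges, and constants are invented for illustration and are not from the article.

```python
import itertools

def latency_ms(macs, num_pes, buffer_kb):
    # Idealized compute time plus a toy penalty for off-chip traffic
    # (smaller buffers => more data movement).
    compute = macs / (num_pes * 1e6)
    memory = macs / (buffer_kb * 1e7)
    return compute + memory

def area_mm2(num_pes, buffer_kb):
    # Toy linear area model for PEs and SRAM.
    return 0.05 * num_pes + 0.02 * buffer_kb

def explore(macs, area_budget):
    # Exhaustively score every (PE count, buffer size) configuration and
    # keep the fastest one that fits the area budget.
    best = None
    for num_pes, buffer_kb in itertools.product([64, 128, 256], [32, 64, 128]):
        if area_mm2(num_pes, buffer_kb) > area_budget:
            continue
        t = latency_ms(macs, num_pes, buffer_kb)
        if best is None or t < best[0]:
            best = (t, num_pes, buffer_kb)
    return best

# Explore for a hypothetical 500-MMAC workload under a 10 mm^2 budget.
best = explore(macs=500e6, area_budget=10.0)
```

A production framework would replace the exhaustive sweep with a smarter search strategy and the toy formulas with a per-operation analytical model of the target DNN, as the abstract describes.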
- …