Software-defined Design Space Exploration for an Efficient DNN Accelerator Architecture
Deep neural networks (DNNs) have been shown to outperform conventional
machine learning algorithms across a wide range of applications, e.g., image
recognition, object detection, robotics, and natural language processing.
However, the high computational complexity of DNNs often necessitates extremely
fast and efficient hardware. The problem gets worse as the size of neural
networks grows exponentially. As a result, customized hardware accelerators
have been developed to accelerate DNN processing without sacrificing model
accuracy. However, previous accelerator design studies have not fully
considered the characteristics of the target applications, which may lead to
sub-optimal architecture designs. On the other hand, new DNN models have been
developed for better accuracy, but their compatibility with the underlying
hardware accelerator is often overlooked. In this article, we propose an
application-driven framework for architectural design space exploration of DNN
accelerators. This framework is based on a hardware analytical model of
individual DNN operations. It models the accelerator design task as a
multi-dimensional optimization problem. We demonstrate that it can be
used effectively in application-driven accelerator architecture design. Given
a target DNN, the framework can generate efficient accelerator design solutions
with optimized performance and area. Furthermore, we explore the opportunity to
use the framework for accelerator configuration optimization under simultaneous
diverse DNN applications. The framework is also capable of improving neural
network models to best fit the underlying hardware resources.
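
As a rough illustration of treating accelerator design as an optimization over an analytical model, the sketch below searches a tiny design grid for the lowest-latency configuration within an area budget. It is not the framework described in the abstract: the layer figures, the roofline-style latency estimate, the linear area coefficients, and all names (`LAYERS`, `layer_cycles`, `area_mm2`, `explore`) are assumptions made up for this example.

```python
from itertools import product
from math import ceil

# Hypothetical workload: per-layer multiply-accumulate count and bytes moved.
LAYERS = [
    {"macs": 118_000_000, "bytes": 1_200_000},
    {"macs": 925_000_000, "bytes": 3_500_000},
    {"macs": 231_000_000, "bytes": 2_400_000},
]


def layer_cycles(layer, num_pes, bytes_per_cycle):
    """Roofline-style estimate: a layer is bound by compute or by memory traffic."""
    compute = ceil(layer["macs"] / num_pes)
    memory = ceil(layer["bytes"] / bytes_per_cycle)
    return max(compute, memory)


def area_mm2(num_pes, bytes_per_cycle):
    """Linear area model; both coefficients are invented for illustration."""
    return 0.002 * num_pes + 0.05 * bytes_per_cycle


def explore(layers, area_budget):
    """Exhaustively search a small grid; return the fastest design within budget."""
    best = None
    for num_pes, bytes_per_cycle in product((64, 256, 1024, 4096), (16, 64, 256)):
        area = area_mm2(num_pes, bytes_per_cycle)
        if area > area_budget:
            continue  # violates the area constraint
        latency = sum(layer_cycles(l, num_pes, bytes_per_cycle) for l in layers)
        candidate = {
            "num_pes": num_pes,
            "bytes_per_cycle": bytes_per_cycle,
            "latency": latency,
            "area": area,
        }
        # Prefer lower latency; break ties with smaller area.
        if best is None or (latency, area) < (best["latency"], best["area"]):
            best = candidate
    return best


print(explore(LAYERS, area_budget=5.0))
```

A real design space is far larger than this grid, so exhaustive search would normally give way to a heuristic or gradient-free optimizer; the structure (model, constraint, objective) stays the same.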
An Extended Model for Multi-Criteria Software Component Allocation on a Heterogeneous Embedded Platform
The recent development of heterogeneous platforms (i.e., those containing different types of computational units, such as multicore CPUs, GPUs, and FPGAs) has enabled significant performance improvements for real-time data processing. This potential, however, is still not fully utilized because methods for optimally configuring the software are lacking. Allocating software components to the different types of computational units is crucial for maximizing platform utilization, but for more complex systems it is difficult to find a good enough configuration, let alone the best one, in an ad hoc manner. In this paper, we apply the analytic hierarchy process and a genetic algorithm to find a feasible, locally optimal allocation of software components to computational units that respects system- and user-defined constraints.
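
The combination of the analytic hierarchy process (AHP) with a genetic algorithm can be sketched roughly as follows. This is a minimal illustration, not the authors' model: the cost tables, memory demands, capacities, penalty factor, and all function names are hypothetical, and the AHP priority vector is approximated with the common row-geometric-mean method.

```python
import random
from math import prod


def ahp_weights(pairwise):
    """Approximate the AHP priority vector by normalizing row geometric means."""
    n = len(pairwise)
    means = [prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(means)
    return [m / total for m in means]


# Hypothetical data: cost of running component c on unit u (all values invented).
UNITS = ("CPU", "GPU", "FPGA")
TIME = [[8, 3, 2], [5, 6, 4], [9, 2, 3], [4, 5, 1]]
ENERGY = [[4, 9, 3], [3, 8, 2], [5, 9, 4], [2, 7, 2]]
MEMORY = [2, 1, 3, 2]   # memory demand per component
CAPACITY = [4, 4, 4]    # memory capacity per unit


def cost(alloc, weights):
    """Weighted multi-criteria cost plus a penalty for exceeding unit capacity."""
    w_time, w_energy = weights
    total = sum(
        w_time * TIME[c][u] + w_energy * ENERGY[c][u] for c, u in enumerate(alloc)
    )
    used = [0] * len(UNITS)
    for c, u in enumerate(alloc):
        used[u] += MEMORY[c]
    overflow = sum(max(0, used[u] - CAPACITY[u]) for u in range(len(UNITS)))
    return total + 100 * overflow


def genetic_search(weights, generations=60, pop_size=20, seed=0):
    """Elitist GA: keep the better half, refill by one-point crossover and mutation."""
    rng = random.Random(seed)
    n = len(MEMORY)
    pop = [[rng.randrange(len(UNITS)) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda a: cost(a, weights))
        parents = pop[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.3:  # mutate one gene
                child[rng.randrange(n)] = rng.randrange(len(UNITS))
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda a: cost(a, weights))


# Time is judged three times as important as energy in this made-up comparison.
weights = ahp_weights([[1, 3], [1 / 3, 1]])
best = genetic_search(weights)
print([UNITS[u] for u in best], cost(best, weights))
```

As in the abstract, the result is only locally optimal: a genetic algorithm gives no guarantee of reaching the global minimum, and the penalty term merely steers the search toward feasible allocations.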
Optimization of Discrete-parameter Multiprocessor Systems using a Novel Ergodic Interpolation Technique
Modern multi-core systems have a large number of design parameters, most of
which are discrete-valued, and this number is likely to keep increasing as chip
complexity rises. Further, the accurate evaluation of a potential design choice
is computationally expensive because it requires detailed cycle-accurate system
simulation. If the discrete parameter space can be embedded into a larger
continuous parameter space, then continuous space techniques can, in principle,
be applied to the system optimization problem. Such continuous space techniques
often scale well with the number of parameters.
We propose a novel technique for embedding the discrete parameter space into
an extended continuous space so that continuous space techniques can be applied
to the embedded problem using cycle accurate simulation for evaluating the
objective function. This embedding is implemented using simulation-based
ergodic interpolation, which, unlike spatial interpolation, produces the
interpolated value within a single simulation run irrespective of the number of
parameters. We have implemented this interpolation scheme in a cycle-based
system simulator. In a characterization study, we observe that the interpolated
performance curves are continuous, piece-wise smooth, and have low statistical
error. We use the ergodic interpolation-based approach to solve a large
multi-core design optimization problem with 31 design parameters. Our results
indicate that continuous space optimization using ergodic interpolation-based
embedding can be a viable approach for large multi-core design optimization
problems.
Comment: A short version of this paper will be published in the proceedings of the
IEEE MASCOTS 2015 conference.
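
One way to picture interpolation within a single simulation run is to re-draw the discrete parameter at random during the run so that its time average equals the desired fractional value. The toy queue below follows that reading; it is an assumption about the general idea, not the paper's simulator or its exact scheme, and the arrival process and the name `simulate` are invented.

```python
import random
from math import floor


def simulate(x, cycles=20_000, seed=1):
    """Toy cycle-based queue; x is the (possibly fractional) number of service slots.

    Returns the average number of items served per cycle.
    """
    rng = random.Random(seed)
    lo = floor(x)
    frac = x - lo
    backlog = 0
    served_total = 0
    for _ in range(cycles):
        # Re-draw the discrete parameter each cycle so its time average equals x.
        slots = lo + (1 if rng.random() < frac else 0)
        backlog += rng.randint(0, 4)  # arrivals this cycle (mean 2)
        served = min(slots, backlog)
        backlog -= served
        served_total += served
    return served_total / cycles


for x in (1.0, 1.25, 1.5, 1.75, 2.0):
    print(x, simulate(x))
```

Each call costs one run regardless of how many parameters are made fractional, whereas spatial interpolation over d parameters would need a run at every surrounding grid point, up to 2^d of them.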
Automatic Nested Loop Acceleration on FPGAs Using Soft CGRA Overlay
Session 1: HLS Tooling (postprint)
High-Level Synthesis Hardware Design for FPGA-Based Accelerators: Models, Methodologies, and Frameworks
Hardware accelerators based on field programmable gate array (FPGA) and system on chip (SoC) devices have gained attention in recent years. One of the main reasons is that these devices contain reconfigurable logic, which makes them well suited to boosting the performance of applications. High-level synthesis (HLS) tools facilitate the creation of FPGA code from a high level of abstraction, using different directives to obtain an optimized hardware design based on performance metrics. However, the complexity of the design space depends on factors such as the number of directives used in the source code, the available resources in the device, and the clock frequency. Design space exploration (DSE) techniques evaluate multiple implementations with different combinations of directives to obtain a design with a good compromise between different metrics. This paper presents a survey of models, methodologies, and frameworks proposed for metric estimation, FPGA-based DSE, and power consumption estimation on FPGA/SoC. The main features, limitations, and trade-offs of these approaches are described. We also present the integration of existing models and frameworks in diverse research areas and identify the different challenges to be addressed.
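
Directive-based DSE as described above can be sketched as enumerating directive combinations, estimating their metrics, and keeping the Pareto-optimal designs. The estimator here is a made-up stand-in for an HLS tool or a metric-estimation model; the directive value sets, coefficients, and function names are all assumptions for illustration.

```python
from itertools import product

# Hypothetical directive choices for a single loop.
UNROLL = (1, 2, 4, 8)
PIPELINE = (False, True)
PARTITION = (1, 2, 4)


def estimate(unroll, pipeline, partition, trip_count=1024):
    """Invented analytical estimate returning (latency in cycles, LUT count)."""
    # Parallelism is capped by the memory ports exposed through array partitioning.
    parallel = min(unroll, 2 * partition)
    iterations = trip_count // parallel
    depth = 4  # assumed latency of the loop body
    latency = iterations + depth if pipeline else iterations * depth
    luts = 200 * unroll + 150 * partition + (300 if pipeline else 0)
    return latency, luts


def dominates(a, b):
    """True if a is no worse than b in every metric and differs in at least one."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def pareto_front(points):
    """Keep the configurations whose metrics no other configuration dominates."""
    return {
        cfg: metrics
        for cfg, metrics in points.items()
        if not any(dominates(other, metrics) for other in points.values())
    }


designs = {cfg: estimate(*cfg) for cfg in product(UNROLL, PIPELINE, PARTITION)}
front = pareto_front(designs)
for cfg, metrics in sorted(front.items(), key=lambda item: item[1]):
    print(cfg, metrics)
```

In practice each evaluation is a synthesis run or a model query rather than a closed-form function, which is why the number of directive combinations dominates the cost of exploration.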