Trends in hardware architecture for mobile devices
In the last ten years, two main factors have fueled the steady growth in sales
of mobile computing and communication devices: a) the reduction of the
footprint of the devices themselves, such as cellular handsets and small
computers; and b) the success in developing low-power hardware which allows
the devices to operate autonomously for hours or even days. In this review, I
show that the first generation of mobile devices was DSP-centric; that is,
its architecture was based on fast processing of digitized signals using
low-power yet numerically powerful DSPs. However, the next generation of mobile
devices will be built around DSPs and low-power microprocessor cores for
general processing applications. Mobile devices will become data-centric. The
main challenge for designers of such hybrid architectures is to increase the
computational performance of the computing unit, while keeping power constant,
or even reducing it. This report shows that low-power mobile hardware
architecture design goes hand in hand with advances in compiling techniques.
We look at the synergy between hardware and software, and show that a good
balance between both can lead to innovative low-power processor architectures.
Compiler optimization and ordering effects on VLIW code compression
Code size has always been an important issue for embedded applications as well as for larger systems. Code compression techniques have been devised as a way of battling bloated code; however, the impact of VLIW compiler methods and outputs on these compression schemes has not been thoroughly investigated. This paper describes the application of single- and multiple-instruction dictionary methods for code compression to decrease overall code size for the TI TMS320C6xxx DSP family. The compression scheme is applied to benchmarks taken from the Mediabench benchmark suite, built with differing compiler optimization parameters. In the single-instruction encoding scheme, compression ratios were found not to be a useful indicator of the best overall code size: the best results (smallest overall code size) were obtained when the compression scheme was applied to size-optimized code. In the multiple-instruction encoding scheme, changing the order of parallel instructions was found to improve compression only slightly in unoptimized code and not to affect compression when applied to builds already optimized for size.
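The single-instruction finding (a higher compression ratio does not imply a smaller final program) can be illustrated with a minimal Python sketch. The encoding is a hypothetical one chosen for simplicity, not the paper's format: instruction words that occur more than once go into a dictionary stored uncompressed, and each instruction in the stream costs a 1-bit flag plus either a fixed-width dictionary index or the raw 32-bit word. The function name `dictionary_compress` and both toy instruction streams are made up for illustration; they are not Mediabench builds or real C6x encodings.

```python
from collections import Counter
from math import ceil, log2

WORD_BITS = 32  # assumed fixed instruction width for this sketch


def dictionary_compress(program, max_entries=256):
    """Sketch of single-instruction dictionary compression (hypothetical encoding).

    * The dictionary holds up to `max_entries` of the most frequent instruction
      words that occur more than once, each stored uncompressed (WORD_BITS).
    * Every instruction in the stream costs 1 flag bit plus either a
      dictionary index (index_bits) or the raw word (WORD_BITS).
    Returns (compressed_bits, original_bits).
    """
    counts = Counter(program)
    # Most frequent words first; singletons would cost more in the dictionary than they save.
    entries = [w for w, c in counts.most_common(max_entries) if c > 1]
    index = {w: i for i, w in enumerate(entries)}
    index_bits = max(1, ceil(log2(len(entries)))) if entries else 0
    stream_bits = sum(1 + (index_bits if w in index else WORD_BITS) for w in program)
    compressed_bits = stream_bits + len(entries) * WORD_BITS
    return compressed_bits, len(program) * WORD_BITS


# Toy "builds" of the same program with made-up instruction words.
unoptimized = [0xA0, 0xB0] * 200 + list(range(30))  # 430 words, highly repetitive
size_optimized = list(range(30)) + list(range(10))  # 40 words, little repetition

for name, build in [("unoptimized", unoptimized), ("size-optimized", size_optimized)]:
    comp, orig = dictionary_compress(build)
    print(f"{name}: {orig} -> {comp} bits, ratio {comp / orig:.2f}")
# unoptimized: 13760 -> 1854 bits, ratio 0.13
# size-optimized: 1280 -> 1080 bits, ratio 0.84
```

With these toy inputs the repetitive unoptimized build compresses to about 13% of its original size, while the size-optimized build only reaches about 84%; even so, the size-optimized build is still smaller after compression (1080 vs. 1854 bits), mirroring the abstract's point that compression ratio alone is a poor guide to overall code size.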
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated into the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics, which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201