Search CORE

2,049 research outputs found

Trends in hardware architecture for mobile devices

Author: Esponda Margarita
Publication venue
Publication date: 01/01/2004
Field of study

In the last ten years, two main factors have fueled the steady growth in sales of mobile computing and communication devices: a) the reduction of the footprint of the devices themselves, such as cellular handsets and small computers; and b) the success in developing low-power hardware which allows the devices to operate autonomously for hours or even days. In this review, I show that the first generation of mobile devices was DSP centric – that is, its architecture was based in fast processing of digitized signals using low- power, yet numerically powerful DSPs. However, the next generation of mobile devices will be built around DSPs and low power microprocessor cores for general processing applications. Mobile devices will become data-centric. The main challenge for designers of such hybrid architectures is to increase the computational performance of the computing unit, while keeping power constant, or even reducing it. This report shows that low-power mobile hardware architectures design goes hand in hand with advances in compiling techniques. We look at the synergy between hardware and software, and show that a good balance between both can lead to innovative lowpower processor architectures

Compiler optimization and ordering effects on VLIW code compression

Author: Ros Montserrat
Sutton Peter
Publication venue: 'Sociological Research Online'
Publication date: 01/01/2003
Field of study

Code size has always been an important issue for all embedded applications as well as larger systems. Code compression techniques have been devised as a way of battling bloated code; however, the impact of VLIW compiler methods and outputs on these compression schemes has not been thoroughly investigated. This paper describes the application of single- and multipleinstruction dictionary methods for code compression to decrease overall code size for the TI TMS320C6xxx DSP family. The compression scheme is applied to benchmarks taken from the Mediabench benchmark suite built with differing compiler optimization parameters. In the single instruction encoding scheme, it was found that compression ratios were not a useful indicator of the best overall code size – the best results (smallest overall code size) were obtained when the compression scheme was applied to sizeoptimized code. In the multiple instruction encoding scheme, changing parallel instruction order was found to only slightly improve compression in unoptimized code and does not affect the code compression when it is applied to builds already optimized for size

Research Online

Compiler optimization and ordering effects on VLIW code compression

Author: Montserrat Ros
Peter Sutton
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004
Field of study

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Author: Bouganis Christos-Savvas
Kouris Alexandros
Venieris Stylianos I.
Publication venue
Publication date: 19/02/2018
Field of study

In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository