653 research outputs found
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.
Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
Energy challenges for ICT
The energy consumption from the expanding use of information and communications technology (ICT) is unsustainable with present drivers, and it will impact heavily on the future climate change. However, ICT devices have the potential to contribute significantly to the reduction of CO2 emission and enhance resource efficiency in other sectors, e.g., transportation (through intelligent transportation and advanced driver assistance systems and self-driving vehicles), heating (through smart building control), and manufacturing (through digital automation based on smart autonomous sensors). To address the energy sustainability of ICT and capture the full potential of ICT in resource efficiency, a multidisciplinary ICT-energy community needs to be brought together covering devices, microarchitectures, ultra large-scale integration (ULSI), high-performance computing (HPC), energy harvesting, energy storage, system design, embedded systems, efficient electronics, static analysis, and computation. In this chapter, we introduce challenges and opportunities in this emerging field and a common framework to strive towards energy-sustainable ICT.
Ozone: Efficient Execution with Zero Timing Leakage for Modern Microarchitectures
Time variation during program execution can leak sensitive information. Time
variations due to program control flow and hardware resource contention have
been used to steal encryption keys in cipher implementations such as AES and
RSA. A number of approaches to mitigate timing-based side-channel attacks have
been proposed including cache partitioning, control-flow obfuscation and
injecting timing noise into the outputs of code. While these techniques make
timing-based side-channel attacks more difficult, they do not eliminate the
risks. Prior techniques are either too specific or too expensive, and all leave
remnants of the original timing side channel for later attackers to attempt to
exploit.
In this work, we show that the state-of-the-art techniques in timing
side-channel protection, which limit timing leakage but do not eliminate it,
still have significant vulnerabilities to timing-based side-channel attacks. To
provide a means for total protection from timing-based side-channel attacks, we
develop Ozone, the first zero timing leakage execution resource for a modern
microarchitecture. Code in Ozone executes under a special hardware thread that
gains exclusive access to a single core's resources for a fixed (and limited)
number of cycles during which it cannot be interrupted. Memory access under
Ozone thread execution is limited to a fixed size uncached scratchpad memory,
and all Ozone threads begin execution with a known fixed microarchitectural
state. We evaluate Ozone using a number of security sensitive kernels that have
previously been targets of timing side-channel attacks, and show that Ozone
eliminates timing leakage with minimal performance overhead.
A Comparative Study of Scheduling Techniques for Multimedia Applications on SIMD Pipelines
Parallel architectures are essential in order to take advantage of the
parallelism inherent in streaming applications. One particular branch of these
employs hardware SIMD pipelines. In this paper, we analyse several scheduling
techniques, namely ad hoc overlapped execution, modulo scheduling and modulo
scheduling with unrolling, all of which aim to efficiently utilize the special
architecture design. Our investigation focuses on improving throughput while
analysing other metrics that are important for streaming applications, such as
register pressure, buffer sizes and code size. Through experiments conducted on
several media benchmarks, we present and discuss trade-offs involved when
selecting any one of these scheduling techniques.
Comment: Presented at DATE Friday Workshop on Heterogeneous Architectures and
Design Methods for Embedded Image Systems (HIS 2015) (arXiv:1502.07241)
Automatic Application-Specific Customization of Softcore Processor Microarchitecture, Masters Thesis, May 2006
Applications for constrained embedded systems are subject to strict runtime and resource utilization bounds. With soft core processors, application developers can customize the processor for their application, constrained by available hardware resources but aimed at high application performance. The more reconfigurable the processor is, the more options the application developers will have for customization and hence the greater the potential for improving application performance. However, such customization entails developing in-depth familiarity with all the parameters in order to configure them effectively. This is typically infeasible, given the tight time-to-market pressure on the developers. Alternatively, developers could explore all possible configurations, but because the number of configurations grows exponentially with the number of parameters, this is infeasible even with only tens of parameters. This thesis presents an approach to automatic microarchitecture customization based on an assumption of parameter independence. This approach is linear in the number of parameter values and hence feasible and scalable. For the dimensions that we customize, namely application runtime and hardware resources, we formulate their costs as a constrained binary integer nonlinear optimization program. Though the results are not guaranteed to be optimal, we find they are near-optimal in practice. Our technique itself is general and can be applied to other design-space exploration problems.
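The gap between exhaustive exploration and exploration under a parameter-independence assumption can be illustrated with a small sketch. This is not the thesis's implementation: the parameter names and values below are hypothetical, and the sketch only counts how many configurations each strategy would need to build and measure. It does not model the binary integer nonlinear program mentioned in the abstract.

```python
from math import prod

# Hypothetical soft-core parameter space (names and values are assumptions,
# chosen only to illustrate the counting argument).
PARAMS = {
    "icache_kb": [1, 2, 4, 8],
    "dcache_kb": [1, 2, 4, 8],
    "multiplier": ["none", "serial", "parallel"],
    "divider": ["none", "hardware"],
}

def exhaustive_count(params):
    """Number of full configurations an exhaustive search would evaluate."""
    return prod(len(values) for values in params.values())

def independent_count(params):
    """Number of evaluations when each parameter value is measured on its own."""
    return sum(len(values) for values in params.values())

def one_at_a_time(params, baseline):
    """Yield configurations that vary one parameter while the rest stay at baseline."""
    for name, values in params.items():
        for value in values:
            config = dict(baseline)
            config[name] = value
            yield config

# Baseline: the first listed value of every parameter.
baseline = {name: values[0] for name, values in PARAMS.items()}
configs = list(one_at_a_time(PARAMS, baseline))

print(exhaustive_count(PARAMS))   # product of the value counts
print(independent_count(PARAMS))  # sum of the value counts
```

With these four hypothetical parameters the exhaustive count is the product of the value counts, while the one-at-a-time count is their sum, which is why the second strategy scales to tens of parameters.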
Object-oriented domain specific compilers for programming FPGAs
Published version