13,314 research outputs found
Optimizing the flash-RAM energy trade-off in deeply embedded systems
Deeply embedded systems often have the tightest constraints on energy
consumption, requiring that they consume tiny amounts of current and run on
batteries for years. However, they typically execute code directly from flash,
instead of the more energy efficient RAM. We implement a novel compiler
optimization that exploits the relative efficiency of RAM by statically moving
carefully selected basic blocks from flash to RAM. Our technique uses integer
linear programming, with an energy cost model to select a good set of basic
blocks to place into RAM, without impacting stack or data storage.
We evaluate our optimization on a common ARM microcontroller and succeed in
reducing the average power consumption by up to 41% and reducing energy
consumption by up to 22%, while increasing execution time. A case study is
presented, where an application executes code then sleeps for a period of time.
For this example we show that our optimization could allow the application to
run on battery for up to 32% longer. We also show that for this scenario the
total application energy can be reduced, even if the optimization increases the
execution time of the code
Design of multimedia processor based on metric computation
Media-processing applications, such as signal processing, 2D and 3D graphics
rendering, and image compression, are the dominant workloads in many embedded
systems today. The real-time constraints of those media applications have
taxing demands on today's processor performances with low cost, low power and
reduced design delay. To satisfy those challenges, a fast and efficient
strategy consists in upgrading a low cost general purpose processor core. This
approach is based on the personalization of a general RISC processor core
according the target multimedia application requirements. Thus, if the extra
cost is justified, the general purpose processor GPP core can be enforced with
instruction level coprocessors, coarse grain dedicated hardware, ad hoc
memories or new GPP cores. In this way the final design solution is tailored to
the application requirements. The proposed approach is based on three main
steps: the first one is the analysis of the targeted application using
efficient metrics. The second step is the selection of the appropriate
architecture template according to the first step results and recommendations.
The third step is the architecture generation. This approach is experimented
using various image and video algorithms showing its feasibility
Recommended from our members
Memory-Based High-Level Synthesis Optimizations Security Exploration on the Power Side-Channel
High-level synthesis (HLS) allows hardware designers to think algorithmically and not worry about low-level, cycle-by-cycle details. This provides the ability to quickly explore the architectural design space and tradeoffs between resource utilization and performance. Unfortunately, security evaluation is not a standard part of the HLS design flow. In this article, we aim to understand the effects of memory-based HLS optimizations on power side-channel leakage. We use Xilinx Vivado HLS to develop different cryptographic cores, implement them on a Spartan-6 FPGA, and collect power traces. We evaluate the designs with respect to resource utilization, performance, and information leakage through power consumption. We have two important observations and contributions. First, the choice of resource optimization directive results in different levels of side-channel vulnerabilities. Second, the partitioning optimization directive can greatly compromise the hardware cryptographic system through power side-channel leakage due to the deployment of memory control logic. We describe an evaluation procedure for power side-channel leakage and use it to make best-effort recommendations about how to design more secure architectures in the cryptographic domain
Building an Application-specific Memory Hierarchy on FPGA
The high potential performance of FPGAs cannot be exploited if a design suffers a memory bottleneck. Therefore, a memory hierarchy is needed to reuse data in on-chip memories and minimize the number of accesses to off-chip memory
Data Cache-Energy and Throughput Models: Design Exploration for Embedded Processors
Most modern 16-bit and 32-bit embedded processors contain cache memories to further increase instruction throughput of the device. Embedded processors that contain cache memories open an opportunity for the low-power research community to model the impact of cache energy consumption and throughput gains. For optimal cache memory configuration mathematical models have been proposed in the past. Most of these models are complex enough to be adapted for modern applications like run-time cache reconfiguration. This paper improves and validates previously proposed energy and throughput models for a data cache, which could be used for overhead analysis for various cache types with relatively small amount of inputs. These models analyze the energy and throughput of a data cache on an application basis, thus providing the hardware and software designer with the feedback vital to tune the cache or application for a given energy budget. The models are suitable for use at design time in the cache optimization process for embedded processors considering time and energy overhead or could be employed at runtime for reconfigurable architectures
- …