Search CORE

271 research outputs found

Multimedia terminal system-on-chip design and simulation

Author: Alessandro Scotto
Ivano Barbieri
M. Bariani
M. Raggio
Publication venue
Publication date: 01/09/2005
Field of study

This paper proposes a design approach based on integrated architectural and system-on-chip (SoC) simulations. The main idea is to have an efficient framework for the design and the evaluation of multimedia terminals, allowing a fast system simulation with a definable degree of accuracy. The design approach includes the simulation of very long instruction word (VLIW) digital signal processors (DSPs), the utilization of a device multiplexing the media streams, and the emulation of the real-time media acquisition. This methodology allows the evaluation of both the multimedia algorithm implementations and the hardware platform, giving feedback on the complete SoC including the interaction between modules and conflicts in accessing either the bus or shared resources. An instruction set architecture (ISA) simulator and an SoC simulation environment compose the integrated framework. In order to validate this approach, the evaluation of an audio-video multiprocessor terminal is presented, and the complete simulation test results are reported

Directory of Open Access Journals

Open Access Repository

Data Cache-Energy and Throughput Models: Design Exploration for Embedded Processors

Author: McDonald-Maier Klaus
Qadri Muhammad Yasir
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2009
Field of study

Most modern 16-bit and 32-bit embedded processors contain cache memories to further increase instruction throughput of the device. Embedded processors that contain cache memories open an opportunity for the low-power research community to model the impact of cache energy consumption and throughput gains. For optimal cache memory configuration mathematical models have been proposed in the past. Most of these models are complex enough to be adapted for modern applications like run-time cache reconfiguration. This paper improves and validates previously proposed energy and throughput models for a data cache, which could be used for overhead analysis for various cache types with relatively small amount of inputs. These models analyze the energy and throughput of a data cache on an application basis, thus providing the hardware and software designer with the feedback vital to tune the cache or application for a given energy budget. The models are suitable for use at design time in the cache optimization process for embedded processors considering time and energy overhead or could be employed at runtime for reconfigurable architectures

University of Essex Research Repository

Springer - Publisher Connector

Directory of Open Access Journals

Elementary operations: a novel concept for source-level timing estimation

Author: Danko Ivošević
Nikolina Frid
Vlado Sruk
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2019
Field of study

Early application timing estimation is essential in decision making during design space exploration of heterogeneous embedded systems in terms of hardware platform dimensioning and component selection. The decisions which have the impact on project duration and cost must be made before a platform prototype is available and software code is ready to be linked and thus timing estimation must be done using high-level models and simulators. Because of the ever increasing need to shorten the time to market, reducing the amount of time required to obtain the results is as important as achieving high estimation accuracy. In this paper, we propose a novel approach to source-level timing estimation with the aim to close the speed-accuracy gap by raising the level of abstraction and improving result reusability. We introduce a concept – elementary operations as distinct parts of source code which enable capturing platform behaviour without having the exact model of the processor pipeline, cache etc. We also present a timing estimation method which relies on elementary operations to craft hardware profiling benchmark and to build application and platform profiles. Experiments show an average estimation error of 5%, with maximum below 16%

Directory of Open Access Journals

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Author: Mittal Sparsh
Publication venue
Publication date: 01/01/2014
Field of study

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of just mobile devices now reaching nearly equal to the population of earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing power consumption of embedded systems. We discuss the need of power management and provide a classification of the techniques on several important parameters to highlight their similarities and differences. This paper is intended to help the researchers and application-developers in gaining insights into the working of power management techniques and designing even more efficient high-performance embedded systems of tomorrow

arXiv.org e-Print Archive

Crossref

A hybrid genetic algorithm for constrained hardware- software partitioning

Author: Mudry Pierre-André
Tempesti Gianluca
Zufferey Guillaume
Publication venue
Publication date: 01/01/2006
Field of study

In this article, we propose a novel partitioning method for hardware-software codesign based on a genetic algorithm that has been enhanced for this specific task. Given a high- level program and an area constraint, our software considers different granularities levels to discover the most interesting blocks to be implemented in ad hoc functional units that can then be used as new instructions in a Move processor. Various optimizations are conducted to obtain a clean, very fast (in the order of a few seconds) and efficient partitioning on programs ranging from a few to several hundreds of lines of code

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Compiled Low-Level Virtual Instruction Set Simulation and Profiling for Code Partitioning and ASIP-Synthesis

Author: Carsten Gremzow
Publication venue
Publication date: 01/01/2007
Field of study

Abstract We present ongoing work and first results in static and detailed quantitative runtime analysis of LLVM byte code for the purpose of automatic procedural level partitioning and cosynthesis of complex software systems. Runtime behaviour is captured by reverse compilation of LLVM bytecode into augmented, self-profiling ANSI-C simulator programs retaining the LLVM instruction level. The actual global data flow is captured both in quantity and value range to guide function unit layout in the synthesis of application specific processors. Currently the implemented tool LLILA (Low Level Intermediate Language Analyzer) focuses on static code analysis on the inter-procedural data flow via e.g. function parameters and global variables to uncover a program's potential paths of data exchange

CiteSeerX

HW/SW codesign techniques for dynamically reconfigurable architectures

Author: J. Noguera
R.M. Badia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

An Efficient Power Estimation Methodology for Complex RISC Processor-based Platforms

Author: Ben Atitallah Rabie
Dekeyser Jean-Luc
Niar Smail
Rethinagiri Santhosh Kumar
Senn Eric
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/05/2012
Field of study

International audienceIn this contribution, we propose an efficient power estima- tion methodology for complex RISC processor-based plat- forms. In this methodology, the Functional Level Power Analysis (FLPA) is used to set up generic power models for the different parts of the system. Then, a simulation framework based on virtual platform is developed to evalu- ate accurately the activities used in the related power mod- els. The combination of the two parts above leads to a het- erogeneous power estimation that gives a better trade-off be- tween accuracy and speed. The usefulness and effectiveness of our proposed methodology is validated through ARM9 and ARM CortexA8 processor designed respectively around the OMAP5912 and OMAP3530 boards. This efficiency and the accuracy of our proposed methodology is evaluated by using a variety of basic programs to complete media bench- marks. Estimated power values are compared to real board measurements for the both ARM940T and ARM CortexA8 architectures. Our obtained power estimation results pro- vide less than 3% of error for ARM940T processor, 3.5% for ARM CortexA8 processor-based system and 1x faster compared to the state-of-the-art power estimation tools

HAL - Lille 3

INRIA a CCSD electronic archive server

HAL-Université de Bretagne Occidentale

Cycle-accurate performance modelling in an ultra-fast just-in-time dynamic binary translation instruction set simulator

Author: Bohm Igor
Franke Bjoern
Topham Nigel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2010
Field of study

Abstract. Instruction set simulators (ISS) are vital tools for compiler and processor architecture design space exploration and verification. State-of-the-art simulators using just-in-time (JIT) dynamic binary translation (DBT) techniques are able to simulate complex embedded processors at speeds above 500 MIPS. However, these functional ISS do not provide microarchitectural observability. In contrast, low-level cycle-accurate ISS are too slow to simulate full-scale applications, forcing developers to revert to FPGA-based simulations. In this paper we demonstrate that it is possible to run ultra-high speed cycle-accurate instruction set simulations surpassing FPGA-based simulation speeds. We extend the JIT DBT engine of our ISS and augment JIT generated code with a verified cycle-accurate processor model. Our approach can model any microarchitectural configuration, does not rely on prior profiling, instrumentation, or compilation, and works for all binaries targeting a state-of-the-art embedded processor implementing the ARCompact TM instruction set architecture (ISA). We achieve simulation speeds up to 88 MIPS on a standard x86 desktop computer for the industry standard EEMBC, COREMARK and BIOPERF benchmark suites.

CiteSeerX

Crossref

Edinburgh Research Explorer