Search CORE

141,756 research outputs found

Recommended from our members

Data-dependent cycle-accurate power modeling of RTL-level IPs using machine learning

Author: Srour Malek
Publication venue
Publication date: 07/08/2018
Field of study

In a chip design project, early design planning has a strong impact on the schedule and the cost of design. Power estimation is part of early design planning, and it greatly affects design decisions. Power modeling performed at a high level of abstraction is fast but inaccurate due to lack of circuit switching activity information. By contrast, power modeling performed at a low level of abstraction is more accurate as the synthesized circuit synthesis is known, but this simulation is typically slow. This report explores a power modeling approach performed at register transfer level (RTL). It exploits machine learning models in order to have a fast yet relatively accurate cycle-by-cycle power estimation. The approach is data-dependent, where cycle-specific models are trained based on the switching activity of signals obtained from RTL simulation and cycle-by-cycle power values obtained from a reference gate-level simulation of an existing RTL design. Therefore, if any changes are applied to the RTL design, re-training of models is required. The approach aims at obtaining fast yet accurate power predictions for new invocations of a given trained model using signal activity information collected during simulation of the unmodified RTL. At a low level, the complete visibility of signals in a design unintuitively might cause overtraining the model leading to inaccurate estimation. The suggested model employs automatic feature selection in each cycle. Based on the invocations used to train the cycle-by-cycle models, only signals that may switch during a given cycle will be selected as the features for their respective cycle-specific model. The method was tested on an 8-by-8 DCT design and the power estimates were within 6.5% of those from a commercial power analysis tool. This report also simulates and compares the approach of cycle-specific models to the approach of a single global model for all cycles and show that the cycle-specific approach is twice as accurate.Electrical and Computer Engineerin

Texas ScholarWorks

Design of multimedia processor based on metric computation

Author: Balasa
Berekovic
Jean Luc Philippe
Jean Philippe Diguet
Mohamed Abid
Nader Ben Amor
Suzuki
Wuytack
Yannick Le Moullec
Publication venue: 'Elsevier BV'
Publication date: 01/01/2005
Field of study

Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of those media applications have taxing demands on today's processor performances with low cost, low power and reduced design delay. To satisfy those challenges, a fast and efficient strategy consists in upgrading a low cost general purpose processor core. This approach is based on the personalization of a general RISC processor core according the target multimedia application requirements. Thus, if the extra cost is justified, the general purpose processor GPP core can be enforced with instruction level coprocessors, coarse grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps: the first one is the analysis of the targeted application using efficient metrics. The second step is the selection of the appropriate architecture template according to the first step results and recommendations. The third step is the architecture generation. This approach is experimented using various image and video algorithms showing its feasibility

arXiv.org e-Print Archive

Crossref

HAL-Université de Bretagne Occidentale

VBN

The Chameleon project in retrospective

Author: Heysters Paul M.
Molenkamp Bert
Smit Gerard J.M.
Publication venue: STW Technology Foundation
Publication date: 01/01/2004
Field of study

In this paper we describe in retrospective the main results of a four year project, called Chameleon. As part of this project we developed a coarse-grained reconfigurable core for DSP algorithms in wireless devices denoted MONTIUM. After presenting the main achievements within this project we present the lessons learned from this project

University of Twente Research Information

FASTCUDA: Open Source FPGA Accelerator &amp; Hardware-Software Codesign Toolset for CUDA Kernels

Author: de la Torre E.()
Lavagno L.()
Lazarescu M.()
Mavroidis I. ()
Papaefstathiou I.()
Papaefstathiou Ioannis(http://users.isc.tuc.gr/~ipapaefstathiou)
Schafer F.()
Παπαευσταθιου Ιωαννης(http://users.isc.tuc.gr/~ipapaefstathiou)
Publication venue: IEEE / Institute of Electrical and Electronics Engineers Incorporated:445 Hoes Lane:Piscataway, NJ 08854:(800)701-4333, (732)981-0060, EMAIL: [email protected], INTERNET: http://www.ieee.org, Fax: (732)981-9667
Publication date: 01/01/2012
Field of study

Using FPGAs as hardware accelerators that communicate with a central CPU is becoming a common practice in the embedded design world but there is no standard methodology and toolset to facilitate this path yet. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphical Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Institutional Repository of the Technical University of Crete

Lessons Learned from Designing the Montium - a Coarse-Grained Reconfigurable Processing Tile

Author: Heysters Paul M.
Molenkamp Bert
Rosien Michel A.J.
Smit Gerard J.M.
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2004
Field of study

In this paper we describe in retrospective the main results of a four year project, called Chameleon. As part of this project we developed a coarse-grained reconfigurable core for DSP algorithms in wirelessdevices denoted MONTIUM. After presenting the main achievements within this project we present the lessons learned from this project

University of Twente Research Information

LOT: Logic Optimization with Testability - new transformations for logic synthesis

Author: Chatterjee Mitrajit
Kunz Wolfgang
Pradhan Dhiraj K.
Publication venue
Publication date: 01/01/1998
Field of study

A new approach to optimize multilevel logic circuits is introduced. Given a multilevel circuit, the synthesis method optimizes its area while simultaneously enhancing its random pattern testability. The method is based on structural transformations at the gate level. New transformations involving EX-OR gates as well as Reed–Muller expansions have been introduced in the synthesis of multilevel circuits. This method is augmented with transformations that specifically enhance random-pattern testability while reducing the area. Testability enhancement is an integral part of our synthesis methodology. Experimental results show that the proposed methodology not only can achieve lower area than other similar tools, but that it achieves better testability compared to available testability enhancement tools such as tstfx. Specifically for ISCAS-85 benchmark circuits, it was observed that EX-OR gate-based transformations successfully contributed toward generating smaller circuits compared to other state-of-the-art logic optimization tools

CiteSeerX

Hochschulschriftenserver - Universität Frankfurt am Main

Explore Bristol Research