15 research outputs found

Boosting Single Thread Performance in Mobile Processors via Reconfigurable Acceleration

Author: A. Duran
A. Ghuloum
A.C.S. Beck
C. Bienia
C. Lattner
C.K. Luk
G.M. Amdahl
M.B. Rutzig
N. Clark
R. Lysecky
S. Che
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2012

Crossref

The University of Manchester - Institutional Repository

Constraint-Driven Instructions Selection and Application Scheduling in the DURASE system

Author: Charot François
Floch Antoine
Kuchcinski Krzysztof
Martin Kevin
Wolinski Christophe
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

This paper presents a new constraint-driven method for computational pattern selection, mapping and application scheduling using reconfigurable processor extensions. The method is part of the DURASE system (Generic Environment for Design and Utilization of Reconfigurable Application-Specific Processors Extensions). The selected processor extensions are implemented as specialized processor instructions. They correspond to computational patterns identified in the application graph as the most frequently occurring or as otherwise interesting. The method handles both time-constrained and resource-constrained scheduling. Experimental results obtained for the MediaBench and MiBench benchmarks show that the presented method ensures high speed-ups in application execution.

HAL-CentraleSupelec

Lund University Publications

INRIA a CCSD electronic archive server

HAL-Rennes 1

Sélection automatique d'instructions et ordonnancement d'applications basés sur la programmation par contraintes [Automatic instruction selection and application scheduling based on constraint programming]

Author: Charot François
Floch Antoine
Kuchcinski Krzysztof
Martin Kevin
Wolinski Christophe
Publication venue: HAL CCSD
Publication date: 09/09/2009

This paper presents a new method, based on constraint programming, for computational pattern selection, mapping and scheduling of applications on configurable processor extensions. The method is integrated into the DURASE environment (Generic Environment for Design and Utilization of Reconfigurable Application-Specific Processors Extensions). The processor extensions, which implement the computational patterns and are accessible through specialized instructions, are tightly coupled to the processor's data path. These specialized instructions are generated and selected from the application graph. The method supports scheduling under either resource constraints or time constraints. Experimental results obtained on the MediaBench and MiBench benchmarks show that application execution is accelerated by a factor of 2.3 on average.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

FPGA-aware techniques for rapid generation of profitable custom instructions

Author: Clarke C.T.
Lam S.-K.
Prakash A.
Srikanthan T.
Publication venue: 'Elsevier BV'
Publication date: 01/05/2013

OPUS

Crossref

ASAM : Automatic Architecture Synthesis and Application Mapping; dl. 3.2: Instruction set synthesis

Author: Corvino R.
Diken E.
Jordans R.
Jozwiak L.
Publication venue: 'Anadolu Universitesi Bilim ve Teknoloji Dergisi C : Yasam Bilimleri ve Biyoteknoloji'
Publication date: 01/01/2011

No abstract

Pure OAI Repository

Rapid evaluation of custom instruction selection approaches with FPGA estimation

Author: Clarke Christopher T.
Lam Siew Kei
Srikanthan Thambipillai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 10/03/2014

The main aim of this article is to demonstrate that a fast and accurate FPGA estimation engine is indispensable in design flows for custom instruction (template) selection. The need for an FPGA estimation engine stems from the difficulty in predicting the FPGA performance measures of selected custom instructions. We present an FPGA estimation technique that partitions the high-level representation of custom instructions into clusters based on the structural organization of the target FPGA, while taking into account general logic synthesis principles adopted by FPGA tools. In this work, we have evaluated a widely used graph covering algorithm with various heuristics for custom instruction selection. In addition, we present an algorithm called Refined Largest Fit First (RLFF) that relies on a graph covering heuristic to select non-overlapping superset templates, which typically incorporate frequently used basic templates. The initial solution is further refined by considering overlapping templates that were previously ignored, to see whether introducing them could lead to higher performance. While RLFF provides the most efficient cover compared to the ILP method and other graph covering heuristics, FPGA estimation results reveal that RLFF leads to the worst performance in certain applications. It is therefore worthwhile to equip design flows with accurate FPGA estimation in order to rapidly determine the most profitable custom instruction approach for a given application.

OPUS

Crossref

DESIGN AUTOMATION FOR LOW POWER RFID TAGS

Author: Dontharaju Swapna Rao
Publication venue
Publication date: 08/09/2008

Radio Frequency Identification (RFID) tags are small, wireless devices capable of automated item identification, used in a variety of applications including supply chain management, asset management, and automatic toll collection (EZ Pass). However, designing these custom systems with traditional methods can take a hardware engineer months to develop and debug. In this dissertation, an automated, low-power flow for the design of RFID tags has been developed, implemented and validated. This dissertation presents the RFID Compiler, which permits high-level design entry using a simple description of the desired primitives and their behavior in ANSI-C. The compiler has different back-ends capable of targeting microprocessor-based or custom hardware-based tags. For the hardware-based tag, the back-end automatically converts the user-supplied behavior in C to low-power synthesizable VHDL optimized for RFID applications. The compiler also integrates a fast, high-level power macromodeling flow, which can be used to generate power estimates within 15% accuracy of industry CAD tools and to optimize the primitives and/or the behaviors, compared to conventional practices. Using the RFID Compiler, the user can develop the entire design in a matter of days or weeks. The compiler has been used to implement standards such as ANSI, ISO 18000-7, 18000-6C and 18185-7. The automatically generated tag designs were validated by targeting microprocessors such as the AD Chips EISC and FPGAs such as Xilinx Spartan 3. The corresponding ASIC implementation is comparable to conventionally designed commercial tags in terms of energy and area. Thus, the RFID Compiler permits the design of power-efficient, custom RFID tags by a wider audience with a dramatically reduced design cycle.

D-Scholarship@Pitt

Processor acceleration through automated instruction set customization

Author: Hongtao Zhong
Nathan Clark
Scott Mahlke
Publication venue
Publication date: 01/01/2003

Application-specific extensions to the computational capabilities of a processor provide an efficient mechanism to meet the growing performance and power demands of embedded applications. Hardware, in the form of new function units (or co-processors), and the corresponding instructions, are added to a baseline processor to meet the critical computational demands of a target application. The central challenge with this approach is the large degree of human effort required to identify and create the custom hardware units, as well as porting the application to the extended processor. In this paper, we present the design of a system to automate the instruction set customization process. A dataflow graph design space exploration engine efficiently identifies profitable computation subgraphs from which to create custom hardware, without artificially constraining their size or shape. The system also contains a compiler subgraph matching framework that identifies opportunities to exploit and generalize the hardware to support more computation graphs. We demonstrate the effectiveness of this system across a range of application domains and study the applicability of the custom hardware across the domain.

CiteSeerX

Low power architectures for streaming applications

Author: He Y.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2013

Repository TU/e

Pure OAI Repository