Search CORE

270 research outputs found

Improving embedded system design by means of HW-SW compilation on reconfigurable coprocessors

Author: Fernando Rincón
Francisco Moya
José M. Moya
Juan Carlos López
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2004
Field of study

Crossref

Automatic synthesis of reconfigurable instruction set accelerators

Author: Kastrup B.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2001
Field of study

Repository TU/e

Pure OAI Repository

A Micro Power Hardware Fabric for Embedded Computing

Author: Mehta Gayatri
Publication venue
Publication date: 25/09/2009
Field of study

Field Programmable Gate Arrays (FPGAs) mitigate many of the problemsencountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named as domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Different optimization techniques like local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate and generate a tailored architectural instance of the fabric.The fabric has been synthesized on 160 nm cell-based ASIC fabrication process from OKI and 130 nm from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor

D-Scholarship@Pitt

A general purpose HyperTransport-based Application Accelerator Framework

Author: David Kramer
Fabian Nowak
Karl Wolfgang
Rainer Buchty
Thorsten Vogel
Wolfgang Karl
Publication venue
Publication date: 01/01/2009
Field of study

HyperTransport provides a flexible, low latency and high bandwidth interconnection between processors and also between processors and peripheral omponents. Therefore, the interconnection is no longer a performance bottleneck when integrating application specific accelerators in modern computing systems. Current FPGAs providing huge computational power and permit the acceleration of compute-intensive kernels. We therefore present a general purpose architecture based on HyperTransport and modern FPGAs to accelerate time-consuming computations. Further, we present a prototypical implementation of our architecture. Here we used an AMD Opteron-based system with the HTX Board [6] to demonstrate that common applications can benefit from available hardware accelerators. A cryptographic example showed that the encryption of files, larger then 50 kByte, can be successfully accelerated

CiteSeerX

Heidelberger Dokumentenserver

Twill: A Hybrid Microcontroller-FPGA Framework for Parallelizing Single- Threaded C Programs

Author: Gallatin Douglas S.
Publication venue: DigitalCommons@CalPoly
Publication date: 01/03/2014
Field of study

Increasingly System-On-A-Chip platforms which incorporate both micropro- cessors and re-programmable logic are being utilized across several ﬁelds ranging from the automotive industry to network infrastructure. Unfortunately, the de- velopment tools accompanying these products leave much to be desired, requiring knowledge of both traditional embedded systems languages like C and hardware description languages like Verilog. We propose to bridge this gap with Twill, a truly automatic hybrid compiler that can take advantage of the parallelism inherent in these platforms. Twill can extract long-running threads from single threaded C code and distribute these threads across the hardware and software domains to more fully utilize the asymmetric characteristics between processors and the embedded reconﬁgurable logic fabric. We show that Twill provides a sig- niﬁcant performance increase on the CHStone benchmarks with an average 1.63 times increase over the pure hardware approach and an increase of 22.2 times on average over the pure software approach while reducing the area required by the reconﬁgurable logic by on average 1.73 times compared to the pure hardware approach

DigitalCommons@CalPoly

A Survey and Evaluation of FPGA High-Level Synthesis Tools

Author: Anderson Jason
Bertels Koen
Brown Stephen
Canis Andrew
Chen Yu Ting
Choi Jongsok
Ferrandi Fabrizio
Fort Blair
Hsiao Hsuan
Nane Razvan
Pilato Christian
Sima Vlad-Mihai
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing today's system complexity. HLS allows designers to work at a higher-level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing field-programmable gate array circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as overview the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and the use of resources

Archivio istituzionale della ricerca - Politecnico di Milano