Search CORE

4,082 research outputs found

Generating high-performance custom floating-point pipelines

Author: de Dinechin Florent
Klein Cristian
Pasca Bogdan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 31/08/2009
Field of study

International audienceCustom operators, working at custom precisions, are a key ingredient to fully exploit the FPGA flexibility advantage for high-performance computing. Unfortunately, such operators are costly to design, and application designers tend to rely on less efficient off-the-shelf operators. To address this issue, an open-source architecture generator framework is introduced. Its salient features are an easy learning curve from VHDL, the ability to embedd arbitrary synthesisable VHDL code, portability to mainstream FPGA targets from Xilinx and Altera, automatic management of complex pipelines with support for frequency-directed pipeline, automatic test-bench generation. This generator is presented around the simple example of a collision detector, which it significantly improves in accuracy, DSP count, logic usage, frequency and latency with respect to an implementation using standard floating-point operators

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

JANUS: an FPGA-based System for High Performance Scientific Computing

Author: A Cruz
Alfonso Tarancon
Andrea Maiorano
Antonio Gordillo-Guerrero
Antonio Munoz-Sudupe
Daniele Sciretti
David Yllanes
Denis Navarro
Enzo Marinari
Filippo Mantovani
Francesco Belletti
Gianpaolo Zanier
Giorgio Parisi
J Luis Velasco
Juan J Ruiz-Lorenzo
Luis Antonio Fernandez
Marco Guidetti
Maria Cotallo
Mauro Rossi
Raffaele Tripiccione
Sebastiano Fabio Schifano
Sergio Perez-Gaviro
Victor Martin-Mayor
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 08/04/2008
Field of study

This paper describes JANUS, a modular massively parallel and reconfigurable FPGA-based computing system. Each JANUS module has a computational core and a host. The computational core is a 4x4 array of FPGA-based processing elements with nearest-neighbor data links. Processors are also directly connected to an I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for, but not limited to, the requirements of a class of hard scientific applications characterized by regular code structure, unconventional data manipulation instructions and not too large data-base size. We discuss the architecture of this configurable machine, and focus on its use on Monte Carlo simulations of statistical mechanics. On this class of application JANUS achieves impressive performances: in some cases one JANUS processing element outperfoms high-end PCs by a factor ~ 1000. We also discuss the role of JANUS on other classes of scientific applications.Comment: 11 pages, 6 figures. Improved version, largely rewritten, submitted to Computing in Science & Engineerin

arXiv.org e-Print Archive

Docta Complutense

Crossref

Archivio istituzionale della ricerca - Università di Ferrara

Archivio della ricerca- Università di Roma La Sapienza

RHINO software-defined radio processing blocks

Author: Tsoeunyane Lekhobola Joachim
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2015
Field of study

This MSc project focuses on the design and implementation of a library of parameterizable, modular and reusable Digital IP blocks designed around use in Software-Defined Radio (SDR) applications and compatibility with the RHINO platform. The RHINO platform has commonalities with the better known ROACH platform, but it is a significantly cut-down and lowercost alternative which has similarities in the interfacing and FPGA/Processor interconnects of ROACH. The purpose of the library and design framework presented in this work aims to alleviate some of the commercial, high cost and static structure concerns about IP cores provided by FPGA manufactures and third-party IP vendors. It will also work around the lack of parameters and bus compatibility issues often encountered when using the freely available open resources. The RHINO hardware platform will be used for running practical applications and testing of the blocks. The HDL library that is being constructed is targeted towards both novice and experienced low-level HDL developers who can download and use it for free, and it will provide them experience of using IP Cores that support open bus interfaces in order to exploit SoC design without commercial, parameter and bus compatibility limitations. The provided modules will be of particularly benefit to the novice developers in providing ready-made examples of processing blocks, as well as parameterization settings for the interfacing blocks and associated RF receiver side configuration settings; all together these examples will help new developers establish effective ways to build their own SDR prototypes using RHINO

Cape Town University OpenUCT

Recommended from our members

VIPER : a 25-MHz, 100-MIPS peak VLIW micro-processor

Author: Abnous A.
Bagherzadeh N.
Gray J.
Naylor A.
Publication venue: eScholarship, University of California
Publication date: 01/01/1992
Field of study

This paper describes the design and implementation of a very long instruction word (VLIW) microprocessor. The VIPER (VLIW integer processor) contains four pipelined functional units, and can achieve 100 MIPS peak performance at 25 MHz. The procesor is capable of performing multiway branch operations, two load/store operations and up to four ALU operations in each clock cycle, with full register file access to each functional unit. VIPER is the first VLIW microprocessor known that can achieve this level of performance. Designed in twelve months, the processor is integrated with an instruction cache controller and a data cache, requiring 450,000 transistors and a die size of 12.9 by 9.1 mm in a 1.2 µm technology

eScholarship - University of California

Low power digital signal processing

Author: Paker Ozgun
Publication venue: Technical University of Denmark
Publication date: 01/01/2003
Field of study

Online Research Database In Technology

The Development of TIGRA: A Zero Latency Interface For Accelerator Communication in RISC-V Processors

Author: Green Wesley Brad
Publication venue: Clemson University Libraries
Publication date: 01/05/2022
Field of study

Field programmable gate arrays (FPGA) give developers the ability to design application specific hardware by means of software, providing a method of accelerating algorithms with higher power efficiency when compared to CPU or GPU accelerated applications. FPGA accelerated applications tend to follow either a loosely coupled or tightly coupled design. Loosely coupled designs often use OpenCL to utilize the FPGA as an accelerator much like a GPU, which provides a simplifed design flow with the trade-off of increased overhead and latency due to bus communication. Tightly coupled designs modify an existing CPU to introduce instruction set extensions to provide a minimal latency accelerator at the cost of higher programming effort to include the custom design. This dissertation details the design of the Tightly Integrated, Generic RISC-V Accelerator (TIGRA) interface which provides the benefits of both loosely and tightly coupled accelerator designs. TIGRA enabled designs incur zero latency with a simple-to-use interface that reduces programming effort when implementing custom logic within a processor. This dissertation shows the incorporation of TIGRA into the simple PicoRV32 processor, the highly customizable Rocket Chip generator, and the FPGA optimized Taiga processor. Each processor design is tested with AES 128-bit encryption and posit arithmetic to demonstrate TIGRA functionality. After a one time programming cost to incorporate a TIGRA interface into an existing processor, new functional units can be added with up to a 75% reduction in the lines of code required when compared to non-TIGRA enabled designs. Additionally, each functional unit created is co-compatible with each processor as the TIGRA interface remains constant between each design. The results prove that using the TIGRA interface introduces no latency and is capable of incorporating existing custom logic designs without modification for all three processors tested. When compared to the PicoRV32 coprocessor interface (PCPI), TIGRA coupled designs complete one clock cycle faster. Similarly, TIGRA outperforms the Rocket Chip custom coprocessor (RoCC) interface by an average of 6.875 clock cycles per instruction. The Taiga processor\u27s decoupled execution units allow for instructions to execute concurrently and uses a tag management system that is similar to out-of-order processors. The inclusion of the TIGRA interface within this processor abstracts the tag management from the user and demonstrates that the TIGRA interface can be applied to out-of-order processors. When coupled with partial reconfiguration, the flexibility and modularity of TIGRA drastically increases. By creating a reprogrammable region for the custom logic connected via TIGRA, users can swap out the connected design at runtime to customize the processor for a given application. Further, partial reconfiguration allows users to only compile the custom logic design as opposed to the entire CPU, resulting in an 18.1% average reduction of compilation during the design process in the case studies. Paired with the programming effort saved by using TIGRA, partial reconfiguration improves the time to design and test new functionality timelines for a processor

Clemson University: TigerPrints

Parallel Adaptive Monte Carlo Integration with the Event Generator WHIZARD

Author: Braß Simon
Kilian Wolfgang
Reuter Jürgen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2018
Field of study

We describe a new parallel approach to the evaluation of phase space for Monte-Carlo event generation, implemented within the framework of the WHIZARD package. The program realizes a twofold self-adaptive multi-channel parameterization of phase space and makes use of the standard OpenMP and MPI protocols for parallelization. The modern MPI3 feature of asynchronous communication is an essential ingredient of the computing model. Parallel numerical evaluation applies both to phase-space integration and to event generation, thus covering the most computing-intensive parts of physics simulation for a realistic collider environment.Comment: 28 pages, 4 figure

arXiv.org e-Print Archive

DESY Publication Database

EDP Sciences OAI-PMH repository (1.2.0)

Directory of Open Access Journals

DESY