4,082 research outputs found

    Generating high-performance custom floating-point pipelines

    Get PDF
    International audienceCustom operators, working at custom precisions, are a key ingredient to fully exploit the FPGA flexibility advantage for high-performance computing. Unfortunately, such operators are costly to design, and application designers tend to rely on less efficient off-the-shelf operators. To address this issue, an open-source architecture generator framework is introduced. Its salient features are an easy learning curve from VHDL, the ability to embedd arbitrary synthesisable VHDL code, portability to mainstream FPGA targets from Xilinx and Altera, automatic management of complex pipelines with support for frequency-directed pipeline, automatic test-bench generation. This generator is presented around the simple example of a collision detector, which it significantly improves in accuracy, DSP count, logic usage, frequency and latency with respect to an implementation using standard floating-point operators

    JANUS: an FPGA-based System for High Performance Scientific Computing

    Get PDF
    This paper describes JANUS, a modular massively parallel and reconfigurable FPGA-based computing system. Each JANUS module has a computational core and a host. The computational core is a 4x4 array of FPGA-based processing elements with nearest-neighbor data links. Processors are also directly connected to an I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for, but not limited to, the requirements of a class of hard scientific applications characterized by regular code structure, unconventional data manipulation instructions and not too large data-base size. We discuss the architecture of this configurable machine, and focus on its use on Monte Carlo simulations of statistical mechanics. On this class of application JANUS achieves impressive performances: in some cases one JANUS processing element outperfoms high-end PCs by a factor ~ 1000. We also discuss the role of JANUS on other classes of scientific applications.Comment: 11 pages, 6 figures. Improved version, largely rewritten, submitted to Computing in Science & Engineerin

    RHINO software-defined radio processing blocks

    Get PDF
    This MSc project focuses on the design and implementation of a library of parameterizable, modular and reusable Digital IP blocks designed around use in Software-Defined Radio (SDR) applications and compatibility with the RHINO platform. The RHINO platform has commonalities with the better known ROACH platform, but it is a significantly cut-down and lowercost alternative which has similarities in the interfacing and FPGA/Processor interconnects of ROACH. The purpose of the library and design framework presented in this work aims to alleviate some of the commercial, high cost and static structure concerns about IP cores provided by FPGA manufactures and third-party IP vendors. It will also work around the lack of parameters and bus compatibility issues often encountered when using the freely available open resources. The RHINO hardware platform will be used for running practical applications and testing of the blocks. The HDL library that is being constructed is targeted towards both novice and experienced low-level HDL developers who can download and use it for free, and it will provide them experience of using IP Cores that support open bus interfaces in order to exploit SoC design without commercial, parameter and bus compatibility limitations. The provided modules will be of particularly benefit to the novice developers in providing ready-made examples of processing blocks, as well as parameterization settings for the interfacing blocks and associated RF receiver side configuration settings; all together these examples will help new developers establish effective ways to build their own SDR prototypes using RHINO

    Low power digital signal processing

    Get PDF

    The Development of TIGRA: A Zero Latency Interface For Accelerator Communication in RISC-V Processors

    Get PDF
    Field programmable gate arrays (FPGA) give developers the ability to design application specific hardware by means of software, providing a method of accelerating algorithms with higher power efficiency when compared to CPU or GPU accelerated applications. FPGA accelerated applications tend to follow either a loosely coupled or tightly coupled design. Loosely coupled designs often use OpenCL to utilize the FPGA as an accelerator much like a GPU, which provides a simplifed design flow with the trade-off of increased overhead and latency due to bus communication. Tightly coupled designs modify an existing CPU to introduce instruction set extensions to provide a minimal latency accelerator at the cost of higher programming effort to include the custom design. This dissertation details the design of the Tightly Integrated, Generic RISC-V Accelerator (TIGRA) interface which provides the benefits of both loosely and tightly coupled accelerator designs. TIGRA enabled designs incur zero latency with a simple-to-use interface that reduces programming effort when implementing custom logic within a processor. This dissertation shows the incorporation of TIGRA into the simple PicoRV32 processor, the highly customizable Rocket Chip generator, and the FPGA optimized Taiga processor. Each processor design is tested with AES 128-bit encryption and posit arithmetic to demonstrate TIGRA functionality. After a one time programming cost to incorporate a TIGRA interface into an existing processor, new functional units can be added with up to a 75% reduction in the lines of code required when compared to non-TIGRA enabled designs. Additionally, each functional unit created is co-compatible with each processor as the TIGRA interface remains constant between each design. The results prove that using the TIGRA interface introduces no latency and is capable of incorporating existing custom logic designs without modification for all three processors tested. When compared to the PicoRV32 coprocessor interface (PCPI), TIGRA coupled designs complete one clock cycle faster. Similarly, TIGRA outperforms the Rocket Chip custom coprocessor (RoCC) interface by an average of 6.875 clock cycles per instruction. The Taiga processor\u27s decoupled execution units allow for instructions to execute concurrently and uses a tag management system that is similar to out-of-order processors. The inclusion of the TIGRA interface within this processor abstracts the tag management from the user and demonstrates that the TIGRA interface can be applied to out-of-order processors. When coupled with partial reconfiguration, the flexibility and modularity of TIGRA drastically increases. By creating a reprogrammable region for the custom logic connected via TIGRA, users can swap out the connected design at runtime to customize the processor for a given application. Further, partial reconfiguration allows users to only compile the custom logic design as opposed to the entire CPU, resulting in an 18.1% average reduction of compilation during the design process in the case studies. Paired with the programming effort saved by using TIGRA, partial reconfiguration improves the time to design and test new functionality timelines for a processor

    Parallel Adaptive Monte Carlo Integration with the Event Generator WHIZARD

    Full text link
    We describe a new parallel approach to the evaluation of phase space for Monte-Carlo event generation, implemented within the framework of the WHIZARD package. The program realizes a twofold self-adaptive multi-channel parameterization of phase space and makes use of the standard OpenMP and MPI protocols for parallelization. The modern MPI3 feature of asynchronous communication is an essential ingredient of the computing model. Parallel numerical evaluation applies both to phase-space integration and to event generation, thus covering the most computing-intensive parts of physics simulation for a realistic collider environment.Comment: 28 pages, 4 figure
    corecore