462 research outputs found

    High-Level Synthesis for Embedded Systems

    Get PDF

    Hardware Acceleration Using Functional Languages

    Get PDF
    Cílem této práce je prozkoumat možnosti využití funkcionálního paradigmatu pro hardwarovou akceleraci, konkrétně pro datově paralelní úlohy. Úroveň abstrakce tradičních jazyků pro popis hardwaru, jako VHDL a Verilog, přestáví stačit. Pro popis na algoritmické či behaviorální úrovni se rozmáhají jazyky původně navržené pro vývoj softwaru a modelování, jako C/C++, SystemC nebo MATLAB. Funkcionální jazyky se s těmi imperativními nemůžou měřit v rozšířenosti a oblíbenosti mezi programátory, přesto je předčí v mnoha vlastnostech, např. ve verifikovatelnosti, schopnosti zachytit inherentní paralelismus a v kompaktnosti kódu. Pro akceleraci datově paralelních výpočtů se často používají jednotky FPGA, grafické karty (GPU) a vícejádrové procesory. Praktická část této práce rozšiřuje existující knihovnu Accelerate pro počítání na grafických kartách o výstup do VHDL. Accelerate je možno chápat jako doménově specifický jazyk vestavěný do Haskellu s backendem pro prostředí NVIDIA CUDA. Rozšíření pro vysokoúrovňovou syntézu obvodů ve VHDL představené v této práci používá stejný jazyk a frontend.The aim of this thesis is to research how the functional paradigm can be used for hardware acceleration with an emphasis on data-parallel tasks. The level of abstraction of the traditional hardware description languages, such as VHDL or Verilog, is becoming to low. High-level languages from the domains of software development and modeling, such as C/C++, SystemC or MATLAB, are experiencing a boom for hardware description on the algorithmic or behavioral level. Functional Languages are not so commonly used, but they outperform imperative languages in verification, the ability to capture inherent paralellism and the compactness of code. Data-parallel task are often accelerated on FPGAs, GPUs and multicore processors. In this thesis, we use a library for general-purpose GPU programs called Accelerate and extend it to produce VHDL. Accelerate is a domain-specific language embedded into Haskell with a backend for the NVIDIA CUDA platform. We use the language and its frontend, and create a new backend for high-level synthesis of circuits in VHDL.

    Small Proofs from Congruence Closure

    Full text link
    Satisfiability Modulo Theory (SMT) solvers and equality saturation engines must generate proof certificates from e-graph-based congruence closure procedures to enable verification and conflict clause generation. Smaller proof certificates speed up these activities. Though the problem of generating proofs of minimal size is known to be NP-complete, existing proof minimization algorithms for congruence closure generate unnecessarily large proofs and introduce asymptotic overhead over the core congruence closure procedure. In this paper, we introduce an O(n^5) time algorithm which generates optimal proofs under a new relaxed "proof tree size" metric that directly bounds proof size. We then relax this approach further to a practical O(n \log(n)) greedy algorithm which generates small proofs with no asymptotic overhead. We implemented our techniques in the egg equality saturation toolkit, yielding the first certifying equality saturation engine. We show that our greedy approach in egg quickly generates substantially smaller proofs than the state-of-the-art Z3 SMT solver on a corpus of 3760 benchmarks

    Hardware implementation of non-bonded forces in molecular dynamics simulations

    Get PDF
    Molecular Dynamics is a computational method based on classical mechanics to describe the behavior of a molecular system. This method is used in biomolecular simulations, which are intended to contribute to the study and advance of nanotechnology, medicine, chemistry and biology. Software implementations of Molecular Dynamics simulations can spend most of time computing the non-bonded interactions. This work presents the design and implementation of an FPGA-based coprocessor that accelerates MD simulations by computing in parallel the non-bonded interactions, specifically, the van der Waals and the electrostatic interactions. These interactions are modeled as the Lennard-Jones 6-12 potential and the direct-space Ewald summation, respectively. In addition, this work introduces a novel variable transformation of the potential energy functions, and a novel interpolation method with pseudo-floating-point representation to compute the short-range forces. Also, it uses a combination of fixed-point and floating-point arithmetic to obtain the best of both representations. The FPGA coprocessor is a memory-mapped system connected to a host by PCI Express, and is provided with interruption capabilities to improve parallelization. Its main block is based on a single functional pipeline, and is connected via Avalon Bus to other peripherals such as the PCIe Hard-IP and the SG-DMA. It is implemented on an Altera¿s EP2AGX125EF35C4 device, can process 16k particles, and is configured to store up to 16 different types of particles. Simulations in a custom C-application for MD that only computes non-bonded forces become up to 12.5x faster using the FPGA coprocessor when considering 12500 atoms.PregradoINGENIERO(A) EN ELECTRÓNIC

    A configurable vector processor for accelerating speech coding algorithms

    Get PDF
    The growing demand for voice-over-packer (VoIP) services and multimedia-rich applications has made increasingly important the efficient, real-time implementation of low-bit rates speech coders on embedded VLSI platforms. Such speech coders are designed to substantially reduce the bandwidth requirements thus enabling dense multichannel gateways in small form factor. This however comes at a high computational cost which mandates the use of very high performance embedded processors. This thesis investigates the potential acceleration of two major ITU-T speech coding algorithms, namely G.729A and G.723.1, through their efficient implementation on a configurable extensible vector embedded CPU architecture. New scalar and vector ISAs were introduced which resulted in up to 80% reduction in the dynamic instruction count of both workloads. These instructions were subsequently encapsulated into a parametric, hybrid SISD (scalar processor)–SIMD (vector) processor. This work presents the research and implementation of the vector datapath of this vector coprocessor which is tightly-coupled to a Sparc-V8 compliant CPU, the optimization and simulation methodologies employed and the use of Electronic System Level (ESL) techniques to rapidly design SIMD datapaths

    High-level synthesis for FPGAs: From prototyping to deployment

    Get PDF
    Abstract-Escalating System-on-Chip design complexity is pushing the design community to raise the level of abstraction beyond RTL. Despite the unsuccessful adoptions of early generations of commercial high-level synthesis (HLS) systems, we believe that the tipping point for transitioning to HLS methodology is happening now, especially for FPGA designs. The latest generation of HLS tools has made significant progress in providing wide language coverage and robust compilation technology, platform-based modeling, advancement in core HLS algorithms, and a domain-specific approach. In this paper we use AutoESL's AutoPilot HLS tool coupled with domain-specific system-level implementation platforms developed by Xilinx as an example to demonstrate the effectiveness of state-of-art C-to-FPGA synthesis solutions targeting multiple application domains. Complex industrial designs targeting Xilinx FPGAs are also presented as case studies, including comparison of HLS solutions versus optimized manual designs. Index Terms-Domain-specific design, field-programmable gate array (FPGA), high-level synthesis (HLS), quality of results (QoR)

    Taylor Expansion Diagrams: A Canonical Representation for Verification of Data Flow Designs

    Full text link
    corecore