
    Essentials of computing systems

    Computers were invented to “compute”, i.e., to solve all sorts of mathematical problems. A computer system contains hardware and systems software that work together to run software applications. The underlying concepts that support the construction of a computer are relatively stable. In fact, (almost) all computer systems have a similar organization, i.e., their hardware and software components are arranged in hierarchical layers (or levels) and perform similar functions. This book is written for programmers and software engineers who want to understand how the components of a computer work and how they affect the correctness and performance of their programs.

    A formally verified compiler back-end

    This article describes the development and formal verification (proof of semantic preservation) of a compiler back-end from Cminor (a simple imperative intermediate language) to PowerPC assembly code, using the Coq proof assistant both for programming the compiler and for proving its correctness. Such a verified compiler is useful in the context of formal methods applied to the certification of critical software: the verification of the compiler guarantees that the safety properties proved on the source code hold for the executable compiled code as well.
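
    The guarantee at the heart of such a development can be stated roughly as follows; this is a simplified sketch of a semantic-preservation theorem, and the names compile, OK, and the behavior relation are schematic rather than the development's actual identifiers:

        % Schematic semantic-preservation statement (simplified).
        % S: source program, C: compiled code, B: observable behavior.
        \forall S\, C\, B,\quad
          \mathrm{compile}(S) = \mathrm{OK}(C)
          \;\wedge\; S \Downarrow B
          \;\Longrightarrow\; C \Downarrow B

    Because the proof is mechanized in Coq, an implication of this shape holds for every program the compiler accepts, not merely for a test suite.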

    HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads

    We propose a new architecture called HTA for high-throughput irregular HPC applications with little data reuse. HTA reduces contention within the memory system with the help of a partitioned memory controller that is amenable to 2.5D implementation using silicon photonics. In terms of scalability, HTA supports a 4× higher number of compute units than state-of-the-art GPU systems. Our simulation-based evaluation on a representative set of HPC benchmarks shows that the proposed design reduces queuing latency by 10% to 30% and reduces the variability in memory access latency by 10% to 60%. Our results show that HTA improves the L1 miss penalty by 2.3× to 5× over GPUs. When compared to a multi-GPU system with the same number of compute units, our simulation results show that HTA can provide up to 2× speedup.
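
    As a rough intuition for why a partitioned memory controller helps, the sketch below (a toy queuing model with made-up parameters, not the paper's simulator) compares a bursty requestor and a steady requestor sharing one full-rate queue against giving each its own half-rate partition of the same total capacity:

        # Toy queuing model: does isolating a bursty requestor in its own
        # memory-controller partition reduce tail latency for steady traffic?
        # All rates, loads, and burst sizes are illustrative assumptions.
        import random

        SERVICE = 1.0  # cycles per request on a full-rate controller

        def simulate(partitioned, n_steady=200_000, seed=1):
            rng = random.Random(seed)
            arrivals = []
            # Steady requestor: Poisson traffic at ~35% of full-rate capacity.
            t = 0.0
            for _ in range(n_steady):
                t += rng.expovariate(0.35 / SERVICE)
                arrivals.append((t, "steady"))
            # Bursty requestor: same average load, delivered in 64-deep bursts.
            t = 0.0
            while len(arrivals) < 2 * n_steady:
                t += rng.expovariate(0.35 / (64 * SERVICE))
                for _ in range(64):
                    arrivals.append((t, "bursty"))
            arrivals.sort()
            # Shared: one queue at full rate. Partitioned: one queue per
            # requestor, each at half rate (same total service capacity).
            service = SERVICE * (2.0 if partitioned else 1.0)
            free_at = {"steady": 0.0, "bursty": 0.0, "all": 0.0}
            waits = []
            for t, who in arrivals:
                q = who if partitioned else "all"
                start = max(t, free_at[q])
                free_at[q] = start + service
                if who == "steady":
                    waits.append(start - t)
            waits.sort()
            return waits[len(waits) // 2], waits[int(len(waits) * 0.99)]

        for mode in (False, True):
            p50, p99 = simulate(mode)
            print(f"partitioned={mode}: steady p50 wait={p50:.1f}, p99 wait={p99:.1f}")

    In this toy model the steady requestor's tail latency drops once bursts are isolated, which is the flavor of the latency-variability reduction the abstract reports.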

    A computer-aided design for digital filter implementation


    A transprecision floating-point cluster for efficient near-sensor data analytics

    Recent applications in the domain of near-sensor computing require the adoption of floating-point arithmetic to reconcile high-precision results with a wide dynamic range. In this paper, we propose a multi-core computing cluster that leverages the fine-grained precision-tuning principles of transprecision computing to support near-sensor applications within a minimal power budget. Our design - based on the open-source RISC-V architecture - combines parallelization and sub-word vectorization with near-threshold operation, leading to a highly scalable and versatile system. We perform an exhaustive exploration of the design space of the transprecision cluster on a cycle-accurate FPGA emulator, with the aim of identifying the most efficient configurations in terms of performance, energy efficiency, and area efficiency. We also provide a full-fledged software stack, including a parallel runtime and a compilation toolchain, to enable the development of end-to-end applications. We perform an experimental assessment of our design on a set of benchmarks representative of the near-sensor processing domain, complementing the timing results with a post-place-and-route analysis of the power consumption. Finally, a comparison with the state of the art shows that our solution outperforms the competitors in energy efficiency, reaching a peak of 97 Gflop/s/W on single-precision scalars and 162 Gflop/s/W on half-precision vectors.
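
    The transprecision principle - pick the cheapest floating-point format that still meets the kernel's accuracy target - can be sketched in plain software, with NumPy standing in for the cluster's scalar and vector FP units and an assumed error tolerance of 1e-3:

        # Minimal transprecision sketch: try progressively narrower FP formats
        # and keep the cheapest one whose result stays within an assumed
        # application-level accuracy bound. Illustrative only.
        import numpy as np

        def dot(x, y, dtype):
            """Dot product computed entirely in the given precision."""
            return float(np.dot(x.astype(dtype), y.astype(dtype)))

        rng = np.random.default_rng(0)
        x = rng.standard_normal(4096)
        y = rng.standard_normal(4096)
        ref = float(np.dot(x, y))               # float64 reference result

        for dtype in (np.float32, np.float16):  # candidate narrow formats
            rel_err = abs(dot(x, y, dtype) - ref) / abs(ref)
            print(f"{np.dtype(dtype).name}: relative error = {rel_err:.2e}")
            if rel_err < 1e-3:                  # assumed accuracy requirement
                print(f"  -> {np.dtype(dtype).name} suffices for this kernel")

    On real transprecision hardware a narrower format also halves memory traffic and lets two half-precision operands share one 32-bit vector lane (the sub-word vectorization mentioned above), which is where the energy-efficiency gains come from.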

    FPGA Frequency Domain Based GPS Coarse Acquisition Processor using FFT

    The Global Positioning System (GPS) is a satellite-based technology that has gained widespread use worldwide in civilian and military applications. Direct-Sequence Spread Spectrum (DSSS) is the method whereby the data transmitted by the satellite and received by the user is kept secure, low-power, and relatively noise-immune. The first step required in GPS operation is to perform a lock on the incoming signal, with respect to both time synchronization and frequency resolution. Because of the need for reduced time to lock and also reduced hardware, algorithms based in the frequency domain have been developed. These algorithms take advantage of the time-to-frequency transform known as the fast Fourier transform (FFT). For this thesis, a Direct-Sequence Spread Spectrum coarse-acquisition code processor based on the FFT was implemented in VHDL and targeted to a Xilinx Virtex-II Pro Field Programmable Gate Array (FPGA). The use of the FFT allows simultaneous lock on the coarse acquisition (C/A) code and the carrier frequency. Because of hardware limitations, a novel technique of sub-sampling is used in this system to obtain data block sizes that match those limitations. In addition, design challenges related to scheduling and timing were addressed, allowing a system with 19 pipeline stages to be built. The system, which fits on a Xilinx Virtex-II Pro XC2VP70 FPGA, uses 10 ms of data to perform the lock with 5.5 ms of processing time at 100 MHz and can theoretically operate on signals 20 dB below the noise floor.
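
    The frequency-domain acquisition such a processor implements boils down to a circular correlation computed with FFTs, repeated over a grid of carrier (Doppler) hypotheses. A minimal software sketch of that math follows; the code, signal, sample rate, and noise level are synthetic stand-ins, not real GPS data or the thesis's VHDL design:

        # FFT-based coarse acquisition: for each Doppler hypothesis, wipe off
        # the carrier and circularly correlate against the local code via
        # IFFT(FFT(signal) * conj(FFT(code))); the peak gives the code phase.
        import numpy as np

        fs = 2.048e6                    # sample rate (illustrative)
        n = 2048                        # one 1 ms C/A code period at fs
        rng = np.random.default_rng(0)
        ca = rng.choice([-1.0, 1.0], size=n)   # stand-in for a real C/A code
        true_shift, true_dopp = 731, 2500.0    # hidden code phase / Doppler (Hz)
        t = np.arange(n) / fs
        rx = np.roll(ca, true_shift) * np.exp(2j * np.pi * true_dopp * t)
        rx += (rng.standard_normal(n) + 1j * rng.standard_normal(n)) * 2.0

        code_fft = np.conj(np.fft.fft(ca))
        best = (0.0, None, None)
        for dopp in np.arange(-5000.0, 5001.0, 500.0):   # Doppler search grid
            wiped = rx * np.exp(-2j * np.pi * dopp * t)  # carrier wipe-off
            corr = np.abs(np.fft.ifft(np.fft.fft(wiped) * code_fft))
            k = int(np.argmax(corr))
            if corr[k] > best[0]:
                best = (corr[k], k, dopp)
        print(f"peak at code phase {best[1]} samples, Doppler {best[2]:+.0f} Hz")

    Evaluating all code phases at once in the frequency domain is what lets the hardware lock on the C/A code and the carrier simultaneously instead of sweeping code offsets serially.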

    Execution model and optimizing compilation for execution migration

    Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013; by Ilia Andreevich Lebedev; includes bibliographical references (pages 137-141).
    Although systems with hardware support for fine-grained execution migration are becoming a reality, no concrete execution model or compiler exists for these machines. This limits the complexity of software that can be written for these machines, and therefore also the scope of studies for which these machines can be used. In this thesis, we define a productive programming model for an execution migration platform by exposing migration as a set of interfaces usable with the C programming language via a custom optimizing compiler. We employ hardware-software co-design to describe a stack core architecture with support for partial context migration in order to simplify the compiler problem and improve compiler efficiency. We also consider instruction encoding in abstract terms to establish a baseline comparison of encoded instruction density to an ideal upper bound. The stack-based execution migration platform offers a new and unexplored cost model, which leads us to reevaluate the trade-offs associated with compilation for these architectures, and to explore novel algorithms, or novel applications of existing optimizations. Throughout this work, we attempt to gain a deep understanding of the costs and benefits of execution migration by aggressive design space exploration. We use the insight gained to better inform the problem of compiling to this unorthodox architecture, and design the compiler, a library of optimized parallel primitives, and a set of compiler optimization passes to best reflect and utilize the underlying hardware.
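
    A back-of-the-envelope cost model conveys the trade-off such a platform exposes: migrating a thread's context to where the data lives versus issuing remote accesses from where the thread runs. The cycle costs below are assumed for illustration and are not taken from the thesis; the point is that partial context migration (shipping only the live top of the stack) lowers the break-even access count:

        # Assumed cost model: when does migrating execution to the data win
        # over remote loads? Numbers are illustrative, not measured.
        ROUND_TRIP = 100   # cycles per remote load (request there, data back)
        ONE_WAY = 50       # cycles of base latency to ship a context one way
        PER_WORD = 2       # extra cycles per context word on the wire

        def remote_cost(n_accesses: int) -> int:
            return n_accesses * ROUND_TRIP

        def migrate_cost(context_words: int) -> int:
            return ONE_WAY + context_words * PER_WORD  # accesses then local

        for ctx in (8, 64):          # partial (top-of-stack) vs full context
            for n in (1, 2, 4, 8):
                m, r = migrate_cost(ctx), remote_cost(n)
                verdict = "migrate" if m < r else "stay"
                print(f"context={ctx:2d}w accesses={n}: "
                      f"migrate={m:3d} remote={r:3d} -> {verdict}")

    Under these assumed numbers an 8-word partial context pays for itself after a single remote access, while a 64-word full context needs two; a stack architecture makes the small-context case the common one.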