Essentials of computing systems
Computers were invented to "compute", i.e., to solve all sorts of mathematical problems. A computer system contains hardware and systems software that work together to run software applications. The underlying concepts that support the construction of a computer are relatively stable. In fact, (almost) all computer systems have a similar organization, i.e., their hardware and software components are arranged in hierarchical layers (or levels) and perform similar functions. This book is written for programmers and software engineers who want to understand how the components of a computer work and how they affect the correctness and performance of their programs.
A formally verified compiler back-end
This article describes the development and formal verification (proof of
semantic preservation) of a compiler back-end from Cminor (a simple imperative
intermediate language) to PowerPC assembly code, using the Coq proof assistant
both for programming the compiler and for proving its correctness. Such a
verified compiler is useful in the context of formal methods applied to the
certification of critical software: the verification of the compiler guarantees
that the safety properties proved on the source code hold for the executable
compiled code as well
HTA: A Scalable High-Throughput Accelerator for Irregular HPC Workloads
We propose a new architecture called HTA for high-throughput irregular HPC applications with little data reuse. HTA reduces contention within the memory system with the help of a partitioned memory controller that is amenable to 2.5D implementation using silicon photonics. In terms of scalability, HTA supports a 4× higher number of compute units than state-of-the-art GPU systems. Our simulation-based evaluation on a representative set of HPC benchmarks shows that the proposed design reduces the queuing latency by 10% to 30% and improves the variability in memory access latency by 10% to 60%. Our results show that HTA improves the L1 miss penalty by 2.3× to 5× over GPUs. When compared to a multi-GPU system with the same number of compute units, our simulation results show that HTA can provide up to a 2× speedup.
A computer-aided design for digital filter implementation
A transprecision floating-point cluster for efficient near-sensor data analytics
Recent applications in the domain of near-sensor computing require the
adoption of floating-point arithmetic to reconcile high precision results with
a wide dynamic range. In this paper, we propose a multi-core computing cluster
that leverages the fine-grained tunable principles of transprecision computing
to provide support to near-sensor applications at a minimum power budget. Our
design - based on the open-source RISC-V architecture - combines
parallelization and sub-word vectorization with near-threshold operation,
leading to a highly scalable and versatile system. We perform an exhaustive
exploration of the design space of the transprecision cluster on a
cycle-accurate FPGA emulator, with the aim to identify the most efficient
configurations in terms of performance, energy efficiency, and area efficiency.
We also provide a full-fledged software stack support, including a parallel
runtime and a compilation toolchain, to enable the development of end-to-end
applications. We perform an experimental assessment of our design on a set of
benchmarks representative of the near-sensor processing domain, complementing
the timing results with a post place-&-route analysis of the power consumption.
Finally, a comparison with the state-of-the-art shows that our solution
outperforms the competitors in energy efficiency, reaching a peak of 97
Gflop/s/W on single-precision scalars and 162 Gflop/s/W on half-precision
vectors
FPGA Frequency Domain Based Gps Coarse Acquisition Processor using FFT
The Global Positioning System (GPS) is a satellite-based technology that has gained widespread use worldwide in civilian and military applications. Direct Sequence Spread Spectrum (DSSS) is the method whereby the data transmitted by the satellite and received by the user is kept secure, low-power, and relatively noise-immune. The first step required in GPS operation is to perform a lock on the incoming signal, both with respect to time synchronization and frequency resolution. Because of the need for reduced time to lock and also reduced hardware, algorithms based in the frequency domain have been developed. These algorithms take advantage of the time-to-frequency matrix operation known as the fast Fourier transform (FFT). For this thesis, a Direct Sequence Spread Spectrum coarse acquisition code processor based on the FFT was implemented in VHDL and targeted to a Xilinx Virtex-II Pro Field Programmable Gate Array (FPGA). The use of the FFT allows simultaneous lock on the coarse acquisition (C/A) code and the carrier frequency. Because of hardware limitations, a novel sub-sampling technique is used in this system to obtain data block sizes that match those limitations. In addition, design challenges related to scheduling and timing were addressed, allowing a system with 19 pipeline stages to be built. The system, which fits on a Xilinx Virtex-II Pro XC2VP70 FPGA, uses 10 ms of data to perform the lock with 5.5 ms of processing time at 100 MHz and can theoretically operate on signals 20 dB below the noise floor
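The frequency-domain idea in the abstract above can be illustrated with a minimal sketch of FFT-based circular correlation for code-phase search. This is not the thesis's VHDL design: the spreading code here is a hypothetical random ±1 sequence rather than a real C/A Gold code, the block length of 256 is an arbitrary power of two, and the carrier-frequency (Doppler) search and the sub-sampling step are omitted. All function names are illustrative.

```python
import cmath
import random

def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def ifft(x):
    """Inverse FFT via the conjugation identity."""
    n = len(x)
    y = fft([v.conjugate() for v in x])
    return [v.conjugate() / n for v in y]

def circular_correlate(received, code):
    """corr[m] = sum_n received[n] * conj(code[(n - m) mod N])."""
    r_freq = fft(received)
    c_freq = fft(code)
    return ifft([r * c.conjugate() for r, c in zip(r_freq, c_freq)])

def acquire(received, code):
    """Return the code phase (in samples) with the largest correlation magnitude."""
    mags = [abs(v) for v in circular_correlate(received, code)]
    return max(range(len(mags)), key=mags.__getitem__)

# Hypothetical test signal: a delayed copy of the code plus Gaussian noise.
random.seed(0)
N = 256                      # assumed block length (power of two)
code = [random.choice((-1.0, 1.0)) for _ in range(N)]
true_delay = 17
received = [code[(i - true_delay) % N] + random.gauss(0.0, 0.5)
            for i in range(N)]
estimated_delay = acquire(received, code)
```

One forward FFT per input, one pointwise multiply, and one inverse FFT evaluate all N candidate code phases at once, which is the reason frequency-domain acquisition reduces time to lock compared with testing each phase serially.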
Execution model and optimizing compilation for execution migration
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2013. Cataloged from PDF version of thesis. Includes bibliographical references (pages 137-141). Although systems with hardware support for fine-grained execution migration are becoming a reality, no concrete execution model or compiler exists for these machines. This limits the complexity of software that can be written for these machines, and therefore also the scope of studies for which these machines can be used. In this thesis, we define a productive programming model for an execution migration platform by exposing migration as a set of interfaces usable with the C programming language via a custom optimizing compiler. We employ hardware-software co-design to describe a stack core architecture with support for partial context migration in order to simplify the compiler problem and improve compiler efficiency. We also consider instruction encoding in abstract terms to establish a baseline comparison of encoded instruction density to an ideal upper bound. The stack-based execution migration platform offers a new and unexplored cost model, which leads us to reevaluate the trade-offs associated with compilation for these architectures, and to explore novel algorithms, or novel applications of existing optimizations. Throughout this work, we attempt to gain a deep understanding of the costs and benefits of execution migration by aggressive design space exploration. We use the insight gained to better inform the problem of compiling to this unorthodox architecture, and design the compiler, a library of optimized parallel primitives, and a set of compiler optimization passes to best reflect and utilize the underlying hardware. by Ilia Andreevich Lebedev. S.M