Search CORE

565 research outputs found

Abstract State Machines 1988-1998: Commented ASM Bibliography

Author: Boerger Egon
Huggins James K.
Publication venue
Publication date: 01/01/1998
Field of study

An annotated bibliography of papers which deal with or use Abstract State Machines (ASMs), as of January 1998.Comment: Also maintained as a BibTeX file at http://www.eecs.umich.edu/gasm

arXiv.org e-Print Archive

CiteSeerX

Kettering University

The Development of TIGRA: A Zero Latency Interface For Accelerator Communication in RISC-V Processors

Author: Green Wesley Brad
Publication venue: Clemson University Libraries
Publication date: 01/05/2022
Field of study

Field programmable gate arrays (FPGA) give developers the ability to design application specific hardware by means of software, providing a method of accelerating algorithms with higher power efficiency when compared to CPU or GPU accelerated applications. FPGA accelerated applications tend to follow either a loosely coupled or tightly coupled design. Loosely coupled designs often use OpenCL to utilize the FPGA as an accelerator much like a GPU, which provides a simplifed design flow with the trade-off of increased overhead and latency due to bus communication. Tightly coupled designs modify an existing CPU to introduce instruction set extensions to provide a minimal latency accelerator at the cost of higher programming effort to include the custom design. This dissertation details the design of the Tightly Integrated, Generic RISC-V Accelerator (TIGRA) interface which provides the benefits of both loosely and tightly coupled accelerator designs. TIGRA enabled designs incur zero latency with a simple-to-use interface that reduces programming effort when implementing custom logic within a processor. This dissertation shows the incorporation of TIGRA into the simple PicoRV32 processor, the highly customizable Rocket Chip generator, and the FPGA optimized Taiga processor. Each processor design is tested with AES 128-bit encryption and posit arithmetic to demonstrate TIGRA functionality. After a one time programming cost to incorporate a TIGRA interface into an existing processor, new functional units can be added with up to a 75% reduction in the lines of code required when compared to non-TIGRA enabled designs. Additionally, each functional unit created is co-compatible with each processor as the TIGRA interface remains constant between each design. The results prove that using the TIGRA interface introduces no latency and is capable of incorporating existing custom logic designs without modification for all three processors tested. When compared to the PicoRV32 coprocessor interface (PCPI), TIGRA coupled designs complete one clock cycle faster. Similarly, TIGRA outperforms the Rocket Chip custom coprocessor (RoCC) interface by an average of 6.875 clock cycles per instruction. The Taiga processor\u27s decoupled execution units allow for instructions to execute concurrently and uses a tag management system that is similar to out-of-order processors. The inclusion of the TIGRA interface within this processor abstracts the tag management from the user and demonstrates that the TIGRA interface can be applied to out-of-order processors. When coupled with partial reconfiguration, the flexibility and modularity of TIGRA drastically increases. By creating a reprogrammable region for the custom logic connected via TIGRA, users can swap out the connected design at runtime to customize the processor for a given application. Further, partial reconfiguration allows users to only compile the custom logic design as opposed to the entire CPU, resulting in an 18.1% average reduction of compilation during the design process in the case studies. Paired with the programming effort saved by using TIGRA, partial reconfiguration improves the time to design and test new functionality timelines for a processor

Clemson University: TigerPrints

Advanced information processing system: The Army fault tolerant architecture conceptual study. Volume 1: Army fault tolerant architecture overview

Author: Alger L. S.
Babikyan C. A.
Butler B. P.
Friend S. A.
Ganska R. J.
Harper R. E.
Lala J. H.
Masotto T. K.
Meyer A. J.
Morton D. P.
Publication venue
Publication date
Field of study

Digital computing systems needed for Army programs such as the Computer-Aided Low Altitude Helicopter Flight Program and the Armored Systems Modernization (ASM) vehicles may be characterized by high computational throughput and input/output bandwidth, hard real-time response, high reliability and availability, and maintainability, testability, and producibility requirements. In addition, such a system should be affordable to produce, procure, maintain, and upgrade. To address these needs, the Army Fault Tolerant Architecture (AFTA) is being designed and constructed under a three-year program comprised of a conceptual study, detailed design and fabrication, and demonstration and validation phases. Described here are the results of the conceptual study phase of the AFTA development. Given here is an introduction to the AFTA program, its objectives, and key elements of its technical approach. A format is designed for representing mission requirements in a manner suitable for first order AFTA sizing and analysis, followed by a discussion of the current state of mission requirements acquisition for the targeted Army missions. An overview is given of AFTA's architectural theory of operation

NASA Technical Reports Server

Software Technologies - 8th International Joint Conference, ICSOFT 2013 : Revised Selected Papers

Author
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

University of Twente Research Information

Evaluating kernels on Xeon Phi to accelerate Gysela application

Author: Bigot J.
Cartier-Michaud T.
Grandgirard V.
Haefele M.
Latu G.
Rozar F.
Publication venue
Publication date: 21/07/2014
Field of study

This work describes the challenges presented by porting parts ofthe Gysela code to the Intel Xeon Phi coprocessor, as well as techniques used for optimization, vectorization and tuning that can be applied to other applications. We evaluate the performance of somegeneric micro-benchmark on Phi versus Intel Sandy Bridge. Several interpolation kernels useful for the Gysela application are analyzed and the performance are shown. Some memory-bound and compute-bound kernels are accelerated by a factor 2 on the Phi device compared to Sandy architecture. Nevertheless, it is hard, if not impossible, to reach a large fraction of the peek performance on the Phi device,especially for real-life applications as Gysela. A collateral benefit of this optimization and tuning work is that the execution time of Gysela (using 4D advections) has decreased on a standard architecture such as Intel Sandy Bridge.Comment: submitted to ESAIM proceedings for CEMRACS 2014 summer school version reviewe

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL-CEA

HAL UVSQ

Analysis and modeling of the timing behavior of GPU architectures

Author: Voudouris P.
Publication venue
Publication date: 01/01/2014
Field of study

Repository TU/e

Pure OAI Repository

Compiler verification in the context of pervasive system verification

Author: Leinenbach Dirk Carsten
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2008
Field of study

This thesis presents the formal verification of the compiling specification for a simple, non-optimizing compiler from the C-like programming language C0 to VAMP assembly code. The main result is a step-by-step simulation theorem between C0 programs and the compiled code (which is specified by the compiling specification). Additionally, a C0 small-step semantics and a verification methodology for VAMP assembly have been developed. This work is part of the Verisoft project which aims at the pervasive formal verification of an entire computer system. The key concept in Verisoft';s methodology is to prove properties of computer systems at the relatively abstract C0 layer and to transfer them via several intermediate layers down to the concrete hardware layer. After successful transfer of a property to the hardware layer, we can be sure that no oversimplifications have been done in the formalizations of the more abstract layers. This context of pervasive system verification imposes several special requirements to our compiler correctness theorem. In particular, the simulation theorem had to be formulated based on small-step semantics to allow for reasoning about non-terminating and interleaving programs. Another important feature is that our result incorporates resource restrictions at the hardware layer and allows to discharge them at the C0 layer. All results presented in this thesis have been formalized in the theorem prover Isabelle / HOL.Die vorliegende Arbeit befasst sich mit der formalen Verifikation des Codegenerierungsalgorithmus eines nicht optimierenden Compilers von der C-ähnlichen Sprache C0 nach VAMP Assembler. Das Hauptergebnis ist ein Schritt-für- Schritt Simulationssatz zwischen C0 Programmen und dem compilierten Code. Die Arbeit umfasst zus¨atzlich die Entwicklung einer Small-Step Semantik für C0 sowie einer Verifikationsmethodik für VAMP Assemblerprogramme. Diese Arbeit ist Teil des Verisoft Projekts, das auf die durchgängige formale Verifikation von Computersystemen abzielt. Die Methodik von Verisoft basiert auf der Verifikation von Eigenschaften eines Computersystems auf der relativ abstrakten C0 Ebene und deren anschließendem Transfer auf die konkrete Hardwareebene. Ein solcher erfolgreicher Eigenschaftstransfer garantiert, dass auf den abstrakten Ebenen keine zu starken Vereinfachungen vorgenommen worden sind. Die Einbettung in die durchgängige Verifikation von Systemen stellt zahlreiche speziellen Anforderungen an den Compilerkorrektheitssatz. Insbesondere muss der Simulationssatz auf einer Small-Step Semantik basieren, um die Behandlung von nebenläufigen und von nicht terminierenden Programmen zu ermöglichen. Eine weitere Eigenschaft ist, dass unser Resultat Ressourcenbeschränkungen auf der Hardwareebene einbezieht und deren Entlastung auf der C0 Ebene erlaubt. Alle Resultate dieser Arbeit sind im Theorembeweiser Isabelle / HOL formalisiert worden

Universaar

Acronym