Search CORE

80 research outputs found

Effective Simulation for The Giga-scale Massively Parallel Supercomputer SR2201

Author: Junji Nakagoshi
Kaoru Suzuki
Masato Kurosaki
Shunsuke Miyamoto
Publication venue
Publication date: 11/04/2020
Field of study

Obtaining performance and programmability using reconfigurable hardware for media processing

Author: Kung Ling-Pei, 1961-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2002
Field of study

Thesis (Ph. D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2002.Includes bibliographical references (p. 127-132).An imperative requirement in the design of a reconfigurable computing system or in the development of a new application on such a system is performance gains. However, such developments suffer from long-and-difficult programming process, hard-to-predict performance gains, and limited scope of applications. To address these problems, we need to understand reconfigurable hardware's capabilities and limitations, its performance advantages and disadvantages, re-think reconfigurable system architectures, and develop new tools to explore its utility. We begin by examining performance contributors at the system level. We identify those from general-purpose and those from dedicated components. We propose an architecture by integrating reconfigurable hardware within the general-purpose framework. This is to avoid and minimize dedicated hardware and organization for programmability. We analyze reconfigurable logic architectures and their performance limitations. This analysis leads to a theory that reconfigurable logic can never be clocked faster than a fixed-logic design based on the same fabrication technology. Though highly unpredictable, we can obtain a quick upper bound estimate on the clock speed based on a few parameters. We also analyze microprocessor architectures and establish an analytical performance model. We use this model to estimate performance bounds using very little information on task properties. These bounds help us to detect potential memory-bound tasks. For a compute-bound task, we compare its performance upper bound with the upper bound on reconfigurable clock speed to further rule out unlikely speedup candidates.(cont.) These performance estimates require very few parameters, and can be quickly obtained without writing software or hardware codes. They can be integrated with design tools as front end tools to explore speedup opportunities without costly trials. We believe this will broaden the applicability of reconfigurable computing.by Ling-Pei Kung.Ph.D

DSpace@MIT

Design and implementation of a multi-purpose cluster system NIU

Author: Ang Boon Seong, 1966-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/1999
Field of study

Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.Includes bibliographical references (p. 209-221).by Boon Seong Ang.Ph.D

DSpace@MIT

Método para la evaluación de un microcontrolador de núcleo abierto

Author: Bricout Christophe F. L.
Jasinski Ricardo P.
Luz Sibilla B
Pedroni Volnei A.
Publication venue: 'Universidad Distrital Francisco Jose de Caldas'
Publication date: 01/12/2011
Field of study

La etapa de verifi cación desempeña un papel fundamental en el diseñoe implementación de microcontroladores. Con el fi n de realizar una verificación acertada del diseño, son utilizadas algunas técnicas de verificación funcional tales como: pruebas defi nidas por el diseñador paraverifi car el desempeño ante casos extremos, la simulación a través detestbenches, y la ejecución de aplicaciones extensas. El proyecto propuestoen este trabajo tiene como objetivo desarrollar e implementarun método para la evaluación de un microcontrolador de núcleo abierto,con la realización de pruebas directamente sobre el hardware. Esteenfoque presenta como ventajas, un proceso mucho más rápido queotros métodos que emplean simulaciones y menos requerimiento dememoria para las pruebas. Un Ethernet IP Core ha sido integrado alproyecto, con el fi n de hacer que el método sea independiente del sistemaoperativo, de la arquitectura de microprocesador y de la herramientade diseño

Universidad Distrital de la ciudad de Bogotá: Open Journal Systems

Directory of Open Access Journals

Recommended from our members

Scalable algorithms for software based self test using formal methods

Author: Prabhu Mahesh
Publication venue
Publication date: 09/07/2014
Field of study

textTransistor scaling has kept up with Moore's law with a doubling of the number of transistors on a chip. More logic on a chip means more opportunities for manufacturing defects to slip in. This, in turn, has made processor testing after manufacturing a significant challenge. At-speed functional testing, being completely non-intrusive, has been seen as the ideal way of testing chips. However for processor testing, generating instruction level tests for covering all faults is a challenge given the issue of scalability. Data-path faults are relatively easier to control and observe compared to control-path faults. In this research we present a novel method to generate instruction level tests for hard to detect control-path faults in a processor. We initially map the gate level stuck-at fault to the Register Transfer Level (RTL) and build an equivalent faulty RTL model. The fault activation and propagation constraints are captured using Control and Data Flow Graphs of the RTL as a Liner Temporal Logic (LTL) property. This LTL property is then negated and given to a Bounded Model Checker based on a Bit-Vector Satisfiability Module Theories (SMT) solver. From the counter-example to the property we can extract a sequence of instructions that activates the gate level fault and propagates the fault effect to one of the observable points in the design. Other than the user supplying instruction constraints, this approach is completely automatic and does not require any manual intervention. Not all the design behaviors are required to generate a test for a fault. We use this insight to scale our previous methodology further. Underapproximations are design abstractions that only capture a subset of the original design behaviors. The use of RTL for test generation affords us two types of under-approximations: bit-width reduction and operator approximation. These are abstractions that perform reductions based on semantics of the RTL design. We also explore structural reductions of the RTL, called path based search, where we search through error propagation paths incrementally. This approach increases the size of the test generation problem step by step. In this way the SMT solver searches through the state space piecewise rather than doing the entire search at once. Experimental results show that our methods are robust and scalable for generating functional tests for hard to detect faults.Electrical and Computer Engineerin

Texas ScholarWorks

Optimizing SIMD execution in HW/SW co-designed processors

Author: Kumar Rakesh
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2014
Field of study

SIMD accelerators are ubiquitous in microprocessors from different computing domains. Their high compute power and hardware simplicity improve overall performance in an energy efficient manner. Moreover, their replicated functional units and simple control mechanism make them amenable to scaling to higher vector lengths. However, code generation for these accelerators has been a challenge from the days of their inception. Compilers generate vector code conservatively to ensure correctness. As a result they lose significant vectorization opportunities and fail to extract maximum benefits out of SIMD accelerators. This thesis proposes to vectorize the program binary at runtime in a speculative manner, in addition to the compile time static vectorization. There are different environments that support runtime profiling and optimization support required for dynamic vectorization, one of most prominent ones being: 1) Dynamic Binary Translators and Optimizers (DBTO) and 2) Hardware/Software (HW/SW) Co-designed Processors. HW/SW co-designed environment provides several advantages over DBTOs like transparent incorporations of new hardware features, binary compatibility, etc. Therefore, we use HW/SW co-designed environment to assess the potential of speculative dynamic vectorization. Furthermore, we analyze vector code generation for wider vector units and find out that even though SIMD accelerators are amenable to scaling from the hardware point of view, vector code generation at higher vector length is even more challenging. The two major factors impeding vectorization for wider SIMD units are: 1) Reduced dynamic instruction stream coverage for vectorization and 2) Large number of permutation instructions. To solve the first problem we propose Variable Length Vectorization that iteratively vectorizes for multiple vector lengths to improve dynamic instruction stream coverage. Secondly, to reduce the number of permutation instructions we propose Selective Writing that selectively writes to different parts of a vector register and avoids permutations. Finally, we tackle the problem of leakage energy in SIMD accelerators. Since SIMD accelerators consume significant amount of real estate on the chip, they become the principle source of leakage if not utilized judiciously. Power gating is one of the most widely used techniques to reduce leakage energy of functional units. However, power gating has its own energy and performance overhead associated with it. We propose to selectively devectorize the vector code when higher SIMD lanes are used intermittently. This selective devectorization keeps the higher SIMD lanes idle and power gated for maximum duration. Therefore, resulting in overall leakage energy reduction.Postprint (published version

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Reconfigurable Model for RISC Processors

Author: Lima Thiago Pinheiro Felix da Silva e
Publication venue: RIT Scholar Works
Publication date: 01/12/2016
Field of study

The instruction set of a processor is embodied in the particular micro-architecture representing the processor hardware. Verifying proper operation of the instruction set for a particular processor hardware implementation requires exhaustive testing to expose unknown dependencies and other elusive design flaws. This paper presents the research and development of a flexible micro-architectural model written in SystemC for a RISC processor based upon a user defined configuration database; the RISC processor is based on an architecture assigned in course Design of Computer Systems (DCS) offered at Rochester Institute of Technology (RIT). This model will be tested by a test bench written in SystemVerilog, using randomly generated instructions, and results will be compared with various DCS student processors originally developed at the Register Transfer Level (RTL) in a Hardware Description Language (HDL) such as Verilog or VHDL. The test bench will provide stimulus such as the system clock and random instructions through a program memory attached to both the model and RTL processor. The main goal of this work is to automate verification and validation of a diverse set of processors designed in RTL by using an appropriate configuration database and comparison of all states and signals from the processor being tested and the model developed by the author. The test results will be compared and discussed

RIT Scholar Works

Handbook of digital techniques for high-speed design: design examples, signaling and memory technologies, fiber optics, modeling and simulation to ensure signal integrity

Author: Granberg Tom
Publication venue: Prentice-Hall
Publication date: 01/01/2004
Field of study

CERN Document Server

ASC: A stream compiler for computing with FPGAs

Author: Mencer O
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

Published versio

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository