667 research outputs found

    Formal Architecture Specification for Time Analysis

    WCET (worst-case execution time) calculus is nowadays a must for safety-critical systems. As a matter of fact, basic real-time properties rely on accurate timings. Although substantial progress has been made over the last years toward a more precise WCET, we believe that the design of the underlying frameworks deserves more attention. In this paper, we are mainly concerned with two aspects of the modularity of these frameworks. First, we enhance the existing language Sim-nML for describing processors at the instruction level in order to capture modern architecture aspects. Second, we propose a light DSL in order to describe, in a formal prose, both the structural and the behavioral aspects of the architecture.

    From FPGA to ASIC: A RISC-V processor experience

    This work documents a correct design flow using these tools in the Lagarto RISC-V processor, and the RTL design considerations that must be taken into account to move from a design for FPGA to a design for ASIC.

    Architectural Verification of Four-instruction Superscalar Processor for MIPS I Instruction Set

    The study undertaken in this thesis tries to tackle this inefficiency by providing extra register locations, called pseudo-registers, in addition to the architectural registers, and by following a pointer scheme to reference both architectural and pseudo-registers. This scheme renames each logical destination register of an incoming instruction to a pseudo-register referenced by pointers called pseudo-pointers. Two separate lists of these pointers are maintained, one for all types of instructions and the other for only unspeculated instructions. When a branch instruction preceding the speculated instruction is evaluated and it is established that the prediction was correct, the machine state is altered by updating the pointer lists instead of moving the data. As the pointers are only 6 bits wide, the inefficiency is considerably reduced. This processor scheme is implemented using the Verilog hardware description language (HDL). The study provides architectural details of each component used in the processor, stressing the issues involved in the implementation and the methods used to overcome them. It also discusses the verification methodology, documenting the steps involved in compiling a C program and loading it onto the simulated instruction cache and data cache for simulation. Finally, simulation results are presented for a sample C program, verifying the design.
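
The pointer scheme described in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's Verilog design: the class name `PointerRenamer`, the split into 32 logical registers and 64 storage locations (the most a 6-bit pointer can address), and the handling of mispredictions are all hypothetical choices made for illustration.

```python
NUM_LOGICAL = 32   # assumed number of logical (architectural) registers
NUM_PHYSICAL = 64  # assumed storage size: what a 6-bit pointer can address


class PointerRenamer:
    """Toy model: register state is changed by copying pointers, not data."""

    def __init__(self):
        self.values = [0] * NUM_PHYSICAL
        # Pointer list for unspeculated (committed) state.
        self.committed = list(range(NUM_LOGICAL))
        # Pointer list for all instructions, speculated ones included.
        self.speculative = list(self.committed)
        self.free = list(range(NUM_LOGICAL, NUM_PHYSICAL))

    def write(self, logical, value):
        """Rename the destination to a free location and store the result."""
        old = self.speculative[logical]
        if old != self.committed[logical]:
            self.free.append(old)  # earlier speculative copy is superseded
        ptr = self.free.pop(0)
        self.speculative[logical] = ptr
        self.values[ptr] = value
        return ptr

    def read(self, logical):
        return self.values[self.speculative[logical]]

    def resolve_branch(self, predicted_correctly):
        """Update the pointer lists once the branch outcome is known."""
        if predicted_correctly:
            # Speculative results become committed state: pointers only.
            stale = set(self.committed) - set(self.speculative)
            self.committed = list(self.speculative)
        else:
            # Assumed recovery: discard the speculative pointers.
            stale = set(self.speculative) - set(self.committed)
            self.speculative = list(self.committed)
        self.free.extend(sorted(stale))


renamer = PointerRenamer()
renamer.write(5, 42)                    # speculative write to r5
renamer.resolve_branch(False)           # misprediction: r5 reverts
value_after_squash = renamer.read(5)
renamer.write(5, 7)                     # speculative write again
renamer.resolve_branch(True)            # correct prediction: commit
value_after_commit = renamer.read(5)
```

In this model, committing a correctly predicted path copies a list of small pointers and never touches `values`, which is the saving the abstract attributes to the scheme.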

    Numerical aerodynamic simulation facility preliminary study: Executive study

    A computing system was designed with the capability of providing an effective throughput of one billion floating-point operations per second for three-dimensional Navier-Stokes codes. The methodology used in defining the baseline design and the major elements of the numerical aerodynamic simulation facility are described.

    Design of a Five Stage Pipeline CPU with Interruption System

    A central processing unit (CPU), also referred to as a central processor unit, is the hardware within a computer that carries out the instructions of a computer program by performing the basic arithmetical, logical, and input/output operations of the system. The term has been in use in the computer industry at least since the early 1960s. The form, design, and implementation of CPUs have changed over the course of their history, but their fundamental operation remains much the same. A computer can have more than one CPU; this is called multiprocessing. All modern CPUs are microprocessors, meaning that they are contained on a single chip. Some integrated circuits (ICs) can contain multiple CPUs on a single chip; those ICs are called multi-core processors. An IC containing a CPU can also contain peripheral devices and other components of a computer system; this is called a system on a chip (SoC). Two typical components of a CPU are the arithmetic logic unit (ALU), which performs arithmetic and logical operations, and the control unit (CU), which extracts instructions from memory and decodes and executes them, calling on the ALU when necessary. Not all computational systems rely on a central processing unit. An array processor or vector processor has multiple parallel computing elements, with no one unit considered the "center". In the distributed computing model, problems are solved by a distributed interconnected set of processors.

    Machine Learning for Microprocessor Performance Bug Localization

    The validation process for microprocessors is a very complex task that consumes substantial engineering time during the design process. Bugs that degrade overall system performance without affecting functional correctness are particularly difficult to debug, given the lack of a golden reference for bug-free performance. This work introduces two automated performance bug localization methodologies based on machine learning that aim to aid the debugging process. Our results show that, for the evaluated microprocessor core performance bugs whose average IPC impact is greater than 1%, our best-performing technique is able to localize the exact microarchitectural unit of the bug ~77% of the time, while achieving a top-3 unit accuracy (out of 11 possible locations) of over 90% for bugs with the same average IPC impact. The proposed system in our simulation setup requires only a few seconds to perform a bug location inference, which leads to a reduced debugging time. Comment: 12 pages, 6 figures.
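
The exact-unit and top-3 figures quoted in this abstract are instances of top-k accuracy. As a hedged illustration of how such a metric is typically computed (the function name `top_k_accuracy`, the unit names, and the sample rankings below are made up for the example and are not taken from the paper):

```python
from typing import Sequence


def top_k_accuracy(
    ranked_units: Sequence[Sequence[str]],
    true_units: Sequence[str],
    k: int,
) -> float:
    """Fraction of bugs whose true unit is among the first k ranked candidates."""
    if len(ranked_units) != len(true_units):
        raise ValueError("one ranking is required per bug")
    if not true_units:
        raise ValueError("at least one bug is required")
    hits = sum(
        1
        for ranking, truth in zip(ranked_units, true_units)
        if truth in ranking[:k]
    )
    return hits / len(true_units)


# Hypothetical model output: candidate units for four bugs, most likely first.
rankings = [
    ["ROB", "LSQ", "ALU"],
    ["L1D", "ROB", "BPU"],
    ["BPU", "ALU", "LSQ"],
    ["LSQ", "L1D", "ROB"],
]
# Hypothetical ground truth: the unit that actually contains each bug.
truth = ["ROB", "ROB", "LSQ", "ALU"]

top1 = top_k_accuracy(rankings, truth, k=1)  # exact-unit localization rate
top3 = top_k_accuracy(rankings, truth, k=3)  # true unit within the top three
```

With `k=1` the metric reduces to the exact-unit rate, and with `k=3` it counts a bug as localized whenever the true unit is one of the three highest-ranked candidates.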

    Design and implementation of a bootrom in a Linux capable RISC-V processor

    The open-source movement promises to revolutionize the hardware world just as it has revolutionized software. Thanks to the open-source RISC-V instruction set architecture (ISA), many projects are making their way to offer an alternative to the hermetic and proprietary world of computer architecture. The DRAC project, whose acronym refers to Designing RISC-V-based Accelerators for next-generation Computers, was created in this context. This project, led by the Barcelona Supercomputing Center (BSC), develops processors and accelerators based on RISC-V technology, with the aim of accelerating security tasks, personalized medicine and autonomous navigation. This thesis aims to design, implement and verify a bootrom for the 64-bit DRAC 22 nm System on Chip (SoC). This SoC integrates an in-order 7-stage RISC-V core called Sargantana. The SoC design, based on the previous tape-out PreDRAC, is divided into two parts. One part contains all the ASIC-oriented components; the other contains the FPGA-oriented components. One of the main reasons for this division is that there was no ASIC-oriented bootrom, and therefore it was necessary to use the FPGA to boot the chip. With the integration of the bootrom developed in this thesis, the SoC will be able to boot by itself, eliminating the FPGA-oriented part of the SoC design.

    Doctor of Philosophy

    The embedded system space is characterized by a rapid evolution in the complexity and functionality of applications. In addition, the short time-to-market nature of the business motivates the use of programmable devices capable of meeting the conflicting constraints of low energy, high performance, and short design times. The keys to meeting these conflicting constraints are specialization and maximally extracting available application parallelism. General-purpose processors are flexible but are either too power hungry or lack the necessary performance. Application-specific integrated circuits (ASICs) efficiently meet the performance and power needs but are inflexible. Programmable domain-specific architectures (DSAs) are an attractive middle ground, but their design requires significant time, resources, and expertise in a variety of specialties, which range from application algorithms to architecture and, ultimately, circuit design. This dissertation presents CoGenE, a design framework that automates the design of energy-performance-optimal DSAs for embedded systems. For a given application domain and a user-chosen initial architectural specification, CoGenE consists of a Compiler to generate the execution binary, a simulator Generator to collect performance/energy statistics, and an Explorer that modifies the current architecture to improve energy-performance-area characteristics. The above process repeats automatically until the user-specified constraints are achieved. This removes or alleviates the time needed to understand the application, manually design the DSA, and generate object code for the DSA. Thus, CoGenE is a new design methodology that represents a significant improvement in performance, energy dissipation, design time, and resources. This dissertation employs the face recognition domain to showcase a flexible architectural design methodology that creates "ASIC-like" DSAs. The DSAs are instruction set architecture (ISA)-independent and achieve good energy-performance characteristics by coscheduling the often conflicting constraints of data access, data movement, and computation through a flexible interconnect. This represents a significant increase in programming complexity and code generation time. To address this problem, the CoGenE compiler employs integer linear programming (ILP)-based "interconnect-aware" scheduling techniques for automatic code generation. The CoGenE explorer employs an iterative technique to search the complete design space and select a set of energy-performance-optimal candidates. When compared to manual designs, results demonstrate that CoGenE produces superior designs for three application domains: face recognition, speech recognition and wireless telephony. While CoGenE is well suited to applications that exhibit a streaming behavior, multithreaded applications like ray tracing present a different but important challenge. To demonstrate its generality, CoGenE is evaluated in designing a novel multicore N-wide SIMD architecture, known as StreamRay, for the ray tracing domain. CoGenE is used to synthesize the SIMD execution cores, the compiler that generates the application binary, and the interconnection subsystem. Further, separating address and data computations in space reduces data movement and contention for resources, thereby significantly improving performance compared to existing ray tracing approaches.