3,909 research outputs found
From FPGA to ASIC: A RISC-V processor experience
This work document a correct design flow using these tools in the Lagarto RISC- V Processor and the RTL design considerations that must be taken into account, to move from a design for FPGA to design for ASIC
On the Determinism of Multi-core Processors
Hard real time systems are evolving in order to respond to the increasing demand in complex functionalities while taking advantage of newer hardware. Software development for safety critical systems has to comply with strict requirements that will facilitate the certification process. During this process, each part of the system is evaluated, requiring a certain level of assurance in order to provide confidence in the product. In particular there must be a level of confidence that the system behaves deterministically that may be based on functionality, resources and time. The success of system verification depends greatly on the capacity to determine its exact behavior. Nonetheless, hardware evolved in order to maximize the average computation power throughput with little to no regard to the deterministic aspect. Therefore modern architectural features of processors, like pipelines, cache memories and co-processors, make it hard to verify that all the needed properties are respected. The multi-core is furthermore difficult to analyze as the architecture employs mechanisms that compromise strong spatial and temporal partitioning when using shared resources without rigorous access control like shared caches or shared input/outputs. In this paper we identify and analyze the main sources of nondeterminism of the multi-cores with regard to the timing estimation. Precise determination of the worst case execution time is a challenging task even in single-core architectures. The problems are accentuated in the multi-core context mainly due to the resource sharing that can lead to highly complex interactions or to nondeterminism. Most of the units that generate behaviors that are hard to take into account can be deactivated, but it is not always easy to predict the impact on the performance. Nevertheless some of the features cannot be disabled (such as the out of order execution or some nondeterministic crossbar access policies) which leads to the invalidation of the respective platform for applications with high criticality level. We will address the problematic units, propose configuration or architecture guidelines and estimate their impact on the performance and determinism of the system
Leveraging register windows to reduce physical registers to the bare minimum
Register window is an architectural technique that reduces memory operations required to save and restore registers across procedure calls. Its effectiveness depends on the size of the register file. Such register requirements are normally increased for out-of-order execution because it requires registers for the in-flight instructions, in addition to the architectural ones. However, a large register file has an important cost in terms of area and power and may even affect the cycle time. In this paper, we propose a software/hardware early register release technique that leverage register windows to drastically reduce the register requirements, and hence, reduce the register file cost. Contrary to the common belief that out-of-order processors with register windows would need a large physical register file, this paper shows that the physical register file size may be reduced to the bare minimum by using this novel microarchitecture. Moreover, our proposal has much lower hardware complexity than previous approaches, and requires minimal changes to a conventional register window scheme. Performance studies show that the proposed technique can reduce the number of physical registers to the number of logical registers plus one (minimum number to guarantee forward progress) and still achieve almost the same performance as an unbounded register file.Peer ReviewedPostprint (published version
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)
A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digitial computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACLS constructs. The execution times for all ACLS constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture
Microprocessors: the engines of the digital age
The microprocessor—a computer central processing unit integrated onto a single microchip—has come to dominate computing across all of its scales from the tiniest consumer appliance to the largest supercomputer. This dominance has taken decades to achieve, but an irresistible logic made the ultimate outcome inevitable. The objectives of this Perspective paper are to offer a brief history of the development of the microprocessor and to answer questions such as: where did the microprocessor come from, where is it now, and where might it go in the future
Custom-Enabled System Architectures for High End Computing
The US Federal Government has convened a major committee to determine future directions for government sponsored high end computing system acquisitions and enabling research. The High End Computing Revitalization Task Force was inaugurated in 2003 involving all Federal agencies for which high end computing is critical to meeting mission goals. As part of the HECRTF agenda, a multi-day community wide workshop was conducted involving experts from academia, industry, and the national laboratories and centers to provide the broadest perspective on important issues related to the HECRTF purview. Among the most critical issues in establishing future directions is the relative merits of commodity based systems such as clusters and MPPs versus custom system architecture strategies. This paper presents a perspective on the importance and value of the custom architecture approach in meeting future US requirements in supercomputing. The contents of this paper reflect the ideas of the participants of the working group chartered to explore custom enabled system architectures for high end computing. As in any such consensus presentation, while this paper captures the key ideas and tradeoffs, it does not exactly match the viewpoint of any single contributor, and there remains much room for constructive disagreement and refinement of the essential conclusions
Energy challenges for ICT
The energy consumption from the expanding use of information and communications technology (ICT) is unsustainable with present drivers, and it will impact heavily on the future climate change. However, ICT devices have the potential to contribute signi - cantly to the reduction of CO2 emission and enhance resource e ciency in other sectors, e.g., transportation (through intelligent transportation and advanced driver assistance systems and self-driving vehicles), heating (through smart building control), and manu- facturing (through digital automation based on smart autonomous sensors). To address the energy sustainability of ICT and capture the full potential of ICT in resource e - ciency, a multidisciplinary ICT-energy community needs to be brought together cover- ing devices, microarchitectures, ultra large-scale integration (ULSI), high-performance computing (HPC), energy harvesting, energy storage, system design, embedded sys- tems, e cient electronics, static analysis, and computation. In this chapter, we introduce challenges and opportunities in this emerging eld and a common framework to strive towards energy-sustainable ICT
Recommended from our members
AN ARCHITECTURE EVALUATION AND IMPLEMENTATION OF A SOFT GPGPU FOR FPGAs
Embedded and mobile systems must be able to execute a variety of different types of code, often with minimal available hardware. Many embedded systems now come with a simple processor and an FPGA, but not more energy-hungry components, such as a GPGPU. In this dissertation we present FlexGrip, a soft architecture which allows for the execution of GPGPU code on an FPGA without the need to recompile the design. The architecture is optimized for FPGA implementation to effectively support the conditional and thread-based execution characteristics of GPGPU execution without FPGA design recompilation. This architecture supports direct CUDA compilation to a binary which is executable on the FPGA-based GPGPU. Our architecture is customizable, thus providing the FPGA designer with a selection of GPGPU cores which display performance versus area tradeoffs.
This dissertation describes the FlexGrip architecture in detail and showcases the benefits by evaluating the design for a collection of five standard CUDA benchmarks which are compiled using standard GPGPU compilation tools. Speedups of 23x, on average, versus a MicroBlaze microprocessor are achieved for designs which take advantage of the conditional execution capabilities offered by FlexGrip. We also show FlexGrip can achieve an 80% average reduction of dynamic energy versus the MicroBlaze microprocessor.
The dissertation furthers discussion by exploring application-customized versions of the soft GPGPU, thus exploiting the overlay architecture. We expand the architecture to multiple processors per GPGPU and optimizing away features which are not needed by certain classes of applications. These optimizations, which include the effective use of block RAMs and DSP blocks, are critical to the performance of FlexGrip. By implementing a 2 GPGPU design, we show speedups of 44x on average versus a MicroBlaze microprocessor. Application-customized versions of the soft GPGPU can be used to further reduce dynamic energy consumption by an average of 14%.
To complete this thesis, we augmented a GPGPU cycle accurate simulator to emulate FlexGrip and evaluate different levels of cache design spaces. We show performance increases for select benchmarks, however, we also show that 64% and 45% of benchmarks exhibited performance decreases when L1D cache was enabled for the 1 SMP and 2 SMP configurations, and only one benchmark showed performance improvement when the L2 cache was enabled
- …