438 research outputs found

    Harnessing Simulation Acceleration to Solve the Digital Design Verification Challenge.

    Full text link
    Today, design verification is by far the most resource and time-consuming activity of any new digital integrated circuit development. Within this area, the vast majority of the verification effort in industry relies on simulation platforms, which are implemented either in hardware or software. A "simulator" includes a model of each component of a design and has the capability of simulating its behavior under any input scenario provided by an engineer. Thus, simulators are deployed to evaluate the behavior of a design under as many input scenarios as possible and to identify and debug all incorrect functionality. Two features are critical in simulators for the validation effort to be effective: performance and checking/debugging capabilities. A wide range of simulator platforms are available today: on one end of the spectrum there are software-based simulators, providing a very rich software infrastructure for checking and debugging the design's functionality, but executing only at 1-10 simulation cycles per second (while actual chips operate at GHz speeds). At the other end of the spectrum, there are hardware-based platforms, such as accelerators, emulators and even prototype silicon chips, providing higher performances by 4 to 9 orders of magnitude, at the cost of very limited or non-existent checking/debugging capabilities. As a result, today, simulation-based validation is crippled: one can either have satisfactory performance on hardware-accelerated platforms or critical infrastructures for checking/debugging on software simulators, but not both. This dissertation brings together these two ends of the spectrum by presenting solutions that offer high-performance simulation with effective checking and debugging capabilities. Specifically, it addresses the performance challenge of software simulators by leveraging inexpensive off-the-shelf graphics processors as massively parallel execution substrates, and then exposing the parallelism inherent in the design model to that architecture. For hardware-based platforms, the dissertation provides solutions that offer enhanced checking and debugging capabilities by abstracting the relevant data to be logged during simulation so to minimize the cost of collection, transfer and processing. Altogether, the contribution of this dissertation has the potential to solve the challenge of digital design verification by enabling effective high-performance simulation-based validation.PHDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/99781/1/dchatt_1.pd

    GPU Based Acceleration of SystemC and Transaction Level Models for MPSOC Simulation

    Get PDF
    With increasing number of cores on a chip, the complexity of modeling hardware using virtual prototype is increasing rapidly. Typical SOCs today have multipro-cessors connected through a bus or NOC architecture which can be modeled using SystemC framework. SystemC is a popular language used for early design exploration and performance analysis of complex embedded systems. TLM2.0, an extension of SystemC, is increasingly used in MPSOC designs for simulating loosely and approxi-mately timed transaction level models. The OSCI reference kernel which implements SystemC library runs on a single thread, slowing up the simulation speed to a large extent. Previous works have used the computational power of multi-core systems and GPUs which can run multiple threads simultaneously, speeding up the simu-lation. Multi-core simulations are not as effective in cases where thread runtime is low, because synchronization overhead becomes comparable to thread runtime. Modern GPUs can run thousands of threads at a time and have shown good results for synthesizable designs in recent efforts. However, development in these works are limited to synthesizable subsets of SystemC models, not supporting timed events for process communication. In this research work, a methodology is proposed for accelerating timed event based SystemC TLM2.0 model to GPU based kernel, which maps SystemC processes to CUDA threads in GPU, providing high data level par-allelism. This work aims to provide a scalable solution for simulating large MPSOC designs, facilitating early design exploration and performance analysis. Experiments have shown that the proposed technique provides a speed-up of the order of 100x for typical MPSOC designs

    Doctor of Philosophy in Computer Science

    Get PDF
    dissertationRay tracing is becoming more widely adopted in offline rendering systems due to its natural support for high quality lighting. Since quality is also a concern in most real time systems, we believe ray tracing would be a welcome change in the real time world, but is avoided due to insufficient performance. Since power consumption is one of the primary factors limiting the increase of processor performance, it must be addressed as a foremost concern in any future ray tracing system designs. This will require cooperating advances in both algorithms and architecture. In this dissertation I study ray tracing system designs from a data movement perspective, targeting the various memory resources that are the primary consumer of power on a modern processor. The result is high performance, low energy ray tracing architectures

    VThreads: A novel VLIW chip multiprocessor with hardware-assisted PThreads

    Get PDF
    We discuss VThreads, a novel VLIW CMP with hardware-assisted shared-memory Thread support. VThreads supports Instruction Level Parallelism via static multiple-issue and Thread Level Parallelism via hardware-assisted POSIX Threads along with extensive customization. It allows the instantiation of tightlycoupled streaming accelerators and supports up to 7-address Multiple-Input, Multiple-Output instruction extensions. VThreads is designed in technology-independent Register-Transfer-Level VHDL and prototyped on 40 nm and 28 nm Field-Programmable gate arrays. It was evaluated against a PThreads-based multiprocessor based on the Sparc-V8 ISA. On a 65 nm ASIC implementation VThreads achieves up to x7.2 performance increase on synthetic benchmarks, x5 on a parallel Mandelbrot implementation, 66% better on a threaded JPEG implementation, 79% better on an edge-detection benchmark and ~13% improvement on DES compared to the Leon3MP CMP. In the range of 2 to 8 cores VThreads demonstrates a post-route (statistical) power reduction between 65% to 57% at an area increase of 1.2%-10% for 1-8 cores, compared to a similarly-configured Leon3MP CMP. This combination of micro-architectural features, scalability, extensibility, hardware support for low-latency PThreads, power efficiency and area make the processor an attractive proposition for low-power, deeply-embedded applications requiring minimum OS support

    Digital signal processor fundamentals and system design

    Get PDF
    Digital Signal Processors (DSPs) have been used in accelerator systems for more than fifteen years and have largely contributed to the evolution towards digital technology of many accelerator systems, such as machine protection, diagnostics and control of beams, power supply and motors. This paper aims at familiarising the reader with DSP fundamentals, namely DSP characteristics and processing development. Several DSP examples are given, in particular on Texas Instruments DSPs, as they are used in the DSP laboratory companion of the lectures this paper is based upon. The typical system design flow is described; common difficulties, problems and choices faced by DSP developers are outlined; and hints are given on the best solution

    Pre-validation of SoC via hardware and software co-simulation

    Get PDF
    Abstract. System-on-chips (SoCs) are complex entities consisting of multiple hardware and software components. This complexity presents challenges in their design, verification, and validation. Traditional verification processes often test hardware models in isolation until late in the development cycle. As a result, cooperation between hardware and software development is also limited, slowing down bug detection and fixing. This thesis aims to develop, implement, and evaluate a co-simulation-based pre-validation methodology to address these challenges. The approach allows for the early integration of hardware and software, serving as a natural intermediate step between traditional hardware model verification and full system validation. The co-simulation employs a QEMU CPU emulator linked to a register-transfer level (RTL) hardware model. This setup enables the execution of software components, such as device drivers, on the target instruction set architecture (ISA) alongside cycle-accurate RTL hardware models. The thesis focuses on two primary applications of co-simulation. Firstly, it allows software unit tests to be run in conjunction with hardware models, facilitating early communication between device drivers, low-level software, and hardware components. Secondly, it offers an environment for using software in functional hardware verification. A significant advantage of this approach is the early detection of integration errors. Software unit tests can be executed at the IP block level with actual hardware models, a task previously only possible with costly system-level prototypes. This enables earlier collaboration between software and hardware development teams and smoothens the transition to traditional system-level validation techniques.Järjestelmäpiirin esivalidointi laitteiston ja ohjelmiston yhteissimulaatiolla. Tiivistelmä. Järjestelmäpiirit (SoC) ovat monimutkaisia kokonaisuuksia, jotka koostuvat useista laitteisto- ja ohjelmistokomponenteista. Tämä monimutkaisuus asettaa haasteita niiden suunnittelulle, varmennukselle ja validoinnille. Perinteiset varmennusprosessit testaavat usein laitteistomalleja eristyksissä kehityssyklin loppuvaiheeseen saakka. Tämän myötä myös yhteistyö laitteisto- ja ohjelmistokehityksen välillä on vähäistä, mikä hidastaa virheiden tunnistamista ja korjausta. Tämän diplomityön tavoitteena on kehittää, toteuttaa ja arvioida laitteisto-ohjelmisto-yhteissimulointiin perustuva esivalidointimenetelmä näiden haasteiden ratkaisemiseksi. Menetelmä mahdollistaa laitteiston ja ohjelmiston varhaisen integroinnin, toimien luonnollisena välietappina perinteisen laitteistomallin varmennuksen ja koko järjestelmän validoinnin välillä. Yhteissimulointi käyttää QEMU suoritinemulaattoria, joka on yhdistetty rekisterinsiirtotason (RTL) laitteistomalliin. Tämä mahdollistaa ohjelmistokomponenttien, kuten laiteajureiden, suorittamisen kohdejärjestelmän käskysarja-arkkitehtuurilla (ISA) yhdessä kellosyklitarkkojen RTL laitteistomallien kanssa. Työ keskittyy kahteen yhteissimulaation pääsovellukseen. Ensinnäkin se mahdollistaa ohjelmiston yksikkötestien suorittamisen laitteistomallien kanssa, varmistaen kommunikaation laiteajurien, matalan tason ohjelmiston ja laitteistokomponenttien välillä. Toiseksi se tarjoaa ympäristön ohjelmiston käyttämiseen toiminnallisessa laitteiston varmennuksessa. Merkittävä etu tästä lähestymistavasta on integraatiovirheiden varhainen havaitseminen. Ohjelmiston yksikkötestejä voidaan suorittaa jo IP-lohkon tasolla oikeilla laitteistomalleilla, mikä on aiemmin ollut mahdollista vain kalliilla järjestelmätason prototyypeillä. Tämä mahdollistaa aikaisemman ohjelmisto- ja laitteistokehitystiimien välisen yhteistyön ja helpottaa siirtymistä perinteisiin järjestelmätason validointimenetelmiin

    Achieving Functional Correctness in Large Interconnect Systems.

    Full text link
    In today's semi-conductor industry, large chip-multiprocessors and systems-on-chip are being developed, integrating a large number of components on a single chip. The sheer size of these designs and the intricacy of the communication patterns they exhibit have propelled the development of network-on-chip (NoC) interconnects as the basis for the communication infrastructure in these systems. Faced with the interconnect's growing size and complexity, several challenges hinder its effective validation. During the interconnect's development, the functional verification process relies heavily on the use of emulation and post-silicon validation platforms. However, detecting and debugging errors on these platforms is a difficult endeavour due to the limited observability, and in turn the low verification capabilities, they provide. Additionally, with the inherent incompleteness of design-time validation efforts, the potential of design bugs escaping into the interconnect of a released product is also a concern, as these bugs can threaten the viability of the entire system. This dissertation provides solutions to enable the development of functionally correct interconnect designs. We first address the challenges encountered during design-time verification efforts, by providing two complementary mechanisms that allow emulation and post-silicon verification frameworks to capture a detailed overview of the functional behaviour of the interconnect. Our first solution re-purposes the contents of in-flight traffic to log debug data from the interconnect's execution. This approach enables the validation of the interconnect using synthetic traffic workloads, while attaining over 80% observability of the routes followed by packets and capturing valuable debugging information. We also develop an alternative mechanism that boosts observability by taking periodic snapshots of execution, thus extending the verification capabilities to run both synthetic traffic and real-application workloads. The collected snapshots enhance detection and debugging support, and they provide observability of over 50% of packets and reconstructs at least half of each of their routes. Moreover, we also develop error detection and recovery solutions to address the threat of design bugs escaping into the interconnect's runtime operation. Our runtime techniques can overcome communication errors without needing to store replicate copies of all in-flight packets, thereby achieving correctness at minimal area costsPhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/116741/1/rawanak_1.pd

    Heterogeneous processor pipeline for a product cipher application

    Full text link
    Processing data received as a stream is a task commonly performed by modern embedded devices, in a wide range of applications such as multimedia (encoding/decoding/ playing media), networking (switching and routing), digital security, scientific data processing, etc. Such processing normally tends to be calculation intensive and therefore requiring significant processing power. Therefore, hardware acceleration methods to increase the performance of such applications constitute an important area of study. In this paper, we present an evaluation of one such method to process streaming data, namely multi-processor pipeline architecture. The hardware is based on a Multiple-Processor System on Chip (MPSoC), using a data encryption algorithm as a case study. The algorithm is partitioned on a coarse grained level and mapped on to an MPSoC with five processor cores in a pipeline, using specifically configured Xtensa LX3 cores. The system is then selectively optimized by strengthening and pruning the resources of each processor core. The optimized system is evaluated and compared against an optimal single-processor System on Chip (SoC) for the same application. The multiple-processor pipeline system for data encryption algorithms used was observed to provide significant speed ups, up to 4.45 times that of the single-processor system, which is close to the ideal speed up from a five-stage pipeline

    Automated Debugging Methodology for FPGA-based Systems

    Get PDF
    Electronic devices make up a vital part of our lives. These are seen from mobiles, laptops, computers, home automation, etc. to name a few. The modern designs constitute billions of transistors. However, with this evolution, ensuring that the devices fulfill the designer’s expectation under variable conditions has also become a great challenge. This requires a lot of design time and effort. Whenever an error is encountered, the process is re-started. Hence, it is desired to minimize the number of spins required to achieve an error-free product, as each spin results in loss of time and effort. Software-based simulation systems present the main technique to ensure the verification of the design before fabrication. However, few design errors (bugs) are likely to escape the simulation process. Such bugs subsequently appear during the post-silicon phase. Finding such bugs is time-consuming due to inherent invisibility of the hardware. Instead of software simulation of the design in the pre-silicon phase, post-silicon techniques permit the designers to verify the functionality through the physical implementations of the design. The main benefit of the methodology is that the implemented design in the post-silicon phase runs many order-of-magnitude faster than its counterpart in pre-silicon. This allows the designers to validate their design more exhaustively. This thesis presents five main contributions to enable a fast and automated debugging solution for reconfigurable hardware. During the research work, we used an obstacle avoidance system for robotic vehicles as a use case to illustrate how to apply the proposed debugging solution in practical environments. The first contribution presents a debugging system capable of providing a lossless trace of debugging data which permits a cycle-accurate replay. This methodology ensures capturing permanent as well as intermittent errors in the implemented design. The contribution also describes a solution to enhance hardware observability. It is proposed to utilize processor-configurable concentration networks, employ debug data compression to transmit the data more efficiently, and partially reconfiguring the debugging system at run-time to save the time required for design re-compilation as well as preserve the timing closure. The second contribution presents a solution for communication-centric designs. Furthermore, solutions for designs with multi-clock domains are also discussed. The third contribution presents a priority-based signal selection methodology to identify the signals which can be more helpful during the debugging process. A connectivity generation tool is also presented which can map the identified signals to the debugging system. The fourth contribution presents an automated error detection solution which can help in capturing the permanent as well as intermittent errors without continuous monitoring of debugging data. The proposed solution works for designs even in the absence of golden reference. The fifth contribution proposes to use artificial intelligence for post-silicon debugging. We presented a novel idea of using a recurrent neural network for debugging when a golden reference is present for training the network. Furthermore, the idea was also extended to designs where golden reference is not present
    • …
    corecore