9 research outputs found

    Improving the design process of VLSI circuits by means of a hardware debugging system: UNSHADES-1 framework

    Get PDF
    Due to the increase in size and complexity of VLSI integrated circuits, new design tools are becoming needed. Telecommunications and Electronic Industry demand designs that integrate intensive digital signal processing blocks and complex control tasks. Rapid Prototyping techniques introduce a new stage into the design flow that overcome the drawbacks of simulation stage and shorten design times. Advanced FPGAs can host the design for its emulation and can run inserted into the final system. The benefits of their use go beyond the simple rapid prototyping approach, and are able to provide additional information and other useful tasks that will be presented in this paper

    Efficient hardware debugging using parameterized FPGA reconfiguration

    Get PDF
    Functional errors and bugs inadvertently introduced at the RTL stage of the design process are responsible for the largest fraction of silicon IC re-spins. Thus, comprehensive func- tional verification is the key to reduce development costs and to deliver a product in time. The increasing demands for verification led to an increase in FPGA-based tools that perform emulation. These tools can run at much higher operating frequencies and achieve higher coverage than simulation. However, an important pitfall of the FPGA tools is that they suffer from limited internal signal observability, as only a small and preselected set of signals is guided towards (embedded) trace buffers and observed. This paper proposes a dynamically reconfigurable network of multiplexers that significantly enhance the visibility of internal signals. It allows the designer to dynamically change the small set of internal signals to be observed, virtually enlarging the set of observed signals significantly. These multiplexers occupy minimal space, as they are implemented by the FPGA’s routing infrastructure

    Microprocessor and FPGA interfaces for in-system co-debugging in field programmable hybrid systems

    Get PDF
    Modern trends in technology require efficient control and processing platforms based on connected software-hardware subsystems. Due to their complexity and size, algorithms implemented on these platforms are difficult to test and verify. When these types of solution are being designed, it is necessary to provide information of the internal values of registers and memories of both the software and hardware during the execution of the complete system. The final architecture of the targeted design and its debugging capabilities strongly depends on how the hybrid system is connected and clocked. This article discusses different architectural strategies that have been adopted for a hybrid hardware-software platform, built ready for debugging, and that uses components that can be easily found with a few special features. All the solutions have been implemented and evaluated using the UNSHADES-2 framework

    Validation and verification of the interconnection of hardware intellectual property blocks for FPGA-based packet processing systems

    Get PDF
    As networks become more versatile, the computational requirement for supporting additional functionality increases. The increasing demands of these networks can be met by Field Programmable Gate Arrays (FPGA), which are an increasingly popular technology for implementing packet processing systems. The fine-grained parallelism and density of these devices can be exploited to meet the computational requirements and implement complex systems on a single chip. However, the increasing complexity of FPGA-based systems makes them susceptible to errors and difficult to test and debug. To tackle the complexity of modern designs, system-level languages have been developed to provide abstractions suited to the domain of the target system. Unfortunately, the lack of formality in these languages can give rise to errors that are not caught until late in the design cycle. This thesis presents three techniques for verifying and validating FPGA-based packet processing systems described in a system-level description language. First, a type system is applied to the system description language to detect errors before implementation. Second, system-level transaction monitoring is used to observe high-level events on-chip following implementation. Third, the high-level information embodied in the system description language is exploited to allow the system to be automatically instrumented for on-chip monitoring. This thesis demonstrates that these techniques catch errors which are undetected by traditional verification and validation tools. The locations of faults are specified and errors are caught earlier in the design flow, which saves time by reducing synthesis iterations

    Cycle-accurate modeling of multicore processors on FPGAs

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2013.Cataloged from PDF version of thesis.Includes bibliographical references (pages 169-176).We present a novel modeling methodology which enables the generation of a high-performance, cycle-accurate simulator from a cycle-level specification of the target design. We describe Arete, a full-system multicore processor simulator, developed using our modeling methodology. We provide details on Arete's resource-efficient and high-performance implementation on multiple FPGA platforms, and the architectural experiments performed using it. We present clear evidence that the use of simplified models in architectural studies can lead to wrong conclusions. Through two experiments performed using both cycle-accurate and simplified models, we show that on one hand there are substantial quantitative and qualitative differences in results, and on the other, the results match quite well.by Asif Imtiaz Khan.Ph.D

    Automated Debugging Methodology for FPGA-based Systems

    Get PDF
    Electronic devices make up a vital part of our lives. These are seen from mobiles, laptops, computers, home automation, etc. to name a few. The modern designs constitute billions of transistors. However, with this evolution, ensuring that the devices fulfill the designer’s expectation under variable conditions has also become a great challenge. This requires a lot of design time and effort. Whenever an error is encountered, the process is re-started. Hence, it is desired to minimize the number of spins required to achieve an error-free product, as each spin results in loss of time and effort. Software-based simulation systems present the main technique to ensure the verification of the design before fabrication. However, few design errors (bugs) are likely to escape the simulation process. Such bugs subsequently appear during the post-silicon phase. Finding such bugs is time-consuming due to inherent invisibility of the hardware. Instead of software simulation of the design in the pre-silicon phase, post-silicon techniques permit the designers to verify the functionality through the physical implementations of the design. The main benefit of the methodology is that the implemented design in the post-silicon phase runs many order-of-magnitude faster than its counterpart in pre-silicon. This allows the designers to validate their design more exhaustively. This thesis presents five main contributions to enable a fast and automated debugging solution for reconfigurable hardware. During the research work, we used an obstacle avoidance system for robotic vehicles as a use case to illustrate how to apply the proposed debugging solution in practical environments. The first contribution presents a debugging system capable of providing a lossless trace of debugging data which permits a cycle-accurate replay. This methodology ensures capturing permanent as well as intermittent errors in the implemented design. The contribution also describes a solution to enhance hardware observability. It is proposed to utilize processor-configurable concentration networks, employ debug data compression to transmit the data more efficiently, and partially reconfiguring the debugging system at run-time to save the time required for design re-compilation as well as preserve the timing closure. The second contribution presents a solution for communication-centric designs. Furthermore, solutions for designs with multi-clock domains are also discussed. The third contribution presents a priority-based signal selection methodology to identify the signals which can be more helpful during the debugging process. A connectivity generation tool is also presented which can map the identified signals to the debugging system. The fourth contribution presents an automated error detection solution which can help in capturing the permanent as well as intermittent errors without continuous monitoring of debugging data. The proposed solution works for designs even in the absence of golden reference. The fifth contribution proposes to use artificial intelligence for post-silicon debugging. We presented a novel idea of using a recurrent neural network for debugging when a golden reference is present for training the network. Furthermore, the idea was also extended to designs where golden reference is not present

    Correct synthesis and integration of compiler-generated function units

    Get PDF
    PhD ThesisComputer architectures can use custom logic in addition to general pur- pose processors to improve performance for a variety of applications. The use of custom logic allows greater parallelism for some algorithms. While conventional CPUs typically operate on words, ne-grained custom logic can improve e ciency for many bit level operations. The commodi ca- tion of eld programmable devices, particularly FPGAs, has improved the viability of using custom logic in an architecture. This thesis introduces an approach to reasoning about the correctness of compilers that generate custom logic that can be synthesized to provide hardware acceleration for a given application. Compiler intermediate representations (IRs) and transformations that are relevant to genera- tion of custom logic are presented. Architectures may vary in the way that custom logic is incorporated, and suitable abstractions are used in order that the results apply to compilation for a variety of the design parameters that are introduced by the use of custom logic

    Hardware Acceleration of Electronic Design Automation Algorithms

    Get PDF
    With the advances in very large scale integration (VLSI) technology, hardware is going parallel. Software, which was traditionally designed to execute on single core microprocessors, now faces the tough challenge of taking advantage of this parallelism, made available by the scaling of hardware. The work presented in this dissertation studies the acceleration of electronic design automation (EDA) software on several hardware platforms such as custom integrated circuits (ICs), field programmable gate arrays (FPGAs) and graphics processors. This dissertation concentrates on a subset of EDA algorithms which are heavily used in the VLSI design flow, and also have varying degrees of inherent parallelism in them. In particular, Boolean satisfiability, Monte Carlo based statistical static timing analysis, circuit simulation, fault simulation and fault table generation are explored. The architectural and performance tradeoffs of implementing the above applications on these alternative platforms (in comparison to their implementation on a single core microprocessor) are studied. In addition, this dissertation also presents an automated approach to accelerate uniprocessor code using a graphics processing unit (GPU). The key idea is to partition the software application into kernels in an automated fashion, such that multiple instances of these kernels, when executed in parallel on the GPU, can maximally benefit from the GPU?s hardware resources. The work presented in this dissertation demonstrates that several EDA algorithms can be successfully rearchitected to maximally harness their performance on alternative platforms such as custom designed ICs, FPGAs and graphic processors, and obtain speedups upto 800X. The approaches in this dissertation collectively aim to contribute towards enabling the computer aided design (CAD) community to accelerate EDA algorithms on arbitrary hardware platforms

    Implementation of an AMIDAR-based Java Processor

    Get PDF
    This thesis presents a Java processor based on the Adaptive Microinstruction Driven Architecture (AMIDAR). This processor is intended as a research platform for investigating adaptive processor architectures. Combined with a configurable accelerator, it is able to detect and speed up hot spots of arbitrary applications dynamically. In contrast to classical RISC processors, an AMIDAR-based processor consists of four main types of components: a token machine, functional units (FUs), a token distribution network and an FU interconnect structure. The token machine is a specialized functional unit and controls the other FUs by means of tokens. These tokens are delivered to the FUs over the token distribution network. The tokens inform the FUs about what to do with input data and where to send the results. Data is exchanged among the FUs over the FU interconnect structure. Based on the virtual machine architecture defined by the Java bytecode, a total of six FUs have been developed for the Java processor, namely a frame stack, a heap manager, a thread scheduler, a debugger, an integer ALU and a floating-point unit. Using these FUs, the processor can already execute the SPEC JVM98 benchmark suite properly. This indicates that it can be employed to run a broad variety of applications rather than embedded software only. Besides bytecode execution, several enhanced features have also been implemented in the processor to improve its performance and usability. First, the processor includes an object cache using a novel cache index generation scheme that provides a better average hit rate than the classical XOR-based scheme. Second, a hardware garbage collector has been integrated into the heap manager, which greatly reduces the overhead caused by the garbage collection process. Third, thread scheduling has been realized in hardware as well, which allows it to be performed concurrently with the running application. Furthermore, a complete debugging framework has been developed for the processor, which provides powerful debugging functionalities at both software and hardware levels
    corecore