105 research outputs found

    Protocol-directed trace signal selection for post-silicon validation

    Get PDF
    Due to the increasing complexity of modern digital designs built on network-on-chip (NoC) communication, post-silicon validation has become an arduous task that consumes much of a product's development time. Finding the root cause of bugs during post-silicon validation is difficult because not all signals on the chip are observable. To increase observability, an effective silicon debug technique is to use an on-chip trace buffer to monitor and capture the responses of selected signals during post-silicon operation. However, because of area limitations on debug structures and routing concerns, the signals selected for tracing are a very small subset of all available signals. Traditionally, trace signals were chosen manually by system designers, who judged which signals might be needed for debug once the design reached post-silicon. Because modern digital designs have become very complex, with many concurrent processes, this method is no longer reliable. Recent work has concentrated on automating the selection of low-level signals through gate-level analysis, but none of these approaches can interpret the traced signals as high-level, meaningful debugging information. In this work, we present an automated protocol-directed trace selection whose guiding force is the set of system-level protocols. We use a probabilistic formulation to select messages for tracing and then further analyze these solutions. This method produces traces that allow a debugger to observe when behavior has deviated from the correct path of execution and to localize the incorrect behavior for further analysis. Most importantly, unlike previous methods based on gate-level analysis, this method can be applied during the chip design phase, when most debug features are also designed. In addition, it drastically reduces the time needed to select signals, as it automates a currently manual process.
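
    In spirit, the probabilistic message selection described above could resemble a greedy, knapsack-style heuristic: rank protocol messages by how likely observing them is to expose a deviation, per trace-buffer bit consumed. The sketch below is a minimal illustration under assumed names, widths, and a made-up probability model; it is not the paper's actual formulation.

        # Minimal sketch of greedy, probability-ranked trace-message selection.
        # All names, widths, and probabilities are illustrative assumptions.
        def select_trace_messages(widths, deviation_prob, buffer_width):
            """widths: message -> traced signal width in bits
            deviation_prob: message -> estimated probability that observing
                the message distinguishes a correct run from a deviating one
            buffer_width: trace-buffer bits available per cycle"""
            selected, used = [], 0
            # Rank by detection probability per traced bit (knapsack heuristic).
            for m in sorted(widths, key=lambda m: deviation_prob[m] / widths[m],
                            reverse=True):
                if used + widths[m] <= buffer_width:
                    selected.append(m)
                    used += widths[m]
            return selected

        # Toy cache-coherence protocol with four message classes.
        widths = {"GetS": 6, "GetM": 6, "Data": 12, "Ack": 3}
        probs = {"GetS": 0.30, "GetM": 0.25, "Data": 0.55, "Ack": 0.10}
        print(select_trace_messages(widths, probs, buffer_width=16))
        # -> ['GetS', 'GetM', 'Ack']: 'Data' is informative but too wide to fit.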

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Full text link
    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things devices, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation; thus, they are often found in functional features that are rarely activated. Complete functional verification, which could eliminate design bugs, is extremely time-consuming and thus impractical for modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes: weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon-area or performance overheads, so they are infeasible for most cost-sensitive SoC designs. To tackle these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analyses or hardware structures involved. To this end, we present several decomposition techniques, each specific to a major SoC component. We first focus on microprocessor cores, presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify whether a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework that detects buggy memory-ordering behaviors by decomposing the memory-ordering graph into small components based on incremental differences (see the sketch after this abstract). We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with small distributed programmable logic instead of a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so as to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, the many possible interactions among them must be verified to catch buggy behaviors. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that decomposing complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in routing reconfiguration latency, and 5 times on average in memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% of bug-masking incidents go undetected, 39% of memory bugs are non-patchable, and rare patterns of multiple faults are occasionally overlooked.
    In this dissertation, we discuss these ideas and their trade-offs, and present future research directions.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd
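
    To make the memory-ordering decomposition concrete: a cycle in a happens-before graph indicates a consistency violation, and any cycle lies entirely within one weakly connected component, so components can be checked independently (and re-checked as tests add edges). The sketch below is a generic illustration of that decomposition, not the dissertation's actual framework or its incremental-difference algorithm.

        from collections import defaultdict

        def weakly_connected_components(nodes, edges):
            # Group nodes ignoring edge direction; no cycle spans components.
            adj = defaultdict(set)
            for u, v in edges:
                adj[u].add(v)
                adj[v].add(u)
            seen, comps = set(), []
            for n in nodes:
                if n in seen:
                    continue
                stack, comp = [n], set()
                while stack:
                    x = stack.pop()
                    if x not in comp:
                        comp.add(x)
                        stack.extend(adj[x] - comp)
                seen |= comp
                comps.append(comp)
            return comps

        def has_cycle(comp, edges):
            # Iterative three-color DFS restricted to one component.
            succ = defaultdict(list)
            for u, v in edges:
                if u in comp:
                    succ[u].append(v)
            WHITE, GRAY, BLACK = 0, 1, 2
            color = {n: WHITE for n in comp}
            for root in comp:
                if color[root] != WHITE:
                    continue
                color[root] = GRAY
                stack = [(root, iter(succ[root]))]
                while stack:
                    node, it = stack[-1]
                    for m in it:
                        if color[m] == GRAY:
                            return True  # back edge: an ordering cycle
                        if color[m] == WHITE:
                            color[m] = GRAY
                            stack.append((m, iter(succ[m])))
                            break
                    else:
                        color[node] = BLACK
                        stack.pop()
            return False

        # Toy happens-before graph: the second component holds a violation.
        nodes = ["a", "b", "c", "d", "e"]
        edges = [("a", "b"), ("c", "d"), ("d", "e"), ("e", "c")]
        for comp in weakly_connected_components(nodes, edges):
            print(sorted(comp), "violation" if has_cycle(comp, edges) else "ok")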

    Detection, diagnosis and modeling of ESD-induced soft failures - a gate-level and mixed-signal approach

    Get PDF
    Electronic systems are an indispensable part of people's lives today. However, their reliability can be threatened by external stimuli such as electrostatic discharges (ESDs). An ESD can either physically damage an electronic system or cause it to malfunction without permanent damage. Manufacturers therefore invest substantial design work and qualification testing to improve robustness against the negative effects of ESDs, and this trial-and-error approach to implementing solutions has cost companies dearly in labor and time. Despite the ever-increasing effort devoted to solving ESD-related problems, field returns still occur, and a significant portion can be attributed to soft failures induced by system-level ESD. Yet while ESD-induced permanent failures are well studied and protection mechanisms have proven effective, studies of ESD-induced soft failures have remained at the physical and transistor levels. In this thesis, we study ESD-induced soft failures, first through case studies that inject ESDs into physical devices and observe the application-level symptoms of the failures, and then through simulation-based ESD injection on a well-known instruction set architecture. For the first time, we correlate physical-level ESD events with high-level system behavior. We implemented a mixed-signal-simulation-based fault-injection environment and device models that allow ESDs to be injected into target systems. By injecting different types of ESDs into the target system, we identified, for the first time, gate-level bit-flip patterns arising from a SPICE-level high-voltage event. Our experimental results show that register-value corruption can be single-bit or widespread, and that the resulting bit flips can affect the system in multiple ways. We also demonstrate low-cost protection measures for some of the resulting failures.
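
    As a rough illustration of the kind of injection experiment described above (the register-file model, workload, and flip counts below are assumptions, not the thesis's mixed-signal environment), one can flip bits in an architectural register file and compare the outcome against a golden run:

        import random

        WIDTH = 32
        MASK = (1 << WIDTH) - 1

        def inject_bit_flips(regfile, n_flips, rng):
            # Flip n_flips randomly chosen bits, mimicking single-bit or
            # widespread ESD-induced register corruption.
            corrupted = list(regfile)
            for _ in range(n_flips):
                r = rng.randrange(len(corrupted))
                b = rng.randrange(WIDTH)
                corrupted[r] ^= 1 << b
            return corrupted

        def workload(regfile):
            # Stand-in computation: a checksum over the register values.
            acc = 0
            for v in regfile:
                acc = (acc + v) & MASK
            return acc

        rng = random.Random(0)
        golden = [rng.randrange(1 << WIDTH) for _ in range(16)]
        for n in (1, 4, 64):  # single-bit vs. widespread corruption
            bad = inject_bit_flips(golden, n, rng)
            verdict = "silent" if workload(bad) == workload(golden) else "corrupted"
            print(n, "flip(s):", verdict)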

    Ancient and historical systems

    Get PDF

    Faculty Publications & Presentations, 2006-2007

    Get PDF

    Understanding Quantum Technologies 2022

    Full text link
    Understanding Quantum Technologies 2022 is a creative-commons ebook that provides a unique 360-degree overview of quantum technologies, from science and technology to geopolitical and societal issues. It covers quantum physics history, quantum physics 101, gate-based quantum computing, quantum computing engineering (including quantum error correction and quantum computing energetics), quantum computing hardware (all qubit types, including the quantum annealing and quantum simulation paradigms; history, science, research, implementation, and vendors), quantum enabling technologies (cryogenics, control electronics, photonics, component fabs, raw materials), quantum computing algorithms, software development tools and use cases, unconventional computing (potential alternatives to quantum and classical computing), quantum telecommunications and cryptography, quantum sensing, quantum technologies around the world, the societal impact of quantum technologies, and even quantum fake sciences. The main audience is computer science engineers, developers, and IT specialists, as well as quantum scientists and students who want to acquire a global view of how quantum technologies work, particularly quantum computing. This version is an extensive update to the 2021 edition published in October 2021.
    Comment: 1132 pages, 920 figures, Letter format

    Approaches to Building a Quantum Computer Based on Semiconductors

    Get PDF
    Over the course of this Ph.D., the quest to build a quantum computer has accelerated, driven by the ever-improving fidelities of conventional qubits and the development of new technologies that promise topologically protected qubits with lifetimes that may exceed those of comparable conventional qubits. As such, there has been an explosion of interest in the design of an architecture for a quantum computer. Such a design would have to include high-quality qubits at the bottom of the stack, be extensible, and allow the layout of many qubits with scalable methods for readout and control of the entire device. Furthermore, the whole experimental infrastructure must handle the requirements for parallel operation of many qubits in the system. Hence the crux of this thesis: to design an architecture for a semiconductor-based quantum computer that encompasses all the elements required to build a large-scale quantum machine, and to investigate the individual elements at each layer of this stack, from qubit to readout to control.