394 research outputs found

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Full text link
    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally we overlook rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

    Transactional Support in MapReduce for Speculative Parallelism

    Get PDF

    The parallel event loop model and runtime: a parallel programming model and runtime system for safe event-based parallel programming

    Get PDF
    Recent trends in programming models for server-side development have shown an increasing popularity of event-based single- threaded programming models based on the combination of dynamic languages such as JavaScript and event-based runtime systems for asynchronous I/O management such as Node.JS. Reasons for the success of such models are the simplicity of the single-threaded event-based programming model as well as the growing popularity of the Cloud as a deployment platform for Web applications. Unfortunately, the popularity of single-threaded models comes at the price of performance and scalability, as single-threaded event-based models present limitations when parallel processing is needed, and traditional approaches to concurrency such as threads and locks don't play well with event-based systems. This dissertation proposes a programming model and a runtime system to overcome such limitations by enabling single-threaded event-based applications with support for speculative parallel execution. The model, called Parallel Event Loop, has the goal of bringing parallel execution to the domain of single-threaded event-based programming without relaxing the main characteristics of the single-threaded model, and therefore providing developers with the impression of a safe, single-threaded, runtime. Rather than supporting only pure single-threaded programming, however, the parallel event loop can also be used to derive safe, high-level, parallel programming models characterized by a strong compatibility with single-threaded runtimes. We describe three distinct implementations of speculative runtimes enabling the parallel execution of event-based applications. The first implementation we describe is a pessimistic runtime system based on locks to implement speculative parallelization. The second and the third implementations are based on two distinct optimistic runtimes using software transactional memory. Each of the implementations supports the parallelization of applications written using an asynchronous single-threaded programming style, and each of them enables applications to benefit from parallel execution
    corecore