2,698 research outputs found

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify whether a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so as to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there is a wide variety of interactions among them that must be verified to catch buggy behaviors. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally overlooked rare patterns of multiple faults. In this dissertation, we discuss these ideas and their trade-offs, and present future research directions.
    PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd
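
    To make the decomposition idea concrete, the sketch below shows one simple way a memory-ordering check can be decomposed: rather than searching one large happens-before graph for cycles (a cycle indicates an illegal ordering), the graph is split into connected components and each small piece is checked on its own. This uses plain connected components, not necessarily the incremental scheme of the dissertation, and all names are illustrative.

```python
# Decompose a memory-ordering (happens-before) graph into connected
# components, then search each small component for cycles independently.
from collections import defaultdict

def connected_components(edges):
    """Group nodes into weakly connected components (edge direction ignored)."""
    adj, nodes = defaultdict(set), set()
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
        nodes.update((u, v))
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in component:
                component.add(n)
                stack.extend(adj[n] - component)
        seen |= component
        components.append(component)
    return components

def has_cycle(edges, nodes):
    """Iterative DFS cycle check restricted to the given node set."""
    adj = defaultdict(list)
    for u, v in edges:
        if u in nodes and v in nodes:
            adj[u].append(v)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in nodes}
    for root in nodes:
        if color[root] != WHITE:
            continue
        stack = [(root, iter(adj[root]))]
        color[root] = GRAY
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if color[nxt] == GRAY:
                    return True        # back edge: ordering violation
                if color[nxt] == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(adj[nxt])))
                    break
            else:
                color[node] = BLACK
                stack.pop()
    return False

# Usage: only small per-component graphs are ever searched.
edges = [("ld_A", "st_B"), ("st_B", "ld_A"), ("st_C", "ld_D")]
for component in connected_components(edges):
    print(sorted(component), "violation" if has_cycle(edges, component) else "ok")
```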

    A hierarchical distributed control model for coordinating intelligent systems

    A hierarchical distributed control (HDC) model for coordinating cooperative problem-solving among intelligent systems is described. The model was implemented using SOCIAL, an innovative object-oriented tool for integrating heterogeneous, distributed software systems. SOCIAL embeds applications in 'wrapper' objects called Agents, which supply predefined capabilities for distributed communication, control, data specification, and translation. The HDC model is realized in SOCIAL as a 'Manager' Agent that coordinates interactions among application Agents. The HDC Manager: indexes the capabilities of application Agents; routes request messages to suitable server Agents; and stores results in a commonly accessible 'Bulletin-Board'. This centralized control model is illustrated in a fault diagnosis application for launch operations support of the Space Shuttle fleet at NASA Kennedy Space Center.
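
    A minimal sketch of this manager-style coordination, assuming plain Python classes in place of SOCIAL's Agent wrappers (the class and method names below are illustrative, not SOCIAL's actual API): the manager indexes capabilities, routes a request to a suitable server agent, and posts the result on a shared bulletin board.

```python
# Toy manager agent: it indexes agent capabilities, routes a request to a
# suitable server agent, and publishes the result on a shared bulletin board.
class Agent:
    def __init__(self, name, capabilities, handler):
        self.name = name
        self.capabilities = set(capabilities)
        self.handler = handler              # callable that services a request

class ManagerAgent:
    def __init__(self):
        self.registry = {}                  # capability -> list of agents
        self.bulletin_board = {}            # request id -> result

    def register(self, agent):
        for capability in agent.capabilities:
            self.registry.setdefault(capability, []).append(agent)

    def route(self, request_id, capability, payload):
        servers = self.registry.get(capability, [])
        if not servers:
            raise LookupError(f"no agent offers capability {capability!r}")
        result = servers[0].handler(payload)      # first suitable server agent
        self.bulletin_board[request_id] = result  # commonly accessible results
        return result

# Usage: a diagnosis agent registers its capability; the manager routes a request.
manager = ManagerAgent()
manager.register(Agent("diagnoser", ["fault_diagnosis"],
                       lambda data: f"suspect subsystem for symptom: {data['symptom']}"))
print(manager.route("req-1", "fault_diagnosis", {"symptom": "valve stuck"}))
print(manager.bulletin_board)
```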

    Evolving Fault Tolerant Robotic Controllers

    Fault tolerant control and evolutionary algorithms are two distinct research areas. With the development of artificial intelligence, however, evolutionary algorithms have demonstrated performance competitive with traditional optimisation approaches. As a result, combining fault tolerant control with evolutionary algorithms has become a new research topic, in which controllers are evolved to realise different fault tolerant control schemes. Most controller-evolution work, however, is based on the optimisation of controller parameters, so structure-optimisation-based evolutionary approaches have not been investigated to the same extent as parameter-optimisation approaches. This thesis therefore investigates whether a structure-optimisation-based evolutionary approach, in addition to parameter optimisation alone, can be applied to a robot sensor fault tolerant control scheme based on a phototaxis task, and explores whether controller structure optimisation offers greater benefit than controller parameter optimisation alone. The thesis presents a new multi-objective optimisation algorithm at the structure-optimisation level, Multi-objective Cartesian Genetic Programming, built from Cartesian Genetic Programming and Non-dominated Sorting Genetic Algorithm 2 for NeuroEvolution-based robotic controller optimisation. To address two main problems encountered during the algorithm's development, the thesis investigates the benefit of genetic redundancy and of preserving neutral genetic drift, in order to resolve the random-neighbour-pick problem during the crowding fill used for survival selection, and shows how the hypervolume indicator is employed to measure multi-objective optimisation performance and assess the convergence of Multi-objective Cartesian Genetic Programming. Furthermore, the thesis compares Multi-objective Cartesian Genetic Programming with Non-dominated Sorting Genetic Algorithm 2 in terms of evolution performance, and investigates how Multi-objective Cartesian Genetic Programming performs on a more difficult fault tolerant control scenario beyond the basic one, further demonstrating the benefit of a structure-optimisation-based evolutionary approach for robotic fault tolerant control.
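
    As an illustration of the hypervolume indicator mentioned above, here is a minimal two-objective sketch (minimisation, measured against a reference point); it is the generic textbook computation, not the thesis's implementation, and the fronts in the usage example are invented.

```python
# 2-objective hypervolume: the area dominated by a Pareto front and bounded
# by a reference point. A larger hypervolume indicates a better front.
def pareto_front(points):
    """Keep only non-dominated points (smaller is better in both objectives)."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

def hypervolume_2d(points, reference):
    """Sum the rectangular slabs between consecutive front points and the reference."""
    front = sorted(pareto_front(points))            # ascending in objective 1
    volume, prev_f2 = 0.0, reference[1]
    for f1, f2 in front:
        volume += (reference[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return volume

# Usage: front_a dominates more of the objective space than front_b.
front_a = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
front_b = [(2.0, 5.0), (3.0, 3.0)]
reference = (6.0, 6.0)
print(hypervolume_2d(front_a, reference), ">", hypervolume_2d(front_b, reference))
```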

    Co-evolutionary automatic programming for software development

    Since the 1970s, the goal of generating programs in an automatic way (i.e., Automatic Programming) has been sought. A user would just define what he expects from the program (i.e., the requirements), and it should be automatically generated by the computer without the help of any programmer. Unfortunately, this task is much harder than expected. Although transformation methods are usually employed to address this problem, they cannot be employed if the gap between the specification and the actual implementation is too wide. In this paper we introduce a novel conceptual framework for evolving programs from their specification. We use genetic programming to evolve the programs, and at the same time we exploit the specification to co-evolve sets of unit tests. Programs are rewarded by how many tests they do not fail, whereas the unit tests are rewarded by how many programs they make fail. We present and analyse seven different problems on which this novel technique is successfully applied.
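
    A minimal sketch of the competitive scoring described above: candidate programs earn fitness for each unit test they pass, while unit tests earn fitness for each program they expose as faulty. The toy task (absolute value) and all names are illustrative, not the paper's benchmark problems.

```python
# Co-evolutionary fitness: programs vs. unit tests as competing populations.
def evaluate(programs, tests):
    """Return (program fitness, test fitness) under the competitive scoring."""
    prog_fitness = [0] * len(programs)
    test_fitness = [0] * len(tests)
    for i, program in enumerate(programs):
        for j, (arg, expected) in enumerate(tests):
            try:
                passed = program(arg) == expected
            except Exception:
                passed = False                     # a crash counts as a failure
            if passed:
                prog_fitness[i] += 1               # programs: tests not failed
            else:
                test_fitness[j] += 1               # tests: programs made to fail
    return prog_fitness, test_fitness

# Usage: two evolved candidates for abs(x), and unit tests derived from the spec.
candidates = [lambda x: x if x > 0 else -x,        # correct implementation
              lambda x: x]                         # fails on negative inputs
unit_tests = [(3, 3), (-4, 4), (0, 0)]
print(evaluate(candidates, unit_tests))            # ([3, 2], [0, 1, 0])
```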

    Building safer robots: Safety driven control

    In recent years there has been a concerted effort to address many of the safety issues associated with physical human-robot interaction (pHRI). However, a number of challenges remain. For personal robots, and those intended to operate in unstructured environments, the problem of safety is compounded. In this paper we argue that traditional system design techniques fail to capture the complexities associated with dynamic environments. We present an overview of our safety-driven control system and its implementation methodology. The methodology builds on traditional functional hazard analysis, with the addition of processes aimed at improving the safety of autonomous personal robots. This will be achieved with the use of a safety system developed during the hazard analysis stage. This safety system, called the safety protection system, will initially be used to verify that safety constraints, identified during hazard analysis, have been implemented appropriately. Subsequently it will serve as a high-level safety enforcer, by governing the actions of the robot and preventing the control layer from performing unsafe operations. To demonstrate the effectiveness of the design, a series of experiments have been conducted using a MobileRobots PeopleBot. Finally, results are presented demonstrating how faults injected into a controller can be consistently identified and handled by the safety protection system. © The Author(s) 2012
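
    A minimal sketch in the spirit of the safety protection system described above: the control layer proposes an action, and a separate check vetoes it when it violates constraints identified during hazard analysis. The constraints, thresholds, and sensor fields below are invented for illustration, not the paper's actual rule set.

```python
# High-level safety enforcer: pass through safe commands, replace unsafe
# ones with a safe stop and report which constraints were violated.
MAX_SPEED_NEAR_HUMAN = 0.3     # m/s, assumed constraint from hazard analysis
MIN_OBSTACLE_DISTANCE = 0.5    # m, assumed constraint from hazard analysis

def safety_protection_system(command, sensors):
    """Return (possibly overridden command, list of violated constraints)."""
    violations = []
    if sensors["human_distance"] < 1.0 and command["speed"] > MAX_SPEED_NEAR_HUMAN:
        violations.append("speed limit near human exceeded")
    if sensors["obstacle_distance"] < MIN_OBSTACLE_DISTANCE and command["speed"] > 0:
        violations.append("moving while an obstacle is too close")
    if violations:
        return {"speed": 0.0, "heading": command["heading"]}, violations
    return command, violations

# Usage: a faulty controller requests full speed while a person is nearby.
command = {"speed": 1.2, "heading": 90}
sensors = {"human_distance": 0.8, "obstacle_distance": 2.0}
safe_command, why = safety_protection_system(command, sensors)
print(safe_command, why)
```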

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as issues of data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several follow-up works after its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both the research and industrial communities. We also cover a set of systems that have been introduced to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.
    Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other authors
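
    As a reminder of the programming model being surveyed, here is a minimal word-count sketch of map and reduce run sequentially in plain Python; a real MapReduce framework distributes these calls across a cluster and handles scheduling and fault tolerance on the programmer's behalf. The function names are generic, not any framework's API.

```python
# Word count expressed as map, shuffle, and reduce phases.
from collections import defaultdict

def map_phase(document):
    """Emit an intermediate (key, value) pair for every word in the document."""
    return [(word, 1) for word in document.split()]

def shuffle(intermediate_pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in intermediate_pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values observed for one key into the final result."""
    return key, sum(values)

# Usage: count words across a small input of documents.
documents = ["big data big clusters", "data processing on big clusters"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
print(dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items()))
```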

    Big Data Testing Techniques: Taxonomy, Challenges and Future Trends

    Big Data is reforming many industrial domains by providing decision support through analyzing large data volumes. Big Data testing aims to ensure that Big Data systems run smoothly and error-free while maintaining the performance and quality of data. However, because of the diversity and complexity of data, testing Big Data is challenging. Though numerous research efforts deal with Big Data testing, a comprehensive review addressing the testing techniques and challenges of Big Data is not yet available. Therefore, we have systematically reviewed the evidence on Big Data testing techniques published in the period 2010-2021. This paper discusses the testing of data processing by highlighting the techniques used in every processing phase. Furthermore, we discuss the challenges and future directions. Our findings show that diverse functional, non-functional and combined (functional and non-functional) testing techniques have been used to solve specific problems related to Big Data. At the same time, most of the testing challenges have been faced during the MapReduce validation phase. In addition, combinatorial testing is one of the most frequently applied techniques, often in combination with other techniques (i.e., random testing, mutation testing, input space partitioning and equivalence testing), to find various functional faults through Big Data testing.
    Comment: 32 pages
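
    To illustrate the combinatorial testing technique highlighted in the findings, here is a minimal greedy sketch of pairwise (2-way) test selection; the configuration parameters are hypothetical options of a Big Data job, and the greedy strategy is a generic one, not drawn from any surveyed paper.

```python
# Greedy pairwise test selection: pick configurations until every pair of
# parameter values is covered, usually far fewer than the full factorial.
from itertools import combinations, product

def all_pairs(parameters):
    """Every 2-way interaction ((param1, value1), (param2, value2)) to cover."""
    pairs = set()
    for (p1, vals1), (p2, vals2) in combinations(parameters.items(), 2):
        for v1, v2 in product(vals1, vals2):
            pairs.add(((p1, v1), (p2, v2)))
    return pairs

def pairs_of(test, parameters):
    """The 2-way interactions exercised by one concrete test configuration."""
    names = list(parameters)
    return {((p1, test[p1]), (p2, test[p2])) for p1, p2 in combinations(names, 2)}

def greedy_pairwise(parameters):
    """Repeatedly pick the configuration covering the most uncovered pairs."""
    remaining = all_pairs(parameters)
    candidates = [dict(zip(parameters, values))
                  for values in product(*parameters.values())]
    suite = []
    while remaining:
        best = max(candidates, key=lambda t: len(pairs_of(t, parameters) & remaining))
        suite.append(best)
        remaining -= pairs_of(best, parameters)
    return suite

# Usage: hypothetical Big Data job options; compare against the full factorial.
params = {"file_format": ["csv", "json", "parquet"],
          "compression": ["none", "gzip"],
          "cluster_size": [1, 4]}
suite = greedy_pairwise(params)
print(len(suite), "pairwise tests instead of", 3 * 2 * 2)
```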