    Failure detection and repair of threads in CTAS

    Thesis (M. Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 73). Reliable, error-free software is hard to come by, and this is especially true for newer, larger, or more complex programs. CTAS, an air traffic control tool, falls into this category, making it a good candidate for research on error compensation. Specifically, this thesis addresses the issue of thread crashes in one portion of CTAS. We reimplement the thread structure in question around a simpler problem, and develop a failure detector and an accompanying repair mechanism to monitor it. These add-on components provide the application with thread consistency by swiftly and transparently recovering from crashes, thereby yielding a more stable, self-sufficient, and generally more reliable operating environment. By Farid Jahanmir. M. Eng. and S.B.
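
    The abstract gives no implementation details, but the detect-and-repair idea it describes can be illustrated with a minimal Java sketch (all class and method names below are hypothetical and not taken from the thesis): a monitor polls a worker thread's heartbeat and transparently restarts the worker when it dies or stalls.

    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical sketch: a heartbeat-based failure detector paired with a restart-based repair.
    public class ThreadRepairDemo {

        /** Worker that publishes a heartbeat; a crash is simulated after a few iterations. */
        static class Worker implements Runnable {
            final AtomicLong heartbeat = new AtomicLong(System.currentTimeMillis());

            @Override
            public void run() {
                int i = 0;
                while (!Thread.currentThread().isInterrupted()) {
                    heartbeat.set(System.currentTimeMillis());   // signal liveness
                    if (++i == 5) throw new RuntimeException("simulated crash"); // surfaces as an uncaught exception
                    try { Thread.sleep(100); } catch (InterruptedException e) { return; }
                }
            }
        }

        public static void main(String[] args) throws InterruptedException {
            Worker worker = new Worker();
            Thread workerThread = new Thread(worker, "worker");
            workerThread.start();

            // Failure detector: if the thread died or its heartbeat went stale, repair by restarting it.
            for (int round = 0; round < 10; round++) {
                Thread.sleep(300);
                boolean stale = System.currentTimeMillis() - worker.heartbeat.get() > 500;
                if (!workerThread.isAlive() || stale) {
                    System.out.println("failure detected, restarting worker");
                    worker = new Worker();                       // fresh state for the repaired thread
                    workerThread = new Thread(worker, "worker");
                    workerThread.start();
                }
            }
            workerThread.interrupt();
        }
    }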

    How do programs become more concurrent? A story of program transformations

    For several decades, programmers have relied on Moore's Law to improve the performance of their software applications. From now on, programmers need to program the multi-cores if they want to deliver efficient code. In the multi-core era, a major maintenance task will be to make sequential programs more concurrent. What are the most common transformations to retrofit concurrency into sequential programs? We studied the source code of 5 open-source Java projects. We analyzed qualitatively and quantitatively the change patterns that developers have used in order to retrofit concurrency. We found that these transformations belong to four categories: transformations that improve the latency, the throughput, the scalability, or the correctness of the applications. In addition, we report on our experience of parallelizing one of our own programs. Our findings can educate software developers on how to parallelize sequential programs, and can provide hints for tool vendors about what transformations are worth automating.
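
    As a concrete (hypothetical, not drawn from the paper's data) illustration of the kind of change pattern such a study examines, the Java sketch below shows a common throughput/scalability transformation: replacing a coarse-grained synchronized map with java.util.concurrent data structures.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.LongAdder;

    // Hypothetical before/after example of a concurrency-retrofitting transformation.
    public class RetrofitExample {

        // Before: sequential-era code made thread-safe with a single coarse lock.
        static class CoarseLockedCounter {
            private final Map<String, Long> counts = new HashMap<>();

            public synchronized void increment(String key) {
                counts.merge(key, 1L, Long::sum);
            }

            public synchronized long get(String key) {
                return counts.getOrDefault(key, 0L);
            }
        }

        // After: lock-free concurrent data structures; contended updates scale across cores.
        static class ConcurrentCounter {
            private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

            public void increment(String key) {
                counts.computeIfAbsent(key, k -> new LongAdder()).increment();
            }

            public long get(String key) {
                LongAdder adder = counts.get(key);
                return adder == null ? 0L : adder.sum();
            }
        }

        public static void main(String[] args) throws InterruptedException {
            ConcurrentCounter counter = new ConcurrentCounter();
            Thread[] threads = new Thread[4];
            for (int t = 0; t < threads.length; t++) {
                threads[t] = new Thread(() -> {
                    for (int i = 0; i < 10_000; i++) counter.increment("requests");
                });
                threads[t].start();
            }
            for (Thread t : threads) t.join();
            System.out.println(counter.get("requests")); // 40000
        }
    }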

    Byzantine Fault Tolerance for Distributed Systems

    The growing reliance on online services imposes a high dependability requirement on the computer systems that provide these services. Byzantine fault tolerance (BFT) is a promising technology to solidify such systems for the much needed high dependability. BFT employs redundant copies of the servers and ensures that a replicated system continues providing correct services despite attacks on a small portion of the system. In this dissertation research, I developed novel algorithms and mechanisms to control various types of application nondeterminism and to ensure the long-term reliability of BFT systems via a migration-based proactive recovery scheme. I also investigated a new approach to significantly improve the overall system throughput by enabling concurrent processing using Software Transactional Memory (STM). Controlling application nondeterminism is essential to achieve strong replica consistency because the BFT technology is based on state-machine replication, which requires deterministic operation of each replica. Proactive recovery is necessary to ensure that the fundamental assumption of the BFT technology is not violated over the long term, i.e., that fewer than one-third of the replicas are faulty. Without proactive recovery, more and more replicas would be compromised under continuous attacks, which would render BFT ineffective. STM-based concurrent processing maximizes the system throughput by utilizing the power of multi-core CPUs while preserving strong replica consistency.
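
    The dissertation's own algorithms are not reproduced here, but the requirement it builds on, deterministic execution under state-machine replication, can be sketched in a few lines of Java (all names are hypothetical): the primary fixes any nondeterministic value, such as a timestamp, and ships it with the ordered request so every replica applies the identical state transition.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical sketch of one way to control nondeterminism in state-machine replication:
    // the primary chooses the nondeterministic value once and all replicas reuse it.
    public class DeterministicReplicationSketch {

        /** An ordered request together with the primary-chosen nondeterministic value. */
        record OrderedRequest(long sequence, String operation, long timestamp) {}

        /** Each replica applies requests in sequence order using only values carried by the request. */
        static class Replica {
            private final StringBuilder log = new StringBuilder();

            void execute(OrderedRequest req) {
                // No call to System.currentTimeMillis() here: that would diverge across replicas.
                log.append(req.sequence()).append(':')
                   .append(req.operation()).append('@')
                   .append(req.timestamp()).append('\n');
            }

            String state() { return log.toString(); }
        }

        public static void main(String[] args) {
            List<Replica> replicas = new ArrayList<>();
            for (int i = 0; i < 4; i++) replicas.add(new Replica());   // 3f+1 replicas for f = 1

            // The "primary" fixes the timestamp once per request.
            OrderedRequest r1 = new OrderedRequest(1, "deposit(100)", System.currentTimeMillis());
            OrderedRequest r2 = new OrderedRequest(2, "withdraw(30)", System.currentTimeMillis());

            for (Replica r : replicas) { r.execute(r1); r.execute(r2); }

            // All replica states are identical, which is what strong replica consistency requires.
            System.out.println(replicas.get(0).state().equals(replicas.get(3).state())); // true
        }
    }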

    Unified Behavior Framework for Reactive Robot Control in Real-Time Systems

    Endeavors in mobile robotics focus on developing autonomous vehicles that operate in dynamic and uncertain environments. By reducing the need for human-in-the-loop control, unmanned vehicles are utilized to achieve tasks considered dull or dangerous by humans. Because unexpected latency can adversely affect the quality of an autonomous system's operations, which in turn can affect lives and property in the real world, their ability to detect and handle external events is paramount to providing safe and dependable operation. Behavior-based systems form the basis of autonomous control for many robots. This thesis presents the unified behavior framework, a novel approach that incorporates the critical ideas and concepts of existing reactive controllers in an effort to simplify development without locking the system developer into any single behavior system. The modular design of the framework is based on modern software engineering principles and specifies only a functional interface for components, leaving the implementation details to the developers. In addition to its use of industry-standard techniques in the design of reactive controllers, the unified behavior framework guarantees the responsiveness of routines that are critical to the vehicle's safe operation by allowing individual behaviors to be scheduled by a real-time process controller. The experiments in this thesis demonstrate the ability of the framework to: 1) interchange behavioral components during execution to generate various global behavior attributes; 2) apply genetic programming techniques to automate the discovery of effective structures for a domain that are up to 122 percent better than those crafted by an expert; and 3) leverage real-time scheduling technologies to guarantee the responsiveness of time-critical routines regardless of the system's computational load.
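
    The framework itself is not shown here; as a loose Java sketch of the kind of functional interface and arbitration the abstract describes (all names are hypothetical, and the real-time scheduling aspect is omitted), the example below defines a minimal behavior contract and a priority arbiter, so behavior implementations remain interchangeable at run time.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Optional;

    // Hypothetical sketch of a behavior interface plus a simple priority arbiter.
    public class BehaviorArbiterSketch {

        /** Minimal functional contract: each behavior votes on an action with a priority. */
        interface Behavior {
            Optional<Vote> evaluate(double obstacleDistanceMeters);
        }

        record Vote(String action, int priority) {}

        static final Behavior AVOID_OBSTACLE = dist ->
                dist < 1.0 ? Optional.of(new Vote("turn-away", 10)) : Optional.empty();

        static final Behavior GO_TO_GOAL = dist ->
                Optional.of(new Vote("drive-to-goal", 1));

        /** Arbiter: the highest-priority active behavior wins; components can be swapped freely. */
        static String arbitrate(List<Behavior> behaviors, double obstacleDistanceMeters) {
            return behaviors.stream()
                    .map(b -> b.evaluate(obstacleDistanceMeters))
                    .flatMap(Optional::stream)
                    .max(Comparator.comparingInt(Vote::priority))
                    .map(Vote::action)
                    .orElse("idle");
        }

        public static void main(String[] args) {
            List<Behavior> stack = List.of(AVOID_OBSTACLE, GO_TO_GOAL);
            System.out.println(arbitrate(stack, 5.0));  // drive-to-goal
            System.out.println(arbitrate(stack, 0.5));  // turn-away
        }
    }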

    A Survey of Operating Systems Infrastructure for Embedded Systems

    Since their early applications in the 1960s, embedded systems have come down in price while their processing power and functionality have risen dramatically. In addition, embedded systems are becoming increasingly complex. High-end devices, such as mobile phones, PDAs, entertainment devices, and set-top boxes, feature millions of lines of code with varying degrees of assurance of correctness. Nowadays, more and more embedded systems are implemented in a distributed way, and a wide range of high-performance distributed embedded systems have been designed and deployed. As many aspects of embedded system design become increasingly dependent on the effective interaction of distributed processors, it is clear that comparable effort needs to be focused on software infrastructure, such as operating systems, and on how it can provide the functionality required to fulfill these requirements. This technical report presents some of the operating-system approaches that have been used to fulfill these needs. CAPES/MEC - Brasil, Project BEX3342/08-

    Dependable Computing on Inexact Hardware through Anomaly Detection.

    Reliability of transistors is on the decline as transistors continue to shrink in size, and aggressive voltage scaling is making the problem even worse. Scaled-down transistors are more susceptible to transient faults as well as permanent in-field hardware failures. In order to continue to reap the benefits of technology scaling, it has become imperative to tackle the challenges arising from the decreasing reliability of devices for the mainstream commodity market. Along with the worsening reliability, achieving energy efficiency and performance improvement through scaling is providing increasingly diminishing marginal returns. More than at any other time in history, the semiconductor industry faces the crossroads of unreliability and the need to improve energy efficiency. These challenges of technology scaling can be tackled by dividing the target applications into two categories: traditional applications, which have relatively strict correctness requirements on outputs, and an emerging class of soft applications, from domains such as multimedia, machine learning, and computer vision, that are inherently tolerant of a certain degree of inaccuracy. Traditional applications can be protected against hardware failures by low-cost detection and protection methods, while soft applications can trade off output quality to achieve better performance or energy efficiency. For traditional applications, I propose an efficient, software-only application analysis and transformation solution to detect data and control flow transient faults. The intelligence of the data flow solution lies in the use of dynamic application information such as control flow, memory, and value profiling. The control flow protection technique achieves its efficiency by simplifying signature calculations in each basic block and by performing checking at a coarse-grain level. For soft applications, I develop a quality control technique that employs continuous, lightweight checkers to ensure that the approximation is controlled and the application output is acceptable. Overall, I show that the use of low-cost checkers to produce dependable results on commodity systems constructed from inexact hardware components is efficient and practical. PhD. Computer Science and Engineering. University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113341/1/dskhudia_1.pd
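
    As a rough, hypothetical illustration of the quality-control idea described for soft applications (none of the names below come from the dissertation), the Java sketch computes an approximate result, runs a lightweight acceptability check, and falls back to the exact computation when the check fails.

    import java.util.Random;

    // Hypothetical sketch of a lightweight output-quality checker for a "soft" computation:
    // an approximate mean from a small sample is accepted only if a cheap sanity check passes.
    public class QualityCheckerSketch {

        /** Approximate mean: average a random 10% sample of the data. */
        static double approxMean(double[] data, Random rng) {
            double sum = 0;
            int n = Math.max(1, data.length / 10);
            for (int i = 0; i < n; i++) sum += data[rng.nextInt(data.length)];
            return sum / n;
        }

        /** Exact mean over all elements. */
        static double exactMean(double[] data) {
            double sum = 0;
            for (double d : data) sum += d;
            return sum / data.length;
        }

        /** Lightweight checker: the result must lie within the min/max of a sparse probe of the data. */
        static boolean acceptable(double result, double[] data) {
            double lo = Double.POSITIVE_INFINITY, hi = Double.NEGATIVE_INFINITY;
            for (int i = 0; i < data.length; i += Math.max(1, data.length / 20)) {
                lo = Math.min(lo, data[i]);
                hi = Math.max(hi, data[i]);
            }
            return result >= lo && result <= hi;   // a mean outside the probed range signals an anomaly
        }

        public static void main(String[] args) {
            Random rng = new Random(42);
            double[] data = new double[100_000];
            for (int i = 0; i < data.length; i++) data[i] = rng.nextGaussian() * 5 + 50;

            double result = approxMean(data, rng);
            if (!acceptable(result, data)) {
                result = exactMean(data);          // fall back to the precise path
            }
            System.out.printf("mean ~ %.2f%n", result);
        }
    }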