120 research outputs found
Recommended from our members
Deterministic, Mutable, and Distributed Record-Replay for Operating Systems and Database Systems
Application record and replay is the ability to record application execution and replay it at a later time. Record-replay has many use cases including diagnosing and debugging applications by capturing and reproducing hard to find bugs, providing transparent application fault tolerance by maintaining a live replica of a running program, and offline instrumentation that would be too costly to run in a production environment. Different record-replay systems may offer different levels of replay faithfulness, the strongest level being deterministic replay which guarantees an identical reenactment of the original execution. Such a guarantee requires capturing all sources of nondeterminism during the recording phase. In the general case, such record-replay systems can dramatically hinder application performance, rendering them unpractical in certain application domains. Furthermore, various use cases are incompatible with strictly replaying the original execution. For example, in a primary-secondary database scenario, the secondary database would be unable to serve additional traffic while being replicated. No record-replay system fit all use cases.
This dissertation shows how to make deterministic record-replay fast and efficient, how broadening replay semantics can enable powerful new use cases, and how choosing the right level of abstraction for record-replay can support distributed and heterogeneous database replication with little effort.
We explore four record-replay systems with different semantics enabling different use cases. We first present Scribe, an OS-level deterministic record-replay mechanism that support multi-process applications on multi-core systems. One of the main challenge is to record the interaction of threads running on different CPU cores in an efficient manner. Scribe introduces two new lightweight OS mechanisms, rendezvous point and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Scribe allows the capture and replication of hard to find bugs to facilitate debugging and serves as a solid foundation for our two following systems.
We then present RacePro, a process race detection system to improve software correctness. Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. Detecting process races is difficult due to the elusive nature of these bugs, and the heterogeneity of frameworks involved in such bugs. RacePro is the first tool to detect such process races. RacePro records application executions in deployed systems, allowing offline race detection by analyzing the previously recorded log. RacePro then replays the application execution and forces the manifestation of detected races to check their effect on the application. Upon failure, RacePro reports potentially harmful races to developers.
Third, we present Dora, a mutable record-replay system which allows a recorded execution of an application to be replayed with a modified version of the application. Mutable record-replay provides a number of benefits for reproducing, diagnosing, and fixing software bugs. Given a recording and a modified application, finding a mutable replay is challenging, and undecidable in the general case. Despite the difficulty of the problem, we show a very simple but effective algorithm to search for suitable replays.
Lastly, we present Synapse, a heterogeneous database replication system designed for Web applications. Web applications are increasingly built using a service-oriented architecture that integrates services powered by a variety of databases. Often, the same data, needed by multiple services, must be replicated across different databases and kept in sync. Unfortunately, these databases use vendor specific data replication engines which are not compatible with each other. To solve this challenge, Synapse operates at the application level to access a unified data representation through object relational mappers. Additionally, Synapse leverages application semantics to replicate data with good consistency semantics using mechanisms similar to Scribe
Agent-Based Modeling: The Right Mathematics for the Social Sciences?
This study provides a basic introduction to agent-based modeling (ABM) as a powerful blend of classical and constructive mathematics, with a primary focus on its applicability for social science research.� The typical goals of ABM social science researchers are discussed along with the culture-dish nature of their computer experiments. The applicability of ABM for science more generally is also considered, with special attention to physics. Finally, two distinct types of ABM applications are summarized in order to illustrate concretely the duality of ABM: Real-world systems can not only be simulated with verisimilitude using ABM; they can also be efficiently and robustly designed and constructed on the basis of ABM principles. �
AI and OR in management of operations: history and trends
The last decade has seen a considerable growth in the use of Artificial Intelligence (AI) for operations management with the aim of finding solutions to problems that are increasing in complexity and scale. This paper begins by setting the context for the survey through a historical perspective of OR and AI. An extensive survey of applications of AI techniques for operations management, covering a total of over 1200 papers published from 1995 to 2004 is then presented. The survey utilizes Elsevier's ScienceDirect database as a source. Hence, the survey may not cover all the relevant journals but includes a sufficiently wide range of publications to make it representative of the research in the field. The papers are categorized into four areas of operations management: (a) design, (b) scheduling, (c) process planning and control and (d) quality, maintenance and fault diagnosis. Each of the four areas is categorized in terms of the AI techniques used: genetic algorithms, case-based reasoning, knowledge-based systems, fuzzy logic and hybrid techniques. The trends over the last decade are identified, discussed with respect to expected trends and directions for future work suggested
Concurrent use of two programming tools for heterogeneous supercomputers
In this thesis, a demostration of the heterogeneous use of two programming paradigms for heterogeneous computing called Cluster-M and HAsC is presented. Both paradigms can efficiently support heterogeneous networks by preserving a level of abstraction which does not include any architecture mapping details. Furthermore, they are both machine independent and hence are scalable. Unlike, almost all existing heterogeneous orchestration tools which are MIMD based, HAsC is based on the fundamental concepts of SIMD associative computing. HAsC models a heterogeneous network as a coarse grained associative computer and is designed to optimize the execution of problems with large ratios of computations to instructions. Ease of programming and execution speed, not the utilization of idle resources are the primary goals of HAsC On the other hand, Cluster-M is a generic technique that can be applied to both coarse grained as well as fine grained networks. Cluster-M provides an environment for porting various tasks onto the machines in a heterogeneous suite such that resources utilization is maximized and the overall execution time is minimized. An illustration of how these two paradigms can be used together to provide an efficient medium for heterogeneous programming is included. Finally, their scalability is discussed
Overlapping of Communication and Computation and Early Binding: Fundamental Mechanisms for Improving Parallel Performance on Clusters of Workstations
This study considers software techniques for improving performance on clusters of workstations and approaches for designing message-passing middleware that facilitate scalable, parallel processing. Early binding and overlapping of communication and computation are identified as fundamental approaches for improving parallel performance and scalability on clusters. Currently, cluster computers using the Message-Passing Interface for interprocess communication are the predominant choice for building high-performance computing facilities, which makes the findings of this work relevant to a wide audience from the areas of high-performance computing and parallel processing. The performance-enhancing techniques studied in this work are presently underutilized in practice because of the lack of adequate support by existing message-passing libraries and are also rarely considered by parallel algorithm designers. Furthermore, commonly accepted methods for performance analysis and evaluation of parallel systems omit these techniques and focus primarily on more obvious communication characteristics such as latency and bandwidth. This study provides a theoretical framework for describing early binding and overlapping of communication and computation in models for parallel programming. This framework defines four new performance metrics that facilitate new approaches for performance analysis of parallel systems and algorithms. This dissertation provides experimental data that validate the correctness and accuracy of the performance analysis based on the new framework. The theoretical results of this performance analysis can be used by designers of parallel system and application software for assessing the quality of their implementations and for predicting the effective performance benefits of early binding and overlapping. This work presents MPI/Pro, a new MPI implementation that is specifically optimized for clusters of workstations interconnected with high-speed networks. This MPI implementation emphasizes features such as persistent communication, asynchronous processing, low processor overhead, and independent message progress. These features are identified as critical for delivering maximum performance to applications. The experimental section of this dissertation demonstrates the capability of MPI/Pro to facilitate software techniques that result in significant application performance improvements. Specific demonstrations with Virtual Interface Architecture and TCP/IP over Ethernet are offered
NASA high performance computing and communications program
The National Aeronautics and Space Administration's HPCC program is part of a new Presidential initiative aimed at producing a 1000-fold increase in supercomputing speed and a 100-fold improvement in available communications capability by 1997. As more advanced technologies are developed under the HPCC program, they will be used to solve NASA's 'Grand Challenge' problems, which include improving the design and simulation of advanced aerospace vehicles, allowing people at remote locations to communicate more effectively and share information, increasing scientist's abilities to model the Earth's climate and forecast global environmental trends, and improving the development of advanced spacecraft. NASA's HPCC program is organized into three projects which are unique to the agency's mission: the Computational Aerosciences (CAS) project, the Earth and Space Sciences (ESS) project, and the Remote Exploration and Experimentation (REE) project. An additional project, the Basic Research and Human Resources (BRHR) project exists to promote long term research in computer science and engineering and to increase the pool of trained personnel in a variety of scientific disciplines. This document presents an overview of the objectives and organization of these projects as well as summaries of individual research and development programs within each project
Agent-based modeling: the right mathematics for the social sciences?
This study provides a basic introduction to agent-based modeling (ABM) as a powerful blend of classical and constructive mathematics, with a primary focus on its applicability for social science research. The typical goals of ABM social science researchers are discussed along with the culture-dish nature of their computer experiments. The applicability of ABM for science more generally is also considered, with special attention to physics. Finally, two distinct types of ABM applications are summarized in order to illustrate concretely the duality of ABM: Real-world systems can not only be simulated with verisimilitude using ABM; they can also be efficiently and robustly designed and constructed on the basis of ABM principles
NASA space station automation: AI-based technology review
Research and Development projects in automation for the Space Station are discussed. Artificial Intelligence (AI) based automation technologies are planned to enhance crew safety through reduced need for EVA, increase crew productivity through the reduction of routine operations, increase space station autonomy, and augment space station capability through the use of teleoperation and robotics. AI technology will also be developed for the servicing of satellites at the Space Station, system monitoring and diagnosis, space manufacturing, and the assembly of large space structures
- …