Operating System Support for Redundant Multithreading
Failing hardware is a fact, and trends in microprocessor design indicate that the fraction of hardware suffering from permanent and transient faults will continue to increase in future chip generations. Researchers have proposed various solutions to this issue, each with different downsides: Specialized hardware components make hardware more expensive in production and consume additional energy at runtime. Fault-tolerant algorithms and libraries enforce specific programming models on the developer. Compiler-based fault tolerance requires the source code for all applications to be available for recompilation. In this thesis I present ASTEROID, an operating system architecture that integrates applications with different reliability needs.
ASTEROID is built on top of the L4/Fiasco.OC microkernel and extends the system with Romain, an operating system service that transparently replicates user applications. Romain supports single- and multi-threaded applications without requiring access to the application's source code. Romain replicates applications and their resources completely and thereby does not rely on hardware extensions, such as ECC-protected memory. In my thesis I describe how to efficiently implement replication as a form of redundant multithreading in software. I develop mechanisms to manage replica resources and to make multi-threaded programs behave deterministically for replication.
I furthermore present an approach to handle applications that use shared-memory channels with other programs. My evaluation shows that Romain provides 100% error detection and more than 99.6% error correction for single-bit flips in memory and general-purpose registers. At the same time, Romain's execution time overhead is below 14% for single-threaded applications running in triple-modular redundant mode. The last part of my thesis acknowledges that software-implemented fault tolerance methods often rely on the correct functioning of a certain set of hardware and software components, the Reliable Computing Base (RCB).
I introduce the concept of the RCB and discuss what constitutes the RCB of the ASTEROID system and other fault tolerance mechanisms. Thereafter I show three case studies that evaluate approaches to protecting RCB components and thereby aim to achieve a software stack that is fully protected against hardware errors.
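The detection and correction figures above follow from majority voting over replica states in triple-modular redundant mode. A minimal sketch of that voting step, assuming replicas can externalize comparable state snapshots; the `vote` helper below is illustrative, not Romain's actual interface:

```python
from collections import Counter

def vote(states):
    """Majority-vote over replica state snapshots.

    Returns (agreed_state, faulty_replica_indices). With three
    replicas, a single corrupted replica is both detected and
    corrected; if all replicas disagree, the error is detected
    but cannot be corrected.
    """
    counts = Counter(states)
    state, n = counts.most_common(1)[0]
    if n == len(states):
        return state, []                  # all replicas agree
    if n >= 2:                            # majority: outvote the outlier(s)
        return state, [i for i, s in enumerate(states) if s != state]
    raise RuntimeError("no majority: error detected, not correctable")

# Example: replica 1 suffered a single-bit flip in a register value.
agreed, faulty = vote([0x2A, 0x2B, 0x2A])
print(agreed, faulty)  # 42 [1]
```

The outvoted replica would then be repaired by copying the agreed state over its corrupted copy, which is what lifts detection-only redundancy to error correction.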
Flexible Scheduling in Middleware for Distributed Rate-Based Real-Time Applications - Doctoral Dissertation, May 2002
Distributed rate-based real-time systems, such as process control and avionics mission computing systems, have traditionally been scheduled statically. Static scheduling provides assurance of schedulability prior to run-time and incurs low run-time overhead. However, static scheduling is brittle in the face of unanticipated overload, and treats invocation-to-invocation variations in resource requirements inflexibly. As a consequence, processing resources are often under-utilized in the average case, and the resulting systems are hard to adapt to meet new real-time processing requirements. Dynamic scheduling offers relief from the limitations of static scheduling. However, dynamic scheduling often has a high run-time cost because certain decisions are enforced on-line. Furthermore, under conditions of overload, tasks can be scheduled dynamically that may never be dispatched, or that upon dispatch would miss their deadlines. We review the implications of these factors on rate-based distributed systems, and posit the necessity to combine static and dynamic approaches to exploit the strengths and compensate for the weaknesses of either approach in isolation. We present a general hybrid approach to real-time scheduling and dispatching in middleware that can employ both static and dynamic components. This approach provides (1) feasibility assurance for the most critical tasks, (2) the ability to extend this assurance incrementally to operations in successively lower criticality equivalence classes, (3) the ability to trade off bounds on feasible utilization and dispatching overhead in cases where, for example, execution jitter is a factor or rates are not harmonically related, and (4) overall flexibility to make better use of scarce computing resources and to enforce a wider range of application-specified execution requirements.
This approach also meets additional constraints of an increasingly important class of rate-based systems, those with requirements for robust management of real-time performance in the face of rapidly and widely changing operating conditions. To support these requirements, we present a middleware framework that implements the hybrid scheduling and dispatching approach described above, and also provides support for (1) adaptive re-scheduling of operations at run-time and (2) reflective alternation among several scheduling strategies to improve real-time performance in the face of changing operating conditions. Adaptive re-scheduling must be performed whenever operating conditions exceed the ability of the scheduling and dispatching infrastructure to meet the critical real-time requirements of the system under the currently specified rates and execution times of operations. Adaptive re-scheduling relies on the ability to change the rates of execution of at least some operations, and may occur under the control of a higher-level middleware resource manager. Different rates of execution may be specified under different operating conditions, and the number of such possible combinations may be arbitrarily large. Furthermore, adaptive rescheduling may in turn require notification of rate-sensitive application components. It is therefore desirable to handle variations in operating conditions entirely within the scheduling and dispatching infrastructure when possible. A rate-based distributed real-time application, or a higher-level resource manager, could thus fall back on adaptive re-scheduling only when it cannot achieve acceptable real-time performance through self-adaptation. Reflective alternation among scheduling heuristics offers a way to tune real-time performance internally, and we offer foundational support for this approach. 
In particular, run-time observable information such as that provided by our metrics-feedback framework makes it possible to detect that a given current scheduling heuristic is underperforming the level of service another could provide. This forms the basis for guided adaptation. Furthermore, we present empirical results for our framework in a realistic avionics mission computing environment. This dissertation makes five contributions in support of flexible and adaptive scheduling and dispatching in middleware. First, we provide a middleware scheduling framework that supports arbitrary and fine-grained composition of static/dynamic scheduling, to assure critical timeliness constraints while improving noncritical performance under a range of conditions. Second, we provide a flexible dispatching infrastructure framework composed of fine-grained primitives, and describe how appropriate configurations can be generated automatically based on the output of the scheduling framework. Third, we describe algorithms to reduce the overhead and duration of adaptive re-scheduling, based on sorting for rate selection and priority assignment. Fourth, we provide timely and efficient performance information through an optimized metrics-feedback framework, to support higher-level reflection and adaptation decisions. Fifth, we present the results of empirical studies to quantify and evaluate the performance of alternative canonical scheduling heuristics, across a range of load and load jitter conditions. These studies were conducted within an avionics mission computing applications framework running on realistic middleware and embedded hardware. The results obtained from these studies (1) demonstrate the potential benefits of reflective alternation among distinct scheduling heuristics at run-time, and (2) suggest performance factors of interest for future work on adaptive control policies and mechanisms using this framework.
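The hybrid static/dynamic composition described above can be illustrated with a small sketch: critical operations are ordered by statically assigned priorities (verified feasible offline), while noncritical operations are ordered dynamically by earliest deadline within a strictly lower priority band. The tuple layout and operation names are illustrative assumptions, not the dissertation's actual framework API:

```python
# Each operation: (name, is_critical, static_priority, absolute_deadline).
# Lower static_priority means more urgent; deadlines are in milliseconds.

def hybrid_key(op):
    name, critical, static_prio, deadline = op
    # Band 0: critical ops, fixed static priority assigned offline.
    # Band 1: noncritical ops, earliest-deadline-first at run-time.
    return (0, static_prio) if critical else (1, deadline)

def dispatch_order(ops):
    """Return operation names in hybrid dispatch order."""
    return [op[0] for op in sorted(ops, key=hybrid_key)]

ops = [
    ("telemetry",  False, 0, 40.0),
    ("nav_update", True,  1, 10.0),
    ("display",    False, 0, 25.0),
    ("flight_ctl", True,  0,  5.0),
]
print(dispatch_order(ops))
# ['flight_ctl', 'nav_update', 'display', 'telemetry']
```

The two-level sort key is the essence of the hybrid: no noncritical operation can ever preempt a critical one (preserving the offline feasibility assurance), yet the noncritical band stays flexible under changing deadlines.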
Agreement-Related Problems: From Semi-Passive Replication to Totally Ordered Broadcast
Agreement problems constitute a fundamental class of problems in the context of distributed systems. All agreement problems follow a common pattern: all processes must agree on some common decision, the nature of which depends on the specific problem. This dissertation mainly focuses on three important agreement problems: Replication, Total Order Broadcast, and Consensus. Replication is a common means to introduce redundancy in a system, in order to improve its availability. A replicated server is a server that is composed of multiple copies so that, if one copy fails, the other copies can still provide the service. Each copy of the server is called a replica. Each replica must evolve in a manner that is consistent with the other replicas. Hence, updating the replicated server requires that every replica agree on the set of modifications to carry out. There are two principal replication schemes to ensure this consistency: active replication and passive replication. In Total Order Broadcast, processes broadcast messages to all processes. However, all messages must be delivered in the same order. Also, if one process delivers a message m, then all correct processes must eventually deliver m. The problem of Consensus gives an abstraction to most other agreement problems. All processes initiate a Consensus by proposing a value. Then, all processes must eventually decide the same value v, which must be one of the proposed values. These agreement problems are closely related to each other. For instance, Chandra and Toueg [CT96] show that Total Order Broadcast and Consensus are equivalent problems. In addition, Lamport [Lam78] and Schneider [Sch90] show that active replication needs Total Order Broadcast. As a result, active replication is also closely related to the Consensus problem. The first contribution of this dissertation is the definition of the semi-passive replication technique.
Semi-passive replication is a passive replication scheme based on a variant of Consensus (called Lazy Consensus and also defined here). From a conceptual point of view, the result is important as it helps to clarify the relation between passive replication and the Consensus problem. In practice, this makes it possible to design systems that react more quickly to failures. The problem of Total Order Broadcast is well known in the field of distributed systems and algorithms. In fact, more than fifty algorithms have already been published on the problem so far. Although these algorithms are often quite similar, they are difficult to compare as they often differ with respect to their actual properties, assumptions, and objectives. The second main contribution of this dissertation is to define five classes of total order broadcast algorithms, and to relate existing algorithms to those classes. The third contribution of this dissertation is to compare the expected performance of the various classes of total order broadcast algorithms. To achieve this goal, we define a set of metrics to predict the performance of distributed algorithms.
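One classic class of total order broadcast algorithms is the fixed-sequencer class, which can be sketched as follows. This is a teaching sketch under strong assumptions (reliable channels, no sequencer crash); tolerating sequencer failure requires the consensus-based machinery this kind of dissertation studies. The class names are illustrative:

```python
class Sequencer:
    """Distinguished process that assigns consecutive sequence numbers."""
    def __init__(self):
        self.next_seq = 0
    def order(self, msg):
        seq, self.next_seq = self.next_seq, self.next_seq + 1
        return seq, msg

class Process:
    """Delivers messages strictly in sequence-number order, so every
    correct process delivers the same messages in the same total order,
    regardless of the order in which they arrive."""
    def __init__(self):
        self.expected = 0
        self.pending = {}      # seq -> msg, buffers out-of-order arrivals
        self.delivered = []
    def receive(self, seq, msg):
        self.pending[seq] = msg
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1

sequencer = Sequencer()
p, q = Process(), Process()
ordered = [sequencer.order(m) for m in ("a", "b", "c")]
for s, m in ordered:
    p.receive(s, m)
for s, m in reversed(ordered):   # q sees the same messages, reordered
    q.receive(s, m)
assert p.delivered == q.delivered == ["a", "b", "c"]
```

Even though q received the messages in reverse, both processes deliver in the sequencer's order, which is exactly the total order property stated above.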
Distributed computing in space-based wireless sensor networks
This thesis investigates the application of distributed computing in general, and wireless sensor networks in particular, to space applications. Specifically, the thesis addresses issues related to the design of "space-based wireless sensor networks" that consist of ultra-small satellite nodes flying together in close formations. The design space of space-based wireless sensor networks is explored. Consequently, a methodology for designing space-based wireless sensor networks is proposed that is based on a modular architecture. The hardware modules take the form of 3-D Multi-Chip Modules (MCM). The design of hardware modules is demonstrated by designing a representative on-board computer module. The on-board computer module contains an FPGA with a system-on-chip architecture that is based on soft components and provides a degree of flexibility at the later stages of the design of the mission.
Volume I: Acquisition Research: The Foundation for Innovation
Proceedings Paper (for Acquisition Research Program). Accordingly, the year 2006 was especially significant for the NPS Acquisition Research Program in taking major strides toward expanding the program's reach in important ways to other institutions. The number of research institutions participating as collaborators grew to 35 with the formation of a Virtual University Consortium. Most noteworthy was, as mentioned above, our securing sponsorship from USD(AT&L) to fund research proposals selected from a nationwide call, or Broad Agency Announcement (BAA) (copy available at www.acquisitionresearch.org). We're truly excited at the prospects of receiving innovative and cutting-edge proposals from the top minds around the country. We trust that this new sponsorship will act like good seeds sown in fertile soil, yielding rich fruits of profitable acquisition research for many years to come. Naval Postgraduate School Acquisition Research Program. Approved for public release; distribution is unlimited.
Assessing the security benefits of defence in depth
Most modern computer systems are connected to the Internet. This brings many opportunities for revenue generation via e-commerce and information sharing, but also threats due to the exposure of these systems to malicious adversaries. Therefore, almost all organisations deploy security tools to improve overall detection capabilities. However, all security tools have limitations: they may fail to detect attacks, fail to uncover all vulnerabilities, or generate alarms for non-malicious traffic or non-vulnerable code. Using terminology from signalling theory, we can state that security tools suffer from two types of failures: failure to correctly label a malicious event as malicious (False Negatives); and failure to correctly label a non-malicious event as non-malicious (False Positives). These failures may vary from one tool to another, since security tools are diverse in their weaknesses as well as their strengths. Therefore, an obvious design paradigm when deploying these defences is Diversity or Defence in Depth: the expectation is that employing multiple tools increases the chance of detecting malicious behaviour.
This thesis presents research to assess the benefits (or harm) from using diversity. This thesis begins with a literature review on defence in depth, diversity and fault tolerance while identifying areas for further research. This review is followed by the presentation of the overall methodology that we have used to perform the diversity assessment for three types of defence tools, namely AntiVirus (AV) products, Intrusion Detection Systems (IDS) and Static Analysis Tools (SAT). The context of this project is inspired by the EPSRC D3S project in the Centre for Software Reliability (CSR) at City, University of London as well as the previous work on diversity conducted at the same centre, but also elsewhere in the world. This thesis presents the results using the well-known metrics for binary classifiers: Sensitivity and Specificity; and assesses the various forms of adjudication that may be used: 1-out-of-N (1ooN – raise an alarm as long as ANY of the defences do so), N-out-of-N (NooN – raise an alarm only if ALL the defences do so), majority voting (raise an alarm where a MAJORITY of the defences do so) or optimal adjudication (raise an alarm in such a way that it minimises the overall loss to the system from a failure).
The first study compares the detection capabilities of nine different AV products. Additionally, for each vendor, the detection capabilities of the version of the product that is available for free in the VirusTotal platform are compared with the full-capability version of that product that is available from the same vendor's website. Counterintuitively, the free versions of AVs from VirusTotal performed better (in most cases) than the commercial versions from the same vendor.
The second study compares the detection capabilities of IDS when deployed in a combined configuration. The functionally diverse combinations are shown to increase the true positive rate significantly while experiencing smaller increases in false positive rate.
The third study analyses the improvements and deteriorations from using diverse SATs to detect web vulnerabilities. The largest improvements in sensitivity, with the least deterioration in specificity, were observed with the 1ooN configurations; in NooN configurations there is an improvement in specificity compared with the individual systems, but a deterioration in sensitivity.
Finally, the benefits of “optimal adjudication” were also investigated: the results show that the total loss that can result from the two types of failures considered (False Positives and False Negatives) can be significantly reduced with optimal adjudication configurations compared with more conventional methods of adjudication such as 1ooN, NooN or majority voting.
In conclusion, using diverse security protection tools is shown to be beneficial to improving the detection capability of three different families of products, and optimal adjudication techniques can help balance the benefits of improved detection while lowering the false positive rates.
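The adjudication rules compared in this thesis can be sketched over boolean verdicts from N diverse detectors, together with the sensitivity/specificity metrics used to score them. The function names and the toy event log are illustrative assumptions:

```python
def one_oo_n(verdicts):        # 1ooN: alarm if ANY detector alarms
    return any(verdicts)

def n_oo_n(verdicts):          # NooN: alarm only if ALL detectors alarm
    return all(verdicts)

def majority(verdicts):        # alarm if a strict majority alarms
    return sum(verdicts) * 2 > len(verdicts)

def rates(adjudicate, events):
    """Return (sensitivity, specificity) of an adjudicated ensemble.

    events: list of (verdicts, is_malicious) pairs, where verdicts is
    the per-detector boolean output for one observed event.
    """
    tp = fn = tn = fp = 0
    for verdicts, malicious in events:
        alarm = adjudicate(verdicts)
        if malicious:
            tp, fn = tp + alarm, fn + (not alarm)
        else:
            fp, tn = fp + alarm, tn + (not alarm)
    return tp / (tp + fn), tn / (tn + fp)

# Two detectors with diverse failure modes: 1ooN boosts sensitivity,
# NooN boosts specificity, mirroring the trade-off reported above.
events = [
    ([True,  False], True),    # detector 2 misses this attack
    ([False, True],  True),    # detector 1 misses this one
    ([True,  True],  True),
    ([False, False], False),
    ([True,  False], False),   # false alarm from detector 1
    ([False, False], False),
]
print(rates(one_oo_n, events))   # sensitivity 1.0, specificity 2/3
print(rates(n_oo_n, events))     # sensitivity 1/3, specificity 1.0
```

Optimal adjudication generalises these fixed rules: instead of a single threshold, it picks, per verdict pattern, the decision that minimises the expected loss from false positives and false negatives.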
Combining SOA and BPM Technologies for Cross-System Process Automation
This paper summarizes the results of an industry case study that introduced a cross-system business process automation solution based on a combination of SOA and BPM standard technologies (i.e., BPMN, BPEL, WSDL). Besides discussing major weaknesses of the existing custom-built solution and comparing them against experiences with the developed prototype, the paper presents a course of action for transforming the current solution into the proposed one. This includes a general approach, consisting of four distinct steps, as well as specific action items that are to be performed for every step. The discussion also covers language and tool support and challenges arising from the transformation.