10 research outputs found

    Using status messages in the distributed test architecture

    If the system under test has multiple interfaces/ports that are physically distributed, then in testing we place a tester at each port. If these testers cannot communicate directly with one another and there is no global clock, then we are testing in the distributed test architecture. In this architecture there may be input sequences that cannot be applied in testing without introducing controllability problems, and observability problems can allow fault masking. In this paper we consider the situation in which the testers can apply a status message: an input that causes the system under test to identify its current state. We show how such a status message can be used to overcome controllability and observability problems.
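
    As a minimal sketch of the idea (not the paper's construction), the following models a system under test whose status input reports the current state, letting a remote tester confirm that the other tester's input has already been applied before sending its own. The state machine, port handling, and polling loop are assumptions invented for illustration.

        # Toy model: two physically separated testers coordinate through a
        # status message instead of direct communication or a global clock.

        class SUT:
            """System under test with a status input that reports its state."""
            def __init__(self):
                self.state = "s0"
                self.transitions = {("s0", "a"): "s1", ("s1", "b"): "s2"}

            def apply(self, port, inp):
                if inp == "status":          # status message: identify current state
                    return self.state
                self.state = self.transitions.get((self.state, inp), self.state)
                return None

        def tester(sut, port, inp, when_state):
            """Apply `inp` at `port` only after the status message confirms
            `when_state`; without it, this tester cannot know whether the other
            tester has acted yet (a controllability problem)."""
            while sut.apply(port, "status") != when_state:
                pass                          # poll the status message
            sut.apply(port, inp)

        sut = SUT()
        tester(sut, port=1, inp="a", when_state="s0")   # first input at port 1
        tester(sut, port=2, inp="b", when_state="s1")   # port 2 waits its turn
        assert sut.apply(1, "status") == "s2"           # sequence applied in order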

    Run-Time Monitoring of Timing Constraints: A Survey of Methods and Tools

    Despite the availability of static analysis methods to achieve a correct-by-construction design for different systems in terms of timing behavior, violations of timing constraints can still occur at run-time for a variety of reasons. Monitoring system performance with respect to timing constraints aims to detect violations of the timing specifications, or to predict them from current performance data. Considerable work has been dedicated to efficient performance monitoring approaches over the past years. This paper presents a survey and classification of those approaches to help researchers gain a clearer overview of the methods and developments in monitoring the timing behavior of systems. The approaches are classified by criteria that matter when building a monitoring system, such as the use of additional hardware and the data collection approach. The paper also describes how the different methods work, along with the advantages and downsides of each.
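
    To make the event-driven, software-based flavor of such monitors concrete, here is a hedged sketch: tasks are timestamped on entry and exit, and the monitor records a violation whenever the measured execution time exceeds the task's deadline. The deadline table and decorator API are assumptions for illustration, not any surveyed tool's interface.

        import time

        # Hypothetical table of relative deadlines, in seconds.
        DEADLINES = {"control_loop": 0.050}

        violations = []   # (task, measured time) pairs the monitor has detected

        def monitored(task):
            """Wrap a task so each run is timestamped and checked on completion."""
            def wrap(fn):
                def run(*args, **kwargs):
                    start = time.monotonic()
                    result = fn(*args, **kwargs)
                    elapsed = time.monotonic() - start
                    if elapsed > DEADLINES[task]:
                        violations.append((task, elapsed))   # detection at run-time
                    return result
                return run
            return wrap

        @monitored("control_loop")
        def control_loop():
            time.sleep(0.002)   # stand-in for the real computation

        control_loop()
        print(violations or "no timing violations observed")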

    An Architectural Framework for Performance Analysis: Supporting the Design, Configuration, and Control of DIS /HLA Simulations

    Technology advances are providing greater capabilities for most distributed computing environments. However, these advances are paralleled by progressively increasing system complexity. In many instances, this complexity leads to a poor understanding of run-time performance bottlenecks in distributed applications. This is especially true for distributed simulations, where a myriad of enabling technologies are used as building blocks to provide large-scale, geographically dispersed, dynamic virtual worlds. Persons responsible for the design, configuration, and control of distributed simulations need to understand the impact of decisions regarding the allocation and use of the logical and physical resources that comprise a distributed simulation environment, and how those decisions affect run-time performance. Distributed Interactive Simulation (DIS) and High Level Architecture (HLA) simulation applications historically provide some of the most demanding distributed computing environments in terms of performance, and as such have a justified need for performance information sufficient to support decision-makers trying to improve system behavior. This research addresses two fundamental questions: (1) Is there an analysis framework suitable for characterizing DIS and HLA simulation performance? (2) What kind of mechanism can adequately monitor, measure, and collect performance data to support different performance analysis objectives for DIS and HLA simulations? This thesis presents a unified architectural framework for DIS and HLA simulations, details a performance monitoring system, and shows its effectiveness through a series of use cases that include practical applications of the framework to real-world U.S. Department of Defense (DoD) programs. The thesis also discusses the robustness of the constructed framework and its applicability to performance analysis of more general distributed computing applications.
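
    As one concrete example of the kind of measurement such a monitoring system gathers, the sketch below timestamps entity-state updates at the sender and computes their one-way latency at the receiver over a loopback socket. The message format and transport are illustrative assumptions, not the framework described in the thesis.

        import json, socket, time

        def send_update(sock, addr, entity_id, position):
            """Stamp an entity-state update with its send time."""
            msg = {"entity": entity_id, "pos": position, "sent_at": time.time()}
            sock.sendto(json.dumps(msg).encode(), addr)

        def receive_update(sock):
            """Measure one-way latency; assumes sender and receiver share a
            clock, which a real distributed monitor must itself establish."""
            data, _ = sock.recvfrom(4096)
            msg = json.loads(data)
            return msg["entity"], time.time() - msg["sent_at"]

        rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        rx.bind(("127.0.0.1", 0))        # loopback stands in for the network
        tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        send_update(tx, rx.getsockname(), entity_id=42, position=(1.0, 2.0, 0.0))
        entity, latency = receive_update(rx)
        print(f"entity {entity}: one-way latency {latency * 1e6:.0f} us")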

    Doctor of Philosophy

    A modern software system is a composition of parts that are themselves highly complex: operating systems, middleware, libraries, servers, and so on. In principle, compositionality of interfaces means that we can understand any given module independently of the internal workings of other parts. In practice, however, abstractions are leaky, and with every generation, modern software systems grow in complexity. Traditional ways of understanding failures, explaining anomalous executions, and analyzing performance are reaching their limits in the face of emergent behavior, unrepeatability, cross-component execution, software aging, and adversarial changes to the system at run time. Deterministic systems analysis has the potential to change the way we analyze and debug software systems. Recorded once, the execution of the system becomes an independent artifact, which can be analyzed offline. The availability of the complete system state, the guaranteed behavior of re-execution, and the absence of limitations on the run-time complexity of analysis collectively enable the deep, iterative, and automatic exploration of the dynamic properties of the system. This work creates a foundation for making deterministic replay a ubiquitous system analysis tool. It defines design and engineering principles for building fast and practical replay machines capable of capturing the complete execution of an entire operating system with an overhead of a few percent on a realistic workload, and with minimal installation costs. To provide an intuitive interface for constructing replay analysis tools, this work implements a powerful virtual machine introspection layer that allows an analysis algorithm to be programmed against the state of the recorded system in the familiar terms of source-level variable and type names. To support performance analysis, the replay engine provides a faithful performance model of the original execution during replay.
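
    The record/replay principle at the heart of deterministic analysis can be shown in miniature: capture every nondeterministic input the first time the program runs, then feed the log back during re-execution so the run is exactly repeatable. This toy sketch illustrates only the principle; the dissertation's engine records an entire operating system, not a single function.

        import random, time

        class Replayer:
            """Record nondeterministic inputs once; replay them verbatim."""
            def __init__(self, log=None):
                self.log = log          # None while recording, a list on replay
                self.recorded = []

            def nondet(self, source):
                if self.log is not None:
                    return self.log.pop(0)   # replay: consume the recorded value
                value = source()             # record: query the live source
                self.recorded.append(value)
                return value

        def workload(r):
            # Two nondeterministic inputs: wall-clock time and randomness.
            return r.nondet(time.time) + r.nondet(random.random)

        rec = Replayer()
        live = workload(rec)                                # recorded execution
        again = workload(Replayer(log=list(rec.recorded)))  # offline re-execution
        assert live == again                                # guaranteed repeatability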

    A Hierarchical Filtering-Based Monitoring Architecture for Large-scale Distributed Systems

    On-line monitoring is essential for observing and improving the reliability and performance of large-scale distributed (LSD) systems. In an LSD environment, large numbers of events are generated by system components during their execution and interaction with external objects (e.g., users or processes). These events must be monitored to accurately determine the run-time behavior of an LSD system and to obtain status information required for debugging and steering applications. However, the manner in which events are generated in an LSD system is complex and presents a number of challenges for an on-line monitoring system. Correlated events are generated concurrently and can occur at multiple locations distributed throughout the environment. This makes monitoring an intricate task and complicates the management decision process. Furthermore, the large number of entities and the geographical distribution inherent in LSD systems increase the difficulty of addressing traditional issues such as performance bottlenecks, scalability, and application perturbation. This dissertation proposes a scalable, high-performance, dynamic, flexible, and non-intrusive monitoring architecture for LSD systems. The resulting architecture detects and classifies interesting primitive and composite events and performs either a corrective or a steering action. When appropriate, information is disseminated to management applications, such as reactive control and debugging tools. The monitoring architecture employs a novel hierarchical event filtering approach that distributes the monitoring load and limits event propagation. This significantly improves scalability and performance while minimizing monitoring intrusiveness. The architecture provides dynamic monitoring capabilities through subscription policies that enable application developers to add, delete, and modify monitoring demands on the fly; an adaptable configuration that accommodates environmental changes; and a programmable environment that facilitates the development of self-directed monitoring tasks. Increased flexibility is achieved through a declarative and comprehensive monitoring language, a simple code instrumentation process, and automated monitoring administration. These elements substantially relieve the burden imposed by using on-line distributed monitoring systems. In addition, the monitoring system provides techniques to manage the trade-offs between various monitoring objectives. The proposed solution offers improvements over related work by presenting a comprehensive architecture that considers the requirements and implied objectives for monitoring large-scale distributed systems. This architecture is referred to as the HiFi monitoring system. To demonstrate its effectiveness at debugging and steering LSD systems, the HiFi monitoring system has been implemented at Old Dominion University for monitoring the Interactive Remote Instruction (IRI) system. The results from this case study validate that the HiFi system achieves the objectives outlined in this thesis.
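
    A hedged sketch of the hierarchical filtering idea follows: each node discards uninteresting events close to their source, so only events matching a subscription propagate to the next level, which is what limits propagation and spreads the monitoring load. The class and predicates are illustrative, not HiFi's actual interface.

        class FilterNode:
            """One level of the filtering hierarchy."""
            def __init__(self, name, predicate, parent=None):
                self.name, self.predicate, self.parent = name, predicate, parent
                self.delivered = []

            def submit(self, event):
                if not self.predicate(event):
                    return                     # dropped locally, never propagated
                self.delivered.append(event)
                if self.parent:
                    self.parent.submit(event)  # forward one level up

        # The manager subscribes to error events; a leaf pre-filters by severity.
        manager = FilterNode("manager", lambda e: e["type"] == "error")
        leaf = FilterNode("host-17", lambda e: e["severity"] >= 3, parent=manager)

        for e in ({"type": "error", "severity": 1},   # dropped at the leaf
                  {"type": "info",  "severity": 5},   # dropped at the manager
                  {"type": "error", "severity": 4}):  # reaches the manager
            leaf.submit(e)

        print(len(manager.delivered), "event(s) reached the manager")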

    Deployment and Debugging of Real-Time Applications on Multicore Architectures

    It is essential to be able to extract information from software, and program tracing is one such information extraction technique: it extracts information from a program during execution, which helps with testing and validating that the software under test is correct. Information extraction is done by instrumenting the program. Logged information can be stored in dedicated logging memories or buffered and streamed off-chip to an external monitor. The designer inspects the trace after execution to identify potentially erroneous state information. In addition, the trace can provide the state information that serves as input to reproduce the erroneous output. Information extraction can be difficult and expensive due to the increasing size and complexity of modern software systems. For the sub-class of software systems known as real-time systems, these issues are further aggravated, because real-time systems demand timing guarantees in addition to functional correctness. Consequently, any instrumentation added to the original program code for the purpose of information extraction may affect the temporal behavior of the program. This perturbation can lead to the violation of timing constraints, which may bias the program execution and/or cause the program to miss its deadline. As a result, there is considerable interest in devising techniques that allow information extraction without missing a program's deadline, known as time-aware instrumentation. This thesis investigates time-aware instrumentation mechanisms that instrument programs while respecting their timing constraints and functional behavior. Knowledge of the underlying hardware on which the software runs enables the extraction of more information via the instrumentation process. Chip-multiprocessors offer a solution to the performance bottleneck of uni-processors. Providing timing guarantees for hard real-time systems on chip-multiprocessors, however, is difficult, because conventional communication interconnects are designed to optimize average-case performance. Researchers have therefore proposed interconnects such as priority-aware networks to satisfy the requirements of hard real-time systems. Priority-aware interconnects, however, lack the analysis techniques needed to facilitate the deployment of real-time systems. This thesis also investigates latency and buffer space analysis techniques for pipelined communication resource models, as well as algorithms for the proper deployment of real-time applications to these platforms. The analysis techniques proposed in this thesis provide guarantees on the schedulability of real-time systems on chip-multiprocessors. These guarantees are based on reducing contention in the interconnect while accurately computing the worst-case communication latencies. These worst-case latencies provide bounds for computing the overall worst-case execution time of applications on chip-multiprocessors, and they also provide a means of assigning the instrumentation budgets required by time-aware instrumentation. Leveraging these platform-specific analysis techniques for the assignment of instrumentation budgets allows more information to be extracted during instrumentation.
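
    The budgeted, time-aware instrumentation idea admits a compact sketch: a logging point fires only when the remaining slack (time to the deadline minus the worst-case cost of the work still to run) covers the known cost of the log itself, so tracing can never cause a deadline miss. The budget figures and API below are invented for illustration.

        import time

        LOG_COST = 0.001      # assumed worst-case cost of one trace record (s)
        trace = []

        def maybe_log(event, deadline, wcet_remaining):
            """Emit a trace record only if the slack can absorb its cost."""
            slack = (deadline - time.monotonic()) - wcet_remaining
            if slack >= LOG_COST:
                trace.append((time.monotonic(), event))
            # otherwise skip the record: meeting the deadline takes priority

        def task(deadline):
            maybe_log("start", deadline, wcet_remaining=0.004)
            time.sleep(0.002)                  # stand-in for the task body
            maybe_log("end", deadline, wcet_remaining=0.0)

        task(deadline=time.monotonic() + 0.010)   # ample slack: both records fit
        print(f"{len(trace)} trace record(s) emitted")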

    An Ontology-Based Approach To Concern-Specific Dynamic Software Structure Monitoring

    Software reliability has not kept pace with computing hardware. Despite the use of reliability improvement techniques and methods, faults remain that lead to software errors and failures. Runtime monitoring can improve software reliability by detecting certain errors before failures occur. Monitoring is also useful for online and electronic services, where resource management directly impacts reliability and quality. For example, resource ownership errors can accumulate over time (e.g., as resource leaks) and result in software aging. Early detection of errors allows more time for corrective action before failures or service outages occur. In addition, the ability to monitor individual software concerns, such as application resource ownership structure, can help support autonomic computing for self-healing, self-adapting, and self-optimizing software. This thesis introduces ResOwn, an application resource ownership ontology for interactive session-oriented services. ResOwn provides software monitoring with enriched concepts of application resource ownership borrowed from real-world legal and ownership ontologies. ResOwn is formally defined in OWL-DL (Web Ontology Language Description Logic), verified using an off-the-shelf reasoner, and tested using the call processing software for a small private branch exchange (PBX). The ResOwn Prime Directive states that every object in an operational software system is a resource, an owner, or both simultaneously. Resources produce benefits. Beneficiary owners may receive resource benefits; nonbeneficiary owners may only manage resources. This approach distinguishes resource use from resource management and supports the ability to detect when a resource's role-based runtime capacity has been exceeded. This thesis also presents a greybox approach to concern-specific, dynamic software structure monitoring, including a monitor architecture, a greybox interpreter, and algorithms for deriving a monitoring model from a monitored target's formal specifications. The target's requirements and design are assumed to be specified in SDL, a formalism based on communicating extended finite state machines. Greybox abstraction, applicable to both behavior and structure, provides direction on which parts of the target to instrument, how much to instrument, and what types of resource errors to detect. The approach was manually evaluated using a number of resource allocation and ownership scenarios, obtained by collecting actual call traces from an instrumented PBX. The results of an analytical evaluation of ResOwn and the monitoring approach are presented in a discussion of key advantages and known limitations. Conclusions and recommended future work are discussed at the end of the thesis.
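
    A minimal sketch of the Prime Directive as executable checks, under invented names and capacities: beneficiary owners may use a resource's benefits, nonbeneficiary owners may only manage it, and a role-based capacity bound lets a monitor flag over-acquisition. This illustrates the distinctions only; it is not the OWL-DL ontology itself.

        class Resource:
            """Toy ResOwn-style resource: grants benefits to its owners."""
            def __init__(self, name, capacity):
                self.name = name
                self.capacity = capacity   # role-based runtime capacity bound
                self.owners = {}           # owner -> "beneficiary" | "manager"

            def acquire(self, owner, role):
                self.owners[owner] = role
                users = sum(1 for r in self.owners.values() if r == "beneficiary")
                if users > self.capacity:  # monitor detects over-acquisition
                    return f"violation: capacity of {self.name} exceeded"
                return "ok"

            def use(self, owner):
                # Only beneficiary owners receive benefits; managers may not.
                if self.owners.get(owner) != "beneficiary":
                    return f"violation: {owner} may not use {self.name}"
                return "ok"

        line = Resource("pbx-line-3", capacity=1)       # say, one call per line
        print(line.acquire("call-A", "beneficiary"))    # ok
        print(line.use("call-A"))                       # ok
        print(line.acquire("call-B", "beneficiary"))    # capacity violation flagged
        print(line.use("maintenance"))                  # nonbeneficiary use flagged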