
    Designing Robust API Monitoring Solutions

    Tracing the sequence of library calls and system calls that a program makes is very helpful to characterize its interactions with the surrounding environment and, ultimately, its semantics. However, due to the entanglements of real-world software stacks, accomplishing this task can be surprisingly challenging once we take accuracy, reliability, and transparency into the equation. In this article, we identify six challenges that API monitoring solutions should overcome in order to manage these dimensions effectively, and we outline actionable design points for building robust API tracers that can be used even for security research. We then detail and evaluate SNIPER, an open-source API tracing system available in two variants based on dynamic binary instrumentation (for simplified in-guest deployment) and hardware-assisted virtualization (realizing the first general user-space tracer of this kind), respectively.
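    As a rough illustration of the underlying idea, namely recording the sequence of API calls a program issues together with their arguments and results, the following Python sketch wraps the callables of a module and logs each invocation. It is only an in-process, language-level analogy: SNIPER itself works at the binary level through dynamic binary instrumentation or hardware-assisted virtualization, and the traced module (os.path) is just a convenient stand-in.

```python
import functools
import time

def trace_module(module, log):
    """Wrap every public callable in `module` so each call is appended to `log`.

    In-process, language-level sketch of API tracing; real tracers such as
    SNIPER hook library and system calls at the binary level instead.
    """
    for name in dir(module):
        attr = getattr(module, name)
        if callable(attr) and not name.startswith("_"):
            def make_wrapper(fn, fname):
                @functools.wraps(fn)
                def wrapper(*args, **kwargs):
                    result = fn(*args, **kwargs)
                    # Record API name, arguments, return value, and a timestamp.
                    log.append({"api": fname, "args": args,
                                "ret": result, "ts": time.time()})
                    return result
                return wrapper
            setattr(module, name, make_wrapper(attr, name))

# Usage: trace calls made through the os.path API surface.
import os.path
calls = []
trace_module(os.path, calls)
os.path.join("tmp", "demo.txt")
os.path.exists("/nonexistent")
for c in calls:
    print(c["api"], c["args"], "->", c["ret"])
```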

    LogOS: an Automatic Logging Framework for Service-Oriented Architectures

    As multi-source, component-based platforms are becoming widespread both for constrained devices and for cloud computing, the need for automatic logging frameworks is increasing. Indeed, components from untrusted and possibly competing vendors are being deployed to the same runtime environments. They are also being integrated, with some components from one vendor being exposed as a service to another. This paper presents our investigations on an automated log-based architecture called LogOS, focused on monitoring service interactions. We ported it to Java/OSGi and experimented with it to enable identification between bundle providers in cases of failure. We motivate the need for an automatic logging framework in service-oriented architectures and discuss the design requirements of such frameworks. We present our implementation on OSGi and expose the trade-offs in doing so. We conduct experiments and show that, despite a necessary and significant overhead due to unequivocal identification constraints, this overhead should not be a major hindrance to the adoption of automatic logging frameworks for most service-oriented applications. Finally, we position our approach and give some perspectives.
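    The interposition idea behind such a framework can be sketched with a logging proxy that sits between a service consumer and a provider and attributes every interaction, including failures, to the components involved. The Python sketch below is only illustrative; LogOS itself instruments Java/OSGi service registrations, and the component names used here are hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")

class LoggedService:
    """Proxy that records every interaction between a consumer and a provider.

    Illustrative only: LogOS instruments OSGi services in Java, but the
    interposition idea is the same.
    """
    def __init__(self, provider, provider_id, consumer_id):
        self._provider = provider
        self._provider_id = provider_id
        self._consumer_id = consumer_id

    def __getattr__(self, name):
        target = getattr(self._provider, name)
        def call(*args, **kwargs):
            try:
                result = target(*args, **kwargs)
                logging.info("%s -> %s.%s%s = %r", self._consumer_id,
                             self._provider_id, name, args, result)
                return result
            except Exception as exc:
                # Failures are attributed to the provider that raised them.
                logging.error("%s -> %s.%s%s raised %r", self._consumer_id,
                              self._provider_id, name, args, exc)
                raise
        return call

# Usage with a toy provider component.
class TemperatureSensor:
    def read(self):
        return 21.5

svc = LoggedService(TemperatureSensor(),
                    provider_id="vendorA.sensor",
                    consumer_id="vendorB.dashboard")
svc.read()
```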

    Understanding the Error Behavior of Complex Critical Software Systems through Field Data

    Software systems are the basis for everyday human activities, which are increasingly dependent on software. Software is an integral part of the systems we interact with in our daily life, ranging from small systems for entertainment and domotics to large systems and infrastructures that provide fundamental services such as telecommunications, transportation, and finance. In particular, software systems play a key role in critical domains, supporting crucial activities. For example, ground and air transportation, power supply, nuclear plants, and medical applications strongly rely on software systems: failures affecting these systems can lead to severe consequences, which can be catastrophic in terms of business or, even worse, human losses. Therefore, given the growing dependence on software systems in life- and safety-critical applications, dependability has become one of the most relevant industry and research concerns of the last decades. Software faults have been recognized as one of the major causes of system failures, since the hardware failure rate has been decreasing over the years. Time and cost constraints, along with technical limitations, often do not allow the correctness of the software to be fully validated solely by means of testing; therefore, software might be released with residual faults that activate during operations. The activation of a fault generates errors which propagate through the components of the system, possibly leading to a failure. Therefore, in order to produce reliable software, it is important to understand how errors affect a software system. This is of paramount importance especially in the context of complex critical software systems, where the occurrence of a failure can lead to severe consequences. However, the analysis of the error behavior of this kind of system is not trivial. They are often distributed systems based on many interacting heterogeneous components and layers, including Off-The-Shelf (OTS) and third-party components and legacy systems. All these aspects undermine the understanding of the error behavior of complex critical software systems. A well-established methodology to evaluate the dependability of operational systems and to identify their dependability bottlenecks is field failure data analysis (FFDA), which is based on monitoring and recording the errors and failures that occur during the operational phase of the system under real workload conditions, i.e., field data. Indeed, direct measurement and analysis of natural failures occurring under real workload conditions is among the most accurate ways to assess dependability characteristics. One of the main sources of field data is monitoring techniques. The contribution of this thesis is a methodology for understanding the error behavior of complex critical software systems by means of field data generated by the monitoring techniques already implemented in the target system. The use of available monitoring techniques makes it possible to overcome the limitations imposed in the context of critical systems, avoiding severe changes to the system and preserving its functionality and performance. The methodology is based on fault injection experiments that stimulate the target system with different error conditions. Injection experiments accelerate the collection of error data naturally generated by the monitoring techniques already implemented in the system.
    The collected data are analyzed in order to characterize the behavior of the system under the software errors that occurred. To this aim, the proposed methodology leverages a set of novel means defined in this dissertation: (i) Error Propagation graphs, which allow the error propagation phenomena occurring in the target system to be analyzed and inferred from the collected field data, and a set of metrics composed of (ii) Error Determination Degree, which gives insights into the ability of a monitoring technique's error notifications to suggest either the fault that led to the error or the failure the error led to in the system, (iii) Error Propagation Reportability, which captures the ability of a monitoring technique to report the propagation of errors, and (iv) Data Dissimilarity, which gives insights into the suitability of the data generated by the monitoring techniques for failure analysis. The methodology has been experimented on two instances of complex critical software systems in the field of Air Traffic Control (ATC), i.e., a communication middleware supporting data exchange among ATC applications, and an arrival manager responsible for managing flight arrivals into a given airspace, within an industry-academia collaboration in the context of a national research project. Results show that field data generated by means of monitoring techniques already implemented in a complex critical software system can be leveraged to obtain insights into the error behavior exhibited by the target system, as well as into potentially beneficial locations for error detection mechanisms (EDMs) and error recovery mechanisms (ERMs). In addition, the proposed methodology also made it possible to characterize the effectiveness of the monitoring techniques in terms of failure reporting, error propagation reportability, and data dissimilarity.
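    To give a concrete flavor of one of these means, the sketch below builds a rudimentary error propagation graph from timestamped error notifications, adding an edge whenever one component reports an error shortly after another. The events, component names, and the one-second window are invented for illustration; the dissertation defines the actual graph construction rules and the metrics (Error Determination Degree, Error Propagation Reportability, Data Dissimilarity) precisely.

```python
from collections import defaultdict

# Hypothetical error notifications collected during a fault injection run:
# (timestamp in seconds, reporting component, error code). The events and the
# one-second propagation window are illustrative, not taken from the thesis.
events = [
    (0.10, "middleware", "E_SERIALIZE"),
    (0.35, "arrival_manager", "E_TIMEOUT"),
    (0.90, "arrival_manager", "E_STALE_TRACK"),
    (5.00, "middleware", "E_SERIALIZE"),
    (5.40, "hmi", "E_NO_UPDATE"),
]

WINDOW = 1.0  # seconds within which a later error is assumed to be a propagation

def propagation_edges(events, window):
    """Count inferred propagation edges between components.

    An edge (A -> B) is added whenever component B reports an error shortly
    after component A did; a simple heuristic used only for illustration.
    """
    edges = defaultdict(int)
    for i, (t1, comp1, _) in enumerate(events):
        for t2, comp2, _ in events[i + 1:]:
            if t2 - t1 > window:
                break  # events are time-ordered, so no later match is possible
            if comp2 != comp1:
                edges[(comp1, comp2)] += 1
    return edges

for (src, dst), n in propagation_edges(events, WINDOW).items():
    print(f"{src} -> {dst}: {n} inferred propagation(s)")
```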

    Dynamic Assembly for System Adaptability, Dependability, and Assurance

    (DASASA) Project

    A Monitoring Approach for Dynamic Service-Oriented Architecture Systems

    In the context of Dynamic Service-Oriented Architecture (SOA), where services may dynamically appear or disappear transparently to the user, classical monitoring approaches which inject monitors into services cannot be used. We argue that, since SOA services are loosely coupled, monitors must also be loosely coupled. In this paper, we describe ongoing work proposing a monitoring approach dedicated to dynamic SOA systems. We define two key properties of loosely coupled monitoring systems: dynamicity resilience and comprehensiveness. We propose a preliminary implementation targeted at the OSGi framework.
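    The loose-coupling property can be illustrated with a monitor that subscribes to service lifecycle events instead of being injected into the services themselves, so that services can appear and disappear without affecting it. The sketch below is a toy Python analogy of that idea; the registry, event names, and services are invented and do not correspond to the OSGi API.

```python
class ServiceRegistry:
    """Toy registry that notifies listeners when services appear or disappear,
    loosely mirroring service lifecycle events (illustrative, not OSGi)."""
    def __init__(self):
        self._services = {}
        self._listeners = []

    def add_listener(self, listener):
        self._listeners.append(listener)

    def register(self, name, service):
        self._services[name] = service
        for listener in self._listeners:
            listener.on_event("REGISTERED", name)

    def unregister(self, name):
        self._services.pop(name, None)
        for listener in self._listeners:
            listener.on_event("UNREGISTERED", name)

class Monitor:
    """Loosely coupled monitor: it never wraps or modifies the services it
    observes, so it is resilient to services coming and going."""
    def on_event(self, kind, name):
        print(f"monitor: service {name} {kind}")

registry = ServiceRegistry()
registry.add_listener(Monitor())
registry.register("flight-plan-service", object())
registry.unregister("flight-plan-service")
```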

    Data-driven Protection of Transformers, Phase Angle Regulators, and Transmission Lines in Interconnected Power Systems

    This dissertation highlights the growing interest in and adoption of machine learning approaches for fault detection in modern electric power grids. Once a fault has occurred, it must be identified quickly and a variety of preventative steps must be taken to remove or insulate it. As a result, detecting, locating, and classifying faults early and accurately can improve safety and dependability while reducing downtime and hardware damage. Machine learning-based solutions and tools that carry out effective data processing and analysis to aid power system operations and decision-making are becoming preeminent, thanks to better system condition awareness and data availability. Power transformers, Phase Shift Transformers or Phase Angle Regulators, and transmission lines are critical components in power systems, and ensuring their safety is a primary concern. Differential relays are commonly employed to protect transformers, whereas distance relays are used to protect transmission lines. Magnetizing inrush, overexcitation, and current transformer saturation make transformer protection a challenge. Furthermore, non-standard phase shift, series core saturation, and low turn-to-turn and turn-to-ground fault currents are non-traditional problems associated with Phase Angle Regulators. Faults during symmetrical power swings and unstable power swings may cause mal-operation of distance relays as well as unintentional and uncontrolled islanding. Distance relays also mal-operate for transmission lines connected to type-3 wind farms. Conventional protection techniques are no longer adequate to address the above-mentioned challenges because of their limitations in handling and analyzing massive amounts of data, the limited generalizability of conventional models, their inability to model non-linear systems, and so on. These limitations of conventional differential and distance protection methods motivate the use of machine learning techniques to address the various protection challenges. The power transformers and Phase Angle Regulators are modeled to simulate and analyze the transients accurately. Appropriate time- and frequency-domain features are selected using different selection algorithms to train the machine learning algorithms. The boosting algorithms outperformed the other classifiers for fault detection, with balanced accuracies above 99% and a computation time of about one and a half cycles. The case studies on transmission lines show that the developed methods distinguish power swings from faults and determine the correct fault zone. The proposed data-driven protection algorithms can work together with conventional differential and distance relays, offering supervisory control over their operation and thus improving the dependability and security of protection systems.
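    As a minimal sketch of the kind of boosted classifier evaluated here, the snippet below trains a gradient boosting model and reports balanced accuracy. The data are synthetic placeholders; in the dissertation the inputs are time- and frequency-domain features extracted from simulated transformer, Phase Angle Regulator, and transmission line signals, and the specific boosting implementation and hyperparameters may differ.

```python
# Synthetic stand-in for a fault/no-fault classification task; real inputs
# would be features extracted from differential currents or line measurements.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 8))                    # placeholder feature vectors
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # 1 = internal fault, 0 = other transient

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = GradientBoostingClassifier(n_estimators=200, max_depth=3)
clf.fit(X_tr, y_tr)
print("balanced accuracy:", balanced_accuracy_score(y_te, clf.predict(X_te)))
```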

    Developing Cyberspace Data Understanding: Using CRISP-DM for Host-based IDS Feature Mining

    Current intrusion detection systems generate a large number of specific alerts but do not provide actionable information. These alerts must often be analyzed by a network defender, a time-consuming and tedious task that may take place hours or days after an attack occurs. Improved understanding of the cyberspace domain can lead to great advancements in cyberspace situational awareness research and development. This thesis applies the Cross Industry Standard Process for Data Mining (CRISP-DM) to develop an understanding of a host system under attack. Data is generated by launching scans and exploits at a machine outfitted with a set of host-based data collectors. Through knowledge discovery, features are identified within the collected data which can be used to enhance host-based intrusion detection. By discovering relationships between the collected data and the events, human understanding of the activity is demonstrated. This method of searching for hidden relationships between sensors greatly enhances understanding of new attacks and vulnerabilities, bolstering our ability to defend the cyberspace domain.
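    A typical data-preparation step in this spirit is to aggregate raw host sensor events into per-time-window feature vectors that can then be correlated with known attack windows. The pandas sketch below illustrates that step with hypothetical sensors and timestamps; the actual collectors, features, and labels come from the thesis experiments.

```python
# Aggregate raw host sensor events into per-minute counts per sensor, the kind
# of feature vector that can later be correlated with attack activity.
import pandas as pd

events = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-01-01 10:00:05", "2024-01-01 10:00:40",
         "2024-01-01 10:01:10", "2024-01-01 10:01:12"]),
    "sensor": ["proc_create", "failed_login", "proc_create", "port_open"],
})

features = (
    events
    .set_index("timestamp")
    .groupby([pd.Grouper(freq="1min"), "sensor"])
    .size()
    .unstack(fill_value=0)   # one column per sensor, one row per minute
)
print(features)
```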

    The Terminator: an AI-based framework to handle dependability threats in large-scale distributed systems

    With the advent of resource-hungry applications such as scientific simulations and artificial intelligence (AI), the need for high-performance computing (HPC) infrastructure is becoming more pressing. HPC systems are typically characterised by the scale of the resources they possess, containing a large number of sophisticated hardware components that are tightly integrated. This scale and design complexity inherently introduce sources of uncertainty, i.e., dependability threats that perturb the system during application execution. During execution, these HPC systems generate massive amounts of log messages that capture the health status of their various components. Several previous works have leveraged such system logs for dependability purposes, such as failure prediction, with varying results. In this work, three novel AI-based techniques are proposed to address two major dependability problems: (i) error detection and (ii) failure prediction. The proposed error detection technique leverages the sentiments embedded in log messages in a novel way, making the approach HPC-system-independent, i.e., the technique can be used to detect errors in any HPC system. In addition, two novel self-supervised transformer neural networks are developed for failure prediction, thereby obviating the need for labels, which are notoriously difficult to obtain in HPC systems. The first transformer technique, called Clairvoyant, accurately predicts the location of the failure, while the second technique, called Time Machine, extends Clairvoyant by also accurately predicting the lead time to failure (LTTF). Time Machine addresses the typical regression problem of LTTF as a novel multi-class classification problem, using a novel oversampling method for online time-based task training. Results from datasets of six real-world HPC clusters show that our approaches significantly outperform state-of-the-art methods on various metrics.
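    The intuition behind sentiment-based error detection can be conveyed with a crude lexicon heuristic: log lines that read negatively are flagged as candidate errors. The word lists, threshold, and log lines below are made up, and the thesis uses a trained sentiment model rather than a fixed lexicon; the transformer-based Clairvoyant and Time Machine predictors are not sketched here.

```python
# Toy illustration only: flag log lines whose wording carries negative
# "sentiment" as candidate errors. Lexicon, threshold, and logs are invented.
NEGATIVE = {"fail", "failed", "error", "fatal", "unreachable", "timeout", "panic"}
POSITIVE = {"ok", "success", "completed", "healthy", "online"}

def sentiment_score(message):
    words = message.lower().replace(":", " ").split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

logs = [
    "node c12-3: link health check completed, status ok",
    "node c07-1: storage mount timeout, retry failed",
    "scheduler: job 81231 completed successfully",
]
for line in logs:
    flag = "ERROR?" if sentiment_score(line) < 0 else "normal"
    print(f"[{flag}] {line}")
```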