
    Mining Event Traces from Real-time Systems for Anomaly Detection

    Real-time systems are a significant class of applications, poised to grow even further as autonomous vehicles and the Internet of Things (IoT) become a reality. The computation and communication tasks of the underlying embedded systems must comply with strict timing and safety requirements, as undetected defects in these systems may lead to catastrophic failures. The runtime behavior of these systems is prone to uncertainties arising from dynamic workloads and extra-functional conditions that affect both the software and the hardware over the course of their deployment, e.g., unscheduled firmware updates, communication channel saturation, power-saving mode switches, or external malicious attacks. Operating in such unpredictable environments prevents traditional formal modeling and analysis techniques from detecting anomalous behavior, because these techniques generally rely on worst-case analysis and tend to be overly conservative. To overcome these limitations, and motivated primarily by the increasing availability of traces generated by real-time embedded systems, this thesis presents TRACMIN (Trace Mining using Arrival Curves), an anomaly detection approach that empirically constructs arrival curves from event traces to capture the recurrent behavior and intrinsic features of a given real-time system. The thesis uses TRACMIN to fill the gap between formal analysis techniques for real-time systems and trace mining approaches, which lack expressive, human-readable, and scalable methods. The thesis presents definitions, metrics, and tools that employ statistical learning techniques to cluster and classify traces generated in different modes of normal operation and to distinguish them from anomalous traces.
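To make concrete what an empirical arrival curve captures, the following minimal sketch computes, for a few window lengths Δ, the maximum number of events observed in any window of that length. It then flags a new trace whose curve exceeds a reference curve. The function names, the choice of half-open windows, and the single exceedance rule are assumptions for illustration only. The thesis describes clustering and classification with statistical learning over such curves rather than this threshold test, and this naive O(n log n)-per-Δ loop is not the thesis's fast construction algorithm.

```python
from bisect import bisect_left
from typing import List, Sequence


def upper_arrival_count(timestamps: Sequence[float], delta: float) -> int:
    """Maximum number of events in any half-open window [t, t + delta) of the trace."""
    ts = sorted(timestamps)
    best = 0
    for i, start in enumerate(ts):
        # A busiest window can always be shifted right to start at an event,
        # so only event timestamps need to be tried as window starts.
        end = bisect_left(ts, start + delta, lo=i)
        best = max(best, end - i)
    return best


def empirical_upper_curve(timestamps: Sequence[float], deltas: Sequence[float]) -> List[int]:
    """Sample the empirical upper arrival curve at the given window lengths."""
    return [upper_arrival_count(timestamps, d) for d in deltas]


def exceeds(reference: Sequence[int], candidate: Sequence[int], tolerance: int = 0) -> bool:
    """Hypothetical anomaly rule: flag the candidate if it exceeds the reference anywhere."""
    return any(c > r + tolerance for r, c in zip(reference, candidate))


if __name__ == "__main__":
    deltas = [1.0, 3.0, 5.0]
    normal = empirical_upper_curve([0, 1, 2, 10, 11], deltas)
    bursty = empirical_upper_curve([0, 0.5, 1, 1.5, 2], deltas)
    print(normal, bursty, exceeds(normal, bursty))
```

A burstier trace packs more events into short windows, so its curve rises above the reference at small Δ. That is the kind of deviation in recurrent behavior that arrival curves make visible.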
Experiments with multiple datasets from deployed real-time embedded systems affected by performance degradation and hardware misconfiguration anomalies demonstrate the feasibility and viability of our approaches on timestamped event traces generated by an industrial real-time operating system. Because constructing empirical arrival curves is computationally expensive, the thesis provides a fast algorithm that scales to lengthy traces, paving the way for adoption in research and industry. Finally, the thesis presents a robustness analysis of the arrival curve models, drawing on the theory of demand-bound functions from the scheduling domain. The analysis bounds how much disruption a real-time system modeled with our approach can tolerate before it is declared anomalous, which is crucial for specification and certification purposes. In conclusion, TRACMIN combines empirical and theoretical methods into a concrete anomaly detection framework that uses robust arrival curve models, constructed scalably from event traces, to detect anomalies that affect the recurrent behavior of a real-time system.
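The abstract does not spell out the robustness bounds themselves. The sketch below therefore shows only the standard demand-bound function for sporadic tasks, dbf(t) = Σᵢ max(0, ⌊(t − Dᵢ)/Tᵢ⌋ + 1)·Cᵢ, together with a hypothetical `min_slack` helper that reports how far demand stays below the available time at deadline points. Under EDF on one processor, a task set is feasible iff dbf(t) ≤ t for all t. Here, checking only deadline points up to a `horizon` is left to the caller, who must choose that horizon large enough. The class and function names are illustrative assumptions, not the thesis's analysis.

```python
from dataclasses import dataclass
from math import floor
from typing import Sequence


@dataclass(frozen=True)
class SporadicTask:
    wcet: float      # C_i: worst-case execution time
    deadline: float  # D_i: relative deadline
    period: float    # T_i: minimum inter-arrival time


def dbf(tasks: Sequence[SporadicTask], t: float) -> float:
    """Maximum execution demand of jobs whose release and deadline both fit in a window of length t."""
    total = 0.0
    for task in tasks:
        if t >= task.deadline:
            jobs = floor((t - task.deadline) / task.period) + 1
            total += jobs * task.wcet
    return total


def min_slack(tasks: Sequence[SporadicTask], horizon: float) -> float:
    """Hypothetical margin: smallest t - dbf(t) over deadline points up to horizon (negative = overload)."""
    points = {
        task.deadline + k * task.period
        for task in tasks
        # An empty range when horizon < deadline simply contributes no points.
        for k in range(int((horizon - task.deadline) // task.period) + 1)
    }
    if not points:
        raise ValueError("horizon is shorter than every relative deadline")
    return min(t - dbf(tasks, t) for t in points)
```

A positive margin indicates how much extra demand the task set could absorb at its tightest point. This is loosely the kind of "how much disruption before failure" question the robustness analysis addresses.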

    SLA Calculus

    This work presents a deterministic modeling method with efficient analysis for modeling Service-Oriented Architectures (SOAs) and validating their worst-case performance guarantees. Upper and lower bounds for delay and workload are used to describe performance contracts. The SLA Calculus makes it possible to combine model descriptions of single systems and to derive bounds on the reaction time and capacity of composed systems by analytic means. The intended, but not exclusive, modeling domain for SLA Calculus is distributed software systems with reaction time constraints. SOA is a system design paradigm that encapsulates software functions in service applications. Because services have standardized interfaces and are accessible via networks, large systems can be composed from smaller services and offered as services again. A well-known implementation of the service paradigm is Web Services, which allow building applications from components connected over the Internet. Users can transparently combine their own services with those rented from providers. Performance guarantees for SOAs become more important as systems grow more complex and are used in business environments. When a customer rents a service, the provider agrees to a Service Level Agreement (SLA) with conditions concerning interface, pricing, and performance. Service reaction time, in the form of delay, is an important part of many SLAs and is the subject of the performance models discussed in this work. Through SLAs, providers commit to a maximum delay for their products as long as the customer limits the workload on their systems. Hence, customers expect the contracted provider to deliver the stated performance unless the workload exceeds the SLA. Since contract penalties may apply, providers have a natural interest in dimensioning their service with regard to the SLA. The worst-case delay must hold even for the maximum workload specified in the contract.
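The contract logic described above can be sketched as follows: the delay promise binds only while the customer's workload stays within the agreed limit. This minimal sketch rests on assumptions. It expresses the workload limit as a token-bucket arrival curve b + r·Δ, with burst `b` and rate `r`, and the delay promise as a single maximum. Both are hypothetical simplifications of an SLA, and the function names are illustrative.

```python
from typing import Optional, Sequence


def within_workload(arrivals: Sequence[float], burst: float, rate: float) -> bool:
    """True if every interval [t_i, t_j] holds at most burst + rate * (t_j - t_i) requests."""
    ts = sorted(arrivals)
    for i in range(len(ts)):
        for j in range(i, len(ts)):
            # j - i + 1 requests arrive in an interval of length ts[j] - ts[i].
            if (j - i + 1) > burst + rate * (ts[j] - ts[i]):
                return False
    return True


def sla_delay_met(arrivals: Sequence[float], delays: Sequence[float],
                  burst: float, rate: float, max_delay: float) -> Optional[bool]:
    """Check the delay promise; None means the workload left the contract, so no promise applies."""
    if not within_workload(arrivals, burst, rate):
        return None
    return all(d <= max_delay for d in delays)
```

Returning `None` instead of `False` keeps the two ways a contract can be "broken" apart: the provider missing its delay bound versus the customer exceeding the contracted workload.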
Moreover, due to the compositional nature of Web Services, customers become providers themselves when they offer their service compositions to others. Again, worst-case performance bounds are of major interest here. Analyzing models of SOAs is one way to plan, dimension, and validate service performance. Many methods exist for system modeling and analysis. Queueing systems and simulation are two well-known approaches in computer science. They readily provide average, and thus long-term, performance figures from probabilistic descriptions of the workload and service processes. Deriving system behavior in worst-case situations for performance guarantees, however, is laborious and can be impossible for more complex systems. Obtaining delay bounds usable in SLAs for SOAs through model analysis is still an open research problem. A promising candidate for modeling SOAs with SLAs is Network Calculus, an analytical method for deriving performance bounds for network components. Given deterministic descriptions of the arrivals to and the service in a network node, it computes hard bounds on network delay and on the buffer memory required in routers. It also allows a fine-grained separation between short- and long-term goals. Network Calculus models further support the composition of elements and fast analytical evaluation. When Network Calculus is applied to SOAs with SLAs, however, a problem arises: SLAs are not suitable as a system description and information source for Network Calculus models. In particular, SLAs do not expose the internal service capacity, since providers consider it a business secret. Without service process descriptions, Network Calculus models cannot be analyzed. The SLA Calculus is presented as a solution to this problem. As a novel contribution to the deterministic model analysis of SOAs, SLA Calculus extends Network Calculus. Instead of service process descriptions, it uses information on latency to characterize a system.
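To show what "hard bounds for delay and buffer memory" and "composition of elements" mean in classic Network Calculus, the sketch below uses the textbook pairing of a token-bucket arrival curve α(t) = b + r·t and a rate-latency service curve β(t) = R·max(0, t − T). For r ≤ R, the delay bound (horizontal deviation) is T + b/R and the backlog bound (vertical deviation) is b + r·T. Nodes in series combine by min-plus convolution, which for rate-latency curves yields the minimum rate and the summed latency. This illustrates only the Network Calculus that SLA Calculus extends. The SLA Calculus itself replaces β with delay curves because SLAs do not reveal it. The class names and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenBucket:
    """Arrival curve alpha(t) = burst + rate * t."""
    burst: float
    rate: float


@dataclass(frozen=True)
class RateLatency:
    """Service curve beta(t) = rate * max(0, t - latency)."""
    rate: float
    latency: float


def concatenate(*nodes: RateLatency) -> RateLatency:
    """Min-plus convolution of rate-latency curves, i.e. nodes traversed in series."""
    return RateLatency(min(n.rate for n in nodes), sum(n.latency for n in nodes))


def _check_stable(a: TokenBucket, s: RateLatency) -> None:
    if a.rate > s.rate:
        raise ValueError("arrival rate exceeds service rate; no finite bound")


def delay_bound(a: TokenBucket, s: RateLatency) -> float:
    """Horizontal deviation between alpha and beta: latency + burst / rate."""
    _check_stable(a, s)
    return s.latency + a.burst / s.rate


def backlog_bound(a: TokenBucket, s: RateLatency) -> float:
    """Vertical deviation between alpha and beta: the buffer a node needs."""
    _check_stable(a, s)
    return a.burst + a.rate * s.latency


if __name__ == "__main__":
    path = concatenate(RateLatency(5.0, 1.0), RateLatency(4.0, 0.5))
    flow = TokenBucket(10.0, 2.0)
    print(path, delay_bound(flow, path), backlog_bound(flow, path))
```

Concatenating before bounding ("pay bursts only once") gives a tighter end-to-end delay than summing per-node bounds. This is one reason composition is attractive in this family of methods.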
Service delay is no longer a scalar analysis result; it becomes a process over time that is bounded by Network Calculus-style curves, the delay curves. Together with arrival curves, delay curves formalize the performance contracts in SLAs as so-called SLA Delay Properties (SDPs), which describe worst-case service performance. Service composition can be modeled by serial and parallel combination of SDPs. The necessary theorems for the resulting worst-case bounds are given and proved. We also present a method to translate these performance figures back into the missing service process description. Beyond the basic theory, we also consider solutions for practical modeling situations. An algorithm that extracts arrival and delay curves from measurements enables the modeler to include existing systems without given SLAs as model elements. Finally, we sketch a service selection method, formulated as an optimization problem, to support dynamic service selection in SOAs with a Service Broker. SLA Calculus model analysis delivers deterministic upper and lower bounds for workload capacities and response times. For upper bounds, the worst case is assumed, so these bounds are pessimistic. The advantage of SLA Calculus is that it computes these bounds very quickly and gives system modelers a fast overview of system characteristics under extreme conditions. Other modeling methods would require a lengthy transient analysis. The strict worst-case perspective also revealed another analysis target: so far, relatively little attention has been paid to contract conformance between consecutive services within service compositions. When services offer different workload capacities, the arrival rate to the system must be adjusted to avoid bottlenecks. Additionally, no response time contract can be guaranteed for a service composition without internal buffering that enforces a common arrival rate.
SLA Calculus reveals the necessary buffer delays and can bound them.
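Enforcing a common arrival rate with internal buffering can be illustrated with a greedy token-bucket shaper, a standard Network Calculus element. Each request is held until a token is available, and the holding time is the buffer delay. The abstract does not say how SLA Calculus computes or bounds these delays, so this is only a sketch under assumptions. It assumes FIFO order and one token per request, and `burst` and `rate` stand for the hypothetical common rate of the composition. The function names are illustrative.

```python
from typing import List, Sequence


def shaped_release_times(arrivals: Sequence[float], burst: float, rate: float) -> List[float]:
    """Greedy token-bucket shaper: release each request (FIFO) as soon as a token is available."""
    if burst < 1 or rate <= 0:
        raise ValueError("need burst >= 1 token and a positive rate")
    ts = sorted(arrivals)
    releases: List[float] = []
    tokens = float(burst)                 # the bucket starts full
    clock = ts[0] if ts else 0.0          # time of the last token update
    for t in ts:
        start = max(t, clock)             # FIFO: never overtake the previous request
        tokens = min(burst, tokens + rate * (start - clock))
        if tokens < 1:
            start += (1 - tokens) / rate  # wait until one whole token has accumulated
            tokens = 1.0
        tokens -= 1                       # spend the token on this request
        clock = start
        releases.append(start)
    return releases


def buffer_delays(arrivals: Sequence[float], burst: float, rate: float) -> List[float]:
    """Per-request waiting time in the shaping buffer, in sorted arrival order."""
    ts = sorted(arrivals)
    return [r - a for r, a in zip(shaped_release_times(ts, burst, rate), ts)]
```

For example, four simultaneous requests through a shaper with `burst=2` and `rate=1` pass the first two immediately and hold the others for 1 and 2 time units. The largest of these waits is the buffer delay the composition has to budget for.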