2,425 research outputs found

    Stateful Detection in High Throughput Distributed Systems

    Get PDF
    With the increasing speed of computers, complexity of applications and large scale of applications, many of today’s distributed systems exchange data at a high rate. It is important to provide error detection capabilities to such applications that provide critical functionality. Significant prior work has been done in software implemented error detection achieved through a fault tolerance system separate from the application system. However, the high rate of data coupled with complex detection can cause the capacity of the fault tolerance system to be exhausted resulting in low detection accuracy. This is particularly the case when the detection is done against rules based on state that has been generated in the system. We present a new stateful detection mechanism which is based on observing messages exchanged between the protocol participants, deducing the application state from them, and matching against anomaly based rules. We have previously shown the capacity constraint of the detection framework called the Monitor. Here we extend the Monitor framework to incorporate a sampling approach which adjusts the rate of messages to be verified by sampling the incoming application stream of messages. The adjustment is such that the breakdown in the Monitor capacity is avoided. The cost of processing each message increases because the application state is no longer accurately known at the Monitor. However, the overall detection cost is reduced due to the lower rate of messages processed. We show that even with sampling, the Monitor is able to track the possible state of the protocol entity and provide stateful detection. We implement the approach and apply it to a reliable multicast protocol called TRAM. We demonstrate the gains of the approach by comparing the latency and accuracy of fault detection to the baseline Monitor system

    P4CEP: Towards In-Network Complex Event Processing

    Full text link
    In-network computing using programmable networking hardware is a strong trend in networking that promises to reduce latency and consumption of server resources through offloading to network elements (programmable switches and smart NICs). In particular, the data plane programming language P4 together with powerful P4 networking hardware has spawned projects offloading services into the network, e.g., consensus services or caching services. In this paper, we present a novel case for in-network computing, namely, Complex Event Processing (CEP). CEP processes streams of basic events, e.g., stemming from networked sensors, into meaningful complex events. Traditionally, CEP processing has been performed on servers or overlay networks. However, we argue in this paper that CEP is a good candidate for in-network computing along the communication path avoiding detouring streams to distant servers to minimize communication latency while also exploiting processing capabilities of novel networking hardware. We show that it is feasible to express CEP operations in P4 and also present a tool to compile CEP operations, formulated in our P4CEP rule specification language, to P4 code. Moreover, we identify challenges and problems that we have encountered to show future research directions for implementing full-fledged in-network CEP systems.Comment: 6 pages. Author's versio

    SecSip: A Stateful Firewall for SIP-based Networks

    Get PDF
    SIP-based networks are becoming the de-facto standard for voice, video and instant messaging services. Being exposed to many threats while playing an major role in the operation of essential services, the need for dedicated security management approaches is rapidly increasing. In this paper we present an original security management approach based on a specific vulnerability aware SIP stateful firewall. Through known attack descriptions, we illustrate the power of the configuration language of the firewall which uses the capability to specify stateful objects that track data from multiple SIP elements within their lifetime. We demonstrate through measurements on a real implementation of the firewall its efficiency and performance

    StreamLearner: Distributed Incremental Machine Learning on Event Streams: Grand Challenge

    Full text link
    Today, massive amounts of streaming data from smart devices need to be analyzed automatically to realize the Internet of Things. The Complex Event Processing (CEP) paradigm promises low-latency pattern detection on event streams. However, CEP systems need to be extended with Machine Learning (ML) capabilities such as online training and inference in order to be able to detect fuzzy patterns (e.g., outliers) and to improve pattern recognition accuracy during runtime using incremental model training. In this paper, we propose a distributed CEP system denoted as StreamLearner for ML-enabled complex event detection. The proposed programming model and data-parallel system architecture enable a wide range of real-world applications and allow for dynamically scaling up and out system resources for low-latency, high-throughput event processing. We show that the DEBS Grand Challenge 2017 case study (i.e., anomaly detection in smart factories) integrates seamlessly into the StreamLearner API. Our experiments verify scalability and high event throughput of StreamLearner.Comment: Christian Mayer, Ruben Mayer, and Majd Abdo. 2017. StreamLearner: Distributed Incremental Machine Learning on Event Streams: Grand Challenge. In Proceedings of the 11th ACM International Conference on Distributed and Event-based Systems (DEBS '17), 298-30

    Integrating Scale Out and Fault Tolerance in Stream Processing using Operator State Management

    Get PDF
    As users of big data applications expect fresh results, we witness a new breed of stream processing systems (SPS) that are designed to scale to large numbers of cloud-hosted machines. Such systems face new challenges: (i) to benefit from the pay-as-you-go model of cloud computing, they must scale out on demand, acquiring additional virtual machines (VMs) and parallelising operators when the workload increases; (ii) failures are common with deployments on hundreds of VMs - systems must be fault-tolerant with fast recovery times, yet low per-machine overheads. An open question is how to achieve these two goals when stream queries include stateful operators, which must be scaled out and recovered without affecting query results. Our key idea is to expose internal operator state explicitly to the SPS through a set of state management primitives. Based on them, we describe an integrated approach for dynamic scale out and recovery of stateful operators. Externalised operator state is checkpointed periodically by the SPS and backed up to upstream VMs. The SPS identifies individual operator bottlenecks and automatically scales them out by allocating new VMs and partitioning the check-pointed state. At any point, failed operators are recovered by restoring checkpointed state on a new VM and replaying unprocessed tuples. We evaluate this approach with the Linear Road Benchmark on the Amazon EC2 cloud platform and show that it can scale automatically to a load factor of L=350 with 50 VMs, while recovering quickly from failures. Copyright © 2013 ACM

    LightBox: Full-stack Protected Stateful Middlebox at Lightning Speed

    Full text link
    Running off-site software middleboxes at third-party service providers has been a popular practice. However, routing large volumes of raw traffic, which may carry sensitive information, to a remote site for processing raises severe security concerns. Prior solutions often abstract away important factors pertinent to real-world deployment. In particular, they overlook the significance of metadata protection and stateful processing. Unprotected traffic metadata like low-level headers, size and count, can be exploited to learn supposedly encrypted application contents. Meanwhile, tracking the states of 100,000s of flows concurrently is often indispensable in production-level middleboxes deployed at real networks. We present LightBox, the first system that can drive off-site middleboxes at near-native speed with stateful processing and the most comprehensive protection to date. Built upon commodity trusted hardware, Intel SGX, LightBox is the product of our systematic investigation of how to overcome the inherent limitations of secure enclaves using domain knowledge and customization. First, we introduce an elegant virtual network interface that allows convenient access to fully protected packets at line rate without leaving the enclave, as if from the trusted source network. Second, we provide complete flow state management for efficient stateful processing, by tailoring a set of data structures and algorithms optimized for the highly constrained enclave space. Extensive evaluations demonstrate that LightBox, with all security benefits, can achieve 10Gbps packet I/O, and that with case studies on three stateful middleboxes, it can operate at near-native speed.Comment: Accepted at ACM CCS 201

    Lightweight Asynchronous Snapshots for Distributed Dataflows

    Full text link
    Distributed stateful stream processing enables the deployment and execution of large scale continuous computations in the cloud, targeting both low latency and high throughput. One of the most fundamental challenges of this paradigm is providing processing guarantees under potential failures. Existing approaches rely on periodic global state snapshots that can be used for failure recovery. Those approaches suffer from two main drawbacks. First, they often stall the overall computation which impacts ingestion. Second, they eagerly persist all records in transit along with the operation states which results in larger snapshots than required. In this work we propose Asynchronous Barrier Snapshotting (ABS), a lightweight algorithm suited for modern dataflow execution engines that minimises space requirements. ABS persists only operator states on acyclic execution topologies while keeping a minimal record log on cyclic dataflows. We implemented ABS on Apache Flink, a distributed analytics engine that supports stateful stream processing. Our evaluation shows that our algorithm does not have a heavy impact on the execution, maintaining linear scalability and performing well with frequent snapshots.Comment: 8 pages, 7 figure
    • …
    corecore