4,544 research outputs found

    Scalable Multi-Parameter Outlier Detection Technology

    Get PDF
    The real-time detection of anomalous phenomena on streaming data has become increasingly important for applications ranging from fraud detection, financial analysis to traffic management. In these streaming applications, often a large number of similar continuous outlier detection queries are executed concurrently. In the light of the high algorithmic complexity of detecting and maintaining outlier patterns for different parameter settings independently, we propose a shared execution methodology called SOP that handles a large batch of requests with diverse pattern configurations. First, our systematic analysis reveals opportunities for maximum resource sharing by leveraging commonalities among outlier detection queries. For that, we introduce a sharing strategy that integrates all computation results into one compact data structure. It leverages temporal relationships among stream data points to prioritize the probing process. Second, this work is the first to consider predicate constraints in the outlier detection context. By distinguishing between target and scope constraints, customized fragment sharing and block selection strategies can be effectively applied to maximize the efficiency of system resource utilization. Our experimental studies utilizing real stream data demonstrate that our approach performs 3 orders of magnitude faster than the start-of-the-art and scales to 1000s of queries

    State-Slice: A New Stream Query Optimization Paradigm for Multi-query and Distributed Processing

    Get PDF
    Modern stream applications necessitate the handling of large numbers of continuous queries specified over high volume data streams. This dissertation proposes novel solutions to continuous query optimization in three core areas of stream query processing, namely state-slice based multiple continuous query sharing, ring-based multi-way join query distribution and scalable distributed multi-query optimization. The first part of the dissertation proposes efficient optimization strategies that utilize the novel state-slicing concept to achieve maximum memory and computation sharing for stream join queries with window constraints. Extensive analytical and experimental evaluations demonstrate that our proposed strategies is capable to minimize the memory or CPU consumptions for multiple join queries. The second part of this dissertation proposes a novel scheme for the distributed execution of generic multi-way joins with window constraints. The proposed scheme partitions the states into disjoint slices in the time domain, and then distributes the fine-grained states in the cluster, forming a virtual computation ring. New challenges to support this distributed state-slicing processing are answered by numerous new techniques. The extensive experimental evaluations show that the proposed strategies achieve significant performance improvements in terms of response time and memory usages for a wide range of configurations and workloads on a real system. Ring based distributed stream query processing and multi-query sharing both are based on the state-slice concept. The third part of this dissertation combines the first two parts of this dissertation work and proposes a novel distributed multi-query optimization technique

    Efficient algorithms for passive network measurement

    Get PDF
    Network monitoring has become a necessity to aid in the management and operation of large networks. Passive network monitoring consists of extracting metrics (or any information of interest) by analyzing the traffic that traverses one or more network links. Extracting information from a high-speed network link is challenging, given the great data volumes and short packet inter-arrival times. These difficulties can be alleviated by using extremely efficient algorithms or by sampling the incoming traffic. This work improves the state of the art in both these approaches. For one-way packet delay measurement, we propose a series of improvements over a recently appeared technique called Lossy Difference Aggregator. A main limitation of this technique is that it does not provide per-flow measurements. We propose a data structure called Lossy Difference Sketch that is capable of providing such per-flow delay measurements, and, unlike recent related works, does not rely on any model of packet delays. In the problem of collecting measurements under the sliding window model, we focus on the estimation of the number of active flows and in traffic filtering. Using a common approach, we propose one algorithm for each problem that obtains great accuracy with significant resource savings. In the traffic sampling area, the selection of the sampling rate is a crucial aspect. The most sensible approach involves dynamically adjusting sampling rates according to network traffic conditions, which is known as adaptive sampling. We propose an algorithm called Cuckoo Sampling that can operate with a fixed memory budget and perform adaptive flow-wise packet sampling. It is based on a very simple data structure and is computationally extremely lightweight. The techniques presented in this work are thoroughly evaluated through a combination of theoretical and experimental analysis.Postprint (published version

    Resource Sharing in Continuous Sliding-Window Aggregates

    Get PDF

    Multi-scale window specification over streaming trajectories

    Get PDF
    Enormous amounts of positional information are collected by monitoring applications in domains such as fleet management cargo transport wildlife protection etc. With the advent of modern location-based services processing such data mostly focuses on providing real-time response to a variety of user requests in continuous and scalable fashion. An important class of such queries concerns evolving trajectories that continuously trace the streaming locations of moving objects like GPS-equipped vehicles commodities with RFID\u27s people with smartphones etc. In this work we propose an advanced windowing operator that enables online incremental examination of recent motion paths at multiple resolutions for numerous point entities. When applied against incoming positions this window can abstract trajectories at coarser representations towards the past while retaining progressively finer features closer to the present. We explain the semantics of such multi-scale sliding windows through parameterized functions that reflect the sequential nature of trajectories and can effectively capture their spatiotemporal properties. Such window specification goes beyond its usual role for non-blocking processing of multiple concurrent queries. Actually it can offer concrete subsequences from each trajectory thus preserving continuity in time and contiguity in space along the respective segments. Further we suggest language extensions in order to express characteristic spatiotemporal queries using windows. Finally we discuss algorithms for nested maintenance of multi-scale windows and evaluate their efficiency against streaming positional data offering empirical evidence of their benefits to online trajectory processing
    • …
    corecore