6,291 research outputs found

    An XML Framework for Integrating Continuous Queries, Composite Event Detection, and Database Condition Monitoring for Multiple Data Streams

    With advancements in technology over the last ten years, data management has evolved from handling stored, persistent data to also handling streaming data generated by sensors and other software monitoring tools. Furthermore, distributed, event-based systems are becoming more prevalent, with a need to develop applications that can dynamically respond to information extracted from data streams. This research investigates the integration of stream processing and event processing techniques, with expressive filtering capabilities that include queries over persistent databases to provide application context to the filtering process. Distributed Event Processing Agents (DEPAs) continuously filter events from multiple data streams of different formats that provide XML views. Composite events over data streams are expressed using the Composite Event Detection Language (CEDL) and mapped to Composite XQuery (CXQ) for implementation. CXQ is a language that extends XQuery with features from CEDL, including operators for expressing sequence, disjunction, conjunction, repetition, aggregation, and time windows for events. Continuous queries and composite event filters are integrated with techniques for materialized view maintenance and incremental evaluation in condition monitoring to provide efficient ways of enhancing stream filters with database queries. The filtering and event detection load is distributed among multiple DEPAs, with CXQ expressions decomposed so that subcomponents of an expression are allocated to DEPAs that communicate efficiently in the global detection of composite events. A unique aspect of our research is that it extends XQuery with temporal, composite event features to combine techniques for continuous queries in stream processing, incremental evaluation in condition monitoring, and detection and filtering of composite events, creating an expressive environment for the extraction of meaningful events from multiple data streams with XML views.
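
    The flavor of composite event detection described above can be illustrated with a minimal sketch: a detector for the pattern "A followed by B within a time window" over a single event stream. The Event and SequenceDetector names, the event kinds, and the window semantics are illustrative assumptions, not the paper's CEDL/CXQ operators or their XQuery integration.

# Minimal sketch of composite event detection over a stream, under the
# assumptions stated above; not the paper's CEDL/CXQ syntax.
from dataclasses import dataclass
from collections import deque

@dataclass
class Event:
    kind: str        # e.g. "door_open", "motion"
    timestamp: float
    payload: dict

class SequenceDetector:
    """Detects the composite event 'first_kind followed by second_kind within `window` seconds'."""
    def __init__(self, first_kind, second_kind, window):
        self.first_kind = first_kind
        self.second_kind = second_kind
        self.window = window
        self.pending = deque()   # unmatched occurrences of the first event

    def feed(self, event):
        # Drop pending first-events that have fallen out of the time window.
        while self.pending and event.timestamp - self.pending[0].timestamp > self.window:
            self.pending.popleft()
        if event.kind == self.first_kind:
            self.pending.append(event)
        elif event.kind == self.second_kind and self.pending:
            first = self.pending.popleft()
            return (first, event)   # composite event detected
        return None

detector = SequenceDetector("door_open", "motion", window=30.0)
for ev in [Event("door_open", 0.0, {}), Event("motion", 12.5, {})]:
    match = detector.feed(ev)
    if match:
        print("composite event:", match)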

    Distributed Inference and Query Processing for RFID Tracking and Monitoring

    In this paper, we present the design of a scalable, distributed stream processing system for RFID tracking and monitoring. Since RFID data lacks the containment and location information that is key to query processing, we propose to combine location and containment inference with stream query processing in a single architecture, with inference as an enabling mechanism for high-level query processing. We further consider the challenges of instantiating such a system in large distributed settings and design techniques for distributed inference and query processing. Our experimental results, using both real-world data and large synthetic traces, demonstrate the accuracy, efficiency, and scalability of our proposed techniques.
    Comment: VLDB201
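
    As a rough illustration of containment inference over RFID reads, the sketch below infers that an item sits inside a case when the two tags are co-read by the same reader often enough. The co-read threshold, data layout, and function names are assumptions for illustration, not the paper's inference model or distributed architecture.

# Minimal sketch of co-occurrence-based containment inference from RFID reads,
# under the illustrative assumptions stated above.
from collections import defaultdict

CO_READ_THRESHOLD = 3  # assumed: co-reads needed before containment is inferred

def infer_containment(reads):
    """reads: iterable of (timestamp, reader_id, tag_id, tag_type) tuples,
    where tag_type is 'item' or 'case'."""
    by_key = defaultdict(lambda: {"item": set(), "case": set()})
    for ts, reader, tag, tag_type in reads:
        by_key[(ts, reader)][tag_type].add(tag)
    co_reads = defaultdict(int)
    for group in by_key.values():
        # Count every item/case pair seen together by the same reader at the same time.
        for item in group["item"]:
            for case in group["case"]:
                co_reads[(item, case)] += 1
    # Keep only pairs observed together often enough.
    return {pair for pair, n in co_reads.items() if n >= CO_READ_THRESHOLD}

reads = [
    (t, "dock_door_1", tag, kind)
    for t in range(4)
    for tag, kind in [("item_42", "item"), ("case_7", "case")]
]
print(infer_containment(reads))   # {('item_42', 'case_7')}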

    Towards an Architecture for Efficient Distributed Search of Multimodal Information

    The creation of very large-scale multimedia search engines, with more than one billion images and videos, is a pressing need of digital societies where data is generated by multiple connected devices. Distributing search indexes in cloud environments is the inevitable solution to deal with the increasing scale of image and video collections. The distribution of such indexes in this setting raises multiple challenges, such as the even partitioning of the data space, load balancing across index nodes, and the fusion of results computed over multiple nodes. The main question behind this thesis is how to reduce and distribute the computational complexity of multimedia retrieval. This thesis studies the extension of sparse hash inverted indexing to distributed settings. The main goal is to ensure that indexes are uniformly distributed across computing nodes while keeping similar documents on the same nodes. Load balancing is performed at both the node and index level, to guarantee that the retrieval process is not delayed by nodes that have to inspect larger subsets of the index. Multimodal search requires the combination of the search results from individual modalities and document features. This thesis studies rank fusion techniques focused on reducing complexity by automatically selecting only the features that improve retrieval effectiveness. The achievements of this thesis span both distributed indexing and rank fusion research. Experiments across multiple datasets show that sparse hashes can be used to distribute documents and queries across index entries, and hence across nodes, in a balanced and redundant manner. Rank fusion results show that it is possible to reduce retrieval complexity and improve efficiency by searching only a subset of the feature indexes.
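
    A minimal sketch of the sparse-hash idea, assuming random hyperplane hashing and a simple bucket-to-node mapping: each document activates a few hash buckets, and its postings are routed to (and possibly replicated on) the nodes owning those buckets. The sizes, hashing scheme, and routing rule are illustrative assumptions, not the thesis's actual index.

# Minimal sketch of routing documents to index nodes via a sparse hash,
# under the illustrative assumptions stated above.
import numpy as np

rng = np.random.default_rng(0)
DIM, BITS, NODES, ACTIVE = 128, 16, 4, 3   # assumed sizes

hyperplanes = rng.standard_normal((BITS, DIM))

def sparse_hash(vec, k=ACTIVE):
    """Return the k hash buckets where the feature vector responds most strongly."""
    scores = hyperplanes @ vec
    return set(np.argsort(-np.abs(scores))[:k].tolist())

def route(vec):
    """Map each active bucket to an index node; a document whose buckets map to
    several nodes gets replicated, keeping similar documents co-located."""
    assignment = {}
    for bucket in sparse_hash(vec):
        assignment.setdefault(bucket % NODES, set()).add(bucket)
    return assignment

vec = rng.standard_normal(DIM)
print(route(vec))   # e.g. {2: {14}, 1: {9}, 0: {4}}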

    Spatial Index for Uncertain Time Series

    A search for patterns in uncertain time series is time-expensive in today's large databases using the currently available methods. To accelerate the search process for uncertain time series data, in this paper we explore a spatial index structure, which uses uncertain information stored in minimum bounding rectangles and improves the general prune/search process along the path from the root to the leaves. To get better performance, we normalize the uncertain time series using the weighted variance before the prune/hit process. Meanwhile, we add two goodness measures with respect to the variance to improve robustness. Extensive experiments show that, compared with the primitive probabilistic similarity search algorithm, the prune/hit process of the spatial index can be more efficient and robust using the specific preprocessing and variant index operations, with only a small loss of accuracy.
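
    The prune/search step over minimum bounding rectangles can be sketched as follows, assuming plain Euclidean distance and a standard per-dimension lower bound; the paper's uncertainty handling and weighted-variance normalization are not reproduced here, and all names are illustrative.

# Minimal sketch of MBR-based pruning for similarity search, under the
# assumptions stated above.
import math

def mbr_lower_bound(query, lower, upper):
    """Lower-bound the distance from `query` to any series inside the MBR
    defined per dimension by [lower[i], upper[i]]."""
    total = 0.0
    for q, lo, hi in zip(query, lower, upper):
        if q < lo:
            total += (lo - q) ** 2
        elif q > hi:
            total += (q - hi) ** 2
        # otherwise q falls inside the interval and contributes 0
    return math.sqrt(total)

def search(query, nodes, candidates):
    """Skip index nodes whose MBR lower bound already exceeds the best match."""
    best = math.inf
    for node in nodes:
        if mbr_lower_bound(query, node["lower"], node["upper"]) >= best:
            continue   # the whole node can be pruned
        for series in candidates[node["id"]]:
            best = min(best, math.dist(query, series))
    return best

nodes = [{"id": 0, "lower": [0, 0, 0], "upper": [1, 1, 1]}]
candidates = {0: [[0.2, 0.4, 0.9], [1.0, 1.0, 1.0]]}
print(search([0.1, 0.5, 0.8], nodes, candidates))   # ~0.173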

    StreamingHub: Interactive Stream Analysis Workflows

    Reusable data/code and reproducible analyses are foundational to quality research. This aspect, however, is often overlooked when designing interactive stream analysis workflows for time-series data (e.g., eye-tracking data). A mechanism to transmit informative metadata alongside data may allow such workflows to intelligently consume data, propagate metadata to downstream tasks, and thereby auto-generate reusable, reproducible analytic outputs with zero supervision. Moreover, a visual programming interface to design, develop, and execute such workflows may allow rapid prototyping for interdisciplinary research. Capitalizing on these ideas, we propose StreamingHub, a framework to build metadata-propagating, interactive stream analysis workflows using visual programming. We conduct two case studies to evaluate the generalizability of our framework. Simultaneously, we use two heuristics to evaluate their computational fluidity and data growth. Results show that our framework generalizes to multiple tasks with minimal performance overhead.
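
    A minimal sketch of metadata propagation through a two-task stream pipeline, assuming (samples, metadata) tuples flow between tasks; the task names and metadata fields are illustrative, not StreamingHub's actual interfaces or data formats.

# Minimal sketch of metadata propagation through a stream analysis pipeline,
# under the illustrative assumptions stated above.
def bandpass_filter(samples, metadata):
    filtered = [s for s in samples if 0.1 <= s <= 0.9]   # stand-in for a real filter
    # Record what was done so downstream tasks (and the final output) stay reproducible.
    meta = {**metadata, "steps": metadata.get("steps", []) + ["bandpass(0.1, 0.9)"]}
    return filtered, meta

def summarize(samples, metadata):
    summary = {"n": len(samples), "mean": sum(samples) / len(samples) if samples else None}
    meta = {**metadata, "steps": metadata.get("steps", []) + ["summarize"]}
    return summary, meta

stream = [0.05, 0.3, 0.7, 0.95]
metadata = {"device": "eye-tracker-01", "sampling_rate_hz": 60}

data, metadata = bandpass_filter(stream, metadata)
result, metadata = summarize(data, metadata)
print(result)     # {'n': 2, 'mean': 0.5}
print(metadata)   # original fields plus the provenance of both steps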

    Attribute Relationship Analysis in Outlier Mining and Stream Processing

    The main theme of this thesis is to unite two important fields of data analysis: outlier mining and attribute relationship analysis. In this work we establish the connection between these two fields and present techniques that exploit it, allowing us to improve outlier detection in high-dimensional data. In the second part of the thesis we extend our work to the emerging topic of data streams.

    Living analytics methods for the social web

    [no abstract]