An XML Framework for Integrating Continuous Queries, Composite Event Detection, and Database Condition Monitoring for Multiple Data Streams
With advancements in technology over the last ten years, data management
issues have evolved from a stored persistent form to also include streaming
data generated from sensors and other software monitoring tools.
Furthermore, distributed, event-based systems are becoming more prevalent,
with a need to develop applications that can dynamically respond to information extracted from data streams. This research is investigating the
integration of stream processing and event processing techniques, with
expressive filtering capabilities that include queries over persistent databases
to provide application context to the filtering process. Distributed Event
Processing Agents (DEPAs) continuously filter events from multiple data
streams of different formats that provide XML views. Composite events for
data streams are expressed using the Composite Event Detection Language (CEDL) and mapped to Composite XQuery (CXQ) for implementation. CXQ is a language that extends XQuery with features from CEDL, including operators for expressing sequence, disjunction, conjunction, repetition, aggregation, and time windows for events. Continuous queries and composite event filters are integrated with techniques for materialized view maintenance and
incremental evaluation in condition monitoring to provide efficient ways of
enhancing stream filters with database queries. The filtering and event
detection load is distributed among multiple DEPAs, with CXQ expressions
decomposed to allocate subcomponents of the expression to DEPAs that
efficiently communicate in the global detection of composite events. A unique
aspect of our research is that it extends XQuery with temporal, composite
event features to combine techniques for continuous queries in stream
processing, incremental evaluation in condition monitoring, and detection and
filtering of composite events, creating an expressive environment for the
extraction of meaningful events from multiple data streams with XML views.
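As a rough illustration of the sequence and time-window operators mentioned in the abstract above, the following Python sketch detects an event of one kind followed by an event of another kind within a fixed window. It is not CEDL or CXQ: the `Event` class, the `detect_sequence` function, and the event kinds used in the usage note are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: a kind label and a timestamp in seconds."""
    kind: str
    timestamp: float


def detect_sequence(
    events: Iterable[Event], first: str, second: str, window: float
) -> List[Tuple[Event, Event]]:
    """Return (a, b) pairs where a `first` event is followed by a `second`
    event no more than `window` seconds later."""
    matches: List[Tuple[Event, Event]] = []
    pending: List[Event] = []  # `first` events that are still inside the window
    for ev in sorted(events, key=lambda e: e.timestamp):
        # Drop candidates that have fallen out of the time window.
        pending = [p for p in pending if ev.timestamp - p.timestamp <= window]
        if ev.kind == second:
            matches.extend((p, ev) for p in pending)
        if ev.kind == first:
            pending.append(ev)
    return matches
```

For example, with events `door_open` at 0 s, `motion` at 3 s, `door_open` at 10 s, and `motion` at 20 s, calling `detect_sequence(events, "door_open", "motion", 5.0)` reports only the first pair, because the second `motion` arrives 10 s after its `door_open`.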
Distributed Inference and Query Processing for RFID Tracking and Monitoring
In this paper, we present the design of a scalable, distributed stream
processing system for RFID tracking and monitoring. Since RFID data lacks
containment and location information that is key to query processing, we
propose to combine location and containment inference with stream query
processing in a single architecture, with inference as an enabling mechanism
for high-level query processing. We further consider challenges in
instantiating such a system in large distributed settings and design techniques
for distributed inference and query processing. Our experimental results, using
both real-world data and large synthetic traces, demonstrate the accuracy,
efficiency, and scalability of our proposed techniques. Comment: VLDB201
Towards an Architecture for Efficient Distributed Search of Multimodal Information
The creation of very large-scale multimedia search engines, with more than one billion
images and videos, is a pressing need of digital societies where data is generated by multiple connected devices. Distributing search indexes in cloud environments is the inevitable solution to deal with the increasing scale of image and video collections. The distribution of such indexes in this setting raises multiple challenges, such as the even partitioning of the data space, load balancing across index nodes, and the fusion of the results computed over multiple nodes. The main question behind this thesis is how to reduce and distribute the computational complexity of multimedia retrieval.
This thesis studies the extension of sparse hash inverted indexing to distributed settings.
The main goal is to ensure that indexes are uniformly distributed across computing nodes while keeping similar documents on the same nodes. Load balancing is performed at both node and index level, to guarantee that the retrieval process is not delayed by nodes that have to inspect larger subsets of the index.
Multimodal search requires the combination of the search results from individual modalities and document features. This thesis studies rank fusion techniques focused on reducing complexity by automatically selecting only the features that improve retrieval effectiveness.
The achievements of this thesis span both distributed indexing and rank fusion research.
Experiments across multiple datasets show that sparse hashes can be used to distribute documents and queries across index entries in a balanced and redundant manner across nodes. Rank fusion results show that it is possible to reduce retrieval complexity and improve efficiency by searching only a subset of the feature indexes.
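The idea of fusing ranked lists from only a subset of feature indexes can be sketched as follows. The abstract does not say which fusion method the thesis uses, so this example swaps in reciprocal rank fusion (RRF), a common generic technique; the function names, the feature names, and the constant `k = 60` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def reciprocal_rank_fusion(rankings: Dict[str, List[str]], k: int = 60) -> List[str]:
    """Fuse per-feature ranked lists with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in.
    Ties are broken by document id so the output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranked_docs in rankings.values():
        for rank, doc in enumerate(ranked_docs, start=1):
            scores[doc] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


def fuse_selected(
    rankings: Dict[str, List[str]], selected: Iterable[str], k: int = 60
) -> List[str]:
    """Fuse only the rankings of the selected features (hypothetical helper)."""
    subset = {name: rankings[name] for name in selected}
    return reciprocal_rank_fusion(subset, k)
```

Searching fewer feature indexes means fewer ranked lists to compute and merge, which is where the complexity reduction would come from; how the features are selected automatically is not described in the abstract and is left out here.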
Spatial Index for Uncertain Time Series
A search for patterns in uncertain time series is time-expensive in today's large databases using the currently available methods. To accelerate the search process for uncertain time series data, in this paper, we explore a spatial index structure, which uses uncertain information stored in minimum bounding rectangles and improves the general prune/search process along the path from the root to the leaves. To get better performance, we normalize the uncertain time series using the weighted variance before the prune/hit process. Meanwhile, we add two goodness measures with respect to the variance to improve the robustness. Extensive experiments show that, compared with the primitive probabilistic similarity search algorithm, the prune/hit process of the spatial index can be more efficient and robust using the specific preprocessing and variant index operations, with only a small loss of accuracy.
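The pruning role of a minimum bounding rectangle (MBR) can be illustrated with a small sketch. This is generic MBR lower-bound pruning for exact, equal-length series under Euclidean distance; it does not model uncertainty, the weighted-variance normalization, or the goodness measures from the paper, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Series = Sequence[float]
Box = Tuple[List[float], List[float]]


def mbr(group: Sequence[Series]) -> Box:
    """Per-dimension lower and upper bounds over a group of equal-length series."""
    lows = [min(vals) for vals in zip(*group)]
    highs = [max(vals) for vals in zip(*group)]
    return lows, highs


def min_dist(query: Series, box: Box) -> float:
    """Lower bound on the Euclidean distance from `query` to any series in the box."""
    lows, highs = box
    total = 0.0
    for q, lo, hi in zip(query, lows, highs):
        if q < lo:
            total += (lo - q) ** 2
        elif q > hi:
            total += (q - hi) ** 2
    return math.sqrt(total)


def range_search(
    query: Series, groups: Sequence[Sequence[Series]], radius: float
) -> List[Series]:
    """Return every series within `radius` of `query`, skipping whole groups
    whose MBR lower bound already exceeds the radius."""
    hits: List[Series] = []
    for group in groups:
        if min_dist(query, mbr(group)) > radius:
            continue  # prune: no series in this group can be close enough
        hits.extend(s for s in group if math.dist(query, s) <= radius)
    return hits
```

Because `min_dist` never overestimates the true distance, pruning on it cannot discard a valid match in this exact setting. The small accuracy loss reported in the abstract would come from the paper's handling of uncertainty, which this sketch omits.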
StreamingHub: Interactive Stream Analysis Workflows
Reusable data/code and reproducible analyses are foundational to quality research. This aspect, however, is often overlooked when designing interactive stream analysis workflows for time-series data (e.g., eye-tracking data). A mechanism to transmit informative metadata alongside data may allow such workflows to intelligently consume data, propagate metadata to downstream tasks, and thereby auto-generate reusable, reproducible analytic outputs with zero supervision. Moreover, a visual programming interface to design, develop, and execute such workflows may allow rapid prototyping for interdisciplinary research. Capitalizing on these ideas, we propose StreamingHub, a framework to build metadata propagating, interactive stream analysis workflows using visual programming. We conduct two case studies to evaluate the generalizability of our framework. Simultaneously, we use two heuristics to evaluate their computational fluidity and data growth. Results show that our framework generalizes to multiple tasks with a minimal performance overhead.
Attribute Relationship Analysis in Outlier Mining and Stream Processing
The main theme of this thesis is to unite two important fields of data analysis: outlier mining and attribute relationship analysis. In this work we establish the connection between these two fields. We present techniques that exploit this connection, allowing improved outlier detection in high-dimensional data. In the second part of the thesis we extend our work to the emerging topic of data streams.
Living analytics methods for the social web
[no abstract]