7,683 research outputs found
Fast Search for Dynamic Multi-Relational Graphs
Acting on time-critical events by processing ever growing social media or
news streams is a major technical challenge. Many of these data sources can be
modeled as multi-relational graphs. Continuous queries or techniques to search
for rare events that typically arise in monitoring applications have been
studied extensively for relational databases. This work is dedicated to answering
the question that emerges naturally: how can we efficiently execute a
continuous query on a dynamic graph? This paper presents an exact subgraph
search algorithm that exploits the temporal characteristics of representative
queries for online news or social media monitoring. The algorithm is based on a
novel data structure called the Subgraph Join Tree (SJ-Tree) that leverages the
structural and semantic characteristics of the underlying multi-relational
graph. The paper concludes with extensive experimentation on several real-world
datasets that demonstrates the validity of this approach.
Comment: SIGMOD Workshop on Dynamic Networks Management and Mining (DyNetMM),
201
Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams
Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems
(CPS) present novel challenges to Big Data platforms for performing online
analytics. Ubiquitous sensors from IoT deployments generate data
streams at high velocity that include information from a variety of domains,
and accumulate to large volumes on disk. Complex Event Processing (CEP) is
recognized as an important real-time computing paradigm for analyzing
continuous data streams. However, existing work on CEP is largely limited to
relational query processing, exposing two distinctive gaps for query
specification and execution: (1) infusing the relational query model with
higher level knowledge semantics, and (2) seamless query evaluation across
temporal spaces that span past, present and future events. These allow
accessible analytics over data streams having properties from different
disciplines, and help span the velocity (real-time) and volume (persistent)
dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP)
framework that provides domain-aware knowledge query constructs along with
temporal operators that allow end-to-end queries to span across real-time and
persistent streams. We translate this query model to efficient query execution
over online and offline data streams, proposing several optimizations to
mitigate the overheads introduced by evaluating semantic predicates and in
accessing high-volume historic data streams. The proposed X-CEP query model and
execution approaches are implemented in our prototype semantic CEP engine,
SCEPter. We validate our query model using domain-aware CEP queries from a
real-world Smart Power Grid application, and experimentally analyze the
benefits of our optimizations for executing these queries, using event streams
from a campus-microgrid IoT deployment.
Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems,
October 27, 201
Relational Algebra for In-Database Process Mining
The execution logs that are used for process mining in practice are often
obtained by querying an operational database and storing the result in a flat
file. Consequently, the data processing power of the database system cannot be
used anymore for this information, leading to constrained flexibility in the
definition of mining patterns and limited execution performance in mining large
logs. Enabling process mining directly on a database - instead of via
intermediate storage in a flat file - therefore provides additional flexibility
and efficiency. To help facilitate this ideal of in-database process mining,
this paper formally defines a database operator that extracts the 'directly
follows' relation from an operational database. This operator can be used both
for in-database process mining and to flexibly evaluate process-mining-related
queries, such as: "which employee most frequently changes the 'amount'
attribute of a case from one task to the next?". We define the operator using
the well-known relational algebra that forms the formal underpinning of
relational databases. We formally prove equivalence properties of the operator
that are useful for query optimization and present time-complexity properties
of the operator. By doing so, this paper formally defines the necessary
relational algebraic elements of a 'directly follows' operator, which are
required for implementation of such an operator in a DBMS.
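To make the 'directly follows' relation concrete, here is a minimal in-memory sketch in Python. It is not the paper's relational-algebra operator; it only illustrates which pairs such an operator would extract. The event layout `(case_id, timestamp, activity)` and the function name `directly_follows` are assumptions made for this example.

```python
from collections import Counter, defaultdict


def directly_follows(events):
    """Count how often activity a is immediately followed by activity b
    within the same case. `events` is an iterable of
    (case_id, timestamp, activity) tuples (hypothetical layout)."""
    by_case = defaultdict(list)
    for case_id, ts, activity in events:
        by_case[case_id].append((ts, activity))

    counts = Counter()
    for trace in by_case.values():
        trace.sort(key=lambda e: e[0])  # order each case by timestamp
        # Pair every event with its immediate successor in the same case.
        for (_, a), (_, b) in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical event log with two cases.
log = [
    ("c1", 1, "register"),
    ("c1", 2, "check"),
    ("c1", 3, "pay"),
    ("c2", 1, "register"),
    ("c2", 2, "pay"),
]
df = directly_follows(log)
```

Grouping by case and pairing adjacent events is the part that an in-database operator would express with joins and ordering conditions instead of Python loops.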
A Survey on Array Storage, Query Languages, and Systems
Since scientific investigation is one of the most important providers of
massive amounts of ordered data, there is a renewed interest in array data
processing in the context of Big Data. To the best of our knowledge, a unified
resource that summarizes and analyzes array processing research over its long
existence is currently missing. In this survey, we provide a guide for past,
present, and future research in array processing. The survey is organized along
three main topics. Array storage discusses all the aspects related to array
partitioning into chunks. The identification of a reduced set of array
operators to form the foundation for an array query language is analyzed across
multiple such proposals. Lastly, we survey real systems for array processing.
The result is a thorough survey on array data storage and processing that
should be consulted by anyone interested in this research topic, independent of
experience level. The survey is not complete though. We greatly appreciate
pointers towards any work we might have forgotten to mention.
Comment: 44 page
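As a small illustration of array partitioning into chunks, the sketch below maps a cell coordinate to a chunk identifier and an offset inside that chunk. It assumes regular (fixed-shape, aligned) chunking, which is only one of the strategies such a survey would cover; the function name `chunk_of` is hypothetical.

```python
def chunk_of(cell, chunk_shape):
    """Return (chunk_id, offset) for a cell under regular chunking.
    `cell` and `chunk_shape` are tuples of non-negative ints with the
    same number of dimensions (assumed layout for this sketch)."""
    if len(cell) != len(chunk_shape):
        raise ValueError("cell and chunk_shape must have the same rank")
    # Integer division selects the chunk along each dimension.
    chunk_id = tuple(c // s for c, s in zip(cell, chunk_shape))
    # The remainder is the position inside that chunk.
    offset = tuple(c % s for c, s in zip(cell, chunk_shape))
    return chunk_id, offset


# Cell (5, 12) in an array split into 4 x 8 chunks.
chunk_id, offset = chunk_of((5, 12), (4, 8))
```

With this layout, cells that are close in every dimension usually land in the same chunk, which is the property chunked storage relies on for range queries.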
Temporal Stream Algebra
Data stream management systems (DSMS) so far focus on
event queries and hardly consider combined queries to both
data from event streams and from a database. However,
applications like emergency management require combined
data stream and database queries. Further requirements are
the simultaneous use of multiple timestamps after different
time lines and semantics, expressive temporal relations between multiple time-stamps and
exible negation, grouping
and aggregation which can be controlled, i. e. started and
stopped, by events and are not limited to fixed-size time
windows. Current DSMS hardly address these requirements.
This article proposes Temporal Stream Algebra (TSA) so
as to meet the aforementioned requirements. Temporal
streams are a common abstraction of data streams and
database relations; the operators of TSA are generalizations of
the usual operators of Relational Algebra. An in-depth analysis
of temporal relations guarantees that valid TSA expressions are
non-blocking, i.e. can be evaluated incrementally.
In this respect TSA differs significantly from previous algebraic
approaches, which use specialized operators to prevent
blocking expressions on a "syntactical" level.
Object-oriented querying of existing relational databases
In this paper, we present algorithms that allow object-oriented
querying of existing relational databases. Our goal is to provide an improved query
interface for relational systems with better query facilities than SQL. This
seems to be very important since, in real world applications, relational systems
are most commonly used and their dominance will remain in the near future. To
overcome the drawbacks of relational systems, especially the poor query facilities
of SQL, we propose a schema transformation and a query translation algorithm.
The schema transformation algorithm uses additional semantic information to enhance
the relational schema and transform it into a corresponding object-oriented
schema. If the additional semantic information can be deduced from an underlying
entity-relationship design schema, the schema transformation may be done
fully automatically. To query the created object-oriented schema, we use the
Structured Object Query Language (SOQL) which provides declarative query facilities
on objects. SOQL queries using the created object-oriented schema are
much shorter, easier to write and understand, and more intuitive than corresponding
SQL queries, leading to enhanced usability and improved querying of
the database. The query translation algorithm automatically translates SOQL queries
into equivalent SQL queries for the original relational schema.
Shared Arrangements: practical inter-query sharing for streaming dataflows
Current systems for data-parallel, incremental processing and view
maintenance over high-rate streams isolate the execution of independent
queries. This creates unwanted redundancy and overhead in the presence of
concurrent incrementally maintained queries: each query must independently
maintain the same indexed state over the same input streams, and new queries
must build this state from scratch before they can begin to emit their first
results. This paper introduces shared arrangements: indexed views of maintained
state that allow concurrent queries to reuse the same in-memory state without
compromising data-parallel performance and scaling. We implement shared
arrangements in a modern stream processor and show order-of-magnitude
improvements in query response time and resource consumption for interactive
queries against high-throughput streams, while also significantly improving
performance in other domains including business analytics, graph processing,
and program analysis.
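The idea of several queries reusing one maintained index can be sketched in a few lines of Python. This is a toy, single-threaded illustration and not the system described in the abstract; the class `Arrangement`, the `(key, value, diff)` update format, and the two query functions are assumptions made for this example.

```python
from collections import Counter, defaultdict


class Arrangement:
    """Toy keyed index maintained from (key, value, diff) updates."""

    def __init__(self):
        self.index = defaultdict(Counter)

    def update(self, key, value, diff):
        bucket = self.index[key]
        bucket[value] += diff          # diff is +1 for insert, -1 for delete
        if bucket[value] == 0:
            del bucket[value]          # drop values whose count cancels out


def count_for(arr, key):
    """Query 1: number of live values under a key."""
    return sum(arr.index.get(key, {}).values())


def values_for(arr, key):
    """Query 2: sorted live values under a key, read from the same index."""
    return sorted(arr.index.get(key, {}))


# One shared index over a hypothetical edge stream.
edges = Arrangement()
for key, value, diff in [("a", "b", +1), ("a", "c", +1),
                         ("b", "c", +1), ("a", "b", -1)]:
    edges.update(key, value, diff)

# Both queries read the same state; neither rebuilds its own copy.
n_a = count_for(edges, "a")
vals_a = values_for(edges, "a")
```

A query that arrives later can read `edges` immediately rather than replaying the input stream, which is the redundancy the abstract says isolated queries suffer from.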