A Distributed Path Query Engine for Temporal Property Graphs
Property graphs are a common form of linked data, with path queries used to
traverse and explore them for enterprise transactions and mining. Temporal
property graphs are a recent variant where time is a first-class entity to be
queried over, and their properties and structure vary over time. These are seen
in social, telecom, transit and epidemic networks. However, current graph
databases and query engines have limited support for temporal relations among
graph entities, no support for time-varying entities and/or do not scale on
distributed resources. We address this gap by extending a linear path query
model over property graphs to include intuitive temporal predicates and
aggregation operators over temporal graphs. We design a distributed execution
model for these temporal path queries using the interval-centric computing
model, and develop a novel cost model to select an efficient execution plan
from several. We perform detailed experiments of our Granite distributed query
engine using both static and dynamic temporal property graphs as large as 52M
vertices, 218M edges and 325M properties, and a 1600-query workload, derived
from the LDBC benchmark. Granite often achieves sub-second query latencies on a
commodity cluster, which is 149x-1140x faster than the industry-leading
Neo4J shared-memory graph database and the JanusGraph / Spark distributed graph
query engine. Granite also completes 100% of the queries for all graphs,
compared to only 32-92% workload completion by the baseline systems. Further,
our cost model selects a query plan that is within 10% of the optimal execution
time in 90% of the cases. Despite the irregular nature of graph processing, we
exhibit a weak-scaling efficiency >= 60% on 8 nodes and >= 40% on 16 nodes, for
most query workloads.
Comment: An extended version of the paper that appears in IEEE/ACM
International Symposium on Cluster, Cloud and Internet Computing (CCGrid),
202
Curracurrong: a stream processing system for distributed environments
Advances in technology have given rise to applications that are deployed on wireless sensor networks (WSNs), the cloud, and the Internet of Things. Emerging applications include sensor-based monitoring, web traffic processing, and network monitoring. These applications collect large amounts of data as an unbounded sequence of events and process them to generate new sequences of events. Such applications need a programming model that can process large amounts of data with minimal latency; stream programming, among other paradigms, is well suited to this purpose. However, stream programming needs to be adapted to meet the challenges inherent in running it in distributed environments. These challenges include the need for a modern domain-specific language (DSL), the placement of computations in the network to minimise energy costs, and timeliness in real-time applications. To overcome these challenges we developed a stream programming model that provides an easy-to-use programming interface, energy-efficient actor placement, and timeliness. This thesis presents Curracurrong, a stream data processing system for distributed environments. In Curracurrong, a query is represented as a stream graph of stream operators and communication channels. Curracurrong provides an extensible stream operator library and adapts to a wide range of applications. It uses an energy-efficient placement algorithm that optimises communication and computation. We extend the placement problem to support dynamically changing networks, and develop a dynamic program with polynomially bounded runtime to solve the placement problem. In many stream-based applications, real-time data processing is essential. We propose an approach that measures time delays in stream query processing; this model measures the total computational time from input to output of a query, i.e., end-to-end delay
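As a rough illustration of the two ideas in the abstract above (a query as a chain of stream operators, and end-to-end delay measured from input to output), the following is a minimal Python sketch. It is not Curracurrong's API: the names `map_op`, `filter_op`, and `run_query` are hypothetical, the stream graph is simplified to a linear pipeline, and generators stand in for communication channels.

```python
import time
from typing import Any, Callable, Iterable, Iterator, List, Tuple

# Hypothetical types for this sketch: an event is a (entry_timestamp, value)
# pair, and an operator maps one event stream to another.
Event = Tuple[float, Any]
Operator = Callable[[Iterator[Event]], Iterator[Event]]


def map_op(fn: Callable[[Any], Any]) -> Operator:
    """Operator that transforms each value and keeps its entry timestamp."""
    def run(events: Iterator[Event]) -> Iterator[Event]:
        for ts, value in events:
            yield ts, fn(value)
    return run


def filter_op(pred: Callable[[Any], bool]) -> Operator:
    """Operator that drops events whose value fails the predicate."""
    def run(events: Iterator[Event]) -> Iterator[Event]:
        for ts, value in events:
            if pred(value):
                yield ts, value
    return run


def run_query(
    source: Iterable[Any],
    operators: List[Operator],
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[Any, float]]:
    """Run a linear pipeline and return (value, end_to_end_delay) pairs.

    Each event is stamped when it is pulled from the source; the delay is
    the clock reading at the sink minus that stamp.
    """
    stream: Iterator[Event] = ((clock(), v) for v in source)
    for op in operators:
        stream = op(stream)  # generators act as the channels between operators
    return [(value, clock() - ts) for ts, value in stream]


if __name__ == "__main__":
    query = [map_op(lambda x: x * 10), filter_op(lambda x: x > 15)]
    for value, delay in run_query([1, 2, 3, 4], query):
        print(f"value={value} delay={delay:.6f}s")
```

Because the generators are lazy, each event flows through the whole chain before the next one is read, so the reported delay covers only that event's own processing. A real system would also have to account for network transfer between operators placed on different nodes, which this sketch ignores.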
A Selectivity based approach to Continuous Pattern Detection in Streaming Graphs
Cyber security is one of the most significant technical challenges in current
times. Detecting adversarial activities and preventing the theft of intellectual
property and customer data are high priorities for corporations and government
agencies around the world. Cyber defenders need to analyze massive-scale,
high-resolution network flows to identify, categorize, and mitigate attacks
involving networks spanning institutional and national boundaries. Many
cyber attacks can be described as subgraph patterns, with prominent examples
being insider infiltrations (path queries), denial of service (parallel paths)
and malicious spreads (tree queries). This motivates us to explore subgraph
matching on streaming graphs in a continuous setting. The novelty of our work
lies in using the subgraph distributional statistics collected from the
streaming graph to determine the query processing strategy. We introduce a
"Lazy Search" algorithm where the search strategy is decided on a
vertex-to-vertex basis depending on the likelihood of a match in the vertex
neighborhood. We also propose a metric named "Relative Selectivity" that is
used to select between different query processing strategies. Our experiments
on real online news and network traffic streams and a synthetic social
network benchmark demonstrate 10-100x speedups over selectivity-agnostic
approaches.
Comment: in 18th International Conference on Extending Database Technology
(EDBT) (2015)