A Distributed Path Query Engine for Temporal Property Graphs
Property graphs are a common form of linked data, with path queries used to
traverse and explore them for enterprise transactions and mining. Temporal
property graphs are a recent variant where time is a first-class entity to be
queried over, and their properties and structure vary over time. These are seen
in social, telecom, transit and epidemic networks. However, current graph
databases and query engines have limited support for temporal relations among
graph entities, no support for time-varying entities and/or do not scale on
distributed resources. We address this gap by extending a linear path query
model over property graphs to include intuitive temporal predicates and
aggregation operators over temporal graphs. We design a distributed execution
model for these temporal path queries using the interval-centric computing
model, and develop a novel cost model to select an efficient execution plan
from several. We perform detailed experiments of our Granite distributed query
engine using both static and dynamic temporal property graphs as large as 52M
vertices, 218M edges and 325M properties, and a 1600-query workload, derived
from the LDBC benchmark. We often achieve sub-second query latencies on a
commodity cluster, which is 149x-1140x faster than the industry-leading
Neo4J shared-memory graph database and the JanusGraph / Spark distributed graph
query engine. Granite also completes 100% of the queries for all graphs,
compared to only 32-92% workload completion by the baseline systems. Further,
our cost model selects a query plan that is within 10% of the optimal execution
time in 90% of the cases. Despite the irregular nature of graph processing, we
exhibit a weak-scaling efficiency >= 60% on 8 nodes and >= 40% on 16 nodes, for
most query workloads.
Comment: An extended version of the paper that appears in IEEE/ACM
International Symposium on Cluster, Cloud and Internet Computing (CCGrid),
202
NOUS: Construction and Querying of Dynamic Knowledge Graphs
The ability to construct domain specific knowledge graphs (KG) and perform
question-answering or hypothesis generation is a transformative capability.
Despite their value, automated construction of knowledge graphs remains an
expensive technical challenge that is beyond the reach of most enterprises and
academic institutions. We propose an end-to-end framework for developing custom
knowledge graph driven analytics for arbitrary application domains. The
uniqueness of our system lies in A) its combination of curated KGs along with
knowledge extracted from unstructured text, B) its support for advanced trending
and explanatory questions on a dynamic KG, and C) its ability to answer queries
where the answer is embedded across multiple data sources.
Comment: Codebase: https://github.com/streaming-graphs/NOU
Shared Arrangements: practical inter-query sharing for streaming dataflows
Current systems for data-parallel, incremental processing and view
maintenance over high-rate streams isolate the execution of independent
queries. This creates unwanted redundancy and overhead in the presence of
concurrent incrementally maintained queries: each query must independently
maintain the same indexed state over the same input streams, and new queries
must build this state from scratch before they can begin to emit their first
results. This paper introduces shared arrangements: indexed views of maintained
state that allow concurrent queries to reuse the same in-memory state without
compromising data-parallel performance and scaling. We implement shared
arrangements in a modern stream processor and show order-of-magnitude
improvements in query response time and resource consumption for interactive
queries against high-throughput streams, while also significantly improving
performance in other domains including business analytics, graph processing,
and program analysis.