Formalising openCypher Graph Queries in Relational Algebra
Graph database systems are increasingly adopted for storing and processing
heterogeneous network-like datasets. However, due to the novelty of such
systems, no standard data model or query language has yet emerged.
Consequently, migrating datasets or applications even between related
technologies often requires a large amount of manual work or ad-hoc solutions,
thus subjecting the users to the possibility of vendor lock-in. To avoid this
threat, vendors are working on supporting existing standard languages (e.g.
SQL) or creating standardised languages.
In this paper, we present a formal specification for openCypher, a high-level
declarative graph query language with an ongoing standardisation effort. We
introduce relational graph algebra, which extends relational operators with
graph-specific ones, and define a mapping from core openCypher
constructs to this algebra. We propose an algorithm that allows systematic
compilation of openCypher queries.
Comment: ADBIS conference (21st European Conference on Advances in Databases and Information Systems). The final publication is available at Springer via https://doi.org/10.1007/978-3-319-66917-5_1
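The flavour of such a mapping can be illustrated on a small query. The operator names below (get-vertices, expand-out) follow common relational graph algebra presentations; the notation is illustrative and may differ from the paper's exact definitions:

```latex
% MATCH (p:Person)-[:KNOWS]->(f:Person) RETURN f.name
%
% A get-vertices operator binds p to Person vertices, an expand-out
% operator follows outgoing KNOWS edges to bind f, and a projection
% returns f.name:
\[
  \pi_{f.\mathit{name}}
  \left(
    \uparrow_{p \rightarrow f}^{\mathit{KNOWS}}
    \left( \bigcirc_{p : \mathit{Person}} \right)
  \right)
\]
```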
Model-driven engineering of an openCypher engine: using graph queries to compile graph queries
Graph database systems are increasingly adopted for storing and processing heterogeneous network-like datasets. Many challenging applications with near real-time requirements – such as financial fraud detection, on-the-fly model validation and root cause analysis – can be formalised as graph problems and tackled with graph databases efficiently. However, as no standard graph query language has yet emerged, users are subjected to the possibility of vendor lock-in.
The openCypher group aims to define an open specification for a declarative graph query language. However, creating an openCypher-compatible query engine requires significant research and engineering effort. Meanwhile, model-driven language workbenches support the creation of domain-specific languages by providing high-level tools to create parsers, editors and compilers. In this paper, we present an approach to build a compiler and optimizer for openCypher using model-driven technologies, which allows developers to define declarative optimization rules.
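A declarative optimization rule over a query-operator tree can be sketched as a pattern-matching rewrite. The node names and rule below are our own minimal illustration, not the paper's actual API; the rule pushes a filter on an expand's source variable below the expand, which shrinks intermediate results:

```python
from dataclasses import dataclass

# Minimal operator-tree sketch (node and field names are hypothetical).
@dataclass
class GetVertices:
    var: str
    label: str

@dataclass
class Expand:
    child: object
    src: str
    dst: str
    edge_type: str

@dataclass
class Filter:
    child: object
    var: str           # variable the predicate refers to
    predicate: object  # callable applied to that variable's binding

def push_filter_below_expand(op):
    """Rewrite rule: a filter on the expand's *source* variable can be
    evaluated before the expand, so fewer edges are traversed."""
    if (isinstance(op, Filter) and isinstance(op.child, Expand)
            and op.var == op.child.src):
        e = op.child
        return Expand(Filter(e.child, op.var, op.predicate),
                      e.src, e.dst, e.edge_type)
    return op  # rule does not apply; leave the plan unchanged

plan = Filter(Expand(GetVertices("p", "Person"), "p", "f", "KNOWS"),
              "p", lambda v: v == "Alice")
optimized = push_filter_below_expand(plan)
assert isinstance(optimized, Expand)         # the filter moved below
assert isinstance(optimized.child, Filter)   # the expand operator
```

A real engine would apply a set of such rules to a fixpoint over the whole tree; this sketch only shows a single rule at the root.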
Incremental View Maintenance for Property Graph Queries
This paper discusses the challenges of incremental view maintenance for
property graph queries. We select a subset of property graph queries and
present an approach that uses nested relational algebra to allow incremental
evaluation.
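The core idea behind incremental view maintenance is delta propagation: on an update, compute only the new matches the update can produce, rather than re-evaluating the query. The sketch below is a generic delta-join illustration for a two-hop pattern, with names of our own choosing; it is not the paper's nested-algebra formalism:

```python
# Incrementally maintained view of the two-hop pattern a->b->c
# over a set of edges. A hypothetical minimal sketch.
class TwoHopView:
    def __init__(self):
        self.edges = set()    # (src, dst)
        self.result = set()   # (a, b, c) with (a,b) and (b,c) in edges

    def insert_edge(self, s, d):
        """Delta rule: new matches must use the inserted edge as the
        first hop, the second hop, or both; no full recomputation."""
        delta = {(s, d, c) for (b, c) in self.edges if b == d}   # 1st hop
        delta |= {(a, s, d) for (a, b) in self.edges if b == s}  # 2nd hop
        if s == d:
            delta.add((s, d, d))  # a self-loop serves as both hops
        self.edges.add((s, d))
        self.result |= delta
        return delta

v = TwoHopView()
v.insert_edge("a", "b")            # no match yet
delta = v.insert_edge("b", "c")    # completes the pattern
assert delta == {("a", "b", "c")}
assert v.result == {("a", "b", "c")}
```

Handling deletions and the nested (grouped) results of property graph queries is where the real difficulty lies; this sketch covers insertions on a flat pattern only.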
An exploration of graph algorithms and graph databases
With data becoming larger in quantity, the need for complex, efficient algorithms to solve computationally hard problems has grown. In this thesis we evaluate a selection of graph algorithms; we provide a novel algorithm for solving and approximating the Longest Simple Cycle problem, as well as novel implementations of other graph algorithms in graph database systems.

The first area of exploration is the problem of finding the longest simple cycle in a graph. We propose two methods. The first is an exact approach based on a flow-based Integer Linear Program. The second is a multi-start local search heuristic which uses a simple depth-first search as the basis for a cycle and improves it with four perturbation operators.

Secondly, we focus on implementing the Minimum Dominating Set problem in graph database systems. An unoptimised greedy heuristic for the Minimum Dominating Set problem is implemented in a client-server system using a declarative query language, in an embedded database system using an imperative query language, and in a high-level language as a direct comparison. The performance of the graph back-end of the database systems is evaluated, and the expressiveness of the query languages is explored. We identify limitations of the query methods of the database systems and propose a function that extends the functionality of the queries.
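The standard greedy heuristic for Minimum Dominating Set repeatedly picks the vertex that dominates the most not-yet-dominated vertices. The thesis does not publish its exact implementation here, so the following is a minimal sketch of that textbook heuristic over an assumed adjacency-list representation:

```python
def greedy_dominating_set(adj):
    """Greedy heuristic for Minimum Dominating Set: repeatedly select
    the vertex whose closed neighbourhood (itself plus its neighbours)
    covers the most still-undominated vertices.

    adj: dict mapping each vertex to a list of its neighbours.
    """
    undominated = set(adj)
    dom_set = set()
    while undominated:
        best = max(adj,
                   key=lambda v: len(({v} | set(adj[v])) & undominated))
        dom_set.add(best)
        undominated -= {best} | set(adj[best])
    return dom_set

# Star graph: the centre alone dominates every vertex.
star = {"c": ["1", "2", "3"], "1": ["c"], "2": ["c"], "3": ["c"]}
assert greedy_dominating_set(star) == {"c"}
```

The same greedy selection can be expressed in a declarative graph query language, which is precisely the expressiveness comparison the thesis carries out.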
Rel2Graph: Automated Mapping From Relational Databases to a Unified Property Knowledge Graph
Although a few approaches have been proposed to convert relational databases to graphs, there is a genuine lack of systematic evaluation across a wider spectrum of databases. Recognising the important issue of query mapping, this paper proposes Rel2Graph, an automatic knowledge graph construction (KGC) approach that handles an arbitrary number of relational databases. Our approach also supports the mapping of conjunctive SQL queries into pattern-based NoSQL queries. We evaluate the approach on two widely used relational datasets for semantic parsing, the Spider and KaggleDBQA benchmarks, using the execution accuracy (EA) metric: the proportion of NoSQL queries whose results, executed on the property knowledge graph we construct, align with the results of the corresponding SQL queries on the relational databases. This ensures that the counterpart property knowledge graphs of the benchmarks have high accuracy and integrity. The code and data are available at https://github.com/nlp-tlp/Rel2Graph
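The execution accuracy metric described above can be sketched in a few lines. The harness below uses hypothetical result-fetching callables and compares result sets as order-insensitive multisets; the paper may define result equality differently:

```python
from collections import Counter

def execution_accuracy(pairs, run_sql, run_nosql):
    """Fraction of (sql_query, nosql_query) pairs whose execution
    results match. Rows are compared as an unordered multiset (our
    choice for this sketch, not necessarily the paper's definition)."""
    if not pairs:
        return 0.0
    matches = sum(
        Counter(map(tuple, run_sql(s))) == Counter(map(tuple, run_nosql(n)))
        for s, n in pairs
    )
    return matches / len(pairs)

# Toy stand-ins for real SQL/graph executors (illustration only).
sql_results = {"q1": [(1,), (2,)], "q2": [(3,)]}
graph_results = {"g1": [(2,), (1,)], "g2": [(4,)]}

ea = execution_accuracy([("q1", "g1"), ("q2", "g2")],
                        sql_results.get, graph_results.get)
assert ea == 0.5  # q1/g1 agree up to row order; q2/g2 disagree
```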
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
Graph processing has become an important part of multiple areas of computer
science, such as machine learning, computational sciences, medical
applications, social network analysis, and many others. Numerous graphs such as
web or social networks may contain up to trillions of edges. Often, these
graphs are also dynamic (their structure changes over time) and have
domain-specific rich data associated with vertices and edges. Graph database
systems such as Neo4j enable storing, processing, and analyzing such large,
evolving, and rich datasets. Due to the sheer size of such datasets, combined
with the irregular nature of graph processing, these systems face unique design
challenges. To facilitate the understanding of this emerging domain, we present
the first survey and taxonomy of graph database systems. We focus on
identifying and analyzing fundamental categories of these systems (e.g., triple
stores, tuple stores, native graph database systems, or object-oriented
systems), the associated graph models (e.g., RDF or Labeled Property Graph),
data organization techniques (e.g., storing graph data in indexing structures
or dividing data into records), and different aspects of data distribution and
query execution (e.g., support for sharding and ACID). We present and compare 51 graph database systems, including Neo4j, OrientDB, and Virtuoso. We outline graph database queries and relationships with associated domains (NoSQL stores, graph streaming, and dynamic graph algorithms). Finally, we describe research and engineering challenges to outline the future of graph databases.
Adaptive Management of Multimodel Data and Heterogeneous Workloads
Data management systems are facing a growing demand for a tighter integration of heterogeneous data from different applications and sources for both operational and analytical purposes in real-time. However, the vast diversification of the data management landscape has led to a situation where there is a trade-off between high operational performance and a tight integration of data. The difference between the growth of data volume and the growth of computational power demands a new approach for managing multimodel data and handling heterogeneous workloads.
With PolyDBMS, we present a novel class of database management systems, bridging the gap between multimodel database and polystore systems. This new kind of database system combines the operational capabilities of traditional database systems with the flexibility of polystore systems, including support for data modifications, transactions, and schema changes at runtime. With native support for multiple data models and query languages, a PolyDBMS presents a holistic solution for the management of heterogeneous data. This not only enables a tight integration of data across different applications, it also allows a more efficient usage of resources. By leveraging and combining highly optimized database systems as storage and execution engines, this novel class of database systems takes advantage of decades of database systems research and development.
In this thesis, we present the conceptual foundations and models for building a PolyDBMS. This includes a holistic model for maintaining and querying multiple data models in one logical schema that enables cross-model queries. With the PolyAlgebra, we present a solution for representing queries based on one or multiple data models while preserving their semantics. Furthermore, we introduce a concept for the adaptive planning and decomposition of queries across heterogeneous database systems with different capabilities and features.
The conceptual contributions presented in this thesis materialize in Polypheny-DB, the first implementation of a PolyDBMS. Supporting the relational, document, and labeled property graph data models, Polypheny-DB is a suitable solution for structured, semi-structured, and unstructured data. This is complemented by an extensive type system that includes support for binary large objects. With support for multiple query languages, industry-standard query interfaces, and a rich set of domain-specific data stores and data sources, Polypheny-DB offers a flexibility unmatched by existing data management solutions.
Towards FAIRer Biological Knowledge Networks Using a Hybrid Linked Data and Graph Database Approach
The speed and accuracy of new scientific discoveries – be it by humans or artificial intelligence – depend on the quality of the underlying data and on the technology to connect, search and share the data efficiently. In recent years, we have seen the rise of graph databases and semi-formal data models such as knowledge graphs to facilitate software approaches to scientific discovery. These approaches extend work based on formalised models, such as the Semantic Web. In this paper, we present our developments to connect, search and share data about genome-scale knowledge networks (GSKN). We have developed a simple application ontology based on OWL/RDF with mappings to standard schemas. We are employing the ontology to power data access services like resolvable URIs, SPARQL endpoints, JSON-LD web APIs and Neo4j-based knowledge graphs. We demonstrate how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge (i.e. in line with the FAIR data principles).
Pseudo-contractions as Gentle Repairs
Updating a knowledge base to remove an unwanted consequence is a challenging task. Some of the original sentences must be either deleted or weakened in such a way that the sentence to be removed is no longer entailed by the resulting set. On the other hand, it is desirable that the existing knowledge be preserved as much as possible, minimising the loss of information. Several approaches to this problem can be found in the literature. In particular, when the knowledge is represented by an ontology, two different families of frameworks have been developed in the past decades with numerous ideas in common but with little interaction between the communities: applications of AGM-like Belief Change and justification-based Ontology Repair. In this paper, we investigate the relationship between pseudo-contraction operations and gentle repairs. Both aim to avoid the complete deletion of sentences when replacing them with weaker versions is enough to prevent the entailment of the unwanted formula. We show the correspondence between concepts on both sides and investigate under which conditions they are equivalent. Furthermore, we propose a unified notation for the two approaches, which might contribute to the integration of the two areas.