182 research outputs found
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled)(directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give an historical overview of its main development, and study the main current
systems that implement them
Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography
We study maximal identifiability, a measure recently introduced in Boolean
Network Tomography to characterize networks' capability to localize failure
nodes in end-to-end path measurements. We prove tight upper and lower bounds on
the maximal identifiability of failure nodes for specific classes of network
topologies, such as trees and -dimensional grids, in both directed and
undirected cases. We prove that directed -dimensional grids with support
have maximal identifiability using monitors; and in the
undirected case we show that monitors suffice to get identifiability of
. We then study identifiability under embeddings: we establish relations
between maximal identifiability, embeddability and graph dimension when network
topologies are model as DAGs. Our results suggest the design of networks over
nodes with maximal identifiability using
monitors and a heuristic to boost maximal identifiability on a given network by
simulating -dimensional grids. We provide positive evidence of this
heuristic through data extracted by exact computation of maximal
identifiability on examples of small real networks
Automata with Nested Pebbles Capture First-Order Logic with Transitive Closure
String languages recognizable in (deterministic) log-space are characterized
either by two-way (deterministic) multi-head automata, or following Immerman,
by first-order logic with (deterministic) transitive closure. Here we elaborate
this result, and match the number of heads to the arity of the transitive
closure. More precisely, first-order logic with k-ary deterministic transitive
closure has the same power as deterministic automata walking on their input
with k heads, additionally using a finite set of nested pebbles. This result is
valid for strings, ordered trees, and in general for families of graphs having
a fixed automaton that can be used to traverse the nodes of each of the graphs
in the family. Other examples of such families are grids, toruses, and
rectangular mazes. For nondeterministic automata, the logic is restricted to
positive occurrences of transitive closure.
The special case of k=1 for trees, shows that single-head deterministic
tree-walking automata with nested pebbles are characterized by first-order
logic with unary deterministic transitive closure. This refines our earlier
result that placed these automata between first-order and monadic second-order
logic on trees.Comment: Paper for Logical Methods in Computer Science, 27 pages, 1 figur
Path constraints in semistructured data
International audienceWe consider semistructured data as multirooted edge-labelled directed graphs, and path inclusion constraints on these graphs. A path inclusion constraint pnot precedes, equalsq is satisfied by a semistructured data if any node reached by the regular query p is also reached by the regular query q. In this paper, two problems are mainly studied: the implication problem and the problem of the existence of a finite exact model. - We give a new decision algorithm for the implication problem of a constraint pnot precedes, equalsq by a set of bounded path constraints pinot precedes, equalsui where p, q, and the pi's are regular path expressions and the ui's are words, improving in this particular case, the more general algorithms of S. Abiteboul and V. Vianu, and N. Alechina et al. In the case of a set of word equalities ui≡vi, we provide a more efficient decision algorithm for the implication of a word equality u≡v, improving the more general algorithm of P. Buneman et al. We prove that, in this case, implication for nondeterministic models is equivalent to implication for (complete) deterministic ones. - We introduce the notion of exact model: an exact model of a set of path constraints Click to view the MathML source satisfies the constraint pnot precedes, equalsq if and only if this constraint is implied by Click to view the MathML source. We prove that any set of constraints has an exact model and we give a decidable characterization of data which are exact models of bounded path inclusion constraints sets
A Trichotomy for Regular Simple Path Queries on Graphs
Regular path queries (RPQs) select nodes connected by some path in a graph.
The edge labels of such a path have to form a word that matches a given regular
expression. We investigate the evaluation of RPQs with an additional constraint
that prevents multiple traversals of the same nodes. Those regular simple path
queries (RSPQs) find several applications in practice, yet they quickly become
intractable, even for basic languages such as (aa)* or a*ba*.
In this paper, we establish a comprehensive classification of regular
languages with respect to the complexity of the corresponding regular simple
path query problem. More precisely, we identify the fragment that is maximal in
the following sense: regular simple path queries can be evaluated in polynomial
time for every regular language L that belongs to this fragment and evaluation
is NP-complete for languages outside this fragment. We thus fully characterize
the frontier between tractability and intractability for RSPQs, and we refine
our results to show the following trichotomy: Evaluations of RSPQs is either
AC0, NL-complete or NP-complete in data complexity, depending on the regular
language L. The fragment identified also admits a simple characterization in
terms of regular expressions.
Finally, we also discuss the complexity of the following decision problem:
decide, given a language L, whether finding a regular simple path for L is
tractable. We consider several alternative representations of L: DFAs, NFAs or
regular expressions, and prove that this problem is NL-complete for the first
representation and PSPACE-complete for the other two. As a conclusion we extend
our results from edge-labeled graphs to vertex-labeled graphs and vertex-edge
labeled graphs.Comment: 15 pages, conference submissio
Graph Pattern Matching in GQL and SQL/PGQ
As graph databases become widespread, JTC1 -- the committee in joint charge
of information technology standards for the International Organization for
Standardization (ISO), and International Electrotechnical Commission (IEC) --
has approved a project to create GQL, a standard property graph query language.
This complements a project to extend SQL with a new part, SQL/PGQ, which
specifies how to define graph views over an SQL tabular schema, and to run
read-only queries against them.
Both projects have been assigned to the ISO/IEC JTC1 SC32 working group for
Database Languages, WG3, which continues to maintain and enhance SQL as a
whole. This common responsibility helps enforce a policy that the identical
core of both PGQ and GQL is a graph pattern matching sub-language, here termed
GPML.
The WG3 design process is also analyzed by an academic working group, part of
the Linked Data Benchmark Council (LDBC), whose task is to produce a formal
semantics of these graph data languages, which complements their standard
specifications.
This paper, written by members of WG3 and LDBC, presents the key elements of
the GPML of SQL/PGQ and GQL in advance of the publication of these new
standards
MULTIHIERARCHICAL DOCUMENTS AND FINE-GRAINED ACCESS CONTROL
This work presents new models and algorithms for creating, modifying, and controlling access to complex text. The digitization of texts opens new opportunities for preservation, access, and analysis, but at the same time raises questions regarding how to represent and collaboratively edit such texts. Two issues of particular interest are modelling the relationships of markup (annotations) in complex texts, and controlling the creation and modification of those texts. This work addresses and connects these issues, with emphasis on data modelling, algorithms, and computational complexity; and contributes new results in these areas of research.
Although hierarchical models of text and markup are common, complex texts often exhibit layers of overlapping structure that are best described by multihierarchical markup. We develop a new model of multihierarchical markup, the globally ordered GODDAG, that combines features of both graph- and range-based models of markup, allowing documents to be unambiguously serialized. We describe extensions to the XPath query language to support globally ordered GODDAGs, provide semantics for a set of update operations on this structure, and provide algorithms for converting between two different representations of the globally ordered GODDAG.
Managing the collaborative editing of documents can require restricting the types of changes different editors may make, while not altogether restricting their access to the document. Fine-grained access control allows precisely these kinds of restrictions on the operations that a user is or is not permitted to perform on a document. We describe a rule-based model of fine-grained access control for updates of hierarchical documents, and in this context analyze the document generation problem: determining whether a document could have been created without violating a particular access control policy. We show that this problem is undecidable in the general case and provide computational complexity bounds for a number of restricted variants of the problem.
Finally, we extend our fine-grained access control model from hierarchical to multihierarchical documents. We provide semantics for fine-grained access control policies that control splice-in, splice-out, and rename operations on globally ordered GODDAGs, and show that the multihierarchical version of the document generation problem remains undecidable
I/O efficient bisimulation partitioning on very large directed acyclic graphs
In this paper we introduce the first efficient external-memory algorithm to
compute the bisimilarity equivalence classes of a directed acyclic graph (DAG).
DAGs are commonly used to model data in a wide variety of practical
applications, ranging from XML documents and data provenance models, to web
taxonomies and scientific workflows. In the study of efficient reasoning over
massive graphs, the notion of node bisimilarity plays a central role. For
example, grouping together bisimilar nodes in an XML data set is the first step
in many sophisticated approaches to building indexing data structures for
efficient XPath query evaluation. To date, however, only internal-memory
bisimulation algorithms have been investigated. As the size of real-world DAG
data sets often exceeds available main memory, storage in external memory
becomes necessary. Hence, there is a practical need for an efficient approach
to computing bisimulation in external memory.
Our general algorithm has a worst-case IO-complexity of O(Sort(|N| + |E|)),
where |N| and |E| are the numbers of nodes and edges, resp., in the data graph
and Sort(n) is the number of accesses to external memory needed to sort an
input of size n. We also study specializations of this algorithm to common
variations of bisimulation for tree-structured XML data sets. We empirically
verify efficient performance of the algorithms on graphs and XML documents
having billions of nodes and edges, and find that the algorithms can process
such graphs efficiently even when very limited internal memory is available.
The proposed algorithms are simple enough for practical implementation and use,
and open the door for further study of external-memory bisimulation algorithms.
To this end, the full open-source C++ implementation has been made freely
available
Document similarity
In recent years, development of tools and methods for measuring document similarity has become a thriving field in informatics, computer science, and digital humanities. Historically, questions of document similarity have been (and still are) important or even crucial in a large variety of situations. Typically, similarity is judged by criteria which depend on context. The move from traditional to digital text technology has not only provided new possibilities for discovery and measurement of document similarity, it has also posed new challenges. Some of these challenges are technical, others conceptual. This paper argues that a particular, well-established, traditional way of starting with an arbitrary document and constructing a document similar to it, namely transcription, may fruitfully be brought to bear on questions concerning similarity criteria for digital documents. Some simple similarity measures are presented and their application to marked up documents are discussed. We conclude that when documents are encoded in the same vocabulary, n-grams constructed to include markup can be used to recognize structural similarities between documents.publishedVersio
- …