182 research outputs found

    Tight Bounds for Maximal Identifiability of Failure Nodes in Boolean Network Tomography

    We study maximal identifiability, a measure recently introduced in Boolean Network Tomography to characterize networks' capability to localize failure nodes in end-to-end path measurements. We prove tight upper and lower bounds on the maximal identifiability of failure nodes for specific classes of network topologies, such as trees and $d$-dimensional grids, in both directed and undirected cases. We prove that directed $d$-dimensional grids with support $n$ have maximal identifiability $d$ using $2d(n-1)+2$ monitors; and in the undirected case we show that $2d$ monitors suffice to get identifiability of $d-1$. We then study identifiability under embeddings: we establish relations between maximal identifiability, embeddability and graph dimension when network topologies are modeled as DAGs. Our results suggest the design of networks over $N$ nodes with maximal identifiability $\Omega(\log N)$ using $O(\log N)$ monitors and a heuristic to boost maximal identifiability on a given network by simulating $d$-dimensional grids. We provide positive evidence of this heuristic through data extracted by exact computation of maximal identifiability on examples of small real networks.

    Automata with Nested Pebbles Capture First-Order Logic with Transitive Closure

    String languages recognizable in (deterministic) log-space are characterized either by two-way (deterministic) multi-head automata, or following Immerman, by first-order logic with (deterministic) transitive closure. Here we elaborate this result, and match the number of heads to the arity of the transitive closure. More precisely, first-order logic with k-ary deterministic transitive closure has the same power as deterministic automata walking on their input with k heads, additionally using a finite set of nested pebbles. This result is valid for strings, ordered trees, and in general for families of graphs having a fixed automaton that can be used to traverse the nodes of each of the graphs in the family. Other examples of such families are grids, toruses, and rectangular mazes. For nondeterministic automata, the logic is restricted to positive occurrences of transitive closure. The special case of k=1 for trees shows that single-head deterministic tree-walking automata with nested pebbles are characterized by first-order logic with unary deterministic transitive closure. This refines our earlier result that placed these automata between first-order and monadic second-order logic on trees. Comment: Paper for Logical Methods in Computer Science, 27 pages, 1 figure

    Path constraints in semistructured data

    We consider semistructured data as multirooted edge-labelled directed graphs, and path inclusion constraints on these graphs. A path inclusion constraint $p \npreceq q$ is satisfied by semistructured data if any node reached by the regular query $p$ is also reached by the regular query $q$. In this paper, two problems are mainly studied: the implication problem and the problem of the existence of a finite exact model. First, we give a new decision algorithm for the implication problem of a constraint $p \npreceq q$ by a set of bounded path constraints $p_i \npreceq u_i$, where $p$, $q$, and the $p_i$'s are regular path expressions and the $u_i$'s are words, improving, in this particular case, on the more general algorithms of S. Abiteboul and V. Vianu, and N. Alechina et al. In the case of a set of word equalities $u_i \equiv v_i$, we provide a more efficient decision algorithm for the implication of a word equality $u \equiv v$, improving on the more general algorithm of P. Buneman et al. We prove that, in this case, implication for nondeterministic models is equivalent to implication for (complete) deterministic ones. Second, we introduce the notion of exact model: an exact model of a set of path constraints satisfies the constraint $p \npreceq q$ if and only if this constraint is implied by that set. We prove that any set of constraints has an exact model and we give a decidable characterization of data which are exact models of sets of bounded path inclusion constraints.
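The satisfaction condition in this abstract (every node reached by one query is also reached by the other) can be illustrated with a minimal sketch. It assumes both paths are plain words (lists of edge labels) rather than general regular expressions, and the names `reach` and `satisfies_inclusion` are hypothetical, not taken from the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed encoding: node -> list of (edge label, target node).
Edges = Dict[str, List[Tuple[str, str]]]


def reach(edges: Edges, roots: Iterable[str], word: List[str]) -> Set[str]:
    """Nodes reached from any root by following the labels of `word` in order."""
    current = set(roots)
    for label in word:
        current = {
            target
            for node in current
            for (edge_label, target) in edges.get(node, [])
            if edge_label == label
        }
    return current


def satisfies_inclusion(edges: Edges, roots: Iterable[str],
                        p: List[str], q: List[str]) -> bool:
    """True if every node reached by word p is also reached by word q."""
    roots = list(roots)
    return reach(edges, roots, p) <= reach(edges, roots, q)


# Small illustrative graph: r -a-> x, r -b-> x, r -a-> y.
example_edges: Edges = {"r": [("a", "x"), ("b", "x"), ("a", "y")]}
```

On `example_edges` with root `r`, the word `["b"]` reaches only `x`, which `["a"]` also reaches, so the inclusion of `["b"]` in `["a"]` holds while the converse does not.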

    A Trichotomy for Regular Simple Path Queries on Graphs

    Regular path queries (RPQs) select nodes connected by some path in a graph. The edge labels of such a path have to form a word that matches a given regular expression. We investigate the evaluation of RPQs with an additional constraint that prevents multiple traversals of the same nodes. These regular simple path queries (RSPQs) find several applications in practice, yet they quickly become intractable, even for basic languages such as (aa)* or a*ba*. In this paper, we establish a comprehensive classification of regular languages with respect to the complexity of the corresponding regular simple path query problem. More precisely, we identify the fragment that is maximal in the following sense: regular simple path queries can be evaluated in polynomial time for every regular language L that belongs to this fragment, and evaluation is NP-complete for languages outside this fragment. We thus fully characterize the frontier between tractability and intractability for RSPQs, and we refine our results to show the following trichotomy: evaluation of RSPQs is either in AC0, NL-complete or NP-complete in data complexity, depending on the regular language L. The fragment identified also admits a simple characterization in terms of regular expressions. Finally, we also discuss the complexity of the following decision problem: decide, given a language L, whether finding a regular simple path for L is tractable. We consider several alternative representations of L: DFAs, NFAs or regular expressions, and prove that this problem is NL-complete for the first representation and PSPACE-complete for the other two. To conclude, we extend our results from edge-labeled graphs to vertex-labeled graphs and vertex-edge labeled graphs. Comment: 15 pages, conference submission
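To make the simple-path restriction concrete, here is a minimal brute-force sketch of RSPQ evaluation. It is not the paper's algorithm: it is a plain backtracking search that may take exponential time, and the graph encoding, the DFA encoding, and the name `has_simple_path` are assumptions chosen for illustration.

```python
from typing import Dict, List, Set, Tuple

# Assumed encodings: node -> [(label, target)], and (state, label) -> state.
Graph = Dict[str, List[Tuple[str, str]]]
DFA = Dict[Tuple[int, str], int]


def has_simple_path(graph: Graph, dfa: DFA, start_state: int,
                    accepting: Set[int], source: str, target: str) -> bool:
    """Is there a path source -> target that visits no node twice and whose
    edge labels form a word accepted by the DFA? Exhaustive backtracking."""

    def dfs(node: str, state: int, visited: Set[str]) -> bool:
        if node == target:
            # A simple path cannot leave the target and come back to it.
            return state in accepting
        for label, nxt in graph.get(node, []):
            next_state = dfa.get((state, label))
            if next_state is None or nxt in visited:
                continue
            visited.add(nxt)
            if dfs(nxt, next_state, visited):
                return True
            visited.remove(nxt)  # backtrack
        return False

    return dfs(source, start_state, {source})


# DFA for (aa)*: state 0 is both initial and accepting.
even_a: DFA = {(0, "a"): 1, (1, "a"): 0}

# Triangle a -> b -> c -> a with an exit edge c -> d, all labelled "a".
triangle: Graph = {
    "a": [("a", "b")],
    "b": [("a", "c")],
    "c": [("a", "a"), ("a", "d")],
}
```

In `triangle`, the only simple path from `a` to `d` has three edges, so it is rejected for (aa)*, even though a walk that goes once around the cycle first has six edges and would satisfy the ordinary RPQ.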

    Graph Pattern Matching in GQL and SQL/PGQ

    As graph databases become widespread, JTC1 -- the committee in joint charge of information technology standards for the International Organization for Standardization (ISO), and International Electrotechnical Commission (IEC) -- has approved a project to create GQL, a standard property graph query language. This complements a project to extend SQL with a new part, SQL/PGQ, which specifies how to define graph views over an SQL tabular schema, and to run read-only queries against them. Both projects have been assigned to the ISO/IEC JTC1 SC32 working group for Database Languages, WG3, which continues to maintain and enhance SQL as a whole. This common responsibility helps enforce a policy that the identical core of both PGQ and GQL is a graph pattern matching sub-language, here termed GPML. The WG3 design process is also analyzed by an academic working group, part of the Linked Data Benchmark Council (LDBC), whose task is to produce a formal semantics of these graph data languages, which complements their standard specifications. This paper, written by members of WG3 and LDBC, presents the key elements of the GPML of SQL/PGQ and GQL in advance of the publication of these new standards.


    This work presents new models and algorithms for creating, modifying, and controlling access to complex text. The digitization of texts opens new opportunities for preservation, access, and analysis, but at the same time raises questions regarding how to represent and collaboratively edit such texts. Two issues of particular interest are modelling the relationships of markup (annotations) in complex texts, and controlling the creation and modification of those texts. This work addresses and connects these issues, with emphasis on data modelling, algorithms, and computational complexity; and contributes new results in these areas of research. Although hierarchical models of text and markup are common, complex texts often exhibit layers of overlapping structure that are best described by multihierarchical markup. We develop a new model of multihierarchical markup, the globally ordered GODDAG, that combines features of both graph- and range-based models of markup, allowing documents to be unambiguously serialized. We describe extensions to the XPath query language to support globally ordered GODDAGs, provide semantics for a set of update operations on this structure, and provide algorithms for converting between two different representations of the globally ordered GODDAG. Managing the collaborative editing of documents can require restricting the types of changes different editors may make, while not altogether restricting their access to the document. Fine-grained access control allows precisely these kinds of restrictions on the operations that a user is or is not permitted to perform on a document. We describe a rule-based model of fine-grained access control for updates of hierarchical documents, and in this context analyze the document generation problem: determining whether a document could have been created without violating a particular access control policy. 
    We show that this problem is undecidable in the general case and provide computational complexity bounds for a number of restricted variants of the problem. Finally, we extend our fine-grained access control model from hierarchical to multihierarchical documents. We provide semantics for fine-grained access control policies that control splice-in, splice-out, and rename operations on globally ordered GODDAGs, and show that the multihierarchical version of the document generation problem remains undecidable.

    I/O efficient bisimulation partitioning on very large directed acyclic graphs

    In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models, to web taxonomies and scientific workflows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in an XML data set is the first step in many sophisticated approaches to building indexing data structures for efficient XPath query evaluation. To date, however, only internal-memory bisimulation algorithms have been investigated. As the size of real-world DAG data sets often exceeds available main memory, storage in external memory becomes necessary. Hence, there is a practical need for an efficient approach to computing bisimulation in external memory. Our general algorithm has a worst-case I/O complexity of O(Sort(|N| + |E|)), where |N| and |E| are the numbers of nodes and edges, respectively, in the data graph and Sort(n) is the number of accesses to external memory needed to sort an input of size n. We also study specializations of this algorithm to common variations of bisimulation for tree-structured XML data sets. We empirically verify efficient performance of the algorithms on graphs and XML documents having billions of nodes and edges, and find that the algorithms can process such graphs efficiently even when very limited internal memory is available. The proposed algorithms are simple enough for practical implementation and use, and open the door for further study of external-memory bisimulation algorithms. To this end, the full open-source C++ implementation has been made freely available.
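As a point of reference for what bisimilarity classes on a DAG look like, the sketch below computes them in main memory. It is not the external-memory algorithm of the paper: it assumes node-labelled DAGs small enough to fit in RAM and uses the standard observation that, on a DAG, a node's class is determined by its label and the set of classes of its children. The function name and input encoding are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def bisimulation_classes(labels: Dict[str, str],
                         children: Dict[str, List[str]]) -> Dict[str, int]:
    """Map each node of a DAG to a class id; bisimilar nodes share an id.

    Nodes are handled children-first (iterative post-order), so a node's
    signature (label, set of child class ids) is known when it is visited.
    """
    class_of: Dict[str, int] = {}
    ids: Dict[Tuple[str, FrozenSet[int]], int] = {}

    for root in labels:
        stack = [(root, False)]
        while stack:
            node, expanded = stack.pop()
            if node in class_of:
                continue
            if expanded:
                signature = (
                    labels[node],
                    frozenset(class_of[c] for c in children.get(node, [])),
                )
                # Reuse the id of an identical signature, else allocate one.
                class_of[node] = ids.setdefault(signature, len(ids))
            else:
                stack.append((node, True))
                for child in children.get(node, []):
                    if child not in class_of:
                        stack.append((child, False))
    return class_of


# Illustrative input: x and y are identical leaves; z has the same label
# but a child, so it falls into a different class.
example_labels = {"r": "a", "x": "b", "y": "b", "z": "b", "w": "c"}
example_children = {"r": ["x", "y"], "z": ["w"]}
```

For the example input, `x` and `y` end up in the same class while `z` does not, giving four classes in total.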

    Document similarity

    In recent years, development of tools and methods for measuring document similarity has become a thriving field in informatics, computer science, and digital humanities. Historically, questions of document similarity have been (and still are) important or even crucial in a large variety of situations. Typically, similarity is judged by criteria which depend on context. The move from traditional to digital text technology has not only provided new possibilities for discovery and measurement of document similarity, it has also posed new challenges. Some of these challenges are technical, others conceptual. This paper argues that a particular, well-established, traditional way of starting with an arbitrary document and constructing a document similar to it, namely transcription, may fruitfully be brought to bear on questions concerning similarity criteria for digital documents. Some simple similarity measures are presented and their application to marked-up documents is discussed. We conclude that when documents are encoded in the same vocabulary, n-grams constructed to include markup can be used to recognize structural similarities between documents.
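One way to picture n-grams that include markup is the sketch below. It is an illustration under stated assumptions, not the measure used in the paper: tags are treated as single tokens via a simple regular expression, and similarity is taken to be the Jaccard coefficient of the two n-gram sets. The names `tokenize`, `ngrams`, and `markup_similarity` are hypothetical.

```python
import re
from typing import List, Set, Tuple


def tokenize(document: str) -> List[str]:
    """Split into tokens, keeping each tag such as <p> or </p> as one token."""
    return re.findall(r"<[^>]+>|[^<\s]+", document)


def ngrams(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Set of all contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def markup_similarity(doc_a: str, doc_b: str, n: int = 2) -> float:
    """Jaccard coefficient of the markup-aware n-gram sets of two documents."""
    grams_a = ngrams(tokenize(doc_a), n)
    grams_b = ngrams(tokenize(doc_b), n)
    if not grams_a and not grams_b:
        return 1.0  # two documents too short to yield n-grams count as equal
    return len(grams_a & grams_b) / len(grams_a | grams_b)
```

Because tags are tokens, two documents with different wording but the same element structure still share the n-grams that span tag boundaries, which is what lets the score pick up structural similarity.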