365 research outputs found
Deterministic Automata for Unordered Trees
Automata for unordered unranked trees are relevant for defining schemas and
queries for data trees in Json or Xml format. While the existing notions are
well-investigated concerning expressiveness, they all lack a proper notion of
determinism, which makes it difficult to distinguish subclasses of automata for
which problems such as inclusion, equivalence, and minimization can be solved
efficiently. In this paper, we propose and investigate different notions of
"horizontal determinism", starting from automata for unranked trees in which
the horizontal evaluation is performed by finite state automata. We show that a
restriction to confluent horizontal evaluation leads to polynomial-time
emptiness and universality, but still suffers from coNP-completeness of the
emptiness of binary intersections. Finally, efficient algorithms can be
obtained by imposing an order of horizontal evaluation globally for all
automata in the class. Depending on the choice of the order, we obtain
different classes of automata, each of which has the same expressiveness as
CMso.Comment: In Proceedings GandALF 2014, arXiv:1408.556
Schemas for Unordered XML on a DIME
We investigate schema languages for unordered XML having no relative order
among siblings. First, we propose unordered regular expressions (UREs),
essentially regular expressions with unordered concatenation instead of
standard concatenation, that define languages of unordered words to model the
allowed content of a node (i.e., collections of the labels of children).
However, unrestricted UREs are computationally too expensive as we show the
intractability of two fundamental decision problems for UREs: membership of an
unordered word to the language of a URE and containment of two UREs.
Consequently, we propose a practical and tractable restriction of UREs,
disjunctive interval multiplicity expressions (DIMEs).
Next, we employ DIMEs to define languages of unordered trees and propose two
schema languages: disjunctive interval multiplicity schema (DIMS), and its
restriction, disjunction-free interval multiplicity schema (IMS). We study the
complexity of the following static analysis problems: schema satisfiability,
membership of a tree to the language of a schema, schema containment, as well
as twig query satisfiability, implication, and containment in the presence of
schema. Finally, we study the expressive power of the proposed schema languages
and compare them with yardstick languages of unordered trees (FO, MSO, and
Presburger constraints) and DTDs under commutative closure. Our results show
that the proposed schema languages are capable of expressing many practical
languages of unordered trees and enjoy desirable computational properties.Comment: Theory of Computing System
Semantics and Validation of Shapes Schemas for RDF
We present a formal semantics and proof of soundness for shapes schemas, an
expressive schema language for RDF graphs that is the foundation of Shape
Expressions Language 2.0. It can be used to describe the vocabulary and the
structure of an RDF graph, and to constrain the admissible properties and
values for nodes in that graph. The language defines a typing mechanism called
shapes against which nodes of the graph can be checked. It includes an
algebraic grouping operator, a choice operator and cardinality constraints for
the number of allowed occurrences of a property. Shapes can be combined using
Boolean operators, and can use possibly recursive references to other shapes.
We describe the syntax of the language and define its semantics. The
semantics is proven to be well-defined for schemas that satisfy a reasonable
syntactic restriction, namely stratified use of negation and recursion. We
present two algorithms for the validation of an RDF graph against a shapes
schema. The first algorithm is a direct implementation of the semantics,
whereas the second is a non-trivial improvement. We also briefly give
implementation guidelines
Simple Schemas for Unordered XML
We consider unordered XML, where the relative order among siblings is ignored, and propose two simple yet practical schema formalisms: disjunctive multiplicity schemas (DMS), and its restriction, disjunction-free multiplicity schemas (MS). We investigate their computational properties and characterize the complexity of the following static analysis problems: schema satisfiability, membership of a tree to the language of a schema, schema containment, twig query satisfiability, implication, and containment in the presence of schema. Our research indicates that the proposed formalisms retain much of the expressiveness of DTDs without an increase in computational complexity. 1
A Tree Logic with Graded Paths and Nominals
Regular tree grammars and regular path expressions constitute core constructs
widely used in programming languages and type systems. Nevertheless, there has
been little research so far on reasoning frameworks for path expressions where
node cardinality constraints occur along a path in a tree. We present a logic
capable of expressing deep counting along paths which may include arbitrary
recursive forward and backward navigation. The counting extensions can be seen
as a generalization of graded modalities that count immediate successor nodes.
While the combination of graded modalities, nominals, and inverse modalities
yields undecidable logics over graphs, we show that these features can be
combined in a tree logic decidable in exponential time
Efficient asymmetric inclusion of regular expressions with interleaving and counting for XML type-checking
The inclusion of Regular Expressions (REs) is the kernel of any type-checking algorithm for XML manipulation languages. XML applications would benefit from the extension of REs with interleaving and counting, but this is not feasible in general, since inclusion is EXPSPACE-complete for such extended REs. In Colazzo et al. (2009) [1] we introduced a notion of ?conflict-free REs?, which are extended REs with excellent complexity behaviour, including a polynomial inclusion algorithm [1] and linear membership (Ghelli et al., 2008 [2]). Conflict-free REs have interleaving and counting, but the complexity is tamed by the ?conflict-free? limitations, which have been found to be satisfied by the vast majority of the content models published on the Web.However, a type-checking algorithm needs to compare machine-generated subtypes against human-defined supertypes. The conflict-free restriction, while quite harmless for the human-defined supertype, is far too restrictive for the subtype. We show here that the PTIME inclusion algorithm can be actually extended to deal with totally unrestricted REs with counting and interleaving in the subtype position, provided that the supertype is conflict-free.This is exactly the expressive power that we need in order to use subtyping inside type-checking algorithms, and the cost of this generalized algorithm is only quadratic, which is as good as the best algorithm we have for the symmetric case (see [1]). The result is extremely surprising, since we had previously found that symmetric inclusion becomes NP-hard as soon as the candidate subtype is enriched with binary intersection, a generalization that looked much more innocent than what we achieve here
Containment of Shape Expression Schemas for RDF
We study the problem of containment for shape expression schemas (ShEx) for
RDF graphs. We identify a subclass of ShEx that has a natural graphical
representation in the form of shape graphs and their semantics is captured with
a tractable notion of embedding of an RDF graph in a shape graph. When applied
to pairs of shape graphs, an embedding is a sufficient condition for
containment, and for a practical subclass of deterministic shape graphs, it is
also a necessary one, thus yielding a subclass with tractable containment.
While for general shape graphs a minimal counter-example i.e., an instance
proving non-containment, might be of exponential size, we show that containment
is EXP-hard and in coNEXP. Finally, we show that containment for arbitrary ShEx
is coNEXP-hard and in coTwoNEXP^NP
The Complexity of SORE-definability Problems
Single occurrence regular expressions (SORE) are a special kind of deterministic regular expressions, which are extensively used in the schema languages DTD and XSD for XML documents. In this paper, with motivations from the simplification of XML schemas, we consider the SORE-definability problem: Given a regular expression, decide whether it has an equivalent SORE. We investigate extensively the complexity of the SORE-definability problem: We consider both (standard) regular expressions and regular expressions with counting, and distinguish between the alphabets of size at least two and unary alphabets. In all cases, we obtain tight complexity bounds. In addition, we consider another variant of this problem, the bounded SORE-definability problem, which is to decide, given a regular expression E and a number M (encoded in unary or binary), whether there is an SORE, which is equivalent to E on the set of words of length at most M. We show that in several cases, there is an exponential decrease in the complexity when switching from the SORE-definability problem to its bounded variant
- …