Mining Measured Information from Text
We present an approach to extract measured information from text (e.g., a
1370 degrees C melting point, a BMI greater than 29.9 kg/m^2). Such
extractions are critically important across a wide range of domains,
especially those involving search and exploration of scientific and technical
documents. We first propose a rule-based entity extractor to mine measured
quantities (i.e., a numeric value paired with a measurement unit), which
supports a vast and comprehensive set of both common and obscure measurement
units. Our method is highly robust and can correctly recover valid measured
quantities even when significant errors are introduced through the process of
converting document formats like PDF to plain text. Next, we describe an
approach to extracting the properties being measured (e.g., the property "pixel
pitch" in the phrase "a pixel pitch as high as 352 μm"). Finally, we
present MQSearch: the realization of a search engine with full support for
measured information.
Comment: 4 pages; 38th International ACM SIGIR Conference on Research and
Development in Information Retrieval (SIGIR '15)
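As a rough illustration of rule-based measured-quantity extraction, the sketch below pairs a number with the longest matching unit from a small unit table. The unit list, the regular expression and the name `extract_quantities` are assumptions made for illustration; the extractor described above supports a far larger unit set and tolerates PDF-conversion errors, which this sketch does not attempt.

```python
import re

# Hypothetical, tiny unit table (assumption); a real extractor needs a far larger one.
UNITS = ["degrees C", "kg/m^2", "μm", "um", "nm", "mm", "kg", "K"]

# Try longer units first so that "kg/m^2" wins over "kg".
_UNIT_ALT = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))

# A number, optional whitespace, then a known unit not followed by another letter.
_QUANTITY = re.compile(
    r"(?P<value>[-+]?\d+(?:\.\d+)?)\s*(?P<unit>" + _UNIT_ALT + r")(?![A-Za-z])"
)

def extract_quantities(text):
    """Return (value, unit) pairs for every number directly followed by a known unit."""
    return [(float(m.group("value")), m.group("unit")) for m in _QUANTITY.finditer(text)]

sample = "a 1370 degrees C melting point and a pixel pitch as high as 352 μm"
print(extract_quantities(sample))
```

Extracting the measured property (e.g. "pixel pitch") is a separate step and is not covered by this sketch.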
Deterministic Automata for Unordered Trees
Automata for unordered unranked trees are relevant for defining schemas and
queries for data trees in JSON or XML format. While the existing notions are
well-investigated concerning expressiveness, they all lack a proper notion of
determinism, which makes it difficult to distinguish subclasses of automata for
which problems such as inclusion, equivalence, and minimization can be solved
efficiently. In this paper, we propose and investigate different notions of
"horizontal determinism", starting from automata for unranked trees in which
the horizontal evaluation is performed by finite state automata. We show that a
restriction to confluent horizontal evaluation leads to polynomial-time
emptiness and universality, but still suffers from coNP-completeness of the
emptiness of binary intersections. Finally, efficient algorithms can be
obtained by imposing an order of horizontal evaluation globally for all
automata in the class. Depending on the choice of the order, we obtain
different classes of automata, each of which has the same expressiveness as
CMso.
Comment: In Proceedings GandALF 2014, arXiv:1408.556
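The idea of imposing a global order on horizontal evaluation can be sketched as follows, under simplifying assumptions: the children's states are sorted by one fixed order `STATE_ORDER` before a horizontal DFA reads them, so the run is deterministic and does not depend on how the children happen to be listed. The example automaton (`HORIZ_DELTA`, `output`) is hypothetical and is not one of the paper's constructions.

```python
from dataclasses import dataclass, field

@dataclass
class Tree:
    label: str
    children: list = field(default_factory=list)  # listing order carries no meaning

# Assumed global order on vertical states, shared by every automaton in the class.
STATE_ORDER = {"a": 0, "b": 1}

# Hypothetical horizontal DFA counting b-children as "zero", "one" or "many".
# It is partial: no transition reads "a" after "b", which cannot occur once
# the children are sorted by STATE_ORDER.
HORIZ_INIT = "zero"
HORIZ_DELTA = {
    ("zero", "a"): "zero",
    ("zero", "b"): "one",
    ("one", "b"): "many",
    ("many", "b"): "many",
}

def output(label, h):
    """Hypothetical vertical output: x-leaves get 'a', y-leaves get 'b', and any
    other node gets 'a' exactly when it has exactly one child in state 'b'."""
    if label == "x":
        return "a"
    if label == "y":
        return "b"
    return "a" if h == "one" else "b"

def evaluate(tree):
    """Deterministic bottom-up run; returns the node's state, or None if stuck."""
    states = [evaluate(c) for c in tree.children]
    if None in states:
        return None
    h = HORIZ_INIT
    # Read the children's states in the global order, not in listing order.
    for s in sorted(states, key=STATE_ORDER.__getitem__):
        h = HORIZ_DELTA.get((h, s))
        if h is None:
            return None
    return output(tree.label, h)
```

Without the sort, a child sequence b, a would hit the missing transition ("one", "a") and the run would get stuck; sorting is what lets this partial DFA suffice.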
Global Numerical Constraints on Trees
We introduce a logical foundation to reason on tree structures with
constraints on the number of node occurrences. Related formalisms are limited
to expressing occurrence constraints on particular tree regions, as for instance
the children of a given node. By contrast, the logic introduced in the present
work can concisely express numerical bounds on any region, descendants or
ancestors for instance. We prove that the logic is decidable in single
exponential time even if the numerical constraints are in binary form. We also
illustrate the usage of the logic in the description of numerical constraints
on multi-directional path queries on XML documents. Furthermore, numerical
restrictions on regular languages (XML schemas) can also be concisely described
by the logic. This implies a characterization of decidable counting extensions
of XPath queries and XML schemas. Moreover, since the logic is closed under
negation, it can be used as an optimal reasoning framework for testing
emptiness, containment, and equivalence.
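To make "numerical bounds on any region" concrete, the sketch below only checks a counting constraint on one given tree (model checking); it does not implement the logic's satisfiability procedure. The `Node`, `build`, `descendants`, `ancestors` and `count` helpers and the book/section/figure example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional

@dataclass(eq=False)
class Node:
    label: str
    children: list = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

def build(label, *children):
    """Create a node and set the parent pointers of its children."""
    node = Node(label, list(children))
    for c in children:
        c.parent = node
    return node

def descendants(n) -> Iterator[Node]:
    """All proper descendants of n (downward region)."""
    for c in n.children:
        yield c
        yield from descendants(c)

def ancestors(n) -> Iterator[Node]:
    """All proper ancestors of n (upward region)."""
    while n.parent is not None:
        n = n.parent
        yield n

def count(region, label):
    """Number of nodes in the given region carrying the given label."""
    return sum(1 for m in region if m.label == label)

# Hypothetical constraint: every section has at most 2 figure descendants.
doc = build("book",
            build("section", build("figure"), build("figure")),
            build("section", build("figure"), build("figure"), build("figure")))
per_section_ok = [count(descendants(s), "figure") <= 2 for s in doc.children]
```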
Logics for Unranked Trees: An Overview
Labeled unranked trees are used as a model of XML documents, and logical
languages for them have been studied actively over the past several years. Such
logics have different purposes: some are better suited for extracting data,
some for expressing navigational properties, and some make it easy to relate
complex properties of trees to the existence of tree automata for those
properties. Furthermore, logics differ significantly in their model-checking
properties, their automata models, and their behavior on ordered and unordered
trees. In this paper, we present a survey of logics for unranked trees.
Decidable Classes of Tree Automata Mixing Local and Global Constraints Modulo Flat Theories
We define a class of ranked tree automata TABG generalizing both the tree
automata with local tests between brothers of Bogaert and Tison (1992) and with
global equality and disequality constraints (TAGED) of Filiot et al. (2007).
TABG can test for equality and disequality modulo a given flat equational
theory between brother subterms and between subterms whose positions are
defined by the states reached during a computation. In particular, TABG can
check that all the subterms reaching a given state are distinct. This
constraint is related to monadic key constraints for XML documents, meaning
that every two distinct positions of a given type have different values. We
prove decidability of the emptiness problem for TABG. This solves, in
particular, the open question of the decidability of emptiness for TAGED. We
further extend our result by allowing global arithmetic constraints for
counting the number of occurrences of some state or the number of different
equivalence classes of subterms (modulo a given flat equational theory)
reaching some state during a computation. We also adapt the model to unranked
ordered terms. As a consequence of our results for TABG, we prove the
decidability of a fragment of the monadic second order logic on trees extended
with predicates for equality and disequality between subtrees, and cardinality.
Comment: 39 pages, to appear in LMCS journal
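A global disequality constraint of the kind TABG can express ("all subterms reaching state q are distinct") can be checked on a given term and run as below. This is a hypothetical sketch: terms are nested tuples, the run is supplied rather than computed, and equality is purely syntactic (the empty equational theory), not modulo a flat theory.

```python
def positions(t, pos=()):
    """All positions of term t as tuples of child indices; the root is ()."""
    yield pos
    for i, child in enumerate(t[1:]):
        yield from positions(child, pos + (i,))

def subterm(t, pos):
    """Subterm of t at the given position (t is (symbol, child0, child1, ...))."""
    for i in pos:
        t = t[1 + i]
    return t

def distinct_at_state(t, run, q):
    """True iff the subterms at all positions the run maps to q are pairwise distinct."""
    seen = set()
    for p in positions(t):
        if run.get(p) == q:
            s = subterm(t, p)
            if s in seen:
                return False
            seen.add(s)
    return True
```

Using a set makes the check linear in the number of positions (plus hashing cost), rather than comparing every pair of subterms.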
Regular hedge model checking
We extend the regular model checking framework so that it can handle systems with arbitrary-width tree-like structures. Configurations of a system are represented by trees of arbitrary arities, sets of configurations are represented by regular hedge automata, and the dynamics of a system is modeled by a regular hedge transducer. We consider the problem of computing the transitive closure T+ of a regular hedge transducer T. This construction is not possible in general.
Therefore, we present a general acceleration technique for computing T+. Our method consists of enhancing the termination of the iterative computation of the different compositions T^i by merging the states of the hedge transducers according to an appropriate equivalence relation that preserves the traces of the transducers. We provide a methodology for effectively deriving appropriate equivalence relations. We have successfully applied our technique to compute transitive closures for some mutual exclusion protocols defined on arbitrary-width tree topologies, as well as for an XML application.
4th IFIP International Conference on Theoretical Computer Science; Red de Universidades con Carreras en Informática (RedUNCI)
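The iterative computation of the compositions T^i can be illustrated on a finite relation, where it always terminates. The function below is a plain fixed-point iteration, not the acceleration by state merging described above; for regular hedge transducers the same loop need not terminate, which is the problem the acceleration addresses. The token-passing example is hypothetical.

```python
def transitive_closure(T):
    """T+ = T ∪ T∘T ∪ T∘T∘T ∪ ... for a finite relation given as a set of pairs."""
    closure = set(T)
    frontier = set(T)  # pairs discovered in the previous round
    while frontier:
        # Extend every newly found pair by one more T-step; keep only unseen pairs.
        new = {(a, d) for (a, b) in frontier for (c, d) in T if b == c} - closure
        closure |= new
        frontier = new
    return closure

# Hypothetical example: 4 processes in a line, configuration = index of the
# token holder, and one step of T passes the token to the right neighbour.
T = {(i, i + 1) for i in range(3)}
reach = transitive_closure(T)
```

Only pairs found in the last round are extended, so each pair is composed with T once; the loop stops as soon as a round adds nothing new.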
Tree Automata with Global Constraints for Infinite Trees
We study an extension of tree automata on infinite trees with global equality and disequality constraints. These constraints can enforce that all subtrees for which in the accepting run a state q is reached (at the root of that subtree) are identical, or that these trees differ from the subtrees at which a state q' is reached. We consider the closure properties of this model and its decision problems. While the emptiness problem for the general model remains open, we show the decidability of the emptiness problem for the case that the given automaton only uses equality constraints.
Containment of Shape Expression Schemas for RDF
We study the problem of containment for shape expression schemas (ShEx) for
RDF graphs. We identify a subclass of ShEx that has a natural graphical
representation in the form of shape graphs and whose semantics is captured with
a tractable notion of embedding of an RDF graph in a shape graph. When applied
to pairs of shape graphs, an embedding is a sufficient condition for
containment, and for a practical subclass of deterministic shape graphs, it is
also a necessary one, thus yielding a subclass with tractable containment.
While for general shape graphs a minimal counter-example, i.e., an instance
proving non-containment, might be of exponential size, we show that containment
is EXP-hard and in coNEXP. Finally, we show that containment for arbitrary ShEx
is coNEXP-hard and in coTwoNEXP^NP.
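As a simplified, hypothetical illustration of checking whether an RDF graph fits a shape graph, the sketch below computes the largest simulation between graph nodes and shape nodes. It ignores the cardinality intervals on shape-graph edges, so it is a coarser relaxation than the embedding referred to above, not that notion itself; the edge-triple encoding and all node names are assumptions.

```python
def largest_simulation(g_nodes, g_edges, h_nodes, h_edges):
    """Largest R such that (n, m) in R implies: every edge (n, a, n2) of G is
    matched by some edge (m, a, m2) of H with (n2, m2) in R.
    Edges are (source, label, target) triples; cardinalities are ignored."""
    rel = {(n, m) for n in g_nodes for m in h_nodes}
    changed = True
    while changed:  # greatest fixed point: remove pairs until nothing changes
        changed = False
        for n, m in list(rel):
            matched = all(
                any(lab2 == lab and (n2, m2) in rel
                    for (src2, lab2, m2) in h_edges if src2 == m)
                for (src, lab, n2) in g_edges if src == n
            )
            if not matched:
                rel.discard((n, m))
                changed = True
    return rel
```

Starting from all pairs and only removing those that fail makes the result the largest relation with the property, mirroring how such relations are usually computed.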
Logics for Unordered Trees with Data Constraints on Siblings
We study counting monadic second-order logics (CMso) for unordered data trees. Our objective is to enhance this logic with data constraints for comparing string data values attached to sibling edges of a data tree. We show that CMso satisfiability becomes undecidable when adding data constraints between siblings that can check the equality of factors of data values. For more restricted data constraints that can only check the equality of prefixes, we show that it becomes decidable, and propose a related automaton model with good complexities. This restricted logic is relevant to applications such as checking well-formedness properties of semi-structured databases and file trees. Our decidability results are obtained by compilation of CMso to automata for unordered trees, where both are enhanced with data constraints in a novel manner.
Automata for Unordered Trees
We present a framework for defining automata for unordered data trees that is parametrized by the way in which multisets of child nodes are described. Presburger tree automata and alternating Presburger tree automata are particular instances. We establish the usual equivalence in expressiveness of tree automata and MSO for the automata defined in our framework. We then investigate subclasses of automata for unordered trees for which testing language equivalence is in P-time. For this, we start from automata in our framework that describe multisets of children by finite automata, and propose two approaches to doing this deterministically. We show that a restriction to confluent horizontal evaluation leads to polynomial-time emptiness and universality, but still suffers from coNP-completeness of the emptiness of binary intersections. Finally, efficient algorithms can be obtained by imposing an order of horizontal evaluation globally for all automata in the class. Depending on the choice of the order, we obtain different classes of automata, each of which has the same expressiveness as Counting MSO.
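A Presburger tree automaton guards each transition with a linear-arithmetic condition on how many children are in each state, so the order of the children plays no role. The sketch below is a deterministic, first-match simplification with hypothetical rules (`RULES`, `run`); general Presburger tree automata need not be deterministic.

```python
from collections import Counter

# Hypothetical rules: (label, guard on child-state counts, resulting state).
# The first rule whose label and guard both match is applied.
RULES = [
    ("a", lambda c: True, "A"),
    ("b", lambda c: True, "B"),
    ("n", lambda c: c["A"] == c["B"], "Balanced"),  # Presburger guard: x_A = x_B
    ("n", lambda c: True, "Unbalanced"),
]

def run(tree):
    """Bottom-up run on an unordered tree given as (label, [children]).
    Returns the node's state, or None if no rule applies."""
    label, children = tree
    # Only the multiset of child states matters, so a Counter suffices.
    counts = Counter(run(child) for child in children)
    for rule_label, guard, state in RULES:
        if rule_label == label and guard(counts):
            return state
    return None
```

A `Counter` returns 0 for absent states, so a guard such as `c["A"] == c["B"]` also holds for a node with no A- or B-children.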