21 research outputs found

    Mining Measured Information from Text

    We present an approach to extract measured information from text (e.g., a 1370 degrees C melting point, a BMI greater than 29.9 kg/m^2). Such extractions are critically important across a wide range of domains, especially those involving search and exploration of scientific and technical documents. We first propose a rule-based entity extractor to mine measured quantities (i.e., a numeric value paired with a measurement unit), which supports a vast and comprehensive set of both common and obscure measurement units. Our method is highly robust and can correctly recover valid measured quantities even when significant errors are introduced through the process of converting document formats like PDF to plain text. Next, we describe an approach to extracting the properties being measured (e.g., the property "pixel pitch" in the phrase "a pixel pitch as high as 352 µm"). Finally, we present MQSearch: the realization of a search engine with full support for measured information. Comment: 4 pages; 38th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '15
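A minimal sketch of what a rule-based extractor for measured quantities could look like, assuming a plain regular-expression rule and a tiny hand-written unit list. The names `UNITS`, `QUANTITY_RE` and `extract_quantities` are hypothetical; this is not the extractor described in the abstract, which supports a far larger unit inventory and tolerates PDF-conversion errors.

```python
import re

# Hypothetical, deliberately tiny unit list (an assumption for illustration).
UNITS = ["kg/m^2", "degrees C", "mm", "cm", "kg", "m"]

# Try longer units first so that "kg/m^2" is preferred over "kg".
_unit_alternatives = "|".join(
    re.escape(u) for u in sorted(UNITS, key=len, reverse=True)
)

# A numeric value, optional whitespace, then a known unit that is not
# immediately followed by another letter (so "5 meters" is not read as "5 m").
QUANTITY_RE = re.compile(
    r"(?P<value>[-+]?\d+(?:\.\d+)?)\s*(?P<unit>" + _unit_alternatives + r")(?![A-Za-z])"
)


def extract_quantities(text):
    """Return a list of (value, unit) pairs found in the text."""
    return [
        (float(m.group("value")), m.group("unit"))
        for m in QUANTITY_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = "a 1370 degrees C melting point, a BMI greater than 29.9 kg/m^2"
    print(extract_quantities(sample))
```

Under these assumptions, the sample sentence from the abstract yields one pair for the melting point and one for the BMI threshold. Extracting the measured property itself (e.g., "pixel pitch") would need additional rules that this sketch does not attempt.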

    Deterministic Automata for Unordered Trees

    Automata for unordered unranked trees are relevant for defining schemas and queries for data trees in JSON or XML format. While the existing notions are well-investigated concerning expressiveness, they all lack a proper notion of determinism, which makes it difficult to distinguish subclasses of automata for which problems such as inclusion, equivalence, and minimization can be solved efficiently. In this paper, we propose and investigate different notions of "horizontal determinism", starting from automata for unranked trees in which the horizontal evaluation is performed by finite state automata. We show that a restriction to confluent horizontal evaluation leads to polynomial-time emptiness and universality, but still suffers from coNP-completeness of the emptiness of binary intersections. Finally, efficient algorithms can be obtained by imposing an order of horizontal evaluation globally for all automata in the class. Depending on the choice of the order, we obtain different classes of automata, each of which has the same expressiveness as CMso. Comment: In Proceedings GandALF 2014, arXiv:1408.556

    Global Numerical Constraints on Trees

    We introduce a logical foundation to reason on tree structures with constraints on the number of node occurrences. Related formalisms are limited to expressing occurrence constraints on particular tree regions, for instance the children of a given node. By contrast, the logic introduced in the present work can concisely express numerical bounds on any region, descendants or ancestors for instance. We prove that the logic is decidable in single exponential time even if the numerical constraints are in binary form. We also illustrate the usage of the logic in the description of numerical constraints on multi-directional path queries on XML documents. Furthermore, numerical restrictions on regular languages (XML schemas) can also be concisely described by the logic. This implies a characterization of decidable counting extensions of XPath queries and XML schemas. Moreover, as the logic is closed under negation, it can be used as an optimal reasoning framework for testing emptiness, containment and equivalence.

    Logics for Unranked Trees: An Overview

    Labeled unranked trees are used as a model of XML documents, and logical languages for them have been studied actively over the past several years. Such logics have different purposes: some are better suited for extracting data, some for expressing navigational properties, and some make it easy to relate complex properties of trees to the existence of tree automata for those properties. Furthermore, logics differ significantly in their model-checking properties, their automata models, and their behavior on ordered and unordered trees. In this paper we present a survey of logics for unranked trees.

    Decidable Classes of Tree Automata Mixing Local and Global Constraints Modulo Flat Theories

    We define a class of ranked tree automata TABG generalizing both the tree automata with local tests between brothers of Bogaert and Tison (1992) and with global equality and disequality constraints (TAGED) of Filiot et al. (2007). TABG can test for equality and disequality modulo a given flat equational theory between brother subterms and between subterms whose positions are defined by the states reached during a computation. In particular, TABG can check that all the subterms reaching a given state are distinct. This constraint is related to monadic key constraints for XML documents, meaning that every two distinct positions of a given type have different values. We prove decidability of the emptiness problem for TABG. This solves, in particular, the open question of the decidability of emptiness for TAGED. We further extend our result by allowing global arithmetic constraints for counting the number of occurrences of some state or the number of different equivalence classes of subterms (modulo a given flat equational theory) reaching some state during a computation. We also adapt the model to unranked ordered terms. As a consequence of our results for TABG, we prove the decidability of a fragment of the monadic second order logic on trees extended with predicates for equality and disequality between subtrees, and cardinality. Comment: 39 pages, to appear in LMCS journa

    Regular hedge model checking

    We extend the regular model checking framework so that it can handle systems with arbitrary width tree-like structures. Configurations of a system are represented by trees of arbitrary arities, sets of configurations are represented by regular hedge automata, and the dynamics of a system is modeled by a regular hedge transducer. We consider the problem of computing the transitive closure T+ of a regular hedge transducer T. This construction is not possible in general. Therefore, we present a general acceleration technique for computing T+. Our method consists of enhancing the termination of the iterative computation of the different compositions T^i by merging the states of the hedge transducers according to an appropriate equivalence relation that preserves the traces of the transducers. We provide a methodology for effectively deriving equivalence relations that are appropriate. We have successfully applied our technique to compute transitive closures for some mutual exclusion protocols defined on arbitrary width tree topologies, as well as for an XML application. 4th IFIP International Conference on Theoretical Computer Science. Red de Universidades con Carreras en Informática (RedUNCI

    Tree Automata with Global Constraints for Infinite Trees

    We study an extension of tree automata on infinite trees with global equality and disequality constraints. These constraints can enforce that all subtrees for which in the accepting run a state q is reached (at the root of that subtree) are identical, or that these trees differ from the subtrees at which a state q' is reached. We consider the closure properties of this model and its decision problems. While the emptiness problem for the general model remains open, we show the decidability of the emptiness problem for the case that the given automaton only uses equality constraints.

    Containment of Shape Expression Schemas for RDF

    We study the problem of containment for shape expression schemas (ShEx) for RDF graphs. We identify a subclass of ShEx that has a natural graphical representation in the form of shape graphs, whose semantics is captured by a tractable notion of embedding of an RDF graph in a shape graph. When applied to pairs of shape graphs, an embedding is a sufficient condition for containment, and for a practical subclass of deterministic shape graphs, it is also a necessary one, thus yielding a subclass with tractable containment. While for general shape graphs a minimal counter-example, i.e., an instance proving non-containment, might be of exponential size, we show that containment is EXP-hard and in coNEXP. Finally, we show that containment for arbitrary ShEx is coNEXP-hard and in coTwoNEXP^NP.

    Logics for Unordered Trees with Data Constraints on Siblings

    We study counting monadic second-order logics (CMso) for unordered data trees. Our objective is to enhance this logic with data constraints for comparing string data values attached to sibling edges of a data tree. We show that CMso satisfiability becomes undecidable when adding data constraints between siblings that can check the equality of factors of data values. For more restricted data constraints that can only check the equality of prefixes, we show that it becomes decidable, and propose a related automaton model with good complexities. This restricted logic is relevant to applications such as checking well-formedness properties of semi-structured databases and file trees. Our decidability results are obtained by compilation of CMso to automata for unordered trees, where both are enhanced with data constraints in a novel manner.

    Automata for Unordered Trees

    We present a framework for defining automata for unordered data trees that is parametrized by the way in which multisets of children nodes are described. Presburger tree automata and alternating Presburger tree automata are particular instances. We establish the usual equivalence in expressiveness of tree automata and MSO for the automata defined in our framework. We then investigate subclasses of automata for unordered trees for which testing language equivalence is in P-time. For this we start from automata in our framework that describe multisets of children by finite automata, and propose two approaches of how to do this deterministically. We show that a restriction to confluent horizontal evaluation leads to polynomial-time emptiness and universality, but still suffers from coNP-completeness of the emptiness of binary intersections. Finally, efficient algorithms can be obtained by imposing an order of horizontal evaluation globally for all automata in the class. Depending on the choice of the order, we obtain different classes of automata, each of which has the same expressiveness as Counting MSO.
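As a rough illustration of automata over unordered trees whose transitions inspect the multiset of children, the sketch below runs a toy bottom-up automaton in which each rule tests a counting condition on the multiset of child states. Everything here is an assumption made for illustration: the tree encoding `(label, children)`, the names `run` and `RULES`, and the simplification that the first applicable rule determines a node's state. It is not the framework or the notion of horizontal determinism defined in the paper.

```python
from collections import Counter


def run(tree, rules):
    """Evaluate the tree bottom-up.

    Returns the state assigned to the root, or None if some node has no
    applicable rule. Children are summarized as a multiset (Counter) of
    states, so their order cannot influence the result.
    """
    label, children = tree
    child_states = Counter()
    for child in children:
        state = run(child, rules)
        if state is None:
            return None
        child_states[state] += 1
    # Simplifying assumption: the first matching rule wins.
    for rule_label, constraint, state in rules:
        if rule_label == label and constraint(child_states):
            return state
    return None


# Hypothetical rules: "b" and "c" must be leaves; an "a" node is accepted
# when it has only b- and c-children and equally many of each, which is a
# simple counting (Presburger-style) constraint on the multiset.
RULES = [
    ("b", lambda m: not m, "qb"),
    ("c", lambda m: not m, "qc"),
    ("a", lambda m: set(m) <= {"qb", "qc"} and m["qb"] == m["qc"], "ok"),
]

if __name__ == "__main__":
    balanced = ("a", [("b", []), ("c", []), ("c", []), ("b", [])])
    unbalanced = ("a", [("b", []), ("c", []), ("c", [])])
    print(run(balanced, RULES), run(unbalanced, RULES))
```

Because the children are collapsed into a `Counter` before any rule is tested, permuting the children of a node never changes the outcome, which is the defining property of the unordered setting.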