    Parsing Linear Context-Free Rewriting Systems with Fast Matrix Multiplication

    We describe a matrix multiplication recognition algorithm for a subset of binary linear context-free rewriting systems (LCFRS) with running time $O(n^{\omega d})$, where $M(m) = O(m^{\omega})$ is the running time for $m \times m$ matrix multiplication and $d$ is the "contact rank" of the LCFRS: the maximal number of combination and non-combination points that appear in the grammar rules. We also show that this algorithm can be used as a subroutine to get a recognition algorithm for general binary LCFRS with running time $O(n^{\omega d + 1})$. The currently best known $\omega$ is smaller than $2.38$. Our result provides another proof for the best known result for parsing mildly context-sensitive formalisms such as combinatory categorial grammars, head grammars, linear indexed grammars, and tree adjoining grammars, which can be parsed in time $O(n^{4.76})$. It also shows that inversion transduction grammars can be parsed in time $O(n^{5.76})$. In addition, binary LCFRS subsumes many other formalisms and types of grammars, for some of which we also improve the asymptotic complexity of parsing.
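
    A worked check of the exponents quoted above: both figures follow from substituting the bound $\omega < 2.38$ into the two running times with a contact rank of $d = 2$. That value of $d$ is an assumption inferred from the arithmetic, not something the abstract states.

```latex
\begin{align*}
  O(n^{\omega d})     &= O(n^{2\omega})     \subseteq O(n^{4.76}) && \text{(subset algorithm, } d = 2,\ \omega < 2.38\text{)}\\
  O(n^{\omega d + 1}) &= O(n^{2\omega + 1}) \subseteq O(n^{5.76}) && \text{(general algorithm, } d = 2,\ \omega < 2.38\text{)}
\end{align*}
```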

    Three Studies on Model Transformations - Parsing, Generation and Ease of Use

    Transformations play an important part in both software development and the automatic processing of natural languages. We present three publications rooted in the multidisciplinary research of Language Technology and Software Engineering and relate their contributions to the literature on syntactic transformations.
    Parsing Linear Context-Free Rewriting Systems. The first publication describes four different parsing algorithms for the mildly context-sensitive grammar formalism Linear Context-Free Rewriting Systems. The algorithms automatically transform a text into a chart. As a result, the parse chart contains the (possibly partial) analysis of the text according to a grammar with a lower level of abstraction than the original text. The uni-directional and endogenous transformations are described within the framework of parsing as deduction.
    Natural Language Generation from Class Diagrams. Using the framework of Model-Driven Architecture, we generate natural language from class diagrams. The transformation is done in two steps. In the first step, we transform the class diagram, defined in Executable and Translatable UML, into grammars specified in the Grammatical Framework. The grammars are then used to generate the desired text. Overall, the transformation is uni-directional, automatic, and an example of a reverse engineering translation.
    Executable and Translatable UML: How Difficult Can It Be? Within Model-Driven Architecture, there has been substantial research on the transformation from Platform-Independent Models (PIM) into Platform-Specific Models, but less on the transformation from Computationally Independent Models (CIM) into PIMs. This publication reflects on the outcomes of letting novice software developers transform CIMs specified in UML into PIMs defined in Executable and Translatable UML.
    Conclusion. The three publications show how model transformations can be used within both Language Technology and Software Engineering to tackle the challenges of natural language processing and software development.

    Multiple context-free path querying by matrix multiplication

    Many graph analysis problems can be formulated as formal-language-constrained path querying problems, where formal languages are used as constraints for navigational path queries. Recently, the context-free language (CFL) reachability formulation has become very popular and is used in many areas, for example, querying graph databases and Resource Description Framework (RDF) analysis. However, the generative capacity of context-free grammars (CFGs) is too weak to describe some complex queries, for example, those arising from natural languages, and various extensions of CFGs have been proposed. Multiple context-free grammar (MCFG) is one such extension. Although the MCFL-reachability problem is known to be decidable, to the best of our knowledge no algorithm for it exists. This paper develops the first such algorithm. The essence of the proposed algorithm is to use a set of Boolean matrices and operations on them to find paths in a graph that satisfy the given constraints; the main operation is Boolean matrix multiplication. As a result, the algorithm returns a set of matrices containing all the information needed to solve the MCFL-reachability problem. The algorithm is implemented in Python using the GraphBLAS API. An analysis of real RDF data and synthetic graphs for some MCFLs is performed. The study showed that, using a sparse format for matrix storage and parallel computing, the analysis of graphs with tens of thousands of edges can take 10–20 minutes. The result of the analysis contains tens of millions of reachable vertex pairs. The proposed algorithm can be applied to problems of static code analysis, bioinformatics, and network analysis, as well as in graph databases when a path query cannot be expressed using context-free grammars. Because the algorithm is based on linear algebra, it can use high-performance libraries and exploit modern parallel hardware.
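
    The central idea of the abstract, Boolean matrix multiplication iterated until nothing changes, can be sketched for the simpler context-free case: CFL-reachability with a grammar whose rules have the forms A -> a and A -> B C. This is an illustrative sketch under that assumption, not the paper's MCFL algorithm (which has to track tuples of paths), and it uses dense pure-Python matrices rather than GraphBLAS. The function and parameter names are hypothetical.

```python
def bool_matmul(x, y):
    """Boolean product of two n x n matrices stored as lists of lists."""
    n = len(x)
    return [[any(x[i][k] and y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cfl_reachability(n, edges, terminal_rules, binary_rules):
    """Return one Boolean matrix per nonterminal: T[A][i][j] is True iff some
    path from vertex i to vertex j spells a word derivable from A.

    edges:          (source, label, target) triples over vertices 0..n-1
    terminal_rules: (A, a) pairs for rules A -> a
    binary_rules:   (A, B, C) triples for rules A -> B C (no epsilon rules)
    """
    nonterminals = {a for a, _ in terminal_rules}
    for a, b, c in binary_rules:
        nonterminals |= {a, b, c}
    T = {a: [[False] * n for _ in range(n)] for a in nonterminals}
    # Initialise from single edges using the rules A -> a.
    for i, label, j in edges:
        for a, t in terminal_rules:
            if t == label:
                T[a][i][j] = True
    # Apply T[A] |= T[B] x T[C] for every rule A -> B C until a fixpoint.
    changed = True
    while changed:
        changed = False
        for a, b, c in binary_rules:
            prod = bool_matmul(T[b], T[c])
            for i in range(n):
                for j in range(n):
                    if prod[i][j] and not T[a][i][j]:
                        T[a][i][j] = True
                        changed = True
    return T


# Path 0 -a-> 1 -a-> 2 -b-> 3 -b-> 4 and a grammar for a^n b^n (n >= 1).
edges = [(0, "a", 1), (1, "a", 2), (2, "b", 3), (3, "b", 4)]
terminal_rules = [("A", "a"), ("B", "b")]
binary_rules = [("S", "A", "B"), ("S", "A", "S1"), ("S1", "S", "B")]
T = cfl_reachability(5, edges, terminal_rules, binary_rules)
print(sorted((i, j) for i in range(5) for j in range(5) if T["S"][i][j]))
# [(0, 4), (1, 3)]
```

    The dense representation keeps the sketch dependency-free but is only practical for small graphs; the timings reported in the abstract rely on a sparse matrix format and parallel computing.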

    An Earley-Style Parsing Algorithm for Range Concatenation Grammars (Un algorithme d'analyse de type Earley pour grammaires à concaténation d'intervalles)

    We present several different parsing algorithms for Range Concatenation Grammar (RCG), including an entirely novel Earley-style algorithm, using the deductive parsing framework. Our work is motivated by recent interest in range concatenation grammar in general and fills a gap in the existing literature.

    TuLiPA - Parsing Extensions of TAG with Range Concatenation Grammars

    4 pages, oral presentation. International audience.
    In this paper we present a parsing framework for extensions of Tree Adjoining Grammars (TAG) called TuLiPA (Tuebingen Linguistic Parsing Architecture). In particular, besides TAG, the parser can process Tree-Tuple MCTAG with shared nodes (TT-MCTAG), a TAG extension that has been proposed to deal with scrambling in free-word-order languages such as German. The central strategy of the parser is to transform the incoming TT-MCTAG (or TAG) into an equivalent Range Concatenation Grammar (RCG), which is then used for parsing. The RCG parser is an incremental Earley-style chart parser. In addition to the syntactic analysis, TuLiPA also computes an underspecified semantic analysis for grammars that are equipped with semantic representations.

    A declarative characterization of different types of multicomponent tree adjoining grammars

    Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG, however, is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. In other words, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) that the MCTAG licenses. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, allow a more systematic comparison of different types of MCTAG, and, furthermore, can be exploited for parsing.

    Chomsky-Schützenberger parsing for weighted multiple context-free languages

    Two characterisation results of multiple context-free grammars and their application to parsing

    In the first part of this thesis, a Chomsky-Schützenberger characterisation and an automaton characterisation of multiple context-free grammars are proved. Furthermore, a framework for the approximation of automata with storage is described. The second part develops each of the three theoretical results into a parsing algorithm.

    Parsing linear context-free rewriting systems

    We describe four different parsing algorithms for Linear Context-Free Rewriting Systems (Vijay-Shanker et al., 1987). The algorithms are described as deduction systems, and possible optimizations are discussed.
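
    To make "described as deduction systems" concrete, here is a minimal sketch of a bottom-up deduction system for LCFRS recognition. It is not one of the four algorithms from the paper: it assumes rules are given with explicit composition functions, items pair a nonterminal with a tuple of ranges, a rule fires when the antecedents' ranges concatenate into adjacent spans, and an agenda drives the closure. The rule encoding, the function names, and the example grammar are all hypothetical.

```python
from itertools import product


def combine(components, antecedents):
    """Build the consequent's ranges from the antecedents' ranges.

    components[k] lists (child, component) pairs whose ranges are concatenated,
    left to right, to form component k of the consequent. Returns None if two
    consecutive pieces are not adjacent. Overlapping components are not rejected
    here; with a linear, non-deleting grammar they cannot reach the goal item.
    """
    ranges = []
    for comp in components:
        start = end = None
        for child, idx in comp:
            left, right = antecedents[child][1][idx]
            if start is None:
                start, end = left, right
            elif end == left:
                end = right
            else:
                return None
        ranges.append((start, end))
    return tuple(ranges)


def recognize(word, lexicon, rules, start):
    """Agenda-driven closure over items (label, ranges)."""
    chart, agenda = set(), []
    # Axioms: one item per input position matched by a lexical rule.
    for i, symbol in enumerate(word):
        for label, terminal in lexicon:
            if terminal == symbol:
                agenda.append((label, ((i, i + 1),)))
    while agenda:
        item = agenda.pop()
        if item in chart:
            continue
        chart.add(item)
        # Try every rule in which the new item can fill some child slot;
        # the remaining slots are filled from items already in the chart.
        for lhs, children, components in rules:
            for slot, label in enumerate(children):
                if label != item[0]:
                    continue
                pools = [[item] if k == slot else [c for c in chart if c[0] == lab]
                         for k, lab in enumerate(children)]
                for antecedents in product(*pools):
                    ranges = combine(components, antecedents)
                    if ranges is not None and (lhs, ranges) not in chart:
                        agenda.append((lhs, ranges))
    # Goal item: the start symbol with one range covering the whole input.
    return (start, ((0, len(word)),)) in chart


# Hypothetical grammar for { a^n b^n c^n d^n : n >= 1 }; P has fan-out 2.
lexicon = [("A", "a"), ("B", "b"), ("C", "c"), ("D", "d")]
rules = [
    ("AB", ("A", "B"), (((0, 0), (1, 0)),)),           # AB(x y) <- A(x) B(y)
    ("CD", ("C", "D"), (((0, 0), (1, 0)),)),           # CD(x y) <- C(x) D(y)
    ("P", ("AB", "CD"), (((0, 0),), ((1, 0),))),       # P(x, y) <- AB(x) CD(y)
    ("Q", ("A", "P"), (((0, 0), (1, 0)), ((1, 1),))),  # Q(a x1, x2) <- A(a) P(x1, x2)
    ("R", ("Q", "B"), (((0, 0), (1, 0)), ((0, 1),))),  # R(x1 b, x2) <- Q(x1, x2) B(b)
    ("U", ("R", "C"), (((0, 0),), ((1, 0), (0, 1)))),  # U(x1, c x2) <- R(x1, x2) C(c)
    ("P", ("U", "D"), (((0, 0),), ((0, 1), (1, 0)))),  # P(x1, x2 d) <- U(x1, x2) D(d)
    ("S", ("P",), (((0, 0), (0, 1)),)),                # S(x1 x2) <- P(x1, x2)
]
print(recognize("aabbccdd", lexicon, rules, "S"))  # True
print(recognize("aabbcd", lexicon, rules, "S"))    # False
```

    Filling the other child slots by scanning the whole chart keeps the sketch short; a more efficient version would index chart items by label and by range boundaries, which is the kind of optimization such deduction-based descriptions leave room for.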