77 research outputs found

    Word Ordering as a Graph Rewriting Process

    Get PDF
    International audienceThis paper shows how the correspondence between a unordered dependency tree and a sentence that expresses it can be achieved by transforming the tree into a string where each linear precedence link corresponds to one specific syntactic relation. We propose a formal grammar with a distributed architecture that can be used for both synthesis and analysis. We argue for the introduction of a topological tree as an intermediate step between dependency syntax and word order

    Approximating dependency grammars through intersection of regular languages

    Get PDF
    Springer; 3-540-24318-6;Peer reviewe

    A meta-grammatical framework for dependency grammar

    Get PDF
    We propose a meta-grammatical framework for dependency grammar, accommodating any number of dimensions and restricting their licensed models through the application of parametric principles. We first instantiate this framework to obtain the version of Topological Dependency Grammar (TDG) proposed by Duchier and Debusmann 2001. We then describe an extended instantiation which adds support for semantic dependencies (thus also providing an account of control and raising constructions) and meaning assembly

    Syntax-based machine translation using dependency grammars and discriminative machine learning

    Get PDF
    Machine translation underwent huge improvements since the groundbreaking introduction of statistical methods in the early 2000s, going from very domain-specific systems that still performed relatively poorly despite the painstakingly crafting of thousands of ad-hoc rules, to general-purpose systems automatically trained on large collections of bilingual texts which manage to deliver understandable translations that convey the general meaning of the original input. These approaches however still perform quite below the level of human translators, typically failing to convey detailed meaning and register, and producing translations that, while readable, are often ungrammatical and unidiomatic. This quality gap, which is considerably large compared to most other natural language processing tasks, has been the focus of the research in recent years, with the development of increasingly sophisticated models that attempt to exploit the syntactical structure of human languages, leveraging the technology of statistical parsers, as well as advanced machine learning methods such as marging-based structured prediction algorithms and neural networks. The translation software itself became more complex in order to accommodate for the sophistication of these advanced models: the main translation engine (the decoder) is now often combined with a pre-processor which reorders the words of the source sentences to a target language word order, or with a post-processor that ranks and selects a translation according according to fine model from a list of candidate translations generated by a coarse model. In this thesis we investigate the statistical machine translation problem from various angles, focusing on translation from non-analytic languages whose syntax is best described by fluid non-projective dependency grammars rather than the relatively strict phrase-structure grammars or projectivedependency grammars which are most commonly used in the literature. We propose a framework for modeling word reordering phenomena between language pairs as transitions on non-projective source dependency parse graphs. We quantitatively characterize reordering phenomena for the German-to-English language pair as captured by this framework, specifically investigating the incidence and effects of the non-projectivity of source syntax and the non-locality of word movement w.r.t. the graph structure. We evaluated several variants of hand-coded pre-ordering rules in order to assess the impact of these phenomena on translation quality. We propose a class of dependency-based source pre-ordering approaches that reorder sentences based on a flexible models trained by SVMs and and several recurrent neural network architectures. We also propose a class of translation reranking models, both syntax-free and source dependency-based, which make use of a type of neural networks known as graph echo state networks which is highly flexible and requires extremely little training resources, overcoming one of the main limitations of neural network models for natural language processing tasks

    A meta-grammatical framework for dependency grammar

    Get PDF
    We propose a meta-grammatical framework for dependency grammar, accommodating any number of dimensions and restricting their licensed models through the application of parametric principles. We first instantiate this framework to obtain the version of Topological Dependency Grammar (TDG) proposed by Duchier and Debusmann 2001. We then describe an extended instantiation which adds support for semantic dependencies (thus also providing an account of control and raising constructions) and meaning assembly

    Linking syntactic and semantic arguments in a dependency-based formalism

    Get PDF
    We propose a formal characterization of variation in the syntactic realization of semantic arguments, using hierarchies of syntactic relations and thematic roles, and a mechanism of lexical inheritance to obtain emph{valency frames} from individual emph{linking types}. We embed the formalization in the new lexicalized, dependency-based grammar formalism of Topological Dependency Grammar (TDG) cite{rade-acl2001}. We account for arguments that can be alternatively realized as a NP or a PP, and model thematic role alternations. We also treat auxiliary constructions, where the correspondance between syntactic and semantic argumenthood is indirect

    Efficient Lagrangian relaxation algorithms for exact inference in natural language tasks

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.Cataloged from PDF version of thesis.Includes bibliographical references (p. 95-99).For many tasks in natural language processing, finding the best solution requires a search over a large set of possible structures. Solving these combinatorial search problems exactly can be inefficient, and so researchers often use approximate techniques at the cost of model accuracy. In this thesis, we turn to Lagrangian relaxation as an alternative to approximate inference in natural language tasks. We demonstrate that Lagrangian relaxation algorithms provide efficient solutions while still maintaining formal guarantees. The approach leads to inference algorithms with the following properties: " The resulting algorithms are simple and efficient, building on standard combinatorial algorithms for relaxed problems. " The algorithms provably solve a linear programming (LP) relaxation of the original inference problem. " Empirically, the relaxation often leads to an exact solution to the original problem. We develop Lagrangian relaxation algorithms for several important tasks in natural language processing including higher-order non-projective dependency parsing, syntactic machine translation, integrated constituency and dependency parsing, and part-of-speech tagging with inter-sentence constraints. For each of these tasks, we show that the Lagrangian relaxation algorithms are often significantly faster than exact methods while finding the exact solution with a certificate of optimality in the vast majority of examples.by Alexander M. Rush.S.M

    Algebraic dependency grammar

    Get PDF
    We propose a mathematical formalism called Algebraic Dependency Grammar with applications to formal linguistics and to formal language theory. Regarding formal linguistics we aim to address the problem of grammaticality with special attention to cross-linguistic cases. In the field of formal language theory this formalism provides a new perspective allowing an algebraic classification of languages. Notably our approach suggests the existence of so-called anti-classes of languages associated to certain classes of languages. Our notion of a dependency grammar is as of a definition of a set of well-constructed dependency trees (we call this algebraic governance) and a relation which associates word-orders to dependency trees (we call this algebraic linearization). In relation to algebraic governance, we define a manifold which is a set of dependency trees satisfying an agreement condition throughout a pattern, which is the algebraic form of a collection of syntactic addresses over the dependency tree. A boolean condition on the words formalizes the notion of agreement. In relation to algebraic linearization, first we observe that the notion of projectivity is quintessentially that certain substructures of a dependency tree always form an interval in its linearization. So we have to establish well what is a substructure; we see again that patterns proportion the key, generalizing the notion of projectivity with recursive linearization procedures. Combining the above modules we have the formalism: an algebraic dependency grammar is a manifold together with a linearization. Notice that patterns sustain both manifolds and linearizations. We study their interrelation in terms of a new algebraic classification of classes of languages. We highlight the main contributions of the thesis. Regarding mathematical linguistics, algebraic dependency grammar considers trees and word-order different modules in the architecture, which allows description of languages with varied word-order. Ellipses are permitted; this issue is usually avoided because it makes some formalisms non-decidable. We differentiate linguistic phenomena structurally by their algebraic description. Algebraic dependency grammar permits observance of affinity between linguistic constructions which seem superficially different. Regarding formal language theory, a new system for understanding a very large family of languages is presented which permits observation of languages in broader contexts. We identify a new class named anti-context-free languages containing constructions structurally symmetric to context-free languages. Informally we could say that context-free languages are well-parenthesized, while anti-context-free languages are cross-serial-parenthesized. For example copy languages and respectively languages are anti-context-free.Es proposa un formalisme matemàtic anomenat Gramàtica de Dependències Algebraica amb aplicacions a la lingüística formal i a la teoria de llenguatges formals. Pel que fa a la lingüística formal es pretén abordar el problema de la gramaticalitat, amb un èmfasi especial en la transversalitat, això és, que el formalisme sigui apte per a un bon nombre de llengües. En el camp dels llenguatges formals aquest formalisme proporciona una nova perspectiva que permet una classificació algebraica dels llenguatges. Aquest enfocament suggereix a més a més l'existència de les aquí anomenades anti-classes de llenguatges associades a certes classes de llenguatges. La nostra idea d'una gramàtica de dependències és en un conjunt de sintagmes ben construïts (d'això en diem recció algebraica) i una relació que associa ordres de paraules als sintagmes d'aquest conjunt (d'això en diem linearització algebraica). Pel que fa a la recció algebraica, introduïm el concepte de varietat sintàctica com el conjunt de sintagmes que satisfan una concordança sobre un determinat patró. Un patró és un conjunt d'adreces sintàctiques descrit algebraicament. La concordança es formalitza a través d'una condició booleana sobre el vocabulari. En relació amb linearització algebraica, en primer lloc, observem que l'essencial de la noció clàssica de projectivitat rau en el fet que certes subestructures d'un arbre de dependències formen sempre un interval en la seva linearització. Així doncs, primer hem d'establir bé que vol dir subestructura. Un cop més veiem que els patrons en proporcionen la clau, tot generalitzant la noció de projectivitat a través d'un procediment recursiu de linearització. Tot unint els dos mòduls anteriors ja tenim el nostre formalisme a punt: una gramàtica de dependències algebraica és una varietat sintàctica juntament amb una linearització. Notem que els patrons són a la base de tots dos mòduls: varietats i linearitzacions, així que resulta del tot natural estudiar-ne la interrelació en termes d'un nou sistema de classificació algebraica de classes de llenguatges. Destaquem les principals contribucions d'aquesta tesi. Pel que fa a la matemàtica lingüística, la gramàtica de dependències algebraica considera els arbres i l'ordre de les paraules diferents mòduls dins l'arquitectura la qual cosa permet de descriure llenguatges amb una gran varietat d'ordre. L'ús d'el·lipsis és permès; aquesta qüestió és normalment evitada en altres formalismes per tal com la possibilitat d'el·lipsis fa que els models es tornin no decidibles. El nostre model també ens permet classificar estructuralment fenòmens lingüístics segons la seva descripció algebraica, així com de copsar afinitats entre construccions que semblen superficialment diferents. Pel que fa a la teoria dels llenguatges formals, presentem un nou sistema de classificació que ens permet d'entendre els llenguatges en un context més ampli. Identifiquem una nova classe que anomenem llenguatges anti-lliures-de-context que conté construccions estructuralment simètriques als llenguatges lliures de context. Informalment podríem dir que els llenguatges lliures de context estan ben parentetitzats, mentre que els anti-lliures-de-context estan parentetitzats segons dependències creuades en sèrie. En són mostres d'aquesta classe els llenguatges còpia i els llenguatges respectivament.Postprint (published version

    SMT and Hybrid systems of the QTLeap project in the WMT16 IT-task

    Get PDF
    This paper presents the description of 12 systems submitted to the WMT16 IT-task, covering six different languages, namely Basque, Bulgarian, Dutch, Czech, Portuguese and Spanish. All these systems were developed under the scope of the QTLeap project, presenting a common strategy. For each language two different systems were submitted, namely a phrase-based MT system built using Moses, and a system exploiting deep language engineering approaches, that in all the languages but Bulgarian was implemented using TectoMT. For 4 of the 6 languages, the TectoMT-based system performs better than the Moses-based one
    corecore