Latent Tree Language Model
In this paper we introduce the Latent Tree Language Model (LTLM), a novel
approach to language modeling that encodes the syntax and semantics of a given
sentence as a tree of word roles.
The learning phase iteratively updates the trees by moving nodes according to
Gibbs sampling. We introduce two algorithms for inferring a tree for a given
sentence. The first is based on Gibbs sampling: it is fast, but is not
guaranteed to find the most probable tree. The second is based on dynamic
programming: it is slower, but guaranteed to find the most probable tree. We
provide a comparison of both algorithms.
We combine LTLM with a 4-gram Modified Kneser-Ney language model via linear
interpolation. Our experiments on English and Czech corpora show significant
perplexity reductions (up to 46% for English and 49% for Czech) compared with
the standalone 4-gram Modified Kneser-Ney language model.
Comment: Accepted to EMNLP 201
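The reported gains come from linearly interpolating the two models' predictions. A minimal sketch of this interpolation and the relative perplexity reduction it is measured by; the toy two-word distributions and the interpolation weight are illustrative assumptions, not the paper's actual models:

```python
import math

# Toy conditional next-word distributions standing in for LTLM and the
# 4-gram Modified Kneser-Ney model (hypothetical values for illustration).
def p_ltlm(word, context):
    return {"cat": 0.6, "dog": 0.4}[word]

def p_kn(word, context):
    return {"cat": 0.3, "dog": 0.7}[word]

def p_interp(word, context, lam=0.5):
    """Linear interpolation: lam * P_LTLM + (1 - lam) * P_KN."""
    return lam * p_ltlm(word, context) + (1 - lam) * p_kn(word, context)

def perplexity(prob_fn, words, context=None):
    """Perplexity = exp of the average negative log-probability."""
    n = len(words)
    return math.exp(-sum(math.log(prob_fn(w, context)) for w in words) / n)

test_words = ["cat", "dog", "cat"]
ppl_kn = perplexity(p_kn, test_words)
ppl_mix = perplexity(p_interp, test_words)
reduction = 100.0 * (ppl_kn - ppl_mix) / ppl_kn  # relative reduction in %
```

In practice the interpolation weight would be tuned on held-out data rather than fixed at 0.5.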
Dependency Grammar Induction with Neural Lexicalization and Big Training Data
We study the impact of big models (in terms of the degree of lexicalization)
and big data (in terms of the training corpus size) on dependency grammar
induction. We experimented with L-DMV, a lexicalized version of Dependency
Model with Valence and L-NDMV, our lexicalized extension of the Neural
Dependency Model with Valence. We find that L-DMV only benefits from very small
degrees of lexicalization and moderate sizes of training corpora. L-NDMV can
benefit from big training data and lexicalization of greater degrees,
especially when enhanced with good model initialization, and it achieves a
result that is competitive with the current state-of-the-art.Comment: EMNLP 201
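The "degree of lexicalization" can be thought of as a frequency cutoff below which words are backed off to their part-of-speech tags. A minimal sketch under that assumption; the tagged corpus and the cutoff semantics are hypothetical, and the papers' actual preprocessing may differ:

```python
from collections import Counter

# Hypothetical tagged corpus: (word, POS) pairs.
corpus = [("the", "DT"), ("dog", "NN"), ("barks", "VBZ"),
          ("the", "DT"), ("cat", "NN"), ("meows", "VBZ"),
          ("the", "DT"), ("dog", "NN"), ("runs", "VBZ")]

def lexicalize(corpus, cutoff):
    """Keep words seen at least `cutoff` times as lexical items;
    back rarer words off to their POS tag. A higher cutoff means a
    less lexicalized model."""
    counts = Counter(w for w, _ in corpus)
    return [w if counts[w] >= cutoff else pos for w, pos in corpus]

fully_lex = lexicalize(corpus, 1)  # every word kept as-is
partial = lexicalize(corpus, 2)    # rare words replaced by POS tags
```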
Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
We consider the search for a maximum likelihood assignment of hidden derivations and grammar weights for a probabilistic context-free grammar, the problem approximately solved by "Viterbi training." We show that solving and even approximating Viterbi training for PCFGs is NP-hard. We motivate the use of uniform-at-random initialization for Viterbi EM as an optimal initializer in the absence of further information about the correct model parameters, providing an approximate bound on the log-likelihood.
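One plausible reading of uniform-at-random initialization is to draw each rule weight at random and then normalize within each left-hand-side nonterminal, so that every nonterminal's rules form a probability distribution. A sketch under that assumption; the toy grammar is a hypothetical illustration, not taken from the paper:

```python
import random

# A toy PCFG skeleton: right-hand sides grouped by left-hand-side nonterminal.
rules = {
    "S": [("NP", "VP")],
    "NP": [("DT", "NN"), ("NN",)],
    "VP": [("VBZ",), ("VBZ", "NP")],
}

def uniform_at_random_init(rules, seed=0):
    """Draw one random weight per rule, then normalize per LHS so each
    nonterminal's rule weights sum to one (a valid starting point for
    Viterbi EM)."""
    rng = random.Random(seed)
    weights = {}
    for lhs, rhss in rules.items():
        raw = [rng.random() for _ in rhss]
        total = sum(raw)
        for rhs, r in zip(rhss, raw):
            weights[(lhs, rhs)] = r / total
    return weights

w = uniform_at_random_init(rules)
```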
Using semantic cues to learn syntax
We present a method for dependency grammar induction that utilizes sparse annotations of semantic relations. This induction set-up is attractive because such annotations provide useful
clues about the underlying syntactic structure, and they are readily available in many domains (e.g., info-boxes and HTML markup). Our method is based on the intuition that syntactic realizations of the same semantic predicate exhibit some degree of consistency. We incorporate this intuition in
a directed graphical model that tightly links the syntactic and semantic structures. This design enables us to exploit syntactic regularities while still allowing for variation. Another strength of the model lies in its ability to capture non-local dependency relations. Our results demonstrate that even a small amount of semantic annotation greatly improves the accuracy of learned dependencies when tested on both in-domain and out-of-domain texts.
Funding: United States Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0172; U.S. Army Research Laboratory contract no. W911NF-10-1-0533.
Unsupervised Formal Grammar Induction with Confidence
I present a novel algorithm for minimally supervised formal grammar induction using a linguistically motivated grammar formalism. This algorithm, called the Missing Link algorithm (ML), builds on classic chart parsing methods, but makes use of a probabilistic confidence measure to keep track of potentially ambiguous lexical items. Because ML uses a structured grammar formalism, each step of the algorithm can be easily understood by linguists, making it ideal for studying the learnability of different linguistic phenomena. The algorithm requires minimal annotation in its training data, yet is capable of learning nuanced distinctions from relatively small training sets and can be applied to a variety of grammar formalisms. Though evaluating an unsupervised syntactic model is difficult, I present an evaluation using the Corpus of Linguistic Acceptability and show state-of-the-art performance.
On Language Acquisition through Womb Grammars
We propose to automate the field of language acquisition evaluation through Constraint Solving, in particular through the use of Womb Grammars. Womb Grammar Parsing is a novel constraint-based paradigm that was devised mainly to induce grammatical structure from the description of its syntactic constraints in a related language. In this paper we argue that it is also ideal for automating the evaluation of language acquisition, and present as proof of concept a CHRG system for detecting which of fourteen levels of morphological proficiency a child is at, from a representative sample of the child's expressions. Our results also uncover ways in which the linguistic constraints that characterize a grammar need to be tailored to language acquisition applications. We also put forward a proposal for discovering the order in which such levels are typically acquired in languages other than English. Our findings have great potential practical value, in that they can help educators tailor the games, stories, songs, etc. that can aid a child (or a second-language learner) in progressing in a timely fashion to the next level of proficiency, and can also help shed light on the processes by which languages less studied than English are acquired.