49 research outputs found

    Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments

    Semantic role labeling (SRL) is a fundamental yet challenging task in the NLP community. Recent work on SRL mainly falls into two lines: 1) BIO-based; 2) span-based. Despite their ubiquity, both share the intrinsic drawback of not considering internal argument structures, potentially hindering the model's expressiveness. The key challenge is that arguments are flat structures, and there are no determined subtree realizations for words inside arguments. To remedy this, in this paper, we propose to regard flat argument spans as latent subtrees, accordingly reducing SRL to a tree parsing task. In particular, we equip our formulation with a novel span-constrained TreeCRF to make tree structures span-aware and further extend it to the second-order case. We conduct extensive experiments on the CoNLL05 and CoNLL12 benchmarks. Results show that our methods outperform all previous syntax-agnostic work, achieving a new state of the art in both the end-to-end and gold-predicate settings. Comment: COLING 202

    Simple Hardware-Efficient PCFGs with Independent Left and Right Productions

    Scaling dense PCFGs to thousands of nonterminals via a low-rank parameterization of the rule probability tensor has been shown to be beneficial for unsupervised parsing. However, PCFGs scaled this way still perform poorly as language models, and even underperform similarly-sized HMMs. This work introduces SimplePCFG, a simple PCFG formalism with independent left and right productions. Despite imposing a stronger independence assumption than the low-rank approach, we find that this formalism scales more effectively both as a language model and as an unsupervised parser. As an unsupervised parser, our simple PCFG obtains an average F1 of 65.1 on the English PTB, and as a language model, it obtains a perplexity of 119.0, outperforming similarly-sized low-rank PCFGs. We further introduce FlashInside, a hardware IO-aware implementation of the inside algorithm for efficiently scaling simple PCFGs. Comment: Accepted to Findings of EMNLP, 202
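
    To illustrate what "independent left and right productions" can mean for the inside algorithm, here is a minimal sketch. It assumes the rule probability factorises as p(A -> B C) = left[A, B] * right[A, C], with nonterminals expanding only by binary rules and preterminals emitting single words. The function name, array layout, and toy grammar are hypothetical; this is a plain cubic-time dynamic program, not the authors' implementation and not FlashInside.

```python
import numpy as np

def inside_simple_pcfg(words, root, left, right, emit, n_nt):
    """Sentence probability when p(A -> B C) = left[A, B] * right[A, C].

    Symbols 0..n_nt-1 are nonterminals (binary rules only); the remaining
    symbols are preterminals (each emits exactly one word).
    root:  (n_nt,)         distribution over start nonterminals
    left:  (n_nt, n_sym)   left-child distribution for each parent
    right: (n_nt, n_sym)   right-child distribution for each parent
    emit:  (n_sym - n_nt, vocab) word distribution for each preterminal
    """
    n = len(words)
    n_sym = left.shape[1]
    # beta[i, j, A] = probability that symbol A yields words[i:j]
    beta = np.zeros((n + 1, n + 1, n_sym))
    for i, w in enumerate(words):
        beta[i, i + 1, n_nt:] = emit[:, w]
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            total = np.zeros(n_nt)
            for k in range(i + 1, j):
                # The factorisation lets us sum over B and C separately,
                # instead of contracting a full (A, B, C) rule tensor.
                total += (left @ beta[i, k]) * (right @ beta[k, j])
            beta[i, j, :n_nt] = total
    return float(root @ beta[0, n, :n_nt])

# Hypothetical toy grammar: one nonterminal S (index 0), one preterminal P
# (index 1), and a one-word vocabulary.
toy_root = np.array([1.0])
toy_left = np.array([[0.3, 0.7]])   # p(left child = S or P | parent S)
toy_right = np.array([[0.4, 0.6]])  # p(right child = S or P | parent S)
toy_emit = np.array([[1.0]])

p_len2 = inside_simple_pcfg([0, 0], toy_root, toy_left, toy_right, toy_emit, n_nt=1)
p_len3 = inside_simple_pcfg([0, 0, 0], toy_root, toy_left, toy_right, toy_emit, n_nt=1)
```

    In the toy grammar, a two-word sentence has a single derivation (S -> P P), so p_len2 equals 0.7 * 0.6, while p_len3 sums over the two bracketings of a three-word sentence.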

    Logistic Normal Priors for Unsupervised Probabilistic Grammar Induction

    We explore a new Bayesian model for probabilistic grammars, a family of distributions over discrete structures that includes hidden Markov models and probabilistic context-free grammars. Our model extends the correlated topic model framework to probabilistic grammars, exploiting the logistic normal distribution as a prior over the grammar parameters. We derive a variational EM algorithm for that model, and then experiment with the task of unsupervised grammar induction for natural language dependency parsing. We show that our model achieves superior results over previous models that use different priors.
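
    As a rough illustration of the logistic normal distribution mentioned above, the sketch below draws one probability vector by sampling a Gaussian vector and passing it through a softmax. The function name, mean, and covariance are hypothetical choices; the paper's variational EM procedure is not reproduced here.

```python
import numpy as np

def sample_logistic_normal(mu, sigma, rng):
    """Draw eta ~ N(mu, sigma) and map it to the simplex with a softmax."""
    eta = rng.multivariate_normal(mu, sigma)
    eta = eta - eta.max()          # shift for numerical stability
    weights = np.exp(eta)
    return weights / weights.sum()

rng = np.random.default_rng(0)
mu = np.zeros(3)
# Hypothetical covariance: the first two outcomes are positively correlated,
# something a Dirichlet prior cannot express directly.
sigma = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
theta = sample_logistic_normal(mu, sigma, rng)
```

    Here theta could stand in for the probabilities of three rules sharing a left-hand side; the off-diagonal covariance entries are what allow rule probabilities to covary.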

    Identifiability and Unmixing of Latent Parse Trees

    This paper explores unsupervised learning of parsing models along two directions. First, which models are identifiable from infinite data? We use a general technique for numerically checking identifiability based on the rank of a Jacobian matrix, and apply it to several standard constituency and dependency parsing models. Second, for identifiable models, how do we estimate the parameters efficiently? EM suffers from local optima, while recent work using spectral methods cannot be directly applied since the topology of the parse tree varies across sentences. We develop a strategy, unmixing, which deals with this additional complexity for restricted classes of parsing models.
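
    The Jacobian-rank idea can be sketched on a toy problem: map parameters to observable quantities, estimate the Jacobian of that map by finite differences at a chosen parameter point, and compare its rank with the number of parameters. The two maps below are hypothetical stand-ins, not the parsing models studied in the paper, and full column rank only suggests local identifiability at that point.

```python
import numpy as np

def numerical_jacobian(f, theta, eps=1e-6):
    """Central-difference Jacobian of f at theta (rows: outputs, cols: params)."""
    theta = np.asarray(theta, dtype=float)
    n_out = f(theta).size
    jac = np.zeros((n_out, theta.size))
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        jac[:, i] = (f(theta + step) - f(theta - step)) / (2 * eps)
    return jac

def jacobian_rank(f, theta, tol=1e-4):
    """Numerical rank of the Jacobian; singular values below tol count as zero."""
    return int(np.linalg.matrix_rank(numerical_jacobian(f, theta), tol=tol))

def product_only(theta):
    # Observables depend on a and b only through a * b, so (a, b) cannot be
    # recovered: the Jacobian has rank 1 for 2 parameters.
    a, b = theta
    return np.array([a * b, (a * b) ** 2, (a * b) ** 3])

def sum_and_product(theta):
    # Observables pin down a and b locally (up to swapping them when a != b).
    a, b = theta
    return np.array([a + b, a * b, a ** 2 + b ** 2])

theta0 = np.array([0.3, 0.7])
rank_product = jacobian_rank(product_only, theta0)
rank_sum = jacobian_rank(sum_and_product, theta0)
```

    A rank below the parameter count (as for product_only) signals directions in parameter space that leave the observables unchanged.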

    Empirical studies on word representations

    One of the most fundamental tasks in natural language processing is representing words with mathematical objects (such as vectors). The word representations, which are most often estimated from data, allow capturing the meaning of words. They enable comparing words according to their semantic similarity, and have been shown to work extremely well when included in complex real-world applications. A large part of our work deals with ways of estimating word representations directly from large quantities of text. Our methods exploit the idea that words which occur in similar contexts have a similar meaning. How we define the context is an important focus of our thesis. The context can consist of a number of words to the left and to the right of the word in question, but, as we show, obtaining context words via syntactic links (such as the link between the verb and its subject) often works better. We furthermore investigate word representations that accurately capture multiple meanings of a single word. We show that translation of a word in context contains information that can be used to disambiguate the meaning of that word.
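
    The window-based notion of context described above can be sketched as simple co-occurrence counting. The function names, window size, and toy sentence are hypothetical; the thesis's own estimation methods and its syntactic contexts are not reproduced here.

```python
from collections import Counter, defaultdict

def window_contexts(tokens, window=2):
    """Count, for each word, the words seen within `window` positions of it."""
    contexts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                contexts[word][tokens[j]] += 1
    return contexts

def cosine(c1, c2):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c1[w] * c2[w] for w in c1 if w in c2)
    norm1 = sum(v * v for v in c1.values()) ** 0.5
    norm2 = sum(v * v for v in c2.values()) ** 0.5
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0

tokens = "the cat drinks milk the dog drinks water".split()
ctx = window_contexts(tokens, window=1)
similarity = cosine(ctx["cat"], ctx["dog"])
```

    With a window of one word, "cat" and "dog" both occur between "the" and "drinks", so their count vectors coincide. A syntactic variant would replace the positional window with links from a dependency parse.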