Search CORE

23 research outputs found

The Importance of Category Labels in Grammar Induction with Child-directed Utterances

Author: Jin Lifeng
Schuler William
Publication venue
Publication date: 01/01/2020
Field of study

Recent progress in grammar induction has shown that grammar induction is possible without explicit assumptions of language-specific knowledge. However, evaluation of induced grammars usually has ignored phrasal labels, an essential part of a grammar. Experiments in this work using a labeled evaluation metric, RH, show that linguistically motivated predictions about grammar sparsity and use of categories can only be revealed through labeled evaluation. Furthermore, depth-bounding as an implementation of human memory constraints in grammar inducers is still effective with labeled evaluation on multilingual transcribed child-directed utterances.Comment: The 16th International Conference on Parsing Technologies (IWPT 2020

arXiv.org e-Print Archive

Crossref

Co-training an Unsupervised Constituency Parser with Weak Supervision

Author: Cohen Shay
Maveli Nickil
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 18/03/2022
Field of study

We introduce a method for unsupervised parsing that relies on bootstrapping classifiers to identify if a node dominates a specific span in a sentence. There are two types of classifiers, an inside classifier that acts on a span, and an outside classifier that acts on everything outside of a given span. Through self-training and co-training with the two classifiers, we show that the interplay between them helps improve the accuracy of both, and as a result, effectively parse. A seed bootstrapping technique prepares the data to train these classifiers. Our analyses further validate that such an approach in conjunction with weak supervision using prior branching knowledge of a known language (left/right-branching) and minimal heuristics injects strong inductive bias into the parser, achieving 63.1 F

_1

on the English (PTB) test set. In addition, we show the effectiveness of our architecture by evaluating on treebanks for Chinese (CTB) and Japanese (KTB) and achieve new state-of-the-art results. Our code and pre-trained models are available at https://github.com/Nickil21/weakly-supervised-parsing.Comment: Accepted to Findings of ACL 202

arXiv.org e-Print Archive

Edinburgh Research Explorer

Deep Clustering of Text Representations for Supervision-free Probing of Syntax

Author: Gimpel Kevin
Gupta Vikram
Sachan Mrinmaya
Shi Haoyue
Publication venue
Publication date: 01/12/2021
Field of study

We explore deep clustering of text representations for unsupervised model interpretation and induction of syntax. As these representations are high-dimensional, out-of-the-box methods like KMeans do not work well. Thus, our approach jointly transforms the representations into a lower-dimensional cluster-friendly space and clusters them. We consider two notions of syntax: Part of speech Induction (POSI) and constituency labelling (CoLab) in this work. Interestingly, we find that Multilingual BERT (mBERT) contains surprising amount of syntactic knowledge of English; possibly even as much as English BERT (EBERT). Our model can be used as a supervision-free probe which is arguably a less-biased way of probing. We find that unsupervised probes show benefits from higher layers as compared to supervised probes. We further note that our unsupervised probe utilizes EBERT and mBERT representations differently, especially for POSI. We validate the efficacy of our probe by demonstrating its capabilities as an unsupervised syntax induction technique. Our probe works well for both syntactic formalisms by simply adapting the input representations. We report competitive performance of our probe on 45-tag English POSI, state-of-the-art performance on 12-tag POSI across 10 languages, and competitive results on CoLab. We also perform zero-shot syntax induction on resource impoverished languages and report strong results

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

GFlowNet-EM for learning compositional latent variable models

Author: Bengio Yoshua
Everett Katie
Graikos Alexandros
Hu Edward J.
Jain Moksh
Malkin Nikolay
Publication venue
Publication date: 03/06/2023
Field of study

Latent variable models (LVMs) with discrete compositional latents are an important but challenging setting due to a combinatorially large number of possible configurations of the latents. A key tradeoff in modeling the posteriors over latents is between expressivity and tractable optimization. For algorithms based on expectation-maximization (EM), the E-step is often intractable without restrictive approximations to the posterior. We propose the use of GFlowNets, algorithms for sampling from an unnormalized density by learning a stochastic policy for sequential construction of samples, for this intractable E-step. By training GFlowNets to sample from the posterior over latents, we take advantage of their strengths as amortized variational inference algorithms for complex distributions over discrete structures. Our approach, GFlowNet-EM, enables the training of expressive LVMs with discrete compositional latents, as shown by experiments on non-context-free grammar induction and on images using discrete variational autoencoders (VAEs) without conditional independence enforced in the encoder.Comment: ICML 2023; code: https://github.com/GFNOrg/GFlowNet-E

arXiv.org e-Print Archive

Recommended from our members

Spectral Methods for Natural Language Processing

Author: Stratos Karl
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2016
Field of study

Many state-of-the-art results in natural language processing (NLP) are achieved with statistical models involving latent variables. Unfortunately, computational problems associated with such models (for instance, finding the optimal parameter values) are typically intractable, forcing practitioners to rely on heuristic methods without strong guarantees. While heuristics are often sufficient for empirical purposes, their de-emphasis on theoretical aspects has certain negative ramifications. First, it can impede the development of rigorous theoretical understanding which can generate new ideas and algorithms. Second, it can lead to black art solutions that are unreliable and difficult to reproduce. In this thesis, we argue that spectral methods---that is, methods that use singular value decomposition or other similar matrix or tensor factorization---can effectively remedy these negative ramifications. To this end, we develop spectral methods for two unsupervised language processing tasks. The first task is learning lexical representations from unannotated text (e.g., hierarchical clustering of a vocabulary). The second task is estimating parameters of latent-variable models used in NLP applications (e.g., for unsupervised part-of-speech tagging). We show that our spectral algorithms have the following advantages over previous methods: 1. The algorithms provide a new theoretical framework that is amenable to rigorous analysis. In particular, they are shown to be statistically consistent. 2. The algorithms are simple to implement, efficient, and scalable to large amounts of data. They also yield results that are competitive with the state-of-the-art

Columbia University Academic Commons

Recommended from our members

Learning with Joint Inference and Latent Linguistic Structure in Graphical Models

Author: Narad Jason
Publication venue: ScholarWorks@UMass Amherst
Publication date: 17/03/2015
Field of study

Constructing end-to-end NLP systems requires the processing of many types of linguistic information prior to solving the desired end task. A common approach to this problem is to construct a pipeline, one component for each task, with each system\u27s output becoming input for the next. This approach poses two problems. First, errors propagate, and, much like the childhood game of telephone , combining systems in this manner can lead to unintelligible outcomes. Second, each component task requires annotated training data to act as supervision for training the model. These annotations are often expensive and time-consuming to produce, may differ from each other in genre and style, and may not match the intended application. In this dissertation we present a general framework for constructing and reasoning on joint graphical model formulations of NLP problems. Individual models are composed using weighted Boolean logic constraints, and inference is performed using belief propagation. The systems we develop are composed of two parts: one a representation of syntax, the other a desired end task (semantic role labeling, named entity recognition, or relation extraction). By modeling these problems jointly, both models are trained in a single, integrated process, with uncertainty propagated between them. This mitigates the accumulation of errors typical of pipelined approaches. Additionally we propose a novel marginalization-based training method in which the error signal from end task annotations is used to guide the induction of a constrained latent syntactic representation. This allows training in the absence of syntactic training data, where the latent syntactic structure is instead optimized to best support the end task predictions. We find that across many NLP tasks this training method offers performance comparable to fully supervised training of each individual component, and in some instances improves upon it by learning latent structures which are more appropriate for the task

ScholarWorks@UMass Amherst

Macquarie University ResearchOnline