The Importance of Category Labels in Grammar Induction with Child-directed Utterances
Recent progress in grammar induction has shown that grammar induction is
possible without explicit assumptions of language-specific knowledge. However,
evaluation of induced grammars usually has ignored phrasal labels, an essential
part of a grammar. Experiments in this work using a labeled evaluation metric,
RH, show that linguistically motivated predictions about grammar sparsity and
use of categories can only be revealed through labeled evaluation. Furthermore,
depth-bounding as an implementation of human memory constraints in grammar
inducers is still effective with labeled evaluation on multilingual transcribed
child-directed utterances.
Comment: The 16th International Conference on Parsing Technologies (IWPT 2020)
Co-training an Unsupervised Constituency Parser with Weak Supervision
We introduce a method for unsupervised parsing that relies on bootstrapping
classifiers to identify if a node dominates a specific span in a sentence.
There are two types of classifiers, an inside classifier that acts on a span,
and an outside classifier that acts on everything outside of a given span.
Through self-training and co-training with the two classifiers, we show that
the interplay between them improves the accuracy of both and, as a result,
yields an effective parser. A seed bootstrapping technique prepares the data to train
these classifiers. Our analyses further validate that such an approach in
conjunction with weak supervision using prior branching knowledge of a known
language (left/right-branching) and minimal heuristics injects strong inductive
bias into the parser, achieving 63.1 F1 on the English (PTB) test set. In
addition, we show the effectiveness of our architecture by evaluating on
treebanks for Chinese (CTB) and Japanese (KTB) and achieve new state-of-the-art
results. Our code and pre-trained models are available at
https://github.com/Nickil21/weakly-supervised-parsing.
Comment: Accepted to Findings of ACL 202
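The inside/outside split described in this abstract can be illustrated with a small sketch. It only shows how one candidate span divides a sentence into the two views that the two classifiers would see; the function names and the `<span>` placeholder are hypothetical and are not taken from the authors' repository.

```python
from typing import Iterator, List, Tuple


def inside_outside(tokens: List[str], start: int, end: int) -> Tuple[List[str], List[str]]:
    """Split a sentence into the span tokens[start:end] and everything outside it."""
    if not 0 <= start < end <= len(tokens):
        raise ValueError("span must satisfy 0 <= start < end <= len(tokens)")
    inside = tokens[start:end]
    # Hypothetical convention: a placeholder marks where the span was cut out,
    # so the outside view still records the span's position in the sentence.
    outside = tokens[:start] + ["<span>"] + tokens[end:]
    return inside, outside


def all_spans(n: int) -> Iterator[Tuple[int, int]]:
    """Enumerate every candidate span (start, end), end exclusive, in a sentence of n tokens."""
    for start in range(n):
        for end in range(start + 1, n + 1):
            yield start, end
```

Iterating `all_spans(len(tokens))` and calling `inside_outside` on each pair produces one (inside, outside) view per candidate span. A sentence of n tokens has n(n+1)/2 such spans.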
Deep Clustering of Text Representations for Supervision-free Probing of Syntax
We explore deep clustering of text representations for unsupervised model
interpretation and induction of syntax. As these representations are
high-dimensional, out-of-the-box methods like KMeans do not work well. Thus,
our approach jointly transforms the representations into a lower-dimensional
cluster-friendly space and clusters them. We consider two notions of syntax:
Part-of-Speech Induction (POSI) and Constituency Labelling (CoLab) in this
work. Interestingly, we find that Multilingual BERT (mBERT) contains a surprising
amount of syntactic knowledge of English, possibly even as much as English BERT
(EBERT). Our model can be used as a supervision-free probe which is arguably a
less-biased way of probing. We find that unsupervised probes show benefits from
higher layers as compared to supervised probes. We further note that our
unsupervised probe utilizes EBERT and mBERT representations differently,
especially for POSI. We validate the efficacy of our probe by demonstrating its
capabilities as an unsupervised syntax induction technique. Our probe works
well for both syntactic formalisms by simply adapting the input
representations. We report competitive performance of our probe on 45-tag
English POSI, state-of-the-art performance on 12-tag POSI across 10 languages,
and competitive results on CoLab. We also perform zero-shot syntax induction on
resource-impoverished languages and report strong results.
GFlowNet-EM for learning compositional latent variable models
Latent variable models (LVMs) with discrete compositional latents are an
important but challenging setting due to a combinatorially large number of
possible configurations of the latents. A key tradeoff in modeling the
posteriors over latents is between expressivity and tractable optimization. For
algorithms based on expectation-maximization (EM), the E-step is often
intractable without restrictive approximations to the posterior. We propose the
use of GFlowNets, algorithms for sampling from an unnormalized density by
learning a stochastic policy for sequential construction of samples, for this
intractable E-step. By training GFlowNets to sample from the posterior over
latents, we take advantage of their strengths as amortized variational
inference algorithms for complex distributions over discrete structures. Our
approach, GFlowNet-EM, enables the training of expressive LVMs with discrete
compositional latents, as shown by experiments on non-context-free grammar
induction and on images using discrete variational autoencoders (VAEs) without
conditional independence enforced in the encoder.
Comment: ICML 2023; code: https://github.com/GFNOrg/GFlowNet-E
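For orientation, the standard EM bound behind the "intractable E-step" mentioned in this abstract can be written as follows. The notation (observation $x$, discrete latent $z$, model parameters $\theta$, approximate posterior $q$) is assumed here for illustration and is not quoted from the paper.

```latex
\log p_\theta(x) \;\ge\; \mathbb{E}_{q(z \mid x)}\!\left[ \log p_\theta(x, z) - \log q(z \mid x) \right],
\qquad
\text{with equality when } q(z \mid x) = p_\theta(z \mid x) \propto p_\theta(x, z).
```

The exact E-step sets $q$ to the posterior, which is known only up to the normalizing constant $p_\theta(x)$. Sampling from a density of that unnormalized form is the role the abstract assigns to GFlowNets.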
Spectral Methods for Natural Language Processing
Many state-of-the-art results in natural language processing (NLP) are achieved with statistical models involving latent variables. Unfortunately, computational problems associated with such models (for instance, finding the optimal parameter values) are typically intractable, forcing practitioners to rely on heuristic methods without strong guarantees. While heuristics are often sufficient for empirical purposes, their de-emphasis on theoretical aspects has certain negative ramifications. First, it can impede the development of rigorous theoretical understanding which can generate new ideas and algorithms. Second, it can lead to black art solutions that are unreliable and difficult to reproduce.
In this thesis, we argue that spectral methods (that is, methods that use singular value decomposition or other similar matrix or tensor factorization) can effectively remedy these negative ramifications. To this end, we develop spectral methods for two unsupervised language processing tasks. The first task is learning lexical representations from unannotated text (e.g., hierarchical clustering of a vocabulary). The second task is estimating parameters of latent-variable models used in NLP applications (e.g., for unsupervised part-of-speech tagging). We show that our spectral algorithms have the following advantages over previous methods:
1. The algorithms provide a new theoretical framework that is amenable to rigorous analysis. In particular, they are shown to be statistically consistent.
2. The algorithms are simple to implement, efficient, and scalable to large amounts of data. They also yield results that are competitive with the state of the art.
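As a rough illustration of the kind of factorization spectral methods build on, the sketch below finds the leading singular triple of a small matrix by power iteration. This is a generic textbook technique chosen so the example needs no external libraries. It is not the thesis's algorithm, and all names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(a: Matrix, v: List[float]) -> List[float]:
    """Multiply matrix a by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


def top_singular_triple(a: Matrix, iters: int = 200) -> Tuple[float, List[float], List[float]]:
    """Approximate the largest singular value of a and its left/right singular vectors.

    Runs power iteration on A^T A from an all-ones start vector. This assumes
    the start vector is not orthogonal to the leading right singular vector.
    """
    at = transpose(a)
    v = [1.0] * len(a[0])
    for _ in range(iters):
        w = matvec(at, matvec(a, v))  # one multiplication by A^T A
        n = norm(w)
        if n == 0.0:
            raise ValueError("iteration collapsed to zero; try another start vector")
        v = [x / n for x in w]
    av = matvec(a, v)
    sigma = norm(av)  # the singular value is the length of A v
    u = [x / sigma for x in av]
    return sigma, u, v
```

Applied to a word-by-context count matrix, the entries of `u` would give each word a one-dimensional coordinate. A practical system would use a full truncated SVD from a numerical library instead.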
Learning with Joint Inference and Latent Linguistic Structure in Graphical Models
Constructing end-to-end NLP systems requires the processing of many types of linguistic information prior to solving the desired end task. A common approach to this problem is to construct a pipeline, one component for each task, with each system's output becoming input for the next. This approach poses two problems. First, errors propagate, and, much like the childhood game of telephone, combining systems in this manner can lead to unintelligible outcomes. Second, each component task requires annotated training data to act as supervision for training the model. These annotations are often expensive and time-consuming to produce, may differ from each other in genre and style, and may not match the intended application.
In this dissertation we present a general framework for constructing and reasoning on joint graphical model formulations of NLP problems. Individual models are composed using weighted Boolean logic constraints, and inference is performed using belief propagation. The systems we develop are composed of two parts: one a representation of syntax, the other a desired end task (semantic role labeling, named entity recognition, or relation extraction). By modeling these problems jointly, both models are trained in a single, integrated process, with uncertainty propagated between them. This mitigates the accumulation of errors typical of pipelined approaches.
Additionally, we propose a novel marginalization-based training method in which the error signal from end task annotations is used to guide the induction of a constrained latent syntactic representation. This allows training in the absence of syntactic training data, where the latent syntactic structure is instead optimized to best support the end task predictions. We find that across many NLP tasks this training method offers performance comparable to fully supervised training of each individual component, and in some instances improves upon it by learning latent structures which are more appropriate for the task.