Knowledge-Enhanced Neural Grammar Induction
Natural language is usually presented as a word sequence, but the inherent structure
of language is not necessarily sequential. Automatic grammar induction for natural
language is a long-standing research topic in the field of computational linguistics and
still remains an open problem today. From the perspective of cognitive science, the
goal of a grammar induction system is to mimic children: learning a grammar that can
generalize to infinitely many utterances by only consuming finite data. With regard to
computational linguistics, an automatic grammar induction system could be beneficial
for a wide variety of natural language processing (NLP) applications: it can provide explicit syntactic analysis for a pipeline or a joint learning system, or implicitly inject structural bias into an end-to-end model.
Typically, approaches to grammar induction only have access to raw text. Due to
the huge search space of trees as well as data sparsity and ambiguity issues, grammar
induction is a difficult problem. Thanks to the rapid development of neural networks, with their capacity for over-parameterization and continuous representation learning, neural models have recently been introduced to grammar induction. Given this large capacity, introducing external knowledge into a neural system is an effective approach in practice, especially for an unsupervised problem. This thesis explores how to incorporate external knowledge into neural grammar induction models. We develop several approaches to combining different types of knowledge with neural grammar induction models under two grammar formalisms: constituency grammar and dependency grammar.
We first investigate how to inject symbolic knowledge, in the form of universal linguistic rules, into unsupervised dependency parsing. In contrast to previous state-of-the-art models that rely on time-consuming global inference, we propose a neural transition-based parser trained with variational inference. Our parser can employ rich features and supports linear-time inference for both training and testing. Its core component is posterior regularization, in which the posterior distribution over dependency trees is constrained by the universal linguistic rules. The resulting parser outperforms previous unsupervised transition-based dependency parsers, achieves performance comparable to global inference-based models, and substantially increases parsing speed over them.
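The posterior regularization idea above can be sketched in a few lines. This is a minimal illustration, not the thesis's actual formulation: it assumes the parser's posterior is summarized by edge marginals and that the universal rules are a small set of allowed (head POS, dependent POS) pairs; the rule set, threshold, and function names are all hypothetical.

```python
# Hedged sketch of posterior regularization for dependency induction.
# Assumptions (illustrative, not from the thesis): the posterior over
# trees is summarized by edge marginals q[(h, d)] = P(head h -> dep d),
# and universal rules are allowed (head_POS, dependent_POS) pairs.

UNIVERSAL_RULES = {("VERB", "NOUN"), ("NOUN", "ADJ"), ("VERB", "ADV")}  # toy subset

def pr_penalty(edge_marginals, pos_tags, threshold=0.8):
    """Hinge penalty pushing the expected fraction of rule-satisfying
    edges above `threshold` (names and values are illustrative)."""
    total, satisfying = 0.0, 0.0
    for (h, d), q in edge_marginals.items():
        total += q
        if (pos_tags[h], pos_tags[d]) in UNIVERSAL_RULES:
            satisfying += q
    expected_fraction = satisfying / total if total > 0 else 0.0
    return max(0.0, threshold - expected_fraction)

# Usage: toy marginals over the 3-word sentence "dogs bark loudly".
pos = ["NOUN", "VERB", "ADV"]
marg = {(1, 0): 0.9, (0, 1): 0.1, (1, 2): 0.7, (0, 2): 0.3}
penalty = pr_penalty(marg, pos)
```

During training, a penalty of this kind would be added to the variational objective, steering the posterior toward linguistically plausible trees.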
Recently, tree structures have been considered as latent variables that are learned
through downstream NLP tasks, such as language modeling and natural language inference. More specifically, auxiliary syntax-aware components are embedded into the
neural networks and are trained end-to-end on the downstream tasks. However, such latent tree models either struggle to produce linguistically plausible tree structures, or require an external biased parser to obtain good parsing performance. In the second part of this thesis, we focus on constituency structure and propose to use imitation learning to couple two heterogeneous latent tree models: we transfer the knowledge learned from a continuous latent tree model trained using language modeling to a discrete one, and further fine-tune the discrete model using a natural language inference objective.
Through this two-stage training scheme, the discrete latent tree model achieves state-of-the-art unsupervised parsing performance.
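The two-stage scheme can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's actual training code: it assumes the continuous teacher emits a soft distribution over candidate merge decisions at each tree-building step, and the discrete student is trained by cross-entropy against it before downstream fine-tuning.

```python
# Hedged sketch of the two-stage scheme: (1) imitation -- the discrete
# model (student) learns to reproduce the tree-building decisions of a
# continuous model (teacher) trained with language modeling; (2) the
# student is then fine-tuned on a downstream objective (NLI).
# All function names here are illustrative, not an actual API.
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def imitation_loss(teacher_probs, student_logits):
    """Cross-entropy between the teacher's soft decision over candidate
    merges and the student's logits for the same step."""
    student_probs = softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))

# Stage 1: at each step the teacher scores which adjacent pair to merge;
# the student is pushed toward the same distribution.
teacher = [0.7, 0.2, 0.1]          # teacher's soft merge decision (toy)
student_logits = [2.0, 0.5, -1.0]  # student's unnormalized scores (toy)
loss = imitation_loss(teacher, student_logits)
# Stage 2 (not shown): the imitation-initialized student is fine-tuned
# end-to-end with the natural language inference objective.
```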
The Transformer is a recently proposed neural architecture for NLP. Transformer-based pre-trained language models (PLMs) like BERT have achieved remarkable success on various NLP tasks by training on enormous corpora with word prediction objectives. Recent studies show that PLMs can learn considerable syntactic knowledge in a syntax-agnostic manner. In the third part of this thesis, we leverage PLMs as a source of
external knowledge. We propose a parameter-free approach to select syntax-sensitive
self-attention heads from PLMs and perform chart-based unsupervised constituency
parsing. In contrast to previous approaches, our head-selection method relies only on raw text, without any annotated development data. Experimental results on English and eight other languages show that our approach achieves competitive performance.
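The chart-based step above can be sketched concretely. This is a minimal illustration under an assumption the abstract leaves implicit: that the selected self-attention head has already been converted into a score s(i, j) for every candidate span, after which a CKY-style search finds the binary tree whose spans maximize the total score. The scoring function and values below are toy placeholders.

```python
# Hedged sketch of chart-based unsupervised parsing from span scores.
# Assumption (illustrative): a selected self-attention head has been
# distilled into a score s(i, j) per span; CKY then finds the binary
# tree over n leaves whose constituent spans maximize the total score.
import functools

def cky_best_tree(span_score, n):
    """Return (best total score, tree) over binary trees with n leaves.
    `span_score(i, j)` scores the half-open span [i, j)."""
    @functools.lru_cache(maxsize=None)
    def best(i, j):
        if j - i == 1:                       # single leaf: nothing to split
            return 0.0, (i, j)
        cands = []
        for k in range(i + 1, j):            # try every split point
            left_score, left_tree = best(i, k)
            right_score, right_tree = best(k, j)
            cands.append((left_score + right_score, (left_tree, right_tree)))
        s, t = max(cands)
        return s + span_score(i, j), t       # credit this span's own score
    return best(0, n)

# Toy scores favouring the span [1, 3) -- e.g. a verb phrase.
scores = {(0, 3): 1.0, (1, 3): 2.0, (0, 2): 0.5}
score, tree = cky_best_tree(lambda i, j: scores.get((i, j), 0.0), 3)
```

With these scores the search prefers the right-branching analysis, grouping leaves 1 and 2 into a constituent before attaching leaf 0.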
Supervised Training on Synthetic Languages: A Novel Framework for Unsupervised Parsing
This thesis focuses on unsupervised dependency parsing: parsing sentences of a language into dependency trees without access to training data for that language. Unlike most prior work, which uses unsupervised learning to estimate the parsing parameters, we estimate the parameters by supervised training on synthetic languages. Our parsing framework has three major components: synthetic language generation provides a rich set of training languages by mix-and-match over the real languages; surface-form feature extraction maps an unparsed corpus of a language into a fixed-length vector that serves as the syntactic signature of that language; and, finally, language-agnostic parsing incorporates the syntactic signature during parsing, so that the decision on each word token depends on the general syntax of the target language.
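The surface-form feature extraction component can be sketched with a deliberately simple feature choice. This is an illustration only: the thesis learns its feature extractor end-to-end, whereas the sketch below hand-codes one plausible signal (normalized POS-bigram frequencies over an unparsed corpus) to show the shape of the mapping from corpus to fixed-length signature.

```python
# Hedged sketch of surface-form feature extraction: an unparsed corpus
# of POS sequences is mapped to a fixed-length "syntactic signature".
# The feature choice (normalized POS-bigram frequencies) and the toy
# tag set are illustrative, not the thesis's learned extractor.
from collections import Counter
from itertools import product

POS_TAGS = ["NOUN", "VERB", "ADJ", "ADP"]  # toy tag set

def syntactic_signature(corpus):
    """corpus: list of POS-tag sequences -> fixed-length feature vector."""
    counts = Counter()
    total = 0
    for sent in corpus:
        for a, b in zip(sent, sent[1:]):   # adjacent tag pairs
            counts[(a, b)] += 1
            total += 1
    # One dimension per ordered tag pair, in a fixed order.
    return [counts[(a, b)] / total if total else 0.0
            for a, b in product(POS_TAGS, repeat=2)]

# A "language" whose adjectives precede nouns will differ in the
# (ADJ, NOUN) dimension from one whose adjectives follow nouns.
corpus = [["ADJ", "NOUN", "VERB"], ["NOUN", "VERB", "ADJ", "NOUN"]]
sig = syntactic_signature(corpus)
```

A language-agnostic parser could then condition on such a vector, letting one set of parsing parameters adapt its decisions to the word-order statistics of the target language.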
The fundamental question we are trying to answer is whether useful information about the syntax of a language can be inferred from its surface-form evidence (an unparsed corpus). This is the same question implicitly asked by previous work on unsupervised parsing, which assumes only that an unparsed corpus is available for the target language. We show that useful features of the target language can indeed be extracted automatically from an unparsed corpus consisting only of gold part-of-speech (POS) sequences. Providing these features to our neural parser enables it to parse sequences like those in the corpus. Strikingly, our system receives no supervision in the target language. Rather, it is a multilingual system trained end-to-end on a variety of other languages, so it learns a feature extractor that works well.
This thesis contains several large-scale experiments requiring hundreds of thousands of CPU-hours. To our knowledge, this is the largest study of unsupervised parsing yet attempted. We show experimentally across multiple languages: (1) Features computed from the unparsed corpus improve parsing accuracy. (2) Including thousands of synthetic languages in the training yields further improvement. (3) Despite being computed from unparsed corpora, our learned task-specific features beat previous works' interpretable typological features that require parsed corpora or expert categorization of the language.