3 research outputs found

    Knowledge-enhanced neural grammar induction

    Natural language is usually presented as a word sequence, but the inherent structure of language is not necessarily sequential. Automatic grammar induction for natural language is a long-standing research topic in computational linguistics and remains an open problem today. From the perspective of cognitive science, the goal of a grammar induction system is to mimic children: learning a grammar that can generalize to infinitely many utterances by consuming only finite data. With regard to computational linguistics, an automatic grammar induction system could benefit a wide variety of natural language processing (NLP) applications, either by providing syntactic analysis explicitly for a pipeline or a joint learning system, or by injecting structural bias implicitly into an end-to-end model. Typically, approaches to grammar induction only have access to raw text. Due to the huge search space of trees as well as data sparsity and ambiguity issues, grammar induction is a difficult problem. Thanks to the rapid development of neural networks and their capacity for over-parameterization and continuous representation learning, neural models have recently been introduced to grammar induction. Because a neural system has large capacity, introducing external knowledge into it is an effective approach in practice, especially for an unsupervised problem. This thesis explores how to incorporate external knowledge into neural grammar induction models. We develop several approaches to combine different types of knowledge with neural grammar induction models on two grammar formalisms: constituency and dependency grammar. We first investigate how to inject symbolic knowledge, in the form of universal linguistic rules, into unsupervised dependency parsing. In contrast to previous state-of-the-art models that utilize time-consuming global inference, we propose a neural transition-based parser using variational inference.
Our parser is able to employ rich features and supports inference in linear time for both training and testing. The core component in our parser is posterior regularization, where the posterior distribution of the dependency trees is constrained by the universal linguistic rules. The resulting parser outperforms previous unsupervised transition-based dependency parsers and achieves performance comparable to global inference-based models. Our parser also substantially increases parsing speed over global inference-based models. Recently, tree structures have been considered as latent variables that are learned through downstream NLP tasks, such as language modeling and natural language inference. More specifically, auxiliary syntax-aware components are embedded into the neural networks and are trained end-to-end on the downstream tasks. However, such latent tree models either struggle to produce linguistically plausible tree structures, or require an external biased parser to obtain good parsing performance. In the second part of this thesis, we focus on constituency structure and propose to use imitation learning to couple two heterogeneous latent tree models: we transfer the knowledge learned from a continuous latent tree model trained using language modeling to a discrete one, and further fine-tune the discrete model using a natural language inference objective. Through this two-stage training scheme, the discrete latent tree model achieves state-of-the-art unsupervised parsing performance. The transformer is a newly proposed neural model for NLP. Transformer-based pre-trained language models (PLMs) like BERT have achieved remarkable success on various NLP tasks by training on an enormous corpus using word prediction tasks. Recent studies show that PLMs can learn considerable syntactic knowledge in a syntax-agnostic manner. In the third part of this thesis, we leverage PLMs as a source of external knowledge.
We propose a parameter-free approach to select syntax-sensitive self-attention heads from PLMs and perform chart-based unsupervised constituency parsing. In contrast to previous approaches, our head-selection approach relies only on raw text, without any annotated development data. Experimental results on English and eight other languages show that our approach achieves competitive performance.
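To make "chart-based" parsing concrete, here is a minimal sketch of a generic CKY-style search for the highest-scoring binary tree over a sentence. It is an illustration only, not the thesis's implementation: the function name `best_tree`, the `span_score` callback, and the toy scores are all assumptions, and the abstract does not say how span scores are derived from the selected attention heads.

```python
from functools import lru_cache

def best_tree(n, span_score):
    """Return (score, tree) for the best binary tree over words 0..n-1.

    span_score(i, j) is a hypothetical callback that scores the span
    covering words i..j-1. A tree's score is the sum of the scores of
    all its spans. Leaves are represented by their word index and
    internal nodes by (left_subtree, right_subtree) tuples.
    """
    @lru_cache(maxsize=None)
    def solve(i, j):
        if j - i == 1:
            # A single word: the only possible "tree" is the leaf itself.
            return span_score(i, j), i
        best_score, best_children = None, None
        # Try every split point and keep the best pair of subtrees.
        for k in range(i + 1, j):
            left_score, left_tree = solve(i, k)
            right_score, right_tree = solve(k, j)
            total = left_score + right_score
            if best_score is None or total > best_score:
                best_score, best_children = total, (left_tree, right_tree)
        return span_score(i, j) + best_score, best_children

    return solve(0, n)

# Toy example (assumed scores): spans (0, 2) and (2, 4) look like constituents.
toy_scores = {(0, 2): 1.0, (2, 4): 1.0}
score, tree = best_tree(4, lambda i, j: toy_scores.get((i, j), 0.0))
print(score, tree)
```

Memoizing `solve` gives the usual cubic-time chart computation. In an unsupervised setting the span scores would come from a model rather than a hand-written table.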

    Supervised Training on Synthetic Languages: A Novel Framework for Unsupervised Parsing

    This thesis focuses on unsupervised dependency parsing: parsing sentences of a language into dependency trees without accessing the training data of that language. Unlike most prior work, which uses unsupervised learning to estimate the parsing parameters, we estimate the parameters by supervised training on synthetic languages. Our parsing framework has three major components: synthetic language generation gives a rich set of training languages by mix-and-match over the real languages; surface-form feature extraction maps an unparsed corpus of a language into a fixed-length vector that serves as the syntactic signature of that language; and, finally, language-agnostic parsing incorporates the syntactic signature during parsing so that the decision on each word token depends on the general syntax of the target language. The fundamental question we are trying to answer is whether some useful information about the syntax of a language could be inferred from its surface-form evidence (unparsed corpus). This is the same question that has been implicitly asked by previous papers on unsupervised parsing, which assume that only an unparsed corpus is available for the target language. We show that, indeed, useful features of the target language can be extracted automatically from an unparsed corpus, which consists only of gold part-of-speech (POS) sequences. Providing these features to our neural parser enables it to parse sequences like those in the corpus. Strikingly, our system has no supervision in the target language. Rather, it is a multilingual system that is trained end-to-end on a variety of other languages, so it learns a feature extractor that works well. This thesis contains several large-scale experiments requiring hundreds of thousands of CPU-hours. To our knowledge, this is the largest study of unsupervised parsing yet attempted. We show experimentally across multiple languages: (1) Features computed from the unparsed corpus improve parsing accuracy.
(2) Including thousands of synthetic languages in the training yields further improvement. (3) Despite being computed from unparsed corpora, our learned task-specific features beat the interpretable typological features of previous work, which require parsed corpora or expert categorization of the language.