
    Predicting Linguistic Structure with Incomplete and Cross-Lingual Supervision

    Contemporary approaches to natural language processing are predominantly based on statistical machine learning from large amounts of text, which has been manually annotated with the linguistic structure of interest. However, such complete supervision is currently only available for the world's major languages, in a limited number of domains and for a limited range of tasks. As an alternative, this dissertation considers methods for linguistic structure prediction that can make use of incomplete and cross-lingual supervision, with the prospect of making linguistic processing tools more widely available at a lower cost. An overarching theme of this work is the use of structured discriminative latent variable models for learning with indirect and ambiguous supervision; as instantiated, these models admit rich model features while retaining efficient learning and inference properties. The first contribution to this end is a latent-variable model for fine-grained sentiment analysis with coarse-grained indirect supervision. The second is a model for cross-lingual word-cluster induction and the application thereof to cross-lingual model transfer. The third is a method for adapting multi-source discriminative cross-lingual transfer models to target languages, by means of typologically informed selective parameter sharing. The fourth is an ambiguity-aware self- and ensemble-training algorithm, which is applied to target language adaptation and relexicalization of delexicalized cross-lingual transfer parsers. The fifth is a set of sequence-labeling models that combine constraints at the level of tokens and types, and an instantiation of these models for part-of-speech tagging with incomplete cross-lingual and crowdsourced supervision. In addition to these contributions, comprehensive overviews are provided of structured prediction with no or incomplete supervision, as well as of learning in the multilingual and cross-lingual settings. Through careful empirical evaluation, it is established that the proposed methods can be used to create substantially more accurate tools for linguistic processing, compared both to unsupervised methods and to recently proposed cross-lingual methods. The empirical support for this claim is particularly strong in the latter case; our models for syntactic dependency parsing and part-of-speech tagging achieve the best published results to date for a large number of target languages, in the setting where no annotated training data is available in the target language.
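
    The fifth contribution's combination of token- and type-level information can be made concrete with a small sketch: a tag dictionary lists the tags each word type may take (a type-level constraint), and Viterbi decoding searches only over the permitted tags. The following toy is an illustration under invented names and parameters, not the dissertation's actual model.

    import numpy as np

    # Toy tag inventory; real experiments would use a full POS tag set.
    TAGS = ["NOUN", "VERB", "DET"]

    def viterbi_with_dictionary(words, tag_dict, log_trans, log_emit):
        """Viterbi decoding in which tag_dict maps a word type to its
        allowed tags (a type-level constraint); words missing from the
        dictionary may take any tag."""
        n, k = len(words), len(TAGS)
        allowed = [[TAGS.index(t) for t in tag_dict.get(w, TAGS)] for w in words]
        score = np.full((n, k), -np.inf)
        back = np.zeros((n, k), dtype=int)
        for t in allowed[0]:
            score[0, t] = log_emit(words[0], t)
        for i in range(1, n):
            for t in allowed[i]:
                prev = [score[i - 1, s] + log_trans(s, t) for s in allowed[i - 1]]
                j = int(np.argmax(prev))
                score[i, t] = prev[j] + log_emit(words[i], t)
                back[i, t] = allowed[i - 1][j]
        path = [int(np.argmax(score[-1]))]
        for i in range(n - 1, 0, -1):
            path.append(back[i, path[-1]])
        return [TAGS[t] for t in reversed(path)]

    # Uniform toy scores: the dictionary alone already restricts the output.
    print(viterbi_with_dictionary(
        ["the", "dog", "barks"],
        {"the": ["DET"], "dog": ["NOUN"], "barks": ["NOUN", "VERB"]},
        log_trans=lambda s, t: 0.0,
        log_emit=lambda w, t: 0.0))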

    Posterior Regularization for Learning with Side Information and Weak Supervision

    Supervised machine learning techniques have been very successful for a variety of tasks and domains including natural language processing, computer vision, and computational biology. Unfortunately, their use often requires creation of large problem-specific training corpora that can make these methods prohibitively expensive. At the same time, we often have access to external problem-specific information that we cannot always easily incorporate. We might know how to solve the problem in another domain (e.g. for a different language); we might have access to cheap but noisy training data; or a domain expert might be available who would be able to guide a human learner much more efficiently than by simply creating an IID training corpus. A key challenge for weakly supervised learning is then how to incorporate such kinds of auxiliary information arising from indirect supervision. In this thesis, we present Posterior Regularization, a probabilistic framework for structured, weakly supervised learning. Posterior Regularization is applicable to probabilistic models with latent variables and provides a language for specifying constraints or preferences about posterior distributions of latent variables. We show that this language is powerful enough to specify realistic prior knowledge for a variety of applications in natural language processing. Additionally, because Posterior Regularization separates model complexity from the complexity of structural constraints, it can be used for structured problems with relatively little computational overhead. We apply Posterior Regularization to several problems in natural language processing, including word alignment for machine translation, transfer of linguistic resources across languages, and grammar induction. Additionally, we find that we can apply Posterior Regularization to the problem of multi-view learning, achieving particularly good results for transfer learning. We also explore the theoretical relationship between Posterior Regularization and other proposed frameworks for encoding this kind of prior knowledge, and show a close relationship to Constraint Driven Learning as well as to Generalized Expectation Constraints.
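
    The core projection step of Posterior Regularization can be sketched for the simplest case: a single discrete latent variable and one expectation constraint E_q[f(z)] <= b. The dual solution takes the form q(z) proportional to p(z) exp(-lam * f(z)) with lam >= 0, so a one-dimensional search recovers the multiplier. This is a toy illustration of that step, not code from the thesis; the constraint and numbers are invented.

    import numpy as np

    def project_posterior(p, f, b, lam_max=50.0, iters=60):
        """Minimize KL(q || p) subject to E_q[f] <= b for a distribution p
        over a discrete latent variable; q(z) is proportional to
        p(z) * exp(-lam * f(z)), and lam is found by bisection."""
        def q_of(lam):
            w = p * np.exp(-lam * f)
            return w / w.sum()
        if q_of(0.0) @ f <= b:          # constraint already holds: q = p
            return p.copy()
        lo, hi = 0.0, lam_max
        for _ in range(iters):          # E_q[f] is non-increasing in lam
            mid = 0.5 * (lo + hi)
            if q_of(mid) @ f > b:
                lo = mid
            else:
                hi = mid
        return q_of(hi)

    # Toy usage: push the expected value of f under q below 0.3.
    p = np.array([0.5, 0.3, 0.2])
    f = np.array([1.0, 0.0, 1.0])
    print(project_posterior(p, f, b=0.3))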

    Semi-continuous hidden Markov models for speech recognition

    Unsupervised grammar induction with Combinatory Categorial Grammars

    Language is a highly structured medium for communication. An idea starts in the speaker's mind (semantics) and is transformed into a well-formed, intelligible sentence via the specific syntactic rules of a language. We aim to discover the fingerprints of this process in the choice and location of words used in the final utterance. What is unclear is how much of this latent process can be discovered from the linguistic signal alone and how much requires shared non-linguistic context, knowledge, or cues. Unsupervised grammar induction is the task of analyzing strings in a language to discover the latent syntactic structure of the language without access to labeled training data. Successes in unsupervised grammar induction shed light on the amount of syntactic structure that is discoverable from raw or part-of-speech-tagged text. In this thesis, we present a state-of-the-art grammar induction system based on Combinatory Categorial Grammars. Our choice of syntactic formalism enables the first labeled evaluation of an unsupervised system. This allows us to perform an in-depth analysis of the system’s linguistic strengths and weaknesses. In order to completely eliminate reliance on any supervised systems, we also examine how performance is affected when we use induced word clusters instead of gold-standard POS tags. Finally, we perform a semantic evaluation of induced grammars, providing unique insights into future directions for unsupervised grammar induction systems.
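
    The two CCG application rules that drive derivations in such a system are easy to sketch: forward application combines X/Y with Y to give X, and backward application combines Y with X\Y to give X. The toy below illustrates the formalism only, not the thesis parser; categories are plain strings and the parenthesis handling is deliberately naive.

    def _strip(cat):
        # Drop one pair of outer parentheses, e.g. "(S\NP)" -> "S\NP";
        # naive, but sufficient for this demo.
        return cat[1:-1] if cat.startswith("(") and cat.endswith(")") else cat

    def forward_apply(left, right):
        r"""Forward application: X/Y applied to Y yields X."""
        if "/" in left:
            functor, arg = left.rsplit("/", 1)
            if _strip(arg) == _strip(right):
                return _strip(functor)
        return None

    def backward_apply(left, right):
        r"""Backward application: Y combined with X\Y yields X."""
        if "\\" in right:
            functor, arg = right.rsplit("\\", 1)
            if _strip(arg) == _strip(left):
                return _strip(functor)
        return None

    # "dogs chase cats" with categories NP, (S\NP)/NP, NP:
    vp = forward_apply("(S\\NP)/NP", "NP")   # -> "S\NP" (a verb phrase)
    s = backward_apply("NP", vp)             # -> "S" (a full sentence)
    print(vp, s)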

    Scalable Text Mining with Sparse Generative Models

    The information age has brought a deluge of data. Much of this is in text form, insurmountable in scope for humans and incomprehensible in structure for computers. Text mining is an expanding field of research that seeks to utilize the information contained in vast document collections. General data mining methods based on machine learning face challenges with the scale of text data, posing a need for scalable text mining methods. This thesis proposes a solution to scalable text mining: generative models combined with sparse computation. A unifying formalization for generative text models is defined, bringing together research traditions that have used formally equivalent models but ignored parallel developments. This framework allows the use of methods developed in different processing tasks, such as retrieval and classification, yielding effective solutions across different text mining tasks. Sparse computation using inverted indices is proposed for inference on probabilistic models. This reduces the computational complexity of the common text mining operations according to sparsity, yielding probabilistic models with the scalability of modern search engines. The proposed combination provides sparse generative models: a solution for text mining that is general, effective, and scalable. Extensive experimentation on text classification and ranked retrieval datasets is conducted, showing that the proposed solution matches or outperforms the leading task-specific methods in effectiveness, with an order of magnitude decrease in classification times for Wikipedia article categorization with a million classes. The developed methods were further applied in two 2014 Kaggle data mining prize competitions with over a hundred competing teams, earning first and second places.
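
    The inverted-index idea can be made concrete in miniature: a multinomial Naive Bayes classifier whose scoring touches only the posting lists of the terms that actually occur in a document, so classification cost grows with document sparsity rather than vocabulary size. This sketch is my reading of the approach under standard textbook smoothing, not the thesis implementation.

    import math
    from collections import defaultdict

    class SparseNB:
        """Multinomial Naive Bayes scored through an inverted index."""

        def __init__(self, num_classes, vocab_size, alpha=1.0):
            self.k, self.v, self.alpha = num_classes, vocab_size, alpha
            self.index = defaultdict(dict)        # term -> {class: count}
            self.class_tokens = [0] * num_classes
            self.class_docs = [0] * num_classes
            self.total_docs = 0

        def train(self, doc, c):                  # doc is {term: count}
            for t, n in doc.items():
                self.index[t][c] = self.index[t].get(c, 0) + n
                self.class_tokens[c] += n
            self.class_docs[c] += 1
            self.total_docs += 1

        def score(self, doc):
            # Start from the log prior plus, for every query term, the
            # smoothing floor a term receives when unseen in a class ...
            alpha, total = self.alpha, sum(doc.values())
            scores = []
            for c in range(self.k):
                prior = math.log(self.class_docs[c] + 1) - math.log(self.total_docs + self.k)
                denom = math.log(self.class_tokens[c] + alpha * self.v)
                scores.append(prior + total * (math.log(alpha) - denom))
            # ... then add sparse corrections only where postings exist.
            for t, n in doc.items():
                for c, cnt in self.index.get(t, {}).items():
                    scores[c] += n * (math.log(cnt + alpha) - math.log(alpha))
            return scores

    # Toy usage with two classes and a vocabulary of 4 terms:
    nb = SparseNB(num_classes=2, vocab_size=4)
    nb.train({"goal": 3, "match": 2}, c=0)        # "sports"
    nb.train({"vote": 3, "party": 2}, c=1)        # "politics"
    print(nb.score({"goal": 1, "match": 1}))      # class 0 should win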

    Gesture in Automatic Discourse Processing

    Computers cannot fully understand spoken language without access to the wide range of modalities that accompany speech. This thesis addresses the particularly expressive modality of hand gesture, and focuses on building structured statistical models at the intersection of speech, vision, and meaning.

    My approach is distinguished in two key respects. First, gestural patterns are leveraged to discover parallel structures in the meaning of the associated speech. This differs from prior work that attempted to interpret individual gestures directly, an approach that was prone to a lack of generality across speakers. Second, I present novel, structured statistical models for multimodal language processing, which enable learning about gesture in its linguistic context, rather than in the abstract.

    These ideas find successful application in a variety of language processing tasks: resolving ambiguous noun phrases, segmenting speech into topics, and producing keyframe summaries of spoken language. In all three cases, the addition of gestural features -- extracted automatically from video -- yields significantly improved performance over a state-of-the-art text-only alternative. This marks the first demonstration that hand gesture improves automatic discourse processing.
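
    As a purely illustrative picture of the kind of multimodal combination involved, the sketch below concatenates textual and gesture-derived similarity features for a pair of noun phrases and scores them with one linear model. The feature names and weights are invented assumptions, not the thesis's actual models or features.

    import numpy as np

    def pair_features(text_sim, same_speaker, gesture_hold_sim, hand_pos_sim):
        """Concatenate verbal and gestural cues for one noun-phrase pair."""
        return np.array([text_sim, same_speaker, gesture_hold_sim, hand_pos_sim])

    def coreference_score(features, weights, bias):
        """Linear score; a positive value predicts the two NPs corefer."""
        return float(features @ weights + bias)

    # Hand-set weights stand in for learned parameters in this toy.
    w = np.array([2.0, 0.5, 1.5, 1.0])
    x = pair_features(text_sim=0.8, same_speaker=1.0,
                      gesture_hold_sim=0.6, hand_pos_sim=0.4)
    print(coreference_score(x, w, bias=-2.0) > 0)   # True -> coreferent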

    Essential Speech and Language Technology for Dutch: Results by the STEVIN-programme

    Computational Linguistics; Germanic Languages; Artificial Intelligence (incl. Robotics); Computing Methodologies

    Essays on monetary policy

    This is a summary of the four chapters that comprise this D.Phil. thesis. The thesis examines two major aspects of policy: the first two chapters examine monetary policy communication, and the second two examine the causes and consequences of a time-varying reaction function of the central bank.

    1. Central Bank Communication and Higher Moments. In this first chapter, I investigate which parts of central bank communication affect the higher moments of expectations embedded in financial market pricing. Much of the literature on central bank communication has focused on how communication impacts the conditional expected mean of future policy. But this chapter asks how central bank communication affects the second and third moments of the financial market’s perceived distribution of future policy decisions. I use high-frequency changes in option prices around Bank of England communications to show that communication affects higher moments of the distribution of expectations. I find that the relevant communication in the case of the Bank of England is primarily confined to the information contained in the Q&A and Statement, rather than the longer Inflation Report.

    2. Mark My Words: The Transmission of Central Bank Communication to the General Public via the Print Media. In the second chapter, jointly with James Brookes, I ask how central banks can change their communication in order to receive greater newspaper coverage, if that is indeed an objective of theirs. We use computational linguistics combined with an event-study methodology to measure the extent of news coverage a central bank communication receives, and the textual features that might cause a communication to be more (or less) likely to be considered newsworthy. We consider the case of the Bank of England, and estimate the relationship between news coverage and central bank communication implied by our model. We find that the interaction between the state of the economy and the way in which the Bank of England writes its communication is important for determining news coverage. We provide concrete suggestions for ways in which central bank communication can increase its news coverage by improving readability in line with our results.

    3. Uncertainty and Time-varying Monetary Policy. In the third chapter, together with Michael McMahon, I investigate the links between uncertainty and the reaction function of the Federal Reserve. US macroeconomic evidence points to higher economic volatility being positively correlated with more aggressive monetary policy responses. This represents a challenge for “good policy” explanations of the Great Moderation, which map a more aggressive monetary response to reduced volatility. While some models of monetary policy under uncertainty can match this comovement qualitatively, these models do not, on their own, account for the reaction-function changes quantitatively for reasonable changes in uncertainty. We present a number of alternative sources of uncertainty that we believe should be more prevalent in the literature on monetary policy.

    4. The Element(s) of Surprise. In the final chapter, together with Michael McMahon, I analyse the implications of time-varying reaction functions for monetary surprises. Monetary policy surprises are driven by several separate forces. We argue that many of the surprises in monetary policy instruments are driven by unexpected changes in the reaction function of policymakers. We show that these reaction function surprises are fundamentally different from monetary policy shocks in their effect on the economy, are likely endogenous to the state, and are unable to be removed using current orthogonalisation procedures. As a result, monetary policy surprises should not be used to measure the effect of a monetary policy “shock” to the economy. We find evidence for reaction function surprises in the features of the high-frequency asset price surprise data and in analysing the text of a major US economic forecaster. Further, we show that periods in which an estimated macro model suggests policymakers have switched reaction functions provide the majority of the variation in monetary policy surprises.
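
    The event-study logic of the first chapter can be illustrated in a few lines: take a market-implied distribution over future policy rates just before and just after a communication, and compare its second and third moments. The rate grid and probabilities below are invented for the illustration; in the chapter, such distributions are backed out of option prices.

    import numpy as np

    def moments(rates, probs):
        """Mean, variance, and third central moment of a discrete distribution."""
        m = float(probs @ rates)
        var = float(probs @ (rates - m) ** 2)
        third = float(probs @ (rates - m) ** 3)
        return m, var, third

    rates = np.array([0.25, 0.50, 0.75, 1.00])   # possible policy rates
    pre = np.array([0.10, 0.40, 0.40, 0.10])     # implied before the statement
    post = np.array([0.05, 0.30, 0.45, 0.20])    # implied after the statement

    for name, dist in (("pre", pre), ("post", post)):
        m, v, s = moments(rates, dist)
        print(f"{name}: mean={m:.3f} variance={v:.5f} third_moment={s:.6f}")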