52 research outputs found

    Fast Rhetorical Structure Theory Discourse Parsing

    In recent years, there has been a variety of research on discourse parsing, particularly RST discourse parsing. Most of the recent work on RST parsing has focused on implementing new types of features or learning algorithms in order to improve accuracy, with relatively little focus on efficiency, robustness, or practical use. Also, most implementations are not widely available. Here, we describe an RST segmentation and parsing system that adapts models and feature sets from various previous work, as described below. Its accuracy is near state-of-the-art, and it was developed to be fast, robust, and practical. For example, it can process short documents such as news articles or essays in less than a second.

    Discourse Structure in Machine Translation Evaluation

    In this article, we explore the potential of using sentence-level discourse structure for machine translation evaluation. We first design discourse-aware similarity measures, which use all-subtree kernels to compare discourse parse trees in accordance with the Rhetorical Structure Theory (RST). Then, we show that a simple linear combination with these measures can help improve various existing machine translation evaluation metrics regarding correlation with human judgments at both the segment level and the system level. This suggests that discourse information is complementary to the information used by many of the existing evaluation metrics, and thus it could be taken into account when developing richer evaluation metrics, such as the WMT-14 winning combined metric DiscoTKparty. We also provide a detailed analysis of the relevance of various discourse elements and relations from the RST parse trees for machine translation evaluation. In particular, we show that: (i) all aspects of the RST tree are relevant, (ii) nuclearity is more useful than relation type, and (iii) the similarity of the translation RST tree to the reference tree is positively correlated with translation quality. Comment: machine translation, machine translation evaluation, discourse analysis. Computational Linguistics, 201
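
    The abstract above compares discourse parse trees with an all-subtree kernel. As a rough illustration only, the sketch below counts shared tree fragments using the well-known Collins and Duffy convolution kernel; whether this matches the authors' exact kernel is an assumption, and the tree encoding (nested tuples), the function names, and the `decay` parameter are all hypothetical choices made for this example.

```python
def production(node):
    """Return a node's label plus the (kind, label) of each child."""
    label, *children = node
    kinds = tuple(
        ("leaf", c) if isinstance(c, str) else ("node", c[0]) for c in children
    )
    return (label, kinds)


def internal_nodes(tree):
    """Yield every non-leaf node of a tree given as nested tuples."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from internal_nodes(child)


def common_fragments(n1, n2, decay=1.0):
    """Weighted count of tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = decay
    for c1, c2 in zip(n1[1:], n2[1:]):
        # Equal productions guarantee c1 and c2 are both leaves or both nodes.
        if not isinstance(c1, str):
            score *= 1.0 + common_fragments(c1, c2, decay)
    return score


def tree_kernel(t1, t2, decay=1.0):
    """Sum the shared-fragment counts over all pairs of internal nodes."""
    return sum(
        common_fragments(a, b, decay)
        for a in internal_nodes(t1)
        for b in internal_nodes(t2)
    )


# Toy RST-like trees (hypothetical labels and leaves, not real parser output).
reference = ("Elaboration", ("Nucleus", "a"), ("Satellite", "b"))
translation = ("Contrast", ("Nucleus", "a"), ("Nucleus", "c"))

print(tree_kernel(reference, reference))
print(tree_kernel(reference, translation))
```

    A kernel value like this could then be normalized and added, with a tuned weight, to an existing metric score, which is the kind of simple linear combination the abstract mentions.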

    Automatic inference of causal reasoning chains from student essays

    While there has been an increasing focus on higher-level thinking skills arising from the Common Core Standards, many high-school and middle-school students struggle to combine and integrate information from multiple sources when writing essays. Writing is an important learning skill, and there is increasing evidence that writing about a topic develops a deeper understanding in the student. However, grading essays is time-consuming for teachers, resulting in an increasing focus on shallower forms of assessment that are easier to automate, such as multiple-choice tests. Existing essay grading software has attempted to ease this burden but relies on shallow lexico-syntactic features and is unable to understand the structure or validity of a student's arguments or explanations. Without the ability to understand a student's reasoning processes, it is impossible to write automated formative assessment systems to assist students with improving their thinking skills through essay writing. In order to understand the arguments put forth in an explanatory essay in the science domain, we need a method of representing the causal structure of a piece of explanatory text. Psychologists use a representation called a causal model to represent a student's understanding of an explanatory text. This consists of a number of core concepts and a set of causal relations linking them into one or more causal chains. In this thesis I present a novel system for automatically constructing causal models from student scientific essays using Natural Language Processing (NLP) techniques. The problem was decomposed into four sub-problems: assigning essay concepts to words, detecting causal relations between these concepts, resolving coreferences within each essay, and using the structure of the whole essay to reconstruct a causal model. 
Solutions to each of these sub-problems build upon the predictions from the solutions to earlier problems, forming a sequential pipeline of models. Designing a system in this way allows later models to correct for false positive predictions from upstream models. However, this also has the disadvantage that errors made in earlier models can propagate through the system, negatively impacting the downstream models and limiting their accuracy. Producing robust solutions for the first two sub-problems, detecting concepts and parsing causal relations between them, was critical in building a robust system. A number of sequence labeling models were trained to classify the concepts associated with each word, with the most effective approach being a bidirectional recurrent neural network (RNN), a deep learning model commonly applied to word labeling problems. This is because the RNN used pre-trained word embeddings to better generalize to rarer words, and was able to use information from both ends of each sentence to infer a word's concept. The concepts predicted by this model were then used to develop causal relation parsing models for detecting causal connections between these concepts. A shift-reduce dependency parsing model was trained using the SEARN algorithm and outperformed a number of other approaches by better utilizing the structure of the problem and directly optimizing the error metric used. Two pre-trained coreference resolution systems were used to resolve coreferences within the essays. However, a word tagging model trained to predict anaphors, combined with a heuristic for determining the antecedent, outperformed these two systems. Finally, a model was developed for parsing a causal model from an entire essay, utilizing the solutions to the three previous problems. A beam search algorithm was used to produce multiple parses for each sentence, which in turn were combined to generate multiple candidate causal models for each student essay. 
A reranking algorithm was then used to select the optimal causal model from all of the generated candidates. An important contribution of this work is that it represents a system for parsing a complete causal model of a scientific essay from a student's written answer. Existing systems have been developed to parse individual causal relations, but no existing system attempts to parse a sequence of linked causal relations forming a causal model from an explanatory scientific essay. It is hoped that this work can lead to the development of more robust essay grading software and formative assessment tools, and can be extended to build solutions for extracting causality from text in other domains. In addition, I present two novel approaches for optimizing the micro-F1 score within the design of two of the algorithms studied: the dependency parser and the reranking algorithm. The dependency parser uses a custom cost function to estimate the impact of parsing mistakes on the overall micro-F1 score, while the reranking algorithm allows the micro-F1 score to be optimized by tuning the beam search parameter to balance recall and precision.
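
    The thesis abstract above refers to optimizing the micro-F1 score, which balances precision and recall after pooling counts across all essays. The following is a minimal sketch of that standard metric, assuming each essay's causal model can be represented as a set of (cause, effect) pairs; that representation and the concept names are hypothetical and are not taken from the thesis.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1: pool true/false positives and false negatives
    over all essays before computing precision and recall."""
    tp = fp = fn = 0
    for gold_rels, pred_rels in zip(gold, predicted):
        gold_rels, pred_rels = set(gold_rels), set(pred_rels)
        tp += len(gold_rels & pred_rels)   # relations found correctly
        fp += len(pred_rels - gold_rels)   # relations predicted but wrong
        fn += len(gold_rels - pred_rels)   # relations missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Two toy essays with hypothetical concept codes.
gold = [{("A", "B"), ("B", "C")}, {("C", "D")}]
predicted = [{("A", "B")}, {("C", "D"), ("D", "E")}]

score = micro_f1(gold, predicted)
print(round(score, 3))
```

    Keeping more candidate parses tends to raise recall at the cost of precision, so a score like this could be recomputed on held-out essays for each candidate beam setting to pick the best trade-off.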

    Learning Chinese language structures with multiple views

    Motivated by the inadequacy of single-view approaches in many areas of NLP, we study multi-view Chinese language processing, including word segmentation, part-of-speech (POS) tagging, syntactic parsing and semantic role labeling (SRL), in this thesis. We consider three situations of multiple views in statistical NLP: (1) heterogeneous computational models have been designed for a given problem; (2) heterogeneous annotation data is available to train systems; (3) supervised and unsupervised machine learning techniques are applicable. First, we comparatively analyze successful single-view approaches for Chinese lexical, syntactic and semantic processing. Our analysis highlights the diversity between heterogeneous systems built on different views, and motivates us to improve the state of the art by combining or integrating heterogeneous approaches. Second, we study the annotation ensemble problem, i.e., learning from multiple data sets under different annotation standards. We propose a series of generalized stacking models to effectively utilize heterogeneous labeled data to reduce approximation errors for word segmentation and parsing. Finally, we are concerned with bridging the gap between unsupervised and supervised learning paradigms. We introduce feature induction solutions that harvest useful linguistic knowledge from large-scale unlabeled data and effectively use it as new features to enhance discriminative learning-based systems. For word segmentation, we present a comparative study of word-based and character-based approaches. Inspired by the diversity of the two views, we design a novel stacked sub-word tagging model for joint word segmentation and POS tagging, which robustly integrates different models, even models trained on heterogeneous annotations. To benefit from unsupervised word segmentation, we derive expressive string knowledge from unlabeled data, which significantly enhances a strong supervised segmenter. 
For POS tagging, we introduce two linguistically motivated improvements: (1) combining syntax-free sequential tagging and syntax-based chart parsing results to better capture syntagmatic lexical relations and (2) integrating word clusters acquired from unlabeled data to better capture paradigmatic lexical relations. For syntactic parsing, we present a comparative analysis of generative PCFG-LA constituency parsing and discriminative graph-based dependency parsing. To benefit from the diversity of parsing in different formalisms, we implement a previously introduced stacking method and propose a novel Bagging model to combine complementary strengths of grammar-free and grammar-based models. In addition to the study on the syntactic formalism, we also propose a reranking model to explore heterogeneous treebanks that are labeled under different annotation schemes. Finally, we continue our efforts on combining strengths of supervised and unsupervised learning, and evaluate the impact of word clustering on different syntactic processing tasks. Our work on SRL focuses on improving the full parsing method with linguistically rich features and a chunking strategy. Furthermore, we developed a partial-parsing-based semantic chunking method, which has complementary strengths to the full-parsing-based method. Building on our work, Zhuang and Zong (2010) successfully improved the state of the art by combining full- and partial-parsing-based SRL systems.

    Syntax-based machine translation using dependency grammars and discriminative machine learning

    Machine translation has undergone huge improvements since the groundbreaking introduction of statistical methods in the early 2000s, going from very domain-specific systems that still performed relatively poorly despite the painstaking crafting of thousands of ad hoc rules, to general-purpose systems automatically trained on large collections of bilingual texts, which manage to deliver understandable translations that convey the general meaning of the original input. These approaches, however, still perform well below the level of human translators, typically failing to convey detailed meaning and register, and producing translations that, while readable, are often ungrammatical and unidiomatic. This quality gap, which is large compared to most other natural language processing tasks, has been the focus of research in recent years, with the development of increasingly sophisticated models that attempt to exploit the syntactic structure of human languages, leveraging the technology of statistical parsers as well as advanced machine learning methods such as margin-based structured prediction algorithms and neural networks. The translation software itself has become more complex in order to accommodate the sophistication of these advanced models: the main translation engine (the decoder) is now often combined with a pre-processor that reorders the words of the source sentences into a target-language word order, or with a post-processor that ranks and selects a translation, according to a fine model, from a list of candidate translations generated by a coarse model. In this thesis we investigate the statistical machine translation problem from various angles, focusing on translation from non-analytic languages whose syntax is best described by fluid non-projective dependency grammars rather than the relatively strict phrase-structure grammars or projective dependency grammars which are most commonly used in the literature. 
We propose a framework for modeling word reordering phenomena between language pairs as transitions on non-projective source dependency parse graphs. We quantitatively characterize reordering phenomena for the German-to-English language pair as captured by this framework, specifically investigating the incidence and effects of the non-projectivity of source syntax and the non-locality of word movement with respect to the graph structure. We evaluate several variants of hand-coded pre-ordering rules in order to assess the impact of these phenomena on translation quality. We propose a class of dependency-based source pre-ordering approaches that reorder sentences based on flexible models trained by SVMs and several recurrent neural network architectures. We also propose a class of translation reranking models, both syntax-free and source dependency-based, which make use of a type of neural network known as the graph echo state network, which is highly flexible and requires extremely few training resources, overcoming one of the main limitations of neural network models for natural language processing tasks.

    Getting Past the Language Gap: Innovations in Machine Translation

    In this chapter, we review state-of-the-art machine translation systems and discuss innovative methods for machine translation, highlighting the most promising techniques and applications. Machine translation (MT) has benefited from a revitalization in the last 10 years or so, after a period of relatively slow activity. In 2005 the field received a jumpstart when a powerful, complete experimental package for building MT systems from scratch became freely available as a result of the unified efforts of the MOSES international consortium. Around the same time, hierarchical methods were introduced by Chinese researchers, which allowed the introduction and use of syntactic information in translation modeling. Furthermore, advances in the related field of computational linguistics, which made off-the-shelf taggers and parsers readily available, helped give MT an additional boost. Yet there is still more progress to be made. For example, MT will be enhanced greatly when both syntax and semantics are on board: this still presents a major challenge, though many advanced research groups are currently pursuing ways to meet this challenge head-on. The next generation of MT will consist of a collection of hybrid systems. It also augurs well for the mobile environment, as we look forward to more advanced and improved technologies that enable speech-to-speech machine translation on hand-held devices, i.e., speech recognition and speech synthesis. We review all of these developments and point out in the final section some of the most promising research avenues for the future of MT.

    The Best Explanation: Beyond Right and Wrong in Question Answering

    Modeling Dependencies in Natural Languages with Latent Variables

    In this thesis, we investigate the use of latent variables to model complex dependencies in natural languages. Traditional models, which have a fixed parameterization, often make strong independence assumptions that lead to poor performance. This problem is often addressed by incorporating additional dependencies into the model (e.g., using higher order N-grams for language modeling). These added dependencies can increase data sparsity and/or require expert knowledge, together with trial and error, in order to identify and incorporate the most important dependencies (as in lexicalized parsing models). Traditional models, when developed for a particular genre, domain, or language, are also often difficult to adapt to another. In contrast, previous work has shown that latent variable models, which automatically learn dependencies in a data-driven way, are able to flexibly adjust the number of parameters based on the type and the amount of training data available. We have created several different types of latent variable models for a diverse set of natural language processing applications, including novel models for part-of-speech tagging, language modeling, and machine translation, and an improved model for parsing. These models perform significantly better than traditional models. We have also created and evaluated three different methods for improving the performance of latent variable models. While these methods can be applied to any of our applications, we focus our experiments on parsing. The first method involves self-training, i.e., we train models using a combination of gold standard training data and a large amount of automatically labeled training data. We conclude from a series of experiments that the latent variable models benefit much more from self-training than conventional models, apparently due to their flexibility to adjust their model parameterization to learn more accurate models from the additional automatically labeled training data. 
The second method takes advantage of the variability among latent variable models to combine multiple models for enhanced performance. We investigate several different training protocols to combine self-training with model combination. We conclude that these two techniques are complementary to each other and can be effectively combined to train very high quality parsing models. The third method replaces the generative multinomial lexical model of latent variable grammars with a feature-rich log-linear lexical model to provide a principled solution to address data sparsity, handle out-of-vocabulary words, and exploit overlapping features during model induction. We conclude from experiments that the resulting grammars are able to effectively parse three different languages. This work contributes to natural language processing by creating flexible and effective latent variable models for several different languages. Our investigation of self-training, model combination, and log-linear models also provides insights into the effective application of these machine learning techniques to other disciplines.
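
    The self-training procedure described in the abstract above (train on gold data, label extra data automatically, retrain on both) can be sketched in a few lines. This is only a toy illustration: a one-dimensional nearest-centroid classifier stands in for the latent variable parser, and every function name and data value here is a hypothetical choice for the example.

```python
def fit_centroids(examples):
    """Toy 'training': average the feature value observed for each label."""
    sums, counts = {}, {}
    for x, y in examples:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Assign the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def self_train(gold, unlabeled, rounds=1):
    """Train on gold data, label the unlabeled data automatically,
    then retrain on the union of gold and automatically labeled data."""
    model = fit_centroids(gold)
    for _ in range(rounds):
        auto_labeled = [(x, predict(model, x)) for x in unlabeled]
        model = fit_centroids(gold + auto_labeled)
    return model


gold = [(0.0, "neg"), (10.0, "pos")]
unlabeled = [1.0, 2.0, 8.0, 9.0]

model = self_train(gold, unlabeled)
print(model)
print(predict(model, 4.0))
```

    In the setting the abstract describes, the automatically labeled data would be parser output on a large unannotated corpus rather than a handful of numbers.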