Search CORE

243 research outputs found

Detecting grammatical errors with treebank-induced, probabilistic parsers

Author: Wagner Joachim
Publication venue: Dublin City University. School of Computing
Publication date: 01/03/2012

Today's grammar checkers often use hand-crafted rule systems that define acceptable language. The development of such rule systems is labour-intensive and has to be repeated for each language. At the same time, grammars automatically induced from syntactically annotated corpora (treebanks) are successfully employed in other applications, for example text understanding and machine translation. At first glance, treebank-induced grammars seem unsuitable for grammar checking because they massively over-generate and, owing to their high robustness, fail to reject ungrammatical input. We present three new methods for judging the grammaticality of a sentence with probabilistic, treebank-induced grammars, demonstrating that such grammars can be successfully applied to automatically judge the grammaticality of an input string. Our best-performing method exploits the differences between parse results for grammars trained on grammatical and ungrammatical treebanks. The second approach builds an estimator of the probability of the most likely parse, using grammatical training data that has previously been parsed and annotated with parse probabilities. If the estimated probability of an input sentence (whose grammaticality is to be judged by the system) is higher by a certain amount than the actual parse probability, the sentence is flagged as ungrammatical. The third approach extracts discriminative parse tree fragments in the form of CFG rules from parsed grammatical and ungrammatical corpora and trains a binary classifier to distinguish grammatical from ungrammatical sentences. The three approaches are evaluated on a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting common grammatical errors into the British National Corpus. The results are compared to two traditional approaches: one that uses a hand-crafted, discriminative grammar (the XLE ParGram English LFG) and one based on part-of-speech n-grams.
In addition, the baseline methods and the new methods are combined in a machine-learning-based framework, yielding further improvements.
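The second approach can be illustrated with a minimal sketch. It is not the thesis's estimator: it assumes log-probabilities (so "higher by a certain amount" becomes a difference in log space) and predicts the best-parse log-probability from sentence length alone with a least-squares line. The class `LengthBasedEstimator`, the function `flag_ungrammatical`, the `margin` value, and the toy data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LengthBasedEstimator:
    """Hypothetical estimator of the best-parse log-probability.

    Assumption: the log-probability is predicted from sentence length alone,
    using an ordinary least-squares line fitted on grammatical sentences that
    were already parsed and annotated with their parse log-probabilities.
    """

    slope: float = 0.0
    intercept: float = 0.0

    def fit(self, lengths, log_probs):
        n = len(lengths)
        if n < 2 or n != len(log_probs):
            raise ValueError("need at least two (length, log_prob) pairs")
        mean_x = sum(lengths) / n
        mean_y = sum(log_probs) / n
        sxx = sum((x - mean_x) ** 2 for x in lengths)
        if sxx == 0:
            raise ValueError("training lengths must not all be equal")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lengths, log_probs))
        self.slope = sxy / sxx
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, length):
        # Expected log-probability of the most likely parse for this length.
        return self.intercept + self.slope * length


def flag_ungrammatical(estimator, length, actual_log_prob, margin):
    """Flag a sentence whose estimated log-probability exceeds the parser's
    actual best-parse log-probability by more than `margin`."""
    return estimator.predict(length) - actual_log_prob > margin


if __name__ == "__main__":
    # Toy training data, invented for illustration (not from the thesis).
    est = LengthBasedEstimator().fit([5, 10, 15, 20], [-40.0, -80.0, -120.0, -160.0])
    print(est.predict(12))                            # -96.0
    print(flag_ungrammatical(est, 12, -110.0, 5.0))   # True: 14 > 5
    print(flag_ungrammatical(est, 12, -97.0, 5.0))    # False: 1 <= 5
```

A length-only predictor is the simplest choice that shows the decision rule; a realistic estimator would use richer features of the sentence, and the margin would be tuned on held-out data.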

Irish Universities

DCU Online Research Access Service

Formal grammars in linguistics and psycholinguistics: Vol.II, Applications in linguistic theory

Author: Levelt W.
Publication venue: Mouton
Publication date: 01/01/1974

MPG.PuRe

Data-Oriented Parsing with Discontinuous Constituents and Function Tags

Author: Bod R.
Scha R.
van Cranenburgh A.
Publication venue: 'Institute of Computer Science, Polish Academy of Sciences'
Publication date: 01/01/2016

Statistical parsers are effective but are typically limited to producing projective dependencies or constituents. On the other hand, linguistically rich parsers recognize non-local relations and analyze both form and function phenomena, but they rely on extensive manual grammar development. We combine the advantages of the two by building a statistical parser that produces richer analyses. We investigate new techniques to implement treebank-based parsers that allow for discontinuous constituents. We present two systems. One system is based on a string-rewriting Linear Context-Free Rewriting System (LCFRS), while using a Probabilistic Discontinuous Tree Substitution Grammar (PDTSG) to improve disambiguation performance. The other system encodes the discontinuities in the labels of phrase structure trees, allowing for efficient context-free grammar parsing. The two systems demonstrate that tree fragments as used in tree-substitution grammar improve disambiguation performance while capturing non-local relations on an as-needed basis. Additionally, we present results of models that produce function tags, resulting in a more linguistically adequate model of the data. We report substantial accuracy improvements in discontinuous parsing for German, English, and Dutch, including results on spoken Dutch.
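To make "discontinuous constituent" concrete: in LCFRS terms, the fan-out of a constituent is the number of maximal contiguous blocks in its yield, and a constituent is discontinuous when that number exceeds one (an ordinary CFG constituent always has fan-out 1). The sketch below computes this from the word positions a constituent covers. The function names are illustrative and are not taken from the authors' parser.

```python
def fan_out(yield_positions):
    """Number of maximal contiguous blocks in a constituent's yield.

    yield_positions: word indices covered by the constituent, in any order.
    """
    positions = sorted(set(yield_positions))
    if not positions:
        return 0
    blocks = 1
    # Every gap between neighbouring covered positions starts a new block.
    for prev, cur in zip(positions, positions[1:]):
        if cur != prev + 1:
            blocks += 1
    return blocks


def is_discontinuous(yield_positions):
    # A context-free constituent covers exactly one contiguous span.
    return fan_out(yield_positions) > 1


if __name__ == "__main__":
    print(fan_out([1, 2, 3]))       # 1: contiguous, CFG-compatible
    print(fan_out([0, 2]))          # 2: an intervening word at position 1
    print(is_discontinuous([0, 3])) # True
```

For example, a verb phrase that covers words 0 and 2 but not the intervening word 1 has fan-out 2, so a plain CFG cannot produce it as a single constituent, whereas an LCFRS nonterminal of fan-out 2 can.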

Biblioteka Nauki - repozytorium artykułów

International Migration, Integration and Social Cohesion online publications

UvA-DARE

Data-Oriented Parsing with discontinuous constituents and function tags

Author: Bod Rens
Scha Remko
van Cranenburgh Andreas
Publication venue: 'Institute of Computer Science, Polish Academy of Sciences'
Publication date: 01/01/2016

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

Directory of Open Access Journals

Dissertations of the University of Groningen

Monolingual Sentence Rewriting as Machine Translation: Generation and Evaluation

Author: Napoles-Cohen Courtney
Publication venue: 'The Busan Gyeongnam Mathematical Society'
Publication date: 30/07/2019

In this thesis, we investigate approaches to paraphrasing entire sentences within the constraints of a given task, which we call monolingual sentence rewriting. We introduce a unified framework for monolingual sentence rewriting, and apply it to three representative tasks: sentence compression, text simplification, and grammatical error correction. We also perform a detailed analysis of the evaluation methodologies for each task, identify bias in common evaluation techniques, and propose more reliable practices. Monolingual rewriting can be thought of as translating between two types of English (such as from complex to simple), and therefore our approach is inspired by statistical machine translation. In machine translation, a large quantity of parallel data is necessary to model the transformations from input to output text. Parallel bilingual data naturally occurs between common language pairs (such as English and French), but for monolingual sentence rewriting, there is little existing parallel data and annotation is costly. We modify the statistical machine translation pipeline to harness monolingual resources and insights into task constraints in order to drastically diminish the amount of annotated data necessary to train a robust system. Our method generates more meaning-preserving and grammatical sentences than earlier approaches and requires less task-specific data. Once candidate sentences are generated, it is crucial to have reliable evaluation methods. Sentential paraphrases must fulfill a variety of requirements: preserve the meaning of the original sentence, be grammatical, and meet any stylistic or task-specific constraints. We analyze common evaluation practices and propose better methods that more accurately measure the quality of output. 
Robust automatic evaluation methodology, though often overlooked, is necessary for improving systems, and this work presents new metrics and outlines important considerations for reliably measuring the quality of the generated text.

JScholarship


A study of agrammatism with special reference to Hebrew

Author: Druks Judit
Publication date: 01/01/1991

The aim of the thesis was to test Grodzinsky's account of agrammatism empirically. Grodzinsky's account is based on Chomsky's Government and Binding theory, and it claims that the comprehension deficit in agrammatism is due to the deletion of the 'trace' present in passives and relative clauses. English- and Hebrew-speaking patients were tested. The experiment exploited a special feature of Hebrew in which it is possible to construct passive sentences without a trace. In addition to passive and object relative clause sentences, other sentence types were also used. The results did not support Grodzinsky's trace deletion hypothesis. An alternative version of his hypothesis, according to which sentences that require coindexation between two elements in the sentence are difficult for agrammatic aphasics, did obtain support. The results also suggested that reversible sentences are particularly difficult for agrammatic patients. Grodzinsky's account also claimed that in agrammatism governed prepositions are impaired and ungoverned prepositions are preserved. To test this part of the theory, an in-depth case study was carried out of a Hebrew-speaking agrammatic patient who never used prepositions in her spontaneous speech. The study tested the hypotheses of Grodzinsky and Friederici and concluded that the evidence does not support Grodzinsky's hypothesis that governed prepositions are impaired and ungoverned prepositions are preserved. As Friederici suggested, meaningful prepositions were more likely to be produced in certain tasks, although this also cannot explain the patient's total omission of prepositions. In addition to the preposition case study, the patient's ability to deal with the Hebrew verb system was investigated. Both the preposition and the verb studies suggested that in agrammatism it is not the principles of Universal Grammar that are violated but the particular features of individual languages.

Open Research Online (The Open University)

OpenGrey Repository

Robust Parsing for Ungrammatical Sentences

Author: Baradaran Hashemi Homa
Publication date: 31/01/2018

Natural Language Processing (NLP) is a research area that specializes in computational approaches to human language. However, not all natural language sentences are grammatically correct. Sentences that are ungrammatical, awkward, or too casual/colloquial tend to appear in a variety of NLP applications, from product reviews and social media analysis to intelligent language tutors and multilingual processing. In this thesis, we focus on parsing because it is an essential component of many NLP applications. We investigate how the performance of statistical parsers degrades when dealing with ungrammatical sentences. We also hypothesize that breaking up parse trees at problematic parts prevents NLP applications from degrading due to incorrect syntactic analysis. A parser is robust if it can overlook problems such as grammar mistakes and produce a parse tree that closely resembles the correct analysis for the intended sentence. We develop a robustness evaluation metric and conduct a series of experiments to compare the performance of state-of-the-art parsers on ungrammatical sentences. The evaluation results show that ungrammatical sentences present challenges for statistical parsers, because the well-formed syntactic trees they produce may not be appropriate for ungrammatical sentences. We also define a new framework for reviewing the parses of ungrammatical sentences and extracting the coherent parts whose syntactic analyses make sense. We call this task parse tree fragmentation. The experimental results suggest that the proposed fragmentation framework is a promising way to handle syntactically unusual sentences.
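Parse tree fragmentation can be pictured with a small sketch. It assumes a dependency representation (`heads[i]` gives the head of token i, and -1 marks the root) and assumes the problematic arcs have already been identified; deciding which arcs to cut is the actual research problem and is not shown here. Cutting those arcs and collecting the remaining connected pieces yields the fragments. The function `fragment` and the example sentence are hypothetical.

```python
def fragment(heads, problematic):
    """Split a dependency tree into fragments by cutting flagged arcs.

    heads[i] is the index of token i's head, or -1 if token i is the root.
    problematic is a set of token indices whose incoming arc should be cut.
    Returns the fragments as lists of token indices, ordered by first token.
    """
    parent = list(range(len(heads)))  # union-find forest over tokens

    def find(x):
        # Follow parent pointers to the representative, halving paths as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Keep every arc except the root marker and arcs into problematic tokens.
    for dep, head in enumerate(heads):
        if head >= 0 and dep not in problematic:
            parent[find(dep)] = find(head)

    groups = {}
    for tok in range(len(heads)):
        groups.setdefault(find(tok), []).append(tok)
    return sorted(groups.values())


if __name__ == "__main__":
    # "He go to school yesterday": cut the subject arc He -> go.
    tokens = ["He", "go", "to", "school", "yesterday"]
    heads = [1, -1, 1, 2, 1]
    for frag in fragment(heads, {0}):
        print([tokens[i] for i in frag])
    # ['He']
    # ['go', 'to', 'school', 'yesterday']
```

Cutting only the arc attached to the agreement error keeps the rest of the analysis intact as one coherent fragment, which is the intuition behind isolating problematic parts rather than discarding the whole parse.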

D-Scholarship@Pitt

On the metatheory of linguistics

Author: Wurm Christian
Publication venue: UB Bielefeld
Publication date: 01/01/2013

Wurm C. On the metatheory of linguistics. Bielefeld: UB Bielefeld; 2013

Publications at Bielefeld University

Could grammatical encoding and grammatical decoding be subserved by the same processing module?

Author: Kempen G.
Publication date: 01/01/2000

MPG.PuRe

A tree-to-tree model for statistical machine translation

Author: Cowan Brooke A. (Brooke Alissa), 1972-
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2008

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 227-234). In this thesis, we take a statistical tree-to-tree approach to solving the problem of machine translation (MT). In a statistical tree-to-tree approach, the source-language input is first parsed into a syntactic tree structure; then the source-language tree is mapped to a target-language tree. This kind of approach has several advantages. For one, parsing the input generates valuable information about its meaning. In addition, the mapping from a source-language tree to a target-language tree offers a mechanism for preserving the meaning of the input. Finally, producing a target-language tree helps to ensure the grammaticality of the output. A main focus of this thesis is to develop a statistical tree-to-tree mapping algorithm. Our solution involves a novel representation called an aligned extended projection, or AEP. The AEP, inspired by ideas in linguistic theory related to tree-adjoining grammars, is a parse-tree-like structure that models clause-level phenomena such as verbal argument structure and lexical word order. The AEP also contains alignment information that links the source-language input to the target-language output. Instead of learning a mapping from a source-language tree to a target-language tree, the AEP-based approach learns a mapping from a source-language tree to a target-language AEP. The AEP is a complex structure, and learning a mapping from parse trees to AEPs presents a challenging machine learning problem. In this thesis, we use a linear structured prediction model to solve this learning problem. A human evaluation of the AEP-based translation approach in a German-to-English task shows significant improvements in the grammaticality of translations.
This thesis also presents a statistical parser for Spanish that could be used as part of a Spanish/English translation system. By Brooke Alissa Cowan, Ph.D.

DSpace@MIT