Treebank annotation schemes and parser evaluation for German

Author: Rehbein Ines
van Genabith Josef
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2007

Recent studies have focussed on the question of whether less-configurational languages like German are harder to parse than English, or whether the lower parsing scores are an artefact of treebank encoding schemes and data structures, as claimed by Kübler et al. (2006). This claim is based on the assumption that PARSEVAL metrics fully reflect parse quality across treebank encoding schemes. In this paper we present new experiments to test this claim. We use the PARSEVAL metric, the Leaf-Ancestor metric as well as a dependency-based evaluation, and present novel approaches measuring the effect of controlled error insertion on treebank trees and parser output. We also provide extensive post-parsing cross-treebank conversion. The results of the experiments show that, contrary to Kübler et al. (2006), the question of whether or not German is harder to parse than English remains undecided.
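To make the PARSEVAL assumption concrete, the following is a minimal sketch of labeled bracket precision, recall and F1. It is not the authors' evaluation code. The tuple-based tree encoding, the function names and the decision to skip preterminal (part-of-speech) nodes are assumptions made for illustration.

```python
from collections import Counter


def labeled_brackets(tree, start=0):
    """Collect (label, start, end) spans for a tree given as nested tuples.

    A tree is (label, child, child, ...); a leaf token is a plain string.
    Preterminals such as ("N", "men") are skipped (an assumption of this sketch).
    Returns the list of spans and the position after the last token.
    """
    label, *children = tree
    spans = []
    pos = start
    for child in children:
        if isinstance(child, str):
            pos += 1  # a leaf token covers one position
        else:
            child_spans, pos = labeled_brackets(child, pos)
            spans.extend(child_spans)
    is_preterminal = len(children) == 1 and isinstance(children[0], str)
    if not is_preterminal:
        spans.append((label, start, pos))
    return spans, pos


def parseval(gold, predicted):
    """Return (precision, recall, f1) over labeled brackets."""
    gold_spans = Counter(labeled_brackets(gold)[0])
    pred_spans = Counter(labeled_brackets(predicted)[0])
    matched = sum((gold_spans & pred_spans).values())  # multiset intersection
    precision = matched / sum(pred_spans.values()) if pred_spans else 0.0
    recall = matched / sum(gold_spans.values()) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: the gold tree nests the PP inside an NP,
# while the predicted tree attaches it flat under the VP.
GOLD = ("S", ("NP", ("N", "I")),
        ("VP", ("V", "saw"),
         ("NP", ("NP", ("N", "men")),
          ("PP", ("P", "with"), ("NP", ("N", "hats"))))))
PREDICTED = ("S", ("NP", ("N", "I")),
             ("VP", ("V", "saw"),
              ("NP", ("N", "men")),
              ("PP", ("P", "with"), ("NP", ("N", "hats")))))
```

In this toy pair the flatter tree loses one bracket of recall purely because of how the attachment is encoded, which is the kind of annotation-scheme effect the abstract questions.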

Irish Universities

DCU Online Research Access Service

The effectiveness of bottom up technique with probabilistic approach for a Malay parser

Author: Lailatul Qadri Zakaria
Mohd Juzaiddin Ab Aziz
Muhammad Azhar Fairuzz Hiloh
Publication venue: 'Penerbit Universiti Kebangsaan Malaysia (UKM Press)'
Publication date: 01/05/2018

Parsing is the process of analyzing the input string of a sentence to determine its syntactic structure according to the rules of a grammar. This task is performed by a parser, which produces a parse tree as output. However, a problem occurs when the parsing process produces two or more parse trees and the parser is unable to settle on a single precise parse tree. This limitation is caused by ambiguity in the structure of sentences. Ambiguity occurs when a word can be classified under more than one syntactic category and its usage affects the semantics of the sentence. Thus, the parser needs an approach that resolves the ambiguity and produces the most appropriate parse tree to represent a sentence. Like other languages, Malay, the national language of Malaysia, is not exempt from the ambiguity problem. However, because its grammar is a context-free grammar, the probabilistic context-free grammar approach can be used to help the parser determine a more accurate parse tree. This study focuses on the development of a statistical parser for Malay using a bottom-up technique. The training data, in the form of simple Malay sentences, were collected from various sources. Based on this training data, a statistical lexical corpus of Malay consisting of vocabulary, grammar rules and their probabilities was developed. The bottom-up parsing is implemented with the Cocke–Younger–Kasami (CYK) algorithm. The parser's performance is evaluated on how effectively it overcomes ambiguity by suggesting a more precise parse tree. In conclusion, the Malay Language Parser can help users identify the appropriate parse tree and resolve ambiguity issues in Malay.
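As an illustration of how a probabilistic context-free grammar can pick one tree among several, here is a minimal sketch of probabilistic CYK over a grammar in Chomsky normal form. The grammar, its probabilities, the example sentence and all names below are invented for illustration and are not taken from the study's corpus or implementation.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form.
# BINARY maps a pair of child labels to (parent, rule probability).
BINARY = {
    ("NP", "VP"): [("S", 1.0)],
    ("V", "NP"): [("VP", 0.6)],
    ("VP", "PP"): [("VP", 0.4)],
    ("NP", "PP"): [("NP", 0.2)],
    ("P", "NP"): [("PP", 1.0)],
}
# LEXICAL maps a word to (label, rule probability).
LEXICAL = {
    "saya": [("NP", 0.3)],
    "nasi": [("NP", 0.3)],
    "sudu": [("NP", 0.2)],
    "makan": [("V", 1.0)],
    "dengan": [("P", 1.0)],
}


def cyk_best_parse(words, start="S"):
    """Return (probability, tree) of the most probable parse, or None."""
    n = len(words)
    # best[(i, j)][label] = (probability, tree) for the span words[i:j]
    best = defaultdict(dict)
    for i, word in enumerate(words):
        for label, prob in LEXICAL.get(word, []):
            best[(i, i + 1)][label] = (prob, (label, word))
    # Bottom-up: fill longer spans from the best analyses of shorter ones.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left, (p_left, t_left) in best[(i, k)].items():
                    for right, (p_right, t_right) in best[(k, j)].items():
                        for parent, p_rule in BINARY.get((left, right), []):
                            prob = p_rule * p_left * p_right
                            current = best[(i, j)].get(parent, (0.0, None))
                            if prob > current[0]:
                                best[(i, j)][parent] = (prob, (parent, t_left, t_right))
    return best[(0, n)].get(start)


SENTENCE = "saya makan nasi dengan sudu".split()
```

Under these made-up probabilities the sentence has two parses. Attaching the prepositional phrase to the verb phrase scores about 0.00432, attaching it to the noun phrase scores about 0.00216, so `cyk_best_parse(SENTENCE)` returns the verb-phrase attachment.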

UKM Journal Article Repository

Syntax Error Handling in Scannerless Generalized LR Parsers

Author: Valkering R.
Publication venue: University of Amsterdam
Publication date: 01/08/2007

This thesis describes a master's project carried out as part of the one-year master's programme 'Software-engineering'. The project concerns methods for improving the reporting and handling of syntax errors produced by a scannerless generalized left-to-right rightmost (SGLR) parser, and was done at Centrum voor Wiskunde en Informatica (CWI) in Amsterdam. SGLR is a parsing algorithm developed as part of the Generic Language Technology Project at SEN1, one of the themes at CWI. SGLR is based on the GLR algorithm developed by Tomita. SGLR parsers are able to recognize arbitrary context-free grammars, which enables grammar modularization. Because SGLR does not use a separate scanner, layout and comments are also incorporated into the parse tree. This makes SGLR a powerful tool for code analysis and code transformations. A drawback is the way SGLR handles syntax errors. When a syntax error is detected, the current implementation of SGLR halts the parsing process and reports back to the user only the point of error detection. The text at the point of error detection is not necessarily the text that has to be changed to repair the error. This thesis describes three kinds of information that could be reported to the user, and how they could be derived from the parse process when an error is detected. These are:
- The structure of the already parsed part of the input, in the form of a partial parse tree.
- A listing of expected symbols: those tokens or token sequences that are acceptable instead of the erroneous text.
- The current parser state, which could be translated into language-dependent informative messages.
Two ways of recovering from an error condition are also described. These are non-correcting recovery methods that enable SGLR to always return a parse tree that can be unparsed into the original input sentence.
- A method that halts parsing but incorporates the remainder of the input into the parse tree.
- A method that resumes parsing by means of substring parsing.
During the course of the project the described approaches have been implemented and incorporated in the implementation of SGLR as used by the Meta-Environment, some fully, some more or less prototyped.
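The first recovery method (halt, but keep the remainder of the input in the tree) and the listing of expected symbols can be illustrated with a small sketch. This is not SGLR and not the thesis implementation: it is a hand-written character-level parser for a hypothetical grammar of comma-separated numbers, and every name in it is an assumption. It only shows the invariant that the returned tree unparses to the original input.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)  # Node objects or str leaves


def unparse(node):
    """Concatenate all leaves to recover the text the tree was built from."""
    if isinstance(node, str):
        return node
    return "".join(unparse(child) for child in node.children)


def scan_while(text, pos, predicate):
    """Return the end index of the run of characters satisfying predicate."""
    end = pos
    while end < len(text) and predicate(text[end]):
        end += 1
    return end


def parse_number_list(text):
    """Parse numbers separated by commas, working directly on characters.

    Returns (tree, error_position, expected). Layout (whitespace) is kept in
    the tree. On an error, parsing halts and the rest of the input is stored
    in an "error" node, so unparse(tree) == text always holds.
    """
    root = Node("list")
    pos = 0
    expect_item = True
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            end = scan_while(text, pos, str.isspace)
            root.children.append(Node("layout", [text[pos:end]]))
        elif expect_item and ch.isdigit():
            end = scan_while(text, pos, str.isdigit)
            root.children.append(Node("number", [text[pos:end]]))
            expect_item = False
        elif not expect_item and ch == ",":
            end = pos + 1
            root.children.append(Node("comma", [","]))
            expect_item = True
        else:
            expected = ["number"] if expect_item else ["','", "end of input"]
            root.children.append(Node("error", [text[pos:]]))
            return root, pos, expected
        pos = end
    if expect_item:  # input ended where a number was still required
        return root, len(text), ["number"]
    return root, None, []
```

For the input `"1, 2 ;; 3"` this sketch halts at the first `;`, reports that a comma or the end of the input was expected, and still returns a tree whose leaves spell out the whole input.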

CWI's Institutional Repository

A Bottom-Up Design and Implementation for Ambiguity-Compatible Natural Language Sentence Parsing

Author: Thrasher Elise
Publication venue: Digital Commons @ Trinity
Publication date: 20/04/2011

Although many theory-focused computer science textbooks give a brief outline of a context-free grammar model of natural language, the approach is often vague and, in reality, greatly simplifies the English language’s grammatical complexities. When applied to commonly-seen sentences, these sentence parsing models often fall short. In this paper, I detail my process of creating a programmable natural language context-free grammar that is able to parse (i.e. diagram) many common sentence forms, as well as the research which influenced the design of this project. In order to create a grammar that recognized the intricacies of the English language, I also incorporated the ability to identify and represent ambiguous sentences into my program. While the resulting program is not able to correctly parse every possible English sentence, ambiguous or not, it does function as an introduction to the field of computational linguistics and the difficulties present in this field

Trinity University

Syntactic phrase-based statistical machine translation

Author: Hassan Hany
Hearne Mary
Sima'an Khalil
Way Andy
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

Phrase-based statistical machine translation (PBSMT) systems represent the dominant approach in MT today. However, unlike systems in other paradigms, it has proven difficult to date to incorporate syntactic knowledge in order to improve translation quality. This paper improves on recent research which uses 'syntactified' target language phrases, by incorporating supertags as constraints to better resolve parse tree fragments. In addition, we do not impose any sentence-length limit, and using a log-linear decoder, we outperform a state-of-the-art PBSMT system by over 1.3 BLEU points (or 3.51% relative) on the NIST 2003 Arabic-English test corpus

Crossref

Irish Universities

DCU Online Research Access Service

International Migration, Integration and Social Cohesion online publications

Farmers' vulnerability to climate shocks in Benin

Author: Lokonon Boris Odilon Kounagbè
Publication venue: London School of Economics and Political Science
Publication date: 20/04/2017

Farmers' vulnerability to climate shocks is affected by their exposure, sensitivity, and adaptive capacity. One must parse these components before designing policies for climate resilience, says Boris Odilon Kounagbè Lokonon. Agro-ecological factors are especially important, as their variation means that households with low adaptive capacity do not necessarily have high exposure or sensitivity to climate shocks

LSE Research Online