
    Treebank annotation schemes and parser evaluation for German

    Recent studies have focused on the question of whether less-configurational languages like German are harder to parse than English, or whether the lower parsing scores are an artefact of treebank encoding schemes and data structures, as claimed by Kübler et al. (2006). This claim is based on the assumption that PARSEVAL metrics fully reflect parse quality across treebank encoding schemes. In this paper we present new experiments to test this claim. We use the PARSEVAL metric, the Leaf-Ancestor metric, and a dependency-based evaluation, and present novel approaches measuring the effect of controlled error insertion on treebank trees and parser output. We also provide extensive post-parsing cross-treebank conversion. The results of the experiments show that, contrary to Kübler et al. (2006), the question of whether German is harder to parse than English remains undecided.
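
For readers unfamiliar with the PARSEVAL metric named in this abstract, the following is a minimal sketch of labeled bracket precision, recall, and F1. It is not the evaluation code used in the paper. The function name `parseval`, the `(label, start, end)` span representation, and the example spans are all assumptions made for illustration.

```python
from collections import Counter


def parseval(gold, predicted):
    """Labeled bracket precision, recall and F1 for a single sentence.

    Each tree is passed as a list of (label, start, end) constituent spans.
    This span encoding is an illustrative assumption, not a standard format.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection: a bracket counts as matched only as many
    # times as it occurs in both the gold tree and the predicted tree.
    matched = sum((gold_counts & pred_counts).values())
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f1


# Hypothetical five-word sentence; the two trees differ in one label.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
precision, recall, f1 = parseval(gold, pred)
```

Because the score depends only on which labeled spans match, two treebanks that encode the same sentence with different numbers of brackets can yield different scores for comparable parse quality, which is the kind of annotation-scheme effect the abstract sets out to test.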

    The effectiveness of bottom up technique with probabilistic approach for a Malay parser

    Parsing is the process of analyzing an input sentence to determine its syntactic structure according to the rules of a grammar. This task is performed by a parser, which produces a parse tree as output. However, a problem arises when parsing yields two or more parse trees and the parser is unable to settle on a single one. This limitation is caused by ambiguity in sentence structure. Ambiguity occurs when a word can be assigned to more than one syntactic category, and the choice affects the semantics of the sentence. Thus, the parser needs an approach that resolves the ambiguity and selects the most appropriate parse tree for a sentence. Like other languages, Malay, the national language of Malaysia, is not exempt from the ambiguity problem. However, because its grammar is context-free, a probabilistic context-free grammar approach can be used to help the parser determine a more accurate parse tree. This study focuses on the development of a statistical parser for Malay using a bottom-up technique. The training data, in the form of simple Malay sentences, were collected from various sources. Based on this training data, a statistical lexical corpus of Malay consisting of vocabulary, grammar rules, and their probabilities was developed. Bottom-up parsing is implemented with the Cocke–Younger–Kasami (CYK) algorithm. The parser's performance is evaluated on how effectively it overcomes ambiguity by suggesting a more precise parse tree. In conclusion, the Malay Language Parser can help users identify the appropriate parse tree and resolve ambiguity in Malay.
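
The combination of CYK with a probabilistic context-free grammar described in this abstract can be sketched as follows. This is not the study's parser. The function name `pcky`, the dictionary encoding of rules, and the toy English grammar with its probabilities are invented for illustration, and the grammar is assumed to be in Chomsky normal form.

```python
from collections import defaultdict


def pcky(words, lexical, binary, start="S"):
    """Viterbi CYK for a PCFG in Chomsky normal form.

    lexical maps (A, word) to P(A -> word).
    binary maps (A, B, C) to P(A -> B C).
    Returns (probability, tree) for the best parse, or (0.0, None).
    """
    n = len(words)
    best = defaultdict(dict)  # best[(i, j)][A] = (prob, tree) for words[i:j]
    # Fill length-1 spans from the lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w and p > best[(i, i + 1)].get(a, (0.0, None))[0]:
                best[(i, i + 1)][a] = (p, (a, w))
    # Build longer spans bottom-up, keeping only the best tree per label.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    if b in best[(i, k)] and c in best[(k, j)]:
                        pb, tb = best[(i, k)][b]
                        pc, tc = best[(k, j)][c]
                        cand = p * pb * pc
                        if cand > best[(i, j)].get(a, (0.0, None))[0]:
                            best[(i, j)][a] = (cand, (a, tb, tc))
    return best[(0, n)].get(start, (0.0, None))


# Toy grammar with made-up probabilities (not taken from the study).
binary = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.6,
    ("VP", "VP", "PP"): 0.4,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
}
lexical = {
    ("NP", "she"): 0.3,
    ("NP", "stars"): 0.3,
    ("NP", "telescopes"): 0.2,
    ("V", "saw"): 1.0,
    ("P", "with"): 1.0,
}
prob, tree = pcky("she saw stars with telescopes".split(), lexical, binary)
```

The example sentence has two parses, one attaching the prepositional phrase to the verb phrase and one attaching it to the noun phrase. Under these invented probabilities the verb-phrase attachment scores higher, so it is the tree returned. This is the sense in which rule probabilities let a bottom-up parser choose among competing trees.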

    Syntax Error Handling in Scannerless Generalized LR Parsers

    This thesis reports on a master's project carried out as part of the one-year master's programme 'Software-engineering'. The project concerns methods for improving the reporting and handling of syntax errors produced by a scannerless generalized left-to-right rightmost (SGLR) parser, and was done at Centrum voor Wiskunde en Informatica (CWI) in Amsterdam. SGLR is a parsing algorithm developed as part of the Generic Language Technology Project at SEN1, one of the themes at CWI. SGLR is based on the GLR algorithm developed by Tomita. SGLR parsers are able to recognize arbitrary context-free grammars, which enables grammar modularization. Because SGLR does not use a separate scanner, layout and comments are also incorporated into the parse tree. This makes SGLR a powerful tool for code analysis and code transformations. A drawback is the way SGLR handles syntax errors. When a syntax error is detected, the current implementation of SGLR halts the parsing process and reports back to the user only the point of error detection. The text at the point of error detection is not necessarily the text that has to be changed to repair the error. This thesis describes three kinds of information that could be reported to the user, and how they could be derived from the parse process when an error is detected. These are:
    - The structure of the already parsed part of the input, in the form of a partial parse tree.
    - A listing of expected symbols: those tokens or token sequences that are acceptable instead of the erroneous text.
    - The current parser state, which could be translated into language-dependent informative messages.
    Two ways of recovering from an error condition are also described. These are non-correcting recovery methods that enable SGLR to always return a parse tree that can be unparsed into the original input sentence:
    - A method that halts parsing but incorporates the remainder of the input into the parse tree.
    - A method that resumes parsing by means of substring parsing.
    During the course of the project the described approaches have been implemented and incorporated in the implementation of SGLR as used by the Meta-Environment, some fully, some more or less prototyped.

    A Bottom-Up Design and Implementation for Ambiguity-Compatible Natural Language Sentence Parsing

    Although many theory-focused computer science textbooks give a brief outline of a context-free grammar model of natural language, the approach is often vague and, in reality, greatly simplifies the English language’s grammatical complexities. When applied to commonly-seen sentences, these sentence parsing models often fall short. In this paper, I detail my process of creating a programmable natural language context-free grammar that is able to parse (i.e. diagram) many common sentence forms, as well as the research which influenced the design of this project. In order to create a grammar that recognized the intricacies of the English language, I also incorporated the ability to identify and represent ambiguous sentences into my program. While the resulting program is not able to correctly parse every possible English sentence, ambiguous or not, it does function as an introduction to the field of computational linguistics and the difficulties present in this field

    Syntactic phrase-based statistical machine translation

    Phrase-based statistical machine translation (PBSMT) systems represent the dominant approach in MT today. However, unlike systems in other paradigms, it has proven difficult to date to incorporate syntactic knowledge in order to improve translation quality. This paper improves on recent research which uses 'syntactified' target language phrases, by incorporating supertags as constraints to better resolve parse tree fragments. In addition, we do not impose any sentence-length limit, and using a log-linear decoder, we outperform a state-of-the-art PBSMT system by over 1.3 BLEU points (or 3.51% relative) on the NIST 2003 Arabic-English test corpus

    Farmers' vulnerability to climate shocks in Benin

    Farmers' vulnerability to climate shocks is affected by their exposure, sensitivity, and adaptive capacity. One must parse these components before designing policies for climate resilience, says Boris Odilon Kounagbè Lokonon. Agro-ecological factors are especially important, as their variation means that households with low adaptive capacity do not necessarily have high exposure or sensitivity to climate shocks