7,527 research outputs found

    GED - a generalised syntax editor : a thesis presented in partial fulfilment of the requirements for the degree of Master of Science in Computer Science at Massey University

    Get PDF
    This thesis traces the development of a full-screen syntax-directed editor - a type of editor that operates on a program in terms of its syntactic tree structure instead of its sequential character representation. The editor is table-driven, reading as input an extended BNF syntax of the target language. It can therefore be used for any language whose syntax can be defined in EBNF. Print formatting information can be included with the syntactic definition to enable programs to be pretty-printed when they are displayed. The user is presented with a pretty-printed skeletal outline of a program with the currently selected construct highlighted and all required syntactic items provided by the editor. Any constructs with alternatives, such as "", which occurs in many languages, are initially denoted by a placeholder in the form of a non-terminal name (i.e. "") which is expanded when the user indicates which alternative is wanted. All symbols entered by the user are parsed immediately and any erroneous symbols rejected, making it impossible to create a syntactically incorrect program. The editor cannot detect semantic errors as no semantic information is available from the EBNF syntax. However the first use of all identifiers is flagged by the editor as an aid to the detection of undeclared identifiers. A "help" area at the bottom of the screen continuously displays a list of the correct next symbols and the syntactic definition of the currently selected program construct. This display, together with a multi-level "undo" command and the provision of a skeletal program by the editor, provides a way of exploring the various constructs in a programming language, while ensuring the syntactic correctness of the resultant program

    Hybrid rule-based - example-based MT: feeding apertium with sub-sentential translation units

    Get PDF
    This paper describes a hybrid machine translation (MT) approach that consists of integrating bilingual chunks (sub-sentential translation units) obtained from parallel corpora into an MT system built using the Apertium free/open-source rule-based machine translation platform, which uses a shallow-transfer translation approach. In the integration of bilingual chunks, special care has been taken so as not to break the application of the existing Apertium structural transfer rules, since this would increase the number of ungrammatical translations. The method consists of (i) the application of a dynamic-programming algorithm to compute the best translation coverage of the input sentence given the collection of bilingual chunks available; (ii) the translation of the input sentence as usual by Apertium; and (iii) the application of a language model to choose one of the possible translations for each of the bilingual chunks detected. Results are reported for the translation from English-to-Spanish, and vice versa, when marker-based bilingual chunks automatically obtained from parallel corpora are used

    Preparing, restructuring, and augmenting a French treebank: lexicalised parsers or coherent treebanks?

    Get PDF
    We present the Modified French Treebank (MFT), a completely revamped French Treebank, derived from the Paris 7 Treebank (P7T), which is cleaner, more coherent, has several transformed structures, and introduces new linguistic analyses. To determine the effect of these changes, we investigate how theMFT fares in statistical parsing. Probabilistic parsers trained on the MFT training set (currently 3800 trees) already perform better than their counterparts trained on five times the P7T data (18,548 trees), providing an extreme example of the importance of data quality over quantity in statistical parsing. Moreover, regression analysis on the learning curve of parsers trained on the MFT lead to the prediction that parsers trained on the full projected 18,548 tree MFT training set will far outscore their counterparts trained on the full P7T. These analyses also show how problematic data can lead to problematic conclusions–in particular, we find that lexicalisation in the probabilistic parsing of French is probably not as crucial as was once thought (Arun and Keller (2005))

    An approach to software maintenance support using a syntactic source code analyser data base : this thesis is presented in a partial fulfillment of the requirements for the degree of Master of Arts in Computer Science at Massey University

    Get PDF
    In this thesis, the development of a software maintenance tool called a syntactic source code analyser (SSCA) is summarised. An SSCA supports other maintenance tools which interact with source code by creating a data base of source information which has links to a formatted version of program source code. The particular SSCA presented handles programs written in a version of COBOL. Before developing a SSCA system, aspects of software maintenance need to be considered. Hence, the scope, definitions and problems of maintenance activities are briefly reviewed and maintenance support through environments, software metrics, and specific tools and techniques examined. A complete maintenance support environment for an application is found to overlap considerably with the application documentation system and shares some tools with development environments. Program source code is also identified as the fundamental documentation of an application and interaction with this source code is a requirement of many maintenance support tools

    Parsing Using the Role and Reference Grammar Paradigm

    Get PDF
    Much effort has been put into finding ways of parsing natural language. Role and Reference Grammar (RRG) is a linguistic paradigm that has credibility in linguistic circles. In this paper we give a brief overview of RRG and show how this can be implemented into a standard rule-based parser. We used the chart parser to test the concept on sentences from student work. We present results that show the potential role of this method for parsing ungrammatical sentences
    • …
    corecore