5 research outputs found

    Domes as a Prodigal Shape in Synthesis-Enhanced Parsers.

    Get PDF
    Abstract. Research on logic based bottom-up parsing -in particular, around Constraint Handling Rule Grammars [3]-is uncovering shape as an untapped fertile ground for natural language processing in general, and for bottom-up parsing and grammar induction in particula

    Extraction of transforming sequences and sentence histories from writing process data : a first step towards linguistic modeling of writing

    Get PDF
    Online first, part of special issue "Methods for understanding writing process by analysis of writing timecourse" Erworben im Rahmen der Schweizer Nationallizenzen (http://www.nationallizenzen.ch)Producing written texts is a non-linear process: in contrast to speech, writers are free to change already written text at any place at any point in time. Linguistic considerations are likely to play an important role, but so far, no linguistic models of the writing process exist. We present an approach for the analysis of writing processes with a focus on linguistic structures based on the novel concepts of transforming sequences, text history, and sentence history. The processing of raw keystroke logging data and the application of natural language processing tools allows for the extraction and filtering of product and process data to be stored in a hierarchical data structure. This structure is used to re-create and visualize the genesis and history for a text and its individual sentences. Focusing on sentences as primary building blocks of written language and full texts, we aim to complement established writing process analyses and, ultimately, to interpret writing timecourse data with respect to linguistic structures. To enable researchers to explore this view, we provide a fully functional implementation of our approach as an open-source software tool and visualizations of the results. We report on a small scale exploratory study in German where we used our tool. The results indicate both the feasibility of the approach and that writers actually revise on a linguistic level. The latter confirms the need for modeling written text production from the perspective of linguistic structures beyond the word level

    Corpus-consulting probabilistic approach to parsing: the CCPX parser and its complementary components

    Get PDF
    Corpus linguistics is now a major field in the study of language. In recent years corpora that are syntactically analysed have become available to researchers, and these clearly have great potential for use in the field of parsing natural language. This thesis describes a project that exploits this possibility. It makes four distinct contributions to these two fields. The first is an updated version of a corpus that is (a) analysed in terms of the rich syntax of Systemic Functional Grammar (SFG), and (b) annotated using the extensible Mark-up Language (XML). The second contribution is a native XML corpus database, and the third is a sophisticated corpus query tool for accessing it. The fourth contribution is a new type of parser that is both corpus-consulting and probabilistic. It draws its knowledge of syntactic probabilities from the corpus database, and it stores its working data within the database, so that it is strongly database-oriented. SFG has been widely used in natural language generation for approaching two decades, but it has been used far less frequently in parsing (the first stage in natural language understanding). Previous SFG corpus-based parsers have utilised traditional parsing algorithms, but they have experienced problems of efficiency and coverage, due to (a) the richness of the syntax and (b) the challenge of parsing unrestricted spoken and written texts. The present research overcomes these problems by introducing a new type of parsing algorithm that is 'semi-deterministic' (as human readers are), and utilises its knowledge of the rules—including probabilities—of English syntax. A language, however, is constantly evolving. New words and uses are added, while others become less frequent and drop out altogether. The new parsing system seeks to replicate this. As new sentences are parsed they are added to the corpus, and this slowly changes the frequencies of the words and the syntactic patterns. The corpus is in this sense dynamic, and so simulates a human's changing knowledge of words and syntax

    Interactive Incremental Chart Parsing

    No full text
    This paper presents an algorithm for incremental chart parsing, outlines how this could be embedded in an interactive parsing system, and discusses why this might be useful. Incremental parsing here means that input is analysed in a piecemeal fashion, in particular allowing arbitrary changes of previous input without exhaustive reanalysis. Interactive parsing means that the analysis process is prompted immediately at the onset of new input, and possibly that the system then may interact with the user in order to resolve problems that occur. The combination of these techniques could be used as a parsing kernel for highly interactive and 'reactive naturallanguage processors, such as parsers for dialogue systems, interactive computer-aided translation systems, and language-sensitive text editors. An incremental chart parser embodying the ideas put forward in this paper has been implemented, and an embedding of this in an interactive parsing system is near completion
    corecore