2,365 research outputs found

    Learning Recursive Segments for Discourse Parsing

    Automatically detecting discourse segments is an important preliminary step towards full discourse parsing. Previous research on discourse segmentation has relied on the assumption that elementary discourse units (EDUs) in a document always form a linear sequence (i.e., they can never be nested). Unfortunately, this assumption turns out to be too strong, because some theories of discourse, such as SDRT, allow for nested discourse units. In this paper, we present a simple approach to discourse segmentation that is able to produce nested EDUs. Our approach builds on standard multi-class classification techniques combined with a simple repairing heuristic that enforces global coherence. Our system was developed and evaluated on the first round of annotations provided by the French Annodis project (an ongoing effort to create a discourse bank for French). Cross-validated on only 47 documents (1,445 EDUs), our system achieves encouraging results, with an F-score of 73% for finding EDUs. Comment: published at LREC 201

    Increased recall in annotation variance detection in treebanks

    Automatic inconsistency detection in parsed corpora is of considerable help in building more and larger corpora of annotated texts. Inconsistencies are inevitable and originate from variance in annotation caused by factors such as lack of attention or the absence of clear annotation guidelines. In this paper, some results involving the automatic detection of annotation variance in parsed corpora are presented. In particular, it is shown that a generalization procedure substantially increases the recall of the variant detection algorithm proposed in [1].
    18th International Conference on Text, Speech and Dialogue (TSD), 2015-09, Czech Republic. Int Speech Commun Assoc; Czech Soc Cybernet & Informat; Kerio Technol; Univ West Bohemia, Fac Appl Sci; Masaryk Univ, Fac Informat. Pilse

    Building and querying parallel treebanks

    This paper describes our work on building a trilingual parallel treebank. We have annotated constituent structure trees from three text genres (a philosophy novel, economy reports and a technical user manual). Our parallel treebank includes word and phrase alignments. The alignment information was manually checked using a graphical tool that allows the annotator to view a pair of trees from parallel sentences. This tool comes with a powerful search facility whose expressivity exceeds that of previous popular treebank query engines.