83 research outputs found

    TüSBL : a similarity-based chunk parser for robust syntactic processing

    Get PDF
    Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. The TüSBL parser extends current chunk parsing techniques by a tree-construction component that extends partial chunk parses to complete tree structures including recursive phrase structure as well as function-argument structure. TüSBLs tree construction algorithm relies on techniques from memory-based learning that allow similarity-based classification of a given input structure relative to a pre-stored set of tree instances from a fully annotated treebank. A quantitative evaluation of TüSBL has been conducted using a semi-automatically constructed treebank of German that consists of appr. 67,000 fully annotated sentences. The basic PARSEVAL measures were used although they were developed for parsers that have as their main goal a complete analysis that spans the entire input.This runs counter to the basic philosophy underlying TüSBL, which has as its main goal robustness of partially analyzed structures

    From chunks to function-argument structure : a similarity-based approach

    Get PDF
    Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. Such larger structures are not only desirable for a deeper syntactic analysis. They also constitute a necessary prerequisite for assigning function-argument structure. The present paper offers a similaritybased algorithm for assigning functional labels such as subject, object, head, complement, etc. to complete syntactic structures on the basis of prechunked input. The evaluation of the algorithm has concentrated on measuring the quality of functional labels. It was performed on a German and an English treebank using two different annotation schemes at the level of function argument structure. The results of 89.73% correct functional labels for German and 90.40%for English validate the general approach

    Challenges and solutions for Latin named entity recognition

    Get PDF
    Although spanning thousands of years and genres as diverse as liturgy, historiography, lyric and other forms of prose and poetry, the body of Latin texts is still relatively sparse compared to English. Data sparsity in Latin presents a number of challenges for traditional Named Entity Recognition techniques. Solving such challenges and enabling reliable Named Entity Recognition in Latin texts can facilitate many down-stream applications, from machine translation to digital historiography, enabling Classicists, historians, and archaeologists for instance, to track the relationships of historical persons, places, and groups on a large scale. This paper presents the first annotated corpus for evaluating Named Entity Recognition in Latin, as well as a fully supervised model that achieves over 90% F-score on a held-out test set, significantly outperforming a competitive baseline. We also present a novel active learning strategy that predicts how many and which sentences need to be annotated for named entities in order to attain a specified degree of accuracy when recognizing named entities automatically in a given text. This maximizes the productivity of annotators while simultaneously controlling quality

    Measuring efficiency in high-accuracy, broad-coverage statistical parsing

    Full text link
    Very little attention has been paid to the comparison of efficiency between high accuracy statistical parsers. This paper proposes one machine-independent metric that is general enough to allow comparisons across very different parsing architectures. This metric, which we call ``events considered'', measures the number of ``events'', however they are defined for a particular parser, for which a probability must be calculated, in order to find the parse. It is applicable to single-pass or multi-stage parsers. We discuss the advantages of the metric, and demonstrate its usefulness by using it to compare two parsers which differ in several fundamental ways.Comment: 8 pages, 4 figures, 2 table

    From news to comment: Resources and benchmarks for parsing the language of web 2.0

    Get PDF
    We investigate the problem of parsing the noisy language of social media. We evaluate four all-Street-Journal-trained statistical parsers (Berkeley, Brown, Malt and MST) on a new dataset containing 1,000 phrase structure trees for sentences from microblogs (tweets) and discussion forum posts. We compare the four parsers on their ability to produce Stanford dependencies for these Web 2.0 sentences. We find that the parsers have a particular problem with tweets and that a substantial part of this problem is related to POS tagging accuracy. We attempt three retraining experiments involving Malt, Brown and an in-house Berkeley-style parser and obtain a statistically significant improvement for all three parsers

    Can Subcategorisation Probabilities Help a Statistical Parser?

    Full text link
    Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal predicates. However, the empirical question of whether this type of frequency information can in practice improve the accuracy of a statistical parser has not yet been answered. In this paper we describe an experiment with a wide-coverage statistical grammar and parser for English and subcategorisation frequencies acquired from ten million words of text which shows that this information can significantly improve parse accuracy.Comment: 9 pages, uses colacl.st

    Statistical Function Tagging and Grammatical Relations of Myanmar Sentences

    Get PDF
    This paper describes a context free grammar (CFG) based grammatical relations for Myanmar sentences which combine corpus-based function tagging system. Part of the challenge of statistical function tagging for Myanmar sentences comes from the fact that Myanmar has free-phrase-order and a complex morphological system. Function tagging is a pre-processing step to show grammatical relations of Myanmar sentences. In the task of function tagging, which tags the function of Myanmar sentences with correct segmentation, POS (part-of-speech) tagging and chunking information, we use Naive Bayesian theory to disambiguate the possible function tags of a word. We apply context free grammar (CFG) to find out the grammatical relations of the function tags. We also create a functional annotated tagged corpus for Myanmar and propose the grammar rules for Myanmar sentences. Experiments show that our analysis achieves a good result with simple sentences and complex sentences.Comment: 16 pages, 7 figures, 8 tables, AIAA-2011 (India). arXiv admin note: text overlap with arXiv:0912.1820 by other author
    corecore