87 research outputs found

    A hybrid architecture for robust parsing of German

    This paper provides an overview of current research on a hybrid and robust parsing architecture for the morphological, syntactic and semantic annotation of German text corpora. The novel contribution of this research lies not in the individual parsing modules, each of which relies on state-of-the-art algorithms and techniques, but in the combination of these modules into a single architecture. This combination provides a means to significantly optimize the performance of each component, resulting in increased annotation accuracy.

    Meta-Learning for Phonemic Annotation of Corpora

    We apply rule induction, classifier combination and meta-learning (stacked classifiers) to the problem of bootstrapping high-accuracy automatic annotation of corpora with pronunciation information. The task we address in this paper consists of generating phonemic representations reflecting the Flemish and Dutch pronunciations of a word on the basis of its orthographic representation (which in turn is based on the actual speech recordings). We compare several possible approaches to achieve the text-to-pronunciation mapping task: memory-based learning, transformation-based learning, rule induction, maximum entropy modeling, combination of classifiers in stacked learning, and stacking of meta-learners. We are interested both in optimal accuracy and in obtaining insight into the linguistic regularities involved. As far as accuracy is concerned, an already high accuracy level (93% for Celex and 86% for Fonilex at word level) for single classifiers is boosted significantly with additional error reductions of 31% and 38% respectively using combination of classifiers, and a further 5% using combination of meta-learners, bringing overall word level accuracy to 96% for the Dutch variant and 92% for the Flemish variant. We also show that the application of machine learning methods indeed leads to increased insight into the linguistic regularities determining the variation between the two pronunciation variants studied.
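
As a rough check on the figures in this abstract, a relative error reduction can be applied to the residual error rate rather than to the accuracy itself. The sketch below assumes that the quoted reductions (31%, 38%, and the further 5%) are relative and compound multiplicatively, and it uses only the rounded percentages given above. The function name is illustrative and does not come from the paper.

```python
def apply_error_reduction(accuracy, reduction):
    """Return the accuracy after removing a fraction `reduction` of the remaining errors."""
    error = 1.0 - accuracy
    return 1.0 - error * (1.0 - reduction)


# Rounded word-level accuracies and reductions quoted in the abstract.
# Assumption: each reduction is relative to the error left by the previous stage.
celex = apply_error_reduction(apply_error_reduction(0.93, 0.31), 0.05)
fonilex = apply_error_reduction(apply_error_reduction(0.86, 0.38), 0.05)

print(f"Celex: {celex:.3f}, Fonilex: {fonilex:.3f}")
```

Under these assumptions the results land within about one percentage point of the reported 96% and 92%, which is compatible with rounding in the quoted inputs.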

    A Comparative Study of Classifier Combination Methods Applied to NLP Tasks

    There are many classification tools that can be used for various NLP tasks, although none of them can be considered the best of all, since each one has its own virtues and defects. Combination methods can serve both to maximize the strengths of the base classifiers and to reduce the errors caused by their defects, improving accuracy. This comparative study of the most relevant methods shows that combination appears to be a robust and reliable way of improving results.

    Evaluating parts-of-speech taggers for use in a text-to-scene conversion system

    This paper presents parts-of-speech tagging as a first step towards an autonomous text-to-scene conversion system. It categorizes some freely available taggers according to the techniques each uses to identify word classes automatically. In addition, the performance of each identified tagger is verified experimentally. The SUSANNE corpus is used for testing and reveals the complexity of working with different tagsets, resulting in substantially lower accuracies in our tests than in those reported by the developers of each tagger. The taggers are then grouped to form a voting system in an attempt to raise accuracies, but in no case do the combined results improve upon the individual accuracies. Additionally, a new metric, agreement, is tentatively proposed as an indication of confidence in the output of a group of taggers where such output cannot be validated.
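
The voting scheme and the agreement idea mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not define agreement, so the definition used here (the fraction of taggers that chose the winning tag for a token) is an assumption, as are the function name, the tagger outputs, and the tag labels.

```python
from collections import Counter


def vote(tags):
    """Return (winning tag, agreement) for the tags several taggers gave one token."""
    counts = Counter(tags)
    winner, votes = counts.most_common(1)[0]
    # Assumed definition: agreement is the share of taggers backing the winner.
    return winner, votes / len(tags)


# Hypothetical outputs of three taggers for the tokens "the", "dog", "barks".
tagger_outputs = [
    ["DT", "NN", "VBZ"],
    ["DT", "NN", "NNS"],
    ["DT", "VB", "VBZ"],
]

# zip(*...) regroups the per-tagger sequences into per-token tag lists.
results = [vote(token_tags) for token_tags in zip(*tagger_outputs)]
for winner, agreement in results:
    print(winner, round(agreement, 2))
```

Ties are resolved in favour of the tag seen first, which is an arbitrary choice in this sketch. A score like this needs no gold-standard tags, which matches the stated use case of output that cannot be validated.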

    Is Part-of-Speech Tagging a Solved Problem for Icelandic?

    Ensemble Morphosyntactic Analyser for Classical Arabic

    For Modern Standard Arabic (MSA) text, there are at least seven available morphological analysers (MAs). Several part-of-speech (POS) taggers use these MAs to improve accuracy. However, choosing between these analysers is challenging, and none is designed for Classical Arabic. Several morphological analysers have been studied and combined so that they can be evaluated on common ground. The goal of our language resource is to build a freely accessible multi-component toolkit (named SAWAREF) for part-of-speech taggers and morphological analysers that can provide a comparative evaluation, standardise the outputs of each component, combine different solutions, and analyse and vote for the best candidates. We illustrate the use of SAWAREF in tagging adjectives and show that the accuracy of adjective tagging is still very low. This paper describes the research method and design, and discusses the key issues and obstacles.

    A comparative study of classifier combination applied to NLP tasks

    The paper is devoted to a comparative study of classifier combination methods, which have been successfully applied to multiple tasks, including Natural Language Processing (NLP) tasks. There is a variety of classifier combination techniques, and the major difficulty is choosing the one that best fits a particular task. In our study we explored the performance of a number of combination methods, such as voting, Bayesian merging, behavior knowledge space, bagging, stacking, feature sub-spacing and cascading, on the part-of-speech tagging task using nine corpora in five languages. The results show that some methods that are currently not very popular can perform much better. In addition, we learned how corpus size and quality influence the performance of the combination methods. We also provide the results of applying the classifier combination methods to other NLP tasks, such as named entity recognition and chunking. We believe that our study is the most exhaustive comparison of combination methods applied to NLP tasks so far.