87 research outputs found

A hybrid architecture for robust parsing of German

Author: Erhard W. Hinrichs
Frank H. Müller
Sandra Kübler
Tylman Ule
Universität Tübingen
Publication date: 01/01/2002

This paper provides an overview of current research on a hybrid and robust parsing architecture for the morphological, syntactic and semantic annotation of German text corpora. The novel contribution of this research lies not in the individual parsing modules, each of which relies on state-of-the-art algorithms and techniques. Rather, what is new about the present approach is the combination of these modules into a single architecture. This combination provides a means to significantly optimize the performance of each component, resulting in increased annotation accuracy.

CiteSeerX

Hochschulschriftenserver - Universität Frankfurt am Main

Meta-Learning for Phonemic Annotation of Corpora

Author: Daelemans W.
Gillis S.
Hoste V.
Tjong Kim Sang E.F.
van den Bosch A.
Weigand H.
Publication date: 01/01/2000

We apply rule induction, classifier combination and meta-learning (stacked classifiers) to the problem of bootstrapping high-accuracy automatic annotation of corpora with pronunciation information. The task we address in this paper consists of generating phonemic representations reflecting the Flemish and Dutch pronunciations of a word on the basis of its orthographic representation (which in turn is based on the actual speech recordings). We compare several possible approaches to the text-to-pronunciation mapping task: memory-based learning, transformation-based learning, rule induction, maximum entropy modeling, combination of classifiers in stacked learning, and stacking of meta-learners. We are interested both in optimal accuracy and in obtaining insight into the linguistic regularities involved. As far as accuracy is concerned, an already high accuracy level for single classifiers (93% for Celex and 86% for Fonilex at word level) is boosted significantly, with additional error reductions of 31% and 38% respectively using combination of classifiers, and a further 5% using combination of meta-learners, bringing overall word-level accuracy to 96% for the Dutch variant and 92% for the Flemish variant. We also show that the application of machine learning methods indeed leads to increased insight into the linguistic regularities determining the variation between the two pronunciation variants studied.

arXiv.org e-Print Archive

CiteSeerX

Ghent University Academic Bibliography

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

A Comparative Study of Classifier Combination Methods Applied to NLP Tasks

Author: Cruz Mata Fermín
Enríquez de Salamanca Ros Fernando
Ortega Rodríguez Francisco Javier
Troyano Jiménez José Antonio
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2011

There are many classification tools that can be used for various NLP tasks, although none of them can be considered the best of all, since each one has its own strengths and weaknesses. Combination methods can serve both to maximize the strengths of the base classifiers and to reduce the errors caused by their weaknesses, improving the results in terms of accuracy. This comparative study of the most relevant methods shows that combination seems to be a robust and reliable way of improving results.

idUS. Depósito de Investigación Universidad de Sevilla

Evaluating parts-of-speech taggers for use in a text-to-scene conversion system

Author: Glass Kevin R
Bangay Shaun Douglas
Publication date: 01/01/2005

This paper presents parts-of-speech tagging as a first step towards an autonomous text-to-scene conversion system. It categorizes some freely available taggers according to the techniques each uses to automatically identify word classes. In addition, the performance of each identified tagger is verified experimentally. The SUSANNE corpus is used for testing and reveals the complexity of working with different tagsets, resulting in substantially lower accuracies in our tests than in those reported by the developers of each tagger. The taggers are then grouped to form a voting system in an attempt to raise accuracies, but in no case do the combined results improve upon the individual accuracies. Additionally, a new metric, agreement, is tentatively proposed as an indication of confidence in the output of a group of taggers where such output cannot be validated.
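
As a rough illustration of the voting idea described in this abstract, the following Python sketch combines the per-token output of several taggers by simple majority vote and reports an agreement score. The abstract does not define the agreement metric, so the definition used here (the fraction of taggers that back the winning tag) is an assumption, as are all function names and the sample tags.

```python
from collections import Counter


def majority_vote(tags):
    """Return the most frequent tag; ties go to the earliest tagger's choice."""
    counts = Counter(tags)
    best = max(counts.values())
    for tag in tags:
        if counts[tag] == best:
            return tag


def agreement(tags):
    """Fraction of taggers backing the winning tag (assumed definition)."""
    return Counter(tags)[majority_vote(tags)] / len(tags)


def combine(outputs):
    """Combine equal-length tag sequences, one sequence per tagger."""
    per_token = zip(*outputs)  # regroup tags token by token
    return [(majority_vote(t), agreement(t)) for t in per_token]


# Hypothetical output of three taggers for a three-token sentence.
tagger_outputs = [
    ["DT", "NN", "VBZ"],
    ["DT", "NN", "NNS"],
    ["DT", "JJ", "VBZ"],
]
combined = combine(tagger_outputs)
for tag, score in combined:
    print(tag, round(score, 2))
```

A score of 1.0 means all taggers agree on a token; lower scores flag tokens where the combined output is less trustworthy, which is useful when no gold standard is available to check against.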

South East Academic Libraries System (SEALS)

The Australian National University

Is Part-of-Speech Tagging a Solved Problem for Icelandic?

Author: Kárason Örvar
Loftsson Hrafn
Publication venue: University of Tartu Library
Publication date: 01/05/2023

DSpace at Tartu University Library

Ensemble Morphosyntactic Analyser for Classical Arabic

Author: Alosaimy AMS
Atwell E

For Modern Standard Arabic (MSA) text, there are at least seven available morphological analysers (MAs). Several part-of-speech (POS) taggers use these MAs to improve accuracy. However, choosing between these analysers is challenging, and none is designed for Classical Arabic. Several morphological analysers have been studied and combined so that they can be evaluated on common ground. The goal of our language resource is to build a freely accessible multi-component toolkit (named SAWAREF) for part-of-speech tagging and morphological analysis that can provide a comparative evaluation, standardise the outputs of each component, combine different solutions, and analyse and vote for the best candidates. We illustrate the use of SAWAREF in tagging adjectives and show that the accuracy of tagging adjectives is still very low. This paper describes the research method and design, and discusses the key issues and obstacles.

White Rose Research Online

A comparative study of classifier combination applied to NLP tasks

Author: Cruz Mata Fermín
Enríquez de Salamanca Ros Fernando
García Vallejo Carlos Antonio
Ortega Rodríguez Francisco Javier
Troyano Jiménez José Antonio
Publication venue: Elsevier BV
Publication date: 01/01/2013

The paper is devoted to a comparative study of classifier combination methods, which have been successfully applied to multiple tasks, including Natural Language Processing (NLP) tasks. There is a variety of classifier combination techniques, and the major difficulty is choosing the one that best fits a particular task. In our study we explored the performance of a number of combination methods, such as voting, Bayesian merging, behavior knowledge space, bagging, stacking, feature sub-spacing and cascading, for the part-of-speech tagging task using nine corpora in five languages. The results show that some methods that are currently not very popular can perform much better. In addition, we learned how corpus size and quality influence the performance of the combination methods. We also provide the results of applying the classifier combination methods to other NLP tasks, such as named entity recognition and chunking. We believe that our study is the most exhaustive comparison of combination methods applied to NLP tasks so far.

idUS. Depósito de Investigación Universidad de Sevilla

Towards robust multi-tool tagging: an OWL/DL-based approach

Author: Chiarcos Christian
Publication date: 19/05/2023

OPUS Augsburg