87 research outputs found
A hybrid architecture for robust parsing of German
This paper provides an overview of current research on a hybrid and robust parsing architecture for the morphological, syntactic and semantic annotation of German text corpora. The novel contribution of this research lies not in the individual parsing modules, each of which relies on state-of-the-art algorithms and techniques, but in the combination of these modules into a single architecture. This combination provides a means to significantly optimize the performance of each component, resulting in increased annotation accuracy.
Meta-Learning for Phonemic Annotation of Corpora
We apply rule induction, classifier combination and meta-learning (stacked
classifiers) to the problem of bootstrapping high accuracy automatic annotation
of corpora with pronunciation information. The task we address in this paper
consists of generating phonemic representations reflecting the Flemish and
Dutch pronunciations of a word on the basis of its orthographic representation
(which in turn is based on the actual speech recordings). We compare several
possible approaches to achieve the text-to-pronunciation mapping task:
memory-based learning, transformation-based learning, rule induction, maximum
entropy modeling, combination of classifiers in stacked learning, and stacking
of meta-learners. We are interested both in optimal accuracy and in obtaining
insight into the linguistic regularities involved. As far as accuracy is
concerned, an already high accuracy level (93% for Celex and 86% for Fonilex at
word level) for single classifiers is boosted significantly with additional
error reductions of 31% and 38% respectively using combination of classifiers,
and a further 5% using combination of meta-learners, bringing overall word
level accuracy to 96% for the Dutch variant and 92% for the Flemish variant. We
also show that the application of machine learning methods indeed leads to
increased insight into the linguistic regularities determining the variation
between the two pronunciation variants studied. Comment: 8 page
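The relationship between a baseline accuracy and a relative error reduction, as quoted in the abstract above, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the function name `accuracy_after_reduction` is hypothetical, and it assumes that "error reduction" means a relative reduction of the word-level error rate. Because the abstract reports rounded figures, the computed values are only expected to approximate the reported final accuracies.

```python
def accuracy_after_reduction(accuracy: float, error_reduction: float) -> float:
    """Accuracy after removing a relative share of the remaining errors.

    Assumption: `error_reduction` is relative to the baseline error rate,
    i.e. new_error = old_error * (1 - error_reduction).
    """
    baseline_error = 1.0 - accuracy
    return 1.0 - baseline_error * (1.0 - error_reduction)


# Single-classifier baselines and classifier-combination error reductions
# as quoted in the abstract (rounded values).
celex_combined = accuracy_after_reduction(0.93, 0.31)
fonilex_combined = accuracy_after_reduction(0.86, 0.38)

print(f"Celex after combination:   {celex_combined:.4f}")
print(f"Fonilex after combination: {fonilex_combined:.4f}")
```

Under this reading, a 31% error reduction on a 7% error rate leaves roughly 4.8% error, and a 38% reduction on 14% leaves roughly 8.7%, before the further gain from combining meta-learners.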
A Comparative Study of Classifier Combination Methods Applied to NLP Tasks
Many classification tools can be used for various
NLP tasks, but none of them can be considered the best of
all, since each one has its own virtues and defects. Combination
methods can serve both to maximize the strengths of the base
classifiers and to reduce the errors caused by their defects, improving the
results in terms of accuracy. This comparative study of the most
relevant methods shows that combination seems to be a robust and
reliable way of improving results.
Evaluating parts-of-speech taggers for use in a text-to-scene conversion system
This paper presents parts-of-speech tagging as a first step towards an autonomous text-to-scene conversion system. It categorizes some freely available taggers according to the techniques each uses to automatically identify word classes. In addition, the performance of each identified tagger is verified experimentally. The SUSANNE corpus is used for testing and reveals the complexity of working with different tagsets, resulting in substantially lower accuracies in our tests than in those reported by the developers of each tagger. The taggers are then grouped to form a voting system in an attempt to raise accuracies, but in no case do the combined results improve upon the individual accuracies. Additionally, a new metric, agreement, is tentatively proposed as an indication of confidence in the output of a group of taggers where such output cannot be validated.
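A voting system over several taggers, together with a per-token agreement score, might look like the sketch below. It is illustrative only: the abstract does not define the agreement metric, so the definition used here (the fraction of taggers that chose the winning tag) is an assumption, as are the function name `vote`, the tie-breaking rule, and the example tags, which presume all taggers have already been mapped to a shared tagset.

```python
from collections import Counter


def vote(tags_per_tagger):
    """Combine tag sequences from several taggers by plurality voting.

    tags_per_tagger: list of tag sequences, one per tagger, all of the
    same length and (by assumption) using the same tagset.
    Returns (combined_tags, agreement), where agreement[i] is the share
    of taggers that chose the winning tag for token i.
    """
    combined, agreement = [], []
    for token_tags in zip(*tags_per_tagger):
        counts = Counter(token_tags)
        top = max(counts.values())
        # Tie-break (assumption): prefer the earliest-listed tagger's tag.
        winner = next(tag for tag in token_tags if counts[tag] == top)
        combined.append(winner)
        agreement.append(top / len(token_tags))
    return combined, agreement


# Hypothetical outputs of three taggers for a three-token sentence.
outputs = [
    ["DT", "NN", "VB"],
    ["DT", "NN", "NN"],
    ["DT", "JJ", "VB"],
]
combined, agreement = vote(outputs)
print(combined)
print(agreement)
```

A low agreement value flags tokens where the taggers disagree, which is one way such a score could serve as a confidence signal when no gold standard is available.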
Ensemble Morphosyntactic Analyser for Classical Arabic
For Modern Standard Arabic (MSA) text, there are at least seven available morphological analysers (MAs). Several Part-of-Speech (POS) taggers use these MAs to improve accuracy. However, the choice between these analysers is challenging, and none is designed for Classical Arabic. Several morphological analysers have been studied and combined so that they can be evaluated on common ground. The goal of our language resource is to build a freely accessible multi-component toolkit (named SAWAREF) for part-of-speech tagging and morphological analysis that can provide a comparative evaluation, standardise the outputs of each component, combine different solutions, and analyse and vote for the best candidates. We illustrate the use of SAWAREF in tagging adjectives and show that the accuracy of tagging adjectives is still very low. This paper describes the research method and design, and discusses the key issues and obstacles.
A comparative study of classifier combination applied to NLP tasks
The paper is devoted to a comparative study of classifier combination methods, which have been successfully
applied to multiple tasks including Natural Language Processing (NLP) tasks. There is a variety of classifier
combination techniques, and the major difficulty is to choose the one that is the best fit for a particular
task. In our study we explored the performance of a number of combination methods such as voting,
Bayesian merging, behavior knowledge space, bagging, stacking, feature sub-spacing and cascading, for
the part-of-speech tagging task using nine corpora in five languages. The results show that some methods
that are currently not very popular could demonstrate much better performance. In addition, we learned
how corpus size and quality influence the performance of the combination methods. We also provide the
results of applying the classifier combination methods to other NLP tasks, such as named entity recognition
and chunking. We believe that our study is the most exhaustive comparison made with combination
methods applied to NLP tasks so far.