3,693 research outputs found
An automatic part-of-speech tagger for Middle Low German
Syntactically annotated corpora are highly important for enabling large-scale diachronic and diatopic language research. Such corpora have recently been developed for a variety of historical languages, or are still under development. One of those under development is the fully tagged and parsed Corpus of Historical Low German (CHLG), which is aimed at facilitating research into the highly under-researched diachronic syntax of Low German. The present paper reports on a crucial step in creating the corpus, viz. the creation of a part-of-speech tagger for Middle Low German (MLG). Having been transmitted in several non-standardised written varieties, MLG poses a challenge to standard POS taggers, which usually rely on normalized spelling. We outline the major issues faced in the creation of the tagger and present our solutions to them
Using Genetic Algorithms for Building Metrics of Collaborative Systems
he paper objective is to reveal the importance of genetic algorithms in building robust metrics of collaborative systems. The main types of collaborative systems in economy are presented and some characteristics of genetic algorithms are described. A genetic algorithm was implemented in order to determine the local maximum and minimum points of the relative complexity function associated to a collaborative banking system. The intelligent collaborative systems based on genetic algorithms, representing the new generation of collaborative systems, are analyzed and the implementation of auto-adaptive interfaces in a banking application is described.Collaborative Systems, Genetic Algorithms, Metrics, Banking, Auto-Adaptive Interfaces
Genetic Algorithm (GA) in Feature Selection for CRF Based Manipuri Multiword Expression (MWE) Identification
This paper deals with the identification of Multiword Expressions (MWEs) in
Manipuri, a highly agglutinative Indian Language. Manipuri is listed in the
Eight Schedule of Indian Constitution. MWE plays an important role in the
applications of Natural Language Processing(NLP) like Machine Translation, Part
of Speech tagging, Information Retrieval, Question Answering etc. Feature
selection is an important factor in the recognition of Manipuri MWEs using
Conditional Random Field (CRF). The disadvantage of manual selection and
choosing of the appropriate features for running CRF motivates us to think of
Genetic Algorithm (GA). Using GA we are able to find the optimal features to
run the CRF. We have tried with fifty generations in feature selection along
with three fold cross validation as fitness function. This model demonstrated
the Recall (R) of 64.08%, Precision (P) of 86.84% and F-measure (F) of 73.74%,
showing an improvement over the CRF based Manipuri MWE identification without
GA application.Comment: 14 pages, 6 figures, see
http://airccse.org/journal/jcsit/1011csit05.pd
Combined optimization of feature selection and algorithm parameters in machine learning of language
Comparative machine learning experiments have become an important methodology in empirical approaches to natural language processing (i) to investigate which machine learning algorithms have the 'right bias' to solve specific natural language processing tasks, and (ii) to investigate which sources of information add to accuracy in a learning approach. Using automatic word sense disambiguation as an example task, we show that with the methodology currently used in comparative machine learning experiments, the results may often not be reliable because of the role of and interaction between feature selection and algorithm parameter optimization. We propose genetic algorithms as a practical approach to achieve both higher accuracy within a single approach, and more reliable comparisons
- …