17 research outputs found
Use of Weighted Finite State Transducers in Part of Speech Tagging
This paper addresses issues in part of speech disambiguation using
finite-state transducers and presents two main contributions to the field. One
of them is the use of finite-state machines for part of speech tagging.
Linguistic and statistical information is represented in terms of weights on
transitions in weighted finite-state transducers. Another contribution is the
successful combination of techniques -- linguistic and statistical -- for word
disambiguation, compounded with the notion of word classes.Comment: uses psfig, ipamac
Verbal chunk extraction in French using limited resources
A way of extracting French verbal chunks, inflected and infinitive, is
explored and tested on effective corpus. Declarative morphological and local
grammar rules specifying chunks and some simple contextual structures are used,
relying on limited lexical information and some simple heuristic/statistic
properties obtained from restricted corpora. The specific goals, the
architecture and the formalism of the system, the linguistic information on
which it relies and the obtained results on effective corpus are presented
A Machine learning approach to POS tagging
We have applied inductive learning of statistical decision trees
and relaxation labelling to the Natural Language Processing (NLP)
task of morphosyntactic disambiguation (Part Of Speech Tagging).
The learning process is supervised and obtains a language
model oriented to resolve POS ambiguities. This model consists
of a set of statistical decision trees expressing distribution of
tags and words in some relevant contexts.
The acquired language models are complete enough to be directly
used as sets of POS disambiguation rules, and include more complex
contextual information than simple collections of n-grams usually
used in statistical taggers.
We have implemented a quite simple and fast tagger that has been
tested and evaluated on the Wall Street Journal (WSJ) corpus with
a remarkable accuracy.
However, better results can be obtained by translating the trees
into rules to feed a flexible relaxation labelling based tagger.
In this direction we describe a tagger which is able to use
information of any kind (n-grams, automatically acquired constraints,
linguistically motivated manually written constraints, etc.), and in
particular to incorporate the machine learned decision trees.
Simultaneously, we address the problem of tagging when only
small training material is available, which is crucial in any process
of constructing, from scratch, an annotated corpus. We show that quite
high accuracy can be achieved with our system in this situation.Postprint (published version
Towards learning a constraint grammar from annotated corpora using decision trees
Inside the framework of robust parsers for the syntactic analysis of
unrestricted text, the aim of this work is the construction of a system
capable of automatically learning Constraint Grammar rules from a POS
annotated Corpus. The system presented is able by now to acquire constraint
rules for POS tagging and we plan to extend it to cover syntactic rules.
The learning process uses a supervised learning algorithm based on
building a discrimination forest, with a decision tree attached to each
case of POS ambiguity. The system has been applied to four representative
cases of ambiguity performing on a Spanish Corpus. The results obtained
in these experiments and some discussion about the appropriateness of the
proposed learning technique are presented in this paper.Postprint (published version
Естественно-языковой интерфейс интеллектуальных систем. Лабораторный практикум : пособие
Сформулированы основные положения, касающиеся теории и лабораторного
практикума по дисциплине «Естественно-языковой интерфейс интеллектуальных
систем», приведен подробный теоретический материал по курсу, даются
рекомендации по выполнению лабораторных работ
Conocimiento de la lengua y técnicas estadísticas en el análisis lingüístico
International audienceSon comparados los resultados obtenidos sobre un mismo corpus en la tarea del POS tagging por dos sistemas orientados por enfoques diferentes en lingüística computacional, el uno orientado por el Conocimiento de la lengua (sistema CL) y el otro por Técnicas estadísticas (sistema EST). Se trata de no limitarse a consideraciones globales sobre el « costo » de obtención de los dos tipos de resultados, noción mal definida, ni a cotejar resultados globales, sino de poner en relación los resultados obtenidos con las características lingüísticas involucradas. La problemática de la comparación es clarificada, los sistemas CL y EST presentados, la metodología de la comparación definida y los resultados obtenidos presentados. En el caso comparado, el sistema CL ofrece mejores resultados, pero la conclusión más interesante es la posibilidad de establecer correlaciones entre aspectos de la estructura lingüística y resultados obtenidos por técnicas estadísticas
Desarrollo, implementación y utilización de modelos para el procesamiento automático de textos
El libro recoge ponencias y talleres seleccionados de JALIMI 2005 (Jornadas Argentinas de Lingüística Informática: Modelización e Ingeniería), y está organizado en nueve capítulos y un apéndice. Si bien hay sustantivas diferencias en los enfoques, las metodologías, las propiedades específicas estudiadas y las aplicaciones propuestas o proyectadas, todos los capítulos comunican resultados de investigaciones que pretenden contribuir a alcanzar el objetivo a largo plazo de la Lingüística Informática, a saber: emular en términos cibernéticos la extraordinaria capacidad humana de producir y comprender textos en lengua natural