17 research outputs found

Use of Weighted Finite State Transducers in Part of Speech Tagging

Author: Radev Dragomir R.
Tzoukermann Evelyne
Publication date: 01/01/1997

This paper addresses issues in part-of-speech disambiguation using finite-state transducers and presents two main contributions to the field. The first is the use of finite-state machines for part-of-speech tagging: linguistic and statistical information is represented as weights on transitions in weighted finite-state transducers. The second is the successful combination of linguistic and statistical techniques for word disambiguation, together with the notion of word classes.

arXiv.org e-Print Archive

CiteSeerX

Verbal chunk extraction in French using limited resources

Author: Bes Gabriel G.
Lamadon Lionel
Trouilleux Francois
Publication date: 01/01/2004

A method for extracting French verbal chunks, both inflected and infinitive, is explored and tested on an actual corpus. Declarative morphological and local grammar rules specifying chunks and some simple contextual structures are used, relying on limited lexical information and some simple heuristic and statistical properties obtained from restricted corpora. The specific goals, the architecture and formalism of the system, the linguistic information on which it relies, and the results obtained on the actual corpus are presented.

arXiv.org e-Print Archive

CiteSeerX

HAL Clermont Université

A Machine learning approach to POS tagging

Author: Màrquez Villodre Lluís
Padró Lluís
Rodríguez Hontoria Horacio
Publication date: 01/01/1997

We have applied inductive learning of statistical decision trees and relaxation labelling to the Natural Language Processing (NLP) task of morphosyntactic disambiguation (part-of-speech tagging). The learning process is supervised and obtains a language model oriented to resolving POS ambiguities. This model consists of a set of statistical decision trees expressing the distribution of tags and words in some relevant contexts. The acquired language models are complete enough to be used directly as sets of POS disambiguation rules, and they include more complex contextual information than the simple collections of n-grams usually used in statistical taggers. We have implemented a simple, fast tagger that has been tested and evaluated on the Wall Street Journal (WSJ) corpus with remarkable accuracy. However, better results can be obtained by translating the trees into rules to feed a flexible relaxation-labelling-based tagger. In this direction we describe a tagger that can use information of any kind (n-grams, automatically acquired constraints, linguistically motivated manually written constraints, etc.), and in particular can incorporate the machine-learned decision trees. We also address the problem of tagging when only a small amount of training material is available, which is crucial in any process of constructing an annotated corpus from scratch. We show that quite high accuracy can be achieved with our system in this situation.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

A Bare-bones Constraint Grammar

Author: Bick Eckhard
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011

Waseda University Repository

University of Southern Denmark Research Output

Towards learning a constraint grammar from annotated corpora using decision trees

Author: Màrquez Villodre Lluís
Rodríguez Hontoria Horacio
Publication date: 01/01/1996

Within the framework of robust parsers for the syntactic analysis of unrestricted text, the aim of this work is to construct a system capable of automatically learning Constraint Grammar rules from a POS-annotated corpus. The system presented can currently acquire constraint rules for POS tagging, and we plan to extend it to cover syntactic rules. The learning process uses a supervised learning algorithm based on building a discrimination forest, with a decision tree attached to each case of POS ambiguity. The system has been applied to four representative cases of ambiguity in a Spanish corpus. The results obtained in these experiments and some discussion of the appropriateness of the proposed learning technique are presented in this paper.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Automatic disambiguation of morphosyntax in spoken language corpora

Author: A. Andreewsky
A. L. Theakston
B. MacWhinney
B. Merialdo
C. Parisse
C. S. A. M. Chevrie-Muller
D. Crystal
D. V. M. Bishop
E. Brill
E. Charniak
J. A. Rondal
J. F. Miller
L. Baker-Van den Goorbergh
M. Perkins
M. T. Normand Le
S. H. Long
Publication venue: Springer Science and Business Media LLC

Crossref

Natural-Language Interface of Intelligent Systems. Laboratory Practicum: A Study Guide [Естественно-языковой интерфейс интеллектуальных систем. Лабораторный практикум : пособие]

Author: Крапивин Ю. Б.
Publication venue: БГУИР
Publication date: 01/01/2023

Sets out the main points of the theory and laboratory practicum for the course "Natural-Language Interface of Intelligent Systems", provides detailed theoretical material for the course, and gives recommendations for carrying out the laboratory work.

Belarusian State University of Informatics and Radioelectronics Repository

Language knowledge and statistical techniques in linguistic analysis [Conocimiento de la lengua y técnicas estadísticas en el análisis lingüístico]

Author: Beltrán Celina
Bès Gabriel G.
Solana Zulema
Publication venue: HAL CCSD
Publication date: 01/01/2005

The results obtained on the same corpus in the POS tagging task by two systems guided by different approaches in computational linguistics are compared: one guided by knowledge of the language (the CL system) and the other by statistical techniques (the EST system). The aim is not to stop at global considerations about the "cost" of obtaining the two kinds of results, an ill-defined notion, nor to compare global results, but to relate the results obtained to the linguistic characteristics involved. The problem of comparison is clarified, the CL and EST systems are presented, the comparison methodology is defined, and the results obtained are presented. In the case compared, the CL system gives better results, but the most interesting conclusion is the possibility of establishing correlations between aspects of linguistic structure and the results obtained by statistical techniques.

HAL Clermont Université

Development, implementation and use of models for automatic text processing [Desarrollo, implementación y utilización de modelos para el procesamiento automático de textos]

Author: Beltrán Celina
Bender Cristina
Bonino Rodolfo
Bés Gabriel G.
Castel Víctor M.
Chiari Mario
Deco Claudia
González Capdevila Gustavo A.
Guillot Daniel E.
Infante-López Gabriel
Parodi S. Giovanni
Perló Liliana
Rodrigo Andrea
Saer Jorge
Solana Zulema
Valenti Viviana
Vilela Demetrio
Publication venue: Ediciones Biblioteca Digital UNCuyo
Publication date: 01/01/2005

The book collects selected papers and workshops from JALIMI 2005 (Jornadas Argentinas de Lingüística Informática: Modelización e Ingeniería) and is organized into nine chapters and an appendix. Although there are substantial differences in the approaches, the methodologies, the specific properties studied, and the applications proposed or planned, all the chapters report results of research that aims to contribute to the long-term goal of computational linguistics, namely to emulate in cybernetic terms the extraordinary human capacity to produce and understand natural-language texts.

Repositorio OAI Biblioteca Digital Universidad Nacional de Cuyo