Introduction to the CoNLL-2000 Shared Task: Chunking
We describe the CoNLL-2000 shared task: dividing text into syntactically
related non-overlapping groups of words, so-called text chunking. We give
background information on the data sets, present a general overview of the
systems that have taken part in the shared task and briefly discuss their
performance.
Comment: 6 page
A Maximum-Entropy Partial Parser for Unrestricted Text
This paper describes a partial parser that assigns syntactic structures to
sequences of part-of-speech tags. The program uses the maximum entropy
parameter estimation method, which allows a flexible combination of different
knowledge sources: the hierarchical structure, parts of speech and phrasal
categories. In effect, the parser goes beyond simple bracketing and recognises
even fairly complex structures. We give accuracy figures for different
applications of the parser.
Comment: 9 pages, LaTe
Chunk Tagger - Statistical Recognition of Noun Phrases
We describe a stochastic approach to partial parsing, i.e., the recognition
of syntactic structures of limited depth. The technique utilises Markov Models,
but goes beyond usual bracketing approaches, since it is capable of recognising
not only the boundaries, but also the internal structure and syntactic category
of simple as well as complex NPs, PPs, APs and adverbials. We compare
tagging accuracy for different applications and encoding schemes.
Comment: 7 pages, LaTe
A MT System from Turkmen to Turkish employing finite state and statistical methods
In this work, we present an MT system from Turkmen to Turkish. Our system exploits the similarity of the languages by using a modified version of the direct translation method. However, the complex inflectional and derivational morphology of the Turkic languages necessitates special treatment in a word-by-word translation model. We also employ morphology-aware multi-word processing and statistical disambiguation processes in our system. We believe that this approach is valid for most of the Turkic languages and that the architecture, implemented using FSTs, can be easily extended to those languages.
Use of Weighted Finite State Transducers in Part of Speech Tagging
This paper addresses issues in part of speech disambiguation using
finite-state transducers and presents two main contributions to the field. One
of them is the use of finite-state machines for part of speech tagging.
Linguistic and statistical information is represented in terms of weights on
transitions in weighted finite-state transducers. Another contribution is the
successful combination of techniques -- linguistic and statistical -- for word
disambiguation, compounded with the notion of word classes.
Comment: uses psfig, ipamac