902 research outputs found
Chunk Tagger - Statistical Recognition of Noun Phrases
We describe a stochastic approach to partial parsing, i.e., the recognition
of syntactic structures of limited depth. The technique utilises Markov Models,
but goes beyond usual bracketing approaches, since it is capable of recognising
not only the boundaries, but also the internal structure and syntactic category
of simple as well as complex NP's, PP's, AP's and adverbials. We compare
tagging accuracy for different applications and encoding schemes.Comment: 7 pages, LaTe
Introduction to the CoNLL-2000 Shared Task: Chunking
We describe the CoNLL-2000 shared task: dividing text into syntactically
related non-overlapping groups of words, so-called text chunking. We give
background information on the data sets, present a general overview of the
systems that have taken part in the shared task and briefly discuss their
performance.Comment: 6 page
Memory-Based Shallow Parsing
We present memory-based learning approaches to shallow parsing and apply
these to five tasks: base noun phrase identification, arbitrary base phrase
recognition, clause detection, noun phrase parsing and full parsing. We use
feature selection techniques and system combination methods for improving the
performance of the memory-based learner. Our approach is evaluated on standard
data sets and the results are compared with that of other systems. This reveals
that our approach works well for base phrase identification while its
application towards recognizing embedded structures leaves some room for
improvement
Tagging Complex Non-Verbal German Chunks with Conditional Random Fields
We report on chunk tagging methods for German that recognize complex non-verbal phrases using structural chunk tags with Conditional Random Fields (CRFs). This state-of-the-art method for sequence classification achieves 93.5% accuracy on newspaper text. For the same task, a classical trigram tagger approach based on Hidden Markov Models reaches a baseline of 88.1%. CRFs allow for a clean and principled integration of linguistic knowledge such as part-of-speech tags, morphological constraints and lemmas. The structural chunk tags encode phrase structures up to a depth of 3 syntactic nodes. They include complex prenominal and postnominal modifiers that occur frequently in German noun phrases
Chunking clinical text containing non-canonical language
Free text notes typed by primary care physicians during patient consultations typically contain highly non-canonical language. Shallow syntactic analysis of free text notes can help to reveal valuable information for the study of disease and treatment. We present an exploratory study into chunking such text using off-the-shelf language processing tools and pre-trained statistical models. We evaluate chunking accuracy with respect to part-of-speech tagging quality, choice of chunk representation, and breadth of context features. Our results indicate that narrow context feature windows give the best results, but that chunk representation and minor differences in tagging quality do not have a significant impact on chunking accuracy
A hybrid architecture for robust parsing of german
This paper provides an overview of current research on a hybrid and robust parsing architecture for the morphological, syntactic and semantic annotation of German text corpora. The novel contribution of this research lies not in the individual parsing modules, each of which relies on state-of-the-art algorithms and techniques. Rather what is new about the present approach is the combination of these modules into a single architecture. This combination provides a means to significantly optimize the performance of each component, resulting in an increased accuracy of annotation
A Maximum-Entropy Partial Parser for Unrestricted Text
This paper describes a partial parser that assigns syntactic structures to
sequences of part-of-speech tags. The program uses the maximum entropy
parameter estimation method, which allows a flexible combination of different
knowledge sources: the hierarchical structure, parts of speech and phrasal
categories. In effect, the parser goes beyond simple bracketing and recognises
even fairly complex structures. We give accuracy figures for different
applications of the parser.Comment: 9 pages, LaTe
From chunks to function-argument structure : a similarity-based approach
Chunk parsing has focused on the recognition of partial constituent structures at the level of individual chunks. Little attention has been paid to the question of how such partial analyses can be combined into larger structures for complete utterances. Such larger structures are not only desirable for a deeper syntactic analysis. They also constitute a necessary prerequisite for assigning function-argument structure. The present paper offers a similaritybased algorithm for assigning functional labels such as subject, object, head, complement, etc. to complete syntactic structures on the basis of prechunked input. The evaluation of the algorithm has concentrated on measuring the quality of functional labels. It was performed on a German and an English treebank using two different annotation schemes at the level of function argument structure. The results of 89.73% correct functional labels for German and 90.40%for English validate the general approach
- …