Label Pre-annotation for Building Non-projective Dependency Treebanks for French
The current interest in accurate dependency parsing makes it necessary to build dependency treebanks for French that contain both projective and non-projective dependencies. To reduce the annotator's workload, we propose to automatically pre-annotate the sentences with the labels of the dependencies ending on the words. Selecting the dependency labels in advance reduces the ambiguity of the parsing. We show that a maximum entropy Markov model method reaches the label accuracy score of a standard dependency parser (MaltParser). Moreover, this method makes it possible to find more than one label per word, namely the most probable ones, in order to improve recall. It improves the quality of the parsing step of the annotation process. Therefore, including the method in the annotation process makes the work quicker and more natural for annotators.
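The abstract does not give the model's features or its decoding procedure, so the following is only a minimal sketch under stated assumptions: a conditional log-linear (maximum entropy) distribution over labels that conditions on the previous word's label, greedy left-to-right decoding in place of a full Viterbi search, and hand-set weights. The label set, feature templates and weights (`LABELS`, `features`, `toy_weights`) are all hypothetical. The sketch shows how keeping the k most probable labels per word, rather than only the best one, trades some precision for recall.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical label inventory; a real treebank defines its own dependency labels.
LABELS = ["suj", "obj", "det", "mod", "root"]

Weights = Dict[Tuple[str, str], float]  # (contextual predicate, label) -> weight


def features(words: Sequence[str], tags: Sequence[str], i: int, prev_label: str) -> List[str]:
    """Hypothetical knowledge-poor contextual predicates for word i."""
    return [
        f"word={words[i].lower()}",
        f"tag={tags[i]}",
        f"prev_tag={tags[i - 1] if i > 0 else '<s>'}",
        f"prev_label={prev_label}",  # the Markov dependency on the previous decision
    ]


def label_distribution(weights: Weights, feats: List[str]) -> Dict[str, float]:
    """Log-linear p(label | context): softmax of the summed (predicate, label) weights."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in LABELS}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}  # shift for numerical stability
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


def pre_annotate(words: Sequence[str], tags: Sequence[str], weights: Weights, k: int = 2) -> List[List[str]]:
    """Greedy left-to-right decoding that keeps the k most probable labels per word.

    Only the single best label is fed forward as history; ties keep LABELS order
    because Python's sort is stable.
    """
    prev = "<s>"
    candidates = []
    for i in range(len(words)):
        dist = label_distribution(weights, features(words, tags, i, prev))
        ranked = sorted(dist, key=dist.get, reverse=True)
        candidates.append(ranked[:k])
        prev = ranked[0]
    return candidates


# Toy, hand-set weights (hypothetical, not learned from any treebank).
toy_weights: Weights = {
    ("tag=DET", "det"): 3.0,
    ("tag=V", "root"): 3.0,
    ("tag=NC", "suj"): 1.5,
    ("tag=NC", "obj"): 1.0,
    ("prev_label=det", "suj"): 0.5,
    ("prev_label=root", "obj"): 2.0,
}
print(pre_annotate(["Le", "chat", "mange", "la", "souris"], ["DET", "NC", "V", "DET", "NC"], toy_weights))
# [['det', 'suj'], ['suj', 'obj'], ['root', 'suj'], ['det', 'obj'], ['suj', 'obj']]
```

With these toy weights the single best label for "souris" is wrong (`suj`), but the correct `obj` survives as the second candidate, which is the kind of recall gain that motivates proposing more than one label per word.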
Maximum Entropy Models For Natural Language Ambiguity Resolution
This thesis demonstrates that several important kinds of natural language ambiguities can be resolved to state-of-the-art accuracies using a single statistical modeling technique based on the principle of maximum entropy.
We discuss the problems of sentence boundary detection, part-of-speech tagging, prepositional phrase attachment, natural language parsing, and text categorization under the maximum entropy framework. In practice, we have found that maximum entropy models offer the following advantages:
State-of-the-art Accuracy: The probability models for all of the tasks discussed perform at or near state-of-the-art accuracies, or outperform competing learning algorithms when trained and tested under similar conditions. Methods which outperform those presented here require much more supervision in the form of additional human involvement or additional supporting resources.
Knowledge-Poor Features: The facts used to model the data, or features, are linguistically very simple, or knowledge-poor, yet they succeed in approximating complex linguistic relationships.
Reusable Software Technology: The mathematics of the maximum entropy framework are essentially independent of any particular task, and a single software implementation can be used for all of the probability models in this thesis.
The experiments in this thesis suggest that experimenters can obtain state-of-the-art accuracies on a wide range of natural language tasks, with little task-specific effort, by using maximum entropy probability models.
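The "reusable software" and "knowledge-poor features" points can be illustrated with a single conditional maximum entropy class applied unchanged to two toy tasks. This is a minimal sketch, not the thesis's implementation: features are assumed to be binary (contextual predicate, outcome) pairs, training uses plain batch gradient ascent on the log-likelihood as a stand-in for the iterative scaling trainers usually paired with such models, and the class name, predicates and data are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[List[str], str]  # (active contextual predicates, observed outcome)


class MaxEnt:
    """Conditional maximum entropy model p(y | x) proportional to exp(sum_i w_i f_i(x, y)).

    Each binary feature f_i pairs one contextual predicate with one outcome, so the
    same class serves any task whose inputs can be reduced to predicate strings.
    """

    def __init__(self, outcomes: Sequence[str]):
        self.outcomes = list(outcomes)
        self.w: Dict[Tuple[str, str], float] = {}

    def prob(self, preds: Sequence[str]) -> Dict[str, float]:
        scores = {y: sum(self.w.get((p, y), 0.0) for p in preds) for y in self.outcomes}
        top = max(scores.values())
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    def train(self, data: Sequence[Example], epochs: int = 100, lr: float = 0.5) -> "MaxEnt":
        """Gradient ascent: observed feature counts minus model-expected counts."""
        for _ in range(epochs):
            grad: Dict[Tuple[str, str], float] = defaultdict(float)
            for preds, gold in data:
                p = self.prob(preds)
                for pred in preds:
                    grad[(pred, gold)] += 1.0      # observed count
                    for y in self.outcomes:
                        grad[(pred, y)] -= p[y]   # expected count under the model
            for key, g in grad.items():
                self.w[key] = self.w.get(key, 0.0) + lr * g / len(data)
        return self

    def classify(self, preds: Sequence[str]) -> str:
        p = self.prob(preds)
        return max(p, key=p.get)


# The same code for two different tasks (toy data, hypothetical predicates).
boundary = MaxEnt(["boundary", "no_boundary"]).train([
    (["prefix=Mr", "next_is_capitalized"], "no_boundary"),
    (["prefix=Dr", "next_is_capitalized"], "no_boundary"),
    (["prefix=end", "next_is_capitalized"], "boundary"),
    (["prefix=ago", "next_is_capitalized"], "boundary"),
])
attachment = MaxEnt(["verb", "noun"]).train([
    (["v=eat", "p=with", "n2=fork"], "verb"),
    (["v=see", "p=with", "n2=telescope"], "verb"),
    (["v=buy", "p=of", "n2=poems"], "noun"),
    (["v=read", "p=of", "n2=novels"], "noun"),
])
print(boundary.classify(["prefix=Mr", "next_is_capitalized"]))  # no_boundary
print(attachment.classify(["v=cut", "p=with", "n2=knife"]))     # verb
```

Only the predicate strings differ between the sentence-boundary and attachment tasks; the estimation code is shared, and unseen predicates such as `v=cut` simply contribute zero weight.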
A Maximum-Entropy Partial Parser for Unrestricted Text
This paper describes a partial parser that assigns syntactic structures to sequences of part-of-speech tags. The program uses the maximum entropy parameter estimation method, which allows a flexible combination of different knowledge sources: the hierarchical structure, parts of speech and phrasal categories. In effect, the parser goes beyond simple bracketing and recognises even fairly complex structures. We give accuracy figures for different applications of the parser.
Comment: 9 pages, LaTeX
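The abstract credits maximum entropy estimation with letting the parser combine several knowledge sources, but it does not list the feature templates. The following is therefore a hypothetical sketch of how hierarchical structure, parts of speech and phrasal categories could be turned into overlapping contextual predicates for a maximum entropy model. The function name, its parameters and the tag set are assumptions, not the paper's.

```python
from typing import List, Optional, Sequence


def contextual_predicates(tags: Sequence[str], i: int,
                          parent_category: Optional[str], depth: int) -> List[str]:
    """Hypothetical predicates describing position i of a part-of-speech sequence.

    Maximum entropy estimation does not require features to be independent, so
    single-source predicates and their conjunctions can be listed side by side.
    """
    prev_tag = tags[i - 1] if i > 0 else "<s>"
    next_tag = tags[i + 1] if i + 1 < len(tags) else "</s>"
    category = parent_category or "NONE"
    return [
        f"tag={tags[i]}",                      # parts of speech
        f"prev_tag={prev_tag}",
        f"next_tag={next_tag}",
        f"parent={category}",                  # phrasal category
        f"depth={depth}",                      # hierarchical structure
        f"tag+parent={tags[i]}|{category}",    # overlapping conjunctions
        f"prev_tag+tag={prev_tag}|{tags[i]}",
    ]


print(contextual_predicates(["DT", "JJ", "NN"], 1, "NP", 2))
# ['tag=JJ', 'prev_tag=DT', 'next_tag=NN', 'parent=NP', 'depth=2', 'tag+parent=JJ|NP', 'prev_tag+tag=DT|JJ']
```

Paired with the parser's structural decisions as outcomes, such strings could be fed to any conditional maximum entropy trainer; adding a new knowledge source means adding predicates, not changing the estimator.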
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of dependency grammars and associated syntax-to-semantic-relationship mappings from large text corpora is described. The suggested approach builds on the authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well as on a number of prior papers and approaches from the statistical language learning literature. If successful, this approach would enable the mining of all the information needed to power a natural language comprehension and generation system, directly from a large, unannotated corpus.
Comment: 29 pages, 5 figures, research proposal
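The abstract describes the approach only at a high level. One building block commonly used for unsupervised dependency induction, assumed here rather than taken from the proposal, is to count word pairs in unannotated text, score each pair by pointwise mutual information (PMI), and link the words of a sentence with a maximum spanning tree over those scores. The function names, the window size and the use of Prim's algorithm are illustrative choices; the sketch enforces no projectivity constraint and produces unlabeled, undirected links.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Pair = Tuple[str, str]


def pair_pmi(corpus: Sequence[Sequence[str]], window: int = 3) -> Dict[Pair, float]:
    """PMI, in bits, of ordered (left, right) word pairs at most `window` tokens apart."""
    counts: Counter = Counter()
    for sent in corpus:
        for i in range(len(sent)):
            for j in range(i + 1, min(i + window + 1, len(sent))):
                counts[(sent[i], sent[j])] += 1
    total = sum(counts.values())
    left: Counter = Counter()
    right: Counter = Counter()
    for (a, b), c in counts.items():
        left[a] += c
        right[b] += c
    # PMI(a, b) = log2(p(a, b) / (p(a, *) p(*, b))), all probabilities estimated from pair counts
    return {(a, b): math.log2(c * total / (left[a] * right[b])) for (a, b), c in counts.items()}


def mst_link(sent: Sequence[str], pmi: Dict[Pair, float]) -> List[Tuple[int, int]]:
    """Undirected maximum spanning tree over word positions (Prim's algorithm)."""
    def score(i: int, j: int) -> float:
        a, b = min(i, j), max(i, j)
        return pmi.get((sent[a], sent[b]), -math.inf)  # unseen pairs are a last resort

    if not sent:
        return []
    in_tree = {0}
    links = []
    while len(in_tree) < len(sent):
        # Best-scoring edge from the current tree to any word not yet attached.
        i, j = max(((i, j) for i in in_tree for j in range(len(sent)) if j not in in_tree),
                   key=lambda e: score(*e))
        in_tree.add(j)
        links.append((min(i, j), max(i, j)))
    return sorted(links)


corpus = [["the", "dog", "barks"], ["the", "dog", "sleeps"], ["a", "cat", "sleeps"], ["the", "cat", "barks"]]
pmi = pair_pmi(corpus)
print(mst_link(["the", "dog", "sleeps"], pmi))
```

A real system would train on far more text and would typically go on to generalize the resulting links into grammar rules; this sketch stops at the per-sentence linkage.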
Probabilistic Constraint Logic Programming
This paper addresses two central problems for probabilistic processing models: parameter estimation from incomplete data and efficient retrieval of the most probable analyses. These questions have been answered satisfactorily only for probabilistic regular and context-free models. We address these problems for a more expressive probabilistic constraint logic programming model. We present a log-linear probability model for probabilistic constraint logic programming. On top of this model we define an algorithm to estimate the parameters and to select the properties of log-linear models from incomplete data. This algorithm is an extension of the improved iterative scaling algorithm of Della-Pietra, Della-Pietra, and Lafferty (1995). Our algorithm applies to log-linear models in general and is accompanied by suitable approximation methods when applied to large data spaces. Furthermore, we present an approach for searching for the most probable analyses of the probabilistic constraint logic programming model. This method can be applied to the ambiguity resolution problem in natural language processing applications.
Comment: 35 pages, uses sfbart.cl
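For reference, the improved iterative scaling (IIS) update that the paper extends can be sketched for the simpler complete-data, conditional case. This is a minimal sketch under stated assumptions: nonnegative features, one finite outcome set shared by all contexts, and each one-dimensional update equation solved with Newton's method. It does not implement the paper's extension to incomplete data, its property selection, or its approximations for large data spaces; `iis_train` and its arguments are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Hashable, Sequence, Tuple

FeatureFn = Callable[[Hashable, str], Dict[str, float]]


def iis_train(data: Sequence[Tuple[Hashable, str]], outcomes: Sequence[str],
              feat: FeatureFn, iterations: int = 50):
    """Improved iterative scaling for p(y|x) proportional to exp(sum_i lam_i f_i(x, y)).

    For every feature i with positive empirical expectation E~[f_i], each pass
    solves for delta_i in
        sum_x p~(x) sum_y p(y|x) f_i(x, y) exp(delta_i f#(x, y)) = E~[f_i],
    where f#(x, y) = sum_i f_i(x, y), then adds all deltas to the weights.
    """
    n = len(data)
    p_x: Dict[Hashable, float] = defaultdict(float)   # empirical p~(x)
    emp: Dict[str, float] = defaultdict(float)        # empirical E~[f_i]
    for x, y in data:
        p_x[x] += 1.0 / n
        for name, v in feat(x, y).items():
            emp[name] += v / n
    lam: Dict[str, float] = {}

    def cond(x: Hashable) -> Dict[str, float]:
        scores = {y: sum(lam.get(k, 0.0) * v for k, v in feat(x, y).items()) for y in outcomes}
        top = max(scores.values())
        exps = {y: math.exp(s - top) for y, s in scores.items()}
        z = sum(exps.values())
        return {y: e / z for y, e in exps.items()}

    for _ in range(iterations):
        model = {x: cond(x) for x in p_x}              # p(y|x) under the current weights
        deltas: Dict[str, float] = {}
        for name, target in emp.items():
            if target <= 0.0:
                continue
            delta = 0.0
            for _ in range(100):                       # Newton's method in one variable
                g, dg = -target, 0.0
                for x, px in p_x.items():
                    for y in outcomes:
                        f = feat(x, y)
                        fi = f.get(name, 0.0)
                        if fi == 0.0:
                            continue
                        f_sharp = sum(f.values())
                        term = px * model[x][y] * fi * math.exp(delta * f_sharp)
                        g += term
                        dg += term * f_sharp
                step = g / dg
                delta -= step
                if abs(step) < 1e-12:
                    break
            deltas[name] = delta
        for name, d in deltas.items():                 # simultaneous update of all weights
            lam[name] = lam.get(name, 0.0) + d
    return lam, cond


data = [("x1", "a")] * 3 + [("x1", "b"), ("x2", "a"), ("x2", "b")]
weights, p = iis_train(data, ["a", "b"], lambda x, y: {f"{x}&{y}": 1.0})
print(round(p("x1")["a"], 6), round(p("x2")["a"], 6))  # 0.75 0.5
```

With one indicator feature per (context, outcome) pair, f# is constant and a single pass already reproduces the empirical conditional frequencies 3/4 and 1/2. When features overlap, f# varies from event to event, which is the case IIS handles without the constant-sum correction feature required by generalized iterative scaling.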