6,189 research outputs found
Grammar induction for mildly context sensitive languages using variational Bayesian inference
The following technical report presents a formal approach to probabilistic
minimalist grammar induction. We describe a formalization of a minimalist
grammar. Based on this grammar, we define a generative model for minimalist
derivations. We then present a generalized algorithm for the application of
variational Bayesian inference to lexicalized mildly context sensitive language
grammars which in this paper is applied to the previously defined minimalist
grammar
Probabilistic Constraint Logic Programming
This paper addresses two central problems for probabilistic processing
models: parameter estimation from incomplete data and efficient retrieval of
most probable analyses. These questions have been answered satisfactorily only
for probabilistic regular and context-free models. We address these problems
for a more expressive probabilistic constraint logic programming model. We
present a log-linear probability model for probabilistic constraint logic
programming. On top of this model we define an algorithm to estimate the
parameters and to select the properties of log-linear models from incomplete
data. This algorithm is an extension of the improved iterative scaling
algorithm of Della-Pietra, Della-Pietra, and Lafferty (1995). Our algorithm
applies to log-linear models in general and is accompanied with suitable
approximation methods when applied to large data spaces. Furthermore, we
present an approach for searching for most probable analyses of the
probabilistic constraint logic programming model. This method can be applied to
the ambiguity resolution problem in natural language processing applications.Comment: 35 pages, uses sfbart.cl
Unsupervised Extraction of Representative Concepts from Scientific Literature
This paper studies the automated categorization and extraction of scientific
concepts from titles of scientific articles, in order to gain a deeper
understanding of their key contributions and facilitate the construction of a
generic academic knowledgebase. Towards this goal, we propose an unsupervised,
domain-independent, and scalable two-phase algorithm to type and extract key
concept mentions into aspects of interest (e.g., Techniques, Applications,
etc.). In the first phase of our algorithm we propose PhraseType, a
probabilistic generative model which exploits textual features and limited POS
tags to broadly segment text snippets into aspect-typed phrases. We extend this
model to simultaneously learn aspect-specific features and identify academic
domains in multi-domain corpora, since the two tasks mutually enhance each
other. In the second phase, we propose an approach based on adaptor grammars to
extract fine grained concept mentions from the aspect-typed phrases without the
need for any external resources or human effort, in a purely data-driven
manner. We apply our technique to study literature from diverse scientific
domains and show significant gains over state-of-the-art concept extraction
techniques. We also present a qualitative analysis of the results obtained.Comment: Published as a conference paper at CIKM 201
Probabilistic Parsing Strategies
We present new results on the relation between purely symbolic context-free
parsing strategies and their probabilistic counter-parts. Such parsing
strategies are seen as constructions of push-down devices from grammars. We
show that preservation of probability distribution is possible under two
conditions, viz. the correct-prefix property and the property of strong
predictiveness. These results generalize existing results in the literature
that were obtained by considering parsing strategies in isolation. From our
general results we also derive negative results on so-called generalized LR
parsing.Comment: 36 pages, 1 figur
A Bibliography on Fuzzy Automata, Grammars and Lanuages
This bibliography contains references to papers on fuzzy formal languages, the generation of fuzzy languages by means of fuzzy grammars, the recognition of fuzzy languages by fuzzy automata and machines, as well as some applications of fuzzy set theory to syntactic pattern recognition, linguistics and natural language processing
Synthesizing Program Input Grammars
We present an algorithm for synthesizing a context-free grammar encoding the
language of valid program inputs from a set of input examples and blackbox
access to the program. Our algorithm addresses shortcomings of existing grammar
inference algorithms, which both severely overgeneralize and are prohibitively
slow. Our implementation, GLADE, leverages the grammar synthesized by our
algorithm to fuzz test programs with structured inputs. We show that GLADE
substantially increases the incremental coverage on valid inputs compared to
two baseline fuzzers
- …