Languages, machines, and classical computation
3rd ed., 2021. A circumscription of the classical theory of computation, building up from the Chomsky hierarchy. With the usual topics in formal language and automata theory
A Tutorial on the Expectation-Maximization Algorithm Including Maximum-Likelihood Estimation and EM Training of Probabilistic Context-Free Grammars
The paper gives a brief review of the expectation-maximization algorithm
(Dempster 1977) in the comprehensible framework of discrete mathematics. In
Section 2, two prominent estimation methods, relative-frequency estimation
and maximum-likelihood estimation, are presented. Section 3 is dedicated to
the expectation-maximization algorithm and a simpler variant, the generalized
expectation-maximization algorithm. In Section 4, two loaded dice are rolled. A
more interesting example is presented in Section 5: The estimation of
probabilistic context-free grammars.
Comment: Presented at the 15th European Summer School in Logic, Language and
Information (ESSLLI 2003). Example 5 extended (and partially corrected
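As a rough illustration of the relative-frequency estimation mentioned in the abstract above, the following minimal sketch estimates rule probabilities for a probabilistic context-free grammar from observed rule counts: each rule's count is divided by the total count of all rules sharing its left-hand side. The function name `relative_frequency` and the toy counts are hypothetical and are not taken from the tutorial; this covers only the fully observed case, not the EM training the paper goes on to describe.

```python
from collections import defaultdict


def relative_frequency(rule_counts):
    """Estimate PCFG rule probabilities from observed rule counts.

    rule_counts maps (lhs, rhs) pairs to counts, where rhs is a tuple
    of symbols. Each rule's probability is its count divided by the
    total count of rules with the same left-hand side.
    """
    totals = defaultdict(float)
    for (lhs, _rhs), count in rule_counts.items():
        totals[lhs] += count
    return {
        (lhs, rhs): count / totals[lhs]
        for (lhs, rhs), count in rule_counts.items()
    }


# Hypothetical counts, as if read off a tiny treebank.
toy_counts = {
    ("S", ("NP", "VP")): 4,
    ("NP", ("she",)): 3,
    ("NP", ("he",)): 1,
}
probs = relative_frequency(toy_counts)
```

With these toy counts, the two `NP` rules receive probabilities proportional to their counts, and the probabilities of all rules with the same left-hand side sum to one.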
A Context-theoretic Framework for Compositionality in Distributional Semantics
Techniques in which words are represented as vectors have proved useful in
many applications in computational linguistics; however, there is currently no
general semantic formalism for representing meaning in terms of vectors. We
present a framework for natural language semantics in which words, phrases and
sentences are all represented as vectors, based on a theoretical analysis which
assumes that meaning is determined by context.
In the theoretical analysis, we define a corpus model as a mathematical
abstraction of a text corpus. The meaning of a string of words is assumed to be
a vector representing the contexts in which it occurs in the corpus model.
Based on this assumption, we can show that the vector representations of words
can be considered as elements of an algebra over a field. We note that in
applications of vector spaces to representing meanings of words there is an
underlying lattice structure; we interpret the partial ordering of the lattice
as describing entailment between meanings. We also define the context-theoretic
probability of a string, and, based on this and the lattice structure, a degree
of entailment between strings.
We relate the framework to existing methods of composing vector-based
representations of meaning, and show that our approach generalises many of
these, including vector addition, component-wise multiplication, and the tensor
product.
Comment: Submitted to Computational Linguistics on 20th January 2010 for
revie
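The three composition operations named in the abstract above (vector addition, component-wise multiplication, and the tensor product) can be sketched on plain Python lists as follows. This shows only the standard operations themselves, not the context-theoretic framework; the function names and the toy word vectors are assumptions made for illustration.

```python
def compose_add(u, v):
    """Vector addition: requires vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return [a + b for a, b in zip(u, v)]


def compose_multiply(u, v):
    """Component-wise (Hadamard) multiplication of equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return [a * b for a, b in zip(u, v)]


def compose_tensor(u, v):
    """Tensor product, flattened row by row into len(u) * len(v) entries."""
    return [a * b for a in u for b in v]


# Hypothetical two-dimensional context vectors for two words.
red = [1, 2]
car = [3, 4]

added = compose_add(red, car)            # [4, 6]
multiplied = compose_multiply(red, car)  # [3, 8]
tensored = compose_tensor(red, car)      # [3, 4, 6, 8]
```

Addition and component-wise multiplication keep the result in the same space as the inputs and ignore word order, whereas the tensor product is order-sensitive and its dimension grows multiplicatively with each composed word.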
Weakly-Unambiguous Parikh Automata and Their Link to Holonomic Series
We investigate the connection between properties of formal languages and properties of their generating series, with a focus on the class of holonomic power series. We first prove a strong version of a conjecture by Castiglione and Massazza: weakly-unambiguous Parikh automata are equivalent to unambiguous two-way reversal bounded counter machines, and their multivariate generating series are holonomic. We then show that the converse is not true: we construct a language whose generating series is algebraic (thus holonomic), but which is inherently weakly-ambiguous as a Parikh automata language. Finally, we prove an effective decidability result for the inclusion problem for weakly-unambiguous Parikh automata, and provide an upper bound on its complexity
On Measuring Non-Recursive Trade-Offs
We investigate the phenomenon of non-recursive trade-offs between
descriptional systems in an abstract fashion. We aim at categorizing
non-recursive trade-offs by bounds on their growth rate, and show how to deduce
such bounds in general. We also identify criteria which, in the spirit of
abstract language theory, allow us to deduce non-recursive trade-offs from
effective closure properties of language families on the one hand, and
differences in the decidability status of basic decision problems on the other.
We develop a qualitative classification of non-recursive trade-offs in order to
obtain a better understanding of this very fundamental behaviour of
descriptional systems
Message-Passing Protocols for Real-World Parsing -- An Object-Oriented Model and its Preliminary Evaluation
We argue for a performance-based design of natural language grammars and
their associated parsers in order to meet the constraints imposed by real-world
NLP. Our approach incorporates declarative and procedural knowledge about
language and language use within an object-oriented specification framework. We
discuss several message-passing protocols for parsing and provide reasons for
sacrificing completeness of the parse in favor of efficiency based on a
preliminary empirical evaluation.
Comment: 12 pages, uses epsfig.st