Library of Practical Abstractions, Release 1.2
The library of practical abstractions (LIBPA) provides efficient
implementations of conceptually simple abstractions, in the C programming
language. We believe that the best library code is conceptually simple so that
it will be easily understood by the application programmer; parameterized by
type so that it enjoys wide applicability; and at least as efficient as a
straightforward special-purpose implementation. You will find that our software
satisfies the highest standards of software design, implementation, testing,
and benchmarking.
The current LIBPA release is a source code distribution only. It consists of
modules for portable memory management, one-dimensional arrays of arbitrary
types, compact symbol tables, hash tables for arbitrary types, a trie module
for length-delimited strings over arbitrary alphabets, single-precision
floating-point numbers with extended exponents, and logarithmic representations
of probability values using either fixed-point or floating-point numbers.
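To illustrate why a logarithmic representation of probability values is useful, here is a minimal sketch in Python. It is not LIBPA's C interface: the names `log_mul`, `log_add`, and `NEG_INF` are hypothetical and chosen only for this example. Multiplication of probabilities becomes addition of logs, and addition of probabilities uses the standard log-sum-exp identity.

```python
import math

NEG_INF = float("-inf")  # log of probability zero (hypothetical constant)


def log_mul(a, b):
    """Multiply two probabilities given as logs: log(p*q) = log p + log q."""
    return a + b


def log_add(a, b):
    """Add two probabilities given as logs, without leaving log space."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = (a, b) if a >= b else (b, a)
    # log(p + q) = hi + log(1 + exp(lo - hi)); lo - hi <= 0, so exp cannot overflow
    return hi + math.log1p(math.exp(lo - hi))


# Multiplying 1000 probabilities of 1e-5 directly underflows an ordinary float ...
direct = 1.0
for _ in range(1000):
    direct *= 1e-5

# ... while the same product kept in log space stays an ordinary finite number.
log_product = 0.0  # log(1)
for _ in range(1000):
    log_product = log_mul(log_product, math.log(1e-5))
```

The sketch uses double-precision floating-point logs only; a fixed-point variant, which the release also mentions, would store scaled integers instead.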
We have used LIBPA to implement a wide range of statistical models for both
continuous and discrete domains. The time and space efficiency of LIBPA has
allowed us to build larger statistical models than previously reported, and to
investigate more computationally intensive techniques than previously possible.
We have found LIBPA to be indispensable in our own research, and hope that you
will find it useful in yours.
Comment: 19 pages, texinfo format
Maximum Entropy Modeling Toolkit
The Maximum Entropy Modeling Toolkit supports parameter estimation and
prediction for statistical language models in the maximum entropy framework.
The maximum entropy framework provides a constructive method for obtaining the
unique conditional distribution p*(y|x) that satisfies a set of linear
constraints and maximizes the conditional entropy H(p|f) with respect to the
empirical distribution f(x). The maximum entropy distribution p*(y|x) also has
a unique parametric representation in the class of exponential models, as
m(y|x) = r(y|x)/Z(x), where the numerator r(y|x) = prod_i alpha_i^{g_i(x,y)} is a
product of exponential weights, with alpha_i = exp(lambda_i), and the
denominator Z(x) = sum_y r(y|x) normalizes m(y|x) so that it sums to one over y,
as the axioms of probability require.
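The exponential form m(y|x) = r(y|x)/Z(x) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the toolkit's interface: the function `conditional_model`, the toy vocabulary, the two feature functions g_i, and the weights lambda_i are all hypothetical.

```python
import math


def conditional_model(x, labels, features, lambdas):
    """Return {y: m(y|x)} where m(y|x) = r(y|x) / Z(x)."""
    r = {}
    for y in labels:
        # r(y|x) = prod_i alpha_i^{g_i(x,y)} = exp(sum_i lambda_i * g_i(x,y))
        score = sum(lam * g(x, y) for lam, g in zip(lambdas, features))
        r[y] = math.exp(score)
    z = sum(r.values())  # Z(x) = sum_y r(y|x)
    return {y: r_y / z for y, r_y in r.items()}


# Hypothetical toy task: predict the next word y from the previous word x.
labels = ["york", "jersey", "the"]
features = [
    lambda x, y: 1.0 if (x == "new" and y == "york") else 0.0,  # g_1
    lambda x, y: 1.0 if y == "the" else 0.0,                    # g_2
]
lambdas = [math.log(4.0), math.log(2.0)]  # equivalently alpha = [4, 2]

m = conditional_model("new", labels, features, lambdas)
```

With these assumed weights, r("york"|"new") = 4, r("jersey"|"new") = 1, and r("the"|"new") = 2, so Z("new") = 7 and the three probabilities are 4/7, 1/7, and 2/7. Estimating the lambda_i from data is a separate step that this sketch does not cover.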
This manual explains how to build maximum entropy models for discrete domains
with the Maximum Entropy Modeling Toolkit (MEMT). First we summarize the steps
necessary to implement a language model using the toolkit. Next we discuss the
executables provided by the toolkit and explain the file formats required by
the toolkit. Finally, we review the maximum entropy framework and apply it to
the problem of statistical language modeling.
Keywords: statistical language models, maximum entropy, exponential models,
improved iterative scaling, Markov models, triggers.
Comment: 32 pages, texinfo format