6,538 research outputs found
Diffusion of Context and Credit Information in Markovian Models
This paper studies the problem of ergodicity of transition probability
matrices in Markovian models, such as hidden Markov models (HMMs), and how it
makes very difficult the task of learning to represent long-term context for
sequential data. This phenomenon hurts the forward propagation of long-term
context information, as well as learning a hidden state representation to
represent long-term context, which depends on propagating credit information
backwards in time. Using results from Markov chain theory, we show that this
problem of diffusion of context and credit is reduced when the transition
probabilities approach 0 or 1, i.e., the transition probability matrices are
sparse and the model essentially deterministic. The results found in this paper
apply to learning approaches based on continuous optimization, such as gradient
descent and the Baum-Welch algorithm.Comment: See http://www.jair.org/ for any accompanying file
Optimal Kullback-Leibler Aggregation via Information Bottleneck
In this paper, we present a method for reducing a regular, discrete-time
Markov chain (DTMC) to another DTMC with a given, typically much smaller number
of states. The cost of reduction is defined as the Kullback-Leibler divergence
rate between a projection of the original process through a partition function
and a DTMC on the correspondingly partitioned state space. Finding the reduced
model with minimal cost is computationally expensive, as it requires an
exhaustive search among all state space partitions, and an exact evaluation of
the reduction cost for each candidate partition. Our approach deals with the
latter problem by minimizing an upper bound on the reduction cost instead of
minimizing the exact cost; The proposed upper bound is easy to compute and it
is tight if the original chain is lumpable with respect to the partition. Then,
we express the problem in the form of information bottleneck optimization, and
propose using the agglomerative information bottleneck algorithm for searching
a sub-optimal partition greedily, rather than exhaustively. The theory is
illustrated with examples and one application scenario in the context of
modeling bio-molecular interactions.Comment: 13 pages, 4 figure
Telescoping Recursive Representations and Estimation of Gauss-Markov Random Fields
We present \emph{telescoping} recursive representations for both continuous
and discrete indexed noncausal Gauss-Markov random fields. Our recursions start
at the boundary (a hypersurface in , ) and telescope inwards.
For example, for images, the telescoping representation reduce recursions from
to , i.e., to recursions on a single dimension. Under
appropriate conditions, the recursions for the random field are linear
stochastic differential/difference equations driven by white noise, for which
we derive recursive estimation algorithms, that extend standard algorithms,
like the Kalman-Bucy filter and the Rauch-Tung-Striebel smoother, to noncausal
Markov random fields.Comment: To appear in the Transactions on Information Theor
kLog: A Language for Logical and Relational Learning with Kernels
We introduce kLog, a novel approach to statistical relational learning.
Unlike standard approaches, kLog does not represent a probability distribution
directly. It is rather a language to perform kernel-based learning on
expressive logical and relational representations. kLog allows users to specify
learning problems declaratively. It builds on simple but powerful concepts:
learning from interpretations, entity/relationship data modeling, logic
programming, and deductive databases. Access by the kernel to the rich
representation is mediated by a technique we call graphicalization: the
relational representation is first transformed into a graph --- in particular,
a grounded entity/relationship diagram. Subsequently, a choice of graph kernel
defines the feature space. kLog supports mixed numerical and symbolic data, as
well as background knowledge in the form of Prolog or Datalog programs as in
inductive logic programming systems. The kLog framework can be applied to
tackle the same range of tasks that has made statistical relational learning so
popular, including classification, regression, multitask learning, and
collective classification. We also report about empirical comparisons, showing
that kLog can be either more accurate, or much faster at the same level of
accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at
http://klog.dinfo.unifi.it along with tutorials
- …