On the Prior and Posterior Distributions Used in Graphical Modelling
Graphical model learning and inference are often performed using Bayesian
techniques. In particular, learning is usually performed in two separate steps.
First, the graph structure is learned from the data; then the parameters of the
model are estimated conditional on that graph structure. While the probability
distributions involved in this second step have been studied in depth, the ones
used in the first step have not been explored in as much detail.
In this paper, we will study the prior and posterior distributions defined
over the space of the graph structures for the purpose of learning the
structure of a graphical model. In particular, we will provide a
characterisation of the behaviour of those distributions as a function of the
possible edges of the graph. We will then use the properties resulting from
this characterisation to define measures of structural variability for both
Bayesian and Markov networks, and we will point out some of their possible
applications.
Comment: 28 pages, 6 figures
Generalized Measure of Entropy, Mathai's Distributional Pathway Model, and Tsallis Statistics
The pathway model of Mathai (2005) mainly deals with the rectangular
matrix-variate case. In this paper the scalar version is shown to be associated
with a large number of probability models used in physics. Different families
of densities are listed here, which are all connected through the pathway
parameter 'alpha', generating a distributional pathway. The idea is to switch
from one functional form to another through this parameter and it is shown that
basically one can proceed from the generalized type-1 beta family to
generalized type-2 beta family to generalized gamma family when the real
variable is positive and a wider set of families when the variable can take
negative values also. For simplicity, only the real scalar case is discussed
here but corresponding families are available when the variable is in the
complex domain. A large number of densities used in physics are shown to be
special cases of or associated with the pathway model. It is also shown that
the pathway model is available by maximizing a generalized measure of entropy,
leading to an entropic pathway. Particular cases of the pathway model are shown
to cover Tsallis statistics (Tsallis, 1988) and the superstatistics introduced
by Beck and Cohen (2003).
Comment: LaTeX, 13 pages, title changed, introduction, conclusions, and
references updated
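The switching behaviour described in this abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes a simplified scalar kernel of the form [1 - a(1 - alpha) x^delta]^(1/(1 - alpha)) for x > 0, omits any power prefactor and the normalising constant, and uses the hypothetical names `pathway_kernel`, `a`, and `delta`. Under those assumptions, alpha < 1 gives a finite-support (type-1 beta style) form, alpha > 1 gives a heavy-tailed (type-2 beta style) form, and alpha near 1 approaches the exponential (gamma style) form.

```python
import math

def pathway_kernel(x, alpha, a=1.0, delta=1.0):
    """Unnormalised kernel [1 - a(1 - alpha) x^delta]^(1/(1 - alpha)), x > 0.

    Simplified, assumed form: prefactor and normalising constant omitted.
    """
    if abs(alpha - 1.0) < 1e-12:
        # Limiting case alpha -> 1: exponential kernel.
        return math.exp(-a * x ** delta)
    base = 1.0 - a * (1.0 - alpha) * x ** delta
    if base <= 0.0:
        # For alpha < 1 the kernel has finite support; it is zero outside it.
        return 0.0
    return base ** (1.0 / (1.0 - alpha))

x = 0.5
limit_value = pathway_kernel(x, 1.0)
below = pathway_kernel(x, 0.999)   # alpha slightly below 1
above = pathway_kernel(x, 1.001)   # alpha slightly above 1
outside_support = pathway_kernel(3.0, 0.5)  # 1 - 0.5 * 3 < 0, so zero
```

Evaluating the kernel at values of alpha on either side of 1 shows both branches approaching the same exponential value, which is the sense in which a single parameter moves between functional forms.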
Log-concavity, ultra-log-concavity, and a maximum entropy property of discrete compound Poisson measures
Sufficient conditions are developed, under which the compound Poisson
distribution has maximal entropy within a natural class of probability measures
on the nonnegative integers. Recently, one of the authors [O. Johnson,
Stoch. Proc. Appl., 2007] used a semigroup approach to show that the Poisson
has maximal entropy among all ultra-log-concave distributions with fixed mean.
We show via a non-trivial extension of this semigroup approach that the natural
analog of the Poisson maximum entropy property remains valid if the compound
Poisson distributions under consideration are log-concave, but that it fails in
general. A parallel maximum entropy result is established for the family of
compound binomial measures. Sufficient conditions for compound distributions to
be log-concave are discussed and applications to combinatorics are examined;
new bounds are derived on the entropy of the cardinality of a random
independent set in a claw-free graph, and a connection is drawn to Mason's
conjecture for matroids. The present results are primarily motivated by the
desire to provide an information-theoretic foundation for compound Poisson
approximation and associated limit theorems, analogous to the corresponding
developments for the central limit theorem and for Poisson approximation. Our
results also demonstrate new links between some probabilistic methods and the
combinatorial notions of log-concavity and ultra-log-concavity, and they add to
the growing body of work exploring the applications of maximum entropy
characterizations to problems in discrete mathematics.
Comment: 30 pages. This submission supersedes arXiv:0805.4112v1. Changes in
v2: Updated references, typos corrected
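The Poisson maximum entropy property cited in this abstract can be checked numerically on a small example. The sketch below is an illustration, not the paper's method: it compares the entropy of a Poisson distribution with that of binomial distributions of the same mean, relying on the standard fact that binomial distributions are ultra-log-concave. The mean `lam = 2.0`, the truncation point, and all function names are assumptions chosen for the example.

```python
import math

def entropy(pmf):
    """Shannon entropy in nats of a probability mass function given as a list."""
    return -sum(p * math.log(p) for p in pmf if p > 0.0)

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0..kmax, computed recursively."""
    pmf = [math.exp(-lam)]
    for k in range(1, kmax + 1):
        pmf.append(pmf[-1] * lam / k)
    return pmf

def binomial_pmf(n, p):
    """Binomial(n, p) probabilities for k = 0..n."""
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

lam = 2.0
# Truncating at k = 100 leaves a negligible tail for lam = 2.
h_poisson = entropy(poisson_pmf(lam, 100))
# Binomial(n, lam / n) has mean lam for every n.
h_binomial = {n: entropy(binomial_pmf(n, lam / n)) for n in (5, 10, 50)}

for n, h in h_binomial.items():
    print(f"n={n}: binomial entropy {h:.4f} vs Poisson entropy {h_poisson:.4f}")
```

Each binomial entropy should fall below the Poisson entropy, with the gap narrowing as n grows.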
Committee-Based Sample Selection for Probabilistic Classifiers
In many real-world learning tasks, it is expensive to acquire a sufficient
number of labeled examples for training. This paper investigates methods for
reducing annotation cost by 'sample selection'. In this approach, during
training the learning program examines many unlabeled examples and selects for
labeling only those that are most informative at each stage. This avoids
redundantly labeling examples that contribute little new information. Our work
follows on previous research on Query By Committee, extending the
committee-based paradigm to the context of probabilistic classification. We
describe a family of empirical methods for committee-based sample selection in
probabilistic classification models, which evaluate the informativeness of an
example by measuring the degree of disagreement between several model variants.
These variants (the committee) are drawn randomly from a probability
distribution conditioned by the training set labeled so far. The method was
applied to the real-world natural language processing task of stochastic
part-of-speech tagging. We find that all variants of the method achieve a
significant reduction in annotation cost, although their computational
efficiency differs. In particular, the simplest variant, a two member committee
with no parameters to tune, gives excellent results. We also show that sample
selection yields a significant reduction in the size of the model used by the
tagger.
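The selection rule described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the model is reduced to per-word tag counts, committee members are drawn from a Dirichlet posterior over those counts with an assumed uniform prior, and disagreement is measured by vote entropy, which is one possible disagreement measure. All names (`draw_member`, `select_for_labeling`, the toy `counts`) are hypothetical.

```python
import math
import random

def sample_dirichlet(rng, alphas):
    """Draw one sample from a Dirichlet distribution via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def draw_member(rng, counts, prior=1.0):
    """One committee member: per-word tag distributions sampled from the posterior."""
    return {word: sample_dirichlet(rng, [c + prior for c in tag_counts])
            for word, tag_counts in counts.items()}

def vote_entropy(votes, n_classes):
    """Entropy of the committee's vote distribution; zero when all members agree."""
    k = len(votes)
    h = 0.0
    for c in range(n_classes):
        v = votes.count(c)
        if v:
            h -= (v / k) * math.log(v / k)
    return h

def select_for_labeling(rng, counts, word, committee_size=2):
    """Select the example for labeling if the sampled committee members disagree."""
    votes = []
    for _ in range(committee_size):
        probs = draw_member(rng, counts)[word]
        votes.append(probs.index(max(probs)))
    return vote_entropy(votes, len(counts[word])) > 0.0

# Toy labeled data so far: "the" is almost always tag 0, "run" is ambiguous.
counts = {"the": [1000, 0], "run": [3, 3]}
rng = random.Random(0)
trials = 200
rate_the = sum(select_for_labeling(rng, counts, "the") for _ in range(trials)) / trials
rate_run = sum(select_for_labeling(rng, counts, "run") for _ in range(trials)) / trials
```

With a two-member committee, a well-attested word such as "the" is almost never selected, while an ambiguous word such as "run" is selected often, which is how annotation effort gets directed to the informative examples.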
On Similarities between Inference in Game Theory and Machine Learning
In this paper, we elucidate the equivalence between inference in game theory and machine learning. Our aim in so doing is to establish an equivalent vocabulary between the two domains so as to facilitate developments at the intersection of both fields, and as proof of the usefulness of this approach, we use recent developments in each field to make useful improvements to the other. More specifically, we consider the analogies between smooth best responses in fictitious play and Bayesian inference methods. Initially, we use these insights to develop and demonstrate an improved algorithm for learning in games based on probabilistic moderation. That is, by integrating over the distribution of opponent strategies (a Bayesian approach within machine learning) rather than taking a simple empirical average (the approach used in standard fictitious play), we derive a novel moderated fictitious play algorithm and show that it is more likely than standard fictitious play to converge to a payoff-dominant but risk-dominated Nash equilibrium in a simple coordination game. Furthermore, we consider the converse case, and show how insights from game theory can be used to derive two improved mean field variational learning algorithms. We first show that the standard update rule of mean field variational learning is analogous to a Cournot adjustment within game theory. By analogy with fictitious play, we then suggest an improved update rule, and show that this results in fictitious variational play, an improved mean field variational learning algorithm that exhibits better convergence in highly or strongly connected graphical models. Second, we use a recent advance in fictitious play, namely dynamic fictitious play, to derive a derivative action variational learning algorithm that exhibits superior convergence properties on a canonical machine learning problem (clustering a mixture distribution).
The Information Geometry of Sparse Goodness-of-Fit Testing
This paper takes an information-geometric approach to the challenging issue of goodness-of-fit testing in the high dimensional, low sample size context where, potentially, boundary effects dominate. The main contributions of this paper are threefold: first, we present and prove two new theorems on the behaviour of commonly used test statistics in this context; second, we investigate, in the novel environment of the extended multinomial model, the links between information geometry-based divergences and standard goodness-of-fit statistics, allowing us to formalise relationships which have been missing in the literature; finally, we use simulation studies to validate and illustrate our theoretical results and to explore currently open research questions about the way that discretisation effects can dominate sampling distributions near the boundary. Novelly accommodating these discretisation effects contrasts sharply with the essentially continuous approach of skewness and other corrections flowing from standard higher-order asymptotic analysis.