Structured Prediction of Sequences and Trees using Infinite Contexts
Linguistic structures exhibit a rich array of global phenomena; however,
commonly used Markov models are unable to adequately describe these phenomena
due to their strong locality assumptions. We propose a novel hierarchical model
for structured prediction over sequences and trees which exploits global
context by conditioning each generation decision on an unbounded context of
prior decisions. This builds on the success of Markov models but without
imposing a fixed bound in order to better represent global phenomena. To
facilitate learning of this large and unbounded model, we use a hierarchical
Pitman-Yor process prior which provides a recursive form of smoothing. We
propose prediction algorithms based on A* and Markov Chain Monte Carlo
sampling. Empirical results demonstrate the potential of our model compared to
baseline finite-context Markov models on part-of-speech tagging and syntactic
parsing.
RG inspired Machine Learning for lattice field theory
Machine learning has been a fast growing field of research in several areas
dealing with large datasets. We report recent attempts to use Renormalization
Group (RG) ideas in the context of machine learning. We examine coarse graining
procedures for perceptron models designed to identify the digits of the MNIST
data. We discuss the correspondence between principal components analysis (PCA)
and RG flows across the transition for worm configurations of the 2D Ising
model. Preliminary results regarding the logarithmic divergence of the leading
PCA eigenvalue were presented at the conference and have since been improved.
More generally, we discuss the relationship between PCA and observables in
Monte Carlo simulations and the possibility of reduction of the number of
learning parameters in supervised learning based on RG-inspired hierarchical
ansatzes. Comment: Talk given by Yannick Meurice at the conference Lattice 2017,
Granada, Spai
Hierarchical Graphical Models for Multigroup Shape Analysis using Expectation Maximization with Sampling in Kendall's Shape Space
This paper proposes a novel framework for multi-group shape analysis relying
on a hierarchical graphical statistical model on shapes within a population. The
framework represents individual shapes as point sets modulo translation,
rotation, and scale, following the notion in Kendall shape space. While
individual shapes are derived from their group shape model, each group shape
model is derived from a single population shape model. The hierarchical model
follows the natural organization of population data and the top level in the
hierarchy provides a common frame of reference for multigroup shape analysis,
e.g. classification and hypothesis testing. Unlike typical shape-modeling
approaches, the proposed model is a generative model that defines a joint
distribution of object-boundary data and the shape-model variables.
Furthermore, it naturally enforces optimal correspondences during the process
of model fitting and thereby subsumes the so-called correspondence problem. The
proposed inference scheme employs an expectation maximization (EM) algorithm
that treats the individual and group shape variables as hidden random variables
and integrates them out before estimating the parameters (population mean and
variance and the group variances). The underpinning of the EM algorithm is the
sampling of pointsets, in Kendall shape space, from their posterior
distribution, for which we exploit a highly-efficient scheme based on
Hamiltonian Monte Carlo simulation. Experiments in this paper use the fitted
hierarchical model to perform (1) hypothesis testing for comparison between
pairs of groups using permutation testing and (2) classification for image
retrieval. The paper validates the proposed framework on simulated data and
demonstrates results on real data. Comment: 9 pages, 7 figures, International Conference on Machine Learning 201
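The permutation testing mentioned in the abstract above can be illustrated with a minimal sketch. It assumes scalar observations and a difference-of-means test statistic, which are simplifications chosen here for illustration; the paper works with shape variables and its actual statistic is not given in the abstract. The function name `permutation_test` and its parameters are hypothetical.

```python
import random


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of group means.

    Assumption: scalar data and a mean-difference statistic stand in for
    the shape-based statistic, which the abstract does not specify.
    """
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)

    # Count how often a random relabeling is at least as extreme as observed.
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed:
            extreme += 1

    # The +1 correction counts the observed labeling as one permutation,
    # so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)
```

Calling `permutation_test(a, b)` on two well-separated samples should return a small p-value, while two identical samples return 1.0, since every relabeling is then at least as extreme as the observed difference of zero.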
Multimodal Hierarchical Dirichlet Process-based Active Perception
In this paper, we propose an active perception method for recognizing object
categories based on the multimodal hierarchical Dirichlet process (MHDP). The
MHDP enables a robot to form object categories using multimodal information,
e.g., visual, auditory, and haptic information, which can be observed by
performing actions on an object. However, performing many actions on a target
object requires a long time. In a real-time scenario, i.e., when time is
limited, the robot has to determine the set of actions that is most effective
for recognizing a target object. We propose an MHDP-based active perception
method that uses the information gain (IG) maximization criterion and a lazy
greedy algorithm. We show that the IG maximization criterion is optimal in the
sense that the criterion is equivalent to a minimization of the expected
Kullback--Leibler divergence between a final recognition state and the
recognition state after the next set of actions. However, a straightforward
calculation of IG is practically impossible. Therefore, we derive an efficient
Monte Carlo approximation method for IG by making use of a property of the
MHDP. We also show that the IG has submodular and non-decreasing properties as
a set function because of the structure of the graphical model of the MHDP.
Therefore, the IG maximization problem is reduced to a submodular maximization
problem. This means that greedy and lazy greedy algorithms are effective and
have a theoretical justification for their performance. We conducted an
experiment using an upper-torso humanoid robot and a second one using synthetic
data. The experimental results show that the method enables the robot to select
a set of actions that allow it to recognize target objects quickly and
accurately. The results support our theoretical outcomes. Comment: submitte
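The lazy greedy strategy named in the abstract above can be sketched for any monotone submodular set function. In this illustration a simple set-coverage gain stands in for the information gain, which in the paper requires a Monte Carlo approximation under the MHDP; the function `lazy_greedy`, the action names, and the coverage sets are all hypothetical.

```python
import heapq


def lazy_greedy(candidates, gain, budget):
    """Pick up to `budget` items using lazy evaluation of marginal gains.

    `gain(item, selected)` must return the marginal gain of adding `item`
    to the list `selected`. Correctness relies on submodularity: a stale
    gain is an upper bound on the current gain.
    """
    selected = []
    # Max-heap via negated gains; the index breaks ties so items are never
    # compared directly. The last field records how many items were already
    # selected when the gain was computed.
    heap = [(-gain(c, selected), i, c, 0) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        neg_gain, i, item, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The gain is up to date and beats every stale upper bound.
            selected.append(item)
        else:
            # Stale entry: recompute against the current selection.
            heapq.heappush(heap, (-gain(item, selected), i, item, len(selected)))
    return selected


# Hypothetical stand-in for IG: each action "covers" a set of features.
COVERAGE = {
    "look": {1, 2, 3},
    "grasp": {3, 4},
    "shake": {4, 5, 6, 7},
    "tap": {1, 2},
}


def coverage_gain(item, selected):
    covered = set().union(*(COVERAGE[s] for s in selected))
    return len(COVERAGE[item] - covered)
```

With a budget of two, `lazy_greedy(list(COVERAGE), coverage_gain, 2)` selects `"shake"` first and then `"look"`, re-evaluating only the entry at the top of the heap rather than every remaining candidate.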
Reducing Reparameterization Gradient Variance
Optimization with noisy gradients has become ubiquitous in statistics and
machine learning. Reparameterization gradients, or gradient estimates computed
via the "reparameterization trick," represent a class of noisy gradients often
used in Monte Carlo variational inference (MCVI). However, when these gradient
estimators are too noisy, the optimization procedure can be slow or fail to
converge. One way to reduce noise is to use more samples for the gradient
estimate, but this can be computationally expensive. Instead, we view the noisy
gradient as a random variable, and form an inexpensive approximation of the
generating procedure for the gradient sample. This approximation has high
correlation with the noisy gradient by construction, making it a useful control
variate for variance reduction. We demonstrate our approach on non-conjugate
multi-level hierarchical models and a Bayesian neural net where we observed
gradient variance reductions of multiple orders of magnitude (20-2,000x).
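The control-variate idea in the abstract above can be shown on a toy problem. This sketch does not reproduce the paper's gradient estimator: it assumes a scalar noisy quantity f(z) = exp(0.2 z) with z drawn from a standard normal, and uses its first-order approximation h(z) = 1 + 0.2 z, whose mean is known to be 1, as the cheap correlated control variate. The function name and all constants are hypothetical.

```python
import math
import random
import statistics


def control_variate_demo(n=20000, scale=0.2, seed=0):
    """Return (variance of raw samples, variance after the control variate)."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]

    # Noisy quantity whose mean we want to estimate.
    f = [math.exp(scale * zi) for zi in z]
    # Cheap approximation built from the same noise; E[h] = 1 exactly.
    h = [1.0 + scale * zi for zi in z]

    mean_f = statistics.fmean(f)
    mean_h = statistics.fmean(h)
    cov_fh = sum((fi - mean_f) * (hi - mean_h) for fi, hi in zip(f, h)) / (n - 1)

    # Variance-minimizing coefficient c = Cov(f, h) / Var(h), estimated here
    # from the same samples (a simplification that adds a small bias).
    c = cov_fh / statistics.variance(h)

    # Subtract the centered control variate; the mean is unchanged.
    adjusted = [fi - c * (hi - 1.0) for fi, hi in zip(f, h)]
    return statistics.variance(f), statistics.variance(adjusted)
```

Because h is highly correlated with f by construction, the adjusted samples have much lower variance than the raw ones while targeting the same mean, which is the mechanism the abstract describes for reparameterization gradients.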