Stochastic Online Learning with Probabilistic Graph Feedback
We consider a problem of stochastic online learning with general
probabilistic graph feedback, where each directed edge (i, j) in the feedback
graph has probability p_{ij}. Two cases are covered. (a) The one-step case,
where after playing arm i the learner observes a sample reward feedback of arm
j with independent probability p_{ij}. (b) The cascade case, where after
playing arm i the learner observes feedback of all arms in a probabilistic
cascade starting from i: for each edge (j, k), if arm j is played or observed,
then a reward sample of arm k is observed with independent probability p_{jk}.
Previous works mainly focus on deterministic graphs, which correspond to the
one-step case with p_{ij} in {0, 1}, an adversarial sequence of graphs with
certain topology guarantees, or a specific type of random graphs. We analyze
the asymptotic lower bounds and design algorithms in both cases. The regret
upper bounds of the algorithms match the lower bounds with high probability.
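As a concrete illustration of the one-step feedback model, the sketch below runs a UCB-style learner that, after playing arm i, also receives reward samples of every arm j with probability p[i][j]. The environment, function names, and confidence radius are illustrative assumptions, not the paper's algorithm.

```python
import math
import random

def ucb_graph_feedback(mu, p, horizon, seed=0):
    """UCB learner under one-step probabilistic graph feedback.

    mu[j]  : Bernoulli reward mean of arm j (unknown to the learner).
    p[i][j]: probability that playing arm i reveals a sample of arm j.
    """
    rng = random.Random(seed)
    k = len(mu)
    counts = [0] * k      # observed samples per arm (including side observations)
    sums = [0.0] * k      # sum of observed rewards per arm
    regret = 0.0
    best = max(mu)
    for t in range(1, horizon + 1):
        def ucb(j):
            if counts[j] == 0:
                return float("inf")  # force at least one observation
            return sums[j] / counts[j] + math.sqrt(2 * math.log(t) / counts[j])
        i = max(range(k), key=ucb)
        regret += best - mu[i]
        # One-step graph feedback: each arm j is revealed with prob. p[i][j].
        for j in range(k):
            if rng.random() < p[i][j]:
                counts[j] += 1
                sums[j] += 1.0 if rng.random() < mu[j] else 0.0
    return regret
```

Because side observations feed the counts of arms the learner did not play, the suboptimal arm's statistics sharpen even while the best arm is being exploited, which is the source of the improved regret bounds.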
Semantics, Representations and Grammars for Deep Learning
Deep learning is currently the subject of intensive study. However,
fundamental concepts such as representations are not formally defined --
researchers "know them when they see them" -- and there is no common language
for describing and analyzing algorithms. This essay proposes an abstract
framework that identifies the essential features of current practice and may
provide a foundation for future developments.
The backbone of almost all deep learning algorithms is backpropagation, which
is simply a gradient computation distributed over a neural network. The main
ingredients of the framework are thus, unsurprisingly: (i) game theory, to
formalize distributed optimization; and (ii) communication protocols, to track
the flow of zeroth and first-order information. The framework allows natural
definitions of semantics (as the meaning encoded in functions), representations
(as functions whose semantics is chosen to optimize a criterion) and grammars
(as communication protocols equipped with first-order convergence guarantees).
Much of the essay is spent discussing examples taken from the literature. The
ultimate aim is to develop a graphical language for describing the structure of
deep learning algorithms that backgrounds the details of the optimization
procedure and foregrounds how the components interact. Inspiration is taken
from probabilistic graphical models and factor graphs, which capture the
essential structural features of multivariate distributions. Comment: 20 pages, many diagrams
Bounded Degree Approximations of Stochastic Networks
We propose algorithms to approximate directed information graphs. Directed
information graphs are probabilistic graphical models that depict causal
dependencies between stochastic processes in a network. The proposed algorithms
identify optimal and near-optimal approximations in terms of Kullback-Leibler
divergence. The user-chosen sparsity trades off the quality of the
approximation against visual conciseness and computational tractability. One
class of approximations contains graphs with specified in-degrees. Another
class additionally requires that the graph is connected. For both classes, we
propose algorithms to identify the optimal approximations and also near-optimal
approximations, using a novel relaxation of submodularity. We also propose
algorithms to identify the r-best approximations among these classes, enabling
robust decision making.
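The in-degree-constrained approximations above can be illustrated with a generic greedy routine over a marginal-gain function, a standard heuristic for (approximately) submodular objectives such as the directed information from candidate parent processes to a target. The interface below is a hypothetical sketch, not the paper's algorithm.

```python
def greedy_parents(candidates, gain, max_in_degree):
    """Greedily build a parent set of bounded size for one target process.

    candidates    : iterable of candidate parent indices.
    gain(S, j)    : user-supplied marginal gain of adding j to the set S
                    (e.g., an empirical directed-information increment).
    max_in_degree : sparsity budget on the number of parents.
    """
    chosen = []
    remaining = set(candidates)
    for _ in range(max_in_degree):
        if not remaining:
            break
        best = max(remaining, key=lambda j: gain(chosen, j))
        if gain(chosen, best) <= 0:
            break  # no candidate improves the approximation further
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

When the gain function is submodular this greedy rule carries the usual (1 - 1/e)-style guarantees; the paper's relaxation of submodularity extends such reasoning to directed-information objectives.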
Causal Bandits: Learning Good Interventions via Causal Inference
We study the problem of using causal models to improve the rate at which good
interventions can be learned online in a stochastic environment. Our formalism
combines multi-armed bandits and causal inference to model a novel type of bandit
feedback that is not exploited by existing approaches. We propose a new
algorithm that exploits the causal feedback and prove a bound on its simple
regret that is strictly better (in all quantities) than algorithms that do not
use the additional causal information.
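One ingredient of this kind of causal feedback is that, in a "parallel bandit" setting, a single observational round reveals every variable at once, so one sample informs the estimate of every arm do(X_i = v) simultaneously. The sketch below shows that effect under a no-confounding assumption; it is a simplified illustration, not the paper's algorithm (which additionally splits its budget and uses importance weighting).

```python
import random

def observational_simple_regret(sample_obs, n_vars, budget, seed=0):
    """Pick the best single-variable intervention from observational data.

    sample_obs(rng) -> (x, y): hypothetical simulator returning the values
    x = (x_1, ..., x_N) of all variables and the reward y in one round.
    Each sample updates the estimate of every arm do(X_i = x_i) at once.
    """
    rng = random.Random(seed)
    counts = {}
    sums = {}
    for _ in range(budget):
        x, y = sample_obs(rng)
        for i in range(n_vars):
            key = (i, x[i])
            counts[key] = counts.get(key, 0) + 1
            sums[key] = sums.get(key, 0.0) + y
    # Under the no-confounding assumption of this sketch, the conditional
    # mean E[Y | X_i = v] matches the interventional mean E[Y | do(X_i = v)].
    means = {k: sums[k] / counts[k] for k in counts}
    return max(means, key=means.get)
```

A standard bandit algorithm would spend separate pulls on each of the 2N interventions; here every observational sample contributes to all of them, which is the informational advantage the abstract refers to.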
SLAP: Simultaneous Localization and Planning Under Uncertainty for Physical Mobile Robots via Dynamic Replanning in Belief Space: Extended version
Simultaneous Localization and Planning (SLAP) is a crucial ability for an
autonomous robot operating under uncertainty. In its most general form, SLAP
induces a continuous POMDP (partially-observable Markov decision process),
which needs to be repeatedly solved online. This paper addresses this problem
and proposes a dynamic replanning scheme in belief space. The underlying POMDP,
which is continuous in state, action, and observation space, is approximated
offline via sampling-based methods, but operates in a replanning loop online to
admit local improvements to the coarse offline policy. This construct enables
the proposed method to combat changing environments and large localization
errors, even when the change alters the homotopy class of the optimal
trajectory. It further outperforms the state-of-the-art FIRM (Feedback-based
Information RoadMap) method by eliminating unnecessary stabilization steps.
Applying belief space planning to physical systems brings with it a plethora of
challenges. A key focus of this paper is to implement the proposed planner on a
physical robot and show the SLAP solution performance under uncertainty, in
changing environments and in the presence of large disturbances, such as a
kidnapped robot situation. Comment: 20 pages, updated figures, extended theory and simulation results
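At a schematic level, dynamic replanning in belief space is a receding-horizon loop: replan locally from the current belief, execute the first action, and fold the new observation back into the belief. The function arguments below are hypothetical interfaces, not the paper's FIRM-based implementation.

```python
def slap_replan(belief, plan_local, apply_action, update_belief, done, max_steps):
    """Schematic online replanning loop in belief space.

    plan_local(belief)            -> next action (local improvement over a
                                     coarse offline policy)
    apply_action(action)          -> observation from the environment
    update_belief(b, action, obs) -> posterior belief
    done(belief)                  -> True when the goal is reached
    """
    for _ in range(max_steps):
        if done(belief):
            return belief, True
        action = plan_local(belief)        # replan from the current belief
        obs = apply_action(action)         # act in the (possibly changed) world
        belief = update_belief(belief, action, obs)
    return belief, False
```

Because planning restarts from whatever the current belief is, the loop recovers from large disturbances (e.g., a kidnapped robot) without the stabilization detours that a fixed roadmap policy would require.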
Scalable Probabilistic Entity-Topic Modeling
We present an LDA approach to entity disambiguation. Each topic is associated
with a Wikipedia article and topics generate either content words or entity
mentions. Training such models is challenging because of the topic and
vocabulary size, both in the millions. We tackle these problems using a novel
distributed inference and representation framework based on a parallel Gibbs
sampler guided by the Wikipedia link graph, and pipelines of MapReduce allowing
fast and memory-frugal processing of large datasets. We report state-of-the-art
performance on a public dataset.
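The building block that such distributed, entity-aware systems parallelize is the collapsed Gibbs sampler for LDA. A minimal single-machine version is sketched below; the entity-topic model additionally ties topics to Wikipedia articles and shards this loop across machines, which is omitted here.

```python
import random

def collapsed_gibbs_lda(docs, n_topics, vocab_size, iters, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for LDA.

    docs: list of documents, each a list of token ids in [0, vocab_size).
    Returns the per-document topic counts after `iters` sweeps.
    """
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]              # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # topic totals
    z = []
    for d, doc in enumerate(docs):                     # random initialization
        zs = []
        for w in doc:
            t = rng.randrange(n_topics)
            zs.append(t)
            ndk[d][t] += 1
            nkw[t][w] += 1
            nk[t] += 1
        z.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                t = z[d][n]                            # remove current assignment
                ndk[d][t] -= 1
                nkw[t][w] -= 1
                nk[t] -= 1
                weights = [(ndk[d][k] + alpha) * (nkw[k][w] + beta)
                           / (nk[k] + vocab_size * beta) for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[d][n] = t                            # resample and restore
                ndk[d][t] += 1
                nkw[t][w] += 1
                nk[t] += 1
    return ndk
```

The sequential dependence between resampling steps is what makes scaling hard; the paper's parallel sampler relaxes it, using the Wikipedia link graph to guide which updates can proceed concurrently.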
Combinatorial Multi-Armed Bandit and Its Extension to Probabilistically Triggered Arms
We define a general framework for a large class of combinatorial multi-armed
bandit (CMAB) problems, where subsets of base arms with unknown distributions
form super arms. In each round, a super arm is played and the base arms
contained in the super arm are played and their outcomes are observed. We
further consider the extension in which more base arms could be
probabilistically triggered based on the outcomes of already triggered arms.
The reward of the super arm depends on the outcomes of all played arms, and it
only needs to satisfy two mild assumptions, which allow a large class of
nonlinear reward instances. We assume the availability of an offline
(\alpha,\beta)-approximation oracle that takes the means of the outcome
distributions of arms and outputs a super arm that with probability {\beta}
generates an {\alpha} fraction of the optimal expected reward. The objective of
an online learning algorithm for CMAB is to minimize
(\alpha,\beta)-approximation regret, which is the difference between the
\alpha{\beta} fraction of the expected reward when always playing the optimal
super arm, and the expected reward of playing super arms according to the
algorithm. We provide CUCB algorithm that achieves O(log n)
distribution-dependent regret, where n is the number of rounds played, and we
further provide distribution-independent bounds for a large class of reward
functions. Our regret analysis is tight in that it matches the bound of UCB1
algorithm (up to a constant factor) for the classical MAB problem, and it
significantly improves the regret bound in an earlier paper on combinatorial
bandits with linear rewards. We apply our CMAB framework to two new
applications, probabilistic maximum coverage and social influence maximization,
both having nonlinear reward structures. In particular, application to social
influence maximization requires our extension on probabilistically triggered
arms. Comment: A preliminary version of the paper is published in ICML'201
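The CUCB loop itself is compact: inflate each base arm's empirical mean by a confidence radius, hand the inflated vector to the offline oracle, play the returned super arm, and update the base arms whose outcomes were observed. The sketch below follows that outline with an assumed oracle/environment interface; the confidence radius matches the standard sqrt(3 ln t / 2 N_i) form but details differ from the paper.

```python
import math

def cucb(n_base, oracle, play, horizon):
    """Sketch of CUCB for combinatorial multi-armed bandits.

    oracle(estimates) -> super arm (list of base-arm indices); stands in for
                         the offline (alpha, beta)-approximation oracle.
    play(super_arm)   -> {base arm: outcome in [0, 1]} for every base arm
                         played or probabilistically triggered this round.
    """
    counts = [0] * n_base
    means = [0.0] * n_base
    for t in range(1, horizon + 1):
        ucbs = []
        for i in range(n_base):
            if counts[i] == 0:
                ucbs.append(1.0)  # optimistic default for unseen arms
            else:
                radius = math.sqrt(3.0 * math.log(t) / (2.0 * counts[i]))
                ucbs.append(min(1.0, means[i] + radius))
        super_arm = oracle(ucbs)
        # Update every base arm whose outcome was revealed (played or triggered).
        for i, outcome in play(super_arm).items():
            counts[i] += 1
            means[i] += (outcome - means[i]) / counts[i]
    return means
```

Note the learner never needs the reward function in closed form; it only feeds mean estimates to the oracle, which is what lets the same loop cover nonlinear rewards such as coverage and influence objectives.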
Stochastic Structured Prediction under Bandit Feedback
Stochastic structured prediction under bandit feedback follows a learning
protocol where on each of a sequence of iterations, the learner receives an
input, predicts an output structure, and receives partial feedback in form of a
task loss evaluation of the predicted structure. We present applications of
this learning scenario to convex and non-convex objectives for structured
prediction and analyze them as stochastic first-order methods. We present an
experimental evaluation on problems of natural language processing over
exponential output spaces, and compare convergence speed across different
objectives under the practical criterion of optimal task performance on
development data and the optimization-theoretic criterion of minimal squared
gradient norm. Best results under both criteria are obtained for a non-convex
objective for pairwise preference learning under bandit feedback. Comment: 30th Conference on Neural Information Processing Systems (NIPS 2016),
Barcelona, Spain
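The stochastic first-order methods in question use score-function gradients: sample an output from the model's distribution, observe only its task loss, and update with loss * (phi(y) - E[phi]). A minimal sketch of the expected-loss objective over a toy enumerable output set follows; the interface is an illustrative assumption (real structured output spaces are exponential and require factored sampling).

```python
import math
import random

def expected_loss_sgd(phi, loss, n_out, dim, rounds, lr=0.05, seed=0):
    """Score-function SGD for structured prediction under bandit feedback.

    phi(y)  : feature vector (length dim) of output structure y.
    loss(y) : task loss in [0, 1], revealed only for the sampled output.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    ys = list(range(n_out))
    for _ in range(rounds):
        # Gibbs distribution over outputs under the current weights.
        scores = [sum(wk * fk for wk, fk in zip(w, phi(y))) for y in ys]
        m = max(scores)
        probs = [math.exp(s - m) for s in scores]
        z = sum(probs)
        probs = [q / z for q in probs]
        y = rng.choices(ys, weights=probs)[0]  # predict by sampling
        l = loss(y)                            # bandit feedback: one task loss
        # Expected features under the model, then the score-function gradient.
        ephi = [sum(probs[v] * phi(v)[k] for v in ys) for k in range(dim)]
        fy = phi(y)
        for k in range(dim):
            w[k] -= lr * l * (fy[k] - ephi[k])
    return w
```

Sampling the prediction (rather than taking the argmax) is what makes the single observed loss an unbiased signal for the expected-loss gradient.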
Approximate Inference-based Motion Planning by Learning and Exploiting Low-Dimensional Latent Variable Models
This work presents an efficient framework to generate a motion plan of a
robot with high degrees of freedom (e.g., a humanoid robot).
High-dimensionality of the robot configuration space often leads to
difficulties in utilizing the widely-used motion planning algorithms, since the
volume of the decision space increases exponentially with the number of
dimensions. To handle complications arising from the large decision space, and
to solve a corresponding motion planning problem efficiently, two key concepts
are adopted in this work: First, the Gaussian process latent variable model
(GP-LVM) is utilized for low-dimensional representation of the original
configuration space. Second, an approximate inference algorithm is used,
exploiting the duality between control and estimation, to explore the
decision space and to compute a high-quality motion trajectory of the robot.
Utilizing the GP-LVM and the duality between control and estimation, we
construct a fully probabilistic generative model with which a high-dimensional
motion planning problem is transformed into a tractable inference problem.
Finally, we compute the motion trajectory via an approximate inference
algorithm based on a variant of the particle filter. The resulting motions can
be viewed in the supplemental video. ( https://youtu.be/kngEaOR4Esc ) Comment: Accepted for publication in IEEE Robotics and Automation Letters
(RA-L), 201
Variational inference of latent state sequences using Recurrent Networks
Recent advances in the estimation of deep directed graphical models and
recurrent networks let us contribute to the removal of a blind spot in the area
of probabilistic modelling of time series. The proposed methods i) can infer
distributed latent state-space trajectories with nonlinear transitions, ii)
scale to large data sets thanks to the use of a stochastic objective and fast,
approximate inference, iii) enable the design of rich emission models which iv)
will naturally lead to structured outputs. Two different paths of introducing
latent state sequences are pursued, leading to the variational recurrent auto
encoder (VRAE) and the variational one step predictor (VOSP). The use of
independent Wiener processes as priors on the latent state sequence is a viable
compromise between efficient computation of the Kullback-Leibler divergence
from the variational approximation of the posterior and maintaining a
reasonable belief in the dynamics. We verify our methods empirically, obtaining
results close or superior to the state of the art. We also show qualitative
results for denoising and missing value imputation. Comment: This paper has been withdrawn due to a derivation/implementation
error and the resulting invalidation of the results
- …