Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition
This paper presents the MAXQ approach to hierarchical reinforcement learning
based on decomposing the target Markov decision process (MDP) into a hierarchy
of smaller MDPs and decomposing the value function of the target MDP into an
additive combination of the value functions of the smaller MDPs. The paper
defines the MAXQ hierarchy, proves formal results on its representational
power, and establishes five conditions for the safe use of state abstractions.
The paper presents an online model-free learning algorithm, MAXQ-Q, and proves
that it converges with probability 1 to a kind of locally-optimal policy known
as a recursively optimal policy, even in the presence of the five kinds of
state abstraction. The paper evaluates the MAXQ representation and MAXQ-Q
through a series of experiments in three domains and shows experimentally that
MAXQ-Q (with state abstractions) converges to a recursively optimal policy much
faster than flat Q learning. The fact that MAXQ learns a representation of the
value function has an important benefit: it makes it possible to compute and
execute an improved, non-hierarchical policy via a procedure similar to the
policy improvement step of policy iteration. The paper demonstrates the
effectiveness of this non-hierarchical execution experimentally. Finally, the
paper concludes with a comparison to related work and a discussion of the
design tradeoffs in hierarchical reinforcement learning.
Comment: 63 pages, 15 figure
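The additive decomposition described in this abstract can be illustrated with a small sketch. It assumes the usual MAXQ form, in which the value of invoking child a inside parent task i is Q(i, s, a) = V(a, s) + C(i, s, a), where C is a completion value, and the value of a composite task is the maximum over its children. The task names, the single state "s0", and every number below are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy hierarchy: composite task -> child subtasks.
# Tasks that do not appear as keys are treated as primitive actions.
CHILDREN: Dict[str, List[str]] = {
    "root": ["get", "put"],
    "get": ["north", "pickup"],
    "put": ["south", "putdown"],
}

# Hypothetical expected one-step rewards V(a, s) for primitive actions.
PRIMITIVE_V: Dict[Tuple[str, str], float] = {
    ("north", "s0"): -1.0,
    ("pickup", "s0"): -1.0,
    ("south", "s0"): -1.0,
    ("putdown", "s0"): -10.0,
}

# Hypothetical completion values C(i, s, a): expected reward for finishing
# parent task i after child a has terminated.
COMPLETION: Dict[Tuple[str, str, str], float] = {
    ("get", "s0", "north"): -1.0,
    ("get", "s0", "pickup"): 0.0,
    ("put", "s0", "south"): 15.0,
    ("put", "s0", "putdown"): 0.0,
    ("root", "s0", "get"): 18.0,
    ("root", "s0", "put"): 0.0,
}


def value(task: str, state: str) -> float:
    """V(task, state): table lookup for primitives, max over children otherwise."""
    if task not in CHILDREN:
        return PRIMITIVE_V[(task, state)]
    return max(q_value(task, state, child) for child in CHILDREN[task])


def q_value(task: str, state: str, child: str) -> float:
    """Q(task, state, child) = V(child, state) + C(task, state, child)."""
    return value(child, state) + COMPLETION[(task, state, child)]


if __name__ == "__main__":
    print(value("root", "s0"))
```

The recursion only reads stored tables; a learning algorithm such as MAXQ-Q would be what fills them in, and that step is not shown here.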
State Abstraction in MAXQ Hierarchical Reinforcement Learning
Many researchers have explored methods for hierarchical reinforcement
learning (RL) with temporal abstractions, in which abstract actions are defined
that can perform many primitive actions before terminating. However, little is
known about learning with state abstractions, in which aspects of the state
space are ignored. In previous work, we developed the MAXQ method for
hierarchical RL. In this paper, we define five conditions under which state
abstraction can be combined with the MAXQ value function decomposition. We
prove that the MAXQ-Q learning algorithm converges under these conditions and
show experimentally that state abstraction is important for the successful
application of MAXQ-Q learning.
Comment: 7 pages, 2 figure
Solving Multiclass Learning Problems via Error-Correcting Output Codes
Multiclass learning problems involve finding a definition for an unknown
function f(x) whose range is a discrete set containing k > 2 values (i.e., k
"classes"). The definition is acquired by studying collections of training
examples of the form [x_i, f(x_i)]. Existing approaches to multiclass learning
problems include direct application of multiclass algorithms such as the
decision-tree algorithms C4.5 and CART, application of binary concept learning
algorithms to learn individual binary functions for each of the k classes, and
application of binary concept learning algorithms with distributed output
representations. This paper compares these three approaches to a new technique
in which error-correcting codes are employed as a distributed output
representation. We show that these output representations improve the
generalization performance of both C4.5 and backpropagation on a wide range of
multiclass learning tasks. We also demonstrate that this approach is robust
with respect to changes in the size of the training sample, the assignment of
distributed representations to particular classes, and the application of
overfitting avoidance techniques such as decision-tree pruning. Finally, we
show that, like the other methods, the error-correcting code technique can
provide reliable class probability estimates. Taken together, these results
demonstrate that error-correcting output codes provide a general-purpose method
for improving the performance of inductive learning programs on multiclass
problems.
Comment: See http://www.jair.org/ for any accompanying file
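A minimal sketch of the decoding side of error-correcting output codes follows. It assumes each class is assigned a binary codeword, one binary learner is trained per bit position, and a prediction is decoded to the class whose codeword is nearest in Hamming distance. The class labels "A" to "D", the 7-bit codebook, and the example bit strings are hypothetical. The binary learners themselves are not shown; their outputs are passed in as a list of bits.

```python
from itertools import combinations
from typing import Dict, List, Sequence

# Hypothetical 7-bit codebook for four classes. Each column defines one
# binary learning problem; each row is the codeword for one class.
CODEBOOK: Dict[str, List[int]] = {
    "A": [1, 1, 1, 1, 1, 1, 1],
    "B": [0, 0, 0, 0, 1, 1, 1],
    "C": [0, 0, 1, 1, 0, 0, 1],
    "D": [0, 1, 0, 1, 0, 1, 0],
}


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits: Sequence[int]) -> str:
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], bits))


def min_distance() -> int:
    """Smallest Hamming distance between any two codewords."""
    return min(hamming(a, b) for a, b in combinations(CODEBOOK.values(), 2))


if __name__ == "__main__":
    # Codeword for "C" with the sixth bit flipped by one erring binary learner.
    noisy = [0, 0, 1, 1, 0, 1, 1]
    print(decode(noisy), min_distance())
```

With a minimum distance of d between codewords, nearest-codeword decoding tolerates up to floor((d - 1) / 2) wrong bits, which is the sense in which the output representation is error-correcting.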
Integrating Learning from Examples into the Search for Diagnostic Policies
This paper studies the problem of learning diagnostic policies from training
examples. A diagnostic policy is a complete description of the decision-making
actions of a diagnostician (i.e., tests followed by a diagnostic decision) for
all possible combinations of test results. An optimal diagnostic policy is one
that minimizes the expected total cost, which is the sum of measurement costs
and misdiagnosis costs. In most diagnostic settings, there is a tradeoff
between these two kinds of costs. This paper formalizes diagnostic decision
making as a Markov Decision Process (MDP). The paper introduces a new family of
systematic search algorithms based on the AO* algorithm to solve this MDP. To
make AO* efficient, the paper describes an admissible heuristic that enables
AO* to prune large parts of the search space. The paper also introduces several
greedy algorithms including some improvements over previously-published
methods. The paper then addresses the question of learning diagnostic policies
from examples. When the probabilities of diseases and test results are computed
from training data, there is a great danger of overfitting. To reduce
overfitting, regularizers are integrated into the search algorithms. Finally,
the paper compares the proposed methods on five benchmark diagnostic data sets.
The studies show that in most cases the systematic search methods produce
better diagnostic policies than the greedy methods. In addition, the studies
show that for training sets of realistic size, the systematic search algorithms
are practical on today's desktop computers.
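The cost criterion in this abstract, expected total cost as the sum of measurement costs and misdiagnosis costs, can be sketched as a recursion over a policy tree. The sketch below only evaluates a given policy; it is not the AO* search or any of the paper's greedy algorithms. The class names `Diagnose` and `Measure`, the outcome names, and all costs and probabilities are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union


@dataclass
class Diagnose:
    """Leaf: stop testing and diagnose, incurring an expected misdiagnosis cost."""
    misdiagnosis_cost: float


@dataclass
class Measure:
    """Internal node: pay for a test, then branch on its outcome."""
    cost: float
    # outcome name -> (probability of that outcome, policy to follow afterwards)
    branches: Dict[str, Tuple[float, "Policy"]]


Policy = Union[Diagnose, Measure]


def expected_cost(policy: Policy) -> float:
    """Expected total cost = measurement costs + expected misdiagnosis costs."""
    if isinstance(policy, Diagnose):
        return policy.misdiagnosis_cost
    future = sum(p * expected_cost(sub) for p, sub in policy.branches.values())
    return policy.cost + future


# Hypothetical comparison: diagnose immediately versus run one test first.
diagnose_now = Diagnose(misdiagnosis_cost=30.0)
test_first = Measure(
    cost=5.0,
    branches={
        "positive": (0.25, Diagnose(misdiagnosis_cost=4.0)),
        "negative": (0.75, Diagnose(misdiagnosis_cost=8.0)),
    },
)

if __name__ == "__main__":
    print(expected_cost(diagnose_now), expected_cost(test_first))
```

In this toy case the test is worth its cost because it lowers the expected misdiagnosis cost by more than 5, which is the tradeoff the abstract refers to.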
The use of provenance in information retrieval
The volume of electronic information that users accumulate is steadily rising. A recent study [2] found that there were on average 32,000 pieces of information (e-mails, web pages, documents, etc.) for each user. The problem of organizin
Gaussian Approximation of Collective Graphical Models
The Collective Graphical Model (CGM) models a population of independent and
identically distributed individuals when only collective statistics (i.e.,
counts of individuals) are observed. Exact inference in CGMs is intractable,
and previous work has explored Markov Chain Monte Carlo (MCMC) and MAP
approximations for learning and inference. This paper studies Gaussian
approximations to the CGM. As the population grows large, we show that the CGM
distribution converges to a multivariate Gaussian distribution (GCGM) that
maintains the conditional independence properties of the original CGM. If the
observations are exact marginals of the CGM or marginals that are corrupted by
Gaussian noise, inference in the GCGM approximation can be computed efficiently
in closed form. If the observations follow a different noise model (e.g.,
Poisson), then expectation propagation provides efficient and accurate
approximate inference. The accuracy and speed of GCGM inference are compared to
the MCMC and MAP methods on a simulated bird migration problem. The GCGM
matches or exceeds the accuracy of the MAP method while being significantly
faster.
Comment: Accepted by ICML 2014. 10 page version with appendi
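As a rough illustration of the large-population idea, the sketch below computes the Gaussian approximation for the simplest possible case: counts of N independent individuals over the values of a single discrete variable, which follow a multinomial distribution. It uses the standard moments, mean N p and covariance N (diag(p) - p p^T). This is an assumed special case chosen for illustration, not the paper's GCGM construction for general graphical models, and the function name and numbers are hypothetical.

```python
from typing import List, Tuple


def gaussian_moments(
    n: int, probs: List[float]
) -> Tuple[List[float], List[List[float]]]:
    """Mean and covariance of the Gaussian approximation to multinomial counts.

    mean[i]   = n * p_i
    cov[i][j] = n * (p_i * [i == j] - p_i * p_j)
    """
    mean = [n * p for p in probs]
    cov = [
        [n * ((p_i if i == j else 0.0) - p_i * p_j) for j, p_j in enumerate(probs)]
        for i, p_i in enumerate(probs)
    ]
    return mean, cov


if __name__ == "__main__":
    mean, cov = gaussian_moments(100, [0.5, 0.25, 0.25])
    print(mean)
    for row in cov:
        print(row)
```

Each row of the covariance sums to zero because the counts always add up to N, so the approximating Gaussian is degenerate along that direction.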
Learning from Noisy Label Distributions
In this paper, we consider a novel machine learning problem, that is,
learning a classifier from noisy label distributions. In this problem, each
instance with a feature vector belongs to at least one group. Then, instead of
the true label of each instance, we observe the label distribution of the
instances associated with a group, where the label distribution is distorted by
an unknown noise. Our goals are to (1) estimate the true label of each
instance, and (2) learn a classifier that predicts the true label of a new
instance. We propose a probabilistic model that considers true label
distributions of groups and parameters that represent the noise as hidden
variables. The model can be learned based on a variational Bayesian method. In
numerical experiments, we show that the proposed model outperforms existing
methods in terms of the estimation of the true labels of instances.
Comment: Accepted in ICANN201