7,017 research outputs found
Mixed Integer Linear Programming For Exact Finite-Horizon Planning In Decentralized Pomdps
We consider the problem of finding an n-agent joint-policy for the optimal
finite-horizon control of a decentralized Pomdp (Dec-Pomdp). This is a problem
of very high complexity (NEXP-hard in n >= 2). In this paper, we propose a new
mathematical programming approach for the problem. Our approach is based on two
ideas: First, we represent each agent's policy in the sequence-form and not in
the tree-form, thereby obtaining a very compact representation of the set of
joint-policies. Second, using this compact representation, we solve this
problem as an instance of combinatorial optimization for which we formulate a
mixed integer linear program (MILP). The optimal solution of the MILP directly
yields an optimal joint-policy for the Dec-Pomdp. Computational experience
shows that formulating and solving the MILP requires significantly less time to
solve benchmark Dec-Pomdp problems than existing algorithms. For example, the
multi-agent tiger problem for horizon 4 is solved in 72 secs with the MILP
whereas existing algorithms require several hours to solve it
An Exponential Lower Bound for the Latest Deterministic Strategy Iteration Algorithms
This paper presents a new exponential lower bound for the two most popular
deterministic variants of the strategy improvement algorithms for solving
parity, mean payoff, discounted payoff and simple stochastic games. The first
variant improves every node in each step maximizing the current valuation
locally, whereas the second variant computes the globally optimal improvement
in each step. We outline families of games on which both variants require
exponentially many strategy iterations
Variance Reduction in Monte Carlo Counterfactual Regret Minimization (VR-MCCFR) for Extensive Form Games using Baselines
Learning strategies for imperfect information games from samples of
interaction is a challenging problem. A common method for this setting, Monte
Carlo Counterfactual Regret Minimization (MCCFR), can have slow long-term
convergence rates due to high variance. In this paper, we introduce a variance
reduction technique (VR-MCCFR) that applies to any sampling variant of MCCFR.
Using this technique, per-iteration estimated values and updates are
reformulated as a function of sampled values and state-action baselines,
similar to their use in policy gradient reinforcement learning. The new
formulation allows estimates to be bootstrapped from other estimates within the
same episode, propagating the benefits of baselines along the sampled
trajectory; the estimates remain unbiased even when bootstrapping from other
estimates. Finally, we show that given a perfect baseline, the variance of the
value estimates can be reduced to zero. Experimental evaluation shows that
VR-MCCFR brings an order of magnitude speedup, while the empirical variance
decreases by three orders of magnitude. The decreased variance allows for the
first time CFR+ to be used with sampling, increasing the speedup to two orders
of magnitude
Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain
Real-world data typically contain repeated and periodic patterns. This
suggests that they can be effectively represented and compressed using only a
few coefficients of an appropriate basis (e.g., Fourier, Wavelets, etc.).
However, distance estimation when the data are represented using different sets
of coefficients is still a largely unexplored area. This work studies the
optimization problems related to obtaining the \emph{tightest} lower/upper
bound on Euclidean distances when each data object is potentially compressed
using a different set of orthonormal coefficients. Our technique leads to
tighter distance estimates, which translates into more accurate search,
learning and mining operations \textit{directly} in the compressed domain.
We formulate the problem of estimating lower/upper distance bounds as an
optimization problem. We establish the properties of optimal solutions, and
leverage the theoretical analysis to develop a fast algorithm to obtain an
\emph{exact} solution to the problem. The suggested solution provides the
tightest estimation of the -norm or the correlation. We show that typical
data-analysis operations, such as k-NN search or k-Means clustering, can
operate more accurately using the proposed compression and distance
reconstruction technique. We compare it with many other prevalent compression
and reconstruction techniques, including random projections and PCA-based
techniques. We highlight a surprising result, namely that when the data are
highly sparse in some basis, our technique may even outperform PCA-based
compression.
The contributions of this work are generic as our methodology is applicable
to any sequential or high-dimensional data as well as to any orthogonal data
transformation used for the underlying data compression scheme.Comment: 25 pages, 20 figures, accepted in VLD
Efficient Regularized Least-Squares Algorithms for Conditional Ranking on Relational Data
In domains like bioinformatics, information retrieval and social network
analysis, one can find learning tasks where the goal consists of inferring a
ranking of objects, conditioned on a particular target object. We present a
general kernel framework for learning conditional rankings from various types
of relational data, where rankings can be conditioned on unseen data objects.
We propose efficient algorithms for conditional ranking by optimizing squared
regression and ranking loss functions. We show theoretically, that learning
with the ranking loss is likely to generalize better than with the regression
loss. Further, we prove that symmetry or reciprocity properties of relations
can be efficiently enforced in the learned models. Experiments on synthetic and
real-world data illustrate that the proposed methods deliver state-of-the-art
performance in terms of predictive power and computational efficiency.
Moreover, we also show empirically that incorporating symmetry or reciprocity
properties can improve the generalization performance
Generating and Solving Symbolic Parity Games
We present a new tool for verification of modal mu-calculus formulae for
process specifications, based on symbolic parity games. It enhances an existing
method, that first encodes the problem to a Parameterised Boolean Equation
System (PBES) and then instantiates the PBES to a parity game. We improved the
translation from specification to PBES to preserve the structure of the
specification in the PBES, we extended LTSmin to instantiate PBESs to symbolic
parity games, and implemented the recursive parity game solving algorithm by
Zielonka for symbolic parity games. We use Multi-valued Decision Diagrams
(MDDs) to represent sets and relations, thus enabling the tools to deal with
very large systems. The transition relation is partitioned based on the
structure of the specification, which allows for efficient manipulation of the
MDDs. We performed two case studies on modular specifications, that demonstrate
that the new method has better time and memory performance than existing PBES
based tools and can be faster (but slightly less memory efficient) than the
symbolic model checker NuSMV.Comment: In Proceedings GRAPHITE 2014, arXiv:1407.767
Feature Dynamic Bayesian Networks
Feature Markov Decision Processes (PhiMDPs) are well-suited for learning
agents in general environments. Nevertheless, unstructured (Phi)MDPs are
limited to relatively simple environments. Structured MDPs like Dynamic
Bayesian Networks (DBNs) are used for large-scale real-world problems. In this
article I extend PhiMDP to PhiDBN. The primary contribution is to derive a cost
criterion that allows to automatically extract the most relevant features from
the environment, leading to the "best" DBN representation. I discuss all
building blocks required for a complete general learning algorithm.Comment: 7 page
- …