Model-free Representation Learning and Exploration in Low-rank MDPs
The low-rank MDP has emerged as an important model for studying
representation learning and exploration in reinforcement learning. With a known
representation, several model-free exploration strategies exist. In contrast,
all algorithms for the unknown representation setting are model-based, thereby
requiring the ability to model the full dynamics. In this work, we present the
first model-free representation learning algorithms for low-rank MDPs. The key
algorithmic contribution is a new minimax representation learning objective,
for which we provide variants with differing tradeoffs in their statistical and
computational properties. We interleave this representation learning step with
an exploration strategy to cover the state space in a reward-free manner. The
resulting algorithms are provably sample-efficient and can accommodate general
function approximation to scale to complex environments.
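As a rough illustration of the exploration half of this recipe, the Python sketch below drives reward-free coverage with an elliptical bonus computed from a feature map. The feature map here is a random stand-in, not the paper's learned minimax representation, and all names, dimensions, and the toy transition dynamics are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: reward-free coverage via an elliptical bonus
# computed in some feature space phi(s, a). The features below are random
# placeholders, not a representation learned by the paper's objective.
rng = np.random.default_rng(0)
n_states, n_actions, d = 6, 3, 4
features = rng.standard_normal((n_states, n_actions, d))  # stand-in for phi(s, a)

Sigma = np.eye(d)  # regularized feature covariance: I + sum_t phi_t phi_t^T

def bonus(s, a, beta=1.0):
    """Elliptical exploration bonus beta * sqrt(phi^T Sigma^{-1} phi)."""
    phi_sa = features[s, a]
    return beta * np.sqrt(phi_sa @ np.linalg.solve(Sigma, phi_sa))

s = 0
for t in range(20):
    a = max(range(n_actions), key=lambda a: bonus(s, a))  # chase poorly covered directions
    Sigma += np.outer(features[s, a], features[s, a])     # rank-one coverage update
    s = int(rng.integers(n_states))                       # toy random transition
```

In the actual algorithms this bonus-driven data collection would be interleaved with re-fitting the representation on the newly gathered transitions.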
On statistics, computation and scalability
How should statistical procedures be designed so as to be scalable
computationally to the massive datasets that are increasingly the norm? When
coupled with the requirement that an answer to an inferential question be
delivered within a certain time budget, this question has significant
repercussions for the field of statistics. With the goal of identifying
"time-data tradeoffs," we investigate some of the statistical consequences of
computational perspectives on scalability, in particular divide-and-conquer
methodology and hierarchies of convex relaxations.
Comment: Published at http://dx.doi.org/10.3150/12-BEJSP17 in the Bernoulli
(http://isi.cbs.nl/bernoulli/) by the International Statistical
Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm)
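A minimal sketch of the divide-and-conquer idea, assuming a least-squares estimation task: the data are split into blocks, each block is solved cheaply, and the block estimates are averaged. The sizes and the simple averaging rule below are illustrative, not the paper's specific procedure.

```python
import numpy as np

# Divide-and-conquer sketch: fit ordinary least squares on each data block,
# then average the block estimates. Each block solve is cheap and the blocks
# could be processed in parallel.
rng = np.random.default_rng(0)
n, p, n_blocks = 100_000, 10, 20
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

block_fits = [
    np.linalg.lstsq(Xb, yb, rcond=None)[0]
    for Xb, yb in zip(np.array_split(X, n_blocks), np.array_split(y, n_blocks))
]
beta_hat = np.mean(block_fits, axis=0)  # averaged divide-and-conquer estimate
print(np.linalg.norm(beta_hat - beta_true))
```

The statistical question studied in the paper is what such computational shortcuts cost (or do not cost) in estimation accuracy as a function of the time budget.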
Sharp Time-Data Tradeoffs for Linear Inverse Problems
In this paper we characterize sharp time-data tradeoffs for optimization
problems used for solving linear inverse problems. We focus on the minimization
of a least-squares objective subject to a constraint defined as the sub-level
set of a penalty function. We present a unified convergence analysis of the
gradient projection algorithm applied to such problems. We sharply characterize
the convergence rate associated with a wide variety of random measurement
ensembles in terms of the number of measurements and structural complexity of
the signal with respect to the chosen penalty function. The results apply to
both convex and nonconvex constraints, demonstrating that a linear convergence
rate is attainable even though the least-squares objective is not strongly
convex in these settings. When specialized to Gaussian measurements our results
show that such linear convergence occurs when the number of measurements is
merely 4 times the minimal number required to recover the desired signal at all
(a.k.a. the phase transition). We also achieve a slower but geometric rate of
convergence precisely above the phase transition point. Extensive numerical
results suggest that the derived rates exactly match the empirical performance.
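The sketch below shows a projected gradient iteration of the kind analyzed, specialized to a Gaussian measurement ensemble with a (nonconvex) sparsity constraint, where the projection is hard thresholding. The problem sizes, step-size choice, and iteration count are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Projected gradient descent for min ||y - A x||^2 subject to x being k-sparse,
# with hard thresholding as the projection onto the (nonconvex) constraint set.
rng = np.random.default_rng(0)
n, m, k = 200, 80, 5                          # signal dim, measurements, sparsity
A = rng.standard_normal((m, n)) / np.sqrt(m)  # Gaussian measurement ensemble
x_true = np.zeros(n)
x_true[:k] = rng.standard_normal(k)
y = A @ x_true

def project_k_sparse(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    keep = np.argsort(np.abs(v))[-k:]
    out[keep] = v[keep]
    return out

step = 1.0 / np.linalg.norm(A, 2) ** 2        # conservative step size
x = np.zeros(n)
for _ in range(300):
    x = project_k_sparse(x - step * A.T @ (A @ x - y), k)

print(np.linalg.norm(x - x_true))             # small when m is well above the phase transition
```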
High-performance Kernel Machines with Implicit Distributed Optimization and Randomization
In order to fully utilize "big data", it is often required to use "big
models". Such models tend to grow with the complexity and size of the training
data, and do not make strong parametric assumptions upfront on the nature of
the underlying statistical dependencies. Kernel methods fit this need well, as
they constitute a versatile and principled statistical methodology for solving
a wide range of non-parametric modelling problems. However, their high
computational costs (in storage and time) pose a significant barrier to their
widespread adoption in big data applications.
We propose an algorithmic framework and high-performance implementation for
massive-scale training of kernel-based statistical models, based on combining
two key technical ingredients: (i) distributed general purpose convex
optimization, and (ii) the use of randomization to improve the scalability of
kernel methods. Our approach is based on a block-splitting variant of the
Alternating Directions Method of Multipliers, carefully reconfigured to handle
very large random feature matrices, while exploiting hybrid parallelism
typically found in modern clusters of multicore machines. Our implementation
supports a variety of statistical learning tasks by enabling several loss
functions, regularization schemes, kernels, and layers of randomized
approximations for both dense and sparse datasets, in a highly extensible
framework. We evaluate the ability of our framework to learn models on data
from applications, and provide a comparison against existing sequential and
parallel libraries.
Comment: Work presented at MMDS 2014 (June 2014) and JSM 201
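The randomization ingredient can be sketched in isolation: the snippet below builds random Fourier features that approximate an RBF kernel and fits a ridge model on them. The distributed block-splitting ADMM solver is not reproduced here, and all parameter values are illustrative assumptions.

```python
import numpy as np

# Random Fourier features for the RBF kernel k(x, z) = exp(-gamma * ||x - z||^2),
# followed by a plain ridge solve on the randomized features.
rng = np.random.default_rng(0)
n, d, D = 5_000, 10, 500                  # samples, input dim, number of random features
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

gamma = 0.5
W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))   # frequencies ~ N(0, 2*gamma)
b = rng.uniform(0, 2 * np.pi, D)                         # random phases
Z = np.sqrt(2.0 / D) * np.cos(X @ W + b)                 # Z @ Z.T approximates the kernel matrix

lam = 1e-3                                               # ridge regularization
w = np.linalg.solve(Z.T @ Z + lam * np.eye(D), Z.T @ y)
y_hat = Z @ w                                            # predictions from the randomized kernel model
```

The D x D ridge system above replaces an n x n kernel system, which is what makes the randomized approximation attractive at scale; swapping the closed-form solve for a distributed convex solver is the other ingredient described in the abstract.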
More data speeds up training time in learning halfspaces over sparse vectors
The increased availability of data in recent years has led several authors to
ask whether it is possible to use data as a {\em computational} resource. That
is, if more data is available, beyond the sample complexity limit, is it
possible to use the extra examples to speed up the computation time required to
perform the learning task?
We give the first positive answer to this question for a {\em natural
supervised learning problem} --- we consider agnostic PAC learning of
halfspaces over $3$-sparse vectors in $\{-1,1,0\}^n$. This class is
inefficiently learnable using $O(n/\epsilon^2)$ examples. Our main
contribution is a novel, non-cryptographic, methodology for establishing
computational-statistical gaps, which allows us to show that, under a widely
believed assumption that refuting random $\mathrm{3CNF}$ formulas is hard, it
is impossible to efficiently learn this class using only $O(n/\epsilon^2)$
examples. We further show that under stronger hardness assumptions, even
$n^{1.499}/\epsilon^2$ examples do not suffice. On the other hand, we show a
new algorithm that learns this class efficiently using
$\tilde{\Omega}(n^2/\epsilon^2)$ examples. This formally establishes the
tradeoff between sample and computational complexity for a natural supervised
learning problem.
Comment: 13 pages
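To make the learning task concrete, the toy sketch below generates sparse sign vectors (3-sparse here) and fits a halfspace by subgradient descent on a hinge surrogate. This only illustrates the problem setup; it is not the paper's algorithm and says nothing about the sample/computation tradeoff it establishes.

```python
import numpy as np

# Toy setup: examples are k-sparse vectors with entries in {-1, 0, 1}; labels
# come from a noisy halfspace. A convex hinge surrogate is minimized by
# subgradient descent as a generic (not the paper's) learning procedure.
rng = np.random.default_rng(0)
m, n, k = 2_000, 50, 3                     # examples, ambient dimension, sparsity

def sparse_sign_vector():
    x = np.zeros(n)
    idx = rng.choice(n, size=k, replace=False)
    x[idx] = rng.choice([-1.0, 1.0], size=k)
    return x

X = np.stack([sparse_sign_vector() for _ in range(m)])
w_star = rng.standard_normal(n)
y = np.sign(X @ w_star + 0.3 * rng.standard_normal(m))   # noisy halfspace labels

w = np.zeros(n)
for t in range(1, 501):
    margins = y * (X @ w)
    grad = -(X * y[:, None])[margins < 1].sum(axis=0) / m  # subgradient of mean hinge loss
    w -= grad / np.sqrt(t)

accuracy = np.mean(np.sign(X @ w) == y)
```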