Learning to Control in Metric Space with Optimal Regret
We study online reinforcement learning for finite-horizon deterministic
control systems with {\it arbitrary} state and action spaces. Suppose that the
transition dynamics and reward function are unknown, but the state and action
spaces are endowed with a metric that characterizes the proximity between
different states and actions. We provide a surprisingly simple upper-confidence
reinforcement learning algorithm that uses a function approximation oracle to
estimate optimistic Q functions from experiences. We show that the regret of
the algorithm after a given number of episodes admits an upper bound that depends on a
smoothness parameter and on the doubling dimension of the state-action
space with respect to the given metric. We also establish a near-matching
regret lower bound. The proposed method can be adapted to work for more
structured transition systems, including the finite-state case and the case
where value functions are linear combinations of features, where the method
also achieves the optimal regret.
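As a rough illustration of how a function approximation oracle might form optimistic Q estimates from experience in a metric space, the following Python sketch assumes (our assumption, not stated in the abstract) that the Q function is L-Lipschitz in the given metric and that each stored experience carries an exact target value, which is plausible in a deterministic system. It returns the tightest upper bound on Q at a query point that is consistent with those samples. The names `optimistic_q`, `lipschitz`, and `q_max` are hypothetical and do not come from the paper.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]

def euclidean(x: Point, y: Point) -> float:
    """Example metric on state-action pairs (any metric could be plugged in)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def optimistic_q(
    query: Point,
    data: Sequence[Tuple[Point, float]],
    lipschitz: float,
    q_max: float,
    dist: Callable[[Point, Point], float] = euclidean,
) -> float:
    """Tightest upper bound on Q(query) consistent with an L-Lipschitz Q (hypothetical oracle)."""
    bound = q_max  # trivial cap, used when no experience constrains the query
    for z, target in data:
        # Each observed pair z bounds Q(query) from above by target + L * dist(query, z).
        bound = min(bound, target + lipschitz * dist(query, z))
    return bound

experience = [((0.0, 0.0), 1.0), ((1.0, 0.0), 2.0)]
print(optimistic_q((0.5, 0.0), experience, lipschitz=2.0, q_max=10.0))  # 2.0
```

Query points far from all recorded experience stay at or near the cap `q_max`, so an agent acting greedily on these estimates is drawn toward unexplored regions, which is the usual role of optimism in upper-confidence methods.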
Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization
Stochastic optimization naturally arises in machine learning. Efficient
algorithms with provable guarantees, however, are still largely missing when
the objective function is nonconvex and the data points are dependent. This
paper studies this fundamental challenge through a streaming PCA problem for
stationary time series data. Specifically, our goal is to estimate the
principal component of time series data with respect to the covariance matrix
of the stationary distribution. Computationally, we propose a variant of Oja's
algorithm combined with downsampling to control the bias of the stochastic
gradient caused by the data dependency. Theoretically, we quantify the
uncertainty of our proposed stochastic algorithm based on diffusion
approximations. This allows us to prove the asymptotic rate of convergence and
further implies near-optimal asymptotic sample complexity. Numerical
experiments are provided to support our analysis.
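To make the downsampling idea concrete, here is a minimal NumPy sketch of Oja's rule that only uses every `gap`-th observation of a stationary series. The vector AR(1) data model, the step size, and the fixed gap are illustrative assumptions; the abstract does not specify the paper's downsampling schedule or step-size choice, and the function names are hypothetical.

```python
import numpy as np

def ar1_stream(n, a, noise_std, rng):
    """Stationary vector AR(1) series x_{t+1} = a * x_t + noise (illustrative data model)."""
    # Start from the stationary distribution so the whole stream is stationary.
    x = rng.standard_normal(noise_std.size) * noise_std / np.sqrt(1.0 - a**2)
    for _ in range(n):
        yield x
        x = a * x + noise_std * rng.standard_normal(noise_std.size)

def oja_downsampled(stream, dim, step_size, gap, rng):
    """Oja's rule applied only to every `gap`-th sample to weaken temporal dependence."""
    w = rng.standard_normal(dim)
    w /= np.linalg.norm(w)
    for t, x in enumerate(stream):
        if t % gap != 0:
            continue  # discard the samples between kept points
        w = w + step_size * x * (x @ w)  # stochastic step using the rank-one estimate x x^T
        w /= np.linalg.norm(w)           # project back onto the unit sphere
    return w

rng = np.random.default_rng(0)
noise_std = np.array([2.0, 1.0, 0.5])  # stationary covariance is diagonal; top direction is e_1
stream = ar1_stream(100_000, a=0.5, noise_std=noise_std, rng=rng)
w = oja_downsampled(stream, dim=3, step_size=0.005, gap=5, rng=rng)
print(abs(w[0]))  # should be close to 1
```

Larger gaps make the kept samples closer to independent (here the correlation decays like `a**gap`) at the cost of discarding data, which is the bias/sample-efficiency trade-off the downsampling step is meant to control.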
Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model
In this paper we consider the problem of computing an $\epsilon$-optimal
policy of a discounted Markov Decision Process (DMDP) provided we can only
access its transition function through a generative sampling model that, given
any state-action pair, samples from the transition function in $O(1)$ time.
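Given such a DMDP with states $S$, actions $A$, discount factor
$\gamma \in (0,1)$, and rewards in range $[0,1]$, we provide an algorithm which
computes an $\epsilon$-optimal policy with probability $1-\delta$ where
\emph{both} the time spent and the number of samples taken are upper bounded by
$\tilde{O}\big(|S||A|(1-\gamma)^{-3}\epsilon^{-2}\big)$, with $\tilde{O}$ hiding logarithmic factors. For fixed values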
of $\epsilon$, this improves upon the previous best known bounds by a factor of $(1-\gamma)^{-1}$
and matches the sample complexity lower bounds proved in
Azar et al. (2013) up to logarithmic factors. We also extend our method to
computing $\epsilon$-optimal policies for finite-horizon MDPs with a generative
model and provide a nearly matching sample complexity lower bound.
Comment: 31 pages. Accepted to NeurIPS, 201
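The sketch below illustrates what access through a generative model means in practice: it queries a sampler a fixed number of times per state-action pair, builds an empirical transition model, and runs value iteration on it. This plain model-based baseline is not the algorithm of the paper and carries no claim about its time or sample bounds; the two-state MDP, the sample count, and all function names are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-action DMDP used only to drive the sampler.
P_TRUE = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # state 0: action 0, action 1
    [[0.5, 0.5], [0.1, 0.9]],   # state 1: action 0, action 1
])
REWARD = np.array([[0.0, 0.0], [1.0, 1.0]])  # rewards in [0, 1]

def generative_model(s, a, rng):
    """Return one next state drawn from P(. | s, a)."""
    return rng.choice(P_TRUE.shape[2], p=P_TRUE[s, a])

def empirical_model(sample, n_states, n_actions, n_samples, rng):
    """Estimate P(s' | s, a) from n_samples calls per state-action pair."""
    p_hat = np.zeros((n_states, n_actions, n_states))
    for s in range(n_states):
        for a in range(n_actions):
            for _ in range(n_samples):
                p_hat[s, a, sample(s, a, rng)] += 1.0
    return p_hat / n_samples

def value_iteration(p, r, gamma, iters):
    """Return a greedy policy and value estimate for the model (p, r)."""
    v = np.zeros(p.shape[0])
    for _ in range(iters):
        v = (r + gamma * p @ v).max(axis=1)  # Bellman optimality update; p @ v has shape (S, A)
    q = r + gamma * p @ v
    return q.argmax(axis=1), v

rng = np.random.default_rng(0)
p_hat = empirical_model(generative_model, 2, 2, n_samples=2000, rng=rng)
policy, v = value_iteration(p_hat, REWARD, gamma=0.9, iters=500)
print(policy)  # [1 1]: both states prefer the action that leads to the rewarding state 1
```

Here the total number of generative-model calls is simply |S| * |A| * `n_samples`, which is the quantity that sample-complexity bounds of the kind stated above control.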
Federated Multi-Level Optimization over Decentralized Networks
Multi-level optimization has gained increasing attention in recent years, as
it provides a powerful framework for solving complex optimization problems that
arise in many fields, such as meta-learning, multi-player games, reinforcement
learning, and nested composition optimization. In this paper, we study the
problem of distributed multi-level optimization over a network, where agents
can only communicate with their immediate neighbors. This setting is motivated
by the need for distributed optimization in large-scale systems, where
centralized optimization may not be practical or feasible. To address this
problem, we propose a novel gossip-based distributed multi-level optimization
algorithm that enables networked agents to solve optimization problems at
different levels in a single timescale and share information through network
propagation. Our algorithm achieves optimal sample complexity, scaling linearly
with the network size, and demonstrates state-of-the-art performance on various
applications, including hyper-parameter tuning, decentralized reinforcement
learning, and risk-averse optimization.
Comment: arXiv admin note: substantial text overlap with arXiv:2206.1087
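The abstract does not spell out its update rule, so the following sketch only illustrates the kind of gossip step such decentralized methods build on. It runs gradient tracking, a standard decentralized technique (our choice, not necessarily the paper's), on a single-level problem: five agents on a ring each hold a private quadratic f_i(x) = (x - b_i)^2 / 2, exchange values only with their two immediate neighbours through a doubly stochastic mixing matrix, and reach consensus on the minimizer of the average objective, which is the mean of the b_i. The ring topology, weights, step size, and function names are hypothetical, and the multi-level structure of the paper is not modelled.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic gossip weights: each agent averages itself and its two ring neighbours."""
    w = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i, j % n] = 1.0 / 3.0
    return w

def gradient_tracking(grad, x0, w, step, iters):
    """Decentralized gradient tracking; entry i of x is agent i's local iterate."""
    x = x0.copy()
    g = grad(x)
    y = g.copy()  # each agent's running estimate of the network-average gradient
    for _ in range(iters):
        x_new = w @ x - step * y    # gossip with neighbours, then take a local descent step
        g_new = grad(x_new)
        y = w @ y + g_new - g       # gossip the tracker and correct it with the local gradient change
        x, g = x_new, g_new
    return x

b = np.array([1.0, 2.0, 3.0, 4.0, 10.0])  # private data of the five agents

def local_grads(x):
    return x - b  # gradient of f_i(x) = (x - b_i)^2 / 2, evaluated at each agent's own iterate

x = gradient_tracking(local_grads, np.zeros(5), ring_mixing_matrix(5), step=0.1, iters=300)
print(x)  # every agent ends near 4.0, the mean of b
```

Because each agent only multiplies by its own row of the mixing matrix, one iteration needs communication with immediate neighbours only, matching the setting described in the abstract.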