
197 research outputs found

Learning to Control in Metric Space with Optimal Regret

Authors: Ni Chengzhuo, Wang Mengdi, Yang Lin F.
Publication date: 04/05/2019

We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose that the transition dynamics and reward function are unknown, but the state and action spaces are endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that the regret of the algorithm after $K$ episodes is $O(HL(KH)^{\frac{d-1}{d}})$, where $L$ is a smoothness parameter and $d$ is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to work for more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieves the optimal regret.

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

Authors: Chen Minshuo, Wang Mengdi, Yang Lin, Zhao Tuo
Publication date: 01/01/2018

Stochastic optimization naturally arises in machine learning. Efficient algorithms with provable guarantees, however, are still largely missing when the objective function is nonconvex and the data points are dependent. This paper studies this fundamental challenge through a streaming PCA problem for stationary time series data. Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution. Computationally, we propose a variant of Oja's algorithm combined with downsampling to control the bias of the stochastic gradient caused by the data dependency. Theoretically, we quantify the uncertainty of our proposed stochastic algorithm based on diffusion approximations. This allows us to prove the asymptotic rate of convergence and further implies near-optimal asymptotic sample complexity. Numerical experiments are provided to support our analysis.

arXiv.org e-Print Archive

Princeton University Open Access Repository
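
The abstract above describes Oja's algorithm combined with downsampling for dependent data. The following is a minimal sketch of that idea, not the authors' implementation: the AR(1) data stream, the function names (`ar1_stream`, `oja_downsampled`), and all parameter values (`rho`, `scales`, `step`, `gap`) are assumptions chosen only for illustration.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def ar1_stream(n, rho=0.8, scales=(3.0, 1.0, 0.5), seed=0):
    """Yield n dependent samples x_t = rho * x_{t-1} + noise (hypothetical data model).

    Coordinates are independent, so the stationary covariance is diagonal and
    the leading principal component is the first coordinate axis.
    """
    rng = random.Random(seed)
    x = [0.0] * len(scales)
    for _ in range(n):
        x = [rho * xi + rng.gauss(0.0, s) for xi, s in zip(x, scales)]
        yield x


def oja_downsampled(stream, dim, step=0.002, gap=5, seed=1):
    """Oja's update applied only to every `gap`-th sample of the stream.

    Skipping samples weakens the dependence between consecutive updates,
    which is the role downsampling plays in the abstract.
    """
    rng = random.Random(seed)
    w = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
    for t, x in enumerate(stream):
        if t % gap != 0:
            continue  # downsampling: discard intermediate samples
        proj = sum(xi * wi for xi, wi in zip(x, w))
        # Stochastic gradient step w <- w + step * (x x^T) w, then renormalize.
        w = normalize([wi + step * proj * xi for wi, xi in zip(w, x)])
    return w


w = oja_downsampled(ar1_stream(20000), dim=3)
print("estimated leading direction:", w)
```

A larger `gap` reduces the correlation between the samples actually used but also discards more data, so in this sketch it is a trade-off to tune rather than a fixed constant.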

Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model

Authors: Sidford Aaron, Wang Mengdi, Wu Xian, Yang Lin F., Ye Yinyu
Publication date: 04/06/2018

In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time. Given such a DMDP with states $S$, actions $A$, discount factor $\gamma\in(0,1)$, and rewards in the range $[0, 1]$, we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1 - \delta$ where both the time spent and the number of samples taken are upper bounded by

$$O\left[\frac{|S||A|}{(1-\gamma)^3 \epsilon^2} \log \left(\frac{|S||A|}{(1-\gamma)\delta \epsilon} \right) \log\left(\frac{1}{(1-\gamma)\epsilon}\right)\right].$$

For fixed values of $\epsilon$, this improves upon the previous best known bounds by a factor of $(1 - \gamma)^{-1}$ and matches the sample complexity lower bounds proved in Azar et al. (2013) up to logarithmic factors. We also extend our method to computing $\epsilon$-optimal policies for finite-horizon MDPs with a generative model and provide a nearly matching sample complexity lower bound.

Comment: 31 pages. Accepted to NeurIPS, 201

arXiv.org e-Print Archive

eScholarship - University of California
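
To get a feel for how the bound in the abstract above scales, the expression inside the $O[\cdot]$ can be evaluated directly. This is only an illustrative sketch: the function name `sample_bound` and the example parameter values are assumptions, and the hidden constant of the big-O is omitted, so the numbers indicate relative growth rather than actual sample counts.

```python
import math


def sample_bound(num_states, num_actions, gamma, eps, delta):
    """Evaluate the expression inside the O[...] bound, with constants omitted."""
    sa = num_states * num_actions
    horizon = 1.0 / (1.0 - gamma)  # effective horizon 1 / (1 - gamma)
    leading = sa * horizon ** 3 / eps ** 2
    log_term_1 = math.log(sa * horizon / (delta * eps))
    log_term_2 = math.log(horizon / eps)
    return leading * log_term_1 * log_term_2


# Hypothetical problem sizes, chosen only to show the dependence on gamma.
for gamma in (0.9, 0.99):
    value = sample_bound(num_states=10, num_actions=2, gamma=gamma, eps=0.1, delta=0.1)
    print(f"gamma={gamma}: bound expression = {value:.3e}")
```

Moving $\gamma$ from 0.9 to 0.99 multiplies the leading term by $10^3$, which is the cubic dependence on $(1-\gamma)^{-1}$ stated in the bound.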

Federated Multi-Level Optimization over Decentralized Networks

Authors: Wang Mengdi, Yang Shuoguang, Zhang Xuezhou
Publication date: 09/10/2023

Multi-level optimization has gained increasing attention in recent years, as it provides a powerful framework for solving complex optimization problems that arise in many fields, such as meta-learning, multi-player games, reinforcement learning, and nested composition optimization. In this paper, we study the problem of distributed multi-level optimization over a network, where agents can only communicate with their immediate neighbors. This setting is motivated by the need for distributed optimization in large-scale systems, where centralized optimization may not be practical or feasible. To address this problem, we propose a novel gossip-based distributed multi-level optimization algorithm that enables networked agents to solve optimization problems at different levels in a single timescale and share information through network propagation. Our algorithm achieves optimal sample complexity, scaling linearly with the network size, and demonstrates state-of-the-art performance on various applications, including hyper-parameter tuning, decentralized reinforcement learning, and risk-averse optimization.

Comment: arXiv admin note: substantial text overlap with arXiv:2206.1087

arXiv.org e-Print Archive
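
The abstract above relies on gossip-based communication, in which each agent exchanges information only with its immediate neighbors. The sketch below shows just the basic gossip averaging primitive on a ring network, not the paper's multi-level algorithm. The ring topology, the equal 1/3 weights, and the function names are assumptions made for illustration.

```python
def ring_mixing_matrix(n):
    """Doubly stochastic weights for a ring of n >= 3 agents.

    Each agent puts weight 1/3 on itself and 1/3 on each of its two neighbors.
    """
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            weights[i][j % n] += 1.0 / 3.0
    return weights


def gossip_round(values, weights):
    """One communication round: every agent averages over its neighborhood."""
    n = len(values)
    return [sum(weights[i][j] * values[j] for j in range(n)) for i in range(n)]


def gossip_average(values, weights, rounds):
    """Repeat gossip rounds; all agents approach the network-wide mean."""
    for _ in range(rounds):
        values = gossip_round(values, weights)
    return values


local_values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # one private value per agent
mixed = gossip_average(local_values, ring_mixing_matrix(6), rounds=200)
print(mixed)
```

Because the mixing matrix is doubly stochastic, each round preserves the sum of the agents' values, so repeated rounds drive every agent toward the global average without any central coordinator.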