
    Learning to Control in Metric Space with Optimal Regret

    We study online reinforcement learning for finite-horizon deterministic control systems with arbitrary state and action spaces. Suppose that the transition dynamics and reward function are unknown, but the state and action spaces are endowed with a metric that characterizes the proximity between different states and actions. We provide a surprisingly simple upper-confidence reinforcement learning algorithm that uses a function approximation oracle to estimate optimistic Q functions from experiences. We show that the regret of the algorithm after $K$ episodes is $O(HL(KH)^{\frac{d-1}{d}})$, where $L$ is a smoothness parameter and $d$ is the doubling dimension of the state-action space with respect to the given metric. We also establish a near-matching regret lower bound. The proposed method can be adapted to more structured transition systems, including the finite-state case and the case where value functions are linear combinations of features, where the method also achieves the optimal regret.
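
    The optimistic estimate the abstract alludes to can be pictured as a Lipschitz upper envelope: any previously observed target bounds the Q value at nearby state-action pairs through the metric. Below is a minimal sketch of that idea only, not the paper's oracle-based algorithm; the name `lipschitz_optimistic_q`, the choice of Euclidean distances for the metric, and the `v_max` fallback are all illustrative assumptions.

    ```python
    import numpy as np

    def lipschitz_optimistic_q(query, data, L, v_max):
        """Optimistic Q estimate at `query` = (state, action) via the
        Lipschitz upper envelope of previously observed targets.

        data:  list of ((state, action), target) pairs seen so far.
        L:     smoothness (Lipschitz) constant of Q in the metric.
        v_max: trivial upper bound on the value (optimism fallback).
        """
        if not data:
            return v_max  # nothing observed yet: be fully optimistic
        s, a = query
        # Any observed target bounds Q nearby via Lipschitz continuity;
        # the metric here is assumed to be a sum of Euclidean distances.
        bounds = [t + L * (np.linalg.norm(s - ps) + np.linalg.norm(a - pa))
                  for (ps, pa), t in data]
        return min(v_max, min(bounds))
    ```

    Acting greedily with respect to this upper envelope is what makes the scheme "upper confidence": unexplored regions of the metric space retain large optimistic values until data shrinks their bounds.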

    Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization

    Stochastic optimization naturally arises in machine learning. Efficient algorithms with provable guarantees, however, are still largely missing when the objective function is nonconvex and the data points are dependent. This paper studies this fundamental challenge through a streaming PCA problem for stationary time series data. Specifically, our goal is to estimate the principal component of time series data with respect to the covariance matrix of the stationary distribution. Computationally, we propose a variant of Oja's algorithm combined with downsampling to control the bias of the stochastic gradient caused by the data dependency. Theoretically, we quantify the uncertainty of our proposed stochastic algorithm based on diffusion approximations. This allows us to prove the asymptotic rate of convergence and further implies near-optimal asymptotic sample complexity. Numerical experiments are provided to support our analysis.
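
    As a rough illustration of the downsampling idea, here is a plain Oja's iteration that updates only on every `gap`-th sample, so the samples it actually uses are approximately independent under mixing. The names `downsampled_oja`, `eta`, and `gap`, and the fixed step size, are assumptions for the sketch, not the paper's exact procedure.

    ```python
    import numpy as np

    def downsampled_oja(stream, dim, eta, gap):
        """Streaming PCA on dependent data: Oja's update applied only to
        every `gap`-th sample, so consecutive updates are nearly
        independent (the downsampling idea described above).

        stream: iterable of data vectors x_t (a stationary time series).
        eta:    step size; gap: downsampling interval (both hypothetical).
        """
        w = np.random.randn(dim)
        w /= np.linalg.norm(w)
        for t, x in enumerate(stream):
            if t % gap:             # skip correlated samples to reduce bias
                continue
            w += eta * x * (x @ w)  # stochastic gradient step: (x x^T) w
            w /= np.linalg.norm(w)  # project back to the unit sphere
        return w  # estimate of the top principal component
    ```

    The trade-off is explicit: a larger `gap` discards data but weakens the dependence-induced bias in the stochastic gradient, which is exactly the tension the paper's analysis quantifies.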

    Near-Optimal Time and Sample Complexities for Solving Discounted Markov Decision Process with a Generative Model

    In this paper we consider the problem of computing an $\epsilon$-optimal policy of a discounted Markov Decision Process (DMDP) provided we can only access its transition function through a generative sampling model that, given any state-action pair, samples from the transition function in $O(1)$ time. Given such a DMDP with states $S$, actions $A$, discount factor $\gamma \in (0,1)$, and rewards in range $[0, 1]$, we provide an algorithm which computes an $\epsilon$-optimal policy with probability $1 - \delta$ where both the time spent and the number of samples taken are upper bounded by
    $$O\left[\frac{|S||A|}{(1-\gamma)^3 \epsilon^2} \log\left(\frac{|S||A|}{(1-\gamma)\delta\epsilon}\right) \log\left(\frac{1}{(1-\gamma)\epsilon}\right)\right].$$
    For fixed values of $\epsilon$, this improves upon the previous best known bounds by a factor of $(1-\gamma)^{-1}$ and matches the sample complexity lower bounds proved in Azar et al. (2013) up to logarithmic factors. We also extend our method to computing $\epsilon$-optimal policies for finite-horizon MDPs with a generative model and provide a nearly matching sample complexity lower bound.
    Comment: 31 pages. Accepted to NeurIPS 2018.
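
    A useful reference point for the generative-model setting is the naive plug-in approach: estimate the transition probabilities from samples, then run value iteration on the empirical MDP. The sketch below shows only this baseline, which by itself does not reach the bound above; the paper's algorithm refines it (e.g., via variance reduction). All names and signatures here are hypothetical.

    ```python
    import numpy as np

    def empirical_model_policy(sample_next, rewards, n_states, n_actions,
                               gamma, n_samples, n_iters):
        """Plug-in baseline with a generative model: estimate P(s'|s,a)
        from `n_samples` draws per state-action pair, then run value
        iteration on the empirical MDP (assumes n_iters >= 1).

        sample_next(s, a) -> next-state index, one call per O(1) sample.
        rewards: array of shape (n_states, n_actions).
        """
        P_hat = np.zeros((n_states, n_actions, n_states))
        for s in range(n_states):
            for a in range(n_actions):
                for _ in range(n_samples):
                    P_hat[s, a, sample_next(s, a)] += 1.0 / n_samples
        v = np.zeros(n_states)
        for _ in range(n_iters):               # value iteration on the
            q = rewards + gamma * (P_hat @ v)  # empirical MDP, shape (S, A)
            v = q.max(axis=1)
        return q.argmax(axis=1)  # greedy policy w.r.t. empirical Q
    ```

    The sample budget here is `n_samples` per state-action pair, i.e. $|S||A| \cdot n_{\text{samples}}$ total, which is the quantity the paper's $\frac{|S||A|}{(1-\gamma)^3\epsilon^2}$ bound controls.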

    Federated Multi-Level Optimization over Decentralized Networks

    Multi-level optimization has gained increasing attention in recent years, as it provides a powerful framework for solving complex optimization problems that arise in many fields, such as meta-learning, multi-player games, reinforcement learning, and nested composition optimization. In this paper, we study the problem of distributed multi-level optimization over a network, where agents can only communicate with their immediate neighbors. This setting is motivated by the need for distributed optimization in large-scale systems, where centralized optimization may not be practical or feasible. To address this problem, we propose a novel gossip-based distributed multi-level optimization algorithm that enables networked agents to solve optimization problems at different levels in a single timescale and share information through network propagation. Our algorithm achieves optimal sample complexity, scaling linearly with the network size, and demonstrates state-of-the-art performance on various applications, including hyper-parameter tuning, decentralized reinforcement learning, and risk-averse optimization.
    Comment: arXiv admin note: substantial text overlap with arXiv:2206.1087
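
    The single-timescale, gossip-based pattern the abstract describes interleaves one neighbor-averaging (gossip) step with one local gradient step per round. Below is a stripped-down sketch of that communication pattern only, assuming a doubly stochastic mixing matrix `W` aligned with the network topology; the multi-level machinery (nested estimators across optimization levels) is omitted, and all names are illustrative.

    ```python
    import numpy as np

    def gossip_sgd_round(X, W, grads, eta):
        """One round of gossip-based decentralized optimization: each
        agent averages its iterate with its neighbors' (one step of
        network propagation via W), then takes a local gradient step.

        X:     (n_agents, dim) current iterates, one row per agent.
        W:     (n_agents, n_agents) doubly stochastic mixing matrix with
               W[i, j] > 0 only for neighbors, so each agent touches only
               its immediate neighborhood per round.
        grads: (n_agents, dim) local stochastic gradients.
        eta:   step size (hypothetical name).
        """
        return W @ X - eta * grads
    ```

    Repeating this round drives the rows of `X` toward consensus while each agent descends its local objective, which is the basic mechanism single-timescale decentralized methods build on.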