    Learning theory estimates with observations from general stationary stochastic processes

    This paper investigates the supervised learning problem with observations drawn from certain general stationary stochastic processes. Here, \emph{general} means that the framework accommodates a broad class of stationary stochastic processes. We show that when the stochastic processes satisfy a generalized Bernstein-type inequality, a unified analysis of learning schemes with various mixing processes can be conducted and a sharp oracle inequality for generic regularized empirical risk minimization schemes can be established. The obtained oracle inequality is then applied to derive convergence rates for several learning schemes, such as empirical risk minimization (ERM), least squares support vector machines (LS-SVMs) using given generic kernels, and SVMs using Gaussian kernels for both least squares and quantile regression. It turns out that for i.i.d. processes, our learning rates for ERM recover the optimal rates. On the other hand, for non-i.i.d. processes, including geometrically $\alpha$-mixing Markov processes, geometrically $\alpha$-mixing processes with restricted decay, $\phi$-mixing processes, and (time-reversed) geometrically $\mathcal{C}$-mixing processes, our learning rates for SVMs with Gaussian kernels match, up to some arbitrarily small extra term in the exponent, the optimal rates. For the remaining cases, our rates are at least close to the optimal rates. As a by-product, the assumed generalized Bernstein-type inequality also provides an interpretation of the so-called "effective number of observations" for various mixing processes.
    Comment: arXiv admin note: text overlap with arXiv:1501.0305
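
    For orientation, the classical i.i.d. Bernstein inequality that such generalized inequalities extend reads as follows (a standard statement, not quoted from the paper): for independent, centered random variables $X_1, \dots, X_n$ with $|X_i| \le B$ almost surely and $\mathbb{E}[X_i^2] \le \sigma^2$,

        \[
          P\!\left( \frac{1}{n} \sum_{i=1}^{n} X_i \ge \varepsilon \right)
          \le \exp\!\left( - \frac{n \varepsilon^2}{2\sigma^2 + \frac{2}{3} B \varepsilon} \right),
          \qquad \varepsilon > 0.
        \]

    Generalized Bernstein-type inequalities of the kind assumed here typically keep this shape but replace $n$ by a smaller "effective number of observations" $n_{\mathrm{eff}} \le n$ reflecting the dependence structure, which is the interpretation mentioned at the end of the abstract.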

    Learning from dependent observations

    In most papers establishing consistency for learning algorithms, it is assumed that the observations used for training are realizations of an i.i.d. process. In this paper we go far beyond this classical framework by showing that support vector machines (SVMs) essentially only require that the data-generating process satisfies a certain law of large numbers. We then consider the learnability of SVMs for $\alpha$-mixing (not necessarily stationary) processes for both classification and regression, where for the latter we explicitly allow unbounded noise.
    Comment: submitted to Journal of Multivariate Analysis
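
    For reference, the $\alpha$-mixing (strong mixing) coefficient of a process $(X_t)$ is the standard one (a textbook definition, not quoted from the paper):

        \[
          \alpha(n) = \sup_{k \ge 1}\, \sup\Big\{\, \big| P(A \cap B) - P(A)\,P(B) \big| :
          A \in \sigma(X_1, \dots, X_k),\; B \in \sigma(X_{k+n}, X_{k+n+1}, \dots) \Big\},
        \]

    and the process is geometrically $\alpha$-mixing if $\alpha(n) \le c\,\exp(-b n^{\gamma})$ for some constants $b, c, \gamma > 0$; the decay $\alpha(n) \to 0$ expresses asymptotic independence of past and future.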

    Reinforcement Learning: Stochastic Approximation Algorithms for Markov Decision Processes

    This article presents a concise description of stochastic approximation algorithms in reinforcement learning of Markov decision processes. The algorithms can also be used as a suboptimal method for partially observed Markov decision processes.
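
    As a concrete instance of such a scheme, the sketch below runs tabular Q-learning on a small invented MDP (the transition and reward arrays are illustrative, not from the article); the update Q(s,a) <- Q(s,a) + a_t (r + gamma max Q(s',.) - Q(s,a)) with Robbins-Monro step sizes is the prototypical stochastic approximation in this setting.

        import numpy as np

        # Toy 2-state, 2-action MDP (invented for illustration).
        P = np.array([[[0.9, 0.1], [0.2, 0.8]],
                      [[0.7, 0.3], [0.05, 0.95]]])  # P[s, a, s'] transition probs
        R = np.array([[1.0, 0.0], [0.5, 2.0]])       # R[s, a] expected rewards
        gamma = 0.9
        rng = np.random.default_rng(0)
        Q = np.zeros((2, 2))
        s = 0
        for t in range(1, 50_001):
            # epsilon-greedy exploration
            a = int(rng.integers(2)) if rng.random() < 0.1 else int(Q[s].argmax())
            s_next = int(rng.choice(2, p=P[s, a]))
            step = 1.0 / t ** 0.6  # Robbins-Monro step sizes: sum diverges, sum of squares converges
            Q[s, a] += step * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
        print(Q)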

    Fast Bayesian inference of the multivariate Ornstein-Uhlenbeck process

    The multivariate Ornstein-Uhlenbeck process is used in many branches of science and engineering to describe the regression of a system to its stationary mean. Here we present an $O(N)$ Bayesian method to estimate the drift and diffusion matrices of the process from $N$ discrete observations of a sample path. We use exact likelihoods, expressed in terms of four sufficient statistic matrices, to derive explicit maximum a posteriori parameter estimates and their standard errors. We apply the method to the Brownian harmonic oscillator, a bivariate Ornstein-Uhlenbeck process, to jointly estimate its mass, damping, and stiffness, and to provide Bayesian estimates of the correlation functions and power spectral densities. We present a Bayesian model comparison procedure, embodying Ockham's razor, to guide a data-driven choice between the Kramers and Smoluchowski limits of the oscillator. These provide novel methods of analyzing the inertial motion of colloidal particles in optical traps.
    Comment: add published version
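
    A minimal scalar analogue of this estimation problem (a sketch under simplifying assumptions, not the paper's multivariate method): an OU path observed at spacing dt is an exact AR(1) process, so drift and diffusion estimates follow in O(N) from three lag sums, the one-dimensional counterparts of the paper's sufficient statistic matrices.

        import numpy as np

        # Scalar OU: dX = -theta X dt + sigma dW, sampled at spacing dt.
        # Exactly X[k+1] = a X[k] + noise with a = exp(-theta dt) and
        # noise variance s2 = sigma^2 (1 - a^2) / (2 theta).
        rng = np.random.default_rng(1)
        theta, sigma, dt, N = 2.0, 0.5, 0.01, 100_000
        a = np.exp(-theta * dt)
        s2 = sigma**2 * (1 - a**2) / (2 * theta)
        x = np.zeros(N)
        for k in range(N - 1):
            x[k + 1] = a * x[k] + np.sqrt(s2) * rng.standard_normal()

        S00 = np.dot(x[:-1], x[:-1])   # sufficient statistics, computed in O(N)
        S01 = np.dot(x[:-1], x[1:])
        S11 = np.dot(x[1:], x[1:])

        a_hat = S01 / S00                              # conditional MLE of the AR(1) coefficient
        theta_hat = -np.log(a_hat) / dt                # drift estimate
        resid_var = (S11 - a_hat * S01) / (N - 1)      # residual noise variance
        sigma2_hat = 2 * theta_hat * resid_var / (1 - a_hat**2)
        print(theta_hat, np.sqrt(sigma2_hat))          # should be near (2.0, 0.5)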

    Managing engineering systems with large state and action spaces through deep reinforcement learning

    Decision-making for engineering systems can be efficiently formulated as a Markov Decision Process (MDP) or a Partially Observable MDP (POMDP). Typical MDP and POMDP solution procedures utilize offline knowledge about the environment and provide detailed policies for relatively small systems with tractable state and action spaces. However, in large multi-component systems the sizes of these spaces easily explode, as system states and actions scale exponentially with the number of components, whereas environment dynamics are difficult to describe in explicit form for the entire system and may only be accessible through numerical simulators. In this work, to address these issues, an integrated Deep Reinforcement Learning (DRL) framework is introduced. The Deep Centralized Multi-agent Actor Critic (DCMAC), an off-policy actor-critic DRL approach, is developed, providing efficient life-cycle policies for large multi-component systems operating in high-dimensional spaces. Apart from deep function approximations that parametrize large state spaces, DCMAC also adopts a factorized representation of the system actions, being able to designate individualized component- and subsystem-level decisions while maintaining a centralized value function for the entire system. DCMAC compares well against Deep Q-Network (DQN) solutions and exact policies, where applicable, and outperforms optimized baselines that are based on time-based, condition-based, and periodic policies.
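
    A minimal sketch of the factorized-action idea, assuming PyTorch (the layer sizes and module names are invented here, not the paper's DCMAC implementation): per-component policy heads avoid enumerating the exponentially large joint action space, while a single value head scores the whole system.

        import torch
        import torch.nn as nn

        class FactorizedActorCritic(nn.Module):
            def __init__(self, state_dim, n_components, actions_per_component):
                super().__init__()
                # Shared encoder of the full system state.
                self.encoder = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU())
                # One small head per component: the joint action space of size
                # actions_per_component ** n_components is never enumerated.
                self.actor_heads = nn.ModuleList(
                    nn.Linear(128, actions_per_component) for _ in range(n_components)
                )
                self.value_head = nn.Linear(128, 1)  # centralized critic

            def forward(self, state):
                h = self.encoder(state)
                logits = [head(h) for head in self.actor_heads]  # one distribution per component
                return logits, self.value_head(h)

        model = FactorizedActorCritic(state_dim=20, n_components=10, actions_per_component=4)
        logits, value = model(torch.randn(1, 20))
        actions = [torch.distributions.Categorical(logits=l).sample() for l in logits]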

    A Bernstein-type Inequality for Some Mixing Processes and Dynamical Systems with an Application to Learning

    We establish a Bernstein-type inequality for a class of stochastic processes that includes the classical geometrically $\phi$-mixing processes, Rio's generalization of these processes, as well as many time-discrete dynamical systems. Modulo a logarithmic factor and some constants, our Bernstein-type inequality coincides with the classical Bernstein inequality for i.i.d. data. We further use this new Bernstein-type inequality to derive an oracle inequality for generic regularized empirical risk minimization algorithms and data generated by such processes. Applying this oracle inequality to support vector machines using Gaussian kernels for both least squares and quantile regression, it turns out that the resulting learning rates match, up to some arbitrarily small extra term in the exponent, the optimal rates for i.i.d. processes.
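
    For reference, the $\phi$-mixing coefficient appearing here is the standard one (a textbook definition, not quoted from the paper):

        \[
          \phi(n) = \sup_{k \ge 1}\, \sup\Big\{\, \big| P(B \mid A) - P(B) \big| :
          A \in \sigma(X_1, \dots, X_k),\; P(A) > 0,\; B \in \sigma(X_{k+n}, X_{k+n+1}, \dots) \Big\},
        \]

    with geometric $\phi$-mixing again meaning $\phi(n) \le c\,\exp(-b n^{\gamma})$. Since $\phi$-mixing implies $\alpha$-mixing, this is a strictly stronger dependence condition than the one in the preceding entries.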

    A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

    The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategies as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, making a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to validate the innovation and significance of new work. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning, and multi-armed bandits. Further, we reflect on the principal approaches by which algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy using these categories and key characteristics of the environment (e.g., observability) and the adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify further, we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield the most merit, and point to promising avenues of future research.
    Comment: 64 pages, 7 figures. Under review since November 201
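
    As a toy illustration of the survey's "forget" category (an example invented here, not taken from the survey): exponential recency weighting tracks an opponent that switches strategy, while a plain running average (the "ignore" baseline) stays dominated by pre-switch behaviour.

        import numpy as np

        rng = np.random.default_rng(0)
        n_actions, forget = 3, 0.05
        counts = np.zeros(n_actions)                   # "ignore": running counts
        belief = np.full(n_actions, 1.0 / n_actions)   # "forget": exponential recency weighting

        for t in range(2000):
            # Opponent abruptly switches strategy halfway through.
            p_true = [0.8, 0.1, 0.1] if t < 1000 else [0.1, 0.1, 0.8]
            a = rng.choice(n_actions, p=p_true)
            counts[a] += 1
            belief *= (1 - forget)                     # belief stays a probability vector
            belief[a] += forget

        print(counts / counts.sum())  # stale: still blends both regimes
        print(belief)                 # tracks the post-switch strategy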

    On Parameter Estimation of Hidden Telegraph Process

    The problem of parameter estimation from observations of a two-state telegraph process in the presence of white Gaussian noise is considered. The properties of the method-of-moments estimator are described in the asymptotics of large samples. This estimator is then used as a preliminary one to construct the one-step MLE-process, which provides asymptotically normal and asymptotically efficient estimation of the unknown parameters.
    Comment: 26 pages
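
    A minimal simulation sketch of the method-of-moments step (an invented instance under simplifying assumptions -- the two states are taken to be known values 0 and 1 -- not the paper's exact setup): the stationary mean identifies the occupation probability, and the lag-one autocovariance identifies the total switching rate, since the noise is independent across observations.

        import numpy as np

        # Telegraph process on {0,1} with switching rates lam01 (0->1), lam10 (1->0),
        # observed at spacing dt through additive Gaussian noise. Moment relations:
        #   E[Y] = p,  p = lam01 / (lam01 + lam10),
        #   Cov(Y_i, Y_{i+1}) = p (1 - p) exp(-(lam01 + lam10) dt).
        rng = np.random.default_rng(2)
        lam01, lam10, sigma, dt, N = 1.0, 2.0, 0.3, 0.05, 200_000
        lam = lam01 + lam10
        p = lam01 / lam
        e = np.exp(-lam * dt)

        x = np.empty(N)
        x[0] = rng.random() < p
        for i in range(N - 1):  # exact two-state Markov chain at spacing dt
            prob1 = p + (1 - p) * e if x[i] else p * (1 - e)
            x[i + 1] = rng.random() < prob1
        y = x + sigma * rng.standard_normal(N)

        m = y.mean()                                   # estimates p
        c1 = np.mean((y[:-1] - m) * (y[1:] - m))       # lag-1 autocovariance
        lam_hat = -np.log(c1 / (m * (1 - m))) / dt     # total switching rate
        print(m * lam_hat, (1 - m) * lam_hat)          # estimates (lam01, lam10)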

    Bayesian Nonparametric Spectral Estimation

    Spectral estimation (SE) aims to identify how the energy of a signal (e.g., a time series) is distributed across different frequencies. This can become particularly challenging when only partial and noisy observations of the signal are available, where current methods fail to handle uncertainty appropriately. In this context, we propose a joint probabilistic model for signals, observations, and spectra, where SE is addressed as an exact inference problem. Assuming a Gaussian process prior over the signal, we apply Bayes' rule to find the analytic posterior distribution of the spectrum given a set of observations. Besides its expressiveness and natural account of spectral uncertainty, the proposed model also provides a functional-form representation of the power spectral density, which can be optimised efficiently. Comparison with previous approaches, in particular against the Lomb-Scargle method, is addressed theoretically and also experimentally in three different scenarios. Code and demo available at https://github.com/GAMES-UChile/BayesianSpectralEstimation.
    Comment: 11 pages. In Advances in Neural Information Processing Systems, 201
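
    A toy rendition of the "SE as inference" idea (a sketch, not the paper's model): because the Fourier transform is linear, a GP posterior over the latent signal induces a posterior over its spectrum, which the snippet below approximates by FFT-ing posterior signal samples drawn from a GP fit to partial, noisy observations.

        import numpy as np

        rng = np.random.default_rng(3)

        def k(a, b, ell=0.3):
            # Squared-exponential kernel (assumed here, for illustration).
            return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

        t_obs = np.sort(rng.uniform(0, 10, 40))        # partial, irregular observations
        y_obs = np.sin(2 * np.pi * 1.5 * t_obs) + 0.1 * rng.standard_normal(40)
        t_grid = np.linspace(0, 10, 256)

        K = k(t_obs, t_obs) + 0.1**2 * np.eye(40)      # noisy-observation Gram matrix
        Ks = k(t_grid, t_obs)
        mu = Ks @ np.linalg.solve(K, y_obs)            # GP posterior mean on the grid
        cov = k(t_grid, t_grid) - Ks @ np.linalg.solve(K, Ks.T)

        samples = rng.multivariate_normal(mu, cov + 1e-8 * np.eye(256), size=200)
        spectra = np.abs(np.fft.rfft(samples, axis=1))**2   # posterior over periodograms
        print(spectra.mean(axis=0)[:5])                # spectral mean ...
        print(spectra.std(axis=0)[:5])                 # ... and spectral uncertainty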

    Nonparametric Online Learning Using Lipschitz Regularized Deep Neural Networks

    Deep neural networks are considered to be state-of-the-art models in many offline machine learning tasks. However, their performance and generalization abilities in online learning tasks are much less understood. We therefore focus on online learning and tackle the challenging setting where the underlying process is only stationary and ergodic, thus removing the i.i.d. assumption and allowing observations to depend on each other arbitrarily. We prove generalization abilities of Lipschitz regularized deep neural networks and show that, by using such networks, convergence to the best possible prediction strategy is guaranteed.
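
    One common way to impose a Lipschitz constraint in practice is sketched below, assuming PyTorch (an assumed recipe, not necessarily the paper's construction): penalize the product of the layers' spectral norms, which upper-bounds the Lipschitz constant of a ReLU network.

        import torch
        import torch.nn as nn

        net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
        opt = torch.optim.SGD(net.parameters(), lr=1e-2)
        lam = 1e-3                                     # regularization weight (illustrative)

        x = torch.randn(256, 4)
        y = x.sum(dim=1, keepdim=True)                 # toy stand-in for an online stream
        for xb, yb in zip(x.split(32), y.split(32)):
            loss = ((net(xb) - yb) ** 2).mean()
            # Product of spectral norms bounds the network's Lipschitz constant
            # (ReLU is 1-Lipschitz), so penalizing it regularizes that constant.
            lip = torch.ones(())
            for m in net:
                if isinstance(m, nn.Linear):
                    lip = lip * torch.linalg.matrix_norm(m.weight, ord=2)
            loss = loss + lam * lip
            opt.zero_grad()
            loss.backward()
            opt.step()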