636 research outputs found

    Agent Behavior Prediction and Its Generalization Analysis

    Full text link
    Machine learning algorithms have been applied to predict agent behaviors in real-world dynamic systems, such as advertiser behaviors in sponsored search and worker behaviors in crowdsourcing. The behavior data in these systems are generated by live agents: once the systems change due to the adoption of the prediction models learnt from the behavior data, agents will observe and respond to these changes by changing their own behaviors accordingly. As a result, the behavior data will evolve and will not be identically and independently distributed, posing great challenges to the theoretical analysis on the machine learning algorithms for behavior prediction. To tackle this challenge, in this paper, we propose to use Markov Chain in Random Environments (MCRE) to describe the behavior data, and perform generalization analysis of the machine learning algorithms on its basis. Since the one-step transition probability matrix of MCRE depends on both previous states and the random environment, conventional techniques for generalization analysis cannot be directly applied. To address this issue, we propose a novel technique that transforms the original MCRE into a higher-dimensional time-homogeneous Markov chain. The new Markov chain involves more variables but is more regular, and thus easier to deal with. We prove the convergence of the new Markov chain when time approaches infinity. Then we prove a generalization bound for the machine learning algorithms on the behavior data generated by the new Markov chain, which depends on both the Markovian parameters and the covering number of the function class compounded by the loss function for behavior prediction and the behavior prediction model. To the best of our knowledge, this is the first work that performs the generalization analysis on data generated by complex processes in real-world dynamic systems

    Online Variance Reduction for Stochastic Optimization

    Full text link
    Modern stochastic optimization methods often rely on uniform sampling which is agnostic to the underlying characteristics of the data. This might degrade the convergence by yielding estimates that suffer from a high variance. A possible remedy is to employ non-uniform importance sampling techniques, which take the structure of the dataset into account. In this work, we investigate a recently proposed setting which poses variance reduction as an online optimization problem with bandit feedback. We devise a novel and efficient algorithm for this setting that finds a sequence of importance sampling distributions competitive with the best fixed distribution in hindsight, the first result of this kind. While we present our method for sampling datapoints, it naturally extends to selecting coordinates or even blocks of thereof. Empirical validations underline the benefits of our method in several settings.Comment: COLT 201

    Regularized approximate policy iteration using kernel for on-line reinforcement learning

    Get PDF
    By using Reinforcement Learning (RL), an autonomous agent interacting with the environment can learn how to take adequate actions for every situation in order to optimally achieve its own goal. RL provides a general methodology able to solve uncertain and complex decision problems which may be present in many real-world applications. RL problems are usually modeled as a Markov Decision Processes (MDPs) deeply studied in the literature. The main peculiarity of a RL algorithm is that the RL agent is assumed to learn the optimal policies from its experiences without knowing the parameters of the MDP. The key element in solving the MDP is learning a value function which gives the expectation of total reward an agent might expect at its current state taking a given action. This value function allows to obtain the optimal policy. In this thesis we study the capacity of SVR using kernel methods to adapt and solve complex RL problems in large or continuous state space. SVR can be studied using a geometrical interpretation in terms of optimal margin or can be seen as a regularization problem given in a Reproducing Kernel Hilbert Space (RKHS) SVR have good properties over the generalization ability and as they are based a on convex optimization problem, they do not suffer from sub-optimality. SVR are non-parametric showing the ability to automatically adapt to the complexity of the problem. Accordingly, applying SVR to approximate value functions sounds to be a good approach. SVR can be solved both in batch mode when the whole set of training sample are at disposal of the learning agents or incrementally which enables the addition or removal of training samples very effectively. Incremental SVR finds the appropriate KKT conditions for new or updated data by modifying their influences into the regression function maintaining consistence in the KKT conditions for the rest of data used for learning. In RL problems an incremental SVR should be able to approximate the action value function leading to the optimal policy. Accordingly, computation load should be lower, learning speed faster and generalization more effective than other existing method The overall contribution coming from of our work is to develop, formalize, implement and study a new RL technique for generalization in discrete and continuous state spaces with finite actions. Our method uses the Approximate Policy Iteration (API) framework with the BRM criterion which allows to represent the action value function using SVR. This approach for RL is the first one we know using SVR compatible to the agent interaction- with-the-environment framework of RL which shows his power by solving a large number of benchmark problems, including very difficult ones, like the bicycle driving and riding control problem. In addition, unlike most RL approaches to generalization, we develop a proof finding theoretical bounds for the convergence of the method to the optimal solution under given conditions.Mediante el uso de aprendizaje por refuerzo (RL), un agente autónomo interactuando con el medio ambiente puede aprender a tomar adecuada acciones para cada situación con el fin de lograr de manera óptima su propia meta. RL proporciona una metodología general capaz de resolver problemas de decisión complejos que pueden estar presentes en muchas aplicaciones del mundo real. Problemas RL usualmente se modelan como una Procesos de Decisión de Markov (MDP) estudiados profundamente en la literatura. La principal peculiaridad de un algoritmo de RL es que el agente es asumido para aprender las políticas óptimas de sus experiencias sin saber los parámetros de la MDP. El elemento clave en resolver el MDP está en el aprender una función de valor que da la expectativa de recompensa total que un agente puede esperar en su estado actual para tomar una acción determinada. Esta función de valor permite obtener la política óptima. En esta tesis se estudia la capacidad del SVR utilizando núcleo métodos para adaptarse y resolver problemas RL complejas en el espacio estado grande o continua. RVS puede ser estudiado mediante un interpretación geométrica en términos de margen óptimo o puede ser visto como un problema de regularización dado en un Reproducing Kernel Hilbert Space (RKHS). SVR tiene buenas propiedades sobre la capacidad de generalización y ya que se basan en una optimización convexa problema, ellos no sufren de sub-optimalidad. SVR son no paramétrico que muestra la capacidad de adaptarse automáticamente a la complejidad del problema. En consecuencia, la aplicación de RVS para aproximar funciones de valor suena para ser un buen enfoque. SVR puede resolver tanto en modo batch cuando todo el conjunto de muestra de entrenamiento están a disposición de los agentes de aprendizaje o incrementalmente que permite la adición o eliminación de muestras de entrenamiento muy eficaz. Incremental SVR encuentra las condiciones adecuadas para KKT nuevas o actualizadas de datos modificando sus influencias en la función de regresión mantener consistencia en las condiciones KKT para el resto de los datos utilizados para el aprendizaje. En los problemas de RL una RVS elemental será capaz de aproximar la función de valor de acción que conduce a la política óptima. En consecuencia, la carga de cálculo debería ser menor, la velocidad de aprendizaje más rápido y generalización más efectivo que el otro método existente La contribución general que viene de nuestro trabajo es desarrollar, formalizar, ejecutar y estudiar una nueva técnica de RL para la generalización en espacio de estados discretos y continuos con acciones finitas. Nuestro método utiliza el marco de la Approximate Policy Iteration (API) con el criterio de BRM que permite representar la función de valor de acción utilizando SVR. Este enfoque de RL es el primero que conocemos usando SVR compatible con el marco de RL con agentes interaccionado con el ambiente que muestra su poder mediante la resolución de un gran número de problemas de referencia, incluyendo los muy difíciles, como la conducción de bicicletas y problema de control de conducción. Además, a diferencia de la mayoría RL se acerca a la generalización, desarrollamos un hallazgo prueba límites teóricos para la convergencia del método a la solución óptima en condiciones dadas.Postprint (published version

    Decentralized Learning with Separable Data: Generalization and Fast Algorithms

    Full text link
    Decentralized learning offers privacy and communication efficiency when data are naturally distributed among agents communicating over an underlying graph. Motivated by overparameterized learning settings, in which models are trained to zero training loss, we study algorithmic and generalization properties of decentralized learning with gradient descent on separable data. Specifically, for decentralized gradient descent (DGD) and a variety of loss functions that asymptote to zero at infinity (including exponential and logistic losses), we derive novel finite-time generalization bounds. This complements a long line of recent work that studies the generalization performance and the implicit bias of gradient descent over separable data, but has thus far been limited to centralized learning scenarios. Notably, our generalization bounds match in order their centralized counterparts. Critical behind this, and of independent interest, is establishing novel bounds on the training loss and the rate-of-consensus of DGD for a class of self-bounded losses. Finally, on the algorithmic front, we design improved gradient-based routines for decentralized learning with separable data and empirically demonstrate orders-of-magnitude of speed-up in terms of both training and generalization performance
    corecore