
    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file

    Slip and Adhesion in a Railway Wheelset Simulink Model Proposed for Detection Driving Conditions Via Neural Networks

    The steadily growing operation of locomotives with very high tractive power in modern railway transport has caused problems with delivering optimal torque from the motor to the wheelsets. Losses arising from inadequate torque values lead to wheel slip, which brings excessive wear and limited acceleration. In models simulating the dynamics of torque transmission from the drive units to the wheels, the most important parts are the submodel of the drive and the submodel of the balance between traction forces and running resistances. Research carried out within a PhD program and the SGS (CTU Student Grant Competition) has focused on improving the quality of these submodels. This contribution presents an updated part of the existing Simulink model that uses new data sources and modeling techniques. The improvement supports the application of operating-point detection methods based on machine learning. New control facilities provided by pulse-width-modulated frequency control of the asynchronous motor will be used to set optimal operating points automatically. The data obtained via simulation are intended for on-line training of a polynomial neural unit as an approximation of the current driving conditions.

    Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure

    Stochastic optimization algorithms with variance reduction have proven successful for minimizing large finite sums of functions. Unfortunately, these techniques are unable to deal with stochastic perturbations of input data, induced for example by data augmentation. In such cases, the objective is no longer a finite sum, and the main candidate for optimization is the stochastic gradient descent method (SGD). In this paper, we introduce a variance reduction approach for these settings when the objective is composite and strongly convex. The convergence rate outperforms SGD with a typically much smaller constant factor, which depends on the variance of gradient estimates only due to perturbations on a single example. Comment: Advances in Neural Information Processing Systems (NIPS), Dec 2017, Long Beach, CA, United States