
    Reinforcement Learning: A Survey

    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word "reinforcement." The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning. Comment: See http://www.jair.org/ for any accompanying file.
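The exploration/exploitation trade-off and learning from delayed reinforcement can be made concrete with tabular Q-learning, one classic method in this area. The sketch below is illustrative only and is not taken from the survey: the five-state chain environment, the helper names (`step`, `greedy`, `q_learning`), and the hyperparameters (`alpha`, `gamma`, `epsilon`, `max_steps`) are all hypothetical. The agent explores with probability `epsilon` and otherwise exploits its current estimates, and the single reward at the end of the chain propagates back to earlier states through the bootstrapped update.

```python
import random

# Hypothetical toy MDP: a 5-state chain. Action 0 moves left, action 1 moves right.
# The only reward (1.0) comes on reaching the rightmost state, which ends the
# episode, so earlier actions must be credited through delayed reinforcement.
N_STATES = 5
ACTIONS = (0, 1)


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    if action == 1:
        nxt = min(state + 1, N_STATES - 1)
    else:
        nxt = max(state - 1, 0)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True
    return nxt, 0.0, False


def greedy(q_row, rng):
    """Pick a highest-valued action, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # Exploration vs. exploitation: random action with probability epsilon.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q[state], rng)
            nxt, reward, done = step(state, action)
            # One-step Q-learning target; the terminal state has zero future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print("greedy action per non-terminal state:", policy)
```

With these settings the greedy policy extracted from the table should choose "right" (action 1) in every non-terminal state, since that is the only route to the reward.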

    Slip and Adhesion in a Railway Wheelset Simulink Model Proposed for Detection Driving Conditions Via Neural Networks

    The steadily growing operation of locomotives with very high tractive power in modern railway transport has caused problems with transmitting optimal torque from the motor to the wheelsets. Losses arising from inadequate torque values lead to wheel slip, accompanied by excessive wear and limited acceleration. In models simulating the dynamics of torque transmission from the drive units to the wheels, the most important components are the drive submodel and the submodel of the balance between traction forces and running resistances. Research in this field conducted within a PhD program and the SGS (CTU Student Grant Competition) has focused on improving the quality of these submodels. This contribution focuses on an improved part of the existing Simulink model that uses new data sources and modeling techniques. The improvement supports the application of operating-point detection methods based on machine learning. New control capabilities provided by pulse-width-modulated frequency control of the asynchronous motor will be used to supply optimal operating points automatically. The data obtained from simulation are intended for online training of a polynomial neural unit that approximates the current driving conditions.
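The abstract does not specify the order of the polynomial neural unit or its learning rule, so the following is only a minimal sketch under assumptions: a quadratic (second-order) unit whose output is a weighted sum of all pairwise products of the bias-augmented inputs, trained sample by sample with a normalized gradient (NLMS-style) update. The two inputs, the target function `simulated_condition`, and the names `qnu_features` and `QuadraticNeuralUnit` are hypothetical placeholders for the Simulink-generated data and the authors' model.

```python
import random


def qnu_features(x):
    """All products z_i * z_j (i <= j) of the augmented input z = [1, x_1, ..., x_n]."""
    z = [1.0] + list(x)
    return [z[i] * z[j] for i in range(len(z)) for j in range(i, len(z))]


class QuadraticNeuralUnit:
    """Second-order polynomial neural unit: y = sum_{i<=j} w_ij * z_i * z_j."""

    def __init__(self, n_inputs, mu=0.5):
        n_terms = (n_inputs + 1) * (n_inputs + 2) // 2
        self.w = [0.0] * n_terms
        self.mu = mu  # step size; normalized LMS is typically run with 0 < mu < 2

    def predict(self, x):
        return sum(w * f for w, f in zip(self.w, qnu_features(x)))

    def update(self, x, target):
        """One online (sample-by-sample) normalized gradient step; returns the a priori error."""
        phi = qnu_features(x)
        err = target - sum(w * f for w, f in zip(self.w, phi))
        scale = self.mu * err / (1.0 + sum(f * f for f in phi))
        self.w = [w + scale * f for w, f in zip(self.w, phi)]
        return err


# Hypothetical stand-in for simulated driving data: two normalized inputs in [-1, 1]
# (for example, scaled slip and speed) and an invented quadratic target.
def simulated_condition(s, v):
    return 0.3 + 0.5 * s - 0.8 * s * s + 0.1 * v


rng = random.Random(1)
unit = QuadraticNeuralUnit(n_inputs=2)
for _ in range(5000):
    s, v = rng.uniform(-1, 1), rng.uniform(-1, 1)
    unit.update((s, v), simulated_condition(s, v))

print(unit.predict((0.2, -0.4)), simulated_condition(0.2, -0.4))
```

Because the invented target is itself quadratic, the unit can represent it exactly, and after a few thousand online updates its predictions should track the target closely; real driving-condition data would not have this property.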

    A Generic Approach for Escaping Saddle points

    A central challenge in using first-order methods to optimize nonconvex problems is the presence of saddle points. First-order methods often get stuck at saddle points, which greatly degrades their performance. Typically, escaping saddle points requires second-order methods. However, most work on second-order methods relies heavily on expensive Hessian-based computations, which makes these methods impractical in large-scale settings. To address this challenge, we introduce a generic framework that minimizes Hessian-based computations while provably converging to second-order critical points. Our framework carefully alternates between a first-order and a second-order subroutine, using the latter only close to saddle points, and yields convergence results competitive with the state of the art. Empirical results suggest that our strategy also performs well in practice.
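The alternation the abstract describes, cheap first-order steps away from saddles and a second-order check only near them, can be sketched on a toy two-dimensional problem. This is not the paper's framework: the test function `f`, the thresholds `eps_g` and `eps_h`, the step sizes `eta` and `neg_step`, the curvature bound `L`, and the choice of power iteration with finite-difference Hessian-vector products as the second-order subroutine are all assumptions made for illustration. Starting exactly on the saddle's stable manifold, plain gradient descent stalls at (0, 0); the second-order subroutine detects the negative curvature there and pushes the iterate toward a minimum.

```python
import math


# Hypothetical nonconvex test function with a strict saddle at (0, 0)
# and minima at (0, -1) and (0, 1), where f = -0.25.
def f(p):
    x, y = p
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2


def grad(p):
    x, y = p
    return [x, y**3 - y]


def norm(u):
    return math.sqrt(sum(ui * ui for ui in u))


def hvp(p, v, r=1e-5):
    """Hessian-vector product from two gradient calls (central differences); no Hessian is formed."""
    gp = grad([pi + r * vi for pi, vi in zip(p, v)])
    gm = grad([pi - r * vi for pi, vi in zip(p, v)])
    return [(a - b) / (2 * r) for a, b in zip(gp, gm)]


def min_curvature(p, L=10.0, iters=100):
    """Second-order subroutine: estimate the smallest Hessian eigenpair by power
    iteration on (L*I - H); assumes L upper-bounds the Hessian's largest eigenvalue."""
    v = [1.0 / math.sqrt(2.0)] * 2
    for _ in range(iters):
        hv = hvp(p, v)
        w = [L * vi - hi for vi, hi in zip(v, hv)]
        n = norm(w)
        v = [wi / n for wi in w]
    lam = sum(vi * hi for vi, hi in zip(v, hvp(p, v)))  # Rayleigh quotient v^T H v
    return lam, v


def escape_saddles(p, eta=0.1, eps_g=1e-3, eps_h=1e-3, neg_step=0.5, max_iter=10_000):
    for _ in range(max_iter):
        g = grad(p)
        if norm(g) > eps_g:
            # First-order subroutine: a cheap gradient step while the gradient is large.
            p = [pi - eta * gi for pi, gi in zip(p, g)]
            continue
        # Near a stationary point: call the (more expensive) second-order subroutine.
        lam, v = min_curvature(p)
        if lam >= -eps_h:
            return p  # approximate second-order critical point
        # Negative curvature found: step along v, with the sign chosen so that g.v <= 0.
        sign = -1.0 if sum(gi * vi for gi, vi in zip(g, v)) > 0 else 1.0
        p = [pi + sign * neg_step * vi for pi, vi in zip(p, v)]
    return p


start = [1.0, 0.0]  # plain gradient descent from here converges to the saddle (0, 0)
p = escape_saddles(start)
print(p, f(p))
```

The Hessian is touched only through `hvp`, and only once the gradient norm falls below `eps_g`, which mirrors the abstract's goal of limiting Hessian-based computation to the neighborhood of saddle points.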