Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
Slip and Adhesion in a Railway Wheelset Simulink Model Proposed for Detection Driving Conditions Via Neural Networks
The steadily growing use of locomotives with very high tractive power in modern railway transport has caused problems with delivering optimal torque from the motor to the wheelsets. Losses arising from inadequate torque values lead to wheel slip, which brings excessive wear and limited acceleration. In models simulating the dynamics of torque transmission from the drive units to the wheels, the most important parts are the submodel of the drive and the submodel of the balance between traction forces and running resistances. Research carried out within a PhD program and the SGS (CTU Student Grant Competition) has focused on increasing the quality of these submodels. This contribution presents an innovated part of the existing Simulink model that uses new data sources and modeling techniques. This improvement supports the application of operating-point detection methods based on machine learning. New control facilities provided by pulse-width-modulated frequency control of the asynchronous motor will be used to set optimal operating points automatically. The data obtained via simulation are intended for on-line training of a polynomial neural unit as an approximation of the current driving conditions.
A Generic Approach for Escaping Saddle points
A central challenge to using first-order methods for optimizing nonconvex
problems is the presence of saddle points. First-order methods often get stuck
at saddle points, greatly deteriorating their performance. Typically, to escape
from saddles one has to use second-order methods. However, most works on
second-order methods rely extensively on expensive Hessian-based computations,
making them impractical in large-scale settings. To tackle this challenge, we
introduce a generic framework that minimizes Hessian-based computations while
at the same time provably converging to second-order critical points. Our
framework carefully alternates between a first-order and a second-order
subroutine, using the latter only close to saddle points, and yields
convergence results competitive with the state of the art. Empirical results
suggest that our strategy also enjoys good practical performance.
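
The alternation described in this abstract can be illustrated with a toy sketch. It is not the authors' algorithm: the two-variable test function, the step size `lr`, the thresholds `grad_tol` and `curv_tol`, and the `kick` length are all assumptions chosen for illustration. Plain gradient descent plays the role of the first-order subroutine, and a Hessian eigenvector step is taken only when the gradient is nearly zero.

```python
import math

# Hypothetical test function with a saddle at (0, 0) and minima at (0, +1) and (0, -1).
def f(p):
    x, y = p
    return 0.5 * x**2 + 0.25 * y**4 - 0.5 * y**2

def grad(p):
    x, y = p
    return (x, y**3 - y)

def hess(p):
    x, y = p
    return ((1.0, 0.0), (0.0, 3.0 * y**2 - 1.0))

def min_eig(h):
    """Smallest eigenvalue and a unit eigenvector of a symmetric 2x2 matrix."""
    (a, b), (_, d) = h
    lam = 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)
    # Both candidates are eigenvectors for lam; keep the better-conditioned one.
    v1, v2 = (b, lam - a), (lam - d, b)
    v = v1 if math.hypot(*v1) >= math.hypot(*v2) else v2
    n = math.hypot(*v)
    if n == 0.0:  # matrix is a multiple of the identity; any direction works
        return lam, (1.0, 0.0)
    return lam, (v[0] / n, v[1] / n)

def escape_saddles(p, lr=0.1, grad_tol=1e-6, curv_tol=1e-3, kick=0.1, max_iter=10_000):
    """Gradient descent, with a Hessian step only where the gradient is tiny."""
    second_order_calls = 0
    for _ in range(max_iter):
        g = grad(p)
        if math.hypot(*g) > grad_tol:
            # First-order subroutine: ordinary gradient step.
            p = (p[0] - lr * g[0], p[1] - lr * g[1])
            continue
        # Near a critical point: inspect curvature (the expensive part).
        second_order_calls += 1
        lam, v = min_eig(hess(p))
        if lam >= -curv_tol:
            break  # no significant negative curvature: second-order critical point
        # Second-order subroutine: move along the negative-curvature direction.
        p = (p[0] + kick * v[0], p[1] + kick * v[1])
    return p, second_order_calls

if __name__ == "__main__":
    point, calls = escape_saddles((1.0, 0.0))
    print(point, f(point), calls)
```

Starting on the line y = 0, gradient descent alone would converge to the saddle at the origin; the curvature check is what moves the iterate off that line, and the Hessian is evaluated only a handful of times.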