Reinforcement Learning: A Survey
This paper surveys the field of reinforcement learning from a
computer-science perspective. It is written to be accessible to researchers
familiar with machine learning. Both the historical basis of the field and a
broad selection of current work are summarized. Reinforcement learning is the
problem faced by an agent that learns behavior through trial-and-error
interactions with a dynamic environment. The work described here has a
resemblance to work in psychology, but differs considerably in the details and
in the use of the word "reinforcement." The paper discusses central issues of
reinforcement learning, including trading off exploration and exploitation,
establishing the foundations of the field via Markov decision theory, learning
from delayed reinforcement, constructing empirical models to accelerate
learning, making use of generalization and hierarchy, and coping with hidden
state. It concludes with a survey of some implemented systems and an assessment
of the practical utility of current methods for reinforcement learning.
Comment: See http://www.jair.org/ for any accompanying file
Slip and Adhesion in a Railway Wheelset Simulink Model Proposed for Detection Driving Conditions Via Neural Networks
The steadily growing operation of locomotives with very high tractive power in modern railway transport has caused problems with delivering optimal torque from the motor to the wheelsets. Losses arising from inadequate torque values lead to wheel slip, which brings excessive wear and limited acceleration. In models that simulate the dynamics of torque transmission from the drive units to the wheels, the most important parts are the submodel of the drive and the submodel of the balance between traction forces and driving resistances. Research within a PhD program and the SGS (CTU Student Grant Competition) has focused on improving the quality of these submodels. This contribution presents an updated part of the existing Simulink model that uses new data sources and modeling techniques. The improvement supports the application of operating-point detection methods based on machine learning. New control facilities, provided by pulse-width-modulated frequency control of the asynchronous motor, will be used to set optimal operating points automatically. The data obtained from simulation are intended for on-line training of a polynomial neural unit as an approximation of the current driving conditions.
Stochastic Optimization with Variance Reduction for Infinite Datasets with Finite-Sum Structure
Stochastic optimization algorithms with variance reduction have proven
successful for minimizing large finite sums of functions. Unfortunately, these
techniques are unable to deal with stochastic perturbations of input data,
induced for example by data augmentation. In such cases, the objective is no
longer a finite sum, and the main candidate for optimization is the stochastic
gradient descent method (SGD). In this paper, we introduce a variance reduction
approach for these settings when the objective is composite and strongly
convex. The convergence rate outperforms SGD with a typically much smaller
constant factor, which depends on the variance of gradient estimates only due
to perturbations on a single example.
Comment: Advances in Neural Information Processing Systems (NIPS), Dec 2017, Long Beach, CA, United States