A Survey of Adaptive Resonance Theory Neural Network Models for Engineering Applications
This survey samples from the ever-growing family of adaptive resonance theory
(ART) neural network models used to perform the three primary machine learning
modalities, namely, unsupervised, supervised and reinforcement learning. It
comprises a representative list from classic to modern ART models, thereby
painting a general picture of the architectures developed by researchers over
the past 30 years. The learning dynamics of these ART models are briefly
described, and their distinctive characteristics such as code representation,
long-term memory and corresponding geometric interpretation are discussed.
Useful engineering properties of ART (speed, configurability, explainability,
parallelization and hardware implementation) are examined along with current
challenges. Finally, a compilation of online software libraries is provided. It
is expected that this overview will be helpful to new and seasoned ART
researchers.
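As a toy illustration of the resonance-and-vigilance dynamics these models share, here is a minimal ART-1-style clustering sketch for binary inputs; the function name, fast-learning update, and parameter values are illustrative assumptions, not code from any library the survey compiles.

```python
# Minimal ART-1-style clustering sketch (binary inputs, fast learning).
# All names and parameter values here are illustrative assumptions.

def art1_cluster(patterns, rho=0.7, alpha=0.001):
    """Cluster binary patterns; rho is the vigilance parameter."""
    weights = []  # long-term memory: one binary template per category
    labels = []
    for I in patterns:
        I = [int(x) for x in I]
        # Rank existing categories by the choice (activation) function.
        ranked = sorted(
            range(len(weights)),
            key=lambda j: -sum(a & b for a, b in zip(I, weights[j]))
                          / (alpha + sum(weights[j])),
        )
        for j in ranked:
            match = sum(a & b for a, b in zip(I, weights[j]))
            # Vigilance test: resonance only if the match is close enough.
            if match / max(sum(I), 1) >= rho:
                weights[j] = [a & b for a, b in zip(I, weights[j])]
                labels.append(j)
                break
        else:
            weights.append(I)  # no resonance: recruit a new category
            labels.append(len(weights) - 1)
    return labels, weights
```

The intersection-based template update gives the hyperbox-style geometric interpretation mentioned above: each category's memory only shrinks toward the common features of its members.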
Faster Rates for Policy Learning
This article improves the existing proven rates of regret decay in optimal
policy estimation. We give a margin-free result showing that the regret decay
for estimating a within-class optimal policy is second-order for empirical risk
minimizers over Donsker classes, with regret decaying at a faster rate than the
standard error of an efficient estimator of the value of an optimal policy. We
also give a result from the classification literature that shows that faster
regret decay is possible via plug-in estimation provided a margin condition
holds. Four examples are considered. In these examples, the regret is expressed
in terms of either the mean value or the median value; the number of possible
actions is either two or finitely many; and the sampling scheme is either
independent and identically distributed or sequential, where the latter
represents a contextual bandit sampling scheme.
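The plug-in route can be sketched for the simplest setting of two actions and a discrete context: estimate the mean outcome of each action within each context, and let the estimated policy pick the action with the larger estimated mean. The function name and data layout below are hypothetical illustrations, not the article's estimator.

```python
# Hypothetical plug-in policy sketch: two actions, discrete contexts,
# i.i.d. samples. Not the article's estimator; for illustration only.
from collections import defaultdict

def plugin_policy(data):
    """data: iterable of (context, action, outcome) with action in {0, 1}.
    Returns a dict mapping each observed context to the action whose
    estimated mean outcome is larger (the plug-in rule)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, a, y in data:
        sums[(x, a)] += y
        counts[(x, a)] += 1

    def mean(x, a):
        return sums[(x, a)] / counts[(x, a)] if counts[(x, a)] else 0.0

    contexts = {x for x, _, _ in data}
    return {x: int(mean(x, 1) > mean(x, 0)) for x in contexts}
```

Under a margin condition, contexts where the two conditional means are well separated are classified correctly with high probability, which is the source of the faster regret decay discussed above.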
Learning in Real-Time Search: A Unifying Framework
Real-time search methods are suited for tasks in which the agent is
interacting with an initially unknown environment in real time. In such
simultaneous planning and learning problems, the agent has to select its
actions in a limited amount of time, while sensing only a local part of the
environment centered at the agent's current location. Real-time heuristic search
agents select actions by performing a limited lookahead search and evaluating the
frontier states with a heuristic function. Over repeated experiences, they
refine heuristic values of states to avoid infinite loops and to converge to
better solutions. The prevalence of such settings in autonomous software and
hardware agents has led to an explosion of real-time search algorithms over the
last two decades. A potential user is confronted not only with a hodgepodge of
algorithms but also with the choice of each algorithm's control parameters. In
this paper we address both problems. The first contribution is an introduction
of a simple three-parameter framework (named LRTS) which extracts the core
ideas behind many existing algorithms. We then prove that LRTA*, epsilon-LRTA*,
SLA*, and gamma-Trap algorithms are special cases of our framework. Thus, they
are unified and extended with additional features. Second, we prove
completeness and convergence of any algorithm covered by the LRTS framework.
Third, we prove several upper-bounds relating the control parameters and
solution quality. Finally, we analyze the influence of the three control
parameters empirically in the realistic scalable domains of real-time
navigation on initially unknown maps from a commercial role-playing game as
well as routing in ad hoc sensor networks.
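The core loop shared by the algorithms this framework unifies (limited lookahead, heuristic evaluation of the frontier, refinement of heuristic values to escape loops) can be sketched as follows; the one-step lookahead, unit edge costs, and explicit graph encoding are simplifying assumptions, not the LRTS framework itself.

```python
# One-step-lookahead LRTA*-style sketch on an explicit graph.
# Unit edge costs and the graph encoding are illustrative assumptions.

def lrta_star(graph, h, start, goal, max_steps=100):
    """graph: dict node -> list of neighbor nodes (unit edge cost).
    h: dict of heuristic values, refined in place across the run.
    Returns the path travelled, or None if max_steps is exhausted."""
    s, path = start, [start]
    for _ in range(max_steps):
        if s == goal:
            return path
        # Evaluate the lookahead frontier: one-step cost plus heuristic.
        best = min(graph[s], key=lambda n: 1 + h.get(n, 0))
        # Learning step: raise h(s) so revisits avoid infinite loops.
        h[s] = max(h.get(s, 0), 1 + h.get(best, 0))
        s = best
        path.append(s)
    return None
```

Because `h` persists between trials, repeated runs on the same map refine the heuristic and converge to better solutions, which is the convergence behavior the paper proves for the whole LRTS family.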
Log Barriers for Safe Black-box Optimization with Application to Safe Reinforcement Learning
Optimizing noisy functions online, when evaluating the objective requires
experiments on a deployed system, is a crucial task arising in manufacturing,
robotics and many other domains. Often, constraints on safe inputs are unknown ahead
of time, and we only obtain noisy information, indicating how close we are to
violating the constraints. Yet, safety must be guaranteed at all times, not
only for the final output of the algorithm.
We introduce a general approach for seeking a stationary point in high
dimensional non-linear stochastic optimization problems in which maintaining
safety during learning is crucial. Our approach, called LB-SGD, is based on
applying stochastic gradient descent (SGD) with a carefully chosen adaptive
step size to a logarithmic barrier approximation of the original problem. We
provide a complete convergence analysis for non-convex, convex, and
strongly convex smooth constrained problems, with first-order and zeroth-order
feedback. Our approach yields efficient updates and scales better with
dimensionality compared to existing approaches.
We empirically compare the sample complexity and the computational cost of
our method with existing safe learning approaches. Beyond synthetic benchmarks,
we demonstrate the effectiveness of our approach on minimizing constraint
violation in policy search tasks in safe reinforcement learning (RL).
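A crude sketch of the log-barrier idea might look like the following; it is one-dimensional, uses deterministic gradients, and caps the step size with a simple slack-based rule that stands in for the paper's adaptive step size, so it illustrates only the safety mechanism, not LB-SGD itself.

```python
# Simplified sketch of the log-barrier idea behind safe optimization:
# descend the surrogate f(x) - eta * sum(log(-g_i(x))) with a step size
# capped so the iterate stays strictly feasible. The cap below is a
# crude stand-in for the paper's adaptive step-size rule.

def log_barrier_descent(grad_f, gs, grad_gs, x, eta=0.1, steps=200):
    """Minimize f subject to g_i(x) <= 0 in one dimension (illustrative).
    gs / grad_gs: lists of constraint functions and their derivatives."""
    for _ in range(steps):
        # Gradient of the barrier surrogate at the current iterate.
        grad = grad_f(x) + eta * sum(
            dg(x) / (-g(x)) for g, dg in zip(gs, grad_gs)
        )
        # Cap the step so the slack at most halves: no constraint can
        # ever reach zero, so safety holds at every iterate.
        slack = min(-g(x) for g in gs)
        gamma = min(0.01, 0.5 * slack / (abs(grad) + 1e-12))
        x = x - gamma * grad
    return x
```

For example, minimizing f(x) = (x - 2)^2 subject to x <= 1 keeps every iterate strictly below the boundary while approaching it from the feasible side, which is the "safe at all times" behavior described above.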