112,498 research outputs found
A vision-guided parallel parking system for a mobile robot using approximate policy iteration
Reinforcement Learning (RL) methods enable autonomous robots to learn skills from scratch by interacting with the environment. However, reinforcement learning can be very time consuming. This paper focuses on accelerating the reinforcement learning process on a mobile robot in an unknown environment. The presented algorithm is based on approximate policy iteration with a continuous state space and a fixed number of actions. The action-value function is represented by a weighted combination of basis functions.
Furthermore, a complexity analysis is provided to show that the implemented approach is guaranteed to converge on an optimal policy with less computational time.
A parallel parking task is selected for testing purposes. In the experiments, the efficiency of the proposed approach is demonstrated and analyzed through a set of simulated and real robot experiments, with comparison drawn from two well known algorithms (Dyna-Q and Q-learning)
Speculative Approximations for Terascale Analytics
Model calibration is a major challenge faced by the plethora of statistical
analytics packages that are increasingly used in Big Data applications.
Identifying the optimal model parameters is a time-consuming process that has
to be executed from scratch for every dataset/model combination even by
experienced data scientists. We argue that the incapacity to evaluate multiple
parameter configurations simultaneously and the lack of support to quickly
identify sub-optimal configurations are the principal causes. In this paper, we
develop two database-inspired techniques for efficient model calibration.
Speculative parameter testing applies advanced parallel multi-query processing
methods to evaluate several configurations concurrently. The number of
configurations is determined adaptively at runtime, while the configurations
themselves are extracted from a distribution that is continuously learned
following a Bayesian process. Online aggregation is applied to identify
sub-optimal configurations early in the processing by incrementally sampling
the training dataset and estimating the objective function corresponding to
each configuration. We design concurrent online aggregation estimators and
define halting conditions to accurately and timely stop the execution. We apply
the proposed techniques to distributed gradient descent optimization -- batch
and incremental -- for support vector machines and logistic regression models.
We implement the resulting solutions in GLADE PF-OLA -- a state-of-the-art Big
Data analytics system -- and evaluate their performance over terascale-size
synthetic and real datasets. The results confirm that as many as 32
configurations can be evaluated concurrently almost as fast as one, while
sub-optimal configurations are detected accurately in as little as a
fraction of the time
Competitive function approximation for reinforcement learning
The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity is the result of the bootstrapping nature of dynamic programming where the value function is estimated using its current approximation. Biased sampling occurs when some regions of the state space are visited too often, causing a reiterated updating with similar values which fade out the occasional updates of infrequently sampled regions.
We propose a competitive approach for function approximation where many different local approximators are available at a given input and the one with expectedly best approximation is selected by means of a relevance function. The local nature of the approximators allows their fast adaptation to non-stationary changes and mitigates the biased sampling problem. The coexistence of multiple approximators updated and tried in parallel permits obtaining a good estimation much faster than would be possible with a single approximator. Experiments in different benchmark problems show that the competitive strategy provides a faster and more stable learning than non-competitive approaches.Preprin
A Robust Approach to Optimal Matched Filter Design in Ultrasonic Non-Destructive Evaluation (NDE)
The matched filter was demonstrated to be a powerful yet efficient technique to enhance defect detection and imaging in ultrasonic non-destructive evaluation (NDE) of coarse grain materials, provided that the filter was properly designed and optimized. In the literature, in order to accurately approximate the defect echoes, the design utilized the real excitation signals, which made it time consuming and less straightforward to implement in practice. In this paper, we present a more robust and flexible approach to optimal matched filter design using the simulated excitation signals, and the control parameters are chosen and optimized based on the real scenario of array transducer, transmitter-receiver system response, and the test sample, as a result, the filter response is optimized and depends on the material characteristics. Experiments on industrial samples are conducted and the results confirm the great benefits of the method
- …