23,415 research outputs found
Learning in Real-Time Search: A Unifying Framework
Real-time search methods are suited for tasks in which the agent is
interacting with an initially unknown environment in real time. In such
simultaneous planning and learning problems, the agent has to select its
actions in a limited amount of time, while sensing only a local part of the
environment centered at the agents current location. Real-time heuristic search
agents select actions using a limited lookahead search and evaluating the
frontier states with a heuristic function. Over repeated experiences, they
refine heuristic values of states to avoid infinite loops and to converge to
better solutions. The wide spread of such settings in autonomous software and
hardware agents has led to an explosion of real-time search algorithms over the
last two decades. Not only is a potential user confronted with a hodgepodge of
algorithms, but he also faces the choice of control parameters they use. In
this paper we address both problems. The first contribution is an introduction
of a simple three-parameter framework (named LRTS) which extracts the core
ideas behind many existing algorithms. We then prove that LRTA*, epsilon-LRTA*,
SLA*, and gamma-Trap algorithms are special cases of our framework. Thus, they
are unified and extended with additional features. Second, we prove
completeness and convergence of any algorithm covered by the LRTS framework.
Third, we prove several upper-bounds relating the control parameters and
solution quality. Finally, we analyze the influence of the three control
parameters empirically in the realistic scalable domains of real-time
navigation on initially unknown maps from a commercial role-playing game as
well as routing in ad hoc sensor networks
Eligibility Propagation to Speed up Time Hopping for Reinforcement Learning
A mechanism called Eligibility Propagation is proposed to speed up the Time
Hopping technique used for faster Reinforcement Learning in simulations.
Eligibility Propagation provides for Time Hopping similar abilities to what
eligibility traces provide for conventional Reinforcement Learning. It
propagates values from one state to all of its temporal predecessors using a
state transitions graph. Experiments on a simulated biped crawling robot
confirm that Eligibility Propagation accelerates the learning process more than
3 times.Comment: 7 page
A convergence acceleration operator for multiobjective optimisation
A novel multiobjective optimisation accelerator is
introduced that uses direct manipulation in objective space
together with neural network mappings from objective space to decision space. This operator is a portable component that can be hybridized with any multiobjective optimisation algorithm. The purpose of this Convergence Acceleration Operator (CAO) is to enhance the search capability and the speed of convergence of the host algorithm. The operator acts directly in objective space to suggest improvements to solutions obtained by a multiobjective evolutionary algorithm (MOEA). These suggested improved objective vectors are then mapped into decision variable space and tested. The CAO is incorporated with two leading MOEAs, the Non-Dominated Sorting Genetic Algorithm (NSGA-II) and the Strength Pareto Evolutionary Algorithm (SPEA2) and tested. Results show that the hybridized algorithms consistently improve the speed of convergence of the original algorithm whilst maintaining the desired distribution of solutions
Towards efficient SimRank computation on large networks
SimRank has been a powerful model for assessing the similarity of pairs of vertices in a graph. It is based on the concept that two vertices are similar if they are referenced by similar vertices. Due to its self-referentiality, fast SimRank computation on large graphs poses significant challenges. The state-of-the-art work [17] exploits partial sums memorization for computing SimRank in O(Kmn) time on a graph with n vertices and m edges, where K is the number of iterations. Partial sums memorizing can reduce repeated calculations by caching part of similarity summations for later reuse. However, we observe that computations among different partial sums may have duplicate redundancy. Besides, for a desired accuracy ϵ, the existing SimRank model requires K = [logC ϵ] iterations [17], where C is a damping factor. Nevertheless, such a geometric rate of convergence is slow in practice if a high accuracy is desirable. In this paper, we address these gaps. (1) We propose an adaptive clustering strategy to eliminate partial sums redundancy (i.e., duplicate computations occurring in partial sums), and devise an efficient algorithm for speeding up the computation of SimRank to 0(Kdn2) time, where d is typically much smaller than the average in-degree of a graph. (2) We also present a new notion of SimRank that is based on a differential equation and can be represented as an exponential sum of transition matrices, as opposed to the geometric sum of the conventional counterpart. This leads to a further speedup in the convergence rate of SimRank iterations. (3) Using real and synthetic data, we empirically verify that our approach of partial sums sharing outperforms the best known algorithm by up to one order of magnitude, and that our revised notion of SimRank further achieves a 5X speedup on large graphs while also fairly preserving the relative order of original SimRank scores
A similarity-based cooperative co-evolutionary algorithm for dynamic interval multi-objective optimization problems
The file attached to this record is the author's final peer reviewed version. The Publisher's final version can be found by following the DOI link.Dynamic interval multi-objective optimization problems (DI-MOPs) are very common in real-world applications. However, there are few evolutionary algorithms that are suitable for tackling DI-MOPs up to date. A framework of dynamic interval multi-objective cooperative co-evolutionary optimization based on the interval similarity is presented in this paper to handle DI-MOPs. In the framework, a strategy for decomposing decision variables is first proposed, through which all the decision variables are divided into two groups according to the interval similarity between each decision variable and interval parameters. Following that, two sub-populations are utilized to cooperatively optimize decision variables in the two groups. Furthermore, two response strategies, rgb0.00,0.00,0.00i.e., a strategy based on the change intensity and a random mutation strategy, are employed to rapidly track the changing Pareto front of the optimization problem. The proposed algorithm is applied to eight benchmark optimization instances rgb0.00,0.00,0.00as well as a multi-period portfolio selection problem and compared with five state-of-the-art evolutionary algorithms. The experimental results reveal that the proposed algorithm is very competitive on most optimization instances
- …