A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks
A* search is an informed search algorithm that uses a heuristic function to
guide the order in which nodes are expanded. Since the computation required to
expand a node and compute the heuristic values for all of its generated
children grows linearly with the size of the action space, A* search can become
impractical for problems with large action spaces. This computational burden
becomes even more apparent when heuristic functions are learned by general, but
computationally expensive, deep neural networks. To address this problem, we
introduce DeepCubeAQ, a deep reinforcement learning and search algorithm that
builds on the DeepCubeA algorithm and deep Q-networks. DeepCubeAQ learns a
heuristic function that, with a single forward pass through a deep neural
network, computes, for every child of a node, the sum of the transition cost
and the child's heuristic value without explicitly generating any of the
children, eliminating the need for node expansions. DeepCubeAQ then uses a
novel variant of A* search, called AQ* search, that uses the deep Q-network to
guide the search.
We use DeepCubeAQ to solve the Rubik's cube when formulated with a large action
space that includes 1872 meta-actions. We show that this 157-fold increase in
the size of the action space incurs less than a 4-fold increase in computation
time when performing AQ* search, and that AQ* search is orders of magnitude
faster than A* search.
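
To make the deferred-expansion idea concrete, the following minimal sketch shows an AQ*-style best-first search: a Q-network scores every action of a node in one forward pass, and a child is only generated when its queue entry is popped. The interfaces step, q_net, and the weight parameter are hypothetical stand-ins for illustration, not the authors' implementation.

import heapq

def aq_star(start, goal_test, step, num_actions, q_net, weight=1.0):
    # Sketch of AQ*-style deferred expansion. Assumptions (not the authors'
    # implementation): states are hashable, step(state, action) returns
    # (child_state, transition_cost), and q_net(state) returns a vector of
    # length num_actions approximating cost(s, a) + h(child(s, a)).
    counter = 0                                  # tie-breaker for the heap
    frontier = [(0.0, counter, 0.0, start, None)]
    best_g = {}
    while frontier:
        _, _, g, state, action = heapq.heappop(frontier)
        if action is not None:
            # The child is generated only now, when its queue entry is popped.
            state, cost = step(state, action)
            g += cost
        if state in best_g and best_g[state] <= g:
            continue
        best_g[state] = g
        if goal_test(state):
            return g                             # cost of the solution found
        # One forward pass scores all actions without generating any child.
        q_values = q_net(state)
        for a in range(num_actions):
            counter += 1
            heapq.heappush(frontier, (g + weight * float(q_values[a]),
                                      counter, g, state, a))
    return None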
Learning Heuristic Selection with Dynamic Algorithm Configuration
A key challenge in satisficing planning is to use multiple heuristics within
one heuristic search. An aggregation of multiple heuristic estimates, for
example by taking the maximum, has the disadvantage that bad estimates of a
single heuristic can negatively affect the whole search. Since the performance
of a heuristic varies from instance to instance, approaches such as algorithm
selection can be successfully applied. In addition, alternating between
multiple heuristics during the search makes it possible to use all heuristics
equally and improve performance. However, all these approaches ignore the
internal search dynamics of a planning system, which can help to select the
most useful heuristics for the current expansion step. We show that dynamic
algorithm configuration can be used for dynamic heuristic selection, which
takes into account the internal search dynamics of a planning system.
Furthermore, we prove that this approach generalizes over existing approaches
and that it can exponentially improve the performance of the heuristic search.
To learn dynamic heuristic selection, we propose an approach based on
reinforcement learning and show empirically that domain-wise learned policies,
which take the internal search dynamics of a planning system into account, can
outperform existing approaches.
Comment: Long version of the paper at the International Conference on
Automated Planning and Scheduling (ICAPS) 202
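
As an illustration of what dynamic heuristic selection can look like in code, the sketch below runs a greedy best-first search with one open list per heuristic and lets a selection policy pick, at every expansion step, which list to expand from based on simple features of the current search state. The names select and heuristics, and the feature set, are assumptions made for this example rather than the paper's system.

import heapq

def gbfs_with_dynamic_heuristic_selection(start, goal_test, successors,
                                           heuristics, select):
    # One open list per heuristic; select(features) stands in for a learned
    # (e.g. reinforcement-learned) policy that returns the index of the
    # heuristic to trust at the current expansion step.
    open_lists = [[] for _ in heuristics]
    for i, h in enumerate(heuristics):
        heapq.heappush(open_lists[i], (h(start), 0, start, []))
    closed = set()
    tie = 1
    while any(open_lists):
        # Internal search dynamics exposed to the policy: here just the size
        # and minimum h-value of each open list (a stand-in feature vector).
        features = [(len(ol), ol[0][0] if ol else float("inf"))
                    for ol in open_lists]
        i = select(features)
        if not open_lists[i]:                    # fall back to any non-empty list
            i = next(j for j, ol in enumerate(open_lists) if ol)
        _, _, state, path = heapq.heappop(open_lists[i])
        if state in closed:
            continue
        closed.add(state)
        if goal_test(state):
            return path + [state]
        for child in successors(state):
            if child in closed:
                continue
            for j, h in enumerate(heuristics):
                heapq.heappush(open_lists[j], (h(child), tie, child, path + [state]))
                tie += 1
    return None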
Policy-Guided Heuristic Search with Guarantees
The use of a policy and a heuristic function for guiding search can be quite
effective in adversarial problems, as demonstrated by AlphaGo and its
successors, which are based on the PUCT search algorithm. While PUCT can also
be used to solve single-agent deterministic problems, it lacks guarantees on
its search effort and it can be computationally inefficient in practice.
Combining the A* algorithm with a learned heuristic function tends to work
better in these domains, but A* and its variants do not use a policy. Moreover,
the purpose of using A* is to find solutions of minimum cost, while we seek
instead to minimize the search loss (e.g., the number of search steps). LevinTS
is guided by a policy and provides guarantees on the number of search steps
that relate to the quality of the policy, but it does not make use of a
heuristic function. In this work we introduce Policy-guided Heuristic Search
(PHS), a novel search algorithm that uses both a heuristic function and a
policy and has theoretical guarantees on the search loss that relate to both
the quality of the heuristic and that of the policy. We show empirically on the
sliding-tile puzzle, Sokoban, and a puzzle from the commercial game 'The
Witness' that PHS enables the rapid learning of both a policy and a heuristic
function, and that it compares favorably with A*, Weighted A*, Greedy
Best-First Search, LevinTS, and PUCT in terms of the number of problems solved
and search time in all three domains tested.
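
For illustration only, the sketch below orders nodes by (g(n) + h(n)) / pi(n), where pi(n) is the product of the policy's probabilities along the path to n; this combines a heuristic with a policy in the spirit of PHS and of LevinTS's d(n)/pi(n) cost, but it is not necessarily the paper's exact evaluation function, and the successors and policy interfaces are assumptions made for the example.

import heapq
import math

def policy_guided_search(start, goal_test, successors, h, policy):
    # Assumed interfaces: successors(state) maps each action to (child, cost),
    # policy(state) maps each applicable action to its probability, and
    # h(state) is a nonnegative cost-to-go estimate.
    frontier = [(h(start), 0, 0.0, 0.0, start, [])]  # (priority, tie, g, log_pi, state, path)
    tie = 1
    closed = set()
    while frontier:
        _, _, g, log_pi, state, path = heapq.heappop(frontier)
        if state in closed:
            continue
        closed.add(state)
        if goal_test(state):
            return path + [state]
        probs = policy(state)
        for action, (child, cost) in successors(state).items():
            if child in closed:
                continue
            child_g = g + cost
            # Keep pi in log space to avoid underflow on long paths.
            child_log_pi = log_pi + math.log(max(probs.get(action, 0.0), 1e-12))
            priority = (child_g + h(child)) / max(math.exp(child_log_pi), 1e-300)
            heapq.heappush(frontier, (priority, tie, child_g, child_log_pi,
                                      child, path + [state]))
            tie += 1
    return None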
Koneoppimisen soveltaminen graafihakualgoritmien ohjaamisessa (Applying Machine Learning to Guiding Graph Search Algorithms)
Graph structures are used in many different applications to model the regularities of their operating environments. Because graphs have such a wide range of applications, graph search algorithms play a central role in implementing these applications. Highly effective algorithms have been developed for the graph search problem, but growing datasets and increasingly complex application domains place ever higher efficiency demands on graph search algorithms.
Graph search methods can be roughly divided into two groups: weak search methods, which can be run on any graph, and heuristic search methods, which use application-, task-, and graph-specific knowledge to speed up the search. Weak search methods are robust but perform many unnecessary operations during the search, which makes them inefficient. Heuristic search methods can avoid some of these unnecessary steps, but heuristic searches can only be tailored to a limited set of application domains.
In this thesis I present a new graph search method based on machine learning, whose learning task is to produce a model that guides the search in the manner of heuristic search methods. The learning component of the method is based on reinforcement learning, which makes it possible to relax the requirements that efficient search methods place on the graph representation and thereby broaden the range of applications of these search methods.
The performance of the learning search was measured in two experiments. In the first experiment, the learning search was compared with an efficient heuristic search method on search tasks suited to that method. In the second experiment, the learning search was studied on a search task unsuitable for heuristic searches, using simple breadth-first search as the baseline.
Based on the results of the experiments, the learning search can be considered a good way to implement more efficient search methods in application domains where heuristic search cannot be used. However, the learning search presented in this thesis requires a large amount of memory to maintain its internal model, which must be taken into account when applying and further developing it.
Classical Planning in Deep Latent Space
Current domain-independent, classical planners require symbolic models of the
problem domain and instance as input, resulting in a knowledge acquisition
bottleneck. Meanwhile, although deep learning has achieved significant success
in many fields, the knowledge is encoded in a subsymbolic representation which
is incompatible with symbolic systems such as planners. We propose Latplan, an
unsupervised architecture combining deep learning and classical planning. Given
only an unlabeled set of image pairs showing a subset of transitions allowed in
the environment (training inputs), Latplan learns a complete propositional PDDL
action model of the environment. Later, when a pair of images representing the
initial and the goal states (planning inputs) is given, Latplan finds a plan to
the goal state in a symbolic latent space and returns a visualized plan
execution. We evaluate Latplan using image-based versions of six planning
domains: 8-Puzzle, 15-Puzzle, Blocksworld, Sokoban, and two variations of
LightsOut.
Comment: Under review at the Journal of Artificial Intelligence Research (JAIR).
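
A heavily simplified sketch of the planning phase, under assumed interfaces: encode stands in for a trained discrete autoencoder that maps an image to a vector of binary propositions, and actions is the learned action model given as (preconditions, add effects, delete effects) over proposition indices. Both names are hypothetical, and the breadth-first search merely keeps the example self-contained; Latplan itself plans in the latent space with stronger symbolic search.

from collections import deque

def plan_in_latent_space(init_image, goal_image, encode, actions):
    # encode(image) -> tuple of 0/1 propositions (assumed trained autoencoder);
    # actions -> list of (preconditions, add_effects, del_effects) frozensets
    # over proposition indices (assumed learned action model).
    init = frozenset(i for i, bit in enumerate(encode(init_image)) if bit)
    goal = frozenset(i for i, bit in enumerate(encode(goal_image)) if bit)
    frontier = deque([(init, [])])
    visited = {init}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan                          # sequence of action indices
        for idx, (pre, add, delete) in enumerate(actions):
            if pre <= state:                     # preconditions satisfied
                child = frozenset((state - delete) | add)
                if child not in visited:
                    visited.add(child)
                    frontier.append((child, plan + [idx]))
    return None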