
    A* Search Without Expansions: Learning Heuristic Functions with Deep Q-Networks

    A* search is an informed search algorithm that uses a heuristic function to guide the order in which nodes are expanded. Since the computation required to expand a node and compute the heuristic values for all of its generated children grows linearly with the size of the action space, A* search can become impractical for problems with large action spaces. This computational burden becomes even more apparent when heuristic functions are learned by general, but computationally expensive, deep neural networks. To address this problem, we introduce DeepCubeAQ, a deep reinforcement learning and search algorithm that builds on the DeepCubeA algorithm and deep Q-networks. DeepCubeAQ learns a heuristic function that, with a single forward pass through a deep neural network, computes the sum of the transition cost and the heuristic value for every child of a node without explicitly generating any of the children, eliminating the need for node expansions. DeepCubeAQ then uses a novel variant of A* search, called AQ* search, that uses the deep Q-network to guide the search. We use DeepCubeAQ to solve the Rubik's cube when formulated with a large action space that includes 1872 meta-actions. This 157-fold increase in the size of the action space incurs less than a 4-fold increase in computation time when performing AQ* search, and AQ* search is orders of magnitude faster than A* search.
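
    As an illustration of the AQ* idea, here is a minimal sketch. All interfaces are assumptions for the example, not the paper's implementation: `qnet(s)` returns one cost-to-go estimate per action in a single forward pass, `step` is a cheap successor function, and edge costs are taken to be uniform.

```python
import heapq
from itertools import count

def aq_star(start, qnet, step, is_goal, unit_cost=1.0):
    """Sketch of Q-network-guided best-first search: one forward pass per
    expanded node scores all of its children, and a child state is only
    materialized when it is popped from the frontier."""
    tie = count()  # tie-breaker so states are never compared directly
    # Frontier entries are (f, g, tie, parent, action); the child state
    # itself is generated lazily, on pop.
    frontier = [(0.0, 0.0, next(tie), None, None)]
    closed, parents = set(), {}
    while frontier:
        f, g, _, p, a = heapq.heappop(frontier)
        s = start if p is None else step(p, a)
        if s in closed:
            continue
        closed.add(s)
        parents[s] = (p, a)
        if is_goal(s):
            plan = []
            while parents[s][0] is not None:
                p, a = parents[s]
                plan.append(a)
                s = p
            return list(reversed(plan))
        q = qnet(s)  # q[act] estimates cost(s, act) + h(child under act)
        for act, qa in enumerate(q):
            # f(child) = g(s) + q[act] requires no child generation and no
            # extra network call, unlike a standard A* node expansion.
            heapq.heappush(frontier, (g + qa, g + unit_cost, next(tie), s, act))
    return None
```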

    Learning Heuristic Selection with Dynamic Algorithm Configuration

    A key challenge in satisficing planning is to use multiple heuristics within one heuristic search. An aggregation of multiple heuristic estimates, for example by taking the maximum, has the disadvantage that bad estimates of a single heuristic can negatively affect the whole search. Since the performance of a heuristic varies from instance to instance, approaches such as algorithm selection can be successfully applied. In addition, alternating between multiple heuristics during the search makes it possible to use all heuristics equally and improve performance. However, all these approaches ignore the internal search dynamics of a planning system, which can help to select the most useful heuristics for the current expansion step. We show that dynamic algorithm configuration can be used for dynamic heuristic selection that takes into account the internal search dynamics of a planning system. Furthermore, we prove that this approach generalizes over existing approaches and that it can exponentially improve the performance of the heuristic search. To learn dynamic heuristic selection, we propose an approach based on reinforcement learning and show empirically that domain-wise learned policies, which take the internal search dynamics of a planning system into account, can exceed existing approaches.
    Comment: Long version of the paper at the International Conference on Automated Planning and Scheduling (ICAPS) 202
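
    As a generic rendering of per-step heuristic selection (not the paper's system), the sketch below keeps one open list per heuristic and lets a learned policy decide which list supplies the next expansion. The `select` policy and the feature set are assumptions for illustration.

```python
import heapq
from itertools import count

def gbfs_dynamic(start, successors, is_goal, heuristics, select):
    """Greedy best-first search with one open list per heuristic; before
    every expansion a learned policy inspects simple search-dynamics
    features and chooses which heuristic's open list to pop from."""
    tie = count()
    opens = [[(h(start), next(tie), start)] for h in heuristics]
    closed, step = set(), 0
    while any(opens):
        # Illustrative features: expansion count and best h-value per list.
        feats = [step] + [o[0][0] if o else float("inf") for o in opens]
        i = select(feats)
        if not opens[i]:  # fall back to any non-empty open list
            i = next(j for j, o in enumerate(opens) if o)
        _, _, s = heapq.heappop(opens[i])
        if s in closed:
            continue
        closed.add(s)
        if is_goal(s):
            return s
        for child in successors(s):
            if child not in closed:
                for j, h in enumerate(heuristics):
                    heapq.heappush(opens[j], (h(child), next(tie), child))
        step += 1
    return None
```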

    Policy-Guided Heuristic Search with Guarantees

    The use of a policy and a heuristic function for guiding search can be quite effective in adversarial problems, as demonstrated by AlphaGo and its successors, which are based on the PUCT search algorithm. While PUCT can also be used to solve single-agent deterministic problems, it lacks guarantees on its search effort and can be computationally inefficient in practice. Combining the A* algorithm with a learned heuristic function tends to work better in these domains, but A* and its variants do not use a policy. Moreover, the purpose of A* is to find solutions of minimum cost, while we seek instead to minimize the search loss (e.g., the number of search steps). LevinTS is guided by a policy and provides guarantees on the number of search steps that relate to the quality of the policy, but it does not make use of a heuristic function. In this work we introduce Policy-guided Heuristic Search (PHS), a novel search algorithm that uses both a heuristic function and a policy and has theoretical guarantees on the search loss that relate to the quality of both the heuristic and the policy. We show empirically on the sliding-tile puzzle, Sokoban, and a puzzle from the commercial game 'The Witness' that PHS enables the rapid learning of both a policy and a heuristic function and compares favorably with A*, Weighted A*, Greedy Best-First Search, LevinTS, and PUCT in terms of the number of problems solved and search time in all three domains tested.
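
    For intuition, a policy-and-heuristic node evaluation of the kind the abstract describes can be written in a few lines. The specific form below, an A*-style estimate divided by the probability the policy assigns to the path, is an assumption of this sketch rather than a quotation of the paper's cost function.

```python
import math

def phs_priority(g, h, log_pi):
    """Illustrative policy-guided priority for a best-first search.
    g:      cost of the path to the node
    h:      heuristic estimate of the remaining cost
    log_pi: sum of log-probabilities the policy assigned to the
            actions along the path (so exp(log_pi) is in (0, 1])
    Paths the policy finds likely get a divisor close to 1; unlikely
    paths are penalized. With h == 0 this reduces to a Levin-style
    cost g / pi, matching the abstract's description of LevinTS as
    policy-guided but heuristic-free."""
    return (g + h) / math.exp(log_pi)
```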

    Koneoppimisen soveltaminen graafihakualgoritmien ohjaamisessa (Applying Machine Learning to Guide Graph Search Algorithms)

    Graph structures are used in many different applications to describe the regularities of the environments in which they operate. Because graphs have such a broad range of applications, graph search algorithms play a central role in implementing these applications. Very effective algorithms have been developed for the graph search problem, but growing datasets and increasingly complex application domains place ever higher efficiency demands on graph search algorithms. Graph search methods can be roughly divided into two groups: uninformed (weak) search methods, which can be run on any graph, and heuristic search methods, which use application-, task-, and graph-specific knowledge to make the search more efficient. Uninformed search methods are reliable but perform many unnecessary operations during the search, which makes them inefficient. Heuristic search methods can avoid some of these unnecessary steps, but heuristic searches can be tailored to only a limited set of application domains. This thesis introduces a new machine-learning-based graph search method whose learning task is to produce a model that guides the search in the manner of heuristic search methods. The learning component of the method is based on reinforcement learning, which makes it possible to relax the requirements that efficient search methods place on how graphs are represented, and thereby to broaden the range of applications of these search methods. The behavior of the learning search was measured in two experiments. In the first experiment, the learning search was compared with an efficient heuristic search method on search tasks suited to that method. In the second experiment, the learning search was studied on a search task unsuitable for heuristic searches, with simple breadth-first search as the baseline. Based on the results of the experiments, learning search can be considered a good way to implement more efficient search methods in application domains where heuristic search cannot be used. However, the learning search presented in this thesis requires a large amount of memory to maintain its internal model, which must be taken into account when applying and further developing it.
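
    The abstract gives no implementation details; the generic sketch below, with an assumed `model(state)` scoring interface, only illustrates the contrast it draws between a search ordered by a learned model and the breadth-first baseline of the second experiment.

```python
import heapq
from collections import deque
from itertools import count

def learned_best_first(start, successors, is_goal, model):
    """Best-first search whose expansion order comes from a learned
    scoring model instead of a hand-crafted heuristic."""
    tie = count()
    frontier, seen = [(model(start), next(tie), start)], {start}
    while frontier:
        _, _, s = heapq.heappop(frontier)
        if is_goal(s):
            return s
        for child in successors(s):
            if child not in seen:
                seen.add(child)
                heapq.heappush(frontier, (model(child), next(tie), child))
    return None

def bfs(start, successors, is_goal):
    """Uninformed baseline of the thesis's second experiment."""
    frontier, seen = deque([start]), {start}
    while frontier:
        s = frontier.popleft()
        if is_goal(s):
            return s
        for child in successors(s):
            if child not in seen:
                seen.add(child)
                frontier.append(child)
    return None
```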

    Classical Planning in Deep Latent Space

    Current domain-independent, classical planners require symbolic models of the problem domain and instance as input, resulting in a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, the knowledge is encoded in a subsymbolic representation which is incompatible with symbolic systems such as planners. We propose Latplan, an unsupervised architecture combining deep learning and classical planning. Given only an unlabeled set of image pairs showing a subset of transitions allowed in the environment (training inputs), Latplan learns a complete propositional PDDL action model of the environment. Later, when a pair of images representing the initial and the goal states (planning inputs) is given, Latplan finds a plan to the goal state in a symbolic latent space and returns a visualized plan execution. We evaluate Latplan using image-based versions of 6 planning domains: 8-Puzzle, 15-Puzzle, Blocksworld, Sokoban, and two variations of LightsOut.
    Comment: Under review at the Journal of Artificial Intelligence Research (JAIR).
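
    A high-level sketch of the pipeline as the abstract describes it. The three callables are assumed placeholders, not Latplan's actual API.

```python
def latplan_sketch(train_pairs, init_image, goal_image,
                   train_sae, learn_action_model, plan):
    """Pipeline sketch following the abstract.
    train_sae:          image pairs -> autoencoder with encode()/decode()
    learn_action_model: latent transition pairs -> PDDL-style action model
    plan:               (action model, init, goal) -> list of latent states
    """
    # 1. Unsupervised phase: learn a propositional latent space and a
    #    complete action model from unlabeled image-pair transitions.
    sae = train_sae(train_pairs)
    model = learn_action_model([(sae.encode(a), sae.encode(b))
                                for a, b in train_pairs])
    # 2. Symbolic phase: encode the planning inputs and search for a
    #    plan entirely in the latent space.
    s0, sg = sae.encode(init_image), sae.encode(goal_image)
    latent_plan = plan(model, s0, sg)
    # 3. Return a visualized plan execution by decoding each state.
    return [sae.decode(s) for s in latent_plan]
```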