A* search is an informed search algorithm that uses a heuristic function to
guide the order in which nodes are expanded. Since the computation required to
expand a node and compute the heuristic values for all of its generated
children grows linearly with the size of the action space, A* search can become
impractical for problems with large action spaces. This computational burden
becomes even more pronounced when the heuristic function is a deep neural
network, which is general but computationally expensive to evaluate. To address
this problem, we
introduce DeepCubeAQ, a deep reinforcement learning and search algorithm that
builds on the DeepCubeA algorithm and deep Q-networks. DeepCubeAQ learns a
heuristic function that, with a single forward pass through a deep neural
network, computes for every child of a node the sum of the transition cost to
that child and the child's heuristic value, without explicitly generating any
of the children. This eliminates the need for node expansions. DeepCubeAQ then
uses a novel variant of A* search, called AQ* search, in which the deep
Q-network guides the search.
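The contrast with ordinary A* expansion can be illustrated with a minimal Python sketch. It is a hypothetical illustration, not the authors' implementation: the frontier stores (parent state, action) pairs prioritized by g(parent) + Q(parent, action), and a child state is generated only when its pair is popped. The names `aq_star` and `toy_q_values`, and the toy number-line problem, are assumptions introduced for illustration. A closed-form function stands in for the deep Q-network; in DeepCubeAQ, a network would produce all of these values in one forward pass from the parent state alone.

```python
import heapq
import itertools
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

State = int
Action = int


def aq_star(start: State,
            is_goal: Callable[[State], bool],
            apply_action: Callable[[State, Action], State],
            action_cost: Callable[[State, Action], float],
            q_values: Callable[[State], Sequence[float]],
            actions: Sequence[Action],
            max_pops: int = 100_000) -> Optional[List[Action]]:
    """Hypothetical AQ*-style search sketch: the frontier holds (parent, action)
    pairs prioritized by g(parent) + Q(parent, action)."""
    if is_goal(start):
        return []
    tie = itertools.count()  # tie-breaker so heapq never compares states
    g: Dict[State, float] = {start: 0.0}
    parent: Dict[State, Tuple[State, Action]] = {}
    frontier: List[Tuple[float, int, State, Action]] = []

    def push_all(state: State) -> None:
        # One q_values call scores every action; no child state is built here.
        for a, q in zip(actions, q_values(state)):
            heapq.heappush(frontier, (g[state] + q, next(tie), state, a))

    push_all(start)
    pops = 0
    while frontier and pops < max_pops:
        pops += 1
        _, _, s, a = heapq.heappop(frontier)
        child = apply_action(s, a)  # the child is generated only when popped
        g_child = g[s] + action_cost(s, a)
        if child in g and g[child] <= g_child:
            continue  # already reached at equal or lower cost
        g[child] = g_child
        parent[child] = (s, a)
        if is_goal(child):
            path: List[Action] = []
            while child != start:  # walk parent pointers back to the start
                child, a = parent[child]
                path.append(a)
            return path[::-1]
        push_all(child)
    return None


# Toy problem (assumption): states are integers, each move costs 1.
GOAL = 12
ACTIONS = [1, -1, 5, -5]


def toy_q_values(s: State) -> List[float]:
    # Stand-in for a deep Q-network: for every action at once, returns
    # transition cost (1) + an admissible heuristic of the resulting state.
    return [1 + math.ceil(abs(GOAL - (s + a)) / 5) for a in ACTIONS]


path = aq_star(0, lambda s: s == GOAL, lambda s, a: s + a,
               lambda s, a: 1.0, toy_q_values, ACTIONS)
print(path)
```

The key design point is that `push_all` calls the Q-function once per popped state, regardless of how many actions exist. Plain A* would instead build and score every child. Because the toy heuristic is consistent, this sketch returns a shortest path, here four moves summing to 12.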
We use DeepCubeAQ to solve the Rubik's cube formulated with a large action
space that includes 1872 meta-actions. We show that this 157-fold increase in
the size of the action space incurs less than a 4-fold increase in computation
time for AQ* search, and that AQ* search is orders of magnitude faster than A*
search.
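A quick arithmetic check shows where the "157-fold" figure can come from. This rests on an assumption not stated above: the baseline formulation uses the 12 quarter-turn moves of the cube, and the large action space keeps those 12 moves alongside the 1872 meta-actions.

$$
|\mathcal{A}_{\text{large}}| = 12 + 1872 = 1884 = 157 \times 12 = 157\,|\mathcal{A}_{\text{base}}| .
$$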