109 research outputs found
A new upper bound to (a variant of) the pancake problem
The "pancake problem" asks how many prefix reversals are sufficient to sort
any permutation of {1, …, n} to the identity. We write f(n) to
denote this quantity.
The best known bounds are that (15/14)n ≤ f(n) ≤ (18/11)n + O(1). The proof of the upper bound is computer-assisted, and
considers thousands of cases.
We consider g(n), how many prefix and suffix reversals are sufficient to
sort any permutation. We observe that the (18/11)n + O(1) upper bound still holds, and give a human proof of this bound.
The constant "18/11" is a natural barrier for the pancake problem and
this variant, hence new techniques will be required to do better.
Comment: 9 pages, comments welcome
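To make the operation concrete, here is a minimal Python sketch of the classic "bring the largest unplaced pancake to the top, then flip it into place" strategy, which uses at most 2(n-1) prefix reversals. It illustrates what a prefix reversal is; it is not the (18/11)n algorithm from the bound above.

```python
def prefix_reverse(perm, k):
    """Reverse the first k elements (one 'pancake flip')."""
    return perm[:k][::-1] + perm[k:]

def pancake_sort(perm):
    """Sort a permutation using only prefix reversals: for each suffix
    position (largest first), flip the biggest unplaced element to the
    front, then flip it into place. Uses at most 2(n-1) flips."""
    perm = list(perm)
    flips = []
    for size in range(len(perm), 1, -1):
        i = perm.index(max(perm[:size]))  # position of largest unplaced value
        if i == size - 1:
            continue  # already in place
        if i != 0:
            perm = prefix_reverse(perm, i + 1)  # bring it to the front
            flips.append(i + 1)
        perm = prefix_reverse(perm, size)  # flip it into position
        flips.append(size)
    return perm, flips

sorted_perm, flips = pancake_sort([3, 1, 4, 2])
# sorted_perm == [1, 2, 3, 4], using at most 2*(4-1) = 6 flips
```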
tree also barbed wire
This manuscript contains poems
Planning under time pressure
Heuristic search is a technique used pervasively in artificial intelligence and automated planning. Often an agent is given a task that it would like to solve as quickly as possible. It must allocate its time between planning the actions to achieve the task and actually executing them. We call this problem planning under time pressure. Most popular heuristic search algorithms are ill-suited for this setting, as they either search a lot to find short plans or search a little and find long plans. The thesis of this dissertation is: when under time pressure, an automated agent should explicitly attempt to minimize the sum of planning and execution times, not just one or just the other.
This dissertation makes four contributions. First, we present new algorithms that use modern multi-core CPUs to decrease planning time without increasing execution time. Second, we introduce a new model for predicting the performance of iterative-deepening search. The model is as accurate as previous offline techniques while using less training data, and can also be used online to reduce the overhead of iterative-deepening search, resulting in faster planning. Third, we present offline planning algorithms that directly attempt to minimize the sum of planning and execution times. Fourth, we consider algorithms that plan online, in parallel with execution. Both the offline and online algorithms account for a user-specified preference between search and execution, and can greatly outperform standard utility-oblivious techniques. By addressing the problem of planning under time pressure, these contributions demonstrate that heuristic search need no longer be restricted to optimizing solution cost alone, obviating the need to choose between slow search and expensive solutions.
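The thesis statement can be made concrete with a toy calculation. The function and weight names below are illustrative stand-ins for the user-specified preference mentioned in the abstract, not the dissertation's actual formulation; the candidate numbers are invented.

```python
def goal_achievement_time(planning_time, execution_time, w_plan=1.0, w_exec=1.0):
    """Objective from the thesis statement: under time pressure, minimize
    the (weighted) sum of planning and execution times, not either one
    alone. The weights model a user preference between the two."""
    return w_plan * planning_time + w_exec * execution_time

# Hypothetical profile of one search problem: more planning buys shorter plans.
candidates = [
    (0.1, 40.0),   # search a little -> long, expensive plan
    (2.0, 15.0),   # moderate search -> decent plan
    (30.0, 10.0),  # near-optimal plan, but planning time dominates
]

best = min(candidates, key=lambda c: goal_achievement_time(*c))
# best == (2.0, 15.0): neither the fastest search nor the cheapest plan wins
```

The point of the toy example is that optimizing either term alone picks a different (worse) candidate than optimizing the sum.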
Off-Policy Deep Reinforcement Learning Algorithms for Handling Various Robotic Manipulator Tasks
Conventional control methods face obstacles due to system complexity and
intense demands on data density, so more modern and efficient control
methods are required. Off-policy, model-free reinforcement learning
algorithms help avoid working with complex models. They are prominent
methods in terms of speed and accuracy because they reuse their past
experience to learn optimal policies. In this study, three reinforcement
learning algorithms, DDPG, TD3, and SAC, are used to train a Fetch robotic
manipulator on four different tasks in the MuJoCo simulation environment.
All of these algorithms are off-policy and able to achieve their desired
target by optimizing both policy and value functions. The efficiency and
speed of these three algorithms are analyzed in a controlled environment
- …