Visual search for relevant targets in the environment is a crucial robot
skill. We propose a preliminary framework for monitoring the execution of a
robot task, which handles the robot's visual search of the environment
for targets involved in the task. Visual search is also relevant for recovering
from a failure. The framework exploits deep reinforcement learning to acquire a
"common sense" scene structure and it takes advantage of a deep convolutional
network to detect objects and relevant relations holding between them. The
framework builds on these methods to introduce vision-based execution
monitoring, which uses classical planning as a backbone for task execution.
Experiments show that, with the proposed vision-based execution monitor, the
robot can complete simple tasks and recover from failures autonomously.