The current transformation towards smart manufacturing has led to a growing
demand for human-robot collaboration (HRC) in the manufacturing process.
Perceiving and understanding a human co-worker's behaviour is a challenge for
collaborative robots that must perform tasks efficiently and effectively in
unstructured and dynamic environments. Integrating recent data-driven
machine vision capabilities into HRC systems is a logical next step in
addressing these challenges. However, off-the-shelf components struggle in
these settings because they generalise poorly. Real-world evaluation is
required to assess the maturity and robustness of these approaches.
Furthermore, understanding the limitations of a purely vision-based approach is
a crucial first step before combining multiple modalities. In
this paper, we propose GoferBot, a novel vision-based semantic HRC system for a
real-world assembly task. It is composed of a visual servoing module that
reaches and grasps assembly parts in an unstructured, multi-instance and dynamic
environment, an action recognition module that performs human action prediction
for implicit communication, and a visual handover module that uses the
perceptual understanding of human behaviour to produce an intuitive and
efficient collaborative assembly experience. GoferBot seamlessly integrates
all sub-modules by utilising implicit semantic information obtained purely
from visual perception.