15 research outputs found
Towards binocular active vision in a robot head system
This paper presents the first results of an investigation and pilot study into an active, binocular vision system that combines binocular vergence, object recognition and attention control in a unified framework. The prototype developed is capable of identifying, targeting, verging on and recognizing objects in a highly cluttered scene without the need for calibration or other knowledge of the camera geometry. This is achieved by implementing all image analysis in a symbolic space, without creating explicit pixel-space maps. The system structure is based on the 'searchlight metaphor' of biological systems. We present results of a first pilot investigation that yield a maximum vergence error of 6.4 pixels, while seven of nine known objects were recognized in a highly cluttered environment. Finally, a "stepping stone" visual search strategy was demonstrated, taking a total of 40 saccades to find two known objects in the workspace, neither of which appeared simultaneously within the field of view resulting from any individual saccade.
On the Challenges of Open World Recognition under Shifting Visual Domains
Robotic visual systems operating in the wild must act in unconstrained scenarios, under different environmental conditions, while facing a variety of semantic concepts, including unknown ones. To this end, recent works have tried to empower visual object recognition methods with the capability to i) detect unseen concepts and ii) extend their knowledge over time, as images of new semantic classes arrive. This setting, called Open World Recognition (OWR), aims to produce systems capable of breaking the semantic limits present in the initial training set. However, the training set imposes on the system not only its own semantic limits but also environmental ones, due to its bias toward certain acquisition conditions that do not necessarily reflect the high variability of the real world. This discrepancy between training and test distributions is called domain shift. This work investigates whether OWR algorithms are effective under domain shift, presenting the first benchmark setup for fairly assessing the performance of OWR algorithms, with and without domain shift. We then use this benchmark to conduct analyses in various scenarios, showing how existing OWR algorithms indeed suffer a severe performance degradation when training and test distributions differ. Our analysis shows that this degradation is only slightly mitigated by coupling OWR with domain generalization techniques, indicating that the mere plug-and-play of existing algorithms is not enough to recognize new and unknown categories in unseen domains. Our results clearly point toward open issues and future research directions that need to be investigated for building robot visual systems able to function reliably under these challenging yet very real conditions. Code available at https://github.com/DarioFontanel/OWR-VisualDomains

Comment: RAL/ICRA 202
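The two OWR capabilities named above (detecting unseen concepts and extending knowledge over time) can be illustrated with a minimal sketch. A nearest-class-mean classifier with a distance-based rejection rule is a common building block of OWR methods; the class names, rejection threshold and Euclidean metric here are illustrative assumptions, not the benchmark's actual algorithm.

```python
import numpy as np

class NCMOpenWorld:
    """Minimal sketch: nearest-class-mean classification with rejection."""

    def __init__(self, reject_threshold):
        self.means = {}                      # class label -> mean feature vector
        self.reject_threshold = reject_threshold

    def add_class(self, label, features):
        # Extend knowledge over time: register a new class from its samples.
        self.means[label] = np.mean(features, axis=0)

    def predict(self, x):
        # Detect unseen concepts: reject when even the nearest mean is too far.
        if not self.means:
            return "unknown"
        label, dist = min(
            ((l, np.linalg.norm(x - m)) for l, m in self.means.items()),
            key=lambda t: t[1],
        )
        return label if dist <= self.reject_threshold else "unknown"
```

A sample far from every class mean is labeled "unknown"; once images of that new class arrive, `add_class` folds it into the known set.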
ROVIS: RObust Machine VIsion for Service Robotic System FRIEND
In this paper, the vision architecture of the robotic system FRIEND, named ROVIS, is presented. The main concept of ROVIS is the inclusion of feedback structures between different components of the vision system, as well as between the vision system and other modules of the robotic system, to achieve high robustness against external influences, both for the individual system units and for the system as a whole. The novelty of this work lies in the inclusion of feedback control at different levels of the 2D object recognition system, providing reliable inputs to the 3D object reconstruction and object manipulation modules of the robotic system FRIEND. The idea behind this approach is to change the processing parameters in a closed-loop manner so that the current image processing result at a particular processing level is driven to a desired result. The effectiveness of the ROVIS system is demonstrated through experimental results on 3D reconstruction of different objects from the FRIEND environment.
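The closed-loop idea described in the abstract, adjusting a processing parameter until the image-processing result reaches a desired one, can be sketched as follows. The choice of a binarization threshold as the parameter, the target foreground ratio and the proportional gain are illustrative assumptions, not ROVIS's actual processing chain.

```python
import numpy as np

def closed_loop_threshold(image, target_ratio, gain=200.0, iters=50):
    """Drive the foreground-pixel ratio of a binarized image toward target_ratio
    by feeding the current result back into the threshold parameter."""
    threshold = 128.0
    for _ in range(iters):
        ratio = float(np.mean(image > threshold))   # current processing result
        error = ratio - target_ratio                # feedback error
        threshold += gain * error                   # proportional correction
        threshold = min(max(threshold, 0.0), 255.0) # keep within 8-bit range
    return threshold
```

The same feedback pattern generalizes to other parameters (edge thresholds, morphology sizes) as long as the result can be scored against a desired value.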
Autonomous navigation for guide following in crowded indoor environments
The requirements for assisted living are rapidly changing as the number of elderly patients over the age of 60 continues to increase. This rise places a high level of stress on nurse practitioners, who must care for more patients than they are able to. As this trend is expected to continue, new technology will be required to help care for patients. Mobile robots present an opportunity to alleviate the stress on nurse practitioners by monitoring elderly patients and performing remedial tasks for them. In order to produce mobile robots with the ability to perform these tasks, however, many challenges must be overcome.
The hospital environment requires a high level of safety to prevent patient injury. Any facility that uses mobile robots, therefore, must be able to ensure that no harm will come to patients whilst in a care environment. This requires the robot to build a high level of understanding about the environment and the people in close proximity to the robot. Hitherto, most mobile robots have used vision-based sensors or 2D laser range finders. 3D time-of-flight sensors have recently been introduced and provide dense 3D point clouds of the environment at real-time frame rates, giving mobile robots previously unavailable dense information in real time. In this thesis, I investigate the use of time-of-flight cameras for mobile robot navigation in crowded environments. A unified framework that allows the robot to follow a guide through an indoor environment safely and efficiently is presented. Each component of the framework is analyzed in detail, with real-world scenarios illustrating its practical use.
Time-of-flight cameras are relatively new sensors and therefore have inherent problems that must be overcome to obtain consistent and accurate data. In this thesis, I propose a novel and practical probabilistic framework to overcome many of these problems. The framework fuses multiple depth maps with color information, forming a reliable and consistent view of the world. In order for the robot to interact with the environment, contextual information is required. To this end, I propose a region-growing segmentation algorithm that groups points based on surface characteristics, namely surface normal and surface curvature. The segmentation process creates a distinct set of surfaces; however, only a limited amount of contextual information is available to allow for interaction. Therefore, a novel classifier using spherical harmonics is proposed to differentiate people from all other objects.
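The region-growing idea can be sketched as follows, assuming an organized point cloud with precomputed per-point unit normals. Grid 4-adjacency, a fixed angle threshold and comparison against the seed normal only are simplifying assumptions for illustration; the thesis's criterion also uses surface curvature.

```python
import numpy as np

def region_grow(normals, angle_thresh_deg=10.0):
    """normals: (H, W, 3) unit normals on an organized depth-image grid.
    Returns an (H, W) integer label map, one label per grown surface region."""
    h, w, _ = normals.shape
    labels = -np.ones((h, w), dtype=int)           # -1 means unvisited
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    region = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            seed = normals[sy, sx]                 # region's reference normal
            stack = [(sy, sx)]
            labels[sy, sx] = region
            while stack:                           # flood-fill over the grid
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1:
                        # Join the region while the normal stays close to the seed's.
                        if np.dot(seed, normals[ny, nx]) >= cos_thresh:
                            labels[ny, nx] = region
                            stack.append((ny, nx))
            region += 1
    return labels
```

On a scene with a floor and a wall, for example, the two surfaces receive different labels because their normals differ by far more than the angle threshold.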
The added ability to identify people allows the robot to find potential candidates to follow. However, for safe navigation, the robot must continuously track all visible objects to obtain positional and velocity information. A multi-object tracking system is investigated to track visible objects reliably using multiple cues: shape and color. The tracking system allows the robot to react to the dynamic nature of people by building an estimate of the motion flow. This flow provides the robot with the information necessary to determine where, and at what speeds, it is safe to drive. In addition, a novel search strategy is proposed to allow the robot to recover a guide who has left the field of view. To achieve this, a search map is constructed, with areas of the environment ranked according to how likely they are to reveal the guide's true location. The robot can then approach the most likely search area to recover the guide. Finally, all components presented are combined to follow a guide through an indoor environment. The results achieved demonstrate the efficacy of the proposed components.
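The ranked search-map strategy for recovering a lost guide can be sketched minimally: areas are scored by how likely they are to reveal the guide, the robot visits the highest-scoring one, and a failed visit down-weights that area before the next choice. The area names, scores and decay factor below are illustrative assumptions, not the thesis's actual ranking function.

```python
def next_search_area(search_map):
    """Return the area currently most likely to reveal the guide."""
    return max(search_map, key=search_map.get)

def report_failure(search_map, area, decay=0.5):
    """Down-weight an area after an unsuccessful visit, then renormalize
    so the scores remain a probability distribution over areas."""
    search_map[area] *= decay
    total = sum(search_map.values())
    for k in search_map:
        search_map[k] /= total
```

Repeated failures at one area naturally shift the search toward the remaining candidates.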
Development of a library for stereoscopic vision systems for mobile robotics
Master's dissertation, Universidade Federal de Santa Catarina, Centro Tecnológico, Programa de Pós-Graduação em Engenharia Elétrica. The demand for mobile robotics applications has grown considerably in recent years. Regardless of their nature or purpose, autonomous mobile robots must interact with the world to achieve their goals and, to do so, must somehow obtain information about the environment. Among the different existing approaches, good results have been achieved with stereoscopic vision systems. With such a system it is possible to extract three-dimensional information about the environment in which the robot operates. This three-dimensional information can then be used to guide the robot's actions, whether for navigation, recognition or manipulation. The present work describes the development of a library for stereoscopic vision systems for mobile robotics. To this end, different problems in stereoscopy were addressed, with the aim of providing depth maps with sufficient detail of the environment for the general operation of a mobile robot that has a stereoscopic vision system as its main source of information. In this context, models, methods and solutions are evaluated and proposed for different problems such as camera calibration, image rectification, reconstruction and, above all, the generation of dense depth maps. The results obtained demonstrate the effectiveness of the provided infrastructure and of the proposed methods in the development of applications for mobile robotics.
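The geometric relation underlying dense depth-map generation from a calibrated, rectified stereo pair can be sketched as a one-line conversion: depth Z = f * B / d, where f is the focal length in pixels, B the baseline in metres and d the disparity in pixels. The numeric values used below are illustrative, not from the dissertation.

```python
def disparity_to_depth(d, focal_px, baseline_m):
    """Depth in metres from disparity in pixels for a rectified stereo pair.
    Returns None where the disparity is invalid (no match / occlusion)."""
    if d <= 0:
        return None
    return focal_px * baseline_m / d
```

Applying this per pixel to a dense disparity map yields the dense depth map; the quality of calibration and rectification directly bounds the accuracy of the result.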