Real-Time fusion of visual images and laser data images for safe navigation in outdoor environments
In recent years, two-dimensional laser range finders mounted on vehicles have become a fruitful solution for meeting safety and environment recognition requirements (Keicher & Seufert, 2000), (Stentz et al., 2002), (DARPA, 2007). They provide accurate real-time range measurements over large angular fields at a fixed height above the ground plane, and enable robots and vehicles to perform a variety of tasks more confidently by fusing images from visual cameras with range data (Baltzakis et al., 2003). Lasers have traditionally been used in industrial surveillance applications to detect unexpected objects and persons in indoor environments. In the last decade, laser range finders have moved from indoor to outdoor rural and urban applications for 3D imaging (Yokota et al., 2004), vehicle guidance (Barawid et al., 2007), autonomous navigation (Garcia-Pérez et al., 2008), and object recognition and classification (Lee & Ehsani, 2008), (Edan & Kondo, 2009), (Katz et al., 2010). Unlike industrial applications, which deal with simple, repetitive and well-defined objects, camera-laser systems on board off-road vehicles require advanced real-time techniques and algorithms to deal with dynamic, unexpected objects. Natural environments are complex and loosely structured, with great differences among consecutive scenes and scenarios. Vision systems still present severe drawbacks caused by lighting variability, which depends on unpredictable weather conditions. Camera-laser object feature fusion and classification is still a challenge within the paradigm of artificial perception and mobile robotics in outdoor environments, in the presence of dust, dirt, rain, and extreme temperature and humidity. Task-driven, real-time perception of relevant objects is a main issue for deciding subsequent actions in safe unmanned navigation. In comparison with industrial automation systems, the precision required in object location is usually low, as is the speed of most rural vehicles, which operate in bounded and loosely structured outdoor environments.
To this aim, the current work focuses on the development of algorithms and strategies for fusing 2D laser data and visual images, to accomplish real-time detection and classification of unexpected objects close to the vehicle and guarantee safe navigation. Class information can then be integrated within the global navigation architecture, in control modules such as stop, obstacle avoidance, tracking or mapping.

Section 2 includes a description of the commercial vehicle, the robot-tractor DEDALO, and the vision systems on board. Section 3 addresses some drawbacks in outdoor perception. Section 4 analyses the proposed laser data and visual image fusion method, focused on the reduction of the visual image area to the region of interest wherein objects are detected by the laser (a minimal sketch of this projection step follows below). Two methods of segmentation are described in Section 5, to extract the reduced area of the visual image (the ROI) resulting from the fusion process. Section 6 presents the colour-based classification results for the largest segmented object in the region of interest. Some conclusions are outlined in Section 7, and acknowledgements and references are given in Section 8 and Section 9.

This work was supported by the projects CICYT-DPI-2006-14497 of the Science and Innovation Ministry, ROBOCITY2030 I y II: Service Robots (PRICIT-CAM-P-DPI-000176-0505), and SEGVAUTO: Vehicle Safety (PRICIT-CAM-S2009-DPI-1509) of the Madrid State Government.

Peer reviewed
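The fusion described in Sections 4 and 5 reduces the visual image to a region of interest around the laser-detected objects. Below is a minimal, hypothetical sketch of that projection step: the intrinsics K, the laser-to-camera transform (R, t), and the margin are invented placeholder values, not the paper's calibration.

```python
import numpy as np

# Hypothetical calibration (not from the paper): camera intrinsics K and the
# rigid transform (R, t) taking laser-frame points into the camera frame.
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.array([[0.0, -1.0,  0.0],       # laser x (forward) -> camera z
              [0.0,  0.0, -1.0],       # laser y (left)    -> camera -x
              [1.0,  0.0,  0.0]])
t = np.array([0.0, 0.3, 0.0])          # camera mounted above the scan plane

def scan_to_points(ranges, angles):
    """Convert a planar scan (range, bearing) to 3D points in the laser frame."""
    return np.stack([ranges * np.cos(angles),
                     ranges * np.sin(angles),
                     np.zeros_like(ranges)], axis=1)

def laser_roi(points_laser, image_shape, margin=20):
    """Project laser points into the image and return a padded bounding box."""
    pts_cam = points_laser @ R.T + t
    pts_cam = pts_cam[pts_cam[:, 2] > 0.1]     # keep points in front of the camera
    uv = (K @ pts_cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                # perspective division
    h, w = image_shape[:2]
    ok = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    uv = uv[ok]
    if uv.size == 0:
        return None                            # no laser return falls in the image
    u0 = max(int(uv[:, 0].min()) - margin, 0)
    v0 = max(int(uv[:, 1].min()) - margin, 0)
    u1 = min(int(uv[:, 0].max()) + margin, w - 1)
    v1 = min(int(uv[:, 1].max()) + margin, h - 1)
    return u0, v0, u1, v1                      # segmentation runs only inside this box
```

A real deployment would use calibrated values for K, R and t; the margin simply pads the box so the segmentation stage has some visual context around the laser hits.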
Human robot interaction in a crowded environment
Human Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered laborious, unsafe, or repetitive. Vision-based human robot interaction is a major component of HRI, in which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body, such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3].
Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications, as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting navigation commands. To this end, it is necessary to associate the gesture with the correct person, and automatic reasoning is required to extract the most probable location of the person who initiated the gesture. In this thesis, we have proposed a practical framework for addressing the above issues. It attempts to achieve a coarse-level understanding of a given environment before engaging in active communication. This includes recognizing human robot interaction where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate whether the people present are engaged with each other or with their surrounding environment. The basic task is to detect and reason about the environmental context and the different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb them or, if an individual is receptive to the robot's interaction, it may approach the person.
Finally, if the user is moving in the environment, the system can analyse the motion further to understand whether any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine their potential intentions. To improve system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
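As a rough, hypothetical illustration of the kind of Bayesian cue fusion described above, the sketch below combines per-person binary cues with a naive-Bayes update to pick the most probable commanding person. The cue names, likelihood values and prior are invented for illustration; the thesis's actual network structure is not reproduced here.

```python
# Naive-Bayes fusion of per-person visual cues to estimate who is addressing
# the robot. Cue names and likelihood values are illustrative only.
CUE_LIKELIHOODS = {
    # P(cue observed | commanding), P(cue observed | not commanding)
    "face_towards_robot": (0.9, 0.3),
    "waving_gesture":     (0.7, 0.05),
    "walking_away":       (0.1, 0.4),
}

def commanding_posterior(observed_cues, prior=0.2):
    """Posterior probability that a person is commanding, given binary cues."""
    p_cmd, p_not = prior, 1.0 - prior
    for cue, (like_cmd, like_not) in CUE_LIKELIHOODS.items():
        if observed_cues.get(cue, False):
            p_cmd *= like_cmd
            p_not *= like_not
        else:
            p_cmd *= 1.0 - like_cmd
            p_not *= 1.0 - like_not
    return p_cmd / (p_cmd + p_not)

# Pick the most probable commanding person in a crowded scene.
people = {
    "person_a": {"face_towards_robot": True, "waving_gesture": True},
    "person_b": {"walking_away": True},
}
best = max(people, key=lambda p: commanding_posterior(people[p]))
```

The contextual feedback described in the thesis would amount to updating these likelihoods and the prior as the scene evolves; that loop is omitted from this sketch.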
Fusion of aerial images and sensor data from a ground vehicle for improved semantic mapping
This work investigates the use of semantic information to link ground-level occupancy maps and aerial images. A ground-level semantic map, which shows open ground and indicates the probability of cells being occupied by walls of buildings, is obtained by a mobile robot equipped with an omnidirectional camera, GPS and a laser range finder. This semantic information is used for local and global segmentation of an aerial image. The result is a map where the semantic information has been extended beyond the range of the robot sensors and predicts where the mobile robot can find buildings and potentially driveable ground.
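One way to picture the label-extension step, purely as an illustrative sketch, is to treat the aerial pixels that overlap the robot's semantic map as training samples for a simple per-class colour model and then classify the remaining pixels with it. Everything below (function names, the Gaussian colour model) is an assumption for illustration, not the paper's actual segmentation method.

```python
import numpy as np

def fit_color_model(pixels):
    """Mean/covariance Gaussian colour model for one semantic class."""
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False) + 1e-6 * np.eye(3)
    return mean, np.linalg.inv(cov)

def log_likelihood(pixels, model):
    """Unnormalised Gaussian log-likelihood of each pixel under the model."""
    mean, inv_cov = model
    d = pixels - mean
    return -0.5 * np.einsum("ij,jk,ik->i", d, inv_cov, d)

def extend_labels(aerial_rgb, labelled_mask, is_building):
    """Classify all aerial pixels as building vs. ground.

    labelled_mask: True where the robot's semantic map overlaps the image.
    is_building:   boolean array (True = building) for the labelled pixels.
    """
    flat = aerial_rgb.reshape(-1, 3).astype(float)
    seen = labelled_mask.ravel()
    building = fit_color_model(flat[seen & is_building.ravel()])
    ground = fit_color_model(flat[seen & ~is_building.ravel()])
    pred = log_likelihood(flat, building) > log_likelihood(flat, ground)
    return pred.reshape(aerial_rgb.shape[:2])
```

The paper's pipeline performs local and global segmentation of the aerial image; the Gaussian colour model above is only a compact stand-in showing how semantic labels can propagate beyond sensor range.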
Depth Prediction Without the Sensors: Leveraging Structure for Unsupervised Learning from Monocular Videos
Learning to predict scene depth from RGB inputs is a challenging task both
for indoor and outdoor robot navigation. In this work we address unsupervised
learning of scene depth and robot ego-motion where supervision is provided by
monocular videos, as cameras are the cheapest, least restrictive and most
ubiquitous sensor for robotics.
Previous work in unsupervised image-to-depth learning has established strong
baselines in the domain. We propose a novel approach which produces higher
quality results, is able to model moving objects and is shown to transfer
across data domains, e.g. from outdoors to indoor scenes. The main idea is to
introduce geometric structure in the learning process, by modeling the scene
and the individual objects; camera ego-motion and object motions are learned
from monocular videos as input. Furthermore, an online refinement method is
introduced to adapt learning on the fly to unknown domains.
The proposed approach outperforms all state-of-the-art approaches, including
those that handle motion, e.g. through learned flow. Our results are comparable
in quality to those that use stereo as supervision, and they significantly
improve depth prediction on scenes and datasets which contain a lot of object
motion. The approach is of practical relevance, as it allows transfer across
environments, by transferring models trained on data collected for robot
navigation in urban scenes to indoor navigation settings. The code associated
with this paper can be found at https://sites.google.com/view/struct2depth.

Comment: Thirty-Third AAAI Conference on Artificial Intelligence (AAAI'19)
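The core supervision signal in this line of work is view synthesis: the predicted depth and ego-motion warp a nearby frame into the current view, and the photometric difference is the training loss. Below is a minimal PyTorch sketch of that loss under simplifying assumptions (known intrinsics, a single rigid camera motion); it is not the struct2depth implementation and omits the paper's object-motion handling and online refinement.

```python
import torch
import torch.nn.functional as F

def view_synthesis_loss(target, source, depth, pose, K):
    """Photometric loss: warp `source` into the `target` view via depth and pose.

    target, source: (B,3,H,W) frames; depth: (B,1,H,W) predicted for `target`;
    pose: (B,3,4) target-to-source rigid transform; K: (B,3,3) intrinsics.
    """
    b, _, h, w = target.shape
    dev = target.device
    # Homogeneous pixel grid (u, v, 1) for every target pixel.
    ys, xs = torch.meshgrid(torch.arange(h, device=dev, dtype=torch.float32),
                            torch.arange(w, device=dev, dtype=torch.float32),
                            indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)]).reshape(1, 3, -1)
    # Back-project to 3D in the target camera, then move to the source camera.
    cam = torch.linalg.inv(K) @ pix * depth.reshape(b, 1, -1)
    cam_src = pose[:, :, :3] @ cam + pose[:, :, 3:]
    # Project into the source image plane.
    proj = K @ cam_src
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    # Normalise coordinates to [-1, 1] as grid_sample expects.
    u = 2.0 * uv[:, 0] / (w - 1) - 1.0
    v = 2.0 * uv[:, 1] / (h - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(b, h, w, 2)
    warped = F.grid_sample(source, grid, align_corners=True)
    return (warped - target).abs().mean()
```

The full method additionally models individual object motions and refines the model online; both would add further warps and loss terms on top of this basic one.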