
    Data association and occlusion handling for vision-based people tracking by mobile robots

    This paper presents an approach for tracking multiple persons on a mobile robot with a combination of colour and thermal vision sensors, using several new techniques. First, an adaptive colour model is incorporated into the measurement model of the tracker. Second, a new approach for detecting occlusions is introduced, using a machine learning classifier for pairwise comparison of persons (classifying which one is in front of the other). Third, explicit occlusion handling is incorporated into the tracker. The paper presents a comprehensive, quantitative evaluation of the whole system and its different components using several real-world data sets.
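    As a rough illustration of the first ingredient, the Python sketch below shows one common way an adaptive colour model can enter a tracker's measurement model: a reference hue histogram scores candidate patches via a Bhattacharyya-based likelihood and is slowly blended towards new observations. The histogram choice and the `alpha` and `sigma` parameters are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of an adaptive colour measurement model, assuming
# HSV image patches (OpenCV convention: hue in [0, 180)).
import numpy as np

def colour_histogram(patch, bins=16):
    """Normalised hue histogram of an image patch."""
    hist, _ = np.histogram(patch[..., 0], bins=bins, range=(0, 180))
    return hist / max(hist.sum(), 1)

def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalised histograms."""
    return np.sum(np.sqrt(p * q))

class AdaptiveColourModel:
    def __init__(self, init_patch, alpha=0.05):
        self.ref = colour_histogram(init_patch)
        self.alpha = alpha  # adaptation rate: 0 would freeze the model

    def likelihood(self, patch, sigma=0.2):
        """Measurement likelihood of a candidate patch under the model."""
        d = np.sqrt(1.0 - bhattacharyya(self.ref, colour_histogram(patch)))
        return np.exp(-d**2 / (2 * sigma**2))

    def update(self, patch):
        """Blend the reference histogram towards the latest observation."""
        self.ref = (1 - self.alpha) * self.ref + self.alpha * colour_histogram(patch)
```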

    Improved data association and occlusion handling for vision-based people tracking by mobile robots

    This paper presents an approach for tracking multiple persons using a combination of colour and thermal vision sensors on a mobile robot. First, an adaptive colour model is incorporated into the measurement model of the tracker. Second, a new approach for detecting occlusions is introduced, using a machine learning classifier for pairwise comparison of persons (classifying which one is in front of the other). Third, explicit occlusion handling is incorporated into the tracker.
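    The pairwise occlusion classifier can be pictured as below: features comparing two tracked persons feed a standard classifier that predicts which one is in front. The specific features (relative bounding-box size, vertical position, thermal contrast) and the random-forest choice are assumptions for illustration; the paper's actual feature set and classifier may differ.

```python
# A hedged sketch of pairwise front/behind classification for occlusion
# detection; feature choices are illustrative, not the paper's.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def pairwise_features(track_a, track_b):
    """Difference features comparing two person tracks."""
    return np.array([
        track_a["bbox_height"] - track_b["bbox_height"],    # nearer persons look taller
        track_a["bbox_bottom"] - track_b["bbox_bottom"],    # lower bbox edge => nearer
        track_a["mean_thermal"] - track_b["mean_thermal"],  # occluded person loses thermal signal
    ])

# Training data: rows of pairwise feature vectors, label 1 if person A is in front.
clf = RandomForestClassifier(n_estimators=100)
# clf.fit(X_train, y_train)
# a_in_front = clf.predict(pairwise_features(a, b).reshape(1, -1))
```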

    Pedestrian Models for Autonomous Driving Part I: Low-Level Models, from Sensing to Tracking

    Autonomous vehicles (AVs) must share space with pedestrians, both in carriageway cases, such as cars at pedestrian crossings, and off-carriageway cases, such as delivery vehicles navigating through crowds on pedestrianized high streets. Unlike static obstacles, pedestrians are active agents with complex, interactive motions. Planning AV actions in the presence of pedestrians thus requires modelling their probable future behaviour as well as detecting and tracking them. This narrative review article is Part I of a pair that together survey the current technology stack involved in this process, organising recent research into a hierarchical taxonomy ranging from low-level image detection to high-level psychology models, from the perspective of an AV designer. This self-contained Part I covers the lower levels of this stack, from sensing, through detection and recognition, up to tracking of pedestrians. Technologies at these levels are found to be mature and available as foundations for use in high-level systems such as behaviour modelling, prediction, and interaction control.

    Tracking by Prediction: A Deep Generative Model for Multi-Person Localisation and Tracking

    Current multi-person localisation and tracking systems have an over-reliance on appearance models for target re-identification, and almost no approaches employ a complete deep learning solution for both objectives. We present a novel, complete deep learning framework for multi-person localisation and tracking. In this context we first introduce a lightweight sequential Generative Adversarial Network architecture for person localisation, which overcomes issues related to occlusions and noisy detections typically found in a multi-person environment. In the proposed tracking framework we build upon recent advances in pedestrian trajectory prediction and propose a novel data association scheme based on predicted trajectories. This removes the need for computationally expensive person re-identification systems based on appearance features and generates human-like trajectories with minimal fragmentation. The proposed method is evaluated on multiple public benchmarks, including both static and dynamic cameras, and achieves outstanding performance, especially among other recently proposed deep neural network based approaches. Comment: To appear in IEEE Winter Conference on Applications of Computer Vision (WACV), 201
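    The association-by-prediction idea admits a compact sketch: rather than matching detections to tracks by appearance, each track's predicted next position is matched to the nearest detection with the Hungarian algorithm. The constant-velocity prediction and gating threshold below are illustrative stand-ins for the paper's learned trajectory predictor.

```python
# A minimal sketch of data association via predicted trajectories,
# assuming a constant-velocity motion model in place of the paper's
# deep trajectory predictor.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(tracks, detections, gate=2.0):
    """tracks: (N, 4) rows [x, y, vx, vy]; detections: (M, 2) rows [x, y]."""
    predicted = tracks[:, :2] + tracks[:, 2:]  # one-step constant-velocity prediction
    # Pairwise Euclidean distances between predictions and detections.
    cost = np.linalg.norm(predicted[:, None, :] - detections[None, :, :], axis=-1)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    # Keep only matches within the gating distance.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < gate]
```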

    A lightweight method for detecting dynamic target occlusions by the robot body

    Robot vision is greatly affected by occlusions, which pose challenges to autonomous systems. The robot itself may hide targets of interest from the camera as it moves within the field of view, leading to failures in task execution. For example, if a target of interest is partially occluded by the robot, detecting and grasping it correctly becomes very challenging. To solve this problem, we propose a computationally lightweight method to determine the areas that the robot occludes. For this purpose, we use the Unified Robot Description Format (URDF) to generate a virtual depth image of the 3D robot model. Using the virtual depth image, we can effectively determine the partially occluded areas to improve the robustness of the information given by the perception system. Due to the real-time capabilities of the method, it can successfully detect occlusions of moving targets by the moving robot. We validate the effectiveness of the method in an experimental setup using a 6-DoF robot arm and an RGB-D camera by detecting and handling occlusions for two tasks: pose estimation of a moving object for pickup, and human tracking for robot handover. The code is available at https://github.com/auth-arl/virtual_depth_image. Comment: Submitted to RAAD 202
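    The core depth comparison can be sketched in a few lines: wherever the rendered robot model is at least as close to the camera as the measured surface, the measurement is attributed to the robot body and masked out. Rendering the virtual depth image from the URDF is omitted here; the `margin` tolerance is an assumed calibration allowance, not a value from the paper.

```python
# A hedged sketch of the virtual-vs-real depth comparison, assuming
# `virtual_depth` comes from an offscreen render of the URDF model at
# the current joint state (0 where the robot does not project).
import numpy as np

def occlusion_mask(virtual_depth, real_depth, margin=0.02):
    """Boolean mask of pixels occluded by the robot body (depths in metres).

    A pixel is flagged when the robot model projects there and its rendered
    depth is at or in front of the measured surface, within `margin` to
    absorb calibration and rendering error.
    """
    robot_pixels = virtual_depth > 0
    return robot_pixels & (virtual_depth < real_depth + margin)
```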

    Mobile Robot Navigation for Person Following in Indoor Environments

    Service robotics is a rapidly growing area of interest in robotics research. Service robots inhabit human-populated environments and carry out specific tasks. The goal of this dissertation is to develop a service robot capable of following a human leader around populated indoor environments. A classification system for person followers is proposed that clearly defines the expected interaction between the leader and the robotic follower. In populated environments, the robot needs to be able to detect and identify its leader and track the leader through occlusions, a common characteristic of populated spaces. An appearance-based person descriptor, which augments the Kinect skeletal tracker, is developed, and its performance in detecting and overcoming short- and long-term leader occlusions is demonstrated. While following its leader, the robot has to ensure that it does not collide with stationary and moving obstacles, including other humans, in the environment. This requirement necessitates a systematic navigation algorithm. A modified version of navigation function path planning, called the predictive fields path planner, is developed. This path planner models the motion of obstacles, uses a simplified representation of practical workspaces, and generates bounded, stable control inputs which guide the robot to its desired position without collisions with obstacles. The predictive fields path planner is experimentally verified on a non-person-follower system and then integrated into the robot navigation module of the person follower system. To navigate, the robot must be localized within its environment. A mapping approach based on depth data from the Kinect RGB-D sensor is used to generate a local map of the environment. The map is generated by combining inter-frame rotation and translation estimates based on scan generation and dead reckoning, respectively. Thus, a complete mobile robot navigation system for person following in indoor environments is presented.
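    A hedged sketch of the planning idea follows: a conventional attractive/repulsive potential field in which repulsion is evaluated at obstacles' predicted positions rather than their current ones. This is a generic illustration of predictive obstacle handling, not the dissertation's exact navigation-function formulation, and the gains and radii are arbitrary.

```python
# A generic potential-field sketch with obstacle motion prediction,
# assuming constant-velocity obstacles over a short horizon `dt`.
import numpy as np

def control_input(robot, goal, obstacles, dt=0.5,
                  k_att=1.0, k_rep=0.5, influence=2.0):
    """robot, goal: (2,) positions; obstacles: list of (pos(2,), vel(2,)) pairs."""
    force = k_att * (goal - robot)         # attraction towards the goal
    for pos, vel in obstacles:
        pred = pos + vel * dt              # where the obstacle will be, not where it is
        diff = robot - pred
        dist = np.linalg.norm(diff)
        if 1e-6 < dist < influence:        # repulsion only inside the influence radius
            force += k_rep * (1.0 / dist - 1.0 / influence) * diff / dist**3
    return force                           # velocity command (bounding omitted here)
```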

    Human robot interaction in a crowded environment

    Human-Robot Interaction (HRI) is the primary means of establishing natural and affective communication between humans and robots. HRI enables robots to act in a way similar to humans in order to assist in activities that are considered laborious, unsafe, or repetitive. Vision-based human-robot interaction is a major component of HRI, in which visual information is used to interpret how human interaction takes place. Common tasks of HRI include finding pre-trained static or dynamic gestures in an image, which involves localising different key parts of the human body, such as the face and hands. This information is subsequently used to extract different gestures. After the initial detection process, the robot is required to comprehend the underlying meaning of these gestures [3]. Thus far, most gesture recognition systems can only detect gestures and identify a person in relatively static environments. This is not realistic for practical applications, as difficulties may arise from people's movements and changing illumination conditions. Another issue to consider is that of identifying the commanding person in a crowded scene, which is important for interpreting navigation commands. To this end, it is necessary to associate the gesture with the correct person, and automatic reasoning is required to extract the most probable location of the person who initiated the gesture. In this thesis, we propose a practical framework for addressing the above issues. It attempts to achieve a coarse-level understanding of a given environment before engaging in active communication. This includes recognizing human-robot interaction, where a person has the intention to communicate with the robot. In this regard, it is necessary to differentiate whether people present are engaged with each other or with their surrounding environment. The basic task is to detect and reason about the environmental context and different interactions so as to respond accordingly. For example, if individuals are engaged in conversation, the robot should realize it is best not to disturb them; if an individual is receptive to the robot's interaction, it may approach the person. Finally, if the user is moving in the environment, the robot can analyse further to understand whether any help can be offered in assisting this user. The method proposed in this thesis combines multiple visual cues in a Bayesian framework to identify people in a scene and determine potential intentions. To improve system performance, contextual feedback is used, which allows the Bayesian network to evolve and adjust itself according to the surrounding environment. The results achieved demonstrate the effectiveness of the technique in dealing with human-robot interaction in a relatively crowded environment [7].
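    One simple way to picture the cue fusion is a naive-Bayes combination: each person receives a posterior probability of being the commanding person, given conditionally independent cue likelihoods. The cue set and numbers below are invented for illustration; the thesis's Bayesian network is richer and additionally adapts itself via contextual feedback.

```python
# A minimal naive-Bayes sketch of multi-cue fusion for identifying the
# commanding person; cues and likelihood values are illustrative.
import numpy as np

def commanding_person_posterior(cue_likelihoods, prior=None):
    """cue_likelihoods: (num_people, num_cues) array of P(cue | person commands).

    Returns a normalised posterior over people, assuming the cues are
    conditionally independent given the commanding person.
    """
    n = cue_likelihoods.shape[0]
    prior = np.full(n, 1.0 / n) if prior is None else prior
    posterior = prior * np.prod(cue_likelihoods, axis=1)
    return posterior / posterior.sum()

# Example: 3 people, cues = [facing the robot, hand raised, proximity].
likes = np.array([[0.9, 0.8, 0.6],   # person 0: facing, gesturing, near
                  [0.3, 0.1, 0.5],
                  [0.2, 0.1, 0.2]])
print(commanding_person_posterior(likes))  # person 0 clearly dominates
```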

    RGB-D datasets using Microsoft Kinect or similar sensors: a survey

    RGB-D data has turned out to be a very useful representation of an indoor scene for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, with those of the depth image, which is immune to variations in color, illumination, rotation angle, and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. In recent years, more and more RGB-D image/video datasets dedicated to various applications have become available, which are of great importance for benchmarking the state of the art. In this paper, we systematically survey popular RGB-D datasets for different applications, including object recognition, scene classification, hand gesture recognition, 3D simultaneous localization and mapping, and pose estimation. We provide insights into the characteristics of each important dataset, and compare the popularity and difficulty of those datasets. Overall, the main goal of this survey is to give a comprehensive description of the available RGB-D datasets and thus to guide researchers in the selection of suitable datasets for evaluating their algorithms.