207 research outputs found
Utilization and experimental evaluation of occlusion aware kernel correlation filter tracker using RGB-D
Unlike deep-learning which requires large training datasets, correlation filter-based trackers like Kernelized Correlation Filter (KCF) uses implicit properties of tracked images (circulant matrices) for training in real-time. Despite their practical application in tracking, a need for a better understanding of the fundamentals associated with KCF in terms of theoretically, mathematically, and experimentally exists. This thesis first details the workings prototype of the tracker and investigates its effectiveness in real-time applications and supporting visualizations. We further address some of the drawbacks of the tracker in cases of occlusions, scale changes, object rotation, out-of-view and model drift with our novel RGB-D Kernel Correlation tracker. We also study the use of particle filter to improve trackers\u27 accuracy. Our results are experimentally evaluated using a) standard dataset and b) real-time using Microsoft Kinect V2 sensor. We believe this work will set the basis for better understanding the effectiveness of kernel-based correlation filter trackers and to further define some of its possible advantages in tracking
Sobi: An Interactive Social Service Robot for Long-Term Autonomy in Open Environments
Long-term autonomy in service robotics is a current research topic, especially for dynamic, large-scale environments that change over time. We present Sobi, a mobile service robot developed as an interactive guide for open environments, such as public places with indoor and outdoor areas. The robot will serve as a platform for environmental modeling and human-robot interaction. Its main hardware and software components, which we freely license as a documented open source project, are presented. Another key focus is Sobi’s monitoring system for long-term autonomy, which restores system components in a targeted manner in order to extend the total system lifetime without unplanned intervention. We demonstrate first results of the long-term autonomous capabilities in a 16-day indoor deployment, in which the robot patrols a total of 66.6 km with an average of 5.5 hours of travel time per weekday, charging autonomously in between. In a user study with 12 participants, we evaluate the appearance and usability of the user interface, which allows users to interactively query information about the environment and directions.© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
Robot Mapping and Navigation in Real-World Environments
Robots can perform various tasks, such as mapping hazardous sites, taking part in search-and-rescue scenarios, or delivering goods and people. Robots operating in the real world face many challenges on the way to the completion of their mission. Essential capabilities required for the operation of such robots are mapping, localization and navigation. Solving all of these tasks robustly presents a substantial difficulty as these components are usually interconnected, i.e., a robot that starts without any knowledge about the environment must simultaneously build a map, localize itself in it, analyze the surroundings and plan a path to efficiently explore an unknown environment. In addition to the interconnections between these tasks, they highly depend on the sensors used by the robot and on the type of the environment in which the robot operates. For example, an RGB camera can be used in an outdoor scene for computing visual odometry, or to detect dynamic objects but becomes less useful in an environment that does not have enough light for cameras to operate. The software that controls the behavior of the robot must seamlessly process all the data coming from different sensors. This often leads to systems that are tailored to a particular robot and a particular set of sensors. In this thesis, we challenge this concept by developing and implementing methods for a typical robot navigation pipeline that can work with different types of the sensors seamlessly both, in indoor and outdoor environments. With the emergence of new range-sensing RGBD and LiDAR sensors, there is an opportunity to build a single system that can operate robustly both in indoor and outdoor environments equally well and, thus, extends the application areas of mobile robots. The techniques presented in this thesis aim to be used with both RGBD and LiDAR sensors without adaptations for individual sensor models by using range image representation and aim to provide methods for navigation and scene interpretation in both static and dynamic environments. For a static world, we present a number of approaches that address the core components of a typical robot navigation pipeline. At the core of building a consistent map of the environment using a mobile robot lies point cloud matching. To this end, we present a method for photometric point cloud matching that treats RGBD and LiDAR sensors in a uniform fashion and is able to accurately register point clouds at the frame rate of the sensor. This method serves as a building block for the further mapping pipeline. In addition to the matching algorithm, we present a method for traversability analysis of the currently observed terrain in order to guide an autonomous robot to the safe parts of the surrounding environment. A source of danger when navigating difficult to access sites is the fact that the robot may fail in building a correct map of the environment. This dramatically impacts the ability of an autonomous robot to navigate towards its goal in a robust way, thus, it is important for the robot to be able to detect these situations and to find its way home not relying on any kind of map. To address this challenge, we present a method for analyzing the quality of the map that the robot has built to date, and safely returning the robot to the starting point in case the map is found to be in an inconsistent state. The scenes in dynamic environments are vastly different from the ones experienced in static ones. In a dynamic setting, objects can be moving, thus making static traversability estimates not enough. With the approaches developed in this thesis, we aim at identifying distinct objects and tracking them to aid navigation and scene understanding. We target these challenges by providing a method for clustering a scene taken with a LiDAR scanner and a measure that can be used to determine if two clustered objects are similar that can aid the tracking performance. All methods presented in this thesis are capable of supporting real-time robot operation, rely on RGBD or LiDAR sensors and have been tested on real robots in real-world environments and on real-world datasets. All approaches have been published in peer-reviewed conference papers and journal articles. In addition to that, most of the presented contributions have been released publicly as open source software
Recommended from our members
Recognizing human activity using RGBD data
textTraditional computer vision algorithms try to understand the world using visible light cameras. However, there are inherent limitations of this type of data source. First, visible light images are sensitive to illumination changes and background clutter. Second, the 3D structural information of the scene is lost when projecting the 3D world to 2D images. Recovering the 3D information from 2D images is a challenging problem. Range sensors have existed for over thirty years, which capture 3D characteristics of the scene. However, earlier range sensors were either too expensive, difficult to use in human environments, slow at acquiring data, or provided a poor estimation of distance. Recently, the easy access to the RGBD data at real-time frame rate is leading to a revolution in perception and inspired many new research using RGBD data. I propose algorithms to detect persons and understand the activities using RGBD data. I demonstrate the solutions to many computer vision problems may be improved with the added depth channel. The 3D structural information may give rise to algorithms with real-time and view-invariant properties in a faster and easier fashion. When both data sources are available, the features extracted from the depth channel may be combined with traditional features computed from RGB channels to generate more robust systems with enhanced recognition abilities, which may be able to deal with more challenging scenarios. As a starting point, the first problem is to find the persons of various poses in the scene, including moving or static persons. Localizing humans from RGB images is limited by the lighting conditions and background clutter. Depth image gives alternative ways to find the humans in the scene. In the past, detection of humans from range data is usually achieved by tracking, which does not work for indoor person detection. In this thesis, I propose a model based approach to detect the persons using the structural information embedded in the depth image. I propose a 2D head contour model and a 3D head surface model to look for the head-shoulder part of the person. Then, a segmentation scheme is proposed to segment the full human body from the background and extract the contour. I also give a tracking algorithm based on the detection result. I further research on recognizing human actions and activities. I propose two features for recognizing human activities. The first feature is drawn from the skeletal joint locations estimated from a depth image. It is a compact representation of the human posture called histograms of 3D joint locations (HOJ3D). This representation is view-invariant and the whole algorithm runs at real-time. This feature may benefit many applications to get a fast estimation of the posture and action of the human subject. The second feature is a spatio-temporal feature for depth video, which is called Depth Cuboid Similarity Feature (DCSF). The interest points are extracted using an algorithm that effectively suppresses the noise and finds salient human motions. DCSF is extracted centered on each interest point, which forms the description of the video contents. This descriptor can be used to recognize the activities with no dependence on skeleton information or pre-processing steps such as motion segmentation, tracking, or even image de-noising or hole-filling. It is more flexible and widely applicable to many scenarios. Finally, all the features herein developed are combined to solve a novel problem: first-person human activity recognition using RGBD data. Traditional activity recognition algorithms focus on recognizing activities from a third-person perspective. I propose to recognize activities from a first-person perspective with RGBD data. This task is very novel and extremely challenging due to the large amount of camera motion either due to self exploration or the response of the interaction. I extracted 3D optical flow features as the motion descriptor, 3D skeletal joints features as posture descriptors, spatio-temporal features as local appearance descriptors to describe the first-person videos. To address the ego-motion of the camera, I propose an attention mask to guide the recognition procedures and separate the features on the ego-motion region and independent-motion region. The 3D features are very useful at summarizing the discerning information of the activities. In addition, the combination of the 3D features with existing 2D features brings more robust recognition results and make the algorithm capable of dealing with more challenging cases.Electrical and Computer Engineerin
Multiple human tracking in RGB-depth data: A survey
© The Institution of Engineering and Technology. Multiple human tracking (MHT) is a fundamental task in many computer vision applications. Appearance-based approaches, primarily formulated on RGB data, are constrained and affected by problems arising from occlusions and/or illumination variations. In recent years, the arrival of cheap RGB-depth devices has led to many new approaches to MHT, and many of these integrate colour and depth cues to improve each and every stage of the process. In this survey, the authors present the common processing pipeline of these methods and review their methodology based (a) on how they implement this pipeline and (b) on what role depth plays within each stage of it. They identify and introduce existing, publicly available, benchmark datasets and software resources that fuse colour and depth data for MHT. Finally, they present a brief comparative evaluation of the performance of those works that have applied their methods to these datasets
Automatic and adaptable registration of live RGBD video streams sharing partially overlapping views
In this thesis, we introduce DeReEs-4v, an algorithm for unsupervised and automatic
registration of two video frames captured depth-sensing cameras. DeReEs-4V receives
two RGBD video streams from two depth-sensing cameras arbitrary located in an
indoor space that share a minimum amount of 25% overlap between their captured
scenes. The motivation of this research is to employ multiple depth-sensing cameras
to enlarge the field of view and acquire a more complete and accurate 3D information
of the environment. A typical way to combine multiple views from different cameras
is through manual calibration. However, this process is time-consuming and may
require some technical knowledge. Moreover, calibration has to be repeated when the
location or position of the cameras change.
In this research, we demonstrate how DeReEs-4V registration can be used to find
the transformation of the view of one camera with respect to the other at interactive
rates. Our algorithm automatically finds the 3D transformation to match the
views from two cameras, requires no human interference, and is robust to camera
movements while capturing. To validate this approach, a thorough examination of
the system performance under different scenarios is presented. The system presented
here supports any application that might benefit from the wider field-of-view provided
by the combined scene from both cameras, including applications in 3D telepresence,
gaming, people tracking, videoconferencing and computer vision
Camera Marker Networks for Pose Estimation and Scene Understanding in Construction Automation and Robotics.
The construction industry faces challenges that include high workplace injuries and fatalities, stagnant productivity, and skill shortage. Automation and Robotics in Construction (ARC) has been proposed in the literature as a potential solution that makes machinery easier to collaborate with, facilitates better decision-making, or enables autonomous behavior. However, there are two primary technical challenges in ARC: 1) unstructured and featureless environments; and 2) differences between the as-designed and the as-built. It is therefore impossible to directly replicate conventional automation methods adopted in industries such as manufacturing on construction sites. In particular, two fundamental problems, pose estimation and scene understanding, must be addressed to realize the full potential of ARC.
This dissertation proposes a pose estimation and scene understanding framework that addresses the identified research gaps by exploiting cameras, markers, and planar structures to mitigate the identified technical challenges. A fast plane extraction algorithm is developed for efficient modeling and understanding of built environments. A marker registration algorithm is designed for robust, accurate, cost-efficient, and rapidly reconfigurable pose estimation in unstructured and featureless environments. Camera marker networks are then established for unified and systematic design, estimation, and uncertainty analysis in larger scale applications.
The proposed algorithms' efficiency has been validated through comprehensive experiments. Specifically, the speed, accuracy and robustness of the fast plane extraction and the marker registration have been demonstrated to be superior to existing state-of-the-art algorithms. These algorithms have also been implemented in two groups of ARC applications to demonstrate the proposed framework's effectiveness, wherein the applications themselves have significant social and economic value. The first group is related to in-situ robotic machinery, including an autonomous manipulator for assembling digital architecture designs on construction sites to help improve productivity and quality; and an intelligent guidance and monitoring system for articulated machinery such as excavators to help improve safety. The second group emphasizes human-machine interaction to make ARC more effective, including a mobile Building Information Modeling and way-finding platform with discrete location recognition to increase indoor facility management efficiency; and a 3D scanning and modeling solution for rapid and cost-efficient dimension checking and concise as-built modeling.PHDCivil EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/113481/1/cforrest_1.pd
Robust perception of humans for mobile robots RGB-depth algorithms for people tracking, re-identification and action recognition
Human perception is one of the most important skills for a mobile robot sharing its workspace with humans.
This is not only true for navigation, because people have to be avoided differently than other obstacles, but also because mobile robots must be able to truly interact with humans.
In a near future, we can imagine that robots will be more and more present in every house and will perform services useful to the well-being of humans.
For this purpose, robust people tracking algorithms must be exploited and person re-identification techniques play an important role for allowing robots to recognize a person after a full occlusion or after long periods of time.
Moreover, they must be able to recognize what humans are doing, in order to react accordingly, helping them if needed or also learning from them.
This thesis tackles these problems by proposing approaches which combine algorithms based on both RGB and depth information which can be obtained with recently introduced consumer RGB-D sensors.
Our key contribution to people detection and tracking research is a depth-clustering method which allows to apply a robust image-based people detector only to a small subset of possible detection windows, thus decreasing the number of false detections while reaching high computational efficiency.
We also advance person re-identification research by proposing two techniques exploiting depth-based skeletal tracking algorithms: one is targeted to short-term re-identification and creates a compact, yet discrimative signature of people based on computing features at skeleton keypoints, which are highly repeatable and semantically meaningful; the other extract long-term features, such as 3D shape, to compare people by matching the corresponding 3D point cloud acquired with a RGB-D sensor. In order to account for the fact that people are articulated and not rigid objects, it exploits 3D skeleton information for warping people point clouds to a standard pose, thus making them directly comparable by means of least square fitting.
Finally, we describe an extension of flow-based action recognition methods to the RGB-D domain which computes motion over time of persons' 3D points by exploiting joint color and depth information and recognizes human actions by classifying gridded descriptors of 3D flow.
A further contribution of this thesis is the creation of a number of new RGB-D datasets which allow to compare different algorithms on data acquired by consumer RGB-D sensors. All these datasets have been publically released in order to foster research in these fields
- …