961 research outputs found
Occlusion-Robust MVO: Multimotion Estimation Through Occlusion Via Motion Closure
Visual motion estimation is an integral and well-studied challenge in
autonomous navigation. Recent work has focused on addressing multimotion
estimation, which is especially challenging in highly dynamic environments.
Such environments not only comprise multiple, complex motions but also tend to
exhibit significant occlusion.
Previous work in object tracking focuses on maintaining the integrity of
object tracks but usually relies on specific appearance-based descriptors or
constrained motion models. These approaches are very effective in specific
applications but do not generalize to the full multimotion estimation problem.
This paper presents a pipeline for estimating multiple motions, including the
camera egomotion, in the presence of occlusions. This approach uses an
expressive motion prior to estimate the SE (3) trajectory of every motion in
the scene, even during temporary occlusions, and identify the reappearance of
motions through motion closure. The performance of this occlusion-robust
multimotion visual odometry (MVO) pipeline is evaluated on real-world data and
the Oxford Multimotion Dataset.Comment: To appear at the 2020 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS). An earlier version of this work first
appeared at the Long-term Human Motion Planning Workshop (ICRA 2019). 8
pages, 5 figures. Video available at
https://www.youtube.com/watch?v=o_N71AA6FR
Three hypothesis algorithm with occlusion reasoning for multiple people tracking
This work proposes a detection-based tracking algorithm able to locate and keep the identity of multiple
people, who may be occluded, in uncontrolled stationary environments. Our algorithm builds a tracking graph
that models spatio-temporal relationships among attributes of interacting people to predict and resolve partial
and total occlusions. When a total occlusion occurs, the algorithm generates various hypotheses about the
location of the occluded person considering three cases: (a) the person keeps the same direction and speed,
(b) the person follows the direction and speed of the occluder, and (c) the person remains motionless during
occlusion. By analyzing the graph, our algorithm can detect trajectories produced by false alarms and estimate
the location of missing or occluded people. Our algorithm performs acceptably under complex conditions, such
as partial visibility of individuals getting inside or outside the scene, continuous interactions and occlusions
among people, wrong or missing information on the detection of persons, as well as variation of the person’s
appearance due to illumination changes and background-clutter distracters. Our algorithm was evaluated on
test sequences in the field of intelligent surveillance achieving an overall precision of 93%. Results show
that our tracking algorithm outperforms even trajectory-based state-of-the-art algorithms
RGBD Datasets: Past, Present and Future
Since the launch of the Microsoft Kinect, scores of RGBD datasets have been
released. These have propelled advances in areas from reconstruction to gesture
recognition. In this paper we explore the field, reviewing datasets across
eight categories: semantics, object pose estimation, camera tracking, scene
reconstruction, object tracking, human actions, faces and identification. By
extracting relevant information in each category we help researchers to find
appropriate data for their needs, and we consider which datasets have succeeded
in driving computer vision forward and why.
Finally, we examine the future of RGBD datasets. We identify key areas which
are currently underexplored, and suggest that future directions may include
synthetic data and dense reconstructions of static and dynamic scenes.Comment: 8 pages excluding references (CVPR style
Multi-Object Tracking and Segmentation via Neural Message Passing
Graphs offer a natural way to formulate Multiple Object Tracking (MOT) and
Multiple Object Tracking and Segmentation (MOTS) within the
tracking-by-detection paradigm. However, they also introduce a major challenge
for learning methods, as defining a model that can operate on such structured
domain is not trivial. In this work, we exploit the classical network flow
formulation of MOT to define a fully differentiable framework based on Message
Passing Networks (MPNs). By operating directly on the graph domain, our method
can reason globally over an entire set of detections and exploit contextual
features. It then jointly predicts both final solutions for the data
association problem and segmentation masks for all objects in the scene while
exploiting synergies between the two tasks. We achieve state-of-the-art results
for both tracking and segmentation in several publicly available datasets. Our
code is available at github.com/ocetintas/MPNTrackSeg.Comment: arXiv admin note: substantial text overlap with arXiv:1912.0751
Fruit Detection and Tree Segmentation for Yield Mapping in Orchards
Accurate information gathering and processing is critical for precision horticulture, as growers aim to optimise their farm management practices. An accurate inventory of the crop that details its spatial distribution along with health and maturity, can help farmers efficiently target processes such as chemical and fertiliser spraying, crop thinning, harvest management, labour planning and marketing. Growers have traditionally obtained this information by using manual sampling techniques, which tend to be labour intensive, spatially sparse, expensive, inaccurate and prone to subjective biases. Recent advances in sensing and automation for field robotics allow for key measurements to be made for individual plants throughout an orchard in a timely and accurate manner. Farmer operated machines or unmanned robotic platforms can be equipped with a range of sensors to capture a detailed representation over large areas. Robust and accurate data processing techniques are therefore required to extract high level information needed by the grower to support precision farming. This thesis focuses on yield mapping in orchards using image and light detection and ranging (LiDAR) data captured using an unmanned ground vehicle (UGV). The contribution is the framework and algorithmic components for orchard mapping and yield estimation that is applicable to different fruit types and orchard configurations. The framework includes detection of fruits in individual images and tracking them over subsequent frames. The fruit counts are then associated to individual trees, which are segmented from image and LiDAR data, resulting in a structured spatial representation of yield. The first contribution of this thesis is the development of a generic and robust fruit detection algorithm. Images captured in the outdoor environment are susceptible to highly variable external factors that lead to significant appearance variations. Specifically in orchards, variability is caused by changes in illumination, target pose, tree types, etc. The proposed techniques address these issues by using state-of-the-art feature learning approaches for image classification, while investigating the utility of orchard domain knowledge for fruit detection. Detection is performed using both pixel-wise classification of images followed instance segmentation, and bounding-box regression approaches. The experimental results illustrate the versatility of complex deep learning approaches over a multitude of fruit types. The second contribution of this thesis is a tree segmentation approach to detect the individual trees that serve as a standard unit for structured orchard information systems. The work focuses on trellised trees, which present unique challenges for segmentation algorithms due to their intertwined nature. LiDAR data are used to segment the trellis face, and to generate proposals for individual trees trunks. Additional trunk proposals are provided using pixel-wise classification of the image data. The multi-modal observations are fine-tuned by modelling trunk locations using a hidden semi-Markov model (HSMM), within which prior knowledge of tree spacing is incorporated. The final component of this thesis addresses the visual occlusion of fruit within geometrically complex canopies by using a multi-view detection and tracking approach. Single image fruit detections are tracked over a sequence of images, and associated to individual trees or farm rows, with the spatial distribution of the fruit counting forming a yield map over the farm. The results show the advantage of using multi-view imagery (instead of single view analysis) for fruit counting and yield mapping. This thesis includes extensive experimentation in almond, apple and mango orchards, with data captured by a UGV spanning a total of 5 hectares of farm area, over 30 km of vehicle traversal and more than 7,000 trees. The validation of the different processes is performed using manual annotations, which includes fruit and tree locations in image and LiDAR data respectively. Additional evaluation of yield mapping is performed by comparison against fruit counts on trees at the farm and counts made by the growers post-harvest. The framework developed in this thesis is demonstrated to be accurate compared to ground truth at all scales of the pipeline, including fruit detection and tree mapping, leading to accurate yield estimation, per tree and per row, for the different crops. Through the multitude of field experiments conducted over multiple seasons and years, the thesis presents key practical insights necessary for commercial development of an information gathering system in orchards
- …