Latent-Class Hough Forests for 3D object detection and pose estimation of rigid objects
In this thesis we propose a novel framework, Latent-Class Hough Forests, for the problem of 3D object detection and pose estimation in heavily cluttered and occluded scenes. Firstly, we adapt the state-of-the-art template-based representation, LINEMOD [34, 36], into a scale-invariant patch descriptor and integrate it into a regression forest using a novel template-based split function. In training, rather than explicitly collecting representative negative samples, our method is trained on positive samples only and we treat the class distributions at the leaf nodes as latent variables. During the inference process we iteratively update these distributions, providing accurate estimation of background clutter and foreground occlusions and thus a better detection rate. Furthermore, as a by-product, the latent class distributions can provide accurate occlusion-aware segmentation masks, even in the multi-instance scenario. In addition to an existing public dataset, which contains only single-instance sequences with large amounts of clutter, we have collected a new, more challenging dataset for multiple-instance detection containing heavy 2D and 3D clutter as well as foreground occlusions. We evaluate the Latent-Class Hough Forest on both of these datasets, where we outperform state-of-the-art methods.
Challenges for Monocular 6D Object Pose Estimation in Robotics
Object pose estimation is a core perception task that enables, for example,
object grasping and scene understanding. The widely available, inexpensive and
high-resolution RGB sensors and CNNs that allow for fast inference based on
this modality make monocular approaches especially well suited for robotics
applications. We observe that previous surveys on object pose estimation
establish the state of the art for varying modalities, single- and multi-view
settings, and datasets and metrics that consider a multitude of applications.
We argue, however, that those works' broad scope hinders the identification of
open challenges that are specific to monocular approaches and the derivation of
promising future challenges for their application in robotics. By providing a
unified view on recent publications from both robotics and computer vision, we
find that occlusion handling, novel pose representations, and formalizing and
improving category-level pose estimation are still fundamental challenges that
are highly relevant for robotics. Moreover, to further improve robotic
performance, large object sets, novel objects, refractive materials, and
uncertainty estimates are central, largely unsolved open challenges. In order
to address them, ontological reasoning, deformability handling, scene-level
reasoning, realistic datasets, and the ecological footprint of algorithms need
to be improved.
Comment: arXiv admin note: substantial text overlap with arXiv:2302.1182
Simultaneous Object Recognition and Segmentation from Single or Multiple Model Views
We present a novel object recognition approach based on affine invariant regions. It actively counters the problems related to the limited repeatability of the region detectors, and the difficulty of matching, in the presence of large amounts of background clutter and particularly challenging viewing conditions. After producing an initial set of matches, the method gradually explores the surrounding image areas, recursively constructing more and more matching regions, increasingly farther from the initial ones. This process covers the object with matches, and simultaneously separates the correct matches from the wrong ones. Hence, recognition and segmentation are achieved at the same time. The approach includes a mechanism for capturing the relationships between multiple model views and exploiting these for integrating the contributions of the views at recognition time. This is based on an efficient algorithm for partitioning a set of region matches into groups lying on smooth surfaces. Integration is achieved by measuring the consistency of configurations of groups arising from different model views. Experimental results demonstrate the stronger power of the approach in dealing with extensive clutter, dominant occlusion, and large scale and viewpoint changes. Non-rigid deformations are explicitly taken into account, and the approximate contours of the object are produced. All presented techniques can extend any viewpoint-invariant feature extractor.
Multisensor Poisson Multi-Bernoulli Filter for Joint Target-Sensor State Tracking
In a typical multitarget tracking (MTT) scenario, the sensor state is either
assumed known, or tracking is performed in the sensor's (relative) coordinate
frame. This assumption does not hold when the sensor, e.g., an automotive
radar, is mounted on a vehicle, and the target state should be represented in a
global (absolute) coordinate frame. Then it is important to consider the
uncertain location of the vehicle on which the sensor is mounted for MTT. In
this paper, we present a multisensor low complexity Poisson multi-Bernoulli MTT
filter, which jointly tracks the uncertain vehicle state and target states.
Measurements collected by different sensors mounted on multiple vehicles with
varying location uncertainty are incorporated sequentially based on the arrival
of new sensor measurements. In doing so, targets observed from a sensor mounted
on a well-localized vehicle reduce the state uncertainty of other poorly
localized vehicles, provided that a common non-empty subset of targets is
observed. A low complexity filter is obtained by approximations of the joint
sensor-feature state density minimizing the Kullback-Leibler divergence (KLD).
Results from synthetic as well as experimental measurement data, collected in a
vehicle driving scenario, demonstrate the performance benefits of joint
vehicle-target state tracking.
Comment: 13 pages, 7 figures
Video content analysis for automated detection and tracking of humans in CCTV surveillance applications
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The problems of achieving a high detection rate with a low false alarm rate for human detection and tracking in video sequences, performance scalability, and improving response time are addressed in this thesis. The underlying causes are the effects of scene complexity, human-to-human interactions, scale changes, and scene background-human interactions. A two-stage processing solution, namely human detection and human tracking, with two novel pattern classifiers is presented. Scale-independent human detection is achieved by processing in the wavelet domain using square wavelet features. These features, used to characterise human silhouettes at different scales, are similar to the rectangular features used in [Viola 2001]. At the detection stage, two detectors are combined to improve the detection rate. The first detector is based on the shape outline of humans extracted from the scene using a reduced-complexity outline extraction algorithm. A shape mismatch measure is used to differentiate between the human and the background class. The second detector uses rectangular features as primitives for silhouette description in the wavelet domain. The marginal distribution of features collocated at a particular position on a candidate human (a patch of the image) is used to describe the silhouette statistically. Two similarity measures are computed between a candidate human and the model histograms of the human and non-human classes. The similarity measure is used to discriminate between the human and the non-human class. At the tracking stage, a tracker based on the joint probabilistic data association filter (JPDAF) for data association and motion correspondence is presented. Track clustering is used to reduce hypothesis enumeration complexity.
Towards improving response time with increases in frame dimension, scene complexity, and number of channels, a scalable algorithmic architecture and an operating accuracy prediction technique are presented. A scheduling strategy for improving the response time and throughput by parallel processing is also presented.
Efficient Belief Propagation for Perception and Manipulation in Clutter
Autonomous service robots are required to perform tasks in common human indoor environments. To achieve goals associated with these tasks, the robot should continually perceive, reason about its environment, and plan to manipulate objects, which we term goal-directed manipulation. Perception remains the most challenging aspect of all stages, as common indoor environments typically pose problems in recognizing objects under inherent occlusions with physical interactions among themselves. Despite recent progress in the field of robot perception, accommodating perceptual uncertainty due to partial observations remains challenging and needs to be addressed to achieve the desired autonomy.
In this dissertation, we address the problem of perception under uncertainty for robot manipulation in cluttered environments using generative inference methods. Specifically, we aim to enable robots to perceive partially observable environments by maintaining an approximate probability distribution as a belief over possible scene hypotheses. This belief representation captures uncertainty resulting from inter-object occlusions and physical interactions, which are inherently present in cluttered indoor environments. The research efforts presented in this thesis are towards developing appropriate state representations and inference techniques to generate and maintain such belief over contextually plausible scene states. We focus on providing the following features to generative inference while addressing the challenges due to occlusions: 1) generating and maintaining plausible scene hypotheses, 2) reducing the inference search space that typically grows exponentially with respect to the number of objects in a scene, 3) preserving scene hypotheses over continual observations.
To generate and maintain plausible scene hypotheses, we propose physics-informed scene estimation methods that combine a Newtonian physics engine within a particle-based generative inference framework. The proposed variants of our method, with and without a Monte Carlo step, showed promising results on generating and maintaining plausible hypotheses under complete occlusions. We show that estimating such scenarios would not be possible by the commonly adopted 3D registration methods without the notion of a physical context that our method provides.
To scale up the context informed inference to accommodate a larger number of objects, we describe a factorization of scene state into object and object-parts to perform collaborative particle-based inference. This resulted in the Pull Message Passing for Nonparametric Belief Propagation (PMPNBP) algorithm that caters to the demands of the high-dimensional multimodal nature of cluttered scenes while being computationally tractable. We demonstrate that PMPNBP is orders of magnitude faster than the state-of-the-art Nonparametric Belief Propagation method. Additionally, we show that PMPNBP successfully estimates poses of articulated objects under various simulated occlusion scenarios.
To extend our PMPNBP algorithm for tracking object states over continuous observations, we explore ways to propose and preserve hypotheses effectively over time. This resulted in an augmentation-selection method, where hypotheses are drawn from various proposals followed by the selection of a subset using PMPNBP that explains the current state of the objects. We discuss and analyze our augmentation-selection method against its counterparts in the belief propagation literature. Furthermore, we develop an inference pipeline for pose estimation and tracking of articulated objects in clutter. In this pipeline, the message passing module with the augmentation-selection method is informed by segmentation heatmaps from a trained neural network. In our experiments, we show that our proposed pipeline can effectively maintain belief and track articulated objects over a sequence of observations under occlusion.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/163159/1/kdesingh_1.pd