18,250 research outputs found
Improving 6D Pose Estimation of Objects in Clutter via Physics-aware Monte Carlo Tree Search
This work proposes a process for efficiently searching over combinations of
individual object 6D pose hypotheses in cluttered scenes, especially in cases
involving occlusions and objects resting on each other. The initial set of
candidate object poses is generated from state-of-the-art object detection and
global point cloud registration techniques. The best-scored pose per object by
using these techniques may not be accurate due to overlaps and occlusions.
Nevertheless, experimental indications provided in this work show that object
poses with lower ranks may be closer to the real poses than ones with high
ranks according to registration techniques. This motivates a global
optimization process for improving these poses by taking into account
scene-level physical interactions between objects. It also implies that the
Cartesian product of candidate poses for interacting objects must be searched
so as to identify the best scene-level hypothesis. To perform the search
efficiently, the candidate poses for each object are clustered so as to reduce
their number but still keep a sufficient diversity. Then, searching over the
combinations of candidate object poses is performed through a Monte Carlo Tree
Search (MCTS) process that uses the similarity between the observed depth image
of the scene and a rendering of the scene given the hypothesized pose as a
score that guides the search procedure. MCTS handles in a principled way the
tradeoff between fine-tuning the most promising poses and exploring new ones,
by using the Upper Confidence Bound (UCB) technique. Experimental results
indicate that this process is able to quickly identify in cluttered scenes
physically-consistent object poses that are significantly closer to ground
truth compared to poses found by point cloud registration methods.Comment: 8 pages, 4 figure
A Framework for Symmetric Part Detection in Cluttered Scenes
The role of symmetry in computer vision has waxed and waned in importance
during the evolution of the field from its earliest days. At first figuring
prominently in support of bottom-up indexing, it fell out of favor as shape
gave way to appearance and recognition gave way to detection. With a strong
prior in the form of a target object, the role of the weaker priors offered by
perceptual grouping was greatly diminished. However, as the field returns to
the problem of recognition from a large database, the bottom-up recovery of the
parts that make up the objects in a cluttered scene is critical for their
recognition. The medial axis community has long exploited the ubiquitous
regularity of symmetry as a basis for the decomposition of a closed contour
into medial parts. However, today's recognition systems are faced with
cluttered scenes, and the assumption that a closed contour exists, i.e. that
figure-ground segmentation has been solved, renders much of the medial axis
community's work inapplicable. In this article, we review a computational
framework, previously reported in Lee et al. (2013), Levinshtein et al. (2009,
2013), that bridges the representation power of the medial axis and the need to
recover and group an object's parts in a cluttered scene. Our framework is
rooted in the idea that a maximally inscribed disc, the building block of a
medial axis, can be modeled as a compact superpixel in the image. We evaluate
the method on images of cluttered scenes.Comment: 10 pages, 8 figure
Computerized Analysis of Magnetic Resonance Images to Study Cerebral Anatomy in Developing Neonates
The study of cerebral anatomy in developing neonates is of great importance for
the understanding of brain development during the early period of life. This
dissertation therefore focuses on three challenges in the modelling of cerebral
anatomy in neonates during brain development. The methods that have been
developed all use Magnetic Resonance Images (MRI) as source data.
To facilitate study of vascular development in the neonatal period, a set of image
analysis algorithms are developed to automatically extract and model cerebral
vessel trees. The whole process consists of cerebral vessel tracking from
automatically placed seed points, vessel tree generation, and vasculature
registration and matching. These algorithms have been tested on clinical Time-of-
Flight (TOF) MR angiographic datasets.
To facilitate study of the neonatal cortex a complete cerebral cortex segmentation
and reconstruction pipeline has been developed. Segmentation of the neonatal
cortex is not effectively done by existing algorithms designed for the adult brain
because the contrast between grey and white matter is reversed. This causes pixels
containing tissue mixtures to be incorrectly labelled by conventional methods. The
neonatal cortical segmentation method that has been developed is based on a novel
expectation-maximization (EM) method with explicit correction for mislabelled
partial volume voxels. Based on the resulting cortical segmentation, an implicit
surface evolution technique is adopted for the reconstruction of the cortex in
neonates. The performance of the method is investigated by performing a detailed
landmark study.
To facilitate study of cortical development, a cortical surface registration algorithm
for aligning the cortical surface is developed. The method first inflates extracted
cortical surfaces and then performs a non-rigid surface registration using free-form
deformations (FFDs) to remove residual alignment. Validation experiments using
data labelled by an expert observer demonstrate that the method can capture local
changes and follow the growth of specific sulcus
Discrete Optimization Methods for Segmentation and Matching
This dissertation studies discrete optimization methods for several computer vision problems. In the first part, a new objective function for superpixel segmentation is proposed. This objective function consists of two components: entropy rate of a random walk on a graph and a balancing term. The entropy rate favors formation of compact and homogeneous clusters, while the balancing function encourages clusters with similar sizes. I present a new graph construction for images and show that this construction induces a matroid. The segmentation is then given by the graph topology which maximizes the objective function under the matroid constraint. By exploiting submodular and monotonic properties of the objective function, I develop an efficient algorithm with a worst-case performance bound of for the superpixel segmentation problem. Extensive experiments on the Berkeley segmentation benchmark show the proposed algorithm outperforms the state of the art in all the standard evaluation metrics.
Next, I propose a video segmentation algorithm by maximizing a submodular objective function subject to a matroid constraint. This function is similar to the standard energy function in computer vision with unary terms, pairwise terms from the Potts model, and a novel higher-order term based on appearance histograms. I show that the standard Potts model prior, which becomes non-submodular for multi-label problems, still induces a submodular function in a maximization framework. A new higher-order prior further enforces consistency in the appearance histograms both spatially and temporally across the video. The matroid constraint leads to a simple algorithm with a performance bound of . A branch and bound procedure is also presented to improve the solution computed by the algorithm.
The last part of the dissertation studies the object localization problem in images given a single hand-drawn example or a gallery of shapes as the object model. Although many shape matching algorithms have been proposed for the problem, chamfer matching remains to be the preferred method when speed and robustness are considered. In this dissertation, I significantly improve the accuracy of chamfer matching while reducing the computational time from linear to sublinear (shown empirically). It is achieved by incorporating edge orientation information in the matching algorithm so the resulting cost function is piecewise smooth and the cost variation is tightly bounded. Moreover, I present a sublinear time algorithm for exact computation of the directional chamfer matching score using techniques from 3D distance transforms and directional integral images. In addition, the smooth cost function allows one to bound the cost distribution of large neighborhoods and skip the bad hypotheses. Experiments show that the proposed approach improves the speed of the original chamfer matching up to an order of 45 times, and it is much faster than many state of art techniques while the accuracy is comparable. I further demonstrate the application of the proposed algorithm in providing seamless operation for a robotic bin picking system
Product recognition in store shelves as a sub-graph isomorphism problem
The arrangement of products in store shelves is carefully planned to maximize
sales and keep customers happy. However, verifying compliance of real shelves
to the ideal layout is a costly task routinely performed by the store
personnel. In this paper, we propose a computer vision pipeline to recognize
products on shelves and verify compliance to the planned layout. We deploy
local invariant features together with a novel formulation of the product
recognition problem as a sub-graph isomorphism between the items appearing in
the given image and the ideal layout. This allows for auto-localizing the given
image within the aisle or store and improving recognition dramatically.Comment: Slightly extended version of the paper accepted at ICIAP 2017. More
information @project_page -->
http://vision.disi.unibo.it/index.php?option=com_content&view=article&id=111&catid=7
- …