A Framework for Image Segmentation Using Shape Models and Kernel Space Shape Priors
©2008 IEEE. DOI: 10.1109/TPAMI.2007.70774
Segmentation involves separating an object from the background in a given image. The use of image information alone often leads to poor segmentation results due to the presence of noise, clutter or occlusion. The introduction of shape priors in the geometric active contour (GAC) framework has proved to be an effective way to ameliorate some of these problems. In this work, we propose a novel segmentation method combining image information with prior shape knowledge, using level-sets. Following the work of Leventon et al., we propose to revisit the use of PCA to introduce prior knowledge about shapes in a more robust manner. We utilize kernel PCA (KPCA) and show that this method outperforms linear PCA by allowing only those shapes that are close enough to the training data. In our segmentation framework, shape knowledge and image information are encoded into two energy functionals entirely described in terms of shapes. This consistent description makes it possible to take full advantage of the kernel PCA methodology and leads to promising segmentation results. In particular, our shape-driven segmentation technique allows for the simultaneous encoding of multiple types of shapes, and offers a convincing level of robustness with respect to noise, occlusions, or smearing.
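As a rough illustration of why a kernel PCA prior can reject candidates that are far from the training data, the sketch below measures the squared distance of a test vector from a learned kernel PCA subspace in feature space. It is a toy under stated assumptions, not the authors' implementation: the 2D points standing in for shapes, the Gaussian kernel, the value of `sigma`, and the helper names `fit_kpca` and `feature_space_distance2` are all hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma=1.0):
    # Pairwise Gaussian (RBF) kernel between the rows of A and the rows of B.
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq / (2.0 * sigma ** 2))

def fit_kpca(X, n_components, sigma=1.0):
    # Centre the kernel matrix in feature space and keep the leading eigenpairs.
    n = X.shape[0]
    K = gaussian_kernel(X, X, sigma)
    H = np.eye(n) - np.ones((n, n)) / n
    evals, evecs = np.linalg.eigh(H @ K @ H)      # eigenvalues in ascending order
    lam = evals[::-1][:n_components]
    V = evecs[:, ::-1][:, :n_components]
    alphas = V / np.sqrt(lam)                     # unit-norm directions in feature space
    return {"X": X, "K": K, "alphas": alphas, "sigma": sigma}

def feature_space_distance2(model, z):
    # Squared distance between the centred image of z and the KPCA subspace.
    X, K, alphas, sigma = model["X"], model["K"], model["alphas"], model["sigma"]
    k_z = gaussian_kernel(z[None, :], X, sigma)[0]
    k_c = k_z - k_z.mean() - K.mean(axis=1) + K.mean()   # centred kernel vector
    beta = alphas.T @ k_c                                 # projections on the components
    norm2 = 1.0 - 2.0 * k_z.mean() + K.mean()             # k(z, z) = 1 for this kernel
    return norm2 - (beta ** 2).sum()

# Toy "training shapes": two well-separated clusters (two types of shape).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([-3.0, 0.0], 0.3, size=(20, 2)),
    rng.normal([3.0, 0.0], 0.3, size=(20, 2)),
])
model = fit_kpca(X, n_components=4, sigma=1.0)

d_train = feature_space_distance2(model, X[0])               # a training sample
d_mid = feature_space_distance2(model, np.array([0.0, 0.0]))  # between the clusters
print(d_train, d_mid)
```

In this toy setting the point halfway between the two clusters lies farther from the learned subspace than a training sample does, which is the kind of behaviour a shape prior needs when several shape classes are encoded at once.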
View subspaces for indexing and retrieval of 3D models
View-based indexing schemes for 3D object retrieval are gaining popularity
since they provide good retrieval results. These schemes are coherent with the
theory that humans recognize objects based on their 2D appearances. The
view-based techniques also allow users to search with various queries such as
binary images, range images and even 2D sketches. The previous view-based
techniques use classical 2D shape descriptors such as Fourier invariants,
Zernike moments, Scale Invariant Feature Transform-based local features and 2D
Digital Fourier Transform coefficients. These methods describe each object
independent of others. In this work, we explore data driven subspace models,
such as Principal Component Analysis, Independent Component Analysis and
Nonnegative Matrix Factorization to describe the shape information of the
views. We treat the depth images obtained from various points of the view
sphere as 2D intensity images and train a subspace to extract the inherent
structure of the views within a database. We also show the benefit of
categorizing shapes according to their eigenvalue spread. Both the shape
categorization and data-driven feature set conjectures are tested on the PSB
database and compared with the competitor view-based 3D shape retrieval
algorithms.
Comment: Three-Dimensional Image Processing (3DIP) and Applications
(Proceedings Volume) Proceedings of SPIE Volume: 7526 Editor(s): Atilla M.
Baskurt ISBN: 9780819479198 Date: 2 February 201
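The data-driven subspace idea can be sketched with plain PCA: each depth view is flattened into a vector, a subspace is trained on all views in the database, and a view is then described by its projection coefficients. This is a minimal sketch under stated assumptions, not the paper's pipeline: the random arrays standing in for depth images, the image size, the number of components, and the helper names `train_pca_subspace` and `describe` are hypothetical, and only the PCA variant is shown.

```python
import numpy as np

def train_pca_subspace(views, n_components):
    # views: array of shape (n_views, height, width), treated as intensity images.
    X = views.reshape(len(views), -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal directions of the centred data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def describe(view, mean, basis):
    # Feature vector = coefficients of the centred view in the learned subspace.
    return basis @ (view.reshape(-1).astype(float) - mean)

rng = np.random.default_rng(0)
database = rng.random((30, 16, 16))        # stand-in for rendered depth views
mean, basis = train_pca_subspace(database, n_components=8)
features = np.stack([describe(v, mean, basis) for v in database])

# Query with a slightly perturbed copy of view 7 and retrieve its nearest neighbour.
query = database[7] + 0.01 * rng.standard_normal((16, 16))
q = describe(query, mean, basis)
best = int(np.argmin(np.linalg.norm(features - q, axis=1)))
print(best)
```

Retrieval here is a nearest-neighbour search in coefficient space; a real system would aggregate matches over the many views of each 3D model.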
RUR53: an Unmanned Ground Vehicle for Navigation, Recognition and Manipulation
This paper proposes RUR53: an Unmanned Ground Vehicle able to autonomously
navigate through, identify, and reach areas of interest; and there recognize,
localize, and manipulate work tools to perform complex manipulation tasks. The
proposed contribution includes a modular software architecture where each
module solves specific sub-tasks and that can be easily enlarged to satisfy new
requirements. Included indoor and outdoor tests demonstrate the capability of
the proposed system to autonomously detect a target object (a panel) and
precisely dock in front of it while avoiding obstacles. They show it can
autonomously recognize and manipulate target work tools (i.e., wrenches and
valve stems) to accomplish complex tasks (i.e., use a wrench to rotate a valve
stem). A specific case study is described where the proposed modular
architecture allows an easy switch to a semi-teleoperated mode. The paper
exhaustively describes both the hardware and software setup of
RUR53, its performance when tested at the 2017 Mohamed Bin Zayed International
Robotics Challenge, and the lessons we learned when participating in this
competition, where we ranked third in the Grand Challenge in collaboration with
the Czech Technical University in Prague, the University of Pennsylvania, and
the University of Lincoln (UK).
Comment: This article has been accepted for publication in Advanced Robotics,
published by Taylor & Franci
Stratified decision forests for accurate anatomical landmark localization in cardiac images
Accurate localization of anatomical landmarks is an important step in medical imaging, as it provides useful prior information for subsequent image analysis and acquisition methods. It is particularly useful for initialization of automatic image analysis tools (e.g. segmentation and registration) and detection of scan planes for automated image acquisition. Landmark localization has been commonly performed using learning based approaches, such as classifier and/or regressor models. However, trained models may not generalize well in heterogeneous datasets when the images contain large differences due to size, pose and shape variations of organs. To learn more data-adaptive and patient specific models, we propose a novel stratification based training model, and demonstrate its use in a decision forest. The proposed approach does not require any additional training information compared to the standard model training procedure and can be easily integrated into any decision tree framework. The proposed method is evaluated on 1080 3D high-resolution and 90 multi-stack 2D cardiac cine MR images. The experiments show that the proposed method achieves state-of-the-art landmark localization accuracy and outperforms standard regression and classification based approaches. Additionally, the proposed method is used in a multi-atlas segmentation to create a fully automatic segmentation pipeline, and the results show that it achieves state-of-the-art segmentation accuracy.
The Visual Centrifuge: Model-Free Layered Video Representations
True video understanding requires making sense of non-lambertian scenes where
the color of light arriving at the camera sensor encodes information about not
just the last object it collided with, but about multiple mediums -- colored
windows, dirty mirrors, smoke or rain. Layered video representations have the
potential of accurately modelling realistic scenes but have so far required
stringent assumptions on motion, lighting and shape. Here we propose a
learning-based approach for multi-layered video representation: we introduce
novel uncertainty-capturing 3D convolutional architectures and train them to
separate blended videos. We show that these models then generalize to single
videos, where they exhibit interesting abilities: color constancy, factoring
out shadows and separating reflections. We present quantitative and qualitative
results on real-world videos.
Comment: Appears in: 2019 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR 2019). This arXiv contains the CVPR Camera Ready version of
the paper (although we have included larger figures) as well as an appendix
detailing the model architectur
Virtual Borders: Accurate Definition of a Mobile Robot's Workspace Using Augmented Reality
We address the problem of interactively controlling the workspace of a mobile
robot to ensure human-aware navigation. This is especially relevant for
non-expert users living in human-robot shared spaces, e.g. home environments,
since they want to keep control of their mobile robots, such as vacuum
cleaning or companion robots. Therefore, we introduce virtual borders that are
respected by a robot while performing its tasks. For this purpose, we employ an
RGB-D Google Tango tablet as a human-robot interface in combination with an
augmented reality application to flexibly define virtual borders. We evaluated
our system with 15 non-expert users concerning accuracy, teaching time and
correctness and compared the results with other baseline methods based on
visual markers and a laser pointer. The experimental results show that our
method features an equally high accuracy while reducing the teaching time
significantly compared to the baseline methods. This holds for different border
lengths, shapes and variations in the teaching process. Finally, we
demonstrated the correctness of the approach, i.e. the mobile robot changes its
navigational behavior according to the user-defined virtual borders.
Comment: Accepted at 2018 IEEE/RSJ International Conference on Intelligent
Robots and Systems (IROS), supplementary video: https://youtu.be/oQO8sQ0JBR
Efficient illumination independent appearance-based face tracking
One of the major challenges that visual tracking algorithms face nowadays is being
able to cope with changes in the appearance of the target during tracking. Linear
subspace models have been extensively studied and are possibly the most popular
way of modelling target appearance. We introduce a linear subspace representation
in which the appearance of a face is represented by the addition of two
approximately independent linear subspaces modelling facial expressions and
illumination respectively. This model is more compact than previous bilinear or
multilinear approaches. The independence assumption notably simplifies system
training. We only require two image sequences. One facial expression is subject
to all possible illuminations in one sequence and the face adopts all facial
expressions under one particular illumination in the other. This simple model
enables us to train the system with no manual intervention. We also revisit the
problem of efficiently fitting a linear subspace-based model to a target image
and introduce an additive procedure for solving this problem. We prove that
Matthews and Baker’s Inverse Compositional Approach makes a smoothness
assumption on the subspace basis that is equivalent to Hager and Belhumeur’s,
which worsens convergence. Our approach differs from Hager and Belhumeur’s
additive and Matthews and Baker’s compositional approaches in that we make no
smoothness assumptions on the subspace basis. In the
experiments conducted we show that the model introduced accurately represents
the appearance variations caused by illumination changes and facial expressions.
We also verify experimentally that our fitting procedure is more accurate and has
a better convergence rate than the other related approaches, albeit at the expense of
a slight increase in computational cost. Our approach can be used for tracking a
human face at standard video frame rates on an average personal computer.
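The additive appearance model described in this abstract, a face image written as a mean plus contributions from an expression subspace and an illumination subspace, can be illustrated with a small least-squares sketch. Everything here is an assumption for illustration: the random bases, the dimensions, and the names `synthesize` and `fit` are hypothetical, and the plain least-squares solve stands in for, and is not, the authors' efficient fitting procedure (which also has to estimate target motion).

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels, n_expr, n_illum = 100, 3, 2

# Hypothetical model: mean face plus two linear subspaces (random stand-ins).
mean_face = rng.random(n_pixels)
B_expr = rng.standard_normal((n_pixels, n_expr))    # facial-expression basis
B_illum = rng.standard_normal((n_pixels, n_illum))  # illumination basis

def synthesize(c_expr, c_illum):
    # Appearance is the sum of the two subspace contributions and the mean.
    return mean_face + B_expr @ c_expr + B_illum @ c_illum

def fit(image):
    # Solve for both coefficient sets at once by stacking the two bases.
    B = np.hstack([B_expr, B_illum])
    coeffs, *_ = np.linalg.lstsq(B, image - mean_face, rcond=None)
    return coeffs[:n_expr], coeffs[n_expr:]

true_expr = np.array([0.5, -1.0, 2.0])
true_illum = np.array([1.5, -0.3])
est_expr, est_illum = fit(synthesize(true_expr, true_illum))
print(est_expr, est_illum)
```

Because the two bases are simply concatenated, the model needs only `n_expr + n_illum` coefficients per image, which is one way to read the abstract's remark that it is more compact than bilinear or multilinear alternatives.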