Single-Shot Global Localization via Graph-Theoretic Correspondence Matching
This paper describes a method of global localization based on graph-theoretic
association of instances between a query and a prior map. The proposed
framework performs correspondence matching by solving the maximum clique
problem (MCP). Whereas many existing global localization methods require the
query and the map to share the same modality, the graph-based abstraction of
the problem makes the framework potentially applicable to other map and/or
query modalities. We implement it with a semantically labeled 3D point cloud
map and a semantic segmentation image as the query. Leveraging the
graph-theoretic framework, the proposed method achieves global localization
using only the map and the query. The method shows promising results on
multiple large-scale simulated maps of urban scenes.
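As a rough illustration of MCP-based correspondence matching (a sketch, not the authors' implementation), the snippet below builds a consistency graph whose nodes are candidate query-to-map instance pairings and whose edges link pairings that preserve pairwise distances, then extracts the maximum clique with networkx; the rigid-consistency test and the tolerance value are assumptions.

    import itertools
    import networkx as nx
    import numpy as np

    def match_instances(query_pts, map_pts, candidate_pairs, tol=0.5):
        """Select a mutually consistent set of query-to-map instance
        correspondences by solving a maximum clique problem (MCP).

        query_pts, map_pts : dicts mapping instance id -> position array
        candidate_pairs    : iterable of (query_id, map_id) hypotheses
        tol                : assumed tolerance on pairwise-distance agreement
        """
        candidate_pairs = list(candidate_pairs)
        g = nx.Graph()
        g.add_nodes_from(candidate_pairs)
        # Two hypotheses are compatible if they preserve pairwise distances,
        # i.e. a single rigid transform could explain both of them.
        for (q1, m1), (q2, m2) in itertools.combinations(candidate_pairs, 2):
            if q1 == q2 or m1 == m2:
                continue  # an instance cannot participate in two matches
            d_query = np.linalg.norm(query_pts[q1] - query_pts[q2])
            d_map = np.linalg.norm(map_pts[m1] - map_pts[m2])
            if abs(d_query - d_map) < tol:
                g.add_edge((q1, m1), (q2, m2))
        # The largest mutually consistent hypothesis set is a maximum clique.
        clique, _ = nx.max_weight_clique(g, weight=None)
        return clique

The surviving correspondences can then be fed to a standard pose solver (e.g. least-squares alignment) to recover the global pose.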
Learning based automatic face annotation for arbitrary poses and expressions from frontal images only
Statistical approaches to building non-rigid deformable models, such as the active appearance model (AAM), have enjoyed great popularity in recent years, but they typically require tedious manual annotation of training images. This paper presents a learning-based approach for automatically annotating visually deformable objects from a single annotated frontal image, demonstrated on face images that can then be used to build AAMs for fitting and tracking. The approach first learns the correspondences between landmarks in a frontal image and those in a set of training images showing faces in arbitrary poses. Using this learner, virtual images of an unseen face at any pose for which the learner was trained can be reconstructed by predicting the new landmark locations and warping the texture from the frontal image. View-based AAMs are then built from the virtual images and used to automatically annotate unseen images, including images with different facial expressions, at any pose within the range spanned by the virtually reconstructed images. The approach is experimentally validated by automatically annotating face images from three different databases.
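A minimal sketch of the virtual-image reconstruction step, under stated assumptions: a per-pose linear regressor (ridge regression here, an assumed choice) maps frontal landmark coordinates to landmark coordinates at the target pose, and a piecewise-affine warp transfers the frontal texture. The arrays train_frontal, train_posed, and frontal_landmarks are hypothetical, and landmarks are assumed to be in (row, col) order as scikit-image expects.

    import numpy as np
    from skimage.transform import PiecewiseAffineTransform, warp
    from sklearn.linear_model import Ridge

    def fit_pose_regressor(train_frontal, train_posed):
        """Learn frontal -> posed landmark correspondences for one pose.

        train_frontal, train_posed : (n_faces, n_landmarks * 2) arrays of
        flattened landmark coordinates for the same faces.
        """
        reg = Ridge(alpha=1.0)  # assumed regularization strength
        reg.fit(train_frontal, train_posed)
        return reg

    def reconstruct_virtual_image(frontal_image, frontal_landmarks, reg):
        """Predict landmarks at the target pose, then warp the texture."""
        frontal = frontal_landmarks.reshape(-1, 2)
        posed = reg.predict(frontal.reshape(1, -1)).reshape(-1, 2)
        # warp() takes an inverse map, so estimate posed -> frontal.
        tform = PiecewiseAffineTransform()
        tform.estimate(posed, frontal)
        virtual = warp(frontal_image, tform,
                       output_shape=frontal_image.shape)
        return virtual, posed

View-based AAMs would then be trained on such (virtual image, predicted landmarks) pairs, one model per pose range.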
X-Metric: An N-Dimensional Information-Theoretic Framework for Groupwise Registration and Deep Combined Computing
This paper presents a generic probabilistic framework for estimating the
statistical dependency and finding the anatomical correspondences among an
arbitrary number of medical images. The method builds on a novel formulation
of the N-dimensional joint intensity distribution that represents the common
anatomy as latent variables and estimates the appearance model with
nonparametric estimators. Through connection to maximum likelihood and the
expectation-maximization algorithm, an information-theoretic metric called
X-metric and a co-registration algorithm named X-CoReg are induced, allowing
groupwise registration of the observed images with computational complexity
of O(N). Moreover, the method naturally extends to a weakly-supervised
scenario where anatomical labels of certain images are provided. This leads
to a combined-computing framework implemented with deep learning, which
performs registration and segmentation simultaneously and collaboratively in
an end-to-end fashion. Extensive experiments were conducted to demonstrate
the versatility and applicability of our model, including multimodal
groupwise registration, motion correction for dynamic contrast-enhanced
magnetic resonance images, and deep combined computing for multimodal medical
images. Results show the superiority of our method in various applications in
terms of both accuracy and efficiency, highlighting the advantage of the
proposed representation of the imaging process.
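The paper's X-metric itself is derived from the latent-anatomy model, but a much simpler groupwise criterion illustrates the idea of scoring an arbitrary number N of images jointly in linear time: the congealing-style stack entropy below, an illustrative assumption rather than the proposed metric.

    import numpy as np

    def stack_entropy(images, n_bins=32):
        """Mean entropy of the intensity 'stack' seen at each pixel across
        all N images; aligned structures lower it, misalignment raises it.
        An illustrative congealing criterion, not the paper's X-metric.

        images : (N, H, W) array with intensities scaled to [0, 1]
        """
        n, h, w = images.shape
        bins = np.clip((images * n_bins).astype(int), 0, n_bins - 1)
        entropy_map = np.zeros((h, w))
        for b in range(n_bins):
            p = (bins == b).sum(axis=0) / n   # P(intensity bin b) per pixel
            nz = p > 0
            entropy_map[nz] -= p[nz] * np.log(p[nz])
        return entropy_map.mean()

A groupwise registration loop would perturb each image's transform parameters to decrease this score; the cost stays linear in N because every pixel stack is summarized by a single histogram.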
Automated Complexity-Sensitive Image Fusion
To construct a complete representation of a scene despite environmental obstacles such as fog, smoke, darkness, or textural homogeneity, multisensor video streams captured in different modalities are considered. A computational method for automatically fusing multimodal image streams into a single, highly informative stream is proposed. The method consists of the following steps:
1. Image registration is performed to align video frames in the visible band over time, adapting to the nonplanarity of the scene by automatically subdividing the image domain into regions approximating planar patches.
2. Wavelet coefficients are computed for each of the input frames in each modality.
3. Corresponding regions and points are compared using spatial and temporal information across various scales.
4. Decision rules based on the results of multimodal image analysis are used to combine the wavelet coefficients from different modalities.
5. The combined wavelet coefficients are inverted to produce an output frame containing useful information gathered from the available modalities.
Experiments show that the proposed system produces fused output that retains the characteristics of color visible-spectrum imagery while adding information exclusive to infrared imagery, with attractive visual and informational properties.
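A minimal sketch of steps 2, 4, and 5 using PyWavelets, under stated assumptions: both registered frames are decomposed with a 2D discrete wavelet transform, and the coefficients are merged with a maximum-absolute-value rule, a common stand-in for the paper's richer decision rules based on spatial and temporal comparisons.

    import numpy as np
    import pywt

    def fuse_frames(visible, infrared, wavelet="db2", level=3):
        """Fuse two registered, same-sized grayscale frames in the
        wavelet domain (the max-abs rule is an assumed decision rule)."""
        c_vis = pywt.wavedec2(visible, wavelet, level=level)
        c_ir = pywt.wavedec2(infrared, wavelet, level=level)

        # Approximation band: averaging preserves overall brightness.
        fused = [(c_vis[0] + c_ir[0]) / 2.0]
        # Detail bands: keep whichever modality responds more strongly.
        for bands_vis, bands_ir in zip(c_vis[1:], c_ir[1:]):
            fused.append(tuple(
                np.where(np.abs(v) >= np.abs(i), v, i)
                for v, i in zip(bands_vis, bands_ir)
            ))
        # Step 5: invert the combined coefficients into the fused frame.
        return pywt.waverec2(fused, wavelet)

In the full system such a rule would be applied per region produced by the registration step, with temporal comparisons across neighboring frames.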