
    RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints

    We propose a Convolutional Neural Network (CNN)-based model, "RotationNet," which takes multi-view images of an object as input and jointly estimates its pose and object category. Unlike previous approaches that use known viewpoint labels for training, our method treats the viewpoint labels as latent variables, which are learned in an unsupervised manner during training using an unaligned object dataset. RotationNet is designed to use only a partial set of multi-view images for inference, and this property makes it useful in practical scenarios where only partial views are available. Moreover, our pose alignment strategy enables one to obtain view-specific feature representations shared across classes, which is important for maintaining high accuracy in both object categorization and pose estimation. The effectiveness of RotationNet is demonstrated by its superior performance over state-of-the-art methods for 3D object classification on the 10- and 40-class ModelNet datasets. We also show that RotationNet, even when trained without known poses, achieves state-of-the-art performance on an object pose estimation dataset. The code is available at https://github.com/kanezaki/rotationnet
    Comment: 24 pages, 23 figures. Accepted to CVPR 2018.
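    At a high level, inference in such a model scores every candidate assignment of images to viewpoints and keeps the one that best explains the object category. The following is a simplified, minimal sketch of that search, not the authors' implementation: it assumes a circular camera rig, so candidate assignments are cyclic shifts, and a hypothetical `scores` tensor of per-view class scores with an extra "incorrect view" class.

    ```python
    import numpy as np

    def infer_category_and_pose(scores: np.ndarray):
        """scores: (M, M, C+1) array of per-image, per-candidate-view class
        scores, with a hypothetical "incorrect view" class at index C."""
        M, _, c_plus_1 = scores.shape
        C = c_plus_1 - 1
        best_score, best_cat, best_shift = -np.inf, None, None
        for shift in range(M):                    # candidate ring alignments
            view_ids = np.roll(np.arange(M), shift)
            per_view = scores[np.arange(M), view_ids, :C]  # drop "incorrect view"
            class_scores = per_view.sum(axis=0)            # aggregate over views
            cat = int(class_scores.argmax())
            if class_scores[cat] > best_score:
                best_score, best_cat, best_shift = class_scores[cat], cat, shift
        return best_cat, best_shift  # category label and pose as a ring offset
    ```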

    Learning Descriptors for Object Recognition and 3D Pose Estimation

    Detecting poorly textured objects and estimating their 3D pose reliably is still a very challenging problem. We introduce a simple but powerful approach to computing descriptors for object views that efficiently capture both the object identity and the 3D pose. In contrast with previous manifold-based approaches, we can rely on the Euclidean distance to evaluate the similarity between descriptors, and therefore use scalable nearest-neighbor search methods to efficiently handle a large number of objects under a large range of poses. To achieve this, we train a Convolutional Neural Network to compute these descriptors by enforcing simple similarity and dissimilarity constraints between the descriptors. We show that our constraints nicely untangle the images from different objects and different views into clusters that are not only well separated but also structured as the corresponding sets of poses: the Euclidean distance between descriptors is large when the descriptors come from different objects, and directly related to the distance between the poses when the descriptors come from the same object. These important properties allow us to outperform state-of-the-art object-view representations on challenging RGB and RGB-D data.
    Comment: CVPR 2015.
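    The "simple similarity and dissimilarity constraints" can be expressed as a pair term plus a triplet-style margin term on Euclidean distances between descriptors. Below is a hedged PyTorch sketch of such a loss; the exact formulation and margin in the paper may differ, and `descriptor_loss` with its arguments is illustrative.

    ```python
    import torch
    import torch.nn.functional as F

    def descriptor_loss(anchor, positive, negative, margin=0.01):
        """anchor/positive: descriptors of similar views (same object, nearby
        poses); negative: a different object or a distant pose."""
        d_pos = F.pairwise_distance(anchor, positive)
        d_neg = F.pairwise_distance(anchor, negative)
        # Dissimilarity constraint: dissimilar pairs should end up farther
        # apart than similar ones by a ratio-style margin.
        triplet_term = F.relu(1.0 - d_neg / (d_pos + margin)).mean()
        # Similarity constraint: descriptors of near-identical views coincide.
        pair_term = (d_pos ** 2).mean()
        return triplet_term + pair_term
    ```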

    MORE: Simultaneous Multi-View 3D Object Recognition and Pose Estimation

    Simultaneous object recognition and pose estimation are two key functionalities for robots to safely interact with humans as well as with their environments. Although both object recognition and pose estimation use visual input, most state-of-the-art approaches tackle them as two separate problems, since the former needs a view-invariant representation while object pose estimation necessitates a view-dependent description. Nowadays, multi-view Convolutional Neural Network (MVCNN) approaches show state-of-the-art classification performance. Although MVCNN object recognition has been widely explored, there has been very little research on multi-view object pose estimation methods, and even less on addressing these two problems simultaneously. The poses of the virtual cameras in MVCNN methods are usually pre-defined, which limits the applicability of such approaches. In this paper, we propose an approach capable of handling object recognition and pose estimation simultaneously. In particular, we develop a deep object-agnostic entropy estimation model capable of predicting the best viewpoints of a given 3D object. The obtained views of the object are then fed to the network to simultaneously predict the pose and category label of the target object. Experimental results showed that the views obtained from such positions are descriptive enough to achieve good accuracy. Code is available online at: https://github.com/tparisotto/more_mvcn
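    To make the viewpoint-selection idea concrete, here is an illustrative sketch that scores candidate camera poses by the entropy of their rendered depth images and keeps the most informative ones. This stands in for the learned entropy-estimation model in the paper rather than reproducing it; `render_depth` and `candidate_poses` are hypothetical.

    ```python
    import numpy as np

    def view_entropy(depth: np.ndarray, bins: int = 64) -> float:
        """Shannon entropy of the depth histogram of one rendered view."""
        hist, _ = np.histogram(depth[depth > 0], bins=bins)
        p = hist / max(hist.sum(), 1)
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    def best_viewpoints(mesh, candidate_poses, render_depth, k=3):
        """Rank hypothetical camera poses by view entropy; keep the top k."""
        scores = [view_entropy(render_depth(mesh, pose))
                  for pose in candidate_poses]
        return sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    ```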

    Unsupervised Algorithms for Microarray Sample Stratification

    The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interactions. In order to discover hidden patterns, a wide variety of analytical techniques have been proposed. Here, we describe the basic methodologies for approaching the analysis of microarray datasets that focus on the task of (sub)group discovery.
    Peer reviewed.
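    As a concrete instance of such (sub)group discovery, a common baseline pipeline standardizes the expression matrix, reduces its dimensionality, and clusters the samples. A minimal scikit-learn sketch, assuming a hypothetical samples-by-genes matrix `X`:

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    def stratify_samples(X: np.ndarray, n_groups: int = 3,
                         n_components: int = 20) -> np.ndarray:
        Z = StandardScaler().fit_transform(X)          # per-gene standardization
        n_comp = min(n_components, min(Z.shape) - 1)   # PCA rank guard
        Z = PCA(n_components=n_comp).fit_transform(Z)  # denoise / reduce
        labels = KMeans(n_clusters=n_groups, n_init=10,
                        random_state=0).fit_predict(Z)
        return labels                                  # subgroup per sample
    ```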

    Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention

    Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about the temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail, since the number of training examples is often too small to deal with the exponential variability of such configurations. We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available: whole body, parts, or point trajectories. Detections and motion estimates provide contradictory information in the case of false-alarm detections or leaking motion affinities. We consolidate this contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, which corrects motion leakage between correctly detected objects while remaining robust to false alarms and spatially inaccurate detections. We first present a motion segmentation framework that exploits the long-range motion of point trajectories and the large spatial support of image regions; we show that the resulting video segments adapt to targets under partial occlusions and deformations. Second, we augment motion-based representations with object detection to deal with motion leakage, demonstrating how to combine dense optical-flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes. Third, we study human motion and pose estimation: we segment hard-to-detect, fast-moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion; we employ on-the-fly human body kinematics to improve the tracking of body joints under wide deformations; and we use the motion segmentability of body parts to re-rank a set of body-joint candidate trajectories and jointly infer multi-frame body pose and video segmentation. We show empirically that such a multi-granularity tracking representation is worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation on popular datasets.
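    The co-clustering idea can be loosely illustrated as clustering on a signed graph that combines attractive motion affinities with detection-driven repulsions. The sketch below is not the thesis's graph steering algorithm, only a minimal stand-in; `A` and `R` are hypothetical nonnegative affinity and repulsion matrices over trajectories and detections.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans

    def signed_graph_clusters(A: np.ndarray, R: np.ndarray, n_clusters: int,
                              alpha: float = 1.0) -> np.ndarray:
        W = A - alpha * R            # signed affinity: attraction minus repulsion
        W = (W + W.T) / 2.0          # enforce symmetry
        _, vecs = np.linalg.eigh(W)  # eigenvectors, ascending eigenvalue order
        U = vecs[:, -n_clusters:]    # leading eigenvectors carry the grouping
        return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(U)
    ```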

    Structured Learning with Manifold Representations of Natural Data Variations

    According to the manifold hypothesis, natural variations in high-dimensional data lie on or near a low-dimensional, nonlinear manifold. Additionally, many identity-preserving transformations are shared among classes of data, which allows for an efficient representation of data variations: a limited set of transformations can describe a majority of the variations in many classes. This work demonstrates the learning of generative models of identity-preserving transformations on data manifolds in order to analyze, generate, and exploit the natural variations in data for machine learning tasks. The introduced transformation representations are incorporated into several novel models to highlight the ability to generate realistic samples of semantically meaningful transformations, to generalize transformations beyond their source domain, and to estimate transformations between data samples. We first develop a model for learning 3D manifold-based transformations from 2D projected inputs, which can be used to perform depth inference from 2D moving inputs. We then confirm that our generative model of transformations can be generalized across classes by defining two transfer-learning tasks that map transformations learned from a rich dataset to previously unseen data. Next, we develop the manifold autoencoder, which learns low-dimensional manifold structure from complex data in the latent space of an autoencoder and adapts the latent space to accommodate this structure. Finally, we introduce the Variational Autoencoder with Learned Latent Structure (VAELLS), which incorporates a learnable manifold model into the fully probabilistic generative framework of a variational autoencoder.
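    One way identity-preserving transformations on a latent manifold are often modeled (e.g., with transport operators) is as matrix exponentials of learned generators acting on a latent vector. A hedged sketch under that assumption, where `Psi` and `c` stand for hypothetical learned generators and coefficients:

    ```python
    import numpy as np
    from scipy.linalg import expm

    def apply_transformation(z: np.ndarray, Psi: np.ndarray,
                             c: np.ndarray) -> np.ndarray:
        """z: latent vector (d,); Psi: generators (M, d, d); c: coeffs (M,)."""
        A = np.tensordot(c, Psi, axes=1)  # weighted sum of generators -> (d, d)
        return expm(A) @ z                # matrix exponential moves z along the manifold
    ```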

    Morphometric Analysis of Mount Etna Lava Flows Using High Resolution Digital Elevation Models

    Morphometric analysis of lava flows provides crucial information for a better understanding of lava flow dynamics and emplacement processes. In this thesis, high-resolution DEMs obtained with an airborne LiDAR system and a UAV-SfM system are used for an extensive morphometric analysis of Mount Etna (Italy) lava flows. A digital comparison of pre- and post-eruptive LiDAR DEMs of Etna was made to quantify the lava volumes emitted in the 2004-2005, 2005-2007 and 2007-2010 intervals. The erupted volume of 2004-2005 is ~63.3 × 10⁶ m³, entirely emitted by the 2004-2005 eruption. The erupted volume of 2005-2007 is ~42.0 × 10⁶ m³, of which ~33.5 × 10⁶ m³ was emitted by the September-December 2006 eruption. The erupted volume of 2007-2010 is >86 × 10⁶ m³, most of which (~74 × 10⁶ m³) was formed by the lava flows of the 2008-2009 flank eruption. Lava flow morphometric analysis was performed on the LiDAR DEM for eleven channel-fed lava flows through a semi-automatic procedure, using the sky-view factor (SVF) and openness-down parameters to better detect and delimit surface-specific elements, i.e., lava levees, base, and channel bed. The results show an inverse relation between slope and channel width, a certain coherence between the average slope of the levees and the pre-emplacement slope, and a common trend between channel width and channel-bed width. Finally, in order to investigate less costly methods for producing DEMs, we created a high-resolution DEM of the 1974 lava flow using the UAV-SfM system and compared it with the LiDAR-derived DEM. The UAV-SfM system can effectively produce topographic data for large areas with an accuracy and resolution even higher than those of the LiDAR system. Therefore, the UAV-SfM system can be effectively used to update the topography of active volcanic areas at reasonable cost and with short deployment times.
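    The volume estimates above come from differencing co-registered pre- and post-eruptive DEMs. A minimal sketch of that computation, assuming hypothetical NumPy arrays `pre` and `post` in meters on a grid with a common cell size:

    ```python
    import numpy as np

    def erupted_volume(pre: np.ndarray, post: np.ndarray,
                       cell_size: float) -> float:
        """Integrate positive elevation change (m) over cell area (m²) -> m³."""
        dz = post - pre
        dz = np.nan_to_num(dz, nan=0.0)   # treat no-data cells as no change
        return float(dz[dz > 0].sum() * cell_size ** 2)
    ```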