RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints
We propose a Convolutional Neural Network (CNN)-based model "RotationNet,"
which takes multi-view images of an object as input and jointly estimates its
pose and object category. Unlike previous approaches that use known viewpoint
labels for training, our method treats the viewpoint labels as latent
variables, learned in an unsupervised manner during training on an unaligned
object dataset. RotationNet is designed to use only a
partial set of multi-view images for inference, and this property makes it
useful in practical scenarios where only partial views are available. Moreover,
our pose alignment strategy enables one to obtain view-specific feature
representations shared across classes, which is important to maintain high
accuracy in both object categorization and pose estimation. The effectiveness of
RotationNet is demonstrated by its superior performance over state-of-the-art
3D object classification methods on the 10- and 40-class ModelNet datasets. We
also show that RotationNet, even when trained without known poses, achieves
state-of-the-art performance on an object pose estimation dataset. The code is
available at https://github.com/kanezaki/rotationnet
Comment: 24 pages, 23 figures. Accepted to CVPR 201
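The latent-viewpoint inference described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: it assumes each view yields a probability over (class, canonical viewpoint) pairs, scores every cyclic alignment of the captured views to the canonical viewpoints, and keeps the best one; the array shapes and toy scores are invented.

```python
import numpy as np

def infer_category_and_pose(view_scores):
    """view_scores[i, c, p]: probability that view i shows class c from viewpoint p."""
    n_views, n_classes, n_pos = view_scores.shape
    best = (-np.inf, 0, 0)  # (log-likelihood, class, rotation)
    for r in range(n_pos):
        # under alignment r, view i is assumed taken from canonical viewpoint (i + r) % n_pos
        log_lik = sum(np.log(view_scores[i, :, (i + r) % n_pos] + 1e-9)
                      for i in range(n_views))
        c = int(log_lik.argmax())
        if log_lik[c] > best[0]:
            best = (log_lik[c], c, r)
    _, category, rotation = best
    return category, rotation

# Toy scores: 4 views, 2 classes, 4 canonical viewpoints; the object is class 1
# seen with a cyclic offset of 2 viewpoints.
scores = np.full((4, 2, 4), 0.05)
for i in range(4):
    scores[i, 1, (i + 2) % 4] = 0.9
category, rotation = infer_category_and_pose(scores)
```

The winning alignment index doubles as a coarse pose estimate, which is why the same search yields both outputs.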
Learning Descriptors for Object Recognition and 3D Pose Estimation
Detecting poorly textured objects and estimating their 3D pose reliably is
still a very challenging problem. We introduce a simple but powerful approach
to computing descriptors for object views that efficiently capture both the
object identity and 3D pose. By contrast with previous manifold-based
approaches, we can rely on the Euclidean distance to evaluate the similarity
between descriptors, and therefore use scalable Nearest Neighbor search methods
to efficiently handle a large number of objects under a large range of poses.
To achieve this, we train a Convolutional Neural Network to compute these
descriptors by enforcing simple similarity and dissimilarity constraints
between the descriptors. We show that our constraints nicely untangle the
images from different objects and different views into clusters that are not
only well-separated but also structured as the corresponding sets of poses: The
Euclidean distance between descriptors is large when the descriptors are from
different objects, and directly related to the distance between the poses when
the descriptors are from the same object. These important properties allow us
to outperform state-of-the-art object view representations on challenging RGB
and RGB-D data.
Comment: CVPR 201
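The similarity and dissimilarity constraints can be illustrated with a minimal pairwise loss. This is a sketch rather than the paper's actual training objective; the function name, margin value, and toy descriptors are assumptions:

```python
import numpy as np

def pair_loss(d_i, d_j, same_object, pose_dist=0.0, margin=2.0):
    dist = np.linalg.norm(d_i - d_j)
    if same_object:
        # pull same-object descriptors to a distance matching their pose difference
        return float((dist - pose_dist) ** 2)
    # push different-object descriptors at least `margin` apart
    return float(max(0.0, margin - dist) ** 2)

a, b = np.array([0.0, 0.0]), np.array([1.0, 0.0])
same = pair_loss(a, b, True, pose_dist=1.0)   # distance already matches the pose gap
diff = pair_loss(a, b, False)                 # too close for two different objects
```

Because the same-object term ties descriptor distance to pose distance while the dissimilarity term only enforces a margin, plain Euclidean nearest-neighbor search over the learned descriptors recovers both identity and approximate pose.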
MORE: Simultaneous Multi-View 3D Object Recognition and Pose Estimation
Simultaneous object recognition and pose estimation are two key
functionalities for robots to safely interact with humans as well as
environments. Although both object recognition and pose estimation use visual
input, most state-of-the-art approaches tackle them as two separate problems,
since the former needs a view-invariant representation while object pose
estimation necessitates a view-dependent description. Nowadays, multi-view
Convolutional Neural Network (MVCNN) approaches show state-of-the-art
classification performance. Although MVCNN object recognition has been widely
explored, there has been very little research on multi-view object pose
estimation methods, and even less on addressing both problems simultaneously.
The poses of the virtual cameras in MVCNN methods are usually pre-defined,
which limits the applicability of such approaches. In this paper, we propose an approach capable
of handling object recognition and pose estimation simultaneously. In
particular, we develop a deep object-agnostic entropy estimation model, capable
of predicting the best viewpoints of a given 3D object. The obtained views of
the object are then fed to the network to simultaneously predict the pose and
category label of the target object. Experimental results show that the views
obtained from the predicted viewpoints are descriptive enough to achieve high
accuracy. Code is available online at: https://github.com/tparisotto/more_mvcn
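The entropy-based viewpoint selection can be illustrated with a simple stand-in: instead of the paper's learned, object-agnostic entropy estimation model, this sketch scores candidate depth views by the Shannon entropy of their depth histograms and keeps the most informative ones. All names, bin counts, and toy views are illustrative assumptions.

```python
import numpy as np

def view_entropy(depth_image, bins=32):
    """Shannon entropy of a view's depth histogram; higher = more surface variation."""
    hist, _ = np.histogram(depth_image, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def best_viewpoints(views, k=2):
    # rank candidate views by entropy and keep the k most informative ones
    order = np.argsort([view_entropy(v) for v in views])[::-1]
    return order[:k].tolist()

flat = np.zeros((8, 8))                   # featureless view, entropy 0
textured = np.arange(64.0).reshape(8, 8)  # view with varied depth values
```

The selected views would then be passed to the recognition/pose network, mirroring the two-stage pipeline the abstract describes.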
Unsupervised Algorithms for Microarray Sample Stratification
The amount of data made available by microarrays gives researchers the opportunity to delve into the complexity of biological systems. However, the noisy and extremely high-dimensional nature of this kind of data poses significant challenges. Microarrays allow for the parallel measurement of thousands of molecular objects spanning different layers of interaction. To discover hidden patterns, a wide variety of analytical techniques have been proposed. Here, we describe the basic methodologies for approaching the analysis of microarray datasets, focusing on the task of (sub)group discovery.
Peer reviewed
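A minimal example of the (sub)group-discovery task described above: clustering the samples of a standardized expression matrix, here with a small hand-rolled k-means on synthetic data. The group sizes, dimensions, and initialization scheme are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expression matrix: 20 samples x 50 genes with two hidden subgroups
group_a = rng.normal(0.0, 0.3, size=(10, 50))
group_b = rng.normal(2.0, 0.3, size=(10, 50))
X = np.vstack([group_a, group_b])

# Standardize each gene (column), a common microarray preprocessing step
X = (X - X.mean(axis=0)) / X.std(axis=0)

def kmeans(X, k=2, iters=25):
    # farthest-point initialization keeps this sketch free of empty clusters
    centers = [X[0]]
    while len(centers) < k:
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[int(d.argmax())])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(axis=-1).argmin(axis=1)
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels

labels = kmeans(X, k=2)
```

On real microarray data the challenge is that the separation is far less clean, which is why the survey covers a range of unsupervised techniques rather than a single clustering algorithm.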
Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention
Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail, since the number of training examples is often too small to deal with the exponential variability of such configurations.
We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available: whole body, parts, or point trajectories.
Detections and motion estimates provide contradictory information in case of false alarm detections or leaking motion affinities. We consolidate contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, that corrects motion leakage between correctly detected objects, while being robust to false alarms or spatially inaccurate detections.
We first present a motion segmentation framework that exploits long range motion of point trajectories and large spatial support of image regions.
We show resulting video segments adapt to targets under partial occlusions and deformations.
Second, we augment motion-based representations with object detection for dealing with motion leakage. We demonstrate how to combine dense optical flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes.
Third, we study human motion and pose estimation.
We segment hard-to-detect, fast-moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve the tracking of body joints under wide deformations.
We use motion segmentability of body parts for re-ranking a set of body joint candidate trajectories and jointly infer multi-frame body pose and video segmentation.
We show empirically that such a multi-granularity tracking representation is worthwhile, obtaining significantly more accurate multi-object tracking and detailed body pose estimation on popular datasets.
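The consolidation of contradictory information described above can be sketched on a toy graph: motion affinities link trajectories, confident detections add repulsion that cancels a leaked cross-object affinity, and connected components of the corrected graph recover the two targets. This is a simplified stand-in for graph steering, not the thesis algorithm; all numbers are invented.

```python
import numpy as np

# Toy setup: 6 point trajectories, two true objects {0,1,2} and {3,4,5}.
A = np.zeros((6, 6))                  # motion (optical-flow) affinities
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
A[2, 3] = A[3, 2] = 0.8               # leaked affinity across the two objects

R = np.zeros((6, 6))                  # repulsion induced by confident detections
R[:3, 3:] = R[3:, :3] = 1.0           # detections say the groups are different targets

W = np.clip(A - 1.0 * R, 0.0, None)   # attraction minus repulsion cancels the leak

def components(W, thresh=0.1):
    """Connected components of the thresholded affinity graph."""
    n = len(W); label = [-1] * n; cur = 0
    for s in range(n):
        if label[s] != -1:
            continue
        stack = [s]; label[s] = cur
        while stack:
            u = stack.pop()
            for v in range(n):
                if W[u, v] > thresh and label[v] == -1:
                    label[v] = cur
                    stack.append(v)
        cur += 1
    return label

labels = components(W)                # leak removed: two objects recovered
merged = components(A)                # without repulsion, motion leakage merges them
```

The contrast between `labels` and `merged` is the point: motion affinities alone leak across nearby targets, and detection-driven repulsion is what corrects the grouping.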
Structured Learning with Manifold Representations of Natural Data Variations
According to the manifold hypothesis, natural variations in high-dimensional data lie on or near a low-dimensional, nonlinear manifold. Additionally, many identity-preserving transformations are shared among classes of data, which can allow for an efficient representation of data variations: a limited set of transformations can describe a majority of variations in many classes. This work demonstrates the learning of generative models of identity-preserving transformations on data manifolds in order to analyze, generate, and exploit the natural variations in data for machine learning tasks. The introduced transformation representations are incorporated into several novel models to highlight the ability to generate realistic samples of semantically meaningful transformations, to generalize transformations beyond their source domain, and to estimate transformations between data samples. We first develop a model for learning 3D manifold-based transformations from 2D projected inputs which can be used to perform depth inference from 2D moving inputs. We then confirm that our generative model of transformations can be generalized across classes by defining two transfer learning tasks that map transformations learned from a rich dataset to previously unseen data. Next, we develop the manifold autoencoder, which learns low-dimensional manifold structure from complex data in the latent space of an autoencoder and adapts the latent space to accommodate this structure. Finally, we introduce the Variational Autoencoder with Learned Latent Structure (VAELLS), which incorporates a learnable manifold model into the fully probabilistic generative framework of a variational autoencoder.
Ph.D.
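The idea of an identity-preserving transformation acting on a manifold can be illustrated with a one-parameter transport operator in a two-dimensional latent space: a generator Psi moves a latent code along the manifold via z' = expm(c * Psi) z. The generator below is a hand-picked rotation (for which the matrix exponential has a closed form), not a learned operator from the dissertation.

```python
import numpy as np

# Rotation generator: Psi @ Psi = -I, so expm(c*Psi) = cos(c)*I + sin(c)*Psi
Psi = np.array([[0.0, -1.0],
                [1.0,  0.0]])

def transport(z, c):
    """Move latent code z along the manifold by transformation amount c."""
    T = np.cos(c) * np.eye(2) + np.sin(c) * Psi  # closed-form matrix exponential
    return T @ z

z = np.array([1.0, 0.0])
z_quarter = transport(z, np.pi / 2)   # quarter turn along the circle manifold
```

Note that the transformation preserves the norm of z, so every transported code stays on the same circle: a toy instance of "identity-preserving" variation along a manifold.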
Morphometric Analysis of Mount Etna Lava Flows Using High Resolution Digital Elevation Models
Morphometric analysis of lava flows provides crucial information for a better understanding of lava flow dynamics and emplacement processes. In this thesis, high-resolution DEMs obtained with an airborne LiDAR system and a UAV-SfM system are used for an extensive morphometric analysis of Mount Etna (Italy) lava flows. A digital comparison of pre- and post-eruptive LiDAR DEMs of Etna was made to quantify the lava volumes emitted in the 2004-2005, 2005-2007 and 2007-2010 intervals. The erupted volume of 2004-2005 is ~63.3 × 10^6 m^3, entirely emitted by the 2004-2005 eruption. The erupted volume of 2005-2007 is ~42.0 × 10^6 m^3, of which ~33.5 × 10^6 m^3 was emitted by the September-December 2006 eruption. The erupted volume of 2007-2010 is >86 × 10^6 m^3, most of which (~74 × 10^6 m^3) was formed by the lava flows of the 2008-2009 flank eruption. Lava flow morphometric analysis was performed on the LiDAR DEM for eleven channel-fed lava flows through a semi-automatic procedure, using the sky-view factor (SVF) and openness-down parameters to better detect and delimit surface-specific elements, i.e. lava levees, base and channel bed. The results show an inverse relation between slope and channel width, a certain coherence between the average slope of the levees and the pre-emplacement slope, and the same trend between channel width and channel-bed width. Finally, in order to investigate less costly methods for producing DEMs, we created a high-resolution DEM of the 1974 lava flow using the UAV-SfM system and compared it with the LiDAR-derived DEM. The UAV-SfM system can produce topographic data for large areas with an accuracy and resolution even higher than those of the LiDAR system. Therefore, the UAV-SfM system can be effectively used to update the topography of active volcanic areas at reasonable cost and with short deployment times.
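The DEM-differencing step behind the volume estimates can be sketched as follows: subtract the pre-eruptive DEM from the post-eruptive one and sum the positive thickness change times the cell area. The grid size, 5 m cell spacing, and flow geometry below are illustrative, not the thesis values.

```python
import numpy as np

cell = 5.0                          # DEM resolution in metres (assumed)
pre = np.zeros((100, 100))          # pre-eruptive elevations (m)
post = pre.copy()
post[40:60, 40:60] += 3.0           # a 20x20-cell, 3 m thick synthetic lava flow

dh = post - pre                     # elevation change between the two surveys
erupted_volume = dh[dh > 0.0].sum() * cell**2   # emplaced volume in m^3
```

Restricting the sum to positive changes isolates deposition; in practice the thesis-scale comparison would also need DEM co-registration and an uncertainty threshold before summing.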