315 research outputs found

    RotationNet: Joint Object Categorization and Pose Estimation Using Multiviews from Unsupervised Viewpoints

    We propose a Convolutional Neural Network (CNN)-based model "RotationNet," which takes multi-view images of an object as input and jointly estimates its pose and object category. Unlike previous approaches that use known viewpoint labels for training, our method treats the viewpoint labels as latent variables, which are learned in an unsupervised manner during training on an unaligned object dataset. RotationNet is designed to use only a partial set of multi-view images for inference, and this property makes it useful in practical scenarios where only partial views are available. Moreover, our pose alignment strategy enables one to obtain view-specific feature representations shared across classes, which is important for maintaining high accuracy in both object categorization and pose estimation. The effectiveness of RotationNet is demonstrated by its superior performance over state-of-the-art methods for 3D object classification on the 10- and 40-class ModelNet datasets. We also show that RotationNet, even when trained without known poses, achieves state-of-the-art performance on an object pose estimation dataset. The code is available at https://github.com/kanezaki/rotationnet. Comment: 24 pages, 23 figures. Accepted to CVPR 201
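
    The central inference idea described above (category and viewpoint estimated jointly, with the viewpoint treated as a latent alignment) can be illustrated with a small sketch. This is not the authors' implementation: the (M, M, C) logit layout, the restriction to cyclic view alignments, and the function names are simplifying assumptions made for illustration only.

```python
# Hedged sketch of RotationNet-style joint inference: given per-image category
# logits under each candidate viewpoint hypothesis, search over latent view
# alignments and keep the one that maximizes the joint category likelihood.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def predict(view_logits):
    """view_logits: (M, M, C) array -- for each of M input images, category
    logits under each of M candidate viewpoint hypotheses (illustrative layout)."""
    M, _, C = view_logits.shape
    probs = softmax(view_logits, axis=-1)
    best_score, best_cat, best_shift = -np.inf, None, None
    for shift in range(M):  # latent alignment: image i assumed at viewpoint (i + shift) % M
        aligned = np.stack([probs[i, (i + shift) % M] for i in range(M)])  # (M, C)
        joint = np.log(aligned + 1e-12).sum(axis=0)  # joint log-likelihood per category
        if joint.max() > best_score:
            best_score, best_cat, best_shift = joint.max(), int(joint.argmax()), shift
    return best_cat, best_shift  # predicted category and pose (view offset)
```

    The same selection applies when only a subset of the M views is available, since the log-likelihood sum simply runs over whichever images are present, which matches the partial-view property claimed in the abstract.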

    MORE: Simultaneous Multi-View 3D Object Recognition and Pose Estimation

    Object recognition and pose estimation are two key functionalities for robots to safely interact with humans as well as environments. Although both object recognition and pose estimation use visual input, most state-of-the-art approaches tackle them as two separate problems, since the former needs a view-invariant representation while the latter requires a view-dependent description. Nowadays, multi-view Convolutional Neural Network (MVCNN) approaches show state-of-the-art classification performance. Although MVCNN object recognition has been widely explored, there has been very little research on multi-view object pose estimation methods, and even less on addressing these two problems simultaneously. The poses of the virtual cameras in MVCNN methods are usually pre-defined in advance, which limits the applicability of such approaches. In this paper, we propose an approach capable of handling object recognition and pose estimation simultaneously. In particular, we develop a deep object-agnostic entropy estimation model, capable of predicting the best viewpoints of a given 3D object. The obtained views of the object are then fed to the network to simultaneously predict the pose and category label of the target object. Experimental results show that the views obtained from these positions are descriptive enough to achieve good accuracy. Code is available online at: https://github.com/tparisotto/more_mvcn
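
    The key step described above is choosing informative viewpoints before recognition. The paper does this with a learned, object-agnostic entropy estimation model; the sketch below is a simpler stand-in that scores candidate rendered views by the Shannon entropy of their intensity histograms and keeps the top k. The function names, the histogram proxy, and the choice of k are assumptions for illustration, not the authors' pipeline.

```python
# Hedged sketch of entropy-based view selection in the spirit of MORE:
# score each candidate rendered view by the Shannon entropy of its intensity
# histogram and keep the k most informative views for the recognition network.
import numpy as np

def view_entropy(img, bins=32):
    """Shannon entropy (bits) of the intensity histogram of one rendered view."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist.astype(float) / (hist.sum() + 1e-12)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def select_views(views, k=3):
    """views: list of 2D arrays (rendered depth or grayscale images).
    Returns indices of the k highest-entropy views."""
    scores = np.array([view_entropy(v) for v in views])
    return list(np.argsort(scores)[::-1][:k])
```

    In the paper this hand-computed score is replaced by a deep model that predicts the best viewpoints of a given 3D object directly, as described in the abstract.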

    Learning Robust and Discriminative Manifold Representations for Pattern Recognition

    Face and object recognition find applications in domains such as biometrics, surveillance and human-computer interaction. An important component in any recognition pipeline is to learn pertinent image representations that help the system discriminate one image class from another. These representations enable the system to learn a discriminative function that can classify a wide range of images. In practical situations, the acquired images are often corrupted with occlusions and noise. Thus, robust and discriminative learning is necessary for good classification performance. This thesis explores two scenarios where robust and discriminative manifold representations help recognize face and object images. On one hand, learning robust manifold projections enables the system to adapt to images across different domains, including cases with noise and occlusions; on the other hand, learning discriminative manifold representations aids in image-set comparison. The first contribution of this thesis is a robust approach to visual domain adaptation that learns a subspace with L1 principal component analysis (PCA) and an L1 Grassmannian, with applications to object and face recognition. Mapping data from different domains onto a low-dimensional subspace through PCA is a common step in subspace-based unsupervised domain adaptation. Subspaces extracted by PCA are prone to outliers, which lead to noisy projections. Robust subspace learning through L1-PCA helps improve performance. The proposed approach was tested on the Office, Caltech-256, Yale-A, and AT&T datasets. Results indicate improved classification accuracy for face and object recognition tasks. The second contribution of this thesis is a biologically motivated manifold-learning framework for image-set classification using independent component analysis (ICA) on Grassmann manifolds. Simple cells in the visual cortex are known to learn spatially localized image representations, and similar representations can be learnt using ICA. Motivated by the manifold hypothesis, a Grassmann manifold is learnt from the independent components, which enables compact representation through linear subspaces. The efficacy of the proposed approach is demonstrated for image-set classification on face and object recognition datasets such as AT&T, Extended Yale, Labeled Faces in the Wild and ETH-80
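
    The subspace-and-Grassmann machinery running through this abstract can be summarized in a few lines. The sketch below is only illustrative: it builds each image-set or domain basis with ordinary SVD (L2-PCA) and compares bases via principal angles; in the thesis this basis-extraction step is the part replaced by the more robust L1-PCA or by ICA components.

```python
# Hedged sketch: represent an image set (or domain) by a low-dimensional
# orthonormal basis and compare bases through principal angles, i.e. a
# geodesic distance on the Grassmann manifold.
import numpy as np

def subspace_basis(X, d):
    """X: (n_samples, n_features) matrix of vectorized images.
    Returns an (n_features, d) orthonormal basis of its principal subspace."""
    Xc = X - X.mean(axis=0)                       # center the set
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:d].T                               # top-d right singular vectors

def grassmann_distance(U, V):
    """Geodesic distance between the subspaces spanned by orthonormal bases U, V."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)  # cosines of the principal angles
    theta = np.arccos(np.clip(s, -1.0, 1.0))
    return float(np.sqrt((theta ** 2).sum()))

# Nearest-subspace classification of a query image set: compute its basis and
# assign the label of the reference set with the smallest grassmann_distance.
```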

    Lurching Toward Chernobyl: Dysfunctions of Real-Time Computation

    Cognitive biological structures, social organizations, and computing machines operating in real time are subject to Rate Distortion Theorem constraints driven by the homology between information source uncertainty and free energy density. This exposes the unitary structure/environment system to a relentless entropic torrent compounded by sudden large deviations that increase the distortion between intent and impact, particularly as demands escalate. The phase transitions characteristic of information phenomena suggest that, rather than graceful decay under increasing load, these structures will undergo punctuated degradation akin to spontaneous symmetry breaking in physical systems. Rate distortion problems, which also affect internal structural dynamics, can become synergistic with limitations equivalent to the inattentional blindness of natural cognitive processes. These mechanisms, and their interactions, are unlikely to scale well, so that, depending on architecture, enlarging the structure or its duties may lead to a crossover point at which added resources must be almost entirely devoted to ensuring system stability -- a form of allometric scaling familiar from biological examples. This suggests a critical need to tune architecture to problem type and system demand. A real-time computational structure and its environment are a unitary phenomenon, and environments are usually idiosyncratic. Thus the resulting path dependence in the development of pathology could often require an individualized approach to remediation, more akin to an arduous psychiatric intervention than to the traditional engineering or medical quick fix. Failure to recognize the depth of these problems seems likely to produce a relentless chain of the Chernobyl-like failures that are necessary, but often insufficient, for remediation under our system
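
    For reference, the rate-distortion constraint invoked above has the standard textbook form shown below. This is a general statement, not the paper's specific free-energy formalism, and the distortion measure d is left abstract.

```latex
% Standard rate--distortion function: the minimum information rate (mutual
% information) achievable while keeping expected distortion at or below D.
\[
  R(D) \;=\; \min_{p(\hat{x}\mid x)\,:\,\mathbb{E}[d(X,\hat{X})]\le D} I(X;\hat{X})
\]
% R(D) is convex and non-increasing in D, so holding distortion down as load
% rises demands ever larger information rates, which is the trade-off the
% abstract describes as an entropic torrent on the structure/environment system.
```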