
    3D Object Recognition Using Multiple Views And Neural Networks.

    Get PDF
    This paper proposes a method for the recognition and classification of 3D objects, based on 2D moments and neural networks. The 2D moments are calculated from 2D intensity images taken by multiple cameras arranged in a multiple-view configuration. 2D moments are commonly used for 2D pattern recognition.
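
    The abstract does not specify which moments are used; as a rough illustration only, the sketch below (Python/NumPy, all function names hypothetical) computes scale-normalized central moments of a single intensity image, the kind of per-view feature vector that could be concatenated across cameras and fed to a neural-network classifier.

```python
# Illustrative sketch (hypothetical names): scale-normalized central
# moments of a 2D intensity image, usable as a per-view feature vector.
import numpy as np

def central_moment(img, p, q):
    """Central moment mu_pq of a 2D intensity image."""
    ys, xs = np.mgrid[0:img.shape[0], 0:img.shape[1]]
    m00 = img.sum()
    xbar = (xs * img).sum() / m00  # intensity centroid, x
    ybar = (ys * img).sum() / m00  # intensity centroid, y
    return ((xs - xbar) ** p * (ys - ybar) ** q * img).sum()

def moment_features(img):
    """Normalized central moments eta_pq = mu_pq / mu00^(1 + (p+q)/2).
    One such vector per camera view could be concatenated across views
    and fed to the classifier."""
    mu00 = central_moment(img, 0, 0)
    orders = [(2, 0), (0, 2), (1, 1), (3, 0), (0, 3), (2, 1), (1, 2)]
    return np.array([central_moment(img, p, q) / mu00 ** (1 + (p + q) / 2.0)
                     for p, q in orders])
```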

    Object recognition using multi-view imaging

    No full text
    Most previous research in computer vision and image understanding has relied on single-view imaging data, for which many techniques have been developed. Recently, with the rapid development and falling cost of multiple-camera setups, it has become practical to capture many more views for image-processing tasks. This thesis considers how to use such multiple images for target object recognition. In this context, we present two algorithms for object recognition based on scale-invariant feature points. The first is a single-view object recognition (SOR) method, which operates on single images and uses a chirality constraint to reduce the recognition errors that arise when only a small number of feature points are matched. The procedure is extended in the second, multi-view object recognition (MOR) algorithm, which operates on a multi-view image sequence and, by tracking feature points with a dynamic-programming method in the plenoptic domain subject to the epipolar constraint, is able to fuse feature-point matches from all the available images, resulting in more robust recognition. We evaluated these algorithms on several datasets of real images capturing both indoor and outdoor scenes. We demonstrate that MOR outperforms SOR, particularly on noisy and low-resolution images, and that, combined with segmentation techniques, it can also recognize partially occluded objects.
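
    A minimal sketch of the general idea of epipolar-constrained feature matching between two views, using standard OpenCV calls; this is an illustration under stated assumptions, not the thesis's actual SOR/MOR pipeline (which tracks features across a plenoptic sequence with dynamic programming).

```python
# Hedged sketch: SIFT matching between two grayscale views, with a
# RANSAC fundamental-matrix fit enforcing the epipolar constraint.
import cv2
import numpy as np

def epipolar_filtered_matches(img1, img2, ratio=0.75):
    sift = cv2.SIFT_create()
    k1, d1 = sift.detectAndCompute(img1, None)
    k2, d2 = sift.detectAndCompute(img2, None)
    # Lowe ratio test on 2-nearest-neighbour matches.
    knn = cv2.BFMatcher().knnMatch(d1, d2, k=2)
    good = [p[0] for p in knn
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    pts1 = np.float32([k1[m.queryIdx].pt for m in good])
    pts2 = np.float32([k2[m.trainIdx].pt for m in good])
    # Matches whose point pairs violate the epipolar geometry are
    # discarded via the RANSAC inlier mask.
    F, mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC, 3.0, 0.99)
    if mask is None:
        return good
    return [m for m, ok in zip(good, mask.ravel()) if ok]
```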

    MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

    Full text link
    Robots and other smart devices need efficient object-based scene representations from their on-board vision systems to reason about contact, physics and occlusion. Recognized, precise object models will play an important role alongside non-parametric reconstructions of unrecognized structures. We present a system which can estimate the accurate poses of multiple known objects in contact and occlusion from real-time, embodied multi-view vision. Our approach makes 3D object pose proposals from single RGB-D views, accumulates pose estimates and non-parametric occupancy information from multiple views as the camera moves, and performs joint optimization to estimate consistent, non-intersecting poses for multiple objects in contact. We verify the accuracy and robustness of our approach experimentally on two object datasets: YCB-Video and our own challenging Cluttered YCB-Video. We demonstrate a real-time robotics application in which a robot arm precisely and in an orderly fashion disassembles complicated piles of objects, using only on-board RGB-D vision.
    Comment: 10 pages, 10 figures, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2020
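
    As a loose illustration of the multi-view accumulation step, the sketch below (Python/NumPy; the class and method names are hypothetical) integrates back-projected depth points from successive views into a shared occupancy grid. MoreFusion's actual volumetric fusion and joint pose optimization go well beyond this.

```python
# Loose illustration (hypothetical API): accumulating occupancy
# evidence from multiple RGB-D views into one voxel grid.
import numpy as np

class OccupancyGrid:
    def __init__(self, origin, voxel_size, shape):
        self.origin = np.asarray(origin, dtype=float)
        self.voxel_size = float(voxel_size)
        self.hits = np.zeros(shape, dtype=np.int32)  # per-voxel view count

    def integrate(self, points_world):
        """Fold in one view's back-projected depth points (N x 3, world frame)."""
        idx = np.floor((points_world - self.origin) / self.voxel_size).astype(int)
        inside = np.all((idx >= 0) & (idx < np.array(self.hits.shape)), axis=1)
        np.add.at(self.hits, tuple(idx[inside].T), 1)

    def occupied(self, min_hits=3):
        """Voxels supported by several views count as occupied; such a
        non-parametric map can keep estimated poses from intersecting."""
        return self.hits >= min_hits
```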

    Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization

    Full text link
    A hallmark of the deep learning era for computer vision is the successful use of large-scale labeled datasets to train feature representations for tasks ranging from object recognition and semantic segmentation to optical flow estimation and novel view synthesis of 3D scenes. In this work, we aim to learn dense discriminative object representations for low-shot category recognition without requiring any category labels. To this end, we propose Deep Object Patch Encodings (DOPE), which can be trained from multiple views of object instances without any category or semantic object part labels. To train DOPE, we assume access to sparse depths, foreground masks and known cameras, to obtain pixel-level correspondences between views of an object, and use this to formulate a self-supervised learning task to learn discriminative object patches. We find that DOPE can be used directly for low-shot classification of novel categories using local-part matching, and is competitive with and outperforms supervised and self-supervised learning baselines. Code and data are available at https://github.com/rehg-lab/dope_selfsup.
    Comment: Accepted at NeurIPS 2022. Code and data available at https://github.com/rehg-lab/dope_selfsup
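
    A minimal sketch (PyTorch; names hypothetical) of a pixel-level contrastive loss of the kind such correspondences enable: descriptors at corresponding pixels are pulled together and all other pairs pushed apart. This is an assumed InfoNCE-style formulation, not necessarily the exact loss used in the paper.

```python
# Assumed InfoNCE-style sketch of a pixel-level contrastive loss over
# corresponding descriptors from two views of one object instance.
import torch
import torch.nn.functional as F

def pixel_infonce(desc_a, desc_b, temperature=0.07):
    """desc_a, desc_b: (N, D) unit-normalized descriptors sampled at N
    pixel locations known to correspond between two views. Row i of
    desc_a should match row i of desc_b and no other row."""
    logits = desc_a @ desc_b.t() / temperature       # (N, N) similarity matrix
    targets = torch.arange(desc_a.size(0), device=desc_a.device)
    return F.cross_entropy(logits, targets)

# Toy usage: in practice the descriptors would be sampled from a CNN
# feature map at pixels matched via sparse depth and known cameras.
a = F.normalize(torch.randn(128, 64), dim=1)
b = F.normalize(a + 0.1 * torch.randn(128, 64), dim=1)
print(pixel_infonce(a, b))
```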

    A system that learns to recognize 3-D objects

    Get PDF
    A system that learns to recognize 3-D objects from single and multiple views is presented. It consists of three parts: a simulator of 3-D figures, a learner, and a recognizer. The 3-D figure simulator generates and plots line drawings of certain 3-D objects. A series of transformations leads to a number of 2-D images of a 3-D object, which are considered as different views and are the basic input to the next two parts. The learner works in three stages using the method of learning from examples. In the first stage an elementary-concept learner learns the basic entities that make up a line drawing. In the second stage a multiple-view learner learns the definitions of 3-D objects that are to be recognized from multiple views. In the third stage a single-view learner learns how to recognize the same objects from single views. The recognizer is presented with line drawings representing 3-D scenes. A single-view recognizer segments the input into faces of possible 3-D objects and attempts to match the segmented scene with a set of single-view definitions of 3-D objects. The result of the recognition may include several alternative answers, corresponding to different 3-D objects. A unique answer can be obtained by making assumptions about hidden elements (e.g. faces) of an object and using a multiple-view recognizer. Both single-view and multiple-view recognition are based on the structural relations of the elements that make up a 3-D object. Some analytical elements (e.g. angles) of the objects are also calculated, in order to determine point containment and convexity. The system performs well on polyhedra with triangular and quadrilateral faces. A discussion of the system's performance and suggestions for further development are given at the end. The simulator and the part of the recognizer that makes the analytical calculations are written in C. The learner and the rest of the recognizer are written in PROLOG.
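
    One of the analytical calculations mentioned is convexity; the small sketch below (Python, hypothetical names) restates the idea of a convexity test for a polygonal face. It is illustrative only: the system itself is written in C and PROLOG.

```python
# Illustrative convexity test for a polygonal face, one of the
# analytical checks mentioned alongside point containment.
import numpy as np

def is_convex(vertices):
    """vertices: (N, 2) polygon corners in order. The polygon is convex
    iff the z-components of all consecutive edge cross products share
    one sign (zeros allowed for collinear corners)."""
    v = np.asarray(vertices, dtype=float)
    n = len(v)
    signs = []
    for i in range(n):
        e1 = v[(i + 1) % n] - v[i]
        e2 = v[(i + 2) % n] - v[(i + 1) % n]
        signs.append(e1[0] * e2[1] - e1[1] * e2[0])  # 2D cross product
    signs = np.asarray(signs)
    return bool(np.all(signs >= 0) or np.all(signs <= 0))

print(is_convex([(0, 0), (2, 0), (2, 2), (0, 2)]))           # True: square
print(is_convex([(0, 0), (2, 0), (1, 1), (2, 2), (0, 2)]))   # False: notch
```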

    Robust arbitrary-view gait recognition based on 3D partial similarity matching

    Get PDF
    Existing view-invariant gait recognition methods encounter difficulties due to the limited number of available gait views and varying conditions during training. This paper proposes gait partial similarity matching, which assumes that a 3-dimensional (3D) object shares common view surfaces across significantly different views. Detecting such surfaces aids the extraction of gait features from multiple views. 3D parametric body models are morphed by pose and shape deformation from a template model, using 2-dimensional (2D) gait silhouettes as observations. The gait pose is estimated by a level-set energy cost function from silhouettes, including incomplete ones. Body shape deformation is achieved via a Laplacian deformation energy function associated with inpainting gait silhouettes. Partial gait silhouettes in different views are extracted by selecting partial region-of-interest elements and are re-projected onto 2D space to construct partial gait energy images. A synthetic database with destination views and a multi-linear subspace classifier fused with majority voting are used to achieve arbitrary-view gait recognition that is robust to varying conditions. Experimental results on the CMU, CASIA B, TUM-IITKGP, AVAMVG and KY4D datasets show the efficacy of the proposed method.
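
    The partial gait energy images are built from re-projected partial silhouettes; as a hedged illustration of the underlying gait-energy-image idea (not the paper's 3D pipeline), the Python sketch below averages aligned binary silhouettes and restricts the result to a selected region of interest. All names are hypothetical.

```python
# Hedged sketch of the gait-energy-image (GEI) idea; the paper builds
# *partial* GEIs from re-projected 3D surface regions, which this
# simple masking only stands in for.
import numpy as np

def gait_energy_image(silhouettes):
    """silhouettes: (T, H, W) stack of aligned binary masks (0/1)
    covering one gait cycle; returns an (H, W) image in [0, 1]."""
    return np.asarray(silhouettes, dtype=np.float32).mean(axis=0)

def partial_gei(silhouettes, roi_mask):
    """Restrict the GEI to a partial region of interest (roi_mask:
    (H, W) boolean), standing in for the paper's selection of view
    surfaces shared across views."""
    return gait_energy_image(silhouettes) * roi_mask
```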

    Exploiting object dynamics for recognition and control

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (p. 127-132).
    This thesis explores how state-of-the-art object recognition methods can benefit from integrating information across multiple observations of an object. It considers active vision systems that can steer the camera along predetermined trajectories, producing sweeps of ordered views of an object. For systems of this kind, a solution is presented that exploits the order relationship between successive frames to derive a classifier based on the characteristic motion of local features across the sweep. It is shown that this motion model reveals structural information about the object that can be exploited for recognition. The main contribution of this thesis is a recognition system that extends invariant local features (shape context) into the time domain by adding the aforementioned feature-motion model into a joint classifier. Second, an entropy-based view-selection scheme is presented that allows the vision system to skip ahead to highly discriminative viewing positions. Using two datasets, one standard (ETH-80) and one collected from our robot head, both the feature-motion and active view-selection extensions are shown to reach a high-quality hypothesis about the presented object more quickly than a baseline system that treats object views as an unordered stream of images.
    by Philipp Robbel. S.M.
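
    The entropy-based view selection can be pictured as choosing the candidate viewing position whose predicted class posterior is most peaked; the sketch below (Python/NumPy, hypothetical names and data) shows such a minimum-entropy selection rule, a simplified reading of the thesis's scheme.

```python
# Simplified reading of entropy-based view selection: move the camera
# to the candidate view whose predicted class posterior has minimum
# entropy, i.e. is the most discriminative.
import numpy as np

def entropy(p, eps=1e-12):
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p + eps)))

def next_best_view(posteriors_per_view):
    """posteriors_per_view: dict mapping candidate view index -> class
    posterior the classifier is predicted to produce from that view."""
    return min(posteriors_per_view, key=lambda v: entropy(posteriors_per_view[v]))

views = {0: [0.3, 0.4, 0.3], 1: [0.8, 0.1, 0.1], 2: [0.5, 0.25, 0.25]}
print(next_best_view(views))  # 1: the sharpest posterior
```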