3D Object Recognition Using Multiple Views And Neural Networks.
This paper proposes a method for the recognition and classification of 3D objects based on 2D moments and neural networks. The 2D moments are computed from 2D intensity images captured by multiple cameras arranged in a multiple-view configuration. 2D moments are commonly used for 2D pattern recognition.
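The abstract does not specify which moment set is used, so the following is a generic numpy sketch of the raw and central 2D moments that such descriptors are built from; the function names and the toy image are illustrative only.

```python
import numpy as np

def raw_moment(img, p, q):
    """Raw 2D moment M_pq = sum_x sum_y x^p * y^q * I(x, y)."""
    h, w = img.shape
    y, x = np.mgrid[:h, :w]
    return np.sum((x ** p) * (y ** q) * img)

def central_moment(img, p, q):
    """Central moment mu_pq: translation-invariant, taken about the centroid."""
    m00 = raw_moment(img, 0, 0)
    cx = raw_moment(img, 1, 0) / m00   # centroid x
    cy = raw_moment(img, 0, 1) / m00   # centroid y
    h, w = img.shape
    y, x = np.mgrid[:h, :w]
    return np.sum(((x - cx) ** p) * ((y - cy) ** q) * img)

# Toy intensity image: a 4x4 bright square on a dark background
img = np.zeros((8, 8))
img[2:6, 3:7] = 1.0
```

Central moments of a view are what make the descriptor insensitive to where the object sits in the frame; rotation and scale invariance require further normalization (e.g. Hu moments).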
Object recognition using multi-view imaging
Most previous research in computer vision and image understanding has used
single-view imaging data, and many techniques have been developed for it.
Recently, with the rapid development and falling cost of multiple cameras, it
has become possible to use many more views for image-processing tasks. This
thesis considers how to use the resulting multiple images for target object
recognition.
In this context, we present two algorithms for object recognition based on
scale-invariant feature points. The first is a single-view object recognition
method (SOR), which operates on single images and uses a chirality constraint
to reduce the recognition errors that arise when only a small number of
feature points are matched. The procedure is extended in the second,
multi-view object recognition algorithm (MOR), which operates on a multi-view
image sequence and, by tracking feature points with a dynamic-programming
method in the plenoptic domain subject to the epipolar constraint, is able to
fuse feature-point matches from all the available images, resulting in more
robust recognition.
We evaluated these algorithms on several datasets of real images capturing
both indoor and outdoor scenes. We demonstrate that MOR outperforms SOR,
particularly on noisy and low-resolution images, and that, combined with
segmentation techniques, it can also recognize partially occluded objects.
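Both SOR and MOR build on matching scale-invariant feature points between views. The thesis's exact matcher, chirality test and dynamic-programming tracker are not given in the abstract; the sketch below shows only the standard descriptor-level step they rest on, nearest-neighbour matching with Lowe's ratio test, using plain numpy:

```python
import numpy as np

def ratio_test_matches(desc_a, desc_b, ratio=0.75):
    """Match descriptors by nearest neighbour with Lowe's ratio test.

    desc_a: (N, D) query descriptors; desc_b: (M, D) candidate descriptors.
    A match (i, j) is kept only if the best distance is clearly smaller
    than the second best, which suppresses ambiguous correspondences.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

With few confident matches surviving this test, additional geometric constraints (such as the chirality and epipolar constraints the thesis describes) become essential for rejecting the remaining false positives.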
MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion
Robots and other smart devices need efficient object-based scene
representations from their on-board vision systems to reason about contact,
physics and occlusion. Precise models of recognized objects will play an important
role alongside non-parametric reconstructions of unrecognized structures. We
present a system which can estimate the accurate poses of multiple known
objects in contact and occlusion from real-time, embodied multi-view vision.
Our approach makes 3D object pose proposals from single RGB-D views,
accumulates pose estimates and non-parametric occupancy information from
multiple views as the camera moves, and performs joint optimization to estimate
consistent, non-intersecting poses for multiple objects in contact.
We verify the accuracy and robustness of our approach experimentally on two
object datasets: YCB-Video and our own challenging Cluttered YCB-Video. We
demonstrate a real-time robotics application in which a robot arm precisely
and in an orderly fashion disassembles complicated piles of objects, using
only on-board RGB-D vision.
Comment: 10 pages, 10 figures, IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 202
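One ingredient of such a pipeline, accumulating per-view pose hypotheses as the camera moves, can be sketched in isolation. This is an illustration only, under simplifying assumptions (independent hypotheses for a single object, fused by eigenvector-based quaternion averaging and mean translation); MoreFusion itself performs joint, collision-aware optimization over all objects in contact.

```python
import numpy as np

def fuse_pose_estimates(quats, trans):
    """Fuse per-view pose hypotheses for one object.

    quats: (N, 4) unit quaternions (w, x, y, z); trans: (N, 3) translations.
    The rotation average is the eigenvector of the largest eigenvalue of
    the accumulated outer-product matrix (Markley et al.'s method), which
    handles the q / -q sign ambiguity; translations are simply averaged.
    """
    A = np.zeros((4, 4))
    for q in quats:
        A += np.outer(q, q)
    eigvals, eigvecs = np.linalg.eigh(A)
    q_avg = eigvecs[:, -1]        # eigenvector of the largest eigenvalue
    if q_avg[0] < 0:              # fix the sign convention (w >= 0)
        q_avg = -q_avg
    return q_avg, np.asarray(trans).mean(axis=0)
```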
Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization
A hallmark of the deep learning era for computer vision is the successful use
of large-scale labeled datasets to train feature representations for tasks
ranging from object recognition and semantic segmentation to optical flow
estimation and novel view synthesis of 3D scenes. In this work, we aim to learn
dense discriminative object representations for low-shot category recognition
without requiring any category labels. To this end, we propose Deep Object
Patch Encodings (DOPE), which can be trained from multiple views of object
instances without any category or semantic object part labels. To train DOPE,
we assume access to sparse depths, foreground masks and known cameras, to
obtain pixel-level correspondences between views of an object, and use this to
formulate a self-supervised learning task to learn discriminative object
patches. We find that DOPE can directly be used for low-shot classification of
novel categories using local-part matching, and is competitive with and
outperforms supervised and self-supervised learning baselines. Code and data
available at https://github.com/rehg-lab/dope_selfsup.
Comment: Accepted at NeurIPS 2022.
A system that learns to recognize 3-D objects
A system that learns to recognize 3-D objects from single and
multiple views is presented. It consists of three parts: a simulator
of 3-D figures, a learner, and a recognizer.
The 3-D figure simulator generates and plots line drawings of
certain 3-D objects. A series of transformations leads to a number of
2-D images of a 3-D object, which are considered as different views
and are the basic input to the next two parts.
The learner works in three stages, using the method of learning
from examples. In the first stage, an elementary-concept learner learns
the basic entities that make up a line drawing. In the second stage, a
multiple-view learner learns the definitions of the 3-D objects that are
to be recognized from multiple views. In the third stage, a single-view
learner learns how to recognize the same objects from single views.
The recognizer is presented with line drawings representing 3-D
scenes. A single-view recognizer segments the input into faces of
possible 3-D objects, and attempts to match the segmented scene with a
set of single-view definitions of 3-D objects. The result of the
recognition may include several alternative answers, corresponding to
different 3-D objects. A unique answer can be obtained by making
assumptions about hidden elements (e.g. faces) of an object and using a
multiple-view recognizer. Both single-view and multiple-view recognition
are based on the structural relations of the elements that make up a
3-D object. Some analytical elements (e.g. angles) of the objects are
also calculated, in order to determine point containment and convexity.
The system performs well on polyhedra with triangular and
quadrilateral faces. A discussion of the system's performance and
suggestions for further development are given at the end.
The simulator and the part of the recognizer that performs the
analytical calculations are written in C. The learner and the rest
of the recognizer are written in Prolog.
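The recognizer's analytical step, using geometric quantities to decide convexity and point containment, can be illustrated for a 2-D face. Below is a minimal cross-product convexity test in Python (the original system's angle-based C implementation may differ):

```python
import numpy as np

def is_convex(poly):
    """Check convexity of a simple 2-D polygon given as an (N, 2) array.

    For each consecutive vertex triple, the z-component of the cross
    product of the two edge vectors gives the turn direction; the
    polygon is convex iff all non-zero turns share the same sign.
    """
    n = len(poly)
    signs = set()
    for i in range(n):
        a, b, c = poly[i], poly[(i + 1) % n], poly[(i + 2) % n]
        cross = (b[0] - a[0]) * (c[1] - b[1]) - (b[1] - a[1]) * (c[0] - b[0])
        if cross != 0:
            signs.add(np.sign(cross))
    return len(signs) <= 1
```

For the polyhedra the system handles (triangular and quadrilateral faces), such per-face tests are cheap and feed directly into the structural matching.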
Robust arbitrary-view gait recognition based on 3D partial similarity matching
Existing view-invariant gait recognition methods encounter difficulties due to the limited number of available gait views and varying conditions during training. This paper proposes gait partial similarity matching, which assumes that a 3-dimensional (3D) object shares common view surfaces across significantly different views. Detecting such surfaces aids the extraction of gait features from multiple views. 3D parametric body models are morphed by pose and shape deformation from a template model, using 2-dimensional (2D) gait silhouettes as observations. The gait pose is estimated by a level-set energy cost function from silhouettes, including incomplete ones. Body shape deformation is achieved via a Laplacian deformation energy function associated with inpainting gait silhouettes. Partial gait silhouettes in different views are extracted by selecting gait partial region-of-interest elements and re-projected onto 2D space to construct partial gait energy images. A synthetic database with destination views and a multi-linear subspace classifier fused with majority voting are used to achieve arbitrary-view gait recognition that is robust to varying conditions. Experimental results on the CMU, CASIA B, TUM-IITKGP, AVAMVG and KY4D datasets show the efficacy of the proposed method.
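The partial gait energy images above build on the standard gait energy image (GEI): the per-pixel average of aligned binary silhouettes over a gait cycle. A minimal sketch of that base construct (the paper forms partial GEIs from selected view-shared regions rather than whole silhouettes):

```python
import numpy as np

def gait_energy_image(silhouettes):
    """Average aligned binary silhouettes over a gait cycle into a GEI.

    silhouettes: (T, H, W) array of 0/1 masks, assumed already aligned
    and size-normalized. Each output pixel lies in [0, 1] and encodes
    how often that pixel belongs to the body across the cycle.
    """
    sils = np.asarray(silhouettes, dtype=float)
    return sils.mean(axis=0)
```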
Exploiting object dynamics for recognition and control
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (p. 127-132).
This thesis explores how state-of-the-art object recognition methods can benefit from integrating information across multiple observations of an object. Considered are active vision systems that allow the camera to be steered along predetermined trajectories, resulting in sweeps of ordered views of an object. For systems of this kind, a solution is presented that exploits the order relationship between successive frames to derive a classifier based on the characteristic motion of local features across the sweep. It is shown that this motion model reveals structural information about the object that can be exploited for recognition. The main contribution of this thesis is a recognition system that extends invariant local features (shape context) into the time domain by adding the mentioned feature-motion model into a joint classifier. Second, an entropy-based view selection scheme is presented that allows the vision system to skip ahead to highly discriminative viewing positions. Using two datasets, one standard (ETH-80) and one collected from our robot head, both the feature-motion and active view selection extensions are shown to achieve a higher-quality hypothesis about the presented object more quickly than a baseline system treating object views as an unordered stream of images.
by Philipp Robbel. S.M.
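The entropy-based view selection idea can be sketched as follows: score each candidate viewpoint by the Shannon entropy of the classifier's class posterior there, and move to the most discriminative (lowest-entropy) view. This is a simplified illustration; the exact scoring the thesis uses is not given in the abstract.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a discrete distribution (natural log)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def select_next_view(posteriors):
    """Pick the candidate viewpoint whose class posterior is most
    peaked (lowest entropy), i.e. the most discriminative view.

    posteriors: (V, C) rows of class probabilities, one row per view.
    """
    scores = [entropy(p) for p in np.asarray(posteriors)]
    return int(np.argmin(scores))
```

A uniform posterior (maximum entropy) means the view tells the classifier nothing; skipping ahead to low-entropy views is what lets the active system commit to a hypothesis with fewer frames.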