Render4Completion: Synthesizing Multi-View Depth Maps for 3D Shape Completion
We propose a novel approach for 3D shape completion by synthesizing
multi-view depth maps. While previous work for shape completion relies on
volumetric representations, meshes, or point clouds, we propose to use
multi-view depth maps from a set of fixed viewing angles as our shape
representation. This allows us to be free of the limitations of memory for
volumetric representations and point clouds by casting shape completion into an
image-to-image translation problem. Specifically, we render depth maps of the
incomplete shape from a fixed set of viewpoints, and perform depth map
completion in each view. Unlike image-to-image translation networks that
complete each view separately, our novel network, multi-view completion net
(MVCN), leverages information from all views of a 3D shape to help the
completion of each single view. This enables MVCN to leverage more information
from different depth views to achieve high accuracy in single depth view
completion and keep the consistency among the completed depth images in
different views. Benefiting from the multi-view representation and the novel
network structure, MVCN significantly improves the accuracy of 3D shape
completion on large-scale benchmarks compared to the state of the art.
Comment: ICCV 2019 workshop on Geometry meets Deep Learnin
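As a rough illustration of the shape representation described in the abstract above (depth maps rendered from a fixed set of viewpoints), the following minimal sketch renders orthographic depth maps of a point set along the three coordinate axes. It is not the authors' renderer: the function names, the axis-aligned views, the orthographic projection, and the tiny grid resolution are all assumptions chosen to keep the example self-contained.

```python
import math


def render_depth_map(points, axis=2, resolution=4, lo=-1.0, hi=1.0):
    """Orthographic depth map of a point set viewed along one coordinate axis.

    Hypothetical helper: each pixel keeps the smallest depth that projects
    onto it (the nearest surface). Pixels that receive no point stay at
    math.inf, which stands in for the background.
    """
    # The two axes that span the image plane of this view.
    u_axis, v_axis = [a for a in range(3) if a != axis]
    depth = [[math.inf] * resolution for _ in range(resolution)]
    scale = resolution / (hi - lo)
    for p in points:
        # Map coordinates in [lo, hi] to pixel indices; values outside the
        # range are clamped to the border pixels.
        col = min(resolution - 1, max(0, int((p[u_axis] - lo) * scale)))
        row = min(resolution - 1, max(0, int((p[v_axis] - lo) * scale)))
        # Depth is measured from the plane at `lo` along the viewing axis.
        depth[row][col] = min(depth[row][col], p[axis] - lo)
    return depth


def render_views(points, resolution=4):
    """Render one depth map per coordinate axis (an assumed fixed view set)."""
    return [render_depth_map(points, axis=a, resolution=resolution)
            for a in range(3)]
```

For example, `render_views([(0.0, 0.0, 0.5), (0.0, 0.0, -0.5)])` returns three 4x4 maps; in the view along the z axis both points fall on the same pixel and only the nearer one is kept. A completion network in the spirit of the abstract would then operate on such images instead of on voxels or raw points.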
Deep Learning Representation using Autoencoder for 3D Shape Retrieval
We study the problem of how to build a deep learning representation for 3D
shape. Deep learning has been shown to be very effective in a variety of visual
applications, such as image classification and object detection. However, it
has not been successfully applied to 3D shape recognition. This is because 3D
shapes have complex structure in 3D space and there is a limited number of 3D
shapes for feature learning. To address these problems, we project 3D shapes
into 2D space and use an autoencoder for feature learning on the 2D images. High
accuracy 3D shape retrieval performance is obtained by aggregating the features
learned on 2D images. In addition, we show the proposed deep learning feature
is complementary to conventional local image descriptors. By combining the global
deep learning representation and the local descriptor representation, our
method obtains state-of-the-art performance on 3D shape retrieval
benchmarks.
Comment: 6 pages, 7 figures, 2014 ICSPA
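The retrieval step in the abstract above (aggregating features learned on 2D projections, then comparing shapes) can be sketched as follows. The abstract does not say how features are aggregated or compared, so mean pooling over views and cosine similarity are assumptions here, and the per-view feature vectors are placeholders for whatever an autoencoder would produce.

```python
import math


def mean_pool(view_features):
    """Average per-view feature vectors into one shape descriptor (assumed)."""
    n = len(view_features)
    return [sum(column) / n for column in zip(*view_features)]


def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query_views, database, top_k=1):
    """Rank database shapes by similarity of pooled descriptors to the query.

    `database` maps a shape name to its list of per-view feature vectors.
    """
    query = mean_pool(query_views)
    ranked = sorted(
        database,
        key=lambda name: cosine(query, mean_pool(database[name])),
        reverse=True,
    )
    return ranked[:top_k]
```

With a toy database such as `{"chair": [[1.0, 0.0], [0.8, 0.2]], "table": [[0.0, 1.0], [0.1, 0.9]]}`, a query whose views are close to the chair features is ranked against both entries and the chair comes first.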
Object Pose Estimation from Monocular Image using Multi-View Keypoint Correspondence
Understanding the geometry and pose of objects in 2D images is a fundamental
necessity for a wide range of real world applications. Driven by deep neural
networks, recent methods have brought significant improvements to object pose
estimation. However, they suffer from the scarcity of keypoint/pose-annotated
real images and hence cannot exploit the object's 3D structural information
effectively. In this work, we propose a data-efficient method which utilizes
the geometric regularity of intraclass objects for pose estimation. First, we
learn pose-invariant local descriptors of object parts from simple 2D RGB
images. These descriptors, along with keypoints obtained from renders of a
fixed 3D template model, are then used to generate keypoint correspondence maps
for a given monocular real image. Finally, a pose estimation network predicts
3D pose of the object using these correspondence maps. This pipeline is further
extended to a multi-view approach, which assimilates keypoint information from
correspondence sets generated from multiple views of the 3D template model.
Fusion of multi-view information significantly improves geometric comprehension
of the system which in turn enhances the pose estimation performance.
Furthermore, the correspondence framework used to learn pose-invariant
keypoint descriptors also allows us to effectively alleviate the
data-scarcity problem. This enables our method to achieve state-of-the-art
performance on multiple real-image viewpoint estimation datasets, such as
Pascal3D+ and ObjectNet3D. To encourage reproducible research, we have released
the code for our proposed approach.
Comment: Accepted in ECCV-W; Code available at:
https://github.com/val-iisc/pose_estimatio
Drought Stress Classification using 3D Plant Models
Quantification of physiological changes in plants can capture different
drought mechanisms and assist in selection of tolerant varieties in a high
throughput manner. In this context, an accurate 3D model of plant canopy
provides a reliable representation for drought stress characterization in
contrast to using 2D images. In this paper, we propose a novel end-to-end
pipeline including 3D reconstruction, segmentation and feature extraction,
leveraging deep neural networks at various stages, for drought stress study. To
overcome the high degree of self-similarities and self-occlusions in plant
canopy, prior knowledge of leaf shape based on features from a deep siamese
network is used to construct an accurate 3D model using structure from motion
on wheat plants. Drought stress is characterized with deep-network-based
feature aggregation. We compare the proposed methodology against several
descriptors, and show that the network outperforms conventional methods.
Comment: Appears in Workshop on Computer Vision Problems in Plant Phenotyping
(CVPPP), International Conference on Computer Vision (ICCV) 201
Learning Local Shape Descriptors from Part Correspondences With Multi-view Convolutional Networks
We present a new local descriptor for 3D shapes, directly applicable to a
wide range of shape analysis problems such as point correspondences, semantic
segmentation, affordance prediction, and shape-to-scan matching. The descriptor
is produced by a convolutional network that is trained to embed geometrically
and semantically similar points close to one another in descriptor space. The
network processes surface neighborhoods around points on a shape that are
captured at multiple scales by a succession of progressively zoomed out views,
taken from carefully selected camera positions. We leverage two extremely large
sources of data to train our network. First, since our network processes
rendered views in the form of 2D images, we repurpose architectures pre-trained
on massive image datasets. Second, we automatically generate a synthetic dense
point correspondence dataset by non-rigid alignment of corresponding shape
parts in a large collection of segmented 3D models. As a result of these design
choices, our network effectively encodes multi-scale local context and
fine-grained surface detail. Our network can be trained to produce either
category-specific descriptors or more generic descriptors by learning from
multiple shape categories. Once trained, at test time, the network extracts
local descriptors for shapes without requiring any part segmentation as input.
Our method can produce effective local descriptors even for shapes whose
category is unknown or different from the ones used while training. We
demonstrate through several experiments that our learned local descriptors are
more discriminative than state-of-the-art alternatives, and are
effective in a variety of shape analysis applications.
Visual Affordance and Function Understanding: A Survey
Nowadays, robots are dominating the manufacturing, entertainment and
healthcare industries. Robot vision aims to equip robots with the ability to
discover information, understand it and interact with the environment. These
capabilities require an agent to effectively understand object affordances and
functionalities in complex visual domains. In this literature survey, we first
focus on visual affordances and summarize the state of the art as well as open
problems and research gaps. Specifically, we discuss sub-problems such as
affordance detection, categorization, segmentation and high-level reasoning.
Furthermore, we cover functional scene understanding and the prevalent
functional descriptors used in the literature. The survey also provides
necessary background to the problem, sheds light on its significance and
highlights the existing challenges for affordance and functionality learning.
Comment: 26 pages, 22 image
SeqLPD: Sequence Matching Enhanced Loop-Closure Detection Based on Large-Scale Point Cloud Description for Self-Driving Vehicles
Place recognition and loop-closure detection are main challenges in the
localization, mapping and navigation tasks of self-driving vehicles. In this
paper, we solve the loop-closure detection problem by combining a
deep-learning-based point cloud description method with a coarse-to-fine
sequence matching strategy. More specifically, we propose a deep neural network
to extract a global descriptor from the original large-scale 3D point cloud.
Based on these descriptors, a typical place analysis approach is presented to
investigate the feature space distribution of the global descriptors and select
several super keyframes. Finally, a coarse-to-fine strategy, which includes a
super keyframe based coarse matching stage and a local sequence matching stage,
is presented to ensure the loop-closure detection accuracy and real-time
performance simultaneously. Thanks to the sequence matching operation, the
proposed approach improves on existing deep-learning-based methods.
Experimental results on a self-driving vehicle validate the
effectiveness of the proposed loop-closure detection algorithm.
Comment: This paper has been accepted by IROS-201
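The coarse-to-fine strategy outlined in the SeqLPD abstract above (a coarse match against super keyframes, followed by local sequence matching) might look roughly like the sketch below. This is a hypothetical reading of the abstract rather than the paper's algorithm: the Euclidean distance, the `window` and `top_k` parameters, the mean sequence cost, and the function name `detect_loop` are all assumptions, and the global descriptors are taken as given.

```python
import math


def dist(a, b):
    """Euclidean distance between two descriptors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def detect_loop(map_desc, super_keyframes, query_seq, window=2, top_k=1):
    """Return (frame index, cost) of the best loop-closure candidate.

    map_desc:        global descriptors of past frames, in time order
    super_keyframes: indices into map_desc chosen as representative frames
    query_seq:       the most recent query descriptors, oldest first
    """
    latest = query_seq[-1]
    # Coarse stage: compare the latest query only against super keyframes.
    coarse = sorted(super_keyframes,
                    key=lambda i: dist(map_desc[i], latest))[:top_k]

    # Fine stage: around each coarse hit, align the whole query sequence
    # with map subsequences and keep the lowest mean descriptor distance.
    length = len(query_seq)
    best_idx, best_cost = None, math.inf
    for c in coarse:
        first_end = max(length - 1, c - window)
        last_end = min(len(map_desc) - 1, c + window)
        for end in range(first_end, last_end + 1):
            candidate = map_desc[end - length + 1:end + 1]
            cost = sum(dist(a, b)
                       for a, b in zip(candidate, query_seq)) / length
            if cost < best_cost:
                best_idx, best_cost = end, cost
    return best_idx, best_cost
```

The coarse stage keeps the search cheap by touching only a few representative frames, while the fine stage uses temporal order to reject single-frame matches that merely look similar. If the query sequence is longer than the map, no candidate is evaluated and the function returns `(None, math.inf)`.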
A state of the art of urban reconstruction: street, street network, vegetation, urban feature
The world population is rising, especially the share of people living in cities.
With increased population and complex roles regarding their inhabitants and
their surroundings, cities concentrate difficulties for design, planning and
analysis. These tasks require a way to reconstruct/model a city. Traditionally,
much attention has been given to building reconstruction, yet an essential
part of the city has been neglected: streets. Street reconstruction has seldom
been researched. Streets are also complex compositions of urban features, and
have a unique role for transportation (as they comprise roads). We aim to
complete the recent state of the art for building reconstruction (Musialski2012)
by considering all other aspects of urban reconstruction. We introduce the need
for city models. Because reconstruction always necessitates data, we first
analyse which data are available. We then present a state of the art of street
reconstruction, street network reconstruction, urban feature
reconstruction/modelling, vegetation, and urban object
reconstruction/modelling.
Although reconstruction strategies vary widely, we can order them by the role
the model plays, from data-driven approaches, to model-based approaches, to
inverse procedural modelling and model catalogue matching. The main challenges
seem to come from the complex nature of urban environments and from the
limitations of the available data. Urban features have strong relationships
with each other and with their surroundings, as well as hierarchical relations.
Procedural modelling has the power to express these relations, and could be
applied to the reconstruction of urban features via the Inverse Procedural
Modelling paradigm.
Comment: Extracted from PhD (chap1
cvpaper.challenge in 2015 - A review of CVPR2015 and DeepSurvey
The "cvpaper.challenge" is a group composed of members from AIST, Tokyo Denki
Univ. (TDU), and Univ. of Tsukuba that aims to systematically summarize papers
on computer vision, pattern recognition, and related fields. For this
particular review, we focused on reading all 602 conference papers
presented at CVPR2015, the premier annual computer vision event held in
June 2015, in order to grasp the trends in the field. Further, we propose
"DeepSurvey" as a mechanism embodying the entire process, from reading
through all the papers, to generating ideas, to writing the paper.
Comment: Survey Pape
Human Action Recognition and Prediction: A Survey
Driven by rapid advances in computer vision and machine learning, video
analysis tasks have been moving from inferring the present state to predicting
the future state. Vision-based action recognition and prediction from videos
are such tasks, where action recognition is to infer human actions (present
state) based upon complete action executions, and action prediction to predict
human actions (future state) based upon incomplete action executions. These two
tasks have become particularly prevalent topics recently because of their
rapidly emerging real-world applications, such as visual surveillance,
autonomous driving, entertainment, and video retrieval. Many
efforts have been devoted over the last few decades to building a robust
and effective framework for action recognition and prediction. In this paper,
we comprehensively survey state-of-the-art techniques in action recognition
and prediction. Existing models, popular algorithms, technical difficulties,
popular action databases, evaluation protocols, and promising future directions
are also discussed systematically.