PhotoShape: Photorealistic Materials for Large-Scale Shape Collections
Existing online 3D shape repositories contain thousands of 3D models but lack
photorealistic appearance. We present an approach to automatically assign
high-quality, realistic appearance models to large-scale 3D shape collections.
The key idea is to jointly leverage three types of online data -- shape
collections, material collections, and photo collections, using the photos as
reference to guide assignment of materials to shapes. By generating a large
number of synthetic renderings, we train a convolutional neural network to
classify materials in real photos, and employ 3D-2D alignment techniques to
transfer materials to different parts of each shape model. Our system produces
photorealistic, relightable, 3D shapes (PhotoShapes).
Comment: To be presented at SIGGRAPH Asia 2018. Project page:
https://keunhong.com/publications/photoshape
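As an illustration of the material-classification step described above, the
sketch below fine-tunes an off-the-shelf CNN on synthetic renderings grouped
by material class. It is a minimal reading of the abstract, not the authors'
code; the "renders/" directory layout and all hyperparameters are assumptions.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

# Hypothetical dataset: one folder of synthetic crops per material class.
transform = transforms.Compose([transforms.Resize((224, 224)),
                                transforms.ToTensor()])
train_set = datasets.ImageFolder("renders/", transform=transform)
loader = DataLoader(train_set, batch_size=64, shuffle=True)

# Fine-tune an ImageNet-pretrained backbone as the material classifier.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch shown for brevity
    opt.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    opt.step()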
Deep Face Feature for Face Alignment
In this paper, we present a deep learning based image feature extraction
method designed specifically for face images. To train the feature extraction
model, we construct a large-scale photo-realistic face image dataset with
ground-truth correspondence between multi-view face images, which are
synthesized from real photographs via an inverse rendering procedure. The deep
face feature (DFF) is trained using correspondence between face images rendered
from different views. Using the trained DFF model, we can extract a feature
vector for each pixel of a face image, which distinguishes different facial
regions and is shown to be more effective than general-purpose feature
descriptors for face-related tasks such as matching and alignment. Based on the
DFF, we develop a robust face alignment method, which iteratively updates
landmarks, pose and 3D shape. Extensive experiments demonstrate that our method
can achieve state-of-the-art results for face alignment on highly
unconstrained face images.
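The training signal described above can be sketched as a per-pixel
correspondence loss: features at corresponding pixels of two rendered views
are pulled together, while mismatched pixels are pushed apart by a margin.
This is a hedged reconstruction from the abstract; the tensor shapes and the
negative-sampling scheme are assumptions.

import torch
import torch.nn.functional as F

def correspondence_loss(feat_a, feat_b, pix_a, pix_b, margin=1.0):
    # feat_*: (C, H, W) feature maps of the two views;
    # pix_*: (N, 2) long tensors of corresponding (row, col) pixels.
    fa = feat_a[:, pix_a[:, 0], pix_a[:, 1]].t()   # (N, C)
    fb = feat_b[:, pix_b[:, 0], pix_b[:, 1]].t()   # (N, C)
    fa, fb = F.normalize(fa, dim=1), F.normalize(fb, dim=1)
    pos = (fa - fb).pow(2).sum(1)                  # matching pixels: small distance
    neg = (fa - fb.roll(1, dims=0)).pow(2).sum(1)  # shifted pairs as negatives
    return pos.mean() + F.relu(margin - neg).mean()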
DSR: Direct Self-rectification for Uncalibrated Dual-lens Cameras
With the development of dual-lens camera modules, depth information
representing the third dimension of the captured scenes becomes available
for smartphones. It is estimated by stereo matching algorithms, taking as
input the two views captured by dual-lens cameras at slightly different
viewpoints. Depth-of-field rendering (also referred to as synthetic defocus
or bokeh) is one of the trending depth-based applications. However, to
achieve fast depth estimation on smartphones, the stereo pairs need to be
rectified in the first place. In this paper, we propose a cost-effective
solution to perform stereo rectification for dual-lens cameras, called
direct self-rectification, or DSR for short. It removes the need for
individual offline calibration of every pair of dual-lens cameras. In
addition, the proposed solution is robust to slight movements, e.g., due to
collisions, of the dual-lens cameras after fabrication. Different from
existing self-rectification approaches, our approach computes the homography
in a novel way with zero geometric distortion introduced to the master
image. This is achieved by directly minimizing the vertical displacements of
corresponding points between the original master image and the transformed
slave image. Our method is evaluated on both realistic and synthetic stereo
image pairs, and produces superior results compared to calibrated
rectification and other self-rectification approaches.
Comment: Accepted at 3DV201
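The core objective is concrete enough to sketch: fit a homography for the
slave image that minimizes the vertical displacements of corresponding
points against the untouched master image. The least-squares formulation
below is an illustrative stand-in for the paper's solver, under the
assumption of precomputed point correspondences.

import numpy as np
from scipy.optimize import least_squares

def vertical_residuals(h, pts_slave, pts_master):
    H = np.append(h, 1.0).reshape(3, 3)            # fix H[2,2] = 1
    ones = np.ones((len(pts_slave), 1))
    warped = (H @ np.hstack([pts_slave, ones]).T).T
    y_warped = warped[:, 1] / warped[:, 2]
    return y_warped - pts_master[:, 1]             # only the y-difference matters

def self_rectify(pts_slave, pts_master):
    # pts_*: (N, 2) arrays of corresponding (x, y) points in the two views
    h0 = np.eye(3).ravel()[:8]                     # start from the identity
    res = least_squares(vertical_residuals, h0, args=(pts_slave, pts_master))
    return np.append(res.x, 1.0).reshape(3, 3)     # homography for the slave image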
Object Recognition by Using Multi-level Feature Point Extraction
In this paper, we present a novel approach for object recognition in
real time by employing multi-level feature analysis and demonstrate the
practicality of adapting feature extraction into a Naive Bayesian
classification framework that enables simple, efficient, and robust
performance. We also show the proposed method scales well as the number of
level-classes grows. To effectively understand the patches surrounding a
keypoint, the trained classifier uses hundreds of simple binary features and
models class posterior probabilities. In addition, the classification process
is computationally cheap under the assumed independence between arbitrary sets
of features. Even though this assumption can be invalid in some particular
scenarios, we demonstrate that the efficient classifier nevertheless performs
remarkably well on image datasets with large variations in illumination and
image capture perspective. The experimental results show that consistent
accuracy can be achieved on many challenging datasets while offering
interactive speed for high-resolution images. The method demonstrates
promising results that outperform state-of-the-art methods in pattern
recognition.
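The combination of simple binary features with class posterior tables and an
independence assumption suggests a ferns-style classifier. The sketch below
is one plausible instantiation, not the paper's implementation; the feature
design (random pixel-pair comparisons) and table sizes are assumptions.

import numpy as np

class FernClassifier:
    def __init__(self, n_ferns, n_bits, n_classes, patch_size=32, seed=0):
        rng = np.random.default_rng(seed)
        # each binary feature compares the intensities of two random pixels
        self.pairs = rng.integers(0, patch_size * patch_size,
                                  size=(n_ferns, n_bits, 2))
        self.counts = np.ones((n_ferns, 2 ** n_bits, n_classes))  # Laplace prior

    def _index(self, patch):
        flat = patch.ravel()
        bits = flat[self.pairs[..., 0]] < flat[self.pairs[..., 1]]  # (F, B)
        return bits.dot(1 << np.arange(bits.shape[1]))              # (F,)

    def train(self, patch, label):
        self.counts[np.arange(len(self.counts)), self._index(patch), label] += 1

    def classify(self, patch):
        probs = self.counts / self.counts.sum(axis=2, keepdims=True)
        log_post = np.log(probs[np.arange(len(self.counts)), self._index(patch)])
        return log_post.sum(axis=0).argmax()  # product of posteriors in log-space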
Leveraging Photogrammetric Mesh Models for Aerial-Ground Feature Point Matching Toward Integrated 3D Reconstruction
Integration of aerial and ground images has proved to be an efficient
approach to enhancing surface reconstruction in urban environments. However,
as the first step, the feature point matching between aerial and ground images
is remarkably difficult, due to the large differences in viewpoint and
illumination conditions. Previous studies based on geometry-aware image
rectification have alleviated this problem, but the performance and
convenience of this strategy are limited by several flaws, e.g., a quadratic
number of image pairs, segregated extraction of descriptors, and occlusions.
To address these problems,
we propose a novel approach: leveraging photogrammetric mesh models for
aerial-ground image matching. The proposed approach has linear time
complexity with respect to the number of images, can explicitly handle low
overlap using multi-view images, and can be directly injected into off-the-shelf
structure-from-motion (SfM) and multi-view stereo (MVS) solutions. First,
aerial and ground images are reconstructed separately and initially
co-registered through weak georeferencing data. Second, aerial models are
rendered to the initial ground views, in which the color, depth and normal
images are obtained. Then, the synthesized color images and the corresponding
ground images are matched by comparing the descriptors, filtered by local
geometrical information, and then propagated to the aerial views using depth
images and patch-based matching. Experimental evaluations using various
datasets confirm the superior performance of the proposed methods in
aerial-ground image matching. In addition, incorporation of the existing SfM
and MVS solutions into these methods enables more complete and accurate models
to be directly obtained.
Comment: Accepted for publication in the ISPRS Journal of Photogrammetry and
Remote Sensing.
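One step of this pipeline, matching a ground photo against the color image
synthesized by rendering the aerial mesh into that ground view, can be
sketched with standard local features. The rendering itself and the
depth-based propagation to aerial views are assumed to happen elsewhere; the
function name and ratio threshold are illustrative.

import cv2

def match_ground_to_synthesized(ground_img, synth_img, ratio=0.8):
    # ground_img: a real ground-level photo; synth_img: the color image
    # rendered from the aerial mesh into (approximately) the same view.
    sift = cv2.SIFT_create()
    kp_g, des_g = sift.detectAndCompute(ground_img, None)
    kp_s, des_s = sift.detectAndCompute(synth_img, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    good = [m for m, n in matcher.knnMatch(des_g, des_s, k=2)
            if m.distance < ratio * n.distance]  # Lowe's ratio test
    # matched points on synth_img can then be lifted to the aerial views
    # using the rendered depth image (not shown here).
    return [(kp_g[m.queryIdx].pt, kp_s[m.trainIdx].pt) for m in good]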
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
Bridging the 'reality gap' that separates simulated robotics from experiments
on hardware could accelerate robotic research through improved data
availability. This paper explores domain randomization, a simple technique for
training models on simulated images that transfer to real images by randomizing
rendering in the simulator. With enough variability in the simulator, the real
world may appear to the model as just another variation. We focus on the task
of object localization, which is a stepping stone to general robotic
manipulation skills. We find that it is possible to train a real-world object
detector that is accurate at the centimeter level and robust to distractors
and partial occlusions, using only data from a simulator with non-realistic
random textures.
To demonstrate the capabilities of our detectors, we show they can be used to
perform grasping in a cluttered environment. To our knowledge, this is the
first successful transfer of a deep neural network trained only on simulated
RGB images (without pre-training on real images) to the real world for the
purpose of robotic control.
Comment: 8 pages, 7 figures. Submitted to the 2017 IEEE/RSJ International
Conference on Intelligent Robots and Systems (IROS 2017).
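The randomization loop itself is simple to sketch: each synthetic scene is
rendered under freshly sampled textures, lighting, camera jitter and
distractor counts. In the sketch below, render_scene is a placeholder for a
real simulator API, and the parameter names and ranges are illustrative, not
taken from the paper.

import random

def random_texture():
    # non-realistic random flat RGB color, as in the random-texture setting
    return tuple(random.random() for _ in range(3))

def sample_scene_params(max_distractors=10):
    # one draw of the randomized rendering parameters for a single scene
    return {
        "object_texture": random_texture(),
        "table_texture": random_texture(),
        "n_distractors": random.randint(0, max_distractors),
        "light_position": [random.uniform(-1.0, 1.0) for _ in range(3)],
        "camera_jitter": [random.gauss(0.0, 0.05) for _ in range(3)],
    }

# Training data is then many renders, each under freshly sampled parameters:
# images = [render_scene(**sample_scene_params()) for _ in range(10000)]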
Security and Privacy Approaches in Mixed Reality: A Literature Survey
Mixed reality (MR) technology development is now gaining momentum due to
advances in computer vision, sensor fusion, and realistic display technologies.
With most of the research and development focused on delivering the promise of
MR, only a few efforts address the privacy and security implications of this
technology. This survey paper aims to bring these risks to light and to look
into the latest security and privacy work on MR. Specifically, we list
and review the different protection approaches that have been proposed to
ensure user and data security and privacy in MR. We extend the scope to include
work on related technologies such as augmented reality (AR), virtual reality
(VR), and human-computer interaction (HCI) as crucial components, if not the
origins, of MR, as well as numerous related work from the larger area of mobile
devices, wearables, and Internet-of-Things (IoT). We highlight the lack of
investigation, implementation, and evaluation of data protection approaches in
MR. Further challenges and directions on MR security and privacy are also
discussed.
Comment: 41 pages, 11 figures, 2 tables (3 tables in the appendix); updated
references on page 1
Visual Localization Under Appearance Change: A Filtering Approach
A major focus of current research on place recognition is visual localization
for autonomous driving. In this scenario, as cameras will be operating
continuously, it is realistic to expect videos as an input to visual
localization algorithms, as opposed to the single-image querying approach used
in other place recognition works. In this paper, we show that exploiting
temporal continuity in the testing sequence significantly improves visual
localization - qualitatively and quantitatively. Although intuitive, this idea
has not been fully explored in recent works. Our main contribution is a novel
Monte Carlo-based visual localization technique that can efficiently reason
over the image sequence. Also, we propose an image retrieval pipeline which
relies on local features and an encoding technique to represent an image as a
single vector. The experimental results show that our proposed method
achieves better results than state-of-the-art approaches on the task of
visual localization under significant appearance change. Our synthetic
dataset and source code are made publicly available.
Comment: Best paper award at DICTA 201
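A minimal sketch of the Monte Carlo idea, assuming a one-dimensional pose
along the mapped trajectory and a retrieval-based likelihood: particles are
propagated with a motion prior, reweighted by image-retrieval similarity,
and resampled when the effective sample size drops. The retrieval_score
callable is a placeholder for the paper's encoding-based retrieval.

import numpy as np

def particle_filter_step(particles, weights, retrieval_score, motion_std=1.0):
    # particles: (N,) array of poses along the mapped trajectory;
    # retrieval_score: callable returning a per-particle likelihood.
    particles = particles + np.random.normal(0.0, motion_std, size=len(particles))
    weights = weights * retrieval_score(particles)
    weights = weights / weights.sum()
    # resample when the effective sample size drops below N/2
    if 1.0 / np.sum(weights ** 2) < len(particles) / 2:
        idx = np.random.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(particles), 1.0 / len(particles))
    return particles, weights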
Drought Stress Classification using 3D Plant Models
Quantification of physiological changes in plants can capture different
drought mechanisms and assist in selection of tolerant varieties in a high
throughput manner. In this context, an accurate 3D model of plant canopy
provides a reliable representation for drought stress characterization in
contrast to using 2D images. In this paper, we propose a novel end-to-end
pipeline including 3D reconstruction, segmentation and feature extraction,
leveraging deep neural networks at various stages, for drought stress study. To
overcome the high degree of self-similarities and self-occlusions in plant
canopy, prior knowledge of leaf shape, based on features from a deep siamese
network, is used to construct an accurate 3D model using structure from
motion on wheat plants. Drought stress is characterized with
deep-network-based feature aggregation. We compare the proposed methodology
against several descriptors, and show that the network outperforms
conventional methods.
Comment: Appears in the Workshop on Computer Vision Problems in Plant
Phenotyping (CVPPP), International Conference on Computer Vision (ICCV) 201
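The final stage, aggregating per-view deep features into one canopy
descriptor before classification, might look like the following. Everything
upstream (reconstruction, segmentation, the siamese network) is assumed
done, and the mean-pooling choice is an assumption rather than the paper's
exact aggregation.

import numpy as np

def aggregate(view_features):
    # view_features: (n_views, d) deep descriptors of one plant's segmented
    # canopy; mean-pool across views and L2-normalize into one descriptor.
    f = np.asarray(view_features).mean(axis=0)
    return f / np.linalg.norm(f)

# A linear classifier on the aggregated descriptors would then separate
# stressed from unstressed plants, e.g. (hypothetical data):
# X = np.stack([aggregate(f) for f in per_plant_features])
# clf = sklearn.svm.SVC(kernel="linear").fit(X, stress_labels)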
3D Pose Estimation and 3D Model Retrieval for Objects in the Wild
We propose a scalable, efficient and accurate approach to retrieve 3D models
for objects in the wild. Our contribution is twofold. We first present a 3D
pose estimation approach for object categories which significantly outperforms
the state-of-the-art on Pascal3D+. Second, we use the estimated pose as a prior
to retrieve 3D models which accurately represent the geometry of objects in RGB
images. For this purpose, we render depth images from 3D models under our
predicted pose and match learned image descriptors of RGB images against those
of rendered depth images using a CNN-based multi-view metric learning approach.
In this way, we are the first to report quantitative results for 3D model
retrieval on Pascal3D+, where our method chooses the same models as human
annotators for 50% of the validation images on average. In addition, we show
that our method, which was trained purely on Pascal3D+, retrieves rich and
accurate 3D models from ShapeNet given RGB images of objects in the wild.
Comment: Accepted to the Conference on Computer Vision and Pattern
Recognition (CVPR) 201
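The retrieval step reduces to nearest-neighbor search in the learned
descriptor space: embed the RGB query and the depth renders of candidate CAD
models (rendered under the predicted pose), then rank by similarity. The
sketch below assumes the descriptors are already computed by the learned
encoders; cosine similarity stands in for the learned metric.

import torch
import torch.nn.functional as F

def retrieve(rgb_desc, depth_descs):
    # rgb_desc: (d,) learned descriptor of the query RGB image;
    # depth_descs: (n_models, d) descriptors of depth renders, one per CAD
    # model, all rendered under the predicted pose.
    sims = F.cosine_similarity(rgb_desc.unsqueeze(0), depth_descs, dim=1)
    return torch.argsort(sims, descending=True)  # best-matching models first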