FrameNet: Learning Local Canonical Frames of 3D Surfaces from a Single RGB Image
In this work, we introduce the novel problem of identifying dense canonical
3D coordinate frames from a single RGB image. We observe that each pixel in
an image corresponds to a surface in the underlying 3D geometry, where a
canonical frame can be identified, represented by three orthogonal axes: one
along the normal direction and two in the tangent plane. We propose an
algorithm to
predict these axes from RGB. Our first insight is that canonical frames
computed automatically with recently introduced direction field synthesis
methods can provide training data for the task. Our second insight is that
networks designed for surface normal prediction provide better results when
trained jointly to predict canonical frames, and even better when trained to
also predict 2D projections of canonical frames. We conjecture this is because
projections of canonical tangent directions often align with local gradients in
images, and because those directions are tightly linked to 3D canonical frames
through projective geometry and orthogonality constraints. In our experiments,
we find that our method predicts 3D canonical frames that can be used in
applications ranging from surface normal estimation and feature matching to
augmented reality.
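The frame the abstract describes — one axis along the normal, two in the tangent plane — can be assembled with a single Gram-Schmidt step. A minimal NumPy sketch, assuming a predicted normal and a rough tangent hint (the function and argument names are illustrative, not from the paper's code):

```python
import numpy as np

def canonical_frame(normal, tangent_hint):
    """Build an orthonormal frame: one axis along the normal,
    two in the tangent plane (Gram-Schmidt orthogonalization)."""
    n = normal / np.linalg.norm(normal)
    # Project the hint into the tangent plane, then normalize.
    t = tangent_hint - np.dot(tangent_hint, n) * n
    t = t / np.linalg.norm(t)
    # Cross product yields the second tangent axis of a right-handed frame.
    b = np.cross(n, t)
    return np.stack([n, t, b])
```

A network predicting a normal and one tangent direction per pixel could use such a step to produce full frames that are orthogonal by construction.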
Development of An Android Application for Object Detection Based on Color, Shape, or Local Features
Object detection and recognition are important tasks in many computer vision
applications. In this paper, an Android application is developed using the
Eclipse IDE and the OpenCV3 library. This application is able to detect
objects in an image loaded from the mobile gallery, based on color, shape,
or local features. The image is processed in the HSV color space for better
color detection. Circular shapes are detected using the Circular Hough
Transform, and other shapes are detected using the Douglas-Peucker
algorithm. BRISK (binary robust
invariant scalable keypoints) local features were applied in the developed
Android application for matching an object image in another scene image. The
steps of the proposed detection algorithms are described, and the interfaces of
the application are illustrated. The application was ported and tested on
Galaxy S3, S6, and Note1 smartphones. Based on the experimental results, the
application is capable of detecting eleven different colors, detecting
two-dimensional geometric shapes including circles, rectangles, triangles,
and squares, and correctly matching local features of object and scene
images under different conditions. The application can be used standalone or
as part of another application, such as robot systems, traffic systems,
e-learning applications, information retrieval, and many others.
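The Douglas-Peucker simplification mentioned above fits in a few lines. A minimal pure-NumPy sketch (illustrative only; in the app's OpenCV setting, `cv2.approxPolyDP` would play this role):

```python
import numpy as np

def rdp(points, eps):
    """Recursively simplify a polyline, keeping only points farther
    than eps from the chord joining the first and last point."""
    if len(points) < 3:
        return list(points)
    start = np.asarray(points[0], dtype=float)
    end = np.asarray(points[-1], dtype=float)
    chord = end - start
    norm = np.linalg.norm(chord)
    # Perpendicular distance of each interior point to the chord.
    dists = []
    for p in points[1:-1]:
        v = np.asarray(p, dtype=float) - start
        if norm == 0.0:
            dists.append(np.linalg.norm(v))
        else:
            dists.append(abs(chord[0] * v[1] - chord[1] * v[0]) / norm)
    i = int(np.argmax(dists)) + 1
    if dists[i - 1] > eps:
        # Keep the farthest point and recurse on both halves.
        return rdp(points[:i + 1], eps)[:-1] + rdp(points[i:], eps)
    return [points[0], points[-1]]
```

Points on a nearly straight contour collapse to their endpoints, while genuine corners survive, which is what makes the algorithm useful for classifying contours as triangles, rectangles, or squares by their remaining vertex count.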
Estimating 6D Pose From Localizing Designated Surface Keypoints
In this paper, we present an accurate yet efficient solution for 6D pose
estimation from an RGB image. The core of our approach is that we first
designate a set of surface points on the target object model as keypoints
and then train a keypoint detector (KPD) to localize them. Finally, a PnP
algorithm recovers the 6D pose from the 2D-3D correspondences of the
keypoints. Unlike recent state-of-the-art CNN-based approaches that rely on
a time-consuming post-processing procedure, our method achieves competitive
accuracy without any refinement after pose prediction. Meanwhile, we obtain
a 30% relative improvement in ADD accuracy over methods that do not use
refinement.
Moreover, we succeed in handling heavy occlusion by selecting the most
confident keypoints to recover the 6D pose. For the sake of reproducibility,
we will make our code and models publicly available soon.
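The final 2D-3D step can be illustrated with a direct linear transform (DLT). The paper uses a PnP solver; the sketch below only recovers an unconstrained projection matrix from keypoint correspondences, and the synthetic camera and all names are illustrative assumptions:

```python
import numpy as np

def dlt_projection(pts3d, pts2d):
    """Estimate a 3x4 projection matrix (up to scale) from >= 6
    2D-3D keypoint correspondences via SVD."""
    rows = []
    for X, x in zip(pts3d, pts2d):
        Xh = np.append(X, 1.0)
        rows.append(np.hstack([Xh, np.zeros(4), -x[0] * Xh]))
        rows.append(np.hstack([np.zeros(4), Xh, -x[1] * Xh]))
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)  # right null vector = flattened P

def project(P, pts3d):
    """Project 3D points with P and dehomogenize to pixel coordinates."""
    Xh = np.hstack([pts3d, np.ones((len(pts3d), 1))])
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]
```

A proper PnP solver (e.g. `cv2.solvePnP`) additionally enforces the known intrinsics and a valid rotation, but the 2D-3D relationship being solved is the same.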
Drought Stress Classification using 3D Plant Models
Quantification of physiological changes in plants can capture different
drought mechanisms and assist in the selection of tolerant varieties in a
high-throughput manner. In this context, an accurate 3D model of the plant
canopy provides a more reliable representation for drought stress
characterization than 2D images. In this paper, we propose a novel
end-to-end pipeline including 3D reconstruction, segmentation, and feature
extraction, leveraging deep neural networks at various stages, for drought
stress study. To overcome the high degree of self-similarity and
self-occlusion in the plant canopy, prior knowledge of leaf shape, based on
features from a deep Siamese network, is used to construct an accurate 3D
model using structure from motion on wheat plants. Drought stress is then
characterized with deep-network-based feature aggregation. We compare the
proposed methodology against several descriptors and show that the network
outperforms conventional methods.
Comment: Appears in Workshop on Computer Vision Problems in Plant
Phenotyping (CVPPP), International Conference on Computer Vision (ICCV) 201
A Hierarchical Distributed Processing Framework for Big Image Data
This paper introduces an effective processing framework named ICP (Image
Cloud Processing) to cope with the data explosion in the image processing
field. While most previous research focuses on optimizing image processing
algorithms for higher efficiency, our work is dedicated to providing a
general framework for those image processing algorithms, so that they can
be executed in parallel to gain a boost in time efficiency without
compromising result quality as the image scale increases. The proposed ICP
framework consists of two mechanisms, i.e., SICP (Static ICP) and DICP
(Dynamic ICP). Specifically, SICP is aimed at processing big image data
pre-stored in the distributed system, while DICP is proposed for dynamic
input. To accomplish SICP, two novel data representations named P-Image and
Big-Image are designed to cooperate with MapReduce to achieve a more
optimized configuration and higher efficiency. DICP is implemented through
a parallel processing procedure working with the traditional processing
mechanism of the distributed system. Representative results of
comprehensive experiments on the challenging ImageNet dataset are selected
to validate the capacity of our proposed ICP framework over traditional
state-of-the-art methods, in both time efficiency and quality of results.
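The SICP idea of pairing image data with MapReduce can be illustrated by a toy map/reduce pass. This is a minimal in-process sketch, assuming each "image" is a NumPy array and standing in for a real Hadoop-style cluster (all names are illustrative, not from the ICP framework):

```python
import numpy as np
from functools import reduce

def map_histogram(image):
    """Map step: emit one partial 256-bin grayscale histogram per image."""
    hist, _ = np.histogram(image, bins=256, range=(0, 256))
    return hist

def reduce_histograms(h1, h2):
    """Reduce step: merge partial histograms by summation."""
    return h1 + h2

def global_histogram(images):
    """Run the map step over every image, then fold the reduce step."""
    return reduce(reduce_histograms, map(map_histogram, images))
```

Because the reduce step is associative, partial results from different workers can be merged in any order, which is what lets such a job scale with the number of stored images.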
Unsupervised Domain Adaptation for 3D Keypoint Estimation via View Consistency
In this paper, we introduce a novel unsupervised domain adaptation technique
for the task of 3D keypoint prediction from a single depth scan or image. Our
key idea is to utilize the fact that predictions from different views of the
same or similar objects should be consistent with each other. Such view
consistency can provide effective regularization for keypoint prediction on
unlabeled instances. In addition, we introduce a geometric alignment term to
regularize predictions in the target domain. The resulting loss function can be
effectively optimized via alternating minimization. We demonstrate the
effectiveness of our approach on real datasets and present experimental results
showing that our approach is superior to state-of-the-art general-purpose
domain adaptation techniques.
Comment: ECCV 201
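The view-consistency idea — keypoints predicted from different views of the same object should agree once mapped into a common frame — can be written as a simple regularizer. A minimal NumPy sketch, assuming the relative pose between the two views is known (names are illustrative, and the paper's full loss also includes a geometric alignment term not shown here):

```python
import numpy as np

def view_consistency_loss(kp_a, kp_b, R_ab, t_ab):
    """Mean squared disagreement between view-a keypoints mapped into
    view b's frame and the keypoints predicted directly in view b."""
    kp_a_in_b = kp_a @ R_ab.T + t_ab
    return float(np.mean(np.sum((kp_a_in_b - kp_b) ** 2, axis=1)))
```

The loss is zero exactly when the two predictions describe the same 3D keypoints, so minimizing it on unlabeled target-domain instances regularizes the predictor without ground-truth annotations.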
Texture Object Segmentation Based on Affine Invariant Texture Detection
To address the problem of segmenting richly textured images, a novel
detection method based on the affine invariance principle is proposed.
Considering the similarity between texture areas, we first apply affine
transforms to obtain numerous shapes and use the KLT algorithm to verify
the similarity. The transforms include rotation, proportional scaling, and
perspective deformation to cope with a variety of situations. We then
propose an improved LBP method combined with Canny edge detection to handle
boundaries during segmentation. Moreover, the method provides user-friendly
human-computer interaction for splitting the matched texture area from the
original images.
Comment: 6 pages, 15 figures
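The improved-LBP step builds on the classic local binary pattern. A minimal NumPy sketch of the plain 8-neighbour LBP (the Canny combination and the paper's improvements are not reproduced here):

```python
import numpy as np

def lbp_8(img):
    """Classic 8-neighbour local binary pattern on interior pixels:
    each neighbour >= centre contributes one bit to the code."""
    img = np.asarray(img)
    h, w = img.shape
    centre = img[1:-1, 1:-1]
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    # Neighbours in fixed order so each gets a fixed bit position.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= (nb >= centre).astype(np.uint8) << bit
    return codes
```

Histograms of these codes over a window form the texture descriptor that segmentation methods of this kind compare between regions.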
ENFT: Efficient Non-Consecutive Feature Tracking for Robust Structure-from-Motion
Structure-from-motion (SfM) relies largely on feature tracking. In image
sequences, if disjointed tracks caused by objects moving in and out of the
field of view, occasional occlusion, or image noise are not handled well,
the corresponding SfM can be affected. This problem becomes more severe for
large-scale scenes, which typically require capturing multiple sequences to
cover the whole scene. In this paper, we propose an efficient
non-consecutive feature tracking (ENFT) framework to match interrupted
tracks distributed in different subsequences or even in different videos.
Our framework consists of two steps: solving the feature `dropout' problem
when indistinctive structures, noise, or large image distortion exist, and
rapidly recognizing and joining common features located in different
subsequences. In addition, we contribute an effective segment-based
coarse-to-fine SfM algorithm for robustly handling large datasets.
Experimental results on challenging video data demonstrate the
effectiveness of the proposed system.
Comment: 15 pages, 12 figures
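Joining common features across subsequences ultimately rests on descriptor matching. A minimal NumPy sketch of nearest-neighbour matching with Lowe's ratio test — a standard building block, not the ENFT matcher itself, and all names are illustrative:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match each descriptor in desc_a to its nearest neighbour in
    desc_b, keeping only matches that pass Lowe's ratio test."""
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        order = np.argsort(dists)
        best, second = order[0], order[1]
        # Accept only if clearly closer than the runner-up.
        if dists[best] < ratio * dists[second]:
            matches.append((i, int(best)))
    return matches
```

Discarding ambiguous matches this way is what keeps track joining robust when repeated or indistinctive structures appear across subsequences.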
Simultaneous Joint and Object Trajectory Templates for Human Activity Recognition from 3-D Data
The availability of low-cost range sensors and the development of relatively
robust algorithms for the extraction of skeleton joint locations have inspired
many researchers to develop human activity recognition methods using the 3-D
data. In this paper, an effective method for the recognition of human
activities from the normalized joint trajectories is proposed. We represent the
actions as multidimensional signals and introduce a novel method for generating
action templates by averaging the samples in a "dynamic time" sense. Then in
order to deal with the variations in the speed and style of performing actions,
we warp the samples to the action templates by an efficient algorithm and
employ wavelet filters to extract meaningful spatiotemporal features. The
proposed method is also capable of modeling the human-object interactions, by
performing the template generation and temporal warping procedure via the joint
and object trajectories simultaneously. The experimental evaluation on several
challenging datasets demonstrates the effectiveness of our method compared
to the state of the art.
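The "dynamic time" averaging and warping above rely on dynamic time warping. A minimal sketch of the classic DTW distance for 1-D sequences (the paper warps multidimensional joint and object trajectories, and the template averaging itself is not reproduced here):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D
    sequences, with absolute difference as the local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of insertion, deletion, and match.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because DTW aligns samples non-linearly in time, the same action performed faster or slower still maps onto the template with low cost, which is exactly the speed-and-style invariance the method needs.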
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. This comprehensive problem-oriented
review of the advances in transfer learning has not only revealed the
challenges in transfer learning for visual recognition, but also the
problems (eight of the seventeen) that have scarcely been studied. This
survey not only presents an up-to-date technical review for researchers,
but also offers a systematic approach and a reference for machine learning
practitioners to categorise a real problem and look up a possible solution
accordingly.