9,341 research outputs found
Learning a Local Feature Descriptor for 3D LiDAR Scans
Robust data association is necessary for virtually every SLAM system and
finding corresponding points is typically a preprocessing step for scan
alignment algorithms. Traditionally, handcrafted feature descriptors were used
for these problems but recently learned descriptors have been shown to perform
more robustly. In this work, we propose a local feature descriptor for 3D LiDAR
scans. The descriptor is learned using a Convolutional Neural Network (CNN).
Our proposed architecture consists of a Siamese network for learning a feature
descriptor and a metric learning network for matching the descriptors. We also
present a method for estimating local surface patches and obtaining
ground-truth correspondences. In extensive experiments, we compare our learned
feature descriptor with existing 3D local descriptors and report highly
competitive results for multiple experiments in terms of matching accuracy and
computation time.
Comment: Accepted for IROS-2018. Project details and code:
http://deep3d-descriptor.informatik.uni-freiburg.de
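The two-stage design above (a Siamese network that embeds patches, followed by a metric stage that compares embeddings) can be illustrated with a toy numpy sketch. The single tanh layer and plain L2 distance below are stand-ins for the paper's CNN and learned metric network; the weights and names are illustrative only:

```python
import numpy as np

def embed(patch, W):
    # Shared-weight embedding: both patches of a pair pass through the
    # same function (the "Siamese" property). The paper uses a CNN; a
    # single tanh layer stands in for it here.
    return np.tanh(W @ patch.ravel())

def match_distance(p1, p2, W):
    # Metric stage: distance between the two embeddings. A learned
    # metric network would replace this plain L2 distance.
    return np.linalg.norm(embed(p1, W) - embed(p2, W))

rng = np.random.default_rng(0)
W = rng.standard_normal((16, 64))        # toy weights, not trained
patch = rng.standard_normal((8, 8))
d_same = match_distance(patch, patch, W)                      # identical pair
d_diff = match_distance(patch, rng.standard_normal((8, 8)), W)  # unrelated pair
```

A matched pair should score a smaller distance than an unrelated one; training pushes real correspondences toward zero.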
A Performance Evaluation of Local Features for Image Based 3D Reconstruction
This paper performs a comprehensive and comparative evaluation of the state
of the art local features for the task of image based 3D reconstruction. The
evaluated local features cover the recently developed ones by using powerful
machine learning techniques and the elaborately designed handcrafted features.
To obtain a comprehensive evaluation, we choose to include both float-type
features and binary ones. Meanwhile, two kinds of datasets have been used in
this evaluation. One is a dataset of many different scene types with
ground-truth 3D points, containing images of different scenes captured at fixed
positions, for quantitative performance evaluation of the local features
under controlled image-capturing conditions. The other dataset contains
Internet-scale image sets of several landmarks with many unrelated images,
which is used for qualitative performance evaluation of the local
features in the free image collection setting. Our experimental results show
that binary features are competent to reconstruct scenes from controlled image
sequences in only a fraction of the processing time required by float-type
features. However, for large-scale image sets with many distracting images,
float-type features show a clear advantage over binary ones.
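The speed gap reported above comes down to matching cost: binary descriptors compare by Hamming distance, a popcount of an XOR, rather than floating-point arithmetic. A minimal brute-force matcher in numpy (descriptor sizes and data are illustrative, not from the paper):

```python
import numpy as np

def hamming_match(desc_a, desc_b):
    # Brute-force nearest-neighbour matching of binary descriptors,
    # packed 8 bits per byte as uint8 rows. Hamming distance counts
    # the bits set in the XOR of two descriptors.
    xor = desc_a[:, None, :] ^ desc_b[None, :, :]
    dist = np.unpackbits(xor, axis=2).sum(axis=2)
    return dist.argmin(axis=1)          # index of best match per query

rng = np.random.default_rng(5)
db = rng.integers(0, 256, size=(100, 32), dtype=np.uint8)  # 256-bit descriptors
queries = db[[7, 42, 99]].copy()
queries[0, 0] ^= 0b1                    # flip one bit of the first query
matches = hamming_match(queries, db)
```

Even with one corrupted bit, the nearest neighbour in Hamming space is still the original descriptor, since unrelated 256-bit descriptors differ in roughly 128 bits.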
Drought Stress Classification using 3D Plant Models
Quantification of physiological changes in plants can capture different
drought mechanisms and assist in selection of tolerant varieties in a high
throughput manner. In this context, an accurate 3D model of plant canopy
provides a reliable representation for drought stress characterization in
contrast to using 2D images. In this paper, we propose a novel end-to-end
pipeline including 3D reconstruction, segmentation and feature extraction,
leveraging deep neural networks at various stages, for drought stress study. To
overcome the high degree of self-similarity and self-occlusion in plant
canopies, prior knowledge of leaf shape, based on features from a deep Siamese
network, is used to construct an accurate 3D model of wheat plants using
structure from motion. Drought stress is characterized with deep-network-based
feature aggregation. We compare the proposed methodology on several
descriptors and show that the network outperforms conventional methods.
Comment: Appears in the Workshop on Computer Vision Problems in Plant Phenotyping
(CVPPP), International Conference on Computer Vision (ICCV) 201
D2D: Keypoint Extraction with Describe to Detect Approach
In this paper, we present a novel approach that exploits the information
within the descriptor space to propose keypoint locations. Detect then
describe, or detect and describe jointly are two typical strategies for
extracting local descriptors. In contrast, we propose an approach that inverts
this process by first describing and then detecting the keypoint locations.
Describe-to-Detect (D2D) leverages successful descriptor models without the
need for any additional training. Our method selects keypoints as salient
locations with high information content which is defined by the descriptors
rather than some independent operators. We perform experiments on multiple
benchmarks including image matching, camera localisation, and 3D
reconstruction. The results indicate that our method improves the matching
performance of various descriptors and that it generalises across methods and
tasks.
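The describe-then-detect idea can be sketched in a few lines: compute a dense descriptor map first, then keep the locations whose descriptors stand out. The saliency measure below (deviation from the mean descriptor) is a simple proxy for "information content" and not the paper's actual criterion:

```python
import numpy as np

def describe_to_detect(desc_map, k=5):
    # desc_map: (H, W, D) dense descriptors, one per pixel.
    # Saliency = how far a location's descriptor deviates from the
    # mean descriptor; the top-k most distinctive locations become
    # keypoints. No detector-specific training is involved.
    mean = desc_map.mean(axis=(0, 1))
    saliency = np.linalg.norm(desc_map - mean, axis=2)
    flat = np.argsort(saliency.ravel())[::-1][:k]
    return np.stack(np.unravel_index(flat, saliency.shape), axis=1)

rng = np.random.default_rng(1)
desc = rng.standard_normal((32, 32, 8)) * 0.01   # mostly flat descriptor map
desc[10, 20] += 5.0                              # one distinctive location
kps = describe_to_detect(desc, k=1)
```

The distinctive location is recovered as the top keypoint, which is the core of the approach: the descriptor itself decides where detection fires.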
Image Processing on IOPA Radiographs: A comprehensive case study on Apical Periodontitis
With the recent advancements in Image Processing Techniques and development
of new robust computer vision algorithms, new areas of research within Medical
Diagnosis and Biomedical Engineering are picking up pace. This paper provides a
comprehensive in-depth case study of Image Processing, Feature Extraction and
Analysis of Apical Periodontitis diagnostic cases in IOPA (Intra Oral
Peri-Apical) Radiographs, a common case in the oral diagnostic pipeline. This
paper provides a detailed analytical approach towards improving the diagnostic
procedure, with improved and faster results at higher accuracy, aiming to
eliminate True Negative and False Positive cases.
Comment: 15 pages, 42 figures; submitted to ICIAP 2019: 21st International
Conference on Image Analysis and Processing
3D Scan Registration using Curvelet Features in Planetary Environments
Topographic mapping in planetary environments relies on accurate 3D scan
registration methods. However, most global registration algorithms relying on
features such as FPFH and Harris-3D show poor alignment accuracy in these
settings due to the poor structure of the Mars-like terrain and the
variable-resolution, occluded, sparse range data that are hard to register without some
a-priori knowledge of the environment. In this paper, we propose an alternative
approach to 3D scan registration using the curvelet transform that performs
multi-resolution geometric analysis to obtain a set of coefficients indexed by
scale (coarsest to finest), angle and spatial position. Features are detected
in the curvelet domain to take advantage of the directional selectivity of the
transform. A descriptor is computed for each feature by calculating the 3D
spatial histogram of the image gradients, and nearest neighbor based matching
is used to calculate the feature correspondences. Correspondence rejection
using Random Sample Consensus identifies inliers, and a locally optimal
Singular Value Decomposition-based estimation of the rigid-body transformation
aligns the laser scans given the re-projected correspondences in the metric
space. Experimental results on a publicly available dataset of a planetary
analogue indoor facility, as well as simulated and real-world scans from Neptec
Design Group's IVIGMS 3D laser rangefinder at the outdoor CSA Mars yard,
demonstrate improved performance over existing methods in the challenging
sparse Mars-like terrain.
Comment: 27 pages in Journal of Field Robotics, 201
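The final alignment step the abstract names, a Singular Value Decomposition-based estimate of the rigid-body transform from RANSAC inliers, is the classic Kabsch/Umeyama procedure. A minimal numpy sketch on synthetic correspondences (inlier selection is assumed already done; the data are illustrative):

```python
import numpy as np

def rigid_align(src, dst):
    # Least-squares rigid transform (R, t) mapping src -> dst, given
    # corresponding 3D points, via SVD of the cross-covariance matrix.
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cd - R @ cs
    return R, t

rng = np.random.default_rng(2)
pts = rng.standard_normal((20, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([1.0, -2.0, 0.5])
R_est, t_est = rigid_align(pts, pts @ R_true.T + t_true)
```

With noise-free inliers the true rotation and translation are recovered exactly (up to numerical precision); the determinant check keeps the solution a proper rotation.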
DASC: Robust Dense Descriptor for Multi-modal and Multi-spectral Correspondence Estimation
Establishing dense correspondences between multiple images is a fundamental
task in many applications. However, finding a reliable correspondence in
multi-modal or multi-spectral images still remains unsolved due to their
challenging photometric and geometric variations. In this paper, we propose a
novel dense descriptor, called dense adaptive self-correlation (DASC), to
estimate multi-modal and multi-spectral dense correspondences. Based on the
observation that self-similarity within images is robust to imaging
modality variations, we define the descriptor as a series of adaptive
self-correlation similarity measures between patches sampled by randomized
receptive field pooling, in which the sampling pattern is obtained by
discriminative learning. The computational redundancy of dense descriptors is
dramatically reduced by applying fast edge-aware filtering. Furthermore, in
order to address geometric variations including scale and rotation, we propose
a geometry-invariant DASC (GI-DASC) descriptor that effectively leverages the
DASC through a superpixel-based representation. For a quantitative evaluation
of GI-DASC, we build a novel multi-modal benchmark with varying photometric
and geometric conditions. Experimental results demonstrate the outstanding
performance of DASC and GI-DASC in many cases of multi-modal and
multi-spectral dense correspondence.
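The self-correlation idea behind DASC can be illustrated simply: each descriptor entry is the normalized correlation between two small patches sampled around the same point. Because the image is only ever compared with itself, the measure is invariant to affine intensity changes, which is what makes it robust across modalities. The fixed patch-pair layout below is illustrative; DASC learns its sampling pattern:

```python
import numpy as np

def self_correlation_desc(img, y, x, pairs, r=2):
    # One descriptor entry per patch pair: the normalized correlation
    # between two (2r+1)x(2r+1) patches sampled around (y, x).
    def patch(cy, cx):
        p = img[cy - r:cy + r + 1, cx - r:cx + r + 1].ravel()
        p = p - p.mean()                 # zero-mean ...
        n = np.linalg.norm(p)
        return p / n if n > 0 else p     # ... unit-norm normalization
    return np.array([patch(y + dy1, x + dx1) @ patch(y + dy2, x + dx2)
                     for (dy1, dx1), (dy2, dx2) in pairs])

rng = np.random.default_rng(3)
img = rng.standard_normal((40, 40))
pairs = [((-3, 0), (3, 0)), ((0, -3), (0, 3))]   # toy sampling pattern
d1 = self_correlation_desc(img, 20, 20, pairs)
d2 = self_correlation_desc(2.0 * img + 7.0, 20, 20, pairs)  # affine change
```

Scaling and shifting the intensities leaves the descriptor unchanged, mimicking what happens across imaging modalities.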
A Sparse Representation of Complete Local Binary Pattern Histogram for Human Face Recognition
Human face recognition has been a long-standing problem in computer vision
and pattern recognition. Facial analysis can be viewed as a two-fold problem,
namely (i) facial representation and (ii) classification. So far, many face
representations have been proposed; a well-known method is the Local Binary
Pattern (LBP), which has witnessed growing interest. In this respect, we
treat in this paper the issues of face representation as well as classification
in a novel manner. On the one hand, we use a variant of LBP, the so-called
Complete Local Binary Pattern (CLBP), which differs from the basic LBP by
coding a given local region using a given central pixel and the sign and
magnitude differences. Most LBP-based descriptors use a fixed grid to code a
given facial image, a technique that is, in most cases, not robust to pose
variation and misalignment. To cope with this issue, a representative
Multi-Resolution Histogram (MH) decomposition is adopted in our work. On the
other hand, having extracted the histograms of the considered images, we
exploit their sparsity to construct a so-called Sparse Representation
Classifier (SRC) for further face classification. Experiments have been
conducted on the ORL face database and point out the superiority of our scheme
over other popular state-of-the-art techniques.
Comment: Accepted (but unattended) in IEEE-EMBS International Conference on
Biomedical and Health Informatics (BHI)
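For reference, the basic LBP representation the abstract builds on works as follows: threshold a pixel's 8 neighbours at the centre value, read the bits as a byte, and describe a region by the histogram of these codes. A numpy sketch of plain LBP (CLBP additionally codes the magnitude of the differences, which this sketch omits):

```python
import numpy as np

def lbp_histogram(img):
    # Basic 3x3 LBP: each bit records whether a neighbour is >= the
    # centre pixel; the 8 bits form a code in [0, 255], and the region
    # descriptor is the 256-bin histogram of these codes.
    offs = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
            (1, 1), (1, 0), (1, -1), (0, -1)]
    H, W = img.shape
    codes = np.zeros((H - 2, W - 2), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offs):
        nb = img[1 + dy:H - 1 + dy, 1 + dx:W - 1 + dx]
        codes |= ((nb >= img[1:H - 1, 1:W - 1]).astype(np.uint8) << bit)
    return np.bincount(codes.ravel(), minlength=256)

# Monotonically increasing image: every interior pixel sees the same
# neighbour sign pattern, so all 9 codes land in one histogram bin.
img = np.arange(25, dtype=float).reshape(5, 5)
h = lbp_histogram(img)
```

The SRC stage then classifies a face by expressing its concatenated histograms sparsely over a dictionary of training histograms.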
Feature-based groupwise registration of historical aerial images to present-day ortho-photo maps
In this paper, we address the registration of historical WWII images to
present-day ortho-photo maps for the purpose of geolocalization. Due to the
challenging nature of this problem, we propose to register the images jointly
as a group rather than in a step-by-step manner. To this end, we exploit Hough
Voting spaces as pairwise registration estimators and show how they can be
integrated into a probabilistic groupwise registration framework that can be
efficiently optimized. The feature-based nature of our registration framework
allows us to register images with a-priori unknown translational and rotational
relations, and it also handles scale changes of up to 30% in our test
data thanks to a final geometrically guided matching step. The superiority of the
proposed method over existing pairwise and groupwise registration methods is
demonstrated on eight highly challenging sets of historical images with
corresponding ortho-photo maps.
Comment: Under review at Elsevier Pattern Recognition
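The Hough-voting pairwise estimator mentioned above can be sketched for the simplest case, pure translation: every tentative feature match casts a vote for the shift it implies, and the densest bin wins even when a large fraction of matches are outliers. The bin size and data below are illustrative, not from the paper:

```python
import numpy as np

def hough_translation(src_pts, dst_pts, bin_size=1.0):
    # Each tentative match (s, d) votes for the translation d - s,
    # quantized into bins; the peak bin is the consensus estimate.
    votes = {}
    for s, d in zip(src_pts, dst_pts):
        key = tuple(np.round((d - s) / bin_size).astype(int))
        votes[key] = votes.get(key, 0) + 1
    best = max(votes, key=votes.get)
    return np.array(best) * bin_size

rng = np.random.default_rng(4)
src = rng.uniform(0, 100, size=(50, 2))
dst = src + np.array([12.0, -7.0])            # true shift for inliers
dst[:20] = rng.uniform(0, 100, size=(20, 2))  # 40% outlier matches
t_est = hough_translation(src, dst)
```

The 30 inlier votes concentrate in one bin while the 20 outlier votes scatter, so the true shift is recovered; the paper extends this voting idea to rotation and embeds the pairwise estimates in a groupwise probabilistic framework.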
Semantic Image Networks for Human Action Recognition
In this paper, we propose the use of a semantic image, an improved
representation for video analysis, principally in combination with Inception
networks. The semantic image is obtained by applying localized sparse
segmentation using global clustering (LSSGC) prior to the approximate rank
pooling which summarizes the motion characteristics in single or multiple
images. It incorporates the background information by overlaying a static
background from the window onto the subsequent segmented frames. The idea is to
improve the action-motion dynamics by focusing on the region which is important
for action recognition and encoding the temporal variances using the frame
ranking method. We also propose the sequential combination of
Inception-ResNetv2 and a long short-term memory network (LSTM) to leverage
temporal variances for improved recognition performance. Extensive analysis has
been carried out on the UCF101 and HMDB51 datasets, which are widely used in
action recognition studies. We show that (i) the semantic image generates better
activations and converges faster than its original variant, (ii) using
segmentation prior to approximate rank pooling yields better recognition
performance, (iii) the use of LSTM leverages the temporal variance information
from approximate rank pooling to model the action behavior better than the base
network, (iv) the proposed representations are adaptive, as they can be used
with existing methods such as temporal segment networks to improve
recognition performance, and (v) our proposed four-stream network architecture
comprising semantic images and semantic optical flows achieves
state-of-the-art performance, 95.9% and 73.5% recognition accuracy on UCF101
and HMDB51, respectively.
Comment: 30 pages
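Approximate rank pooling, the summarization step this abstract relies on, collapses a clip into one image via a weighted sum of frames. One common closed form uses weights alpha_t = 2t - T - 1 (a standard approximation from the dynamic-image literature, sketched here on toy frames rather than the paper's pipeline):

```python
import numpy as np

def approx_rank_pool(frames):
    # frames: (T, H, W) stack. Weights alpha_t = 2t - T - 1 grow with t,
    # so later frames contribute positively and earlier ones negatively,
    # encoding the temporal evolution of the clip in a single image.
    T = len(frames)
    alphas = 2 * np.arange(1, T + 1) - T - 1
    return np.tensordot(alphas, frames, axes=1)

# Brightness ramps up over 5 frames -> a uniformly positive dynamic image.
frames = np.stack([np.full((4, 4), t, dtype=float) for t in range(1, 6)])
di = approx_rank_pool(frames)
# A static clip carries no motion -> the dynamic image is exactly zero,
# because the weights sum to zero.
di_static = approx_rank_pool(np.ones((5, 4, 4)))
```

Applying segmentation before this pooling, as the abstract proposes, restricts the weighted sum to action-relevant regions.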