Fast Hierarchical Depth Map Computation from Stereo
Disparity by block matching stereo is usually used in applications with limited computational power in order to get depth estimates. However, research on simple stereo methods has been less extensive than on the energy-based counterparts, which promise better-quality depth maps with more potential for future improvements. Semi-global matching (SGM) methods offer good performance and easy implementation, but suffer from a very high memory footprint because they operate on the full disparity space image. Block matching stereo, on the other hand, needs much less memory. In this paper, we introduce a novel multi-scale hierarchical block-matching approach using a pyramidal variant of the depth and cost functions, which drastically improves the results of standard block matching stereo techniques while preserving the low memory footprint and further reducing the complexity of standard block matching. We tested our new multi-block-matching scheme on the Middlebury stereo benchmark, where our results are only slightly worse than those of state-of-the-art SGM implementations.
Comment: Submitted to International Conference on Pattern Recognition and Artificial Intelligence, 201
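As a rough illustration of the coarse-to-fine idea, the sketch below performs a full SAD block-matching search only at the coarsest pyramid level and then refines a narrow disparity band around the upsampled estimate at each finer level. It is a minimal sketch assuming a plain winner-take-all SAD cost; the block size, band radius and pyramid depth are illustrative and it does not reproduce the paper's pyramidal depth and cost functions.

```python
import numpy as np
import cv2

def block_match(left, right, max_disp, block=7, prior=None, radius=2):
    """Winner-take-all SAD block matching. If a prior disparity map is given, each
    pixel is searched only within +/- radius of the prior (narrow-band refinement)."""
    h, w = left.shape
    kernel = np.ones((block, block), np.float32) / (block * block)
    best_cost = np.full((h, w), np.inf, np.float32)
    disp = np.zeros((h, w), np.float32)
    for d in range(max_disp):
        in_band = np.ones((h, w), bool) if prior is None else np.abs(prior - d) <= radius
        if not in_band.any():
            continue
        shifted = np.roll(right, d, axis=1)                      # candidate disparity d
        cost = cv2.filter2D(np.abs(left - shifted), -1, kernel)  # block-aggregated SAD cost
        better = in_band & (cost < best_cost)
        best_cost[better], disp[better] = cost[better], d
    return disp

def hierarchical_disparity(left, right, levels=3, max_disp=64):
    """Full disparity search only at the coarsest pyramid level, then narrow-band
    refinement around the upsampled estimate at every finer level."""
    pyr = [(left.astype(np.float32), right.astype(np.float32))]
    for _ in range(levels - 1):
        pyr.append((cv2.pyrDown(pyr[-1][0]), cv2.pyrDown(pyr[-1][1])))
    disp = block_match(*pyr[-1], max_disp >> (levels - 1))
    for lvl in range(levels - 2, -1, -1):
        l_img, r_img = pyr[lvl]
        prior = 2.0 * cv2.resize(disp, (l_img.shape[1], l_img.shape[0]),
                                 interpolation=cv2.INTER_NEAREST)
        disp = block_match(l_img, r_img, max_disp >> lvl, prior=prior)
    return disp
```

The memory saving comes from never building the full disparity space image: only the current cost slice and the running best cost are held per level.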
Deep feature fusion for self-supervised monocular depth prediction
Recent advances in end-to-end unsupervised learning have significantly improved the performance of monocular depth prediction and alleviated the requirement of ground-truth depth. Although a plethora of work has been done on enforcing various structural constraints by incorporating multiple losses utilising smoothness, left-right consistency, regularisation and matching of surface normals, few of them take into consideration the multi-scale structures present in real-world images. Most works utilise a VGG16 or ResNet50 model pre-trained on ImageNet for predicting depth. We propose a deep feature fusion method utilising features at multiple scales for learning self-supervised depth from scratch. Our fusion network selects features from both upper and lower levels at every level in the encoder network, thereby creating multiple feature pyramid sub-networks that are fed to the decoder after applying the CoordConv solution. We also propose a refinement module that learns higher-scale residual depth from a combination of higher-level deep features and lower-level residual depth, using a pixel shuffling framework that super-resolves the lower-level residual depth. We select the KITTI dataset for evaluation and show that our proposed architecture can produce better or comparable results in depth prediction.
Comment: 4 pages, 2 Tables, 2 Figures
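A minimal PyTorch sketch of the two building blocks named above, CoordConv and pixel-shuffle-based residual-depth refinement, is given below. The module names, channel sizes and the way features are combined are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoordConv2d(nn.Module):
    """Conv2d applied after appending normalised x/y coordinate channels (the CoordConv idea)."""
    def __init__(self, in_ch, out_ch, k=3, padding=1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch + 2, out_ch, k, padding=padding)

    def forward(self, x):
        b, _, h, w = x.shape
        ys = torch.linspace(-1, 1, h, device=x.device).view(1, 1, h, 1).expand(b, 1, h, w)
        xs = torch.linspace(-1, 1, w, device=x.device).view(1, 1, 1, w).expand(b, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

class RefineBlock(nn.Module):
    """Predicts a higher-resolution residual depth by pixel-shuffling (sub-pixel conv)
    a combination of deep features and the lower-resolution residual depth."""
    def __init__(self, feat_ch, scale=2):
        super().__init__()
        self.scale = scale
        self.head = nn.Sequential(
            CoordConv2d(feat_ch + 1, 32),
            nn.ELU(inplace=True),
            nn.Conv2d(32, scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),          # super-resolves the residual depth
        )

    def forward(self, feats, residual_lo):
        # feats and residual_lo share the lower spatial resolution
        residual_hi = self.head(torch.cat([feats, residual_lo], dim=1))
        up = F.interpolate(residual_lo, scale_factor=self.scale,
                           mode="bilinear", align_corners=False)
        return up + residual_hi              # refined residual depth at the higher scale
```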
DeepPoint3D: Learning Discriminative Local Descriptors using Deep Metric Learning on 3D Point Clouds
Learning local descriptors is an important problem in computer vision. While there are many techniques for learning local patch descriptors for 2D images, efforts towards learning local descriptors for 3D points have been made only recently. Recent progress on this problem in 3D leverages the strong feature representation capability of image-based convolutional neural networks by utilizing RGB-D or multi-view representations. In this paper, by contrast, we propose to learn 3D local descriptors by directly processing unstructured 3D point clouds without needing any intermediate representation. The method consists of a deep network that learns a permutation-invariant representation of 3D points. To learn the local descriptors, we use a multi-margin contrastive loss which discriminates between similar and dissimilar points on a surface while also leveraging the extent of dissimilarity among the negative samples at training time. With comprehensive evaluation against strong baselines, we show that the proposed method outperforms state-of-the-art methods for matching points in 3D point clouds. Further, we demonstrate the effectiveness of the proposed method on various applications, achieving state-of-the-art results.
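One plausible form of such a loss is sketched below in PyTorch: a contrastive loss whose negative-pair margin grows with the measured dissimilarity of the pair. The margin schedule and the dissimilarity input (e.g. a normalised surface distance) are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def multi_margin_contrastive(desc_a, desc_b, is_similar, dissimilarity, base_margin=0.5):
    """desc_a, desc_b: (N, D) local descriptors; is_similar: (N,) in {0, 1};
    dissimilarity: (N,) in [0, 1], e.g. a normalised distance on the surface,
    used to widen the margin for strongly dissimilar negative pairs."""
    d = F.pairwise_distance(desc_a, desc_b)
    pos_loss = is_similar * d.pow(2)                       # pull similar points together
    margin = base_margin * (1.0 + dissimilarity)           # larger margin for more dissimilar pairs
    neg_loss = (1 - is_similar) * F.relu(margin - d).pow(2)  # push dissimilar points apart
    return (pos_loss + neg_loss).mean()
```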
Object cosegmentation using deep Siamese network
Object cosegmentation addresses the problem of discovering similar objects across multiple images and segmenting them as foreground simultaneously. In this paper, we propose a novel end-to-end pipeline that segments similar objects simultaneously from a relevant set of images using supervised learning via a deep-learning framework. We experiment with multiple object proposal generation techniques and perform extensive numerical evaluations by training the Siamese network with the generated object proposals. Similar object proposals for the test images are retrieved using the ANNOY (Approximate Nearest Neighbor) library, and deep semantic segmentation is performed on them. Finally, we form a collage from the segmented similar objects based on the relative importance of the objects.
Comment: Appears in International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI), 201
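The retrieval step can be illustrated with the ANNOY library as below; the embedding vectors are assumed to come from the trained Siamese branch, and the distance metric and tree count are illustrative defaults rather than the paper's settings.

```python
from annoy import AnnoyIndex

def build_proposal_index(proposal_embeddings, metric="angular", n_trees=10):
    """proposal_embeddings: list of 1-D feature vectors, one per training object proposal."""
    dim = len(proposal_embeddings[0])
    index = AnnoyIndex(dim, metric)
    for i, vec in enumerate(proposal_embeddings):
        index.add_item(i, vec)                 # store each proposal embedding under its id
    index.build(n_trees)
    return index

def retrieve_similar(index, query_embedding, k=5):
    """Approximate nearest-neighbour proposal ids (and distances) for a test-image proposal."""
    return index.get_nns_by_vector(query_embedding, k, include_distances=True)
```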
Benchmarking KAZE and MCM for Multiclass Classification
In this paper, we propose a novel approach for feature generation by appropriately fusing KAZE and SIFT features. We then use this feature set along with the Minimal Complexity Machine (MCM) for object classification. We show that KAZE and SIFT features are complementary. Experimental results indicate that an elementary integration of these techniques can outperform state-of-the-art approaches.
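A minimal sketch of such a fusion with OpenCV is shown below: KAZE and SIFT descriptors are extracted and pooled into a single per-image vector. The mean pooling is only a stand-in for the paper's feature encoding, and the fused features would be fed to a classifier (the MCM itself has no standard library implementation, so any linear classifier could stand in for a quick test).

```python
import cv2
import numpy as np

def fused_feature(gray):
    """Concatenate pooled KAZE (64-D) and SIFT (128-D) descriptors for one image."""
    kaze = cv2.KAZE_create()
    sift = cv2.SIFT_create()
    _, d_kaze = kaze.detectAndCompute(gray, None)
    _, d_sift = sift.detectAndCompute(gray, None)
    pooled_kaze = d_kaze.mean(axis=0) if d_kaze is not None else np.zeros(64, np.float32)
    pooled_sift = d_sift.mean(axis=0) if d_sift is not None else np.zeros(128, np.float32)
    return np.concatenate([pooled_kaze, pooled_sift])   # complementary cues, side by side
```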
SalProp: Salient object proposals via aggregated edge cues
In this paper, we propose a novel object proposal generation scheme by formulating a graph-based salient edge classification framework that utilizes the edge context. In the proposed method, we construct a Bayesian probabilistic edge map to assign a saliency value to the edgelets by exploiting low-level edge features. A Conditional Random Field is then learned to effectively combine these features for edge classification with object/non-object labels. We propose an objectness score for the generated windows by analyzing the salient edge density inside the bounding box. Extensive experiments on the PASCAL VOC 2007 dataset demonstrate that the proposed method gives competitive performance against 10 popular generic object detection techniques while using fewer proposals.
Comment: 5 pages, 4 figures, accepted at ICIP 201
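One plausible reading of the window-scoring step is sketched below: the salient-edge mass inside a candidate box, normalised by a function of its area. The saliency map is assumed to be the output of the Bayesian edge map and CRF described above, and the normalisation is illustrative, not the paper's exact score.

```python
import numpy as np

def objectness_score(edge_saliency, box):
    """edge_saliency: HxW map of per-pixel salient-edge strength in [0, 1];
    box: (x0, y0, x1, y1). Higher score = denser salient edges inside the window."""
    x0, y0, x1, y1 = box
    window = edge_saliency[y0:y1, x0:x1]
    area = max((x1 - x0) * (y1 - y0), 1)
    return float(window.sum()) / np.sqrt(area)   # mild area normalisation to avoid favouring huge boxes
```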
Object Classification using Ensemble of Local and Deep Features
In this paper we propose an ensemble of local and deep features for object classification. We also compare and contrast the effectiveness of the feature representation capability of various layers of a convolutional neural network. We demonstrate with extensive experiments on object classification that the representation capability of features from deep networks can be complemented with information captured by local features. We also find that features from various deep convolutional networks encode distinctive characteristic information. We establish that, as opposed to conventional practice, intermediate layers of deep networks can augment the classification capabilities of features obtained from fully connected layers.
Comment: Accepted for publication at Ninth International Conference on Advances in Pattern Recognition
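The deep side of such an ensemble can be illustrated with torchvision's feature extraction utilities (assuming a recent torchvision): features are taken both from an intermediate convolutional layer and from a fully connected layer, ready to be concatenated with hand-crafted local features. The chosen network and layer names are illustrative, not those used in the paper.

```python
import torch
import torchvision.models as models
from torchvision.models.feature_extraction import create_feature_extractor

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
# "features.29" = last conv ReLU of VGG16, "classifier.3" = second fully connected layer
extractor = create_feature_extractor(model, return_nodes={"features.29": "conv", "classifier.3": "fc"})

def deep_features(img_batch):
    """Return concatenated intermediate-conv and fully-connected features for a batch."""
    with torch.no_grad():
        out = extractor(img_batch)
    conv = out["conv"].mean(dim=(2, 3))      # global-average-pooled intermediate feature map
    fc = out["fc"]                           # fully connected layer activations
    return torch.cat([conv, fc], dim=1)      # to be fused with local (hand-crafted) features
```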
Performance Evaluation of 3D Keypoint Detectors and Descriptors for Plants Health Classification
Plant phenomics based on imaging techniques can be used to monitor the health and the diseases of plants and crops. The use of 3D data for plant phenomics is a recent development. Since a 3D point cloud contains more information than plant images, in this paper we compare the performance of different combinations of keypoint detectors and local feature descriptors for classifying the growth stage and the growth condition of plants from their 3D point clouds. We have also implemented a modified form of the 3D SIFT descriptor that is invariant to rotation and computationally less intensive than most of the 3D SIFT descriptors reported in the existing literature. Performance is evaluated in terms of classification accuracy, and the results are presented as accuracy tables. We find that the ISS-SHOT and SIFT-SIFT combinations consistently perform better, and that the Fisher Vector (FV) is a better encoder than the Vector of Locally Aggregated Descriptors (VLAD) for such applications. The 3D point cloud can thus serve as a better modality for such tasks.
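A minimal Open3D sketch of one detector-descriptor combination is given below. FPFH stands in for SHOT because Open3D does not ship a SHOT implementation, and the voxel size and radii are illustrative; the paper's own ISS-SHOT and SIFT-SIFT pipelines are not reproduced here.

```python
import numpy as np
import open3d as o3d

def iss_fpfh_descriptors(path, voxel=0.005, radius=0.02):
    """Detect ISS keypoints on a plant point cloud and return one local descriptor per keypoint."""
    pcd = o3d.io.read_point_cloud(path).voxel_down_sample(voxel)
    pcd.estimate_normals(o3d.geometry.KDTreeSearchParamRadius(radius))
    keypts = o3d.geometry.keypoint.compute_iss_keypoints(pcd)        # ISS keypoint detector
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        pcd, o3d.geometry.KDTreeSearchParamRadius(radius))           # 33-D descriptor per point
    tree = o3d.geometry.KDTreeFlann(pcd)
    idx = [tree.search_knn_vector_3d(p, 1)[1][0] for p in keypts.points]  # keypoint -> cloud index
    return np.asarray(fpfh.data)[:, idx].T                           # descriptors at the keypoints
```

The resulting per-keypoint descriptors would then be pooled with an encoder such as FV or VLAD before classification.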
Per-Tone model for Common Mode sensor based alien noise cancellation for Downstream xDSL
For xDSL systems, alien noise cancellation using an additional common mode
sensor at the downstream receiver can be thought of as interference
cancellation in a Single Input Dual Output (SIDO) system. The coupling between
the common mode and differential mode can be modelled as an LTI system with a
long impulse response, resulting in high complexity for cancellation. Frequency
domain per-tone cancellation offers a low complexity approach to the problem
besides having other advantages like faster training, but suffers from loss in
cancellation performance due to approximations in the per-tone model. We
analyze this loss and show that it is possible to minimize it by a convenient
post-training "delay" adjustment. We also show via measurements that the loss of cancellation performance due to the per-tone model is not very large in real scenarios.
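The per-tone model itself can be illustrated with a short least-squares sketch: for each DMT tone, a single complex coefficient maps the common-mode observation onto the differential-mode noise and is subtracted. The training-by-averaging shown here is an assumption for illustration, not the paper's training procedure.

```python
import numpy as np

def train_per_tone(dm, cm):
    """dm, cm: (frames, tones) complex DFT outputs captured during training symbols,
    where dm contains only the alien noise (known signal removed).
    Returns one least-squares cancellation coefficient per tone."""
    return (np.conj(cm) * dm).sum(axis=0) / (np.abs(cm) ** 2).sum(axis=0)

def cancel(dm, cm, w):
    """Subtract the per-tone estimate of the alien noise from the differential-mode signal."""
    return dm - w * cm
```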
Few Shot Speaker Recognition using Deep Neural Networks
The recent advances in deep learning are mostly driven by the availability of large amounts of training data. However, such data are not always available for specific tasks such as speaker recognition, where collecting a large amount of data is not feasible in practical scenarios. Therefore, in this paper, we propose to identify speakers by learning from only a few training examples. To achieve this, we use a deep neural network with a prototypical loss, where the input to the network is a spectrogram. For the output, we project the class feature vectors into a common embedding space, followed by classification. Further, we show the effectiveness of capsule networks in a few-shot learning setting. To this end, we utilize an auto-encoder to learn generalized feature embeddings from the class-specific embeddings obtained from a capsule network. We provide exhaustive experiments on publicly available datasets and competitive baselines, demonstrating the superiority and generalization ability of the proposed few-shot learning pipelines.
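The prototypical-loss step can be sketched as below in PyTorch: class prototypes are the mean support embeddings, and queries are classified by negative distance to the prototypes. The Euclidean distance and episode layout are standard prototypical-network choices assumed here, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def prototypical_loss(support, support_labels, query, query_labels, n_classes):
    """support: (Ns, D) and query: (Nq, D) embeddings from the spectrogram encoder;
    labels are integer class ids within the few-shot episode."""
    prototypes = torch.stack(
        [support[support_labels == c].mean(dim=0) for c in range(n_classes)])
    dists = torch.cdist(query, prototypes)            # (Nq, n_classes) Euclidean distances
    log_p = F.log_softmax(-dists, dim=1)              # closer prototype -> higher probability
    return F.nll_loss(log_p, query_labels)
```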