48 research outputs found

    Fast Hierarchical Depth Map Computation from Stereo

    Block-matching stereo is commonly used to obtain depth estimates in applications with limited computational power. However, simple stereo methods have received less research attention than their energy-based counterparts, which promise better-quality depth maps and more potential for future improvement. Semi-global matching (SGM) methods offer good performance and easy implementation but suffer from a very high memory footprint because they operate on the full disparity space image. Block-matching stereo, on the other hand, needs much less memory. In this paper, we introduce a novel multi-scale hierarchical block-matching approach using a pyramidal variant of depth and cost functions, which drastically improves the results of standard block-matching stereo techniques while preserving the low memory footprint and further reducing the complexity of standard block matching. We tested our new multi-block-matching scheme on the Middlebury stereo benchmark, where we obtain results that are only slightly worse than state-of-the-art SGM implementations. Comment: Submitted to International Conference on Pattern Recognition and Artificial Intelligence, 201
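
    The coarse-to-fine idea behind hierarchical block matching can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a two-level pyramid, a sum-of-absolute-differences (SAD) cost, rectified grayscale images stored as lists of rows, and a fixed refinement window of one pixel. All function names are hypothetical.

```python
def sad(left, right, y, x, d, r):
    """SAD between the (2r+1)x(2r+1) block at (y, x) in the left image
    and the block shifted d pixels to the left in the right image."""
    total = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            total += abs(left[y + dy][x + dx] - right[y + dy][x + dx - d])
    return total


def block_match(left, right, max_disp, r=1, prior=None, search=1):
    """Per-pixel disparity by exhaustive SAD search.
    With a prior, only disparities within +/- search of it are tested."""
    h, w = len(left), len(left[0])
    disp = [[0] * w for _ in range(h)]
    for y in range(r, h - r):
        for x in range(r, w - r):
            if prior is None:
                candidates = range(0, max_disp + 1)
            else:
                c = prior[y][x]
                candidates = range(max(0, c - search), min(max_disp, c + search) + 1)
            best_d, best_cost = 0, None
            for d in candidates:
                if x - d - r < 0:  # shifted block would leave the image
                    continue
                cost = sad(left, right, y, x, d, r)
                if best_cost is None or cost < best_cost:
                    best_d, best_cost = d, cost
            disp[y][x] = best_d
    return disp


def downsample(img):
    """Halve the resolution by averaging each 2x2 cell."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1]
              + img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w)] for y in range(h)]


def upsample_disp(disp, h, w):
    """Nearest-neighbour upsampling; disparities double with resolution."""
    hc, wc = len(disp), len(disp[0])
    return [[2 * disp[min(y // 2, hc - 1)][min(x // 2, wc - 1)]
             for x in range(w)] for y in range(h)]


def hierarchical_match(left, right, max_disp, r=1):
    """Full search at half resolution, then a narrow search at full resolution."""
    coarse = block_match(downsample(left), downsample(right), max_disp // 2, r)
    prior = upsample_disp(coarse, len(left), len(left[0]))
    return block_match(left, right, max_disp, r, prior=prior, search=1)
```

    In this sketch the full-resolution pass tests at most three disparities per pixel instead of `max_disp + 1`, which is where the reduction in complexity comes from; only per-pixel disparities are stored, not a full cost volume.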

    Deep feature fusion for self-supervised monocular depth prediction

    Recent advances in end-to-end unsupervised learning have significantly improved the performance of monocular depth prediction and alleviated the requirement for ground-truth depth. Although a plethora of work has been done on enforcing various structural constraints by incorporating multiple losses utilising smoothness, left-right consistency, regularisation and matching surface normals, few of these works take into consideration the multi-scale structures present in real-world images. Most works utilise a VGG16 or ResNet50 model pre-trained on ImageNet for predicting depth. We propose a deep feature fusion method utilising features at multiple scales for learning self-supervised depth from scratch. Our fusion network selects features from both upper and lower levels at every level in the encoder network, thereby creating multiple feature pyramid sub-networks that are fed to the decoder after applying the CoordConv solution. We also propose a refinement module that learns higher-scale residual depth from a combination of higher-level deep features and lower-level residual depth, using a pixel-shuffling framework that super-resolves the lower-level residual depth. We select the KITTI dataset for evaluation and show that our proposed architecture can produce better or comparable results in depth prediction. Comment: 4 pages, 2 Tables, 2 Figure

    DeepPoint3D: Learning Discriminative Local Descriptors using Deep Metric Learning on 3D Point Clouds

    Learning local descriptors is an important problem in computer vision. While there are many techniques for learning local patch descriptors for 2D images, efforts have recently been made to learn local descriptors for 3D points. Recent progress on this problem in 3D leverages the strong feature representation capability of image-based convolutional neural networks by utilizing RGB-D or multi-view representations. In this paper, however, we propose to learn 3D local descriptors by directly processing unstructured 3D point clouds without needing any intermediate representation. The method comprises a deep network for learning a permutation-invariant representation of 3D points. To learn the local descriptors, we use a multi-margin contrastive loss which discriminates between similar and dissimilar points on a surface while also leveraging the extent of dissimilarity among the negative samples at training time. With comprehensive evaluation against strong baselines, we show that the proposed method outperforms state-of-the-art methods for matching points in 3D point clouds. Further, we demonstrate the effectiveness of the proposed method on various applications, achieving state-of-the-art results.

    Object cosegmentation using deep Siamese network

    Object cosegmentation addresses the problem of discovering similar objects in multiple images and segmenting them as foreground simultaneously. In this paper, we propose a novel end-to-end pipeline to segment similar objects simultaneously from a relevant set of images using supervised learning in a deep-learning framework. We experiment with multiple sets of object proposal generation techniques and perform extensive numerical evaluations by training the Siamese network with the generated object proposals. Similar object proposals for the test images are retrieved using the ANNOY (Approximate Nearest Neighbor) library, and deep semantic segmentation is performed on them. Finally, we form a collage from the segmented similar objects based on the relative importance of the objects. Comment: Appears in International Conference on Pattern Recognition and Artificial Intelligence (ICPRAI), 201

    Benchmarking KAZE and MCM for Multiclass Classification

    In this paper, we propose a novel approach for feature generation by appropriately fusing KAZE and SIFT features. We then use this feature set along with the Minimal Complexity Machine (MCM) for object classification. We show that KAZE and SIFT features are complementary. Experimental results indicate that an elementary integration of these techniques can outperform state-of-the-art approaches.

    SalProp: Salient object proposals via aggregated edge cues

    In this paper, we propose a novel object proposal generation scheme by formulating a graph-based salient edge classification framework that utilizes the edge context. In the proposed method, we construct a Bayesian probabilistic edge map to assign a saliency value to the edgelets by exploiting low-level edge features. A Conditional Random Field is then learned to effectively combine these features for edge classification with object/non-object labels. We propose an objectness score for the generated windows by analyzing the salient edge density inside the bounding box. Extensive experiments on the PASCAL VOC 2007 dataset demonstrate that the proposed method gives competitive performance against 10 popular generic object detection techniques while using fewer proposals. Comment: 5 pages, 4 figures, accepted at ICIP 201

    Object Classification using Ensemble of Local and Deep Features

    In this paper, we propose an ensemble of local and deep features for object classification. We also compare and contrast the feature representation capability of various layers of convolutional neural networks. We demonstrate with extensive object classification experiments that the representation capability of features from deep networks can be complemented with information captured by local features. We also find that features from various deep convolutional networks encode distinctive characteristic information. We establish that, as opposed to conventional practice, intermediate layers of deep networks can augment the classification capabilities of features obtained from fully connected layers. Comment: Accepted for publication at Ninth International Conference on Advances in Pattern Recognitio

    Performance Evalution of 3D Keypoint Detectors and Descriptors for Plants Health Classification

    Plant phenomics based on imaging techniques can be used to monitor the health and diseases of plants and crops. The use of 3D data for plant phenomics is a recent phenomenon. Since a 3D point cloud contains more information than plant images, in this paper we compare the performance of different combinations of keypoint detectors and local feature descriptors for classifying the growth stage of a plant and its growth condition based on 3D point clouds of the plants. We have also implemented a modified form of the 3D SIFT descriptor that is invariant to rotation and is less computationally intensive than most of the 3D SIFT descriptors reported in the existing literature. The performance is evaluated in terms of classification accuracy, and the results are presented as accuracy tables. We find that the ISS-SHOT and SIFT-SIFT combinations consistently perform better, and that Fisher Vector (FV) is a better encoder than Vector of Locally Aggregated Descriptors (VLAD) for such applications. It can serve as a better modality.
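
    For readers unfamiliar with the encoders being compared, the following is a minimal sketch of standard VLAD encoding, not the paper's pipeline. It assumes the codebook centres have already been learned (for example by k-means) and that local descriptors are plain lists of floats; the function name is hypothetical.

```python
import math


def vlad(descriptors, centers):
    """Encode a set of local descriptors as one fixed-length VLAD vector.
    Each descriptor adds its residual to the slot of its nearest centre."""
    k, dim = len(centers), len(centers[0])
    acc = [[0.0] * dim for _ in range(k)]
    for desc in descriptors:
        # Hard-assign the descriptor to its nearest codebook centre.
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(desc, centers[i])),
        )
        for j in range(dim):
            acc[nearest][j] += desc[j] - centers[nearest][j]
    # Concatenate the per-centre residual sums and L2-normalise.
    flat = [v for row in acc for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

    VLAD keeps only first-order residuals with hard assignment, whereas a Fisher Vector uses soft assignments under a Gaussian mixture model and also encodes second-order statistics.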

    Per-Tone model for Common Mode sensor based alien noise cancellation for Downstream xDSL

    For xDSL systems, alien noise cancellation using an additional common mode sensor at the downstream receiver can be thought of as interference cancellation in a Single Input Dual Output (SIDO) system. The coupling between the common mode and differential mode can be modelled as an LTI system with a long impulse response, resulting in high complexity for cancellation. Frequency-domain per-tone cancellation offers a low-complexity approach to the problem, along with other advantages such as faster training, but suffers from a loss in cancellation performance due to approximations in the per-tone model. We analyze this loss and show that it is possible to minimize it by a convenient post-training "delay" adjustment. We also show via measurements that the loss of cancellation performance due to the per-tone model is not very large in real scenarios.
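
    A per-tone canceller can be sketched as one complex coefficient per tone, fitted by least squares. The sketch below is an illustration under stated assumptions, not the paper's method: it assumes training symbols in which the differential-mode output contains only coupled alien noise, and it omits the "delay" adjustment. The function names are hypothetical.

```python
def train_per_tone(dm_train, cm_train):
    """Least-squares estimate of one coupling coefficient per tone.
    dm_train, cm_train: lists of symbols, each a list of complex
    frequency-domain samples (one per tone)."""
    n_tones = len(dm_train[0])
    coeffs = []
    for k in range(n_tones):
        num = sum(d[k] * c[k].conjugate() for d, c in zip(dm_train, cm_train))
        den = sum(abs(c[k]) ** 2 for c in cm_train)
        coeffs.append(num / den if den > 0 else 0j)
    return coeffs


def cancel(dm_symbol, cm_symbol, coeffs):
    """Subtract the predicted noise from each tone of the differential-mode symbol."""
    return [d - w * c for d, c, w in zip(dm_symbol, cm_symbol, coeffs)]
```

    Each tone needs one complex multiply and one subtraction per symbol. The price is the approximation that tones are independent, which is the source of the performance loss the abstract analyzes.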

    Few Shot Speaker Recognition using Deep Neural Networks

    Recent advances in deep learning are mostly driven by the availability of large amounts of training data. However, such data is not always available for specific tasks such as speaker recognition, where collecting large amounts of data is impractical. Therefore, in this paper, we propose to identify speakers by learning from only a few training examples. To achieve this, we use a deep neural network with prototypical loss, where the input to the network is a spectrogram. For output, we project the class feature vectors into a common embedding space, followed by classification. Further, we show the effectiveness of capsule networks in a few-shot learning setting. To this end, we utilize an auto-encoder to learn generalized feature embeddings from the class-specific embeddings obtained from the capsule network. We provide exhaustive experiments on publicly available datasets and competitive baselines, demonstrating the superiority and generalization ability of the proposed few-shot learning pipelines.
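
    The prototypical loss mentioned above can be sketched in its standard form, which may differ in detail from the authors' setup. The sketch assumes embeddings have already been computed by some network (not shown) and uses squared Euclidean distance; the function names are hypothetical.

```python
import math


def prototypes(support):
    """support: dict mapping a speaker label to a list of embedding vectors.
    Returns the per-speaker mean embedding (the class prototype)."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return protos


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def class_probs(query, protos):
    """Softmax over negative squared distances to each prototype."""
    logits = {label: -sq_dist(query, p) for label, p in protos.items()}
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(v - m) for label, v in logits.items()}
    z = sum(exps.values())
    return {label: e / z for label, e in exps.items()}


def prototypical_loss(query, true_label, protos):
    """Negative log-probability of the true speaker for one query embedding."""
    return -math.log(class_probs(query, protos)[true_label])
```

    Because each class is summarised by the mean of a handful of support embeddings, a new speaker can be added at test time without retraining a classifier layer.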