1,846 research outputs found

    Enhancement of ELDA Tracker Based on CNN Features and Adaptive Model Update

    Get PDF
    Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. Additionally, the exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking. Based on that, we improve the ELDA tracking algorithm by deep convolutional neural network (CNN) features and adaptive model update. Deep CNN features have been successfully used in various computer vision tasks. Extracting CNN features on all of the candidate windows is time consuming. To address this problem, a two-step CNN feature extraction method is proposed by separately computing convolutional layers and fully-connected layers. Due to the strong discriminative ability of CNN features and the exemplar-based model, we update both object and background models to improve their adaptivity and to deal with the tradeoff between discriminative ability and adaptivity. An object updating method is proposed to select the “good” models (detectors), which are quite discriminative and uncorrelated to other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM) to adapt to complex scenes, which is initialized offline and updated online. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences with various challenges. It achieves the best overall performance among the compared state-of-the-art trackers, which demonstrates the effectiveness and robustness of our tracking algorithm

    Towards High Performance Video Object Detection

    Full text link
    There has been significant progresses for image object detection in recent years. Nevertheless, video object detection has received little attention, although it is more challenging and more important in practical scenarios. Built upon the recent works, this work proposes a unified approach based on the principle of multi-frame end-to-end learning of features and cross-frame motion. Our approach extends prior works with three new techniques and steadily pushes forward the performance envelope (speed-accuracy tradeoff), towards high performance video object detection

    Object Localization, Segmentation, and Classification in 3D Images

    Full text link
    We address the problem of identifying objects of interest in 3D images as a set of related tasks involving localization of objects within a scene, segmentation of observed object instances from other scene elements, classifying detected objects into semantic categories, and estimating the 3D pose of detected objects within the scene. The increasing availability of 3D sensors motivates us to leverage large amounts of 3D data to train machine learning models to address these tasks in 3D images. Leveraging recent advances in deep learning has allowed us to develop models capable of addressing these tasks and optimizing these tasks jointly to reduce potential errors propagated when solving these tasks independently

    Enhancement of ELDA Tracker Based on CNN Features and Adaptive Model Update

    Get PDF
    Appearance representation and the observation model are the most important components in designing a robust visual tracking algorithm for video-based sensors. Additionally, the exemplar-based linear discriminant analysis (ELDA) model has shown good performance in object tracking. Based on that, we improve the ELDA tracking algorithm by deep convolutional neural network (CNN) features and adaptive model update. Deep CNN features have been successfully used in various computer vision tasks. Extracting CNN features on all of the candidate windows is time consuming. To address this problem, a two-step CNN feature extraction method is proposed by separately computing convolutional layers and fully-connected layers. Due to the strong discriminative ability of CNN features and the exemplar-based model, we update both object and background models to improve their adaptivity and to deal with the tradeoff between discriminative ability and adaptivity. An object updating method is proposed to select the “good” models (detectors), which are quite discriminative and uncorrelated to other selected models. Meanwhile, we build the background model as a Gaussian mixture model (GMM) to adapt to complex scenes, which is initialized offline and updated online. The proposed tracker is evaluated on a benchmark dataset of 50 video sequences with various challenges. It achieves the best overall performance among the compared state-of-the-art trackers, which demonstrates the effectiveness and robustness of our tracking algorithm

    Probabilistic and Deep Learning Algorithms for the Analysis of Imagery Data

    Get PDF
    Accurate object classification is a challenging problem for various low to high resolution imagery data. This applies to both natural as well as synthetic image datasets. However, each object recognition dataset poses its own distinct set of domain-specific problems. In order to address these issues, we need to devise intelligent learning algorithms which require a deep understanding and careful analysis of the feature space. In this thesis, we introduce three new learning frameworks for the analysis of both airborne images (NAIP dataset) and handwritten digit datasets without and with noise (MNIST and n-MNIST respectively). First, we propose a probabilistic framework for the analysis of the NAIP dataset which includes (1) an unsupervised segmentation module based on the Statistical Region Merging algorithm, (2) a feature extraction module that extracts a set of standard hand-crafted texture features from the images, (3) a supervised classification algorithm based on Feedforward Backpropagation Neural Networks, and (4) a structured prediction framework using Conditional Random Fields that integrates the results of the segmentation and classification modules into a single composite model to generate the final class labels. Next, we introduce two new datasets SAT-4 and SAT-6 sampled from the NAIP imagery and use them to evaluate a multitude of Deep Learning algorithms including Deep Belief Networks (DBN), Convolutional Neural Networks (CNN) and Stacked Autoencoders (SAE) for generating class labels. Finally, we propose a learning framework by integrating hand-crafted texture features with a DBN. A DBN uses an unsupervised pre-training phase to perform initialization of the parameters of a Feedforward Backpropagation Neural Network to a global error basin which can then be improved using a round of supervised fine-tuning using Feedforward Backpropagation Neural Networks. These networks can subsequently be used for classification. In the following discussion, we show that the integration of hand-crafted features with DBN shows significant improvement in performance as compared to traditional DBN models which take raw image pixels as input. We also investigate why this integration proves to be particularly useful for aerial datasets using a statistical analysis based on Distribution Separability Criterion. Then we introduce a new dataset called noisy-MNIST (n-MNIST) by adding (1) additive white gaussian noise (AWGN), (2) motion blur and (3) Reduced contrast and AWGN to the MNIST dataset and present a learning algorithm by combining probabilistic quadtrees and Deep Belief Networks. This dynamic integration of the Deep Belief Network with the probabilistic quadtrees provide significant improvement over traditional DBN models on both the MNIST and the n-MNIST datasets. Finally, we extend our experiments on aerial imagery to the class of general texture images and present a theoretical analysis of Deep Neural Networks applied to texture classification. We derive the size of the feature space of textural features and also derive the Vapnik-Chervonenkis dimension of certain classes of Neural Networks. We also derive some useful results on intrinsic dimension and relative contrast of texture datasets and use these to highlight the differences between texture datasets and general object recognition datasets

    Advances in Object and Activity Detection in Remote Sensing Imagery

    Get PDF
    The recent revolution in deep learning has enabled considerable development in the fields of object and activity detection. Visual object detection tries to find objects of target classes with precise localisation in an image and assign each object instance a corresponding class label. At the same time, activity recognition aims to determine the actions or activities of an agent or group of agents based on sensor or video observation data. It is a very important and challenging problem to detect, identify, track, and understand the behaviour of objects through images and videos taken by various cameras. Together, objects and their activity recognition in imaging data captured by remote sensing platforms is a highly dynamic and challenging research topic. During the last decade, there has been significant growth in the number of publications in the field of object and activity recognition. In particular, many researchers have proposed application domains to identify objects and their specific behaviours from air and spaceborne imagery. This Special Issue includes papers that explore novel and challenging topics for object and activity detection in remote sensing images and videos acquired by diverse platforms
    • …
    corecore