39 research outputs found

    Investigating the latency cost of statistical learning of a Gaussian mixture simulating on a convolutional density network with adaptive batch size technique for background modeling

    Get PDF
    Background modeling is a promising field of study in video analysis, with a wide range of applications in video surveillance. Deep neural networks have proliferated in recent years as a result of effective learning-based approaches to motion analysis. However, these strategies only provide a partial description of the observed scenes' insufficient properties since they use a single-valued mapping to estimate the target background's temporal conditional averages. On the other hand, statistical learning in the imagery domain has become one of the most widely used approaches due to its high adaptability to dynamic context transformation, especially Gaussian Mixture Models. Specifically, these probabilistic models aim to adjust latent parameters to gain high expectation of realistically observed data; however, this approach only concentrates on contextual dynamics in short-term analysis. In a prolonged investigation, it is challenging so that statistical methods cannot reserve the generalization of long-term variation of image data. Balancing the trade-off between traditional machine learning models and deep neural networks requires an integrated approach to ensure accuracy in conception while maintaining a high speed of execution. In this research, we present a novel two-stage approach for detecting changes using two convolutional neural networks in this work. The first architecture is based on unsupervised Gaussian mixtures statistical learning, which is used to classify the salient features of scenes. The second one implements a light-weighted pipeline of foreground detection. Our two-stage system has a total of approximately 3.5K parameters but still converges quickly to complex motion patterns. Our experiments on publicly accessible datasets demonstrate that our proposed networks are not only capable of generalizing regions of moving objects with promising results in unseen scenarios, but also competitive in terms of performance quality and effectiveness foreground segmentation. Apart from modeling the data's underlying generator as a non-convex optimization problem, we briefly examine the communication cost associated with the network training by using a distributed scheme of data-parallelism to simulate a stochastic gradient descent algorithm with communication avoidance for parallel machine learnin

    Feature Reduction and Representation Learning for Visual Applications

    Get PDF
    Computation on large-scale data spaces has been involved in many active problems in computer vision and pattern recognition. However, in realistic applications, most existing algorithms are heavily restricted by the large number of features, and tend to be inefficient and even infeasible. In this thesis, the solution to this problem is addressed in the following ways: (1) projecting features onto a lower-dimensional subspace; (2) embedding features into a Hamming space. Firstly, a novel subspace learning algorithm called Local Feature Discriminant Projection (LFDP) is proposed for discriminant analysis of local features. LFDP is able to efficiently seek a subspace to improve the discriminability of local features for classification. Extensive experimental validation on three benchmark datasets demonstrates that the proposed LFDP outperforms other dimensionality reduction methods and achieves state-of-the-art performance for image classification. Secondly, for action recognition, a novel binary local representation for RGB-D video data fusion is presented. In this approach, a general local descriptor called Local Flux Feature (LFF) is obtained for both RGB and depth data by computing the local fluxes of the gradient fields of video data. Then the LFFs from RGB and depth channels are fused into a Hamming space via the Structure Preserving Projection (SPP), which preserves not only the pairwise feature structure, but also a higher level connection between samples and classes. Comprehensive experimental results show the superiority of both LFF and SPP. Thirdly, in respect of unsupervised learning, SPP is extended to the Binary Set Embedding (BSE) for cross-modal retrieval. BSE outputs meaningful hash codes for local features from the image domain and word vectors from text domain. Extensive evaluation on two widely-used image-text datasets demonstrates the superior performance of BSE compared with state-of-the-art cross-modal hashing methods. Finally, a generalized multiview spectral embedding algorithm called Kernelized Multiview Projection (KMP) is proposed to fuse the multimedia data from multiple sources. Different features/views in the reproducing kernel Hilbert spaces are linearly fused together and then projected onto a low-dimensional subspace by KMP, whose performance is thoroughly evaluated on both image and video datasets compared with other multiview embedding methods

    A Survey on Deep Learning Technique for Video Segmentation

    Full text link
    Video segmentation -- partitioning video frames into multiple segments or objects -- plays a critical role in a broad range of practical applications, from enhancing visual effects in movie, to understanding scenes in autonomous driving, to creating virtual background in video conferencing. Recently, with the renaissance of connectionism in computer vision, there has been an influx of deep learning based approaches for video segmentation that have delivered compelling performance. In this survey, we comprehensively review two basic lines of research -- generic object segmentation (of unknown categories) in videos, and video semantic segmentation -- by introducing their respective task settings, background concepts, perceived need, development history, and main challenges. We also offer a detailed overview of representative literature on both methods and datasets. We further benchmark the reviewed methods on several well-known datasets. Finally, we point out open issues in this field, and suggest opportunities for further research. We also provide a public website to continuously track developments in this fast advancing field: https://github.com/tfzhou/VS-Survey.Comment: Accepted by TPAMI. Website: https://github.com/tfzhou/VS-Surve

    Learning by correlation for computer vision applications: from Kernel methods to deep learning

    Get PDF
    Learning to spot analogies and differences within/across visual categories is an arguably powerful approach in machine learning and pattern recognition which is directly inspired by human cognition. In this thesis, we investigate a variety of approaches which are primarily driven by correlation and tackle several computer vision applications

    Re-identification and semantic retrieval of pedestrians in video surveillance scenarios

    Get PDF
    Person re-identification consists of recognizing individuals across different sensors of a camera network. Whereas clothing appearance cues are widely used, other modalities could be exploited as additional information sources, like anthropometric measures and gait. In this work we investigate whether the re-identification accuracy of clothing appearance descriptors can be improved by fusing them with anthropometric measures extracted from depth data, using RGB-Dsensors, in unconstrained settings. We also propose a dissimilaritybased framework for building and fusing multi-modal descriptors of pedestrian images for re-identification tasks, as an alternative to the widely used score-level fusion. The experimental evaluation is carried out on two data sets including RGB-D data, one of which is a novel, publicly available data set that we acquired using Kinect sensors. In this dissertation we also consider a related task, named semantic retrieval of pedestrians in video surveillance scenarios, which consists of searching images of individuals using a textual description of clothing appearance as a query, given by a Boolean combination of predefined attributes. This can be useful in applications like forensic video analysis, where the query can be obtained froma eyewitness report. We propose a general method for implementing semantic retrieval as an extension of a given re-identification system that uses any multiple part-multiple component appearance descriptor. Additionally, we investigate on deep learning techniques to improve both the accuracy of attribute detectors and generalization capabilities. Finally, we experimentally evaluate our methods on several benchmark datasets originally built for re-identification task

    Single-target tracking of arbitrary objects using multi-layered features and contextual information

    Get PDF
    This thesis investigated single-target tracking of arbitrary objects. Tracking is a difficult problem due to a variety of challenges such as significant deformations of the target, occlusions, illumination variations, background clutter and camouflage. To achieve robust tracking performance under these severe conditions, this thesis proposed firstly a novel RGB single-target tracker which models the target with multi-layered features and contextual information. The proposed algorithm was tested on two different tracking benchmarks, i.e., VTB and VOT, where it demonstrated significantly more robust performance than other state-of-the-art RGB trackers. Proposed secondly was an extension of the designed RGB tracker to handle RGB-D images using both temporal and spatial constraints to exploit depth information more robustly. For evaluation, the thesis introduced a new RGB-D benchmark dataset with per-frame annotated attributes and extensive bias analysis, on which the proposed tracker achieved the best results. Proposed thirdly was a new tracking approach to handle camouflage problems in highly cluttered scenes exploiting global dynamic constraints from the context. To evaluate the tracker, a benchmark dataset was augmented with a new set of clutter sub-attributes. Using this dataset, it was demonstrated that the proposed method outperforms other state-of-the-art single target trackers on highly cluttered scenes

    Synthetic Aperture Radar (SAR) Meets Deep Learning

    Get PDF
    This reprint focuses on the application of the combination of synthetic aperture radars and depth learning technology. It aims to further promote the development of SAR image intelligent interpretation technology. A synthetic aperture radar (SAR) is an important active microwave imaging sensor, whose all-day and all-weather working capacity give it an important place in the remote sensing community. Since the United States launched the first SAR satellite, SAR has received much attention in the remote sensing community, e.g., in geological exploration, topographic mapping, disaster forecast, and traffic monitoring. It is valuable and meaningful, therefore, to study SAR-based remote sensing applications. In recent years, deep learning represented by convolution neural networks has promoted significant progress in the computer vision community, e.g., in face recognition, the driverless field and Internet of things (IoT). Deep learning can enable computational models with multiple processing layers to learn data representations with multiple-level abstractions. This can greatly improve the performance of various applications. This reprint provides a platform for researchers to handle the above significant challenges and present their innovative and cutting-edge research results when applying deep learning to SAR in various manuscript types, e.g., articles, letters, reviews and technical reports

    Digital Image Processing

    Get PDF
    This book presents several recent advances that are related or fall under the umbrella of 'digital image processing', with the purpose of providing an insight into the possibilities offered by digital image processing algorithms in various fields. The presented mathematical algorithms are accompanied by graphical representations and illustrative examples for an enhanced readability. The chapters are written in a manner that allows even a reader with basic experience and knowledge in the digital image processing field to properly understand the presented algorithms. Concurrently, the structure of the information in this book is such that fellow scientists will be able to use it to push the development of the presented subjects even further
    corecore