    Fine-grained Incident Video Retrieval with Video Similarity Learning.

    PhD Theses. In this thesis, we address the problem of Fine-grained Incident Video Retrieval (FIVR) using video similarity learning methods. FIVR is a video retrieval task that aims to retrieve all videos that depict the same incident given a query video; related video retrieval tasks adopt either very narrow or very broad scopes, considering only near-duplicate or same-event videos. To formulate the case of same-incident videos, we define three video associations taking into account the spatio-temporal spans captured by video pairs. To cover the benchmarking needs of FIVR, we construct a large-scale dataset, called FIVR-200K, consisting of 225,960 YouTube videos from major news events crawled from Wikipedia. The dataset contains four annotation labels according to the FIVR definitions; hence, it can simulate several retrieval scenarios with the same video corpus. To address FIVR, we propose two video-level approaches leveraging features extracted from intermediate layers of Convolutional Neural Networks (CNN). The first is an unsupervised method that relies on a modified Bag-of-Words scheme, which generates video representations from the aggregation of the frame descriptors based on learned visual codebooks. The second is a supervised method based on Deep Metric Learning, which learns an embedding function that maps videos into a feature space where relevant video pairs are closer than irrelevant ones. However, video-level approaches generate global video representations, losing all spatial and temporal relations between compared videos. Therefore, we propose a video similarity learning approach that captures fine-grained relations between videos for accurate similarity calculation. We train a CNN architecture to compute video-to-video similarity from refined frame-to-frame similarity matrices derived from a pairwise region-level similarity function. The proposed approaches have been extensively evaluated on FIVR-200K and other large-scale datasets, demonstrating their superiority over other video retrieval methods and highlighting the challenging aspects of the FIVR problem.
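    The fine-grained similarity calculation described above can be pictured with a short sketch. The snippet below is a minimal illustration, not the thesis code: it builds a frame-to-frame similarity matrix from a pairwise region-level similarity function and refines it with a small CNN into a single video-to-video score. The function and class names, tensor shapes, layer sizes, and the Chamfer-style max/mean pooling are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def frame_similarity_matrix(x, y):
    """x: (Tx, R, D), y: (Ty, R, D) per-frame region descriptors.
    Returns a (Tx, Ty) frame-to-frame similarity matrix from region-level
    dot products, pooled over regions in a Chamfer-like max/mean fashion."""
    x = F.normalize(x, dim=-1)
    y = F.normalize(y, dim=-1)
    # region-to-region similarities: (Tx, Ty, Rx, Ry)
    sim = torch.einsum('ird,jsd->ijrs', x, y)
    # best-matching region of y for each region of x, averaged over x's regions
    return sim.max(dim=-1).values.mean(dim=-1)

class SimilarityRefineNet(nn.Module):
    """Small CNN that refines the frame-to-frame matrix into a video-level score."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),
        )

    def forward(self, sim_matrix):              # (Tx, Ty)
        s = self.conv(sim_matrix[None, None])   # (1, 1, Tx, Ty)
        # pool the refined matrix into a scalar video-to-video similarity
        return s.squeeze(0).squeeze(0).max(dim=1).values.mean()

# usage with random stand-in features
query = torch.randn(20, 9, 512)      # 20 frames, 9 regions, 512-D descriptors
candidate = torch.randn(35, 9, 512)
score = SimilarityRefineNet()(frame_similarity_matrix(query, candidate))
```

    In the thesis, such a network is trained with similarity-learning objectives so that same-incident video pairs receive higher scores than irrelevant ones; the sketch above only demonstrates the data flow from region-level to frame-level to video-level similarity.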

    Fusion of face and iris biometrics in security verification systems.

    Master of Science in Computer Science. University of KwaZulu-Natal, Durban, 2016. Abstract available in PDF file.

    Robust density modelling using the Student's t-distribution for human action recognition

    The extraction of human features from videos is often inaccurate and prone to outliers. Such outliers can severely affect density modelling when the Gaussian distribution is used as the model, since it is highly sensitive to outliers. The Gaussian distribution is also often used as the base component of graphical models for recognising human actions in videos (hidden Markov models and others), and the presence of outliers can significantly affect the recognition accuracy. In contrast, the Student's t-distribution is more robust to outliers and can be exploited to improve the recognition rate in the presence of abnormal data. In this paper, we present an HMM which uses mixtures of t-distributions as observation probabilities, and experiments over two well-known datasets (Weizmann, MuHAVi) show a remarkable improvement in classification accuracy. © 2011 IEEE
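    As a rough illustration of why the heavy-tailed t-distribution helps, the sketch below compares the log-likelihood of an outlying feature vector under a mixture of multivariate Student's t-distributions (a stand-in for an HMM state's observation model) and under the corresponding Gaussian mixture. It assumes SciPy >= 1.6 for scipy.stats.multivariate_t; the parameter values are toy placeholders, not those used in the paper.

```python
import numpy as np
from scipy.stats import multivariate_t, multivariate_normal

def t_mixture_loglik(x, weights, means, shapes, dofs):
    """Log-likelihood of observation x under a mixture of multivariate t's,
    the kind of density one could plug in as an HMM state's observation model."""
    comps = [np.log(w) + multivariate_t.logpdf(x, loc=m, shape=S, df=v)
             for w, m, S, v in zip(weights, means, shapes, dofs)]
    return np.logaddexp.reduce(comps)

# toy 2-D observation model with two components (placeholder parameters)
weights = [0.6, 0.4]
means = [np.zeros(2), np.full(2, 3.0)]
shapes = [np.eye(2), np.eye(2)]
dofs = [3.0, 3.0]                    # low degrees of freedom -> heavy tails

x_outlier = np.array([8.0, -7.0])    # a grossly corrupted feature vector
print("t-mixture log-lik   :", t_mixture_loglik(x_outlier, weights, means, shapes, dofs))
print("Gaussian mix log-lik:", np.logaddexp.reduce(
    [np.log(w) + multivariate_normal.logpdf(x_outlier, m, S)
     for w, m, S in zip(weights, means, shapes)]))
# The Gaussian mixture assigns a drastically lower log-likelihood to the outlier,
# which is what destabilises Gaussian-based observation models in an HMM.
```

    With heavy tails, the t-mixture penalises the outlier far less severely than the Gaussian mixture, which is the robustness property the paper exploits inside the HMM observation model.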

    Visual scene recognition with biologically relevant generative models

    This research focuses on developing visual object categorization methodologies that are based on machine learning techniques and biologically inspired generative models of visual scene recognition. Modelling the statistical variability in visual patterns, in the space of features extracted from them by an appropriate low-level signal processing technique, is an important matter of investigation for both humans and machines. To study this problem, we have examined in detail two recent probabilistic models of vision: a simple multivariate Gaussian model as suggested by Karklin and Lewicki (2009) and a restricted Boltzmann machine (RBM) proposed by Hinton (2002). Both models have been widely used for visual object classification and scene analysis tasks before. This research highlights that these models are not sufficient on their own to perform the classification task, and suggests the Fisher kernel as a means of inducing discrimination into these models for classification power. Our empirical results on standard benchmark data sets reveal that the classification performance of these generative models can be boosted close to state-of-the-art performance by drawing a Fisher kernel from compact generative models, computing the data labels in a fraction of the total computation time. We compare the proposed technique with other distance-based and kernel-based classifiers to show how computationally efficient the Fisher kernels are. To the best of our knowledge, a Fisher kernel has not been drawn from an RBM before, so the work presented in this thesis is novel in terms of its idea and its application to vision problems.
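    As an illustration of the general Fisher-kernel recipe discussed above (not the thesis implementation), the sketch below derives Fisher scores from a simple diagonal-Gaussian generative model and trains a linear classifier on them. The data, the choice of logistic regression, and the omission of the Fisher information normalisation are assumptions made to keep the example short.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fisher_scores(X, mu, var):
    """Fisher scores of a diagonal Gaussian N(mu, diag(var)) for each row of X:
    d/dmu  log p(x) = (x - mu) / var
    d/dvar log p(x) = ((x - mu)**2 - var) / (2 * var**2)
    (The Fisher information normalisation is omitted for brevity.)"""
    diff = X - mu
    return np.hstack([diff / var, (diff ** 2 - var) / (2 * var ** 2)])

# toy data: two classes of 8-D feature vectors
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 8)), rng.normal(1.5, 1.0, (100, 8))])
y = np.array([0] * 100 + [1] * 100)

# fit the generative model on all data (unsupervised), then classify Fisher vectors
mu, var = X.mean(axis=0), X.var(axis=0)
Phi = fisher_scores(X, mu, var)
clf = LogisticRegression(max_iter=1000).fit(Phi, y)
print("training accuracy:", clf.score(Phi, y))
```

    The same recipe carries over to richer generative models such as an RBM: the gradient of the log-likelihood with respect to the model parameters provides a fixed-length feature vector per sample, on top of which any discriminative classifier can be trained.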