1,525 research outputs found

    Geodesics on the manifold of multivariate generalized Gaussian distributions with an application to multicomponent texture discrimination

    We consider the Rao geodesic distance (GD) based on the Fisher information as a similarity measure on the manifold of zero-mean multivariate generalized Gaussian distributions (MGGD). The MGGD is shown to be an adequate model for the heavy-tailed wavelet statistics in multicomponent images, such as color or multispectral images. We discuss the estimation of MGGD parameters using various methods. We apply the GD between MGGDs to color texture discrimination in several classification experiments, taking into account the correlation structure between the spectral bands in the wavelet domain. We compare the performance of the GD and the Kullback-Leibler divergence (KLD), in terms of both texture discrimination capability and computational load. Likewise, both uni- and multivariate generalized Gaussian models are evaluated, characterized by a fixed or a variable shape parameter. Modeling the interband correlation significantly improves classification efficiency, while the GD is shown to consistently outperform the KLD as a similarity measure.
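
As a rough illustration of a Rao geodesic distance, the sketch below covers only the zero-mean Gaussian member of the MGGD family, for which the Fisher-Rao distance between covariances S1 and S2 is known to be sqrt(0.5 * sum_i ln^2(lambda_i)), with lambda_i the eigenvalues of inv(S1) S2. It is not the paper's general MGGD expression. The function name and the restriction to 2x2 covariances (so that only the standard library is needed) are assumptions made for this example.

```python
import math


def rao_distance_gaussian_2x2(s1, s2):
    """Fisher-Rao distance between N(0, s1) and N(0, s2).

    Hypothetical helper: s1 and s2 are symmetric positive-definite 2x2
    covariance matrices given as ((a, b), (b, d)).
    """
    (a1, b1), (_, d1) = s1
    (a2, b2), (_, d2) = s2
    det1 = a1 * d1 - b1 * b1
    det2 = a2 * d2 - b2 * b2
    if a1 <= 0 or a2 <= 0 or det1 <= 0 or det2 <= 0:
        raise ValueError("covariance matrices must be positive definite")

    # Trace and determinant of M = inv(s1) @ s2 determine its eigenvalues.
    trace = (d1 * a2 - 2.0 * b1 * b2 + a1 * d2) / det1
    det = det2 / det1

    # Eigenvalues of M are real and positive; clamp round-off in the
    # discriminant and get the smaller root from the determinant.
    disc = max(trace * trace / 4.0 - det, 0.0)
    lam1 = trace / 2.0 + math.sqrt(disc)
    lam2 = det / lam1

    return math.sqrt(0.5 * (math.log(lam1) ** 2 + math.log(lam2) ** 2))
```

With this formula, scaling a covariance by a constant c gives a distance of |ln c|, and the distance is symmetric in its two arguments, unlike the KLD.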

    Learning to detect video events from zero or very few video examples

    In this work we deal with the problem of high-level event detection in video. Specifically, we study the challenging problems of i) learning to detect video events from solely a textual description of the event, without using any positive video examples, and ii) additionally exploiting very few positive training samples together with a small number of "related" videos. For learning only from an event's textual description, we first identify a general learning framework and then study the impact of different design choices for various stages of this framework. For additionally learning from example videos, when true positive training samples are scarce, we employ an extension of the Support Vector Machine that allows us to exploit "related" event videos by automatically introducing different weights for subsets of the videos in the overall training set. Experimental evaluations performed on the large-scale TRECVID MED 2014 video dataset provide insight into the effectiveness of the proposed methods. Comment: Image and Vision Computing Journal, Elsevier, 2015, accepted for publication

    Video Infringement Detection via Feature Disentanglement and Mutual Information Maximization

    The self-media era provides us with a tremendous number of high-quality videos. Unfortunately, frequent video copyright infringements are now seriously damaging the interests and enthusiasm of video creators. Identifying infringing videos is therefore a compelling task. Current state-of-the-art methods tend to simply feed high-dimensional mixed video features into deep neural networks and count on the networks to extract useful representations. Despite its simplicity, this paradigm relies heavily on the original entangled features and lacks constraints guaranteeing that useful task-relevant semantics are extracted from the features. In this paper, we seek to tackle the above challenges from two aspects: (1) We propose to disentangle an original high-dimensional feature into multiple exclusive lower-dimensional sub-features. We expect the sub-features to encode non-overlapping semantics of the original feature and remove redundant information. (2) On top of the disentangled sub-features, we further learn an auxiliary feature to enhance the sub-features. We theoretically analyze the mutual information between the label and the disentangled features, arriving at a loss that maximizes the extraction of task-relevant information from the original feature. Extensive experiments on two large-scale benchmark datasets (i.e., SVD and VCSL) demonstrate that our method achieves 90.1% TOP-100 mAP on the large-scale SVD dataset and also sets the new state-of-the-art on the VCSL benchmark dataset. Our code and model have been released at https://github.com/yyyooooo/DMI/, hoping to contribute to the community. Comment: This paper is accepted by ACM MM 202

    Accelerated Probabilistic Learning Concept for Mining Heterogeneous Earth Observation Images

    We present an accelerated probabilistic learning concept and its prototype implementation for mining heterogeneous Earth observation images, e.g., multispectral images, synthetic aperture radar (SAR) images, image time series, or geographical information systems (GIS) maps. The system prototype combines, at pixel level, the unsupervised clustering results of different features, extracted from heterogeneous satellite images and geographical information resources, with user-defined semantic annotations in order to calculate the posterior probabilities that allow the final probabilistic searches. The system is able to learn different semantic labels based on a newly developed Bayesian networks algorithm and supports different probabilistic retrieval methods for all semantically related images with only a few user interactions. The new algorithm reduces the computational cost, outperforming existing conventional systems, under certain conditions, by several orders of magnitude. The achieved speed-up allows the introduction of new feature models, improving the learning capabilities of knowledge-driven image information mining systems and opening them to Big Data environment

    Clustering-based analysis of semantic concept models for video shots

    In this paper we present a clustering-based method for representing semantic concepts on multimodal low-level feature spaces and study the evaluation of the goodness of such models with entropy-based methods. As different semantic concepts in video are most accurately represented with different features and modalities, we utilize the relative model-wise confidence values of the feature extraction techniques in weighting them automatically. The method also provides a natural way of measuring the similarity of different concepts in a multimedia lexicon. The experiments of the paper are conducted using the development set of the TRECVID 2005 corpus together with a common annotation for 39 semantic concepts.

    Graph Regularized Non-negative Matrix Factorization By Maximizing Correntropy

    Non-negative matrix factorization (NMF) has proved effective in many clustering and classification tasks. The classic ways to measure the error between the original and the reconstructed matrix are the l_2 distance or the Kullback-Leibler (KL) divergence. However, nonlinear cases are not properly handled when we use these error measures. As a consequence, alternative measures based on nonlinear kernels, such as correntropy, have been proposed. However, current correntropy-based NMF targets only the low-level features without considering the intrinsic geometrical distribution of the data. In this paper, we propose a new NMF algorithm that preserves local invariance by adding graph regularization into the process of max-correntropy-based matrix factorization. Meanwhile, each feature can learn a corresponding kernel from the data. Experimental results on Caltech101 and Caltech256 show the benefits of this combination over other NMF algorithms for unsupervised image clustering.
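
To make the contrast between the l_2 error and correntropy concrete, here is a minimal sketch of the usual empirical correntropy estimate with a Gaussian kernel, (1/N) * sum_i exp(-(x_i - y_i)^2 / (2 * sigma^2)). It illustrates only the similarity measure, not the paper's graph-regularized factorization. The function names, the unnormalized kernel, and the default bandwidth are assumptions made for this example.

```python
import math


def correntropy(x, y, sigma=1.0):
    """Empirical correntropy of two equal-length sequences.

    Uses an unnormalized Gaussian kernel (an assumption of this sketch),
    so identical inputs score exactly 1.0 and each term lies in (0, 1].
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    total = sum(
        math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for a, b in zip(x, y)
    )
    return total / len(x)


def mean_squared_error(x, y):
    """Mean squared l_2 error, shown for comparison."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
```

Because each kernel term is bounded, a single grossly corrupted entry can lower the correntropy by at most 1/N, whereas the same entry can dominate the squared error. This bounded influence is the usual motivation for maximizing correntropy instead of minimizing the l_2 distance.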

    Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance

    We present a statistical view of the texture retrieval problem by combining the two related tasks, namely feature extraction (FE) and similarity measurement (SM), into a joint modeling and classification scheme. We show that using a consistent estimator of texture model parameters for the FE step followed by computing the Kullback–Leibler distance (KLD) between estimated models for the SM step is asymptotically optimal in terms of retrieval error probability. The statistical scheme leads to a new wavelet-based texture retrieval method that is based on the accurate modeling of the marginal distribution of wavelet coefficients using the generalized Gaussian density (GGD) and on the existence of a closed form for the KLD between GGDs. The proposed method provides greater accuracy and flexibility in capturing texture information, while its simplified form closely resembles existing methods that use the energy distribution in the frequency domain to identify textures. Experimental results on a database of 640 texture images indicate that the new method significantly improves retrieval rates, e.g., from 65% to 77%, compared with traditional approaches, while it retains comparable levels of computational complexity.
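
The closed form mentioned above can be sketched as follows, assuming the common GGD parameterization p(x; alpha, beta) = beta / (2 * alpha * Gamma(1/beta)) * exp(-(|x| / alpha)^beta) with scale alpha > 0 and shape beta > 0. Under that assumption the KLD is ln(beta1 * alpha2 * Gamma(1/beta2) / (beta2 * alpha1 * Gamma(1/beta1))) + (alpha1/alpha2)^beta2 * Gamma((beta2 + 1)/beta1) / Gamma(1/beta1) - 1/beta1. The function name is hypothetical, and estimating alpha and beta from wavelet coefficients is outside the scope of this sketch.

```python
import math


def ggd_kld(alpha1, beta1, alpha2, beta2):
    """KLD D(p1 || p2) between two zero-mean generalized Gaussian densities.

    Assumes the (alpha, beta) = (scale, shape) parameterization stated in
    the lead-in; log-gamma is used to avoid overflow for small beta.
    """
    if min(alpha1, beta1, alpha2, beta2) <= 0:
        raise ValueError("all parameters must be positive")

    # Log-ratio of the two normalizing constants.
    log_norm = (
        math.log(beta1 * alpha2)
        - math.log(beta2 * alpha1)
        + math.lgamma(1.0 / beta2)
        - math.lgamma(1.0 / beta1)
    )
    # Expectation of (|x| / alpha2)^beta2 under p1.
    moment = (alpha1 / alpha2) ** beta2 * math.exp(
        math.lgamma((beta2 + 1.0) / beta1) - math.lgamma(1.0 / beta1)
    )
    return log_norm + moment - 1.0 / beta1
```

With beta = 2 and alpha = sigma * sqrt(2) the GGD reduces to a Gaussian, and the expression reduces to the familiar ln(sigma2/sigma1) + sigma1^2 / (2 * sigma2^2) - 1/2, which gives a convenient sanity check. A retrieval score would typically sum such terms over the wavelet subbands of two images, under an assumption of independent subbands.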

    Action Recognition in Videos: from Motion Capture Labs to the Web

    This paper presents a survey of human action recognition approaches based on visual data recorded from a single video camera. We propose an organizing framework that highlights the evolution of the area, with techniques moving from heavily constrained motion capture scenarios towards more challenging, realistic, "in the wild" videos. The proposed organization is based on the representation used as input for the recognition task, emphasizing the hypotheses assumed and thus the constraints imposed on the type of video that each technique is able to address. Making the hypotheses and constraints explicit makes the framework particularly useful for selecting a method for a given application. Another advantage of the proposed organization is that it allows the newest approaches to be categorized seamlessly alongside traditional ones, while providing an insightful perspective on the evolution of the action recognition task up to now. That perspective is the basis for the discussion at the end of the paper, where we also present the main open issues in the area. Comment: Preprint submitted to CVIU, survey paper, 46 pages, 2 figures, 4 tables