325 research outputs found

    Latent Structure Preserving Hashing

    For efficient similarity search, hash functions are designed to embed high-dimensional feature descriptors into low-dimensional binary codes such that similar descriptors yield binary codes separated by a short distance in the Hamming space. It is critical for a hashing algorithm to maintain the intrinsic structure and preserve the original information of the data. In this paper, we propose a novel hashing algorithm called Latent Structure Preserving Hashing (LSPH), which aims to find a well-structured low-dimensional representation of the original high-dimensional data through a novel objective function based on Nonnegative Matrix Factorization (NMF), with the Kullback-Leibler divergence of the corresponding data distributions as the regularization term. By exploiting the joint probabilistic distribution of the data, LSPH can automatically learn the latent information and successfully preserve the structure of high-dimensional data. To achieve robust performance on complex and nonlinear data, we also contribute a more general multi-layer LSPH (ML-LSPH) framework, in which hierarchical representations can be effectively learned by a multiplicative up-propagation algorithm. Once the latent representations are obtained, the hash functions can be easily acquired through multi-variable logistic regression. Experimental results on three large-scale retrieval datasets, i.e., SIFT 1M, GIST 1M and 500K TinyImage, show that ML-LSPH achieves better performance than the single-layer LSPH and that both outperform existing hashing techniques on large-scale data.
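
    The retrieval step that such binary codes enable can be sketched as follows. This is a minimal illustration, not the LSPH method: the 0.5 threshold, the toy scores, and the helper names `binarize` and `hamming_distances` are assumptions made for the example, standing in for the outputs of learned logistic-regression hash functions.

```python
import numpy as np

def binarize(scores):
    """Threshold scores in [0, 1] at 0.5 to get 0/1 codes (an assumed rule)."""
    return (np.asarray(scores) >= 0.5).astype(np.uint8)

def hamming_distances(query_code, database_codes):
    """Count the differing bits between one code and each row of a code matrix."""
    return np.count_nonzero(database_codes != query_code, axis=1)

# Toy scores standing in for the outputs of learned hash functions.
database = binarize([[0.9, 0.1, 0.8, 0.2],
                     [0.2, 0.7, 0.6, 0.9],
                     [0.8, 0.2, 0.4, 0.1]])
query = binarize([0.95, 0.05, 0.7, 0.3])

distances = hamming_distances(query, database)
# A stable sort keeps the database order among equally distant items.
ranking = np.argsort(distances, kind="stable")
print(distances.tolist(), ranking.tolist())
```

    Items are ranked by the number of differing bits, which is why short codes make large-scale search cheap in both memory and time.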

    Structure-Preserving Binary Representations for RGB-D Action Recognition

    In this paper, we propose a novel binary local representation for RGB-D video data fusion with a structure-preserving projection. Our contribution consists of two aspects. To acquire a general feature for the video data, we convert the problem to describing the gradient fields of the RGB and depth information of video sequences. With the local fluxes of the gradient fields, which include the orientation and the magnitude of the neighborhood of each point, a new kind of continuous local descriptor called Local Flux Feature (LFF) is obtained. Then the LFFs from the RGB and depth channels are fused into a Hamming space via the Structure Preserving Projection (SPP). Specifically, an orthogonal projection matrix is applied to preserve the pairwise structure, with a shape constraint to avoid the collapse of the data structure in the projected space. Furthermore, a bipartite graph structure of the data is taken into consideration, which is regarded as a higher-level connection between samples and classes than the pairwise structure of local features. Extensive experiments on various RGB-D action recognition benchmarks show not only the high efficiency of binary codes and the effectiveness of combining LFFs from RGB-D channels via SPP, but also the potential power of LFF for general action recognition.

    Unsupervised Local Feature Hashing for Image Similarity Search

    The potential value of hashing techniques has made hashing one of the most active research areas in computer vision and multimedia. However, most existing hashing methods for image search and retrieval are based on global feature representations, which are susceptible to image variations such as viewpoint changes and background clutter. Traditional global representations gather local features directly to output a single vector without analyzing the intrinsic geometric properties of the local features. In this paper, we propose a novel unsupervised hashing method called unsupervised bilinear local hashing (UBLH) for projecting local feature descriptors from a high-dimensional feature space to a lower-dimensional Hamming space via compact bilinear projections rather than a single large projection matrix. UBLH takes the matrix expression of local features as input and preserves the feature-to-feature and image-to-image structures of local features simultaneously. Experimental results on challenging data sets including Caltech-256, SUN397, and Flickr 1M demonstrate the superiority of UBLH compared with state-of-the-art hashing methods.
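
    The saving from bilinear projections over a single large projection matrix can be illustrated with a short sketch. It is not the UBLH learning procedure: the projections here are random placeholders, and the sizes and the names `bilinear_project`, `P1` and `P2` are assumptions chosen for the example.

```python
import numpy as np

def bilinear_project(X, P1, P2):
    """Project a d1 x d2 feature matrix to a k1 x k2 matrix via P1^T X P2."""
    return P1.T @ X @ P2

rng = np.random.default_rng(0)
d1, d2, k1, k2 = 8, 16, 4, 4          # hypothetical sizes
X = rng.standard_normal((d1, d2))     # matrix form of one local descriptor
P1 = rng.standard_normal((d1, k1))    # random stand-ins for learned projections
P2 = rng.standard_normal((d2, k2))

projected = bilinear_project(X, P1, P2)
code = (projected > 0).astype(np.uint8).ravel()  # k1 * k2 bits

# The same projection written as one large matrix acting on vec(X),
# using the identity vec(A X B) = (B^T kron A) vec(X) with column-major vec.
full_matrix = np.kron(P2.T, P1.T)
full_projection = full_matrix @ X.flatten(order="F")

bilinear_params = P1.size + P2.size   # d1*k1 + d2*k2
full_params = full_matrix.size        # (k1*k2) * (d1*d2)
print(code.size, bilinear_params, full_params)
```

    With these sizes the two small matrices hold 96 parameters, while the equivalent single matrix holds 2048.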

    Projection Bank: From High-Dimensional Data to Medium-Length Binary Codes

    Recently, very high-dimensional feature representations, e.g., Fisher Vector, have achieved excellent performance for visual recognition and retrieval. However, these lengthy representations incur extremely heavy computational and storage costs and even become infeasible in some large-scale applications. A few existing techniques can transform very high-dimensional data into binary codes, but they still require relatively long codes to maintain acceptable accuracy. To strike a better balance between computational efficiency and accuracy, in this paper we propose a novel embedding method called Binary Projection Bank (BPB), which can effectively reduce very high-dimensional representations to medium-dimensional binary codes without sacrificing accuracy. Instead of using conventional single linear or bilinear projections, the proposed method learns a bank of small projections via the max-margin constraint to optimally preserve the intrinsic data similarity. We have systematically evaluated the proposed method on three datasets: Flickr 1M, ILSVR2010 and UCF101, showing competitive retrieval and recognition accuracies compared with state-of-the-art approaches, but with a significantly smaller memory footprint and lower coding complexity.

    Exploiting Spatial-temporal Correlations for Video Anomaly Detection

    Video anomaly detection (VAD) remains a challenging task in the pattern recognition community due to the ambiguity and diversity of abnormal events. Existing deep learning-based VAD methods usually leverage proxy tasks to learn the normal patterns and flag instances that deviate from such patterns as abnormal. However, most of them do not take full advantage of the spatial-temporal correlations among video frames, which are critical for understanding normal patterns. In this paper, we address unsupervised VAD by learning the evolution regularity of appearance and motion over the long and short term, and we exploit the spatial-temporal correlations among consecutive frames in normal videos more adequately. Specifically, we propose to utilize the spatiotemporal long short-term memory (ST-LSTM) to extract and memorize spatial appearances and temporal variations in a unified memory cell. In addition, inspired by the generative adversarial network, we introduce a discriminator to perform adversarial learning with the ST-LSTM to enhance the learning capability. Experimental results on standard benchmarks demonstrate the effectiveness of spatial-temporal correlations for unsupervised VAD. Our method achieves competitive performance compared to the state-of-the-art methods, with AUCs of 96.7%, 87.8%, and 73.1% on UCSD Ped2, CUHK Avenue, and ShanghaiTech, respectively. Comment: This paper is accepted at IEEE 26TH International Conference on Pattern Recognition (ICPR) 202

    Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models

    This is a technical report on the 360-degree panoramic image generation task based on diffusion models. Unlike ordinary 2D images, 360-degree panoramic images capture the entire $360^\circ\times 180^\circ$ field of view. The rightmost and leftmost sides of a 360-degree panoramic image should therefore be continuous, which is the main challenge in this field. However, the current diffusion pipeline is not appropriate for generating such a seamless 360-degree panoramic image. To this end, we propose a circular blending strategy in both the denoising and VAE decoding stages to maintain geometric continuity. Based on this, we present two models for the Text-to-360-panoramas and Single-Image-to-360-panoramas tasks. The code has been released as an open-source project at https://github.com/ArcherFMY/SD-T2I-360PanoImage and on ModelScope (https://www.modelscope.cn/models/damo/cv_diffusion_text-to-360panorama-image_generation/summary). Comment: 2 pages, 8 figures, Tech. Repor
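
    One way to picture a circular blend is the sketch below. It is an illustrative reading under stated assumptions, not the released implementation: it assumes the array is generated with `overlap` extra columns on the right that re-depict the leftmost columns, and it uses a linear weight ramp; the name `circular_blend` and both choices are hypothetical.

```python
import numpy as np

def circular_blend(extended, overlap):
    """Blend the extra right-hand strip of an (H, W + overlap, C) array into
    the left edge and return an (H, W, C) array whose two edges line up."""
    extended = np.asarray(extended, dtype=np.float64)
    width = extended.shape[1] - overlap
    left = extended[:, :overlap]      # first columns of the panorama
    right = extended[:, width:]       # extension that continues past the right edge
    # Weight ramps from 0 (use the extension) to 1 (use the original left strip).
    weights = np.linspace(0.0, 1.0, overlap).reshape(1, overlap, 1)
    blended = extended[:, :width].copy()
    blended[:, :overlap] = weights * left + (1.0 - weights) * right
    return blended

rng = np.random.default_rng(0)
extended = rng.random((2, 10, 3))     # toy panorama: W = 8 plus overlap = 2
panorama = circular_blend(extended, overlap=2)
print(panorama.shape)
```

    After blending, column 0 equals the first column of the extension, which directly follows the last kept column, so the image wraps around without a visible seam at the boundary.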

    Ab initio uncertainty quantification in scattering analysis of microscopy

    Estimating parameters from data is a fundamental problem in physics, customarily done by minimizing a loss function between a model and observed statistics. In scattering-based analysis, researchers often employ their domain expertise to select a specific range of wavevectors for analysis, a choice that can vary depending on the specific case. We introduce another paradigm that defines a probabilistic generative model from the beginning of data processing and propagates the uncertainty for parameter estimation, termed ab initio uncertainty quantification (AIUQ). As an illustrative example, we demonstrate this approach with differential dynamic microscopy (DDM) that extracts dynamical information through Fourier analysis at a selected range of wavevectors. We first show that DDM is equivalent to fitting a temporal variogram in the reciprocal space using a latent factor model as the generative model. Then we derive the maximum marginal likelihood estimator, which optimally weighs information at all wavevectors, therefore eliminating the need to select the range of wavevectors. Furthermore, we substantially reduce the computational cost by utilizing the generalized Schur algorithm for Toeplitz covariances without approximation. Simulated studies validate that AIUQ significantly improves estimation accuracy and enables model selection with automated analysis. The utility of AIUQ is also demonstrated by three distinct sets of experiments: first in an isotropic Newtonian fluid, pushing limits of optically dense systems compared to multiple particle tracking; next in a system undergoing a sol-gel transition, automating the determination of gelling points and critical exponent; and lastly, in discerning anisotropic diffusive behavior of colloids in a liquid crystal. These outcomes collectively underscore AIUQ's versatility to capture system dynamics in an efficient and automated manner
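
    The Fourier analysis that conventional DDM performs can be sketched as below. This covers only the standard image structure function, i.e. the squared Fourier magnitude of frame differences averaged over time, and not the AIUQ estimator; the function name and the toy frame stack are assumptions made for the example.

```python
import numpy as np

def image_structure_function(frames, lag):
    """Average |FFT2(I(t + lag) - I(t))|^2 over all start times t.

    `frames` is a (T, H, W) stack and `lag` is a positive integer frame lag.
    """
    frames = np.asarray(frames, dtype=np.float64)
    if not 1 <= lag < frames.shape[0]:
        raise ValueError("lag must satisfy 1 <= lag < number of frames")
    diffs = frames[lag:] - frames[:-lag]
    spectra = np.abs(np.fft.fft2(diffs, axes=(-2, -1))) ** 2
    return spectra.mean(axis=0)   # one value per wavevector (qy, qx)

rng = np.random.default_rng(0)
frames = rng.random((6, 4, 4))    # toy stack of six 4x4 frames
D = image_structure_function(frames, lag=1)
print(D.shape)
```

    In a conventional analysis this quantity is then fitted over a hand-picked range of wavevectors, which is the selection step the abstract says AIUQ removes.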