Latent Structure Preserving Hashing
Aiming at efficient similarity search, hash functions are designed to embed high-dimensional feature descriptors into low-dimensional binary codes such that similar descriptors lead to binary codes with a short Hamming distance. It is critical for a hashing algorithm to maintain the intrinsic structure and preserve the original information of the data. In this paper, we propose a novel hashing algorithm called Latent Structure Preserving Hashing (LSPH), which finds a well-structured low-dimensional representation of the original high-dimensional data through a novel objective function based on Nonnegative Matrix Factorization (NMF), with the Kullback-Leibler divergence of the data distribution as the regularization term. By exploiting the joint probability distribution of the data, LSPH automatically learns the latent information and preserves the structure of the high-dimensional data. To further achieve robust performance on complex and nonlinear data, we also contribute a more general multi-layer LSPH (ML-LSPH) framework, in which hierarchical representations are learned by a multiplicative up-propagation algorithm. Once the latent representations are obtained, the hash functions can be acquired through multi-variable logistic regression. Experimental results on three large-scale retrieval datasets, i.e., SIFT 1M, GIST 1M and 500K TinyImage, show that ML-LSPH achieves better performance than the single-layer LSPH and that both outperform existing hashing techniques on large-scale data.
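The NMF-plus-binarization core of such a pipeline can be sketched as follows. This is an illustrative simplification, not the authors' implementation: the KL-divergence regularizer, the multi-layer variant, and the learned logistic-regression hash functions are omitted, and a simple per-dimension median threshold stands in for the learned binarization.

```python
import numpy as np

rng = np.random.default_rng(0)

def nmf(X, k, iters=200):
    """Nonnegative Matrix Factorization X ~ W @ H via multiplicative updates."""
    m, n = X.shape
    W = rng.random((m, k)) + 1e-3
    H = rng.random((k, n)) + 1e-3
    for _ in range(iters):
        H *= (W.T @ X) / (W.T @ W @ H + 1e-9)   # update latent representation
        W *= (X @ H.T) / (W @ H @ H.T + 1e-9)   # update basis
    return W, H

def binarize(H):
    """Threshold each latent dimension at its median to get balanced bits."""
    return (H > np.median(H, axis=1, keepdims=True)).astype(np.uint8)

# Toy nonnegative descriptors: 64 feature dims, 200 samples.
X = np.abs(rng.normal(size=(64, 200)))
W, H = nmf(X, k=16)
codes = binarize(H)   # one 16-bit code per sample (columns)
```

Retrieval would then compare codes by Hamming distance; in the paper a logistic-regression stage maps unseen descriptors into the same code space.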
Structure-Preserving Binary Representations for RGB-D Action Recognition
In this paper, we propose a novel binary local representation for RGB-D video data fusion with a structure-preserving projection. Our contribution consists of two aspects. To acquire a general feature for the video data, we convert the problem to describing the gradient fields of the RGB and depth information of video sequences. With the local fluxes of the gradient fields, which include the orientation and the magnitude of the neighborhood of each point, a new kind of continuous local descriptor called the Local Flux Feature (LFF) is obtained. The LFFs from the RGB and depth channels are then fused into a Hamming space via the Structure Preserving Projection (SPP). Specifically, an orthogonal projection matrix is applied to preserve the pairwise structure, with a shape constraint to avoid the collapse of the data structure in the projected space. Furthermore, a bipartite graph structure of the data is taken into consideration, which is regarded as a higher-level connection between samples and classes than the pairwise structure of local features. Extensive experiments show not only the high efficiency of binary codes and the effectiveness of combining LFFs from RGB-D channels via SPP on various RGB-D action recognition benchmarks, but also the potential of LFF for general action recognition.
Unsupervised Local Feature Hashing for Image Similarity Search
The potential value of hashing techniques has made hashing one of the most active research areas in computer vision and multimedia. However, most existing hashing methods for image search and retrieval are based on global feature representations, which are susceptible to image variations such as viewpoint changes and background clutter. Traditional global representations aggregate local features directly into a single vector without analyzing the intrinsic geometric properties of the local features. In this paper, we propose a novel unsupervised hashing method called unsupervised bilinear local hashing (UBLH) for projecting local feature descriptors from a high-dimensional feature space to a lower-dimensional Hamming space via compact bilinear projections rather than a single large projection matrix. UBLH takes the matrix expression of local features as input and simultaneously preserves the feature-to-feature and image-to-image structures of local features. Experimental results on challenging datasets including Caltech-256, SUN397, and Flickr 1M demonstrate the superiority of UBLH compared with state-of-the-art hashing methods.
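The bilinear idea, two small projections in place of one huge matrix, can be illustrated with random matrices standing in for the learned ones. The point of this sketch is the parameter count and the code shape, not the learning procedure; all dimensions are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)

d1, d2 = 32, 32   # local feature kept in matrix form (d1 x d2)
c1, c2 = 8, 8     # code dimensions: c1 * c2 bits per feature

# Two small projections instead of one (d1*d2) x (c1*c2) matrix:
# 32*8 + 32*8 = 512 parameters versus 1024 * 64 = 65,536.
U = rng.normal(size=(d1, c1))
V = rng.normal(size=(d2, c2))

def bilinear_hash(Z):
    """Map a matrix-form local feature Z (d1 x d2) to c1*c2 binary bits."""
    return (U.T @ Z @ V > 0).astype(np.uint8).ravel()

Z = rng.normal(size=(d1, d2))
code = bilinear_hash(Z)
```

Keeping the descriptor in matrix form is what makes the factored projection possible; flattening it first would force the single large matrix the abstract argues against.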
Projection Bank: From High-Dimensional Data to Medium-Length Binary Codes
Recently, very high-dimensional feature representations, e.g., the Fisher Vector, have achieved excellent performance for visual recognition and retrieval. However, these lengthy representations always cause extremely heavy computational and storage costs and even become unfeasible in some large-scale applications. A few existing techniques can transform very high-dimensional data into binary codes, but they still require the reduced code length to be relatively long to maintain acceptable accuracy. To strike a better balance between computational efficiency and accuracy, in this paper we propose a novel embedding method called Binary Projection Bank (BPB), which effectively reduces very high-dimensional representations to medium-dimensional binary codes without sacrificing accuracy. Instead of conventional single linear or bilinear projections, the proposed method learns a bank of small projections via a max-margin constraint to optimally preserve the intrinsic data similarity. We systematically evaluated the proposed method on three datasets: Flickr 1M, ILSVRC2010 and UCF101, showing competitive retrieval and recognition accuracy compared with state-of-the-art approaches, but with a significantly smaller memory footprint and lower coding complexity.
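The "bank of small projections" layout can be sketched as follows. The random bank here is only a placeholder for the max-margin-trained projections described in the paper, and the chunking scheme is an assumption made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

D, B = 10000, 128        # very high input dimension, medium binary code length
chunk = D // B           # each small projection looks at one chunk of the input

# Bank of B small projections (random stand-ins; BPB learns these
# under a max-margin similarity-preservation constraint).
bank = rng.normal(size=(B, chunk))

def encode(x):
    """Produce one bit per small projection applied to its chunk of x."""
    parts = x[:B * chunk].reshape(B, chunk)
    return (np.einsum('bd,bd->b', bank, parts) > 0).astype(np.uint8)

x = rng.normal(size=D)
bits = encode(x)
```

Storage is B * chunk weights rather than D * B for a single dense projection, which is where the smaller memory footprint and coding cost come from.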
Exploiting Spatial-temporal Correlations for Video Anomaly Detection
Video anomaly detection (VAD) remains a challenging task in the pattern
recognition community due to the ambiguity and diversity of abnormal events.
Existing deep learning-based VAD methods usually leverage proxy tasks to learn
the normal patterns and discriminate the instances that deviate from such
patterns as abnormal. However, most of them do not take full advantage of
spatial-temporal correlations among video frames, which is critical for
understanding normal patterns. In this paper, we address unsupervised VAD by
learning the evolution regularity of appearance and motion in the long and
short-term and exploit the spatial-temporal correlations among consecutive
frames in normal videos more adequately. Specifically, we propose to utilize
the spatiotemporal long short-term memory (ST-LSTM) to extract and memorize
spatial appearances and temporal variations in a unified memory cell. In
addition, inspired by the generative adversarial network, we introduce a
discriminator to perform adversarial learning with the ST-LSTM to enhance the
learning capability. Experimental results on standard benchmarks demonstrate
the effectiveness of spatial-temporal correlations for unsupervised VAD. Our
method achieves competitive performance compared to the state-of-the-art
methods, with AUCs of 96.7%, 87.8%, and 73.1% on the UCSD Ped2, CUHK Avenue, and
ShanghaiTech datasets, respectively.
Comment: This paper is accepted at the IEEE 26th International Conference on Pattern Recognition (ICPR) 2022.
Diffusion360: Seamless 360 Degree Panoramic Image Generation based on Diffusion Models
This is a technical report on the 360-degree panoramic image generation task
based on diffusion models. Unlike ordinary 2D images, 360-degree panoramic
images capture the entire field of view, so the
rightmost and leftmost sides of a 360 panoramic image should be
continuous, which is the main challenge in this field. However, the current
diffusion pipeline is not appropriate for generating such a seamless 360-degree
panoramic image. To this end, we propose a circular blending strategy on both
the denoising and VAE decoding stages to maintain geometric continuity.
Based on this, we present two models for \textbf{Text-to-360-panoramas} and
\textbf{Single-Image-to-360-panoramas} tasks. The code has been released as an
open-source project at
\href{https://github.com/ArcherFMY/SD-T2I-360PanoImage}{https://github.com/ArcherFMY/SD-T2I-360PanoImage}
and
\href{https://www.modelscope.cn/models/damo/cv_diffusion_text-to-360panorama-image_generation/summary}{ModelScope}.
Comment: 2 pages, 8 figures, Tech. Report.
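A minimal version of the circular blending idea, crossfading the two horizontal ends of a latent so that the first and last columns agree, might look like the sketch below. It is a simplified stand-in for the released implementation: the blend width, the weight ramp, and the toy latent shape are all assumptions.

```python
import numpy as np

def circular_blend(latent, width=8):
    """Crossfade the left and right edges of a panorama latent.

    The own-content weight ramps from 0.5 at the seam to 1.0 at the
    inner end of the blend region, so the first and last columns end
    up equal and the panorama wraps without a visible seam. In the
    paper's pipeline a blend like this is applied at the denoising
    and VAE-decoding stages.
    """
    out = latent.copy()
    t = np.linspace(0.5, 1.0, width)        # own-content weight across the blend
    left = latent[..., :width]
    right = latent[..., -width:]
    out[..., :width] = t * left + (1 - t) * right[..., ::-1]
    out[..., -width:] = t[::-1] * right + (1 - t[::-1]) * left[..., ::-1]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4, 16, 64))         # toy (batch, channels, H, W) latent
y = circular_blend(x)
```

After blending, column 0 and column -1 hold the same values, so tiling the decoded image horizontally produces no discontinuity at the wrap point.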
Ab initio uncertainty quantification in scattering analysis of microscopy
Estimating parameters from data is a fundamental problem in physics,
customarily done by minimizing a loss function between a model and observed
statistics. In scattering-based analysis, researchers often employ their domain
expertise to select a specific range of wavevectors for analysis, a choice that
can vary depending on the specific case. We introduce another paradigm that
defines a probabilistic generative model from the beginning of data processing
and propagates the uncertainty for parameter estimation, termed ab initio
uncertainty quantification (AIUQ). As an illustrative example, we demonstrate
this approach with differential dynamic microscopy (DDM) that extracts
dynamical information through Fourier analysis at a selected range of
wavevectors. We first show that DDM is equivalent to fitting a temporal
variogram in the reciprocal space using a latent factor model as the generative
model. Then we derive the maximum marginal likelihood estimator, which
optimally weights information at all wavevectors, thereby eliminating the need
to select the range of wavevectors. Furthermore, we substantially reduce the
computational cost by utilizing the generalized Schur algorithm for Toeplitz
covariances without approximation. Simulated studies validate that AIUQ
significantly improves estimation accuracy and enables model selection with
automated analysis. The utility of AIUQ is also demonstrated by three distinct
sets of experiments: first in an isotropic Newtonian fluid, pushing the limits of
optically dense systems compared to multiple particle tracking; next in a
system undergoing a sol-gel transition, automating the determination of gelling
points and critical exponent; and lastly, in discerning anisotropic diffusive
behavior of colloids in a liquid crystal. These outcomes collectively
underscore AIUQ's versatility to capture system dynamics in an efficient and
automated manner.
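The Toeplitz structure the abstract exploits can be illustrated with SciPy's Levinson-based solver standing in for the generalized Schur algorithm. This is a generic Gaussian log-likelihood with a stationary (Toeplitz) covariance, not the DDM-specific latent factor model; the AR(1)-style covariance is an arbitrary example.

```python
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

rng = np.random.default_rng(3)

def toeplitz_loglik(first_col, y):
    """Gaussian log-likelihood with a Toeplitz covariance C.

    The linear solve C^{-1} y uses Levinson recursion and never forms
    the dense inverse; the paper's generalized Schur algorithm exploits
    the same structure even faster. The log-determinant is computed via
    a dense Cholesky here purely for clarity.
    """
    n = len(y)
    alpha = solve_toeplitz(first_col, y)                    # C^{-1} y
    logdet = 2 * np.sum(np.log(np.diag(np.linalg.cholesky(toeplitz(first_col)))))
    return -0.5 * (y @ alpha + logdet + n * np.log(2 * np.pi))

n = 200
cov_col = 0.9 ** np.arange(n)          # stationary AR(1)-like covariance (first column)
L = np.linalg.cholesky(toeplitz(cov_col))
y = L @ rng.normal(size=n)             # one sample from N(0, C)
ll = toeplitz_loglik(cov_col, y)
```

Maximizing such a marginal likelihood over model parameters is what lets the information at all wavevectors be weighted automatically instead of hand-selecting a range.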