Selective Deep Convolutional Features for Image Retrieval
A Convolutional Neural Network (CNN) is a powerful tool for extracting
discriminative local descriptors for effective image search. Recent work adopts
fine-tuned strategies to further improve the discriminative power of the
descriptors. Taking a different approach, in this paper, we propose a novel
framework to achieve competitive retrieval performance. Firstly, we propose
various masking schemes, namely SIFT-mask, SUM-mask, and MAX-mask, to select a
representative subset of local convolutional features and remove a large number
of redundant features. We demonstrate that this can effectively address the
burstiness issue and improve retrieval accuracy. Secondly, we propose to employ
recent embedding and aggregating methods to further enhance feature
discriminability. Extensive experiments demonstrate that our proposed framework
achieves state-of-the-art retrieval accuracy.
Comment: Accepted to ACM MM 201
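As a rough illustration of the masking idea, the MAX-mask variant can be read as keeping only the spatial positions where at least one channel attains its maximum activation; the sketch below is illustrative and assumes a single (H, W, C) feature map, not the authors' exact implementation:

```python
import numpy as np

def max_mask_select(feat):
    """Keep local descriptors at spatial positions where at least one
    channel attains its maximum (an illustrative MAX-mask sketch).

    feat: (H, W, C) convolutional feature map.
    Returns selected (N, C) descriptors with N <= min(H*W, C).
    """
    h, w, c = feat.shape
    flat = feat.reshape(-1, c)            # one row per spatial position
    winners = flat.argmax(axis=0)         # position of each channel's max
    keep = np.unique(winners)             # distinct winning positions
    return flat[keep]

# toy usage: an 8x8 map with 32 channels keeps at most 32 positions
fmap = np.random.rand(8, 8, 32)
selected = max_mask_select(fmap)
```

Because several channels can share a winning position, the selected subset is typically much smaller than the full H*W set of local features, which is what suppresses redundant, bursty descriptors.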
A reliable order-statistics-based approximate nearest neighbor search algorithm
We propose a new algorithm for fast approximate nearest neighbor search based
on the properties of ordered vectors. Data vectors are classified based on the
index and sign of their largest components, thereby partitioning the space in a
number of cones centered in the origin. The query is itself classified, and the
search starts from the selected cone and proceeds to neighboring ones. Overall,
the proposed algorithm corresponds to locality sensitive hashing in the space
of directions, with hashing based on the order of components. Thanks to the
statistical features emerging through ordering, it deals very well with the
challenging case of unstructured data, and is a valuable building block for
more complex techniques dealing with structured data. Experiments on both
simulated and real-world data show that the proposed algorithm provides
state-of-the-art performance.
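A minimal sketch of the cone partition described above, assuming the cone is keyed by the index and sign of the largest-magnitude component (function names are illustrative; a full method would also visit neighboring cones rather than fall back to brute force):

```python
import numpy as np

def cone_id(v):
    """Hash a vector to a cone keyed by the index of its
    largest-magnitude component and that component's sign."""
    i = int(np.argmax(np.abs(v)))
    return (i, 1 if v[i] >= 0 else -1)

def build_index(data):
    """Bucket data vectors by cone id, LSH-style."""
    buckets = {}
    for idx, v in enumerate(data):
        buckets.setdefault(cone_id(v), []).append(idx)
    return buckets

def query(q, data, buckets):
    """Search the query's own cone first; fall back to brute force
    if the cone is empty (a stand-in for visiting neighbor cones)."""
    cand = buckets.get(cone_id(q), [])
    if not cand:
        cand = range(len(data))
    return min(cand, key=lambda i: np.linalg.norm(data[i] - q))

# toy usage
data = np.array([[1.0, 0.1], [-2.0, 0.2], [0.1, 3.0]])
buckets = build_index(data)
nearest = query(np.array([0.9, 0.0]), data, buckets)
```

Since the key depends only on the ordering and signs of components, it hashes the *direction* of a vector, which matches the paper's description of locality-sensitive hashing in the space of directions.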
FedHAP: Federated Hashing with Global Prototypes for Cross-silo Retrieval
Deep hashing has been widely applied in large-scale data retrieval due to its
superior retrieval efficiency and low storage cost. However, data are often
scattered in data silos with privacy concerns, so performing centralized data
storage and retrieval is not always possible. Leveraging the concept of
federated learning (FL) to perform deep hashing is a recent research trend.
However, existing frameworks mostly rely on the aggregation of the local deep
hashing models, which are trained by performing similarity learning with local
skewed data only. Therefore, they cannot work well for non-IID clients in a
real federated environment. To overcome these challenges, we propose a novel
federated hashing framework that enables participating clients to jointly train
the shared deep hashing model by leveraging the prototypical hash codes for
each class. Globally, the transmission of global prototypes with only one
prototypical hash code per class will minimize the impact of communication cost
and privacy risk. Locally, the use of global prototypes is maximized by
jointly training a discriminator network and the local hashing network.
Extensive experiments on benchmark datasets are conducted to demonstrate that
our method can significantly improve the performance of the deep hashing model
in federated environments with non-IID data distributions.
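One plausible reading of the server side, sketched below under the assumption that clients send real-valued class prototypes and the server binarizes their mean into one prototypical hash code per class (this is an assumption for illustration, not FedHAP's exact aggregation rule):

```python
import numpy as np

def aggregate_prototypes(client_protos):
    """Hypothetical server-side step: average clients' real-valued
    class prototypes, then binarize with the sign to obtain a single
    +/-1 prototypical hash code per class.

    client_protos: list of (num_classes, code_len) arrays, one per client.
    Returns a (num_classes, code_len) array of +/-1 codes.
    """
    mean = np.mean(client_protos, axis=0)
    return np.where(mean >= 0, 1, -1)

# toy usage: two clients, one class, 2-bit codes
c1 = np.array([[0.2, -0.4]])
c2 = np.array([[0.6, -0.2]])
global_codes = aggregate_prototypes([c1, c2])
```

Transmitting only one short code per class, as above, keeps both the communication volume and the exposure of raw local data small, which is the benefit the abstract highlights.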
Vectors of Locally Aggregated Centers for Compact Video Representation
We propose a novel vector aggregation technique for compact video
representation, with application in accurate similarity detection within large
video datasets. The current state-of-the-art in visual search is formed by the
vector of locally aggregated descriptors (VLAD) of Jégou et al. VLAD generates
compact video representations based on scale-invariant feature transform (SIFT)
vectors (extracted per frame) and local feature centers computed over a
training set. With the aim to increase robustness to visual distortions, we
propose a new approach that operates at a coarser level in the feature
representation. We create vectors of locally aggregated centers (VLAC) by first
clustering SIFT features to obtain local feature centers (LFCs) and then
encoding the latter with respect to given centers of local feature centers
(CLFCs), extracted from a training set. The sums of differences between the LFCs
and the CLFCs are aggregated to generate an extremely compact video description
used for accurate video segment similarity detection. Experimentation using a
video dataset, comprising more than 1000 minutes of content from the Open Video
Project, shows that VLAC obtains substantial gains in terms of mean Average
Precision (mAP) against VLAD and the hyper-pooling method of Douze et al.,
under the same compaction factor and the same set of distortions.
Comment: Proc. IEEE International Conference on Multimedia and Expo, ICME
2015, Torino, Ital
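The aggregation step reads like VLAD applied one clustering level up; a hedged sketch under that assumption (given precomputed LFCs and CLFCs, which in the paper come from k-means over SIFT features):

```python
import numpy as np

def vlac(lfcs, clfcs):
    """VLAC sketch: assign each local feature center (LFC) to its
    nearest center of local feature centers (CLFC) and aggregate the
    sums of differences, as VLAD does for raw descriptors.

    lfcs:  (n, d) local feature centers from one video segment.
    clfcs: (k, d) centers of LFCs learned on a training set.
    Returns an L2-normalized k*d-dimensional descriptor.
    """
    # squared distances from every LFC to every CLFC
    d2 = ((lfcs[:, None, :] - clfcs[None, :, :]) ** 2).sum(-1)
    assign = d2.argmin(axis=1)            # nearest CLFC per LFC
    out = np.zeros_like(clfcs)
    for i, a in enumerate(assign):
        out[a] += lfcs[i] - clfcs[a]      # sum of differences per CLFC
    v = out.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# toy usage: 2 LFCs, 2 CLFCs in 2-D
desc = vlac(np.array([[1.0, 0.0], [0.0, 1.0]]),
            np.array([[0.9, 0.0], [0.0, 0.9]]))
```

Working on centers rather than raw SIFT vectors is what gives the claimed robustness: small per-frame distortions move individual descriptors but perturb the cluster centers far less.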
Efficient image copy detection using multi-scale fingerprints
Inspired by multi-resolution histogram, we propose
a multi-scale SIFT descriptor to improve the discriminability.
A series of SIFT descriptions with different scales is first
acquired by varying the actual size of each spatial bin. Then
principal component analysis (PCA) is employed to reduce them
to low-dimensional vectors, which are further combined into one
128-dimension multi-scale SIFT description. Next, an entropy
maximization based binarization is employed to encode the
descriptions into binary codes called fingerprints for indexing
the local features. Furthermore, an efficient search architecture
consisting of lookup tables and inverted image ID list is designed
to improve the query speed. Since the fingerprint building is
of low-complexity, this method is very efficient and scalable to
very large databases. In addition, the multi-scale fingerprints
are very discriminative such that the copies can be effectively
distinguished from similar objects, which leads to an improved
performance in the detection of copies. The experimental evaluation shows
that our approach outperforms state-of-the-art methods.
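The entropy-maximizing binarization can be sketched as thresholding each dimension at its median over the data, so every bit splits the collection in half and carries maximal entropy; this is an illustrative reading, not necessarily the authors' exact encoder:

```python
import numpy as np

def entropy_max_binarize(descs):
    """Binarize descriptors by thresholding each dimension at its
    median, so each bit is 1 for half of the data, maximizing the
    per-bit entropy (an illustrative sketch).

    descs: (n, d) real-valued multi-scale SIFT descriptions.
    Returns (n, d) uint8 fingerprints and the d thresholds.
    """
    thresholds = np.median(descs, axis=0)
    bits = (descs > thresholds).astype(np.uint8)
    return bits, thresholds

# toy usage: 4 descriptors, 2 dimensions
descs = np.array([[0.1, 0.9], [0.5, 0.2], [0.8, 0.4], [0.3, 0.7]])
bits, th = entropy_max_binarize(descs)
```

Balanced bits of this kind are exactly what a lookup-table index wants: each table key is equally likely, so the inverted image-ID lists stay short and uniform.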