Long-tail Cross Modal Hashing
Existing Cross Modal Hashing (CMH) methods are mainly designed for balanced
data, whereas imbalanced data with a long-tail distribution are more common in
real-world scenarios. Several long-tail hashing methods have been proposed, but
they cannot adapt to multi-modal data because of the complex interplay between
labels and the individuality and commonality information of multi-modal data.
Furthermore, CMH
methods mostly mine the commonality of multi-modal data to learn hash codes,
which may override tail labels encoded by the individuality of respective
modalities. In this paper, we propose LtCMH (Long-tail CMH) to handle
imbalanced multi-modal data. LtCMH firstly adopts auto-encoders to mine the
individuality and commonality of different modalities by minimizing the
dependency between the individuality of respective modalities and by enhancing
the commonality of these modalities. Then it dynamically combines the
individuality and commonality with direct features extracted from respective
modalities to create meta features that enrich the representation of tail
labels, and binarizes the meta features to generate hash codes. LtCMH significantly
outperforms state-of-the-art baselines on long-tail datasets and holds a better
(or comparable) performance on datasets with balanced labels.
Comment: Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023).
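The final step described in the abstract can be illustrated with a minimal sketch. Note the assumptions: the paper's combination of individuality, commonality, and direct features is learned dynamically, while the fixed concatenation below only shows the data flow from combined features to binary codes; all array values are hypothetical.

```python
import numpy as np

def meta_features(direct, individuality, commonality):
    """Combine direct features with per-modality individuality and
    cross-modality commonality into one meta representation.
    (Illustrative concatenation; LtCMH learns this combination.)"""
    return np.concatenate([direct, individuality, commonality], axis=1)

def binarize(feats):
    """Map real-valued meta features to {-1, +1} hash codes."""
    return np.where(feats >= 0, 1, -1)

# Two items, hypothetical 2-d feature blocks per component.
direct = np.array([[0.5, -0.1], [-0.3, 0.2]])
indiv  = np.array([[0.4,  0.9], [-0.8, 0.1]])
common = np.array([[-0.2, 0.3], [ 0.6, -0.4]])
codes = binarize(meta_features(direct, indiv, common))
```

The sign-based binarization is a common last step in learning-to-hash pipelines: each real-valued dimension becomes one bit of the code.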
Long-tail hashing
Hashing, which represents data items as compact binary codes, has become an increasingly popular technique, e.g., for large-scale image retrieval, owing to its very fast search speed and extremely economical memory consumption. However, existing hashing methods all try to learn binary codes from artificially balanced datasets, which are not commonly available in real-world scenarios. In this paper, we propose Long-Tail Hashing Network (LTHNet), a novel two-stage deep hashing approach that addresses the problem of learning to hash for more realistic datasets where the data labels roughly exhibit a long-tail distribution. Specifically, the first stage learns relaxed embeddings of the given dataset, with its long-tail characteristic taken into account, via an end-to-end deep neural network; the second stage binarizes those embeddings. A critical part of LTHNet is its dynamic meta-embedding module, extended with a determinantal point process, which can adaptively realize visual knowledge transfer between head and tail classes and thus enrich image representations for hashing. Our experiments show that LTHNet achieves dramatic performance improvements over all state-of-the-art competitors on long-tail datasets, with little or no sacrifice on balanced datasets. Further analyses reveal that, to our surprise, directly manipulating class weights in the loss function has little effect, while the extended dynamic meta-embedding module, the use of cross-entropy loss instead of square loss, and a relatively small training batch size all contribute to LTHNet's success.
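The dynamic meta-embedding idea can be sketched roughly as follows. This is a hedged illustration, not LTHNet's implementation: the function name, the softmax attention, and the random data are assumptions, and the determinantal point process extension mentioned in the abstract is omitted.

```python
import numpy as np

def dynamic_meta_embedding(direct, memory):
    """Enrich a direct feature with a weighted sum of class-centroid
    "memory" vectors, letting tail classes borrow visual knowledge
    from head classes.
    direct: (d,) feature vector; memory: (num_classes, d) centroids."""
    attn = np.exp(memory @ direct)
    attn /= attn.sum()               # softmax attention over centroids
    return direct + attn @ memory    # enriched (meta) embedding

rng = np.random.default_rng(0)
memory = rng.normal(size=(5, 8))     # 5 hypothetical class centroids
direct = rng.normal(size=8)
meta = dynamic_meta_embedding(direct, memory)
```

The point of the design is that a tail-class image, whose direct feature is learned from few examples, still gets a representation informed by centroids estimated from data-rich head classes.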
Fast and Powerful Hashing using Tabulation
Randomized algorithms are often enjoyed for their simplicity, but the hash
functions employed to yield the desired probabilistic guarantees are often too
complicated to be practical. Here we survey recent results on how simple
hashing schemes based on tabulation provide unexpectedly strong guarantees.
Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as
consisting of c characters, and we have precomputed character tables h_1, ...,
h_c mapping characters to random hash values. A key x = (x_1, ..., x_c) is
hashed to h_1[x_1] ⊕ h_2[x_2] ⊕ ... ⊕ h_c[x_c]. This scheme is
very fast with character tables in cache. While simple tabulation is not even
4-independent, it does provide many of the guarantees that are normally
obtained via higher independence, e.g., linear probing and Cuckoo hashing.
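The scheme above is short enough to sketch directly. In this hedged example the parameters are assumptions for concreteness: 32-bit keys split into c = 4 byte-sized characters, with one 256-entry table of random 32-bit values per character.

```python
import random

# Simple tabulation hashing (Zobrist 1970): view a key as c characters
# and XOR together one precomputed random table entry per character.
C = 4        # characters per key: 4 bytes of a 32-bit key (assumption)
BITS = 32    # output hash width (assumption)

random.seed(0)  # fixed seed so the sketch is reproducible
tables = [[random.getrandbits(BITS) for _ in range(256)] for _ in range(C)]

def simple_tabulation(key):
    """Hash a 32-bit key: split it into bytes x_1..x_4 and return
    h_1[x_1] ^ h_2[x_2] ^ h_3[x_3] ^ h_4[x_4]."""
    h = 0
    for i in range(C):
        h ^= tables[i][(key >> (8 * i)) & 0xFF]
    return h
```

The per-key work is just C table lookups and XORs, which is why the scheme is so fast when the tables fit in cache.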
Next we consider twisted tabulation where one input character is "twisted" in
a simple way. The resulting hash function has powerful distributional
properties: Chernoff-Hoeffding type tail bounds and a very small bias for
min-wise hashing. This also yields an extremely fast pseudo-random number
generator that is provably good for many classic randomized algorithms and
data-structures.
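The twist can be sketched as follows. This is only an illustration of the idea, not the paper's exact construction: the table layout, the 8-bit twist width, and the choice of twisting the last character are assumptions. The tables for all but one character also return a few "twist" bits, which are XORed into the remaining character before its own lookup.

```python
import random

random.seed(1)
C, BITS, TWIST_BITS = 4, 32, 8   # widths are assumptions for the sketch

# Tables for the first C-1 characters return (twist, hash) pairs;
# the table for the last character returns plain hash values.
head = [[(random.getrandbits(TWIST_BITS), random.getrandbits(BITS))
         for _ in range(256)] for _ in range(C - 1)]
tail = [random.getrandbits(BITS) for _ in range(256)]

def twisted_tabulation(key):
    """Twisted tabulation: accumulate twist bits from the head
    characters, XOR them into the last character, then finish
    with that character's table lookup."""
    twist, h = 0, 0
    for i in range(C - 1):
        t, v = head[i][(key >> (8 * i)) & 0xFF]
        twist ^= t
        h ^= v
    last = ((key >> (8 * (C - 1))) & 0xFF) ^ twist
    return h ^ tail[last]
```

The extra cost over simple tabulation is essentially one XOR per key, yet the twist is what buys the Chernoff-Hoeffding-type tail bounds mentioned above.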
Finally, we consider double tabulation where we compose two simple tabulation
functions, applying one to the output of the other, and show that this yields
very high independence in the classic framework of Carter and Wegman [1977]. In
fact, w.h.p., for a given set of size proportional to that of the space
consumed, double tabulation gives fully-random hashing. We also mention some
more elaborate tabulation schemes getting near-optimal independence for given
time and space.
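Structurally, the composition is simple to sketch. One caveat as a hedge: in the actual double tabulation scheme the first function expands the key into more output characters than it has input characters; the sketch below keeps the widths equal (32-bit in, 32-bit out) purely for brevity.

```python
import random

random.seed(2)
C, BITS = 4, 32  # widths are assumptions for the sketch

def make_tables():
    """One 256-entry table of random BITS-bit values per character."""
    return [[random.getrandbits(BITS) for _ in range(256)] for _ in range(C)]

def simple_tab(tables, key):
    """Simple tabulation: XOR of one table lookup per byte of the key."""
    h = 0
    for i in range(C):
        h ^= tables[i][(key >> (8 * i)) & 0xFF]
    return h

t1, t2 = make_tables(), make_tables()

def double_tabulation(key):
    """Compose two simple tabulation functions: hash the key, then
    view the 32-bit output as 4 new characters and hash it again."""
    return simple_tab(t2, simple_tab(t1, key))
```

Each evaluation is still just table lookups and XORs, so the composed function remains cheap while gaining much stronger independence guarantees.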
While these tabulation schemes are all easy to implement and use, their
analysis is not.
Topological Quantum Gate Construction by Iterative Pseudogroup Hashing
We describe the hashing technique to obtain a fast approximation of a target
quantum gate in the unitary group SU(2) represented by a product of the
elements of a universal basis. The hashing exploits the structure of the
icosahedral group [or other finite subgroups of SU(2)] and its pseudogroup
approximations to reduce the search within a small number of elements. One of
the main advantages of the pseudogroup hashing is the possibility to iterate to
obtain more accurate representations of the targets in the spirit of the
renormalization group approach. We describe the iterative pseudogroup hashing
algorithm using the universal basis given by the braidings of Fibonacci anyons.
An analysis of the efficiency of the iterations, based on random matrix
theory, indicates that the runtime and braid length scale
poly-logarithmically with the final error, comparing favorably to the
Solovay-Kitaev algorithm.
Comment: 20 pages, 5 figures.