
    Long-tail Cross Modal Hashing

    Existing Cross Modal Hashing (CMH) methods are mainly designed for balanced data, whereas imbalanced data with a long-tail distribution is more common in the real world. Several long-tail hashing methods have been proposed, but they cannot be adapted to multi-modal data because of the complex interplay among labels and the individuality and commonality information of multi-modal data. Furthermore, CMH methods mostly mine the commonality of multi-modal data to learn hash codes, which may override tail labels encoded by the individuality of the respective modalities. In this paper, we propose LtCMH (Long-tail CMH) to handle imbalanced multi-modal data. LtCMH first adopts auto-encoders to mine the individuality and commonality of different modalities, by minimizing the dependency between the individualities of the respective modalities and by enhancing the commonality of these modalities. It then dynamically combines the individuality and commonality with direct features extracted from the respective modalities to create meta features that enrich the representation of tail labels, and binarizes the meta features to generate hash codes. LtCMH significantly outperforms state-of-the-art baselines on long-tail datasets and achieves better (or comparable) performance on datasets with balanced labels. Comment: Accepted by the Thirty-Seventh AAAI Conference on Artificial Intelligence (AAAI 2023)
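    The last two steps (fusing features into meta features, then binarizing them into hash codes that can be compared across modalities) can be illustrated with a minimal Python sketch. It is not LtCMH itself: the learned auto-encoders are replaced by given feature vectors, and the dynamic combination is replaced by a hypothetical fixed scalar `gate`. All helper names (`meta_feature`, `binarize`, `hamming`, `retrieve`) and all numbers are illustrative assumptions.

```python
from typing import List, Sequence


def meta_feature(direct: Sequence[float], individual: Sequence[float],
                 common: Sequence[float], gate: float) -> List[float]:
    """Hypothetical fusion step: enrich the direct features with the
    individuality and commonality features. LtCMH combines them dynamically
    (learned); a fixed scalar `gate` stands in for that here."""
    return [d + gate * (i + c) for d, i, c in zip(direct, individual, common)]


def binarize(v: Sequence[float]) -> List[int]:
    """sign() binarization: non-negative entries -> 1, negative -> 0."""
    return [1 if x >= 0 else 0 for x in v]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query: Sequence[int], database: List[List[int]]) -> List[int]:
    """Database indices ranked by Hamming distance to the query code."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))


# Cross-modal use: an image query ranked against text items (made-up numbers).
img_code = binarize(meta_feature([0.9, -0.4, 0.3, -0.7], [0.2, 0.1, -0.1, 0.0],
                                 [0.3, -0.2, 0.2, -0.1], gate=0.5))
zeros = [0.0] * 4
text_codes = [
    binarize(meta_feature([-0.8, 0.5, -0.6, 0.4], zeros, zeros, gate=0.5)),
    binarize(meta_feature([0.7, -0.3, 0.5, -0.9], zeros, zeros, gate=0.5)),
]
print(retrieve(img_code, text_codes))  # -> [1, 0]
```

    Because both modalities end up in the same binary code space, an image code can be ranked directly against text codes with a Hamming distance.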

    Long-tail hashing

    Hashing, which represents data items as compact binary codes, has become an increasingly popular technique, e.g., for large-scale image retrieval, owing to its very fast search speed and extremely economical memory consumption. However, existing hashing methods all try to learn binary codes from artificially balanced datasets, which are not commonly available in real-world scenarios. In this paper, we propose the Long-Tail Hashing Network (LTHNet), a novel two-stage deep hashing approach that addresses learning to hash for more realistic datasets whose labels roughly follow a long-tail distribution. Specifically, the first stage learns relaxed embeddings of the given dataset, taking its long-tail characteristic into account, via an end-to-end deep neural network; the second stage binarizes the obtained embeddings. A critical part of LTHNet is its dynamic meta-embedding module extended with a determinantal point process, which can adaptively transfer visual knowledge between head and tail classes and thus enrich image representations for hashing. Our experiments show that LTHNet achieves dramatic performance improvements over all state-of-the-art competitors on long-tail datasets, with little or no sacrifice on balanced datasets. Further analyses reveal that, surprisingly, directly manipulating class weights in the loss function has little effect, whereas the extended dynamic meta-embedding module, the use of cross-entropy loss instead of square loss, and the relatively small training batch size all contribute to LTHNet's success.
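    The speed and memory argument, together with the second stage (binarizing relaxed embeddings), can be made concrete with a generic Python sketch; it does not model LTHNet's network or its meta-embedding module. Each embedding is binarized with sign() and packed into a single integer (one bit per dimension, so a 64-bit code fits in one machine word), and the Hamming distance reduces to an XOR followed by a bit count. The helper names and the embeddings are illustrative assumptions.

```python
from typing import List, Sequence


def pack_code(embedding: Sequence[float]) -> int:
    """Second stage (sketch): binarize a relaxed embedding with sign() and
    pack it into one integer, bit i set when embedding[i] >= 0."""
    code = 0
    for i, x in enumerate(embedding):
        if x >= 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """XOR marks the disagreeing bits; count them (a popcount)."""
    return bin(a ^ b).count("1")


def knn(query: int, database: List[int], k: int) -> List[int]:
    """Indices of the k codes closest to the query in Hamming distance."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))[:k]


# Relaxed embeddings as a first stage might output them (made-up numbers).
database = [pack_code(e) for e in ([0.9, -0.2, 0.4, -0.7],
                                   [-0.5, 0.3, -0.1, 0.8],
                                   [0.6, -0.4, -0.3, -0.2])]
query = pack_code([0.8, -0.1, 0.2, -0.5])
print(database, knn(query, database, k=2))  # -> [5, 10, 1] [0, 2]
```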

    Fast and Powerful Hashing using Tabulation

    Randomized algorithms are often enjoyed for their simplicity, but the hash functions used to obtain the desired probabilistic guarantees are often too complicated to be practical. Here we survey recent results on how simple hashing schemes based on tabulation provide unexpectedly strong guarantees. Simple tabulation hashing dates back to Zobrist [1970]. Keys are viewed as consisting of $c$ characters, and we have precomputed character tables $h_1, \dots, h_c$ mapping characters to random hash values. A key $x = (x_1, \dots, x_c)$ is hashed to $h_1[x_1] \oplus h_2[x_2] \oplus \cdots \oplus h_c[x_c]$. This scheme is very fast with the character tables in cache. While simple tabulation is not even 4-independent, it does provide many of the guarantees that are normally obtained via higher independence, e.g., for linear probing and Cuckoo hashing. Next we consider twisted tabulation, where one input character is "twisted" in a simple way. The resulting hash function has powerful distributional properties: Chernoff-Hoeffding type tail bounds and a very small bias for min-wise hashing. This also yields an extremely fast pseudo-random number generator that is provably good for many classic randomized algorithms and data structures. Finally, we consider double tabulation, where we compose two simple tabulation functions, applying one to the output of the other, and show that this yields very high independence in the classic framework of Carter and Wegman [1977]. In fact, w.h.p., for a given set of size proportional to that of the space consumed, double tabulation gives fully-random hashing. We also mention some more elaborate tabulation schemes that achieve near-optimal independence for given time and space. While these tabulation schemes are all easy to implement and use, their analysis is not
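    A minimal Python sketch of the three schemes, assuming 32-bit keys split into $c = 4$ eight-bit characters and tables filled from Python's `random` module (the guarantees above assume truly random tables). The twisted variant follows one common way of setting it up: entries of the first $c-1$ tables also carry a twister that is XORed into the last character before its lookup; the bit layout is an illustrative assumption. The double-tabulation sizes (8 derived characters) are likewise arbitrary. The final check shows concretely why simple tabulation is not 4-independent: four keys forming a "rectangle" in two characters always have hash values that XOR to zero.

```python
import random
from typing import List

CHAR_BITS = 8                  # assumed character size: 8 bits
MASK = (1 << CHAR_BITS) - 1
Tables = List[List[int]]


def chars(x: int, c: int) -> List[int]:
    """Split key x into c characters, least significant first."""
    return [(x >> (CHAR_BITS * i)) & MASK for i in range(c)]


def make_tables(rng: random.Random, c: int, out_bits: int) -> Tables:
    """c tables, each mapping every character to a random out_bits-bit value."""
    return [[rng.getrandbits(out_bits) for _ in range(1 << CHAR_BITS)]
            for _ in range(c)]


def simple_tab(x: int, tables: Tables) -> int:
    """Simple tabulation: h(x) = h_1[x_1] ^ h_2[x_2] ^ ... ^ h_c[x_c]."""
    h = 0
    for table, ch in zip(tables, chars(x, len(tables))):
        h ^= table[ch]
    return h


def twisted_tab(x: int, tables: Tables) -> int:
    """Twisted tabulation (sketch): entries of the first c-1 tables carry a
    hash part (high bits) and a CHAR_BITS-bit twister (low bits); the XORed
    twister is applied to the last character before its lookup."""
    cs = chars(x, len(tables))
    acc = 0
    for table, ch in zip(tables[:-1], cs[:-1]):
        acc ^= table[ch]
    twister, h = acc & MASK, acc >> CHAR_BITS
    return h ^ tables[-1][cs[-1] ^ twister]


def double_tab(x: int, first: Tables, second: Tables) -> int:
    """Double tabulation: simple tabulation applied to another's output."""
    return simple_tab(simple_tab(x, first), second)


rng = random.Random(2024)
C, OUT, D = 4, 32, 8           # 32-bit keys, 32-bit hashes, 8 derived characters
T_simple = make_tables(rng, C, OUT)
T_twisted = make_tables(rng, C - 1, OUT + CHAR_BITS) + make_tables(rng, 1, OUT)
T_first = make_tables(rng, C, D * CHAR_BITS)
T_second = make_tables(rng, D, OUT)

# Keys with characters (1,3), (2,3), (1,4), (2,4) in positions 0 and 1: every
# table entry appears an even number of times, so the hashes XOR to zero.
keys = [0x0301, 0x0302, 0x0401, 0x0402]
xor_all = 0
for k in keys:
    xor_all ^= simple_tab(k, T_simple)
print(xor_all)  # -> 0
print(twisted_tab(12345, T_twisted), double_tab(12345, T_first, T_second))
```

    The first stage of double tabulation outputs $D$ characters rather than a final hash, which is what lets the second stage see a key with more characters than the input had.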

    Topological Quantum Gate Construction by Iterative Pseudogroup Hashing

    We describe the hashing technique for obtaining a fast approximation of a target quantum gate in the unitary group SU(2), represented as a product of the elements of a universal basis. The hashing exploits the structure of the icosahedral group [or other finite subgroups of SU(2)] and its pseudogroup approximations to restrict the search to a small number of elements. One of the main advantages of pseudogroup hashing is the possibility of iterating it to obtain more accurate representations of the targets, in the spirit of the renormalization group approach. We describe the iterative pseudogroup hashing algorithm using the universal basis given by the braidings of Fibonacci anyons. An analysis of the efficiency of the iterations based on random matrix theory indicates that the runtime and the braid length scale poly-logarithmically with the final error, comparing favorably to the Solovay-Kitaev algorithm. Comment: 20 pages, 5 figures
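    The lookup step can be sketched in Python under stated assumptions: SU(2) gates are represented as unit quaternions, the finite subgroup is the icosahedral group lifted to SU(2) (the 120 unit icosians, i.e. the binary icosahedral group), and "hashing" a target means picking the element with the largest overlap $|\langle q, t \rangle|$, a measure chosen here for illustration. In the paper the candidates are braid-word approximations of these elements (the pseudogroup) and the search is iterated; neither the braids nor the iteration is shown.

```python
import itertools
import math
import random
from typing import List, Tuple

Quat = Tuple[float, float, float, float]   # (w, x, y, z) = w + x i + y j + z k
PHI = (1 + math.sqrt(5)) / 2


def qmul(p: Quat, q: Quat) -> Quat:
    """Hamilton product; unit quaternions multiply like SU(2) matrices."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
            a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
            a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
            a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2)


def even_perms(n: int = 4):
    """Permutations of range(n) with an even number of inversions."""
    for perm in itertools.permutations(range(n)):
        inv = sum(perm[i] > perm[j] for i in range(n) for j in range(i + 1, n))
        if inv % 2 == 0:
            yield perm


def binary_icosahedral() -> List[Quat]:
    """The 120 unit icosians: the binary icosahedral group as quaternions."""
    elems = set()
    for pos, s in itertools.product(range(4), (1.0, -1.0)):   # +-1, +-i, +-j, +-k
        q = [0.0] * 4
        q[pos] = s
        elems.add(tuple(q))
    elems.update(itertools.product((0.5, -0.5), repeat=4))    # (+-1 +-i +-j +-k)/2
    base = (0.0, 0.5, 0.5 / PHI, 0.5 * PHI)                   # (0, 1, 1/phi, phi)/2
    for perm in even_perms():
        for s1, s2, s3 in itertools.product((1, -1), repeat=3):
            vals = (base[0], s1 * base[1], s2 * base[2], s3 * base[3])
            q = [0.0] * 4
            for src, dst in enumerate(perm):
                q[dst] = vals[src]
            elems.add(tuple(q))
    return sorted(elems)


def overlap(p: Quat, q: Quat) -> float:
    """|<p, q>|: 1 means the same gate up to the global phase -1."""
    return abs(sum(a * b for a, b in zip(p, q)))


def nearest(target: Quat, elements: List[Quat]) -> Quat:
    """The lookup ('hash') step: the element closest to the target gate."""
    return max(elements, key=lambda q: overlap(q, target))


def random_su2(rng: random.Random) -> Quat:
    """Haar-random SU(2) element: a uniformly random point on the 3-sphere."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(x * x for x in v))
    return (v[0] / n, v[1] / n, v[2] / n, v[3] / n)


G = binary_icosahedral()
target = random_su2(random.Random(7))
best = nearest(target, G)
print(len(G), best, overlap(best, target))
```

    A single lookup over 120 elements only gives a coarse approximation; refining it is what the iterative pseudogroup construction described above is for.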