
    ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks

    Full text link
    Hash codes are efficient data representations for coping with the ever-growing amounts of data. In this paper, we introduce a random forest semantic hashing scheme that embeds tiny convolutional neural networks (CNN) into shallow random forests, with near-optimal information-theoretic code aggregation among trees. We start with a simple hashing scheme, where random trees in a forest act as hashing functions by setting '1' for the visited tree leaf, and '0' for the rest. We show that traditional random forests fail to generate hashes that preserve the underlying similarity between the trees, making a naive random forest approach to hashing challenging. To address this, we propose to first randomly group the arriving classes at each tree split node into two groups, obtaining a significantly simplified two-class classification problem that can be handled by a lightweight CNN weak learner. This random class grouping scheme enables code uniqueness by forcing each class to share its code with different classes in different trees. A non-conventional low-rank loss is further adopted for the CNN weak learners to encourage code consistency, minimizing intra-class variations and maximizing inter-class distance for the two random class groups. Finally, we introduce an information-theoretic approach for aggregating the codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. The proposed approach significantly outperforms state-of-the-art hashing methods on image retrieval tasks over large-scale public datasets, and performs at the level of state-of-the-art image classification techniques while using a more compact, efficient, and scalable representation. This work proposes a principled and robust procedure to train and deploy in parallel an ensemble of lightweight CNNs, instead of simply going deeper. Comment: Accepted to ECCV 201
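
    To make the starting point concrete, here is a minimal sketch of the leaf-indicator hashing idea, assuming scikit-learn's plain RandomForestClassifier for illustration; the paper's actual method embeds tiny CNN weak learners at the split nodes and adds information-theoretic code aggregation, both of which this sketch omits.

```python
# Sketch: each tree contributes a one-hot block that is 1 at the visited
# leaf and 0 everywhere else; blocks are concatenated across trees.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

X, y = load_digits(return_X_y=True)
forest = RandomForestClassifier(n_estimators=8, max_depth=4, random_state=0)
forest.fit(X, y)

def forest_hash(forest, X):
    """Concatenate one-hot leaf indicators across all trees into one binary code."""
    leaf_ids = forest.apply(X)                   # shape: (n_samples, n_trees)
    blocks = []
    for t, tree in enumerate(forest.estimators_):
        n_nodes = tree.tree_.node_count          # leaf ids index into all tree nodes
        block = np.zeros((X.shape[0], n_nodes), dtype=np.uint8)
        block[np.arange(X.shape[0]), leaf_ids[:, t]] = 1
        blocks.append(block)
    return np.hstack(blocks)

codes = forest_hash(forest, X)
print(codes.shape)  # (1797, total node count across the 8 trees)
```

    Each sample's code contains exactly one '1' per tree, at the visited leaf, mirroring the simple scheme described in the abstract.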

    MIHash: Online Hashing with Mutual Information

    Full text link
    Learning-based hashing methods are widely used for nearest neighbor retrieval, and recently, online hashing methods have demonstrated good performance-complexity trade-offs by learning hash functions from streaming data. In this paper, we first address a key challenge for online hashing: the binary codes for indexed data must be recomputed to keep pace with updates to the hash functions. We propose an efficient quality measure for hash functions, based on an information-theoretic quantity, mutual information, and use it successfully as a criterion to eliminate unnecessary hash table updates. Next, we show how to optimize the mutual information objective using stochastic gradient descent. We thus develop a novel hashing method, MIHash, that can be used in both online and batch settings. Experiments on image retrieval benchmarks (including a 2.5M image dataset) confirm the effectiveness of our formulation, both in reducing hash table recomputations and in learning high-quality hash functions. Comment: International Conference on Computer Vision (ICCV), 201
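
    To make the mutual information criterion concrete, the following is a rough numpy sketch (an illustration of the idea, not the authors' implementation) that scores a set of binary codes by I(C; D), the mutual information between neighbor membership C and Hamming distance D to an anchor item; higher values mean the codes separate neighbors from non-neighbors better.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mi_quality(codes, is_neighbor, anchor):
    """codes: (n, b) binary array; is_neighbor: (n,) bool mask w.r.t. the anchor."""
    d = np.sum(codes != codes[anchor], axis=1)   # Hamming distances to the anchor
    b = codes.shape[1]
    bins = np.arange(b + 2)                      # distances take values 0..b
    p_d = np.histogram(d, bins=bins)[0] / len(d)
    p_pos = np.mean(is_neighbor)
    p_d_pos = np.histogram(d[is_neighbor], bins=bins)[0] / max(is_neighbor.sum(), 1)
    p_d_neg = np.histogram(d[~is_neighbor], bins=bins)[0] / max((~is_neighbor).sum(), 1)
    # I(C; D) = H(D) - H(D | C)
    return entropy(p_d) - (p_pos * entropy(p_d_pos) + (1 - p_pos) * entropy(p_d_neg))

# Toy usage on random codes and labels.
rng = np.random.default_rng(0)
codes = rng.integers(0, 2, size=(100, 16))
labels = rng.integers(0, 5, size=100)
print(mi_quality(codes, labels == labels[0], anchor=0))
```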

    Hashing for Similarity Search: A Survey

    Full text link
    Similarity search (nearest neighbor search) is the problem of finding, in a large database, the data items whose distances to a query item are smallest. Various methods have been developed to address this problem, and recently much effort has been devoted to approximate search. In this paper, we present a survey of one of the main solutions, hashing, which has been widely studied since the pioneering work on locality sensitive hashing. We divide hashing algorithms into two main categories: locality sensitive hashing, which designs hash functions without exploring the data distribution, and learning to hash, which learns hash functions according to the data distribution. We review them from various aspects, including hash function design, distance measures, and search schemes in the hash coding space.
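
    As a minimal example of the first category, here is a sketch of a classic locality sensitive hashing family, signed random projections for cosine similarity; the hyperplanes are drawn once, independently of the data, which is exactly what learning-to-hash methods change by fitting the projections to the data distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

def lsh_hash(X, planes):
    """One bit per random hyperplane: the sign of the projection."""
    return (X @ planes.T > 0).astype(np.uint8)

dim, n_bits = 64, 16
planes = rng.standard_normal((n_bits, dim))  # data-independent hash functions
X = rng.standard_normal((1000, dim))         # toy database
codes = lsh_hash(X, planes)

# Search: rank database items by Hamming distance to the query's code.
query = rng.standard_normal(dim)
q_code = lsh_hash(query[None, :], planes)[0]
nearest = np.argsort(np.sum(codes != q_code, axis=1))[:10]
print(nearest)
```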

    Can Quantum Key Distribution Be Secure

    Full text link
    The importance of quantum key distribution as a cryptographic method depends upon its purported strong security guarantee. The following gives reasons why such a strong security guarantee has not been validly established, and why good QKD security is difficult to obtain. Comment: This new version is a rewriting of the last v1 for a broader group of readers. It also contains a new specific counter-example not in v

    Achievable secrecy enhancement through joint encryption and privacy amplification

    Get PDF
    In this dissertation we seek to enhance the secrecy of communications by resorting to both cryptographic and information-theoretic tools and metrics. Our objective is to unify tools and measures from the cryptography community with techniques and metrics from the information theory community that are used to provide privacy and confidentiality in communication systems. For this purpose we adopt encryption techniques accompanied by privacy amplification tools in order to achieve secrecy goals defined in terms of both information-theoretic and cryptographic metrics.

    Every secrecy scheme relies on a certain advantage for legitimate users over adversaries, viewed as an asymmetry in the system, to deliver the required security for data transmission. In all of the schemes proposed in this dissertation, we resort to either an inherently existing asymmetry in the system or a proactively created advantage for legitimate users over a passive eavesdropper to further enhance the secrecy of communications. This advantage is exploited by means of privacy amplification and encryption tools to achieve the system's secrecy goals, evaluated with information-theoretic and cryptographic metrics.

    In our first work, discussed in Chapter 2, and the third work, explained in Chapter 4, we rely on a proactively established advantage for legitimate users based on the eavesdropper's lack of knowledge about a shared source of data. Unlike these works, which assume an error-free physical channel, the second work, discussed in Chapter 3, considers a correlated erasure wiretap channel model. This work relies on a passive, internally existing advantage for legitimate users built upon the statistical and partial independence of the eavesdropper's channel errors from the errors in the main channel. We arrive at this secrecy advantage for legitimate users by exploiting an authenticated but insecure feedback channel.

    From the perspective of the tools used, the first work, discussed in Chapter 2, considers a specific scenario in which the secrecy enhancement of a particular block cipher, the Data Encryption Standard (DES) operating in cipher feedback mode (CFB), is studied. This secrecy enhancement is achieved by means of deliberate noise injection and wiretap channel encoding as a privacy amplification technique against a resource-constrained eavesdropper. Compared to the first work, the third work considers a more general framework in terms of both metrics and secrecy tools: it studies the secrecy enhancement of a general cipher based on universal hashing as a privacy amplification technique against an unbounded adversary. In this work we achieve the goal of exponential secrecy, where the information leakage to the adversary, assessed both as mutual information (an information-theoretic measure) and as Eve's distinguishability (a cryptographic metric), decays at an exponential rate. In the second work, encrypted data frames are transmitted through an Automatic Repeat reQuest (ARQ) protocol to generate a common random source between the legitimate users, which is later transformed into information-theoretically secure encryption keys by means of privacy amplification based on universal hashing.

    Towards the end, future work extending the research accomplished in this dissertation is outlined. Proofs of major theorems and lemmas are presented in the Appendix.
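
    The privacy amplification step that recurs throughout can be illustrated with a standard 2-universal construction, a random binary Toeplitz matrix; the following toy sketch is illustrative only (the hash family choice and the string/key lengths are assumptions, not values from the dissertation). It compresses a partially secret shared string into a shorter key that is nearly uniform from the adversary's point of view.

```python
import numpy as np

rng = np.random.default_rng(42)

def toeplitz_hash(bits, out_len, seed_bits):
    """Multiply `bits` by an out_len x len(bits) Toeplitz matrix over GF(2).

    A Toeplitz matrix is determined by its first column and first row,
    so it needs only out_len + len(bits) - 1 (public) seed bits.
    """
    n = len(bits)
    assert len(seed_bits) == out_len + n - 1
    out = np.zeros(out_len, dtype=np.uint8)
    for i in range(out_len):
        # Row i of the Toeplitz matrix is a reversed sliding window of the seed.
        row = seed_bits[i:i + n][::-1]
        out[i] = np.bitwise_xor.reduce(row & bits)  # inner product over GF(2)
    return out

shared = rng.integers(0, 2, size=256, dtype=np.uint8)            # partially secret string
seed = rng.integers(0, 2, size=128 + 256 - 1, dtype=np.uint8)    # public random seed
key = toeplitz_hash(shared, 128, seed)                           # distilled 128-bit key
print(key[:16])
```

    The seed can be exchanged in the clear: by the leftover hash lemma, the output remains close to uniform given the adversary's partial information about the input string.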