
10 research outputs found

Unsupervised Semantic Hashing with Pairwise Reconstruction

Author: Alstrup Stephen
Hansen Casper
Hansen Christian
Lioma Christina
Simonsen Jakob Grue
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Semantic Hashing is a popular family of methods for efficient similarity search in large-scale datasets. In Semantic Hashing, documents are encoded as short binary vectors (i.e., hash codes), such that semantic similarity can be efficiently computed using the Hamming distance. Recent state-of-the-art approaches have utilized weak supervision to train better performing hashing models. Inspired by this, we present Semantic Hashing with Pairwise Reconstruction (PairRec), which is a discrete variational autoencoder based hashing model. PairRec first encodes weakly supervised training pairs (a query document and a semantically similar document) into two hash codes, and then learns to reconstruct the same query document from both of these hash codes (i.e., pairwise reconstruction). This pairwise reconstruction enables our model to encode local neighbourhood structures within the hash code directly through the decoder. We experimentally compare PairRec to traditional and state-of-the-art approaches, and obtain significant performance improvements in the task of document similarity search.
Comment: Accepted at SIGIR'2
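The abstract above relies on the Hamming distance between binary hash codes as the similarity measure. As a rough illustration only (not the PairRec model, which learns its codes with a variational autoencoder), the lookup step could be sketched as follows. The function names, document ids, and 8-bit toy codes are all assumptions made up for this example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as integers."""
    return bin(a ^ b).count("1")


def nearest_codes(query: int, codes: dict, k: int = 2) -> list:
    """Return the k (doc_id, distance) pairs closest to the query code."""
    ranked = sorted(
        codes.items(), key=lambda item: hamming_distance(query, item[1])
    )
    return [(doc_id, hamming_distance(query, code)) for doc_id, code in ranked[:k]]


# Hypothetical 8-bit codes; real codes would come from a trained hashing model.
toy_codes = {
    "doc_a": 0b10110010,
    "doc_b": 0b10110011,
    "doc_c": 0b01001101,
}
query_code = 0b10110000

result = nearest_codes(query_code, toy_codes, k=2)
print(result)  # [('doc_a', 1), ('doc_b', 2)]
```

Storing each code as an integer keeps the comparison to one XOR and a bit count, which is the reason Hamming-based search is cheap compared with real-valued vector similarity.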

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

Content-aware Neural Hashing for Cold-start Recommendation

Author: Abadi Martín
Bayer Immanuel
Chaidaroon Suthee
Gionis Aristides
Hansen Casper
Karatzoglou Alexandros
Kingma Diederik P
Lian Defu
Liu Chenghao
Shen Dinghan
Wang Benyou
Zhang Yan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Content-aware recommendation approaches are essential for providing meaningful recommendations for new (i.e., cold-start) items in a recommender system. We present a content-aware neural hashing-based collaborative filtering approach (NeuHash-CF), which generates binary hash codes for users and items, such that the highly efficient Hamming distance can be used for estimating user-item relevance. NeuHash-CF is modelled as an autoencoder architecture, consisting of two joint hashing components for generating user and item hash codes. Inspired by semantic hashing, the item hashing component generates a hash code directly from an item's content information (i.e., it generates cold-start and seen item hash codes in the same manner). This contrasts with existing state-of-the-art models, which treat the two item cases separately. The user hash codes are generated directly from the user id, by learning a user embedding matrix. We show experimentally that NeuHash-CF significantly outperforms state-of-the-art baselines by up to 12% NDCG and 13% MRR in cold-start recommendation settings, and by up to 4% in both NDCG and MRR in standard settings where all items are present during training. Our approach uses 2-4x shorter hash codes while obtaining the same or better performance compared to the state of the art, thus also enabling a notable storage reduction.
Comment: Accepted to SIGIR 202

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System

FSpH: Fitted spectral hashing for efficient similarity search

Author: HOI Steven C. H.
LI Jin-Tao
TANG Sheng
WANG Yu
ZHANG Yong-Dong
Publication venue: 'Elsevier BV'
Publication date: 01/07/2014

Crossref

Institutional Knowledge at Singapore Management University

Sparse hashing for fast multimedia search

Author: Gao S.
He X.
Heng Tao Shen
Hong Cheng
Jain P.
Jiangtao Cui
Kulis B.
Lee H.
Liu W.
Mu Y.
Muja M.
Norouzi M. E.
Raginsky M.
Tibshirani R.
Torralba A.
Wang J.
Weiss Y.
Wu M.
Xiaofeng Zhu
Zass R.
Zi Huang
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Large-Margin Learning of Compact Binary Image Encodings

Author: Anton van den Hengel
Chunhua Shen
Sakrapee Paisitkriangkrai
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Extending low-rank matrix factorizations for emerging applications

Author: Zhou Ke
Publication venue: Georgia Institute of Technology
Publication date: 13/01/2014

Low-rank matrix factorizations have become increasingly popular for projecting high-dimensional data into low-dimensional latent spaces in order to obtain a better understanding of the data and thus more accurate predictions. In particular, they have been widely applied to important applications such as collaborative filtering and social network analysis. In this thesis, I investigate applications and extensions of the ideas of low-rank matrix factorization to solve several practically important problems arising from collaborative filtering and social network analysis. A key challenge in recommendation system research is how to effectively profile new users, a problem generally known as cold-start recommendation. In the first part of this work, we extend low-rank matrix factorization by allowing the latent factors to have more complex structures, namely decision trees, to solve the problem of cold-start recommendation. In particular, we present functional matrix factorization (fMF), a novel cold-start recommendation method that solves the problem of adaptive interview construction based on low-rank matrix factorizations. The second part of this work considers the efficiency problem of making recommendations in the context of large user and item spaces. Specifically, we address the problem through learning binary codes for collaborative filtering, which can be viewed as restricting the latent factors in low-rank matrix factorizations to be binary vectors that represent the binary codes for both users and items. In the third part of this work, we investigate the applications of low-rank matrix factorizations in the context of social network analysis. Specifically, we propose a convex optimization approach to discover the hidden network of social influence with low-rank and sparse structure by modeling the recurrent events at different individuals as multi-dimensional Hawkes processes, emphasizing the mutual-excitation nature of the dynamics of event occurrences. The proposed framework combines the estimation of mutually exciting processes and low-rank matrix factorization in a principled manner. In the fourth part of this work, we estimate the triggering kernels for the Hawkes process. In particular, we focus on estimating the triggering kernels from an infinite-dimensional functional space with the Euler-Lagrange equation, which can be viewed as applying the idea of low-rank factorizations in the functional space.
Ph.D
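The second part of the abstract above describes restricting the latent factors to binary vectors. One standard fact that makes such codes convenient is that, for codes with entries in {-1, +1} and length m, the inner product equals m minus twice the Hamming distance, so ranking by inner product and ranking by Hamming distance agree. The check below is a small sketch of that identity; it is not taken from the thesis, and the code values and helper names are assumptions chosen for illustration.

```python
def hamming(u, v):
    """Count positions where two equal-length codes differ."""
    return sum(1 for a, b in zip(u, v) if a != b)


def dot(u, v):
    """Inner product of two equal-length codes."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical 6-bit user and item codes with entries in {-1, +1}.
user_code = [1, -1, 1, 1, -1, 1]
item_code = [1, 1, 1, -1, -1, 1]
m = len(user_code)

# Matching positions contribute +1 and differing positions contribute -1,
# so dot = (m - d) - d = m - 2 * d, where d is the Hamming distance.
identity_holds = dot(user_code, item_code) == m - 2 * hamming(user_code, item_code)
print(hamming(user_code, item_code), dot(user_code, item_code), identity_holds)
```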

Scholarly Materials And Research @ Georgia Tech

Laplacian co-hashing of terms and documents

Author: Cai D.
Lu J.
Wang J.
Zhang Dell
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010

A promising way to accelerate similarity search is semantic hashing, which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes within a short Hamming distance. In this paper, we introduce the novel problem of co-hashing, where both documents and terms are hashed simultaneously according to their semantic similarities. Furthermore, we propose a novel algorithm, Laplacian Co-Hashing (LCH), to solve this problem, which directly optimises the Hamming distance

CiteSeerX

UCL Discovery

Birkbeck Institutional Research Online