21,091 research outputs found
Variational Deep Semantic Hashing for Text Documents
As the amount of textual data has been rapidly increasing over the past
decade, efficient similarity search methods have become a crucial component of
large-scale information retrieval systems. A popular strategy is to represent
original data samples by compact binary codes through hashing. A spectrum of
machine learning methods have been utilized, but they often lack expressiveness
and flexibility in modeling to learn effective representations. The recent
advances of deep learning in a wide range of applications has demonstrated its
capability to learn robust and powerful feature representations for complex
data. Especially, deep generative models naturally combine the expressiveness
of probabilistic generative models with the high capacity of deep neural
networks, which is very suitable for text modeling. However, little work has
leveraged the recent progress in deep learning for text hashing.
In this paper, we propose a series of novel deep document generative models
for text hashing. The first proposed model is unsupervised while the second one
is supervised by utilizing document labels/tags for hashing. The third model
further considers document-specific factors that affect the generation of
words. The probabilistic generative formulation of the proposed models provides
a principled framework for model extension, uncertainty estimation, simulation,
and interpretability. Based on variational inference and reparameterization,
the proposed models can be interpreted as encoder-decoder deep neural networks
and thus they are capable of learning complex nonlinear distributed
representations of the original documents. We conduct a comprehensive set of
experiments on four public testbeds. The experimental results have demonstrated
the effectiveness of the proposed supervised learning models for text hashing.Comment: 11 pages, 4 figure
Context-aware person identification in personal photo collections
Identifying the people in photos is an important need for users of photo management systems. We present MediAssist, one such system which facilitates browsing, searching and semi-automatic annotation of personal photos, using analysis of both image content and the context in which the photo is captured. This semi-automatic annotation includes annotation of the identity of people in photos. In this paper, we focus on such person annotation, and propose person identification techniques based on a combination of context and content. We propose language modelling and nearest neighbor approaches to context-based person identification, in addition to novel face color and image color content-based features (used alongside face recognition and body patch features). We conduct a comprehensive empirical study of these techniques using the real private photo collections of a number of users, and show that combining context- and content-based analysis improves performance over content or context alone
Thouless-Anderson-Palmer equation for analog neural network with temporally fluctuating white synaptic noise
Effects of synaptic noise on the retrieval process of associative memory
neural networks are studied from the viewpoint of neurobiological and
biophysical understanding of information processing in the brain. We
investigate the statistical mechanical properties of stochastic analog neural
networks with temporally fluctuating synaptic noise, which is assumed to be
white noise. Such networks, in general, defy the use of the replica method,
since they have no energy concept. The self-consistent signal-to-noise analysis
(SCSNA), which is an alternative to the replica method for deriving a set of
order parameter equations, requires no energy concept and thus becomes
available in studying networks without energy functions. Applying the SCSNA to
stochastic network requires the knowledge of the Thouless-Anderson-Palmer (TAP)
equation which defines the deterministic networks equivalent to the original
stochastic ones. The study of the TAP equation which is of particular interest
for the case without energy concept is very few, while it is closely related to
the SCSNA in the case with energy concept. This paper aims to derive the TAP
equation for networks with synaptic noise together with a set of order
parameter equations by a hybrid use of the cavity method and the SCSNA.Comment: 13 pages, 3 figure
Efficient Correlated Topic Modeling with Topic Embedding
Correlated topic modeling has been limited to small model and problem sizes
due to their high computational cost and poor scaling. In this paper, we
propose a new model which learns compact topic embeddings and captures topic
correlations through the closeness between the topic vectors. Our method
enables efficient inference in the low-dimensional embedding space, reducing
previous cubic or quadratic time complexity to linear w.r.t the topic size. We
further speedup variational inference with a fast sampler to exploit sparsity
of topic occurrence. Extensive experiments show that our approach is capable of
handling model and data scales which are several orders of magnitude larger
than existing correlation results, without sacrificing modeling quality by
providing competitive or superior performance in document classification and
retrieval.Comment: KDD 2017 oral. The first two authors contributed equall
Share your Model instead of your Data: Privacy Preserving Mimic Learning for Ranking
Deep neural networks have become a primary tool for solving problems in many
fields. They are also used for addressing information retrieval problems and
show strong performance in several tasks. Training these models requires large,
representative datasets and for most IR tasks, such data contains sensitive
information from users. Privacy and confidentiality concerns prevent many data
owners from sharing the data, thus today the research community can only
benefit from research on large-scale datasets in a limited manner. In this
paper, we discuss privacy preserving mimic learning, i.e., using predictions
from a privacy preserving trained model instead of labels from the original
sensitive training data as a supervision signal. We present the results of
preliminary experiments in which we apply the idea of mimic learning and
privacy preserving mimic learning for the task of document re-ranking as one of
the core IR tasks. This research is a step toward laying the ground for
enabling researchers from data-rich environments to share knowledge learned
from actual users' data, which should facilitate research collaborations.Comment: SIGIR 2017 Workshop on Neural Information Retrieval
(Neu-IR'17)}{}{August 7--11, 2017, Shinjuku, Tokyo, Japa
- …