3,826 research outputs found
Connection Discovery using Shared Images by Gaussian Relational Topic Model
Social graphs, representing online friendships among users, are one of the
fundamental types of data for many applications, such as recommendation,
virality prediction and marketing in social media. However, this data may be
unavailable due to the privacy concerns of users, or kept private by social
network operators, which makes such applications difficult. Inferring user
interests and discovering user connections through their shared multimedia
content has attracted more and more attention in recent years. This paper
proposes a Gaussian relational topic model for connection discovery using user
shared images in social media. The proposed model not only models user
interests as latent variables through their shared images, but also considers
the connections between users as a result of their shared images. It explicitly
relates user shared images to user connections in a hierarchical, systematic
and supervisory way and provides an end-to-end solution for the problem. This
paper also derives efficient variational inference and learning algorithms for
the posterior of the latent variables and model parameters. It is demonstrated
through experiments with over 200k images from Flickr that the proposed method
significantly outperforms the methods in previous works.Comment: IEEE International Conference on Big Data 201
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field
Context-Dependent Diffusion Network for Visual Relationship Detection
Visual relationship detection can bridge the gap between computer vision and
natural language for scene understanding of images. Different from pure object
recognition tasks, the relation triplets of subject-predicate-object lie on an
extreme diversity space, such as \textit{person-behind-person} and
\textit{car-behind-building}, while suffering from the problem of combinatorial
explosion. In this paper, we propose a context-dependent diffusion network
(CDDN) framework to deal with visual relationship detection. To capture the
interactions of different object instances, two types of graphs, word semantic
graph and visual scene graph, are constructed to encode global context
interdependency. The semantic graph is built through language priors to model
semantic correlations across objects, whilst the visual scene graph defines the
connections of scene objects so as to utilize the surrounding scene
information. For the graph-structured data, we design a diffusion network to
adaptively aggregate information from contexts, which can effectively learn
latent representations of visual relationships and well cater to visual
relationship detection in view of its isomorphic invariance to graphs.
Experiments on two widely-used datasets demonstrate that our proposed method is
more effective and achieves the state-of-the-art performance.Comment: 8 pages, 3 figures, 2018 ACM Multimedia Conference (MM'18
Joint Intermodal and Intramodal Label Transfers for Extremely Rare or Unseen Classes
In this paper, we present a label transfer model from texts to images for
image classification tasks. The problem of image classification is often much
more challenging than text classification. On one hand, labeled text data is
more widely available than the labeled images for classification tasks. On the
other hand, text data tends to have natural semantic interpretability, and they
are often more directly related to class labels. On the contrary, the image
features are not directly related to concepts inherent in class labels. One of
our goals in this paper is to develop a model for revealing the functional
relationships between text and image features as to directly transfer
intermodal and intramodal labels to annotate the images. This is implemented by
learning a transfer function as a bridge to propagate the labels between two
multimodal spaces. However, the intermodal label transfers could be undermined
by blindly transferring the labels of noisy texts to annotate images. To
mitigate this problem, we present an intramodal label transfer process, which
complements the intermodal label transfer by transferring the image labels
instead when relevant text is absent from the source corpus. In addition, we
generalize the inter-modal label transfer to zero-shot learning scenario where
there are only text examples available to label unseen classes of images
without any positive image examples. We evaluate our algorithm on an image
classification task and show the effectiveness with respect to the other
compared algorithms.Comment: The paper has been accepted by IEEE Transactions on Pattern Analysis
and Machine Intelligence. It will apear in a future issu
Recommended from our members
Efficient Variational Inference for Hierarchical Models of Images, Text, and Networks
Variational inference provides a general optimization framework to approximate the posterior distributions of latent variables in probabilistic models. Although effective in simple scenarios, variational inference may be inaccurate or infeasible when the data is high-dimensional, the model structure is complicated, or variable relationships are non-conjugate. We propose solutions to these problems through the smart design and leverage of model structures, the rigorous derivation of variational bounds, and the creation of flexible algorithms for various models with rich, non-conjugate dependencies.Concretely, we first design an interpretable generative model for natural images, in which the hundreds of thousands of pixels per image are split into small patches represented by Gaussian mixture models. Through structured variational inference, the evidence lower bound of this model automatically recovers the popular expected patch log-likelihood method for image processing. A nonparametric extension using hierarchical Dirichlet processes further enables self-similarities to be captured and image-specific clusters created during inference, boosting image denoising and inpainting accuracy.Then we move on to text data, and design hierarchical topic graphs that generalize the bipartite noisy-OR models previously used for medical diagnosis. We derive auxiliary bounds to overcome the non-conjugacy of noisy-OR conditionals, and use stochastic variational inference to efficiently train on datasets with hundreds of thousands of documents. We dramatically increase the algorithm speed through a constrained family of variational bounds, so that only the ancestors of the sparse observed tokens of each document need to be considered.Finally, we propose a general-purpose Monte Carlo variational inference strategy that is directly applicable to any model with discrete variables. Compared to REINFORCE-style stochastic gradient updates, our coordinate-ascent updates have lower variance and converge much faster. Compared to auxiliary-variable bounds crafted for each individual model, our algorithm is simpler to derive and may be easily integrated into probabilistic programming languages for broader use. By avoiding auxiliary variables, we also tighten likelihood bounds and increase robustness to local optima. Extensive experiments on real-world models of images, text, and networks illustrate these appealing advantages
- …