Search CORE

2,473 research outputs found

Composite Correlation Quantization for Efficient Multimodal Retrieval

Author: Besag J.
Wang J.
Wang Q.
Weiss Y.
Wu B.
Zhang D.
Zhang T.
Zhao F.
Zhen Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/05/2016
Field of study

Efficient similarity retrieval from large-scale multimodal database is pervasive in modern search engines and social networks. To support queries across content modalities, the system should enable cross-modal correlation and computation-efficient indexing. While hashing methods have shown great potential in achieving this goal, current attempts generally fail to learn isomorphic hash codes in a seamless scheme, that is, they embed multiple modalities in a continuous isomorphic space and separately threshold embeddings into binary codes, which incurs substantial loss of retrieval accuracy. In this paper, we approach seamless multimodal hashing by proposing a novel Composite Correlation Quantization (CCQ) model. Specifically, CCQ jointly finds correlation-maximal mappings that transform different modalities into isomorphic latent space, and learns composite quantizers that convert the isomorphic latent features into compact binary codes. An optimization framework is devised to preserve both intra-modal similarity and inter-modal correlation through minimizing both reconstruction and quantization errors, which can be trained from both paired and partially paired data in linear time. A comprehensive set of experiments clearly show the superior effectiveness and efficiency of CCQ against the state of the art hashing methods for both unimodal and cross-modal retrieval

arXiv.org e-Print Archive

Crossref

Terminology mining in social media

Author: Karlgren Jussi
Sahlgren Magnus
Publication venue
Publication date: 01/01/2009
Field of study

The highly variable and dynamic word usage in social media presents serious challenges for both research and those commercial applications that are geared towards blogs or other user-generated non-editorial texts. This paper discusses and exempliﬁes a terminology mining approach for dealing with the productive character of the textual environment in social media. We explore the challenges of practically acquiring new terminology, and of modeling similarity and relatedness of terms from observing realistic amounts of data. We also discuss semantic evolution and density, and investigate novel measures for characterizing the preconditions for terminology mining

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Nonnegative approximations of nonnegative tensors

Author: Comon Pierre
Lim Lek-Heng
Publication venue
Publication date: 01/01/2009
Field of study

We study the decomposition of a nonnegative tensor into a minimal sum of outer product of nonnegative vectors and the associated parsimonious naive Bayes probabilistic model. We show that the corresponding approximation problem, which is central to nonnegative PARAFAC, will always have optimal solutions. The result holds for any choice of norms and, under a mild assumption, even Bregman divergences.Comment: 14 page

arXiv.org e-Print Archive

CiteSeerX

HAL-UNICE

Exploring the feasability and accuracy of Latent Semantic Analysis based text mining techniques to detect similarity between patent documents and scientific publications.

Author: Magerman Tom
Song Xiaoyan
Van Looy Bart
Publication venue
Publication date
Field of study

Research Papers in Economics

Multidimensional Membership Mixture Models

Author: Jiang Yun
Lim Marcus
Saxena Ashutosh
Publication venue
Publication date: 01/01/2012
Field of study

We present the multidimensional membership mixture (M3) models where every dimension of the membership represents an independent mixture model and each data point is generated from the selected mixture components jointly. This is helpful when the data has a certain shared structure. For example, three unique means and three unique variances can effectively form a Gaussian mixture model with nine components, while requiring only six parameters to fully describe it. In this paper, we present three instantiations of M3 models (together with the learning and inference algorithms): infinite, finite, and hybrid, depending on whether the number of mixtures is fixed or not. They are built upon Dirichlet process mixture models, latent Dirichlet allocation, and a combination respectively. We then consider two applications: topic modeling and learning 3D object arrangements. Our experiments show that our M3 models achieve better performance using fewer topics than many classic topic models. We also observe that topics from the different dimensions of M3 models are meaningful and orthogonal to each other.Comment: 9 pages, 7 figure

arXiv.org e-Print Archive

CiteSeerX

Effect of Tuned Parameters on a LSA MCQ Answering Model

Author: A. C. Graesser
Alain Lifchitz
C. H. Q. Ding
D. I. Martin
G. Denhière
G. Salton
G. Salton
Guy Denhière
J. Diaz
J. Diaz
J. Quesada
M. Efron
M. F. Porter
M. W. Berry
S. Deerwester
S. T. Dumais
S. T. Dumais
Sandra Jhean-Larose
W. Kintsch
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2009
Field of study

This paper presents the current state of a work in progress, whose objective is to better understand the effects of factors that significantly influence the performance of Latent Semantic Analysis (LSA). A difficult task, which consists in answering (French) biology Multiple Choice Questions, is used to test the semantic properties of the truncated singular space and to study the relative influence of main parameters. A dedicated software has been designed to fine tune the LSA semantic space for the Multiple Choice Questions task. With optimal parameters, the performances of our simple model are quite surprisingly equal or superior to those of 7th and 8th grades students. This indicates that semantic spaces were quite good despite their low dimensions and the small sizes of training data sets. Besides, we present an original entropy global weighting of answers' terms of each question of the Multiple Choice Questions which was necessary to achieve the model's success.Comment: 9 page

arXiv.org e-Print Archive

Graph Regularized Non-negative Matrix Factorization By Maximizing Correntropy

Author: Fan Zhuoyi
Li Le
Xu Yang
Yang Jianjun
Zhang Honggang
Zhao Kaili
Publication venue
Publication date: 09/05/2014
Field of study

Non-negative matrix factorization (NMF) has proved effective in many clustering and classification tasks. The classic ways to measure the errors between the original and the reconstructed matrix are

l_2

distance or Kullback-Leibler (KL) divergence. However, nonlinear cases are not properly handled when we use these error measures. As a consequence, alternative measures based on nonlinear kernels, such as correntropy, are proposed. However, the current correntropy-based NMF only targets on the low-level features without considering the intrinsic geometrical distribution of data. In this paper, we propose a new NMF algorithm that preserves local invariance by adding graph regularization into the process of max-correntropy-based matrix factorization. Meanwhile, each feature can learn corresponding kernel from the data. The experiment results of Caltech101 and Caltech256 show the benefits of such combination against other NMF algorithms for the unsupervised image clustering

arXiv.org e-Print Archive

CiteSeerX