Inferring Networks of Substitutable and Complementary Products
In a modern recommender system, it is important to understand how products
relate to each other. For example, while a user is looking for mobile phones,
it might make sense to recommend other phones, but once they buy a phone, we
might instead want to recommend batteries, cases, or chargers. These two types
of recommendations are referred to as substitutes and complements: substitutes
are products that can be purchased instead of each other, while complements are
products that can be purchased in addition to each other.
Here we develop a method to infer networks of substitutable and complementary
products. We formulate this as a supervised link prediction task, where we
learn the semantics of substitutes and complements from data associated with
products. The primary source of data we use is the text of product reviews,
though our method also makes use of features such as ratings, specifications,
prices, and brands. Methodologically, we build topic models that are trained to
automatically discover topics from text that are successful at predicting and
explaining such relationships. Experimentally, we evaluate our system on the
Amazon product catalog, a large dataset consisting of 9 million products, 237
million links, and 144 million reviews.
Comment: 12 pages, 6 figures
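
As a rough illustration of the supervised link-prediction setup this abstract describes (not the paper's actual pipeline), the Python sketch below assumes each product has already been summarized by a topic vector learned from its reviews, and trains a logistic-regression classifier on toy pairwise features to predict whether two products are linked; all names, data, and labels are hypothetical.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_products, n_topics = 1000, 20

# Hypothetical per-product topic vectors (in the paper these come from
# topic models trained on review text).
topic_vectors = rng.dirichlet(np.ones(n_topics), size=n_products)

def pair_features(i, j):
    # Simple pairwise feature: elementwise product of the two topic vectors,
    # so the classifier can learn which topic combinations signal a link.
    return topic_vectors[i] * topic_vectors[j]

# Toy training pairs with placeholder labels (1 = substitute/complement link).
pairs = rng.integers(0, n_products, size=(5000, 2))
labels = rng.integers(0, 2, size=5000)

X = np.array([pair_features(i, j) for i, j in pairs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Score a new candidate product pair.
print(clf.predict_proba(pair_features(3, 42).reshape(1, -1)))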
Correction: A correlated topic model of Science
Correction to Annals of Applied Statistics 1 (2007) 17--35
[doi:10.1214/07-AOAS114]
Comment: Published at http://dx.doi.org/10.1214/07-AOAS136 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
A correlated topic model of Science
Topic models, such as latent Dirichlet allocation (LDA), can be useful tools
for the statistical analysis of document collections and other discrete data.
The LDA model assumes that the words of each document arise from a mixture of
topics, each of which is a distribution over the vocabulary. A limitation of
LDA is the inability to model topic correlation even though, for example, a
document about genetics is more likely to also be about disease than X-ray
astronomy. This limitation stems from the use of the Dirichlet distribution to
model the variability among the topic proportions. In this paper we develop the
correlated topic model (CTM), where the topic proportions exhibit correlation
via the logistic normal distribution [J. Roy. Statist. Soc. Ser. B 44 (1982)
139--177]. We derive a fast variational inference algorithm for approximate
posterior inference in this model, which is complicated by the fact that the
logistic normal is not conjugate to the multinomial. We apply the CTM to the
articles from Science published from 1990--1999, a data set that comprises 57M
words. The CTM gives a better fit of the data than LDA, and we demonstrate its
use as an exploratory tool for large document collections.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS114 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
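
A toy numerical sketch of the modeling point in this abstract: LDA draws topic proportions from a Dirichlet, whereas the CTM draws a Gaussian vector and maps it to the simplex with a softmax (the logistic normal), so an off-diagonal covariance entry lets two topics co-occur. This illustrates only the generative draw, not the variational inference algorithm the paper derives; all values below are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
K = 4                                    # number of topics (toy value)

mu = np.zeros(K)
Sigma = np.eye(K)
Sigma[0, 1] = Sigma[1, 0] = 0.8          # topics 0 and 1 tend to co-occur

def ctm_topic_proportions():
    # CTM: eta ~ N(mu, Sigma), then map to the simplex with a softmax.
    eta = rng.multivariate_normal(mu, Sigma)
    exp_eta = np.exp(eta - eta.max())    # numerically stable softmax
    return exp_eta / exp_eta.sum()

def lda_topic_proportions(alpha=0.5):
    # LDA: a Dirichlet draw, which cannot express positive topic correlation.
    return rng.dirichlet(alpha * np.ones(K))

print("CTM proportions:", ctm_topic_proportions())
print("LDA proportions:", lda_topic_proportions())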
Analysis of Computer Science Communities Based on DBLP
It has become popular to bring techniques from bibliometrics and
scientometrics into the world of digital libraries to analyze collaboration
patterns and explore the mechanisms that underlie community development. In
this paper we use DBLP data to investigate authors' scientific careers and
provide an in-depth exploration of several computer science communities,
comparing them in terms of productivity, population stability, and
collaboration trends. We also use these features to compare the sets of
top-ranked conferences with their lower-ranked counterparts.
Comment: 9 pages, 7 figures, 6 tables
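
As a loose illustration of the kind of per-community statistics this abstract mentions (productivity per author, for example), here is a toy Python computation over made-up DBLP-style (author, venue, year) records; the actual study parses the full DBLP dataset and uses richer measures.

from collections import defaultdict

# Made-up DBLP-style authorship records: (author, venue, year).
records = [
    ("alice", "SIGMOD", 2008), ("alice", "SIGMOD", 2009),
    ("bob",   "SIGMOD", 2008), ("bob",   "VLDB",   2009),
    ("carol", "VLDB",   2008), ("carol", "VLDB",   2009),
]

authorships_per_venue = defaultdict(int)
authors_per_venue = defaultdict(set)
for author, venue, year in records:
    authorships_per_venue[venue] += 1
    authors_per_venue[venue].add(author)

# A crude productivity measure per community: authorship records divided by
# the number of distinct authors publishing at that venue.
for venue in sorted(authorships_per_venue):
    productivity = authorships_per_venue[venue] / len(authors_per_venue[venue])
    print(f"{venue}: {productivity:.2f} authorships per author")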
