Inferring Networks of Substitutable and Complementary Products
In a modern recommender system, it is important to understand how products
relate to each other. For example, while a user is looking for mobile phones,
it might make sense to recommend other phones, but once they buy a phone, we
might instead want to recommend batteries, cases, or chargers. These two types
of recommendations are referred to as substitutes and complements: substitutes
are products that can be purchased instead of each other, while complements are
products that can be purchased in addition to each other.
Here we develop a method to infer networks of substitutable and complementary
products. We formulate this as a supervised link prediction task, where we
learn the semantics of substitutes and complements from data associated with
products. The primary source of data we use is the text of product reviews,
though our method also makes use of features such as ratings, specifications,
prices, and brands. Methodologically, we build topic models that are trained to
automatically discover topics from text that are successful at predicting and
explaining such relationships. Experimentally, we evaluate our system on the
Amazon product catalog, a large dataset consisting of 9 million products, 237
million links, and 144 million reviews.
Comment: 12 pages, 6 figures
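
As a rough illustration of the supervised link-prediction setup this abstract describes (not the paper's actual pipeline), the Python sketch below assumes each product has already been summarized by a topic vector learned from its reviews, and trains a logistic-regression classifier on toy pairwise features to predict whether two products are linked; all names, data, and labels are hypothetical.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_products, n_topics = 1000, 20

# Hypothetical per-product topic vectors (in the paper these come from
# topic models trained on review text).
topic_vectors = rng.dirichlet(np.ones(n_topics), size=n_products)

def pair_features(i, j):
    # Simple pairwise feature: elementwise product of the two topic vectors,
    # so the classifier can learn which topic combinations signal a link.
    return topic_vectors[i] * topic_vectors[j]

# Toy training pairs with placeholder labels (1 = substitute/complement link).
pairs = rng.integers(0, n_products, size=(5000, 2))
labels = rng.integers(0, 2, size=5000)

X = np.array([pair_features(i, j) for i, j in pairs])
clf = LogisticRegression(max_iter=1000).fit(X, labels)

# Score a new candidate product pair.
print(clf.predict_proba(pair_features(3, 42).reshape(1, -1)))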
Correction: A correlated topic model of Science
Correction to Annals of Applied Statistics 1 (2007) 17--35
[doi:10.1214/07-AOAS114]
Comment: Published at http://dx.doi.org/10.1214/07-AOAS136 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
A correlated topic model of Science
Topic models, such as latent Dirichlet allocation (LDA), can be useful tools
for the statistical analysis of document collections and other discrete data.
The LDA model assumes that the words of each document arise from a mixture of
topics, each of which is a distribution over the vocabulary. A limitation of
LDA is the inability to model topic correlation even though, for example, a
document about genetics is more likely to also be about disease than X-ray
astronomy. This limitation stems from the use of the Dirichlet distribution to
model the variability among the topic proportions. In this paper we develop the
correlated topic model (CTM), where the topic proportions exhibit correlation
via the logistic normal distribution [J. Roy. Statist. Soc. Ser. B 44 (1982)
139--177]. We derive a fast variational inference algorithm for approximate
posterior inference in this model, which is complicated by the fact that the
logistic normal is not conjugate to the multinomial. We apply the CTM to the
articles from Science published from 1990--1999, a data set that comprises 57M
words. The CTM gives a better fit of the data than LDA, and we demonstrate its
use as an exploratory tool for large document collections.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS114 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
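
A toy numerical sketch of the modeling point in this abstract: LDA draws topic proportions from a Dirichlet, whereas the CTM draws a Gaussian vector and maps it to the simplex with a softmax (the logistic normal), so an off-diagonal covariance entry lets two topics co-occur. This illustrates only the generative draw, not the variational inference algorithm the paper derives; all values below are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
K = 4                                    # number of topics (toy value)

mu = np.zeros(K)
Sigma = np.eye(K)
Sigma[0, 1] = Sigma[1, 0] = 0.8          # topics 0 and 1 tend to co-occur

def ctm_topic_proportions():
    # CTM: eta ~ N(mu, Sigma), then map to the simplex with a softmax.
    eta = rng.multivariate_normal(mu, Sigma)
    exp_eta = np.exp(eta - eta.max())    # numerically stable softmax
    return exp_eta / exp_eta.sum()

def lda_topic_proportions(alpha=0.5):
    # LDA: a Dirichlet draw, which cannot express positive topic correlation.
    return rng.dirichlet(alpha * np.ones(K))

print("CTM proportions:", ctm_topic_proportions())
print("LDA proportions:", lda_topic_proportions())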
Analysis of Computer Science Communities Based on DBLP
It has become popular to bring techniques from bibliometrics and
scientometrics into the world of digital libraries to analyze collaboration
patterns and explore the mechanisms that underlie community development. In
this paper we use DBLP data to investigate authors' scientific careers and
provide an in-depth exploration of several computer science communities,
comparing them in terms of productivity, population stability, and
collaboration trends. We also use these features to compare the sets of
top-ranked conferences with their lower-ranked counterparts.
Comment: 9 pages, 7 figures, 6 tables
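
As a loose illustration of the kind of per-community statistics this abstract mentions (productivity per author, for example), here is a toy Python computation over made-up DBLP-style (author, venue, year) records; the actual study parses the full DBLP dataset and uses richer measures.

from collections import defaultdict

# Made-up DBLP-style authorship records: (author, venue, year).
records = [
    ("alice", "SIGMOD", 2008), ("alice", "SIGMOD", 2009),
    ("bob",   "SIGMOD", 2008), ("bob",   "VLDB",   2009),
    ("carol", "VLDB",   2008), ("carol", "VLDB",   2009),
]

authorships_per_venue = defaultdict(int)
authors_per_venue = defaultdict(set)
for author, venue, year in records:
    authorships_per_venue[venue] += 1
    authors_per_venue[venue].add(author)

# A crude productivity measure per community: authorship records divided by
# the number of distinct authors publishing at that venue.
for venue in sorted(authorships_per_venue):
    productivity = authorships_per_venue[venue] / len(authors_per_venue[venue])
    print(f"{venue}: {productivity:.2f} authorships per author")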
