Collaborative recommendation is an information-filtering technique that
attempts to present information items that are likely to be of interest to an
Internet user. Traditionally, collaborative systems deal with two types of
variables: users and items. In its most common form, the problem
is framed as trying to estimate ratings for items that have not yet been
consumed by a user. Despite a wide-ranging literature, little is known about
the statistical properties of recommendation systems. In fact, no clear
probabilistic model even exists that would allow us to describe precisely the
mathematical forces driving collaborative filtering. As an initial
contribution toward filling this gap, we set out a general sequential
stochastic model for collaborative recommendation. We offer an in-depth analysis of the
so-called cosine-type nearest neighbor collaborative method, which is one of
the most widely used algorithms in collaborative filtering, and analyze its
asymptotic performance as the number of users grows. We establish consistency
of the procedure under mild assumptions on the model. Rates of convergence and
examples are also provided.

Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by
the Institute of Mathematical Statistics (http://www.imstat.org) at
http://dx.doi.org/10.1214/09-AOS759
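
For readers unfamiliar with the cosine-type nearest neighbor method named in the abstract, the following is a minimal illustrative sketch, not the exact procedure analyzed in the paper. It assumes ratings are positive numbers with 0 marking an unrated item, that similarity is the cosine computed over the items both users have rated (one common convention), and that the prediction is the plain average of the k most similar users' ratings. The names `cosine_similarity` and `predict_rating` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of two rating vectors, restricted to items both users rated.

    Assumption: a rating of 0 means "not rated".
    """
    common = [j for j in range(len(u)) if u[j] > 0 and v[j] > 0]
    if not common:
        return 0.0  # no co-rated items: treat the users as dissimilar
    dot = sum(u[j] * v[j] for j in common)
    norm_u = math.sqrt(sum(u[j] ** 2 for j in common))
    norm_v = math.sqrt(sum(v[j] ** 2 for j in common))
    return dot / (norm_u * norm_v)


def predict_rating(ratings, new_user, item, k):
    """Estimate new_user's rating of `item` from the k most similar users.

    Only database users who actually rated `item` are candidates.
    Returns None when nobody in the database has rated the item.
    """
    candidates = [row for row in ratings if row[item] > 0]
    if not candidates:
        return None
    # Sort candidates from most to least similar to the new user.
    candidates.sort(key=lambda row: cosine_similarity(new_user, row),
                    reverse=True)
    neighbors = candidates[:k]
    return sum(row[item] for row in neighbors) / len(neighbors)


# Toy database: three users, three items (made-up numbers for illustration).
ratings = [
    [5, 4, 5],
    [1, 2, 1],
    [5, 5, 4],
]
new_user = [5, 4, 0]  # has not yet rated item 2
print(predict_rating(ratings, new_user, item=2, k=2))
```

In this toy database the first and third users are the closest to the new user in the cosine sense, so the estimate is the average of their ratings for the third item. The asymptotic regime studied in the paper corresponds to letting the number of rows in `ratings` grow.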