Overcoming data sparsity

By Rosemary Apple, Chris Cawthorn, Kwan Yee Chan, Oded Lachish, Achim Nonnenmacker, Mason A. Porter, Sylvain Reboux and Vera Hazelwood


Unilever is designing and testing recommendation algorithms that suggest products to online customers, given the customer ID and the current contents of their basket. Unilever has collected a large amount of purchasing data, which shows that most items (around 80%) are purchased infrequently and account for only 20% of the data, while the frequently purchased items account for the remaining 80%. The data is therefore sparse and skewed, with a long tail. Attempts to incorporate data from the long tail have so far proved difficult, and current Unilever recommendation systems do not use the information about infrequently purchased items. Yet these items are more indicative of customers' preferences, and Unilever would like to make recommendations from and about them, i.e. to give a rank ordering of available products in real time.

The Study Group suggested using bipartite networks to construct a similarity matrix from which recommendation scores for different products can be computed. Given a current basket and a customer ID, this approach gives a recommendation score for each available item and recommends the highest-scoring item that is not already in the basket. The similarity matrix can be computed offline, while the recommendation scores can be calculated live. This report summarises the Study Group's findings, together with insights into the properties of the similarity matrix and related issues, such as recommendations for data collection.
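The offline/live split described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the similarity matrix is weighted, so the weighting below is an assumption: each customer spreads a unit of weight evenly over the items they bought, and each contribution is divided by the source item's purchase count. The function names `build_similarity` and `recommend`, the way the customer ID is used (adding that customer's purchase history to the basket as scoring seeds), and the toy data are all hypothetical, not taken from the report.

```python
from collections import defaultdict


def build_similarity(history):
    """Offline step: project the customer-item bipartite network onto items.

    `history` maps a customer ID to the set of items that customer bought.
    Returns sim, where sim[i][j] is the weight passed from item j to item i.
    The weighting is an assumed one, not the report's stated method.
    """
    # Degree of each item = number of customers who bought it.
    item_degree = defaultdict(int)
    for items in history.values():
        for item in items:
            item_degree[item] += 1

    sim = defaultdict(lambda: defaultdict(float))
    for items in history.values():
        if not items:
            continue
        share = 1.0 / len(items)  # each customer spreads weight evenly
        for j in items:
            for i in items:
                if i != j:  # self-similarity is skipped (an assumption)
                    sim[i][j] += share / item_degree[j]
    return sim


def recommend(sim, history, customer_id, basket, catalogue):
    """Live step: score every catalogue item not already in the basket.

    Seeds are the basket plus the customer's past purchases (an assumption
    about how the customer ID enters the score). Returns (item, score)
    pairs sorted from highest to lowest score, ties broken by item name.
    """
    seeds = set(basket) | history.get(customer_id, set())
    scores = {}
    for item in catalogue:
        if item in basket:
            continue
        row = sim.get(item, {})
        scores[item] = sum(row.get(j, 0.0) for j in seeds)
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))


# Toy purchase history (hypothetical data).
history = {
    "c1": {"tea", "soup"},
    "c2": {"tea", "soup", "soap"},
    "c3": {"soap", "shampoo"},
}
catalogue = ["tea", "soup", "soap", "shampoo"]

sim = build_similarity(history)                                # computed offline
ranked = recommend(sim, history, "c3", {"tea"}, catalogue)     # computed live
top_item = ranked[0][0]
```

Because `sim` depends only on historical purchases, it can be rebuilt in a batch job, while `recommend` needs just one pass over the catalogue per request. In this weighting a rarely bought item passes on more weight per co-purchase than a popular one, which is one way to let long-tail items influence the ranking.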

Topics: None/Other, Information and communication technology
Year: 2008
