434 research outputs found

    Product Recommendation using Hadoop

    Recommendation systems are widely used to provide personalized recommendations to users. Such systems are used by e-commerce and social networking websites to increase their business and user engagement. The day-to-day growth of customers and products poses a challenge for generating high-quality recommendations. Moreover, these systems may need to perform many recommendations per second, for millions of customers and products. In such scenarios, a sequential implementation of a recommendation algorithm suffers from serious performance issues. To address them, we propose a parallel algorithm that generates recommendations using the Hadoop MapReduce framework. Our implementation focuses on item-based collaborative filtering over users' browsing histories, a well-known technique for generating recommendations.
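    The item-based, map-reduce approach this abstract describes can be sketched in plain Python. This is a minimal sketch, not the paper's implementation: the real system runs the map and reduce phases on Hadoop, and the browsing histories and helper names below are hypothetical stand-ins.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical browsing histories: user -> items viewed.
histories = {
    "u1": ["a", "b", "c"],
    "u2": ["a", "b"],
    "u3": ["b", "c"],
}

# Map phase: for each user's history, emit a count of 1 for every
# ordered pair of co-viewed items.
def map_cooccurrences(items):
    for i, j in combinations(sorted(set(items)), 2):
        yield (i, j), 1
        yield (j, i), 1

# Shuffle + reduce phase: sum counts per item pair to build the
# item-item co-occurrence matrix (here a simple dict).
cooccur = defaultdict(int)
for items in histories.values():
    for pair, count in map_cooccurrences(items):
        cooccur[pair] += count

# Recommend for a user: score unseen items by their total
# co-occurrence with items the user has already browsed.
def recommend(user, top_n=2):
    seen = set(histories[user])
    scores = defaultdict(int)
    for viewed in seen:
        for (i, j), c in cooccur.items():
            if i == viewed and j not in seen:
                scores[j] += c
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

    On Hadoop, each history would be a mapper input record and the co-occurrence sums would be computed by reducers, letting the pairwise counting run in parallel across the cluster.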

    A User-Based Recommendation with a Scalable Machine Learning Tool

    Recommender systems have proven to be a valuable way for online users to discover information items such as books, videos, and songs. Collaborative filtering methods are used to make predictions from historical data. In this paper we introduce Apache Mahout, an open-source library that provides a rich set of components for constructing a customized recommender system from a selection of machine learning algorithms [12]. This paper also addresses challenges in collaborative filtering such as scalability and data sparsity. To deal with scalability problems, we use a distributed framework such as Hadoop. We then present a customized user-based recommender system.
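    The core of a user-based recommender like the one this abstract describes is user-user similarity plus a similarity-weighted prediction. A minimal plain-Python sketch of that idea follows; Mahout itself is a Java library with its own APIs, and the rating data here is an invented example.

```python
import math

# Hypothetical rating data: user -> {item: rating}.
ratings = {
    "alice": {"book": 5.0, "video": 3.0, "song": 4.0},
    "bob":   {"book": 5.0, "video": 3.0},
    "carol": {"book": 1.0, "video": 5.0, "song": 2.0},
}

def cosine_sim(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if not common:
        return 0.0
    dot = sum(ratings[u][i] * ratings[v][i] for i in common)
    nu = math.sqrt(sum(ratings[u][i] ** 2 for i in common))
    nv = math.sqrt(sum(ratings[v][i] ** 2 for i in common))
    return dot / (nu * nv)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for item."""
    num = den = 0.0
    for other, r in ratings.items():
        if other != user and item in r:
            s = cosine_sim(user, other)
            num += s * r[item]
            den += abs(s)
    return num / den if den else 0.0
```

    A distributed version, as the paper suggests, would compute the pairwise similarities on Hadoop so that the all-pairs comparison scales with the number of users.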

    Scalable Collaborative Filtering Recommendation Algorithms on Apache Spark

    Collaborative filtering based recommender systems use information about a user's preferences to make personalized predictions about content, such as topics, people, or products, that they might find relevant. As the volume of accessible information and the number of active users on the Internet continue to grow, it becomes increasingly difficult to compute recommendations quickly and accurately over a large dataset. In this study, we introduce an algorithmic framework built on top of Apache Spark for parallel computation of the neighborhood-based collaborative filtering problem, which allows the algorithm to scale linearly with a growing number of users. We also investigate several variants of this technique, including user- and item-based recommendation approaches, correlation- and vector-based similarity calculations, and selective down-sampling of user interactions. Finally, we provide an experimental comparison of these techniques on the MovieLens dataset, consisting of 10 million movie ratings.
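    Two of the variants the abstract names can be illustrated briefly: correlation-based similarity (Pearson, as opposed to a cosine/vector measure) and selective down-sampling of heavy users' interactions. This is a plain-Python sketch of those two building blocks, not the paper's Spark implementation, and the function names are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length rating vectors,
    the correlation-based similarity variant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - mx) ** 2 for a in x))
           * math.sqrt(sum((b - my) ** 2 for b in y)))
    return num / den if den else 0.0

def downsample(interactions, cap, seed=0):
    """Selective down-sampling: cap a user's interaction list at
    `cap` randomly chosen events so that per-user pairwise work
    stays bounded."""
    if len(interactions) <= cap:
        return list(interactions)
    return random.Random(seed).sample(interactions, cap)
```

    In a Spark setting these functions would be applied per user inside parallel transformations over the partitioned ratings, which is what lets the neighborhood computation scale with the user count.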

    A file-based approach for recommender systems in high-performance computing environments


    In-memory, distributed content-based recommender system

    Burdened by their popularity, recommender systems increasingly take on larger datasets while they are expected to deliver high quality results within reasonable time. To meet these ever growing requirements, industrial recommender systems often turn to parallel hardware and distributed computing. While the MapReduce paradigm is generally accepted for massive parallel data processing, it often entails complex algorithm reorganization and suboptimal efficiency because mid-computation values are typically read from and written to hard disk. This work implements an in-memory, content-based recommendation algorithm and shows how it can be parallelized and efficiently distributed across many homogeneous machines in a distributed-memory environment. By focusing on data parallelism and carefully constructing the definition of work in the context of recommender systems, we are able to partition the complete calculation process into any number of independent and equally sized jobs. An empirically validated performance model is developed to predict parallel speedup and promises high efficiencies for realistic hardware configurations. For the MovieLens 10M dataset we note efficiency values up to 71% for a configuration of 200 computing nodes (eight cores per node).
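    The data-parallel idea in this abstract, partitioning the work into independent, equally sized jobs over in-memory data, can be sketched on a single machine with a thread pool. This is an illustrative sketch under invented data: the paper's system distributes such jobs across many machines, and the feature vectors and names below are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical content-based data: items and user profiles share one
# in-memory feature space; a recommendation is the best-scoring item.
item_features = {"m1": [1.0, 0.0], "m2": [0.0, 1.0], "m3": [0.6, 0.6]}
user_profiles = {"u%d" % i: [1.0, float(i % 2)] for i in range(8)}

def score_user(user):
    """One unit of work: rank all items for a single user by the
    dot product of the user profile and each item's features."""
    profile = user_profiles[user]
    scores = {i: sum(a * b for a, b in zip(profile, f))
              for i, f in item_features.items()}
    return user, max(scores, key=scores.get)

def partition(seq, n_jobs):
    """Split the user set into n_jobs independent, equally sized
    chunks, mirroring the paper's equally sized job definition."""
    seq = list(seq)
    k, r = divmod(len(seq), n_jobs)
    out, start = [], 0
    for j in range(n_jobs):
        end = start + k + (1 if j < r else 0)
        out.append(seq[start:end])
        start = end
    return out

def recommend_all(n_jobs=4):
    """Process each chunk in parallel; chunks share no state, so
    workers never need to communicate mid-computation."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        chunk_results = pool.map(lambda c: [score_user(u) for u in c],
                                 partition(user_profiles, n_jobs))
        for chunk in chunk_results:
            results.update(chunk)
    return results
```

    Because every chunk is independent and equally sized, the same decomposition carries over to a distributed-memory cluster, which is what the paper's performance model analyzes.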