
    BFSMpR: A BFS Graph based Recommendation System using MapReduce

    Nowadays, many associations, organizations, and analysts need to manage huge datasets (terabytes or even petabytes). A well-known framework for processing such large datasets effectively is Hadoop MapReduce. Many domains of current interest (e.g., the Web, social networks) naturally represent these datasets as graphs. A key feature of graph-based recommendation systems is that they rely on neighbors' interests, taking minimum distance into account. Most present-day recommendation frameworks use complex strategies to generate recommendations for every user. This paper introduces an alternative approach that generates suggestions for users from an unweighted graph, using an iterative Hadoop MapReduce approach for the execution.
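    As a rough illustration of the idea, the sketch below runs single-source BFS as repeated map/reduce rounds over an adjacency list, entirely in local Python; the node names, toy graph, and round count are invented, and a real deployment would express the same two functions as Hadoop jobs.

```python
from collections import defaultdict

# Minimal local simulation of iterative BFS in MapReduce style on an
# unweighted graph. Each record is (node, (distance, neighbors)); the map
# phase expands the current frontier, the reduce phase keeps the minimum
# distance per node. All names here are illustrative, not from the paper.

INF = float("inf")

def map_phase(node, distance, neighbors):
    # Re-emit the node's own record, plus tentative distances for neighbors.
    yield node, (distance, neighbors)
    if distance != INF:
        for nbr in neighbors:
            yield nbr, (distance + 1, [])

def reduce_phase(node, values):
    # Keep the smallest distance seen and the full adjacency list.
    best, adj = INF, []
    for dist, neighbors in values:
        best = min(best, dist)
        if neighbors:
            adj = neighbors
    return node, (best, adj)

def bfs(graph, source, rounds):
    # graph: {node: [neighbors]}; distances settle within `rounds` iterations.
    state = {n: (0 if n == source else INF, nbrs) for n, nbrs in graph.items()}
    for _ in range(rounds):
        grouped = defaultdict(list)
        for node, (dist, nbrs) in state.items():
            for key, value in map_phase(node, dist, nbrs):
                grouped[key].append(value)
        state = dict(reduce_phase(n, vals) for n, vals in grouped.items())
    return {n: d for n, (d, _) in state.items()}

# Users at small BFS distance from a target user are recommendation candidates.
graph = {"u1": ["u2", "u3"], "u2": ["u1", "u4"], "u3": ["u1"], "u4": ["u2"]}
print(bfs(graph, "u1", rounds=3))  # {'u1': 0, 'u2': 1, 'u3': 1, 'u4': 2}
```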

    Big Data Meets Telcos: A Proactive Caching Perspective

    Mobile cellular networks are becoming increasingly complex to manage, while classical deployment/optimization techniques and current solutions (i.e., cell densification, acquiring more spectrum, etc.) are cost-ineffective and thus seen as stopgaps. This calls for the development of novel approaches that leverage recent advances in storage/memory, context-awareness, and edge/cloud computing, and falls into the framework of big data. However, big data is itself another complex phenomenon to handle and comes with its notorious 4 Vs: velocity, veracity, volume, and variety. In this work, we address these issues in the optimization of 5G wireless networks via the notion of proactive caching at the base stations. In particular, we investigate the gains of proactive caching in terms of backhaul offloading and request satisfaction, while tackling the large amount of available data for content popularity estimation. In order to estimate content popularity, we first collect users' mobile traffic data from several base stations of a Turkish telecom operator, over time intervals of hours. Then, an analysis is carried out locally on a big data platform, and the gains of proactive caching at the base stations are investigated via numerical simulations. It turns out that several gains are possible depending on the level of available information and the storage size. For instance, with 10% of content ratings and 15.4 GB of storage size (87% of the total catalog size), proactive caching achieves 100% request satisfaction and offloads 98% of the backhaul when considering 16 base stations.
    Comment: 8 pages, 5 figures
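    The caching policy itself can be illustrated with a small, hedged sketch: estimate popularity from a request trace, fill the cache greedily under a storage budget, and report request satisfaction and backhaul offloading. The trace, content sizes, and capacity below are invented and are not the operator data used in the paper.

```python
from collections import Counter

# Greedy proactive caching at one base station: rank contents by observed
# request counts, cache the most popular items up to a storage budget, then
# measure the two metrics the abstract names. All data here is illustrative.

def build_cache(requests, sizes, capacity):
    popularity = Counter(requests)
    cache, used = set(), 0.0
    for content, _count in popularity.most_common():
        if used + sizes[content] <= capacity:
            cache.add(content)
            used += sizes[content]
    return cache

def evaluate(requests, sizes, cache):
    hits = sum(1 for r in requests if r in cache)
    offloaded = sum(sizes[r] for r in requests if r in cache)
    total = sum(sizes[r] for r in requests)
    return hits / len(requests), offloaded / total  # satisfaction, offload ratio

requests = ["a", "a", "b", "a", "c", "b", "d"]    # hypothetical request trace
sizes = {"a": 2.0, "b": 1.0, "c": 3.0, "d": 1.5}  # content sizes (GB)
cache = build_cache(requests, sizes, capacity=3.0)
print(cache, evaluate(requests, sizes, cache))
```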

    Product Recommendation using Hadoop

    Recommendation systems are widely used to provide personalized recommendations to users. Such systems are used by e-commerce and social networking websites to increase their business and user engagement. The day-to-day growth of customers and products poses a challenge for generating high-quality recommendations. Moreover, such systems may need to perform many recommendations per second, for millions of customers and products. In such scenarios, implementing a recommendation algorithm sequentially suffers from serious performance issues. To address these issues, we propose a parallel algorithm that generates recommendations using the Hadoop MapReduce framework. In this implementation, we focus on item-based collaborative filtering based on users' browsing history, a well-known technique for generating recommendations.
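    A minimal sketch of the item-based collaborative filtering step, assuming browsing histories keyed by user: item co-occurrence counts stand in for item similarity, and on Hadoop the same counting would split into a map job (emit item pairs per user) and a reduce job (sum the pair counts). All data here is illustrative.

```python
from collections import defaultdict
from itertools import combinations

def cooccurrence(histories):
    # Count how often each ordered item pair is browsed by the same user.
    counts = defaultdict(int)
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(user, histories, counts, k=3):
    # Score unseen items by their co-occurrence with the user's history.
    seen = set(histories[user])
    scores = defaultdict(int)
    for item in seen:
        for (a, b), c in counts.items():
            if a == item and b not in seen:
                scores[b] += c
    return sorted(scores, key=scores.get, reverse=True)[:k]

histories = {"u1": ["tv", "phone"], "u2": ["tv", "laptop"],
             "u3": ["phone", "laptop", "tv"]}
counts = cooccurrence(histories)
print(recommend("u1", histories, counts))  # ['laptop']
```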

    When Hashes Met Wedges: A Distributed Algorithm for Finding High Similarity Vectors

    Finding similar user pairs is a fundamental task in social networks, with numerous applications in ranking and personalization tasks such as link prediction and tie strength detection. A common manifestation of user similarity is based upon network structure: each user is represented by a vector of the user's network connections, and pairwise cosine similarity among these vectors defines user similarity. The predominant task for user similarity applications is to discover all similar pairs that have a pairwise cosine similarity value larger than a given threshold τ. In contrast to previous work where τ is assumed to be quite close to 1, we focus on recommendation applications where τ is small, but still meaningful. The all-pairs cosine similarity problem is computationally challenging on networks with billions of edges, and especially so for settings with small τ. To the best of our knowledge, there is no practical solution for computing all user pairs with, say, τ = 0.2 on large social networks, even using the power of distributed algorithms. Our work directly addresses this challenge by introducing a new algorithm, WHIMP, that solves this problem efficiently in the MapReduce model. The key insight in WHIMP is to combine the "wedge-sampling" approach of Cohen-Lewis for approximate matrix multiplication with the SimHash random projection techniques of Charikar. We provide a theoretical analysis of WHIMP, proving that it has near-optimal communication costs while maintaining computation cost comparable with the state of the art. We also empirically demonstrate WHIMP's scalability by computing all highly similar pairs on four massive data sets, and show that it accurately finds high-similarity pairs. In particular, we note that WHIMP successfully processes the entire Twitter network, which has tens of billions of edges.
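    The SimHash ingredient can be sketched concretely: random hyperplane projections compress each user's vector into a short bit signature, and by Charikar's result the fraction of agreeing bits estimates the angle between the vectors, hence their cosine. The dimensions and vectors below are illustrative, not WHIMP's actual parameters.

```python
import numpy as np

# SimHash signatures via random hyperplanes: P[bits agree] = 1 - theta/pi
# (Charikar), so the agreement rate can be inverted to estimate cosine
# similarity. Everything below is an illustrative toy, not WHIMP itself.

rng = np.random.default_rng(0)

def simhash(vec, planes):
    # One bit per random hyperplane: the sign of the projection.
    return (planes @ vec) >= 0

def estimated_cosine(sig_a, sig_b):
    agree = np.mean(sig_a == sig_b)       # fraction of matching bits
    return np.cos(np.pi * (1.0 - agree))  # invert Charikar's formula

dim, bits = 1000, 256
planes = rng.standard_normal((bits, dim))  # hyperplanes shared by all users
u = rng.standard_normal(dim)
v = u + 0.5 * rng.standard_normal(dim)     # a vector correlated with u

true_cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
est_cos = estimated_cosine(simhash(u, planes), simhash(v, planes))
print(f"true={true_cos:.3f} estimated={est_cos:.3f}")
```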

    A Recommendation Engine Using Apache Spark

    The volume of structured and unstructured data has grown at an exponential scale in recent years. As a result of this rapid data growth, we are constantly inundated with a plethora of choices for any product or service; it is easy to get lost among them and find decisions hard to make. The project aims at addressing this problem through entity recommendation. It concentrates on two main aspects: presenting more accurate entity recommendations to the user, and dealing with a vast amount of data. The project aims at presenting recommendation results that match the user's query with efficiency and accuracy. It uses the ListNet ranking algorithm to rank the recommendation results: query-independent and query-dependent features are combined into ranking scores, and the ranking scores decide the order in which the recommendation results are presented to the user. The project makes use of Apache Spark, a distributed big data processing framework. Spark offers the advantage of handling iterative and interactive algorithms efficiently, with minimal processing time compared to the traditional MapReduce paradigm. We performed the experiments for the recommendation engine using DBpedia as the dataset and tested the results in the movie domain. We used both query-independent (PageRank) and query-dependent (click logs) features for ranking. We observed that the ListNet algorithm performs well on Apache Spark, as RDDs provide a faster way for iterative algorithms to execute. We also observed that the results of the recommendation engine are accurate and the entities are well ranked.
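    The core of ListNet can be sketched in a few lines: both the model scores and the relevance labels are turned into top-one probability distributions with a softmax, and a linear scoring function is trained by gradient descent on their cross entropy. The features (a PageRank-like score and a click-log score) and labels below are invented for illustration; Spark would distribute this training across many queries.

```python
import numpy as np

# ListNet in miniature: match the softmax of model scores to the softmax of
# relevance labels by minimizing cross entropy over one query's candidates.

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def listnet_train(X, y, lr=0.1, epochs=200):
    # X: (n_docs, n_features) for one query; y: graded relevance labels.
    w = np.zeros(X.shape[1])
    target = softmax(y)                 # top-one probabilities from labels
    for _ in range(epochs):
        pred = softmax(X @ w)           # top-one probabilities from scores
        grad = X.T @ (pred - target)    # cross-entropy gradient
        w -= lr * grad
    return w

# Hypothetical features per candidate entity: [pagerank, click_log_score].
X = np.array([[0.9, 0.2], [0.4, 0.8], [0.1, 0.1]])
y = np.array([2.0, 1.0, 0.0])
w = listnet_train(X, y)
ranking = np.argsort(-(X @ w))          # higher score ranks first
print(ranking)                          # expect [0 1 2]
```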

    A User-Based Recommendation with a Scalable Machine Learning Tool

    Recommender systems have proven to be a valuable way for online users to receive recommendations of information items like books, videos, songs, etc. Collaborative filtering methods are used to make predictions from historical data. In this paper we introduce Apache Mahout, an open-source project that provides a rich set of components for constructing a customized recommender system from a selection of machine learning algorithms [12]. This paper also focuses on addressing challenges in collaborative filtering such as scalability and data sparsity. To deal with scalability problems, we adopt a distributed framework like Hadoop. We then present a customized user-based recommender system.
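    As a hedged sketch of what a user-based recommender such as Mahout's computes under the hood: find the users most similar to the target (cosine similarity over co-rated items), then score the target's unseen items by similarity-weighted neighbor ratings. The rating matrix is invented; Mahout's actual API wraps these steps in prebuilt components.

```python
import numpy as np

# User-based collaborative filtering: neighborhood by cosine similarity,
# prediction by similarity-weighted average of neighbor ratings. 0 = unrated.

def cosine(a, b):
    mask = (a > 0) & (b > 0)  # compare users on co-rated items only
    if not mask.any():
        return 0.0
    return float(a[mask] @ b[mask] /
                 (np.linalg.norm(a[mask]) * np.linalg.norm(b[mask])))

def recommend(ratings, user, k=2):
    sims = np.array([cosine(ratings[user], row) for row in ratings])
    sims[user] = 0.0                     # exclude the user themself
    neighbors = np.argsort(-sims)[:k]    # k most similar users
    scores = {}
    for item in range(ratings.shape[1]):
        if ratings[user, item] == 0:     # predict only unseen items
            num = sum(sims[n] * ratings[n, item]
                      for n in neighbors if ratings[n, item] > 0)
            den = sum(sims[n] for n in neighbors if ratings[n, item] > 0)
            if den > 0:
                scores[item] = num / den
    return sorted(scores, key=scores.get, reverse=True)

ratings = np.array([[5, 4, 0, 1],   # rows = users, columns = items
                    [4, 5, 4, 0],
                    [1, 0, 5, 4]])
print(recommend(ratings, user=0))   # item indices, best first -> [2]
```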