
An improved switching hybrid recommender system using naive Bayes classifier and collaborative filtering

Authors: Ghazanfar Mustansar; Prugel-Bennett Adam
Publication date: 20/04/2010

Recommender systems apply machine learning and data mining techniques to filter unseen information and can predict whether a user would like a given resource. To date, a number of recommendation algorithms have been proposed, of which collaborative filtering and content-based filtering are the two best known and most widely adopted. Collaborative filtering recommender systems recommend items by identifying other users with similar taste and using their opinions for recommendation, whereas content-based recommender systems recommend items based on the content information of the items. These systems suffer from scalability, data sparsity, overspecialization, and cold-start problems, resulting in poor-quality recommendations and reduced coverage. Hybrid recommender systems combine individual systems to avoid certain of the aforementioned limitations. In this paper, we propose a unique switching hybrid recommendation approach that combines a Naive Bayes classification approach with collaborative filtering. Experimental results on two different data sets show that the proposed algorithm is scalable and provides better performance, in terms of accuracy and coverage, than other algorithms, while at the same time eliminating some recorded problems with recommender systems.
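The switching idea in the abstract above can be illustrated with a minimal sketch. The abstract does not state the switching rule, so everything here is an assumption made for illustration: the confidence `threshold`, the function names `cf_predict` and `switching_predict`, and the choice to trust the Naive Bayes posterior first and fall back to collaborative filtering otherwise. It is not the authors' algorithm.

```python
def cf_predict(neighbours):
    """Similarity-weighted mean of neighbours' ratings.

    neighbours: list of (similarity, rating) pairs for users who rated the item.
    Returns None when no positively similar neighbour is available.
    """
    positive = [(s, r) for s, r in neighbours if s > 0]
    if not positive:
        return None
    return sum(s * r for s, r in positive) / sum(s for s, _ in positive)


def switching_predict(nb_posterior, neighbours, threshold=0.6):
    """Pick one recommender per prediction (hypothetical switching rule).

    nb_posterior: dict mapping a rating class to its Naive Bayes posterior.
    Uses the Naive Bayes class when its posterior reaches `threshold`
    (an assumed value) or when collaborative filtering has no neighbours.
    """
    best_class, best_prob = max(nb_posterior.items(), key=lambda kv: kv[1])
    cf = cf_predict(neighbours)
    if best_prob >= threshold or cf is None:
        return float(best_class), "naive_bayes"
    return cf, "collaborative_filtering"
```

A confident posterior such as `{1: 0.1, 5: 0.9}` is answered by the classifier, while an ambiguous one such as `{2: 0.5, 4: 0.5}` is handed to the neighbour-based estimate.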

Southampton (e-Prints Soton)

An Accuracy-Assured Privacy-Preserving Recommender System for Internet Commerce

Authors: Lu Zhigang; Shen Hong
Publication date: 01/01/2015

Recommender systems, tools for predicting users' potential preferences by computing history data and users' interests, show an increasing importance in various Internet applications such as online shopping. As a well-known recommendation method, neighbourhood-based collaborative filtering has attracted considerable attention recently. The risk of revealing users' private information during the process of filtering has attracted noticeable research interest. Among the current solutions, the probabilistic techniques have shown a powerful privacy-preserving effect. When facing the k Nearest Neighbour attack, all the existing methods provide no data utility guarantee, owing to the introduction of global randomness. In this paper, to overcome the problem of recommendation accuracy loss, we propose a novel approach, Partitioned Probabilistic Neighbour Selection, to ensure a required prediction accuracy while maintaining high security against the kNN attack. We define the sum of the k neighbours' similarity as the accuracy metric alpha, and the number of user partitions across which we select the k neighbours as the security metric beta. We generalise the k Nearest Neighbour attack to the beta k Nearest Neighbours attack. Differing from the existing approach that selects neighbours across the entire candidate list randomly, our method selects neighbours from each exclusive partition of size k with a decreasing probability. Theoretical and experimental analysis show that, to provide an accuracy-assured recommendation, our Partitioned Probabilistic Neighbour Selection method yields a better trade-off between the recommendation accuracy and system security.
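The selection scheme described in the abstract above can be sketched roughly as follows. Only the outline comes from the abstract: candidates are split into exclusive partitions of size k, neighbours are drawn across beta partitions with decreasing probability, and alpha is the sum of the selected similarities. The geometric `decay` weights, the rule of taking the most similar remaining candidate within a partition, and the function names are assumptions made for illustration, not the authors' method.

```python
import random


def partitioned_neighbour_selection(similarities, k, beta, decay=0.5, rng=None):
    """Select k neighbours across the first `beta` partitions of size k.

    similarities: dict mapping candidate user -> similarity to the target user.
    Lower-ranked partitions are drawn with geometrically decreasing
    probability (`decay` is an assumed parameter).
    """
    if rng is None:
        rng = random.Random()
    # Rank candidates from most to least similar, then cut into partitions.
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    partitions = [ranked[i * k:(i + 1) * k] for i in range(beta)]
    partitions = [p for p in partitions if p]
    chosen = []
    while len(chosen) < k and any(partitions):
        live = [i for i, p in enumerate(partitions) if p]
        weights = [decay ** i for i in live]
        idx = rng.choices(live, weights=weights)[0]
        # Assumed rule: take the most similar remaining candidate.
        chosen.append(partitions[idx].pop(0))
    return chosen


def accuracy_alpha(similarities, chosen):
    """Accuracy metric alpha: sum of the selected neighbours' similarities."""
    return sum(similarities[u] for u in chosen)
```

With `beta=1` the sketch reduces to ordinary top-k selection; larger `beta` spreads the choice over more partitions, which lowers alpha in exchange for less predictable neighbours.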

arXiv.org e-Print Archive

Crossref

Adelaide Research & Scholarship

Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization

Authors: Geambasu Roxana; Huang Tzu-Kuo; Lecuyer Mathias; Sen Siddhartha; Spahn Riley
Publication date: 21/05/2017

Protecting vast quantities of data poses a daunting challenge for the growing number of organizations that collect, stockpile, and monetize it. The ability to distinguish data that is actually needed from data collected "just in case" would help these organizations to limit the latter's exposure to attack. A natural approach might be to monitor data use and retain only the working set of in-use data in accessible storage; unused data can be evicted to a highly protected store. However, many of today's big data applications rely on machine learning (ML) workloads that are periodically retrained by accessing, and thus exposing to attack, the entire data store. Training set minimization methods, such as count featurization, are often used to limit the data needed to train ML workloads to improve performance or scalability. We present Pyramid, a limited-exposure data management system that builds upon count featurization to enhance data protection. As such, Pyramid uniquely introduces both the idea and proof-of-concept for leveraging training set minimization methods to instill rigor and selectivity into big data management. We integrated Pyramid into Spark Velox, a framework for ML-based targeting and personalization. We evaluate it on three applications and show that Pyramid approaches state-of-the-art models while training on less than 1% of the raw data.
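Count featurization, the technique the abstract above builds on, can be sketched in general terms as follows. This is a generic illustration, not Pyramid's implementation: the function names, the binary label, and the additive `smoothing` value are assumptions. Each categorical value is replaced by a smoothed estimate of the label rate observed for that value, so a model can train on compact count tables and a small window of rows instead of the full raw history.

```python
from collections import defaultdict


def build_count_tables(rows, label_key):
    """Count negative/positive labels per (feature, value) pair.

    rows: iterable of dicts with categorical features and a binary label.
    Returns {feature: {value: [negatives, positives]}}.
    """
    tables = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for row in rows:
        y = 1 if row[label_key] else 0
        for feat, val in row.items():
            if feat != label_key:
                tables[feat][val][y] += 1
    return tables


def featurize(row, tables, label_key, smoothing=1.0):
    """Map each categorical value to a smoothed positive-label rate.

    Unseen values fall back to 0.5 because both counts are zero.
    Features are emitted in sorted name order.
    """
    vector = []
    for feat in sorted(f for f in row if f != label_key):
        neg, pos = tables.get(feat, {}).get(row[feat], (0, 0))
        vector.append((pos + smoothing) / (pos + neg + 2 * smoothing))
    return vector
```

For example, after counting `{"ad": "a", "clicked": 1}`, `{"ad": "a", "clicked": 0}` and `{"ad": "b", "clicked": 1}`, a new row with `ad == "b"` is encoded from the counts alone, without revisiting the raw rows.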

arXiv.org e-Print Archive

Crossref