34,113 research outputs found
An improved switching hybrid recommender system using naive Bayes classifier and collaborative filtering
Recommender Systems apply machine learning and data mining techniques for filtering unseen information and can predict whether a user would like a given resource. To date a number of recommendation algorithms have been proposed, where collaborative filtering and content-based filtering are the two most famous and adopted recommendation techniques. Collaborative filtering recommender systems recommend items by identifying other users with similar taste and use their opinions for recommendation; whereas content-based recommender systems recommend items based on the content information of the items. These systems suffer from scalability, data sparsity, over specialization, and cold-start problems resulting in poor quality recommendations and reduced coverage. Hybrid recommender systems combine individual systems to avoid certain aforementioned limitations of these systems. In this paper, we proposed a unique switching hybrid recommendation approach by combining a Naive Bayes classification approach with the collaborative filtering. Experimental results on two different data sets, show that the proposed algorithm is scalable and provide better performance – in terms of accuracy and coverage – than other algorithms while at the same time eliminates some recorded problems with the recommender systems
An Accuracy-Assured Privacy-Preserving Recommender System for Internet Commerce
Recommender systems, tool for predicting users' potential preferences by
computing history data and users' interests, show an increasing importance in
various Internet applications such as online shopping. As a well-known
recommendation method, neighbourhood-based collaborative filtering has
attracted considerable attention recently. The risk of revealing users' private
information during the process of filtering has attracted noticeable research
interests. Among the current solutions, the probabilistic techniques have shown
a powerful privacy preserving effect. When facing Nearest Neighbour attack,
all the existing methods provide no data utility guarantee, for the
introduction of global randomness. In this paper, to overcome the problem of
recommendation accuracy loss, we propose a novel approach, Partitioned
Probabilistic Neighbour Selection, to ensure a required prediction accuracy
while maintaining high security against NN attack. We define the sum of
neighbours' similarity as the accuracy metric alpha, the number of user
partitions, across which we select the neighbours, as the security metric
beta. We generalise the Nearest Neighbour attack to beta k Nearest
Neighbours attack. Differing from the existing approach that selects neighbours
across the entire candidate list randomly, our method selects neighbours from
each exclusive partition of size with a decreasing probability. Theoretical
and experimental analysis show that to provide an accuracy-assured
recommendation, our Partitioned Probabilistic Neighbour Selection method yields
a better trade-off between the recommendation accuracy and system security.Comment: replacement for the previous versio
Pyramid: Enhancing Selectivity in Big Data Protection with Count Featurization
Protecting vast quantities of data poses a daunting challenge for the growing
number of organizations that collect, stockpile, and monetize it. The ability
to distinguish data that is actually needed from data collected "just in case"
would help these organizations to limit the latter's exposure to attack. A
natural approach might be to monitor data use and retain only the working-set
of in-use data in accessible storage; unused data can be evicted to a highly
protected store. However, many of today's big data applications rely on machine
learning (ML) workloads that are periodically retrained by accessing, and thus
exposing to attack, the entire data store. Training set minimization methods,
such as count featurization, are often used to limit the data needed to train
ML workloads to improve performance or scalability. We present Pyramid, a
limited-exposure data management system that builds upon count featurization to
enhance data protection. As such, Pyramid uniquely introduces both the idea and
proof-of-concept for leveraging training set minimization methods to instill
rigor and selectivity into big data management. We integrated Pyramid into
Spark Velox, a framework for ML-based targeting and personalization. We
evaluate it on three applications and show that Pyramid approaches
state-of-the-art models while training on less than 1% of the raw data
- …