
    Data Privacy Preservation in Collaborative Filtering Based Recommender Systems

    This dissertation studies data privacy preservation in collaborative filtering based recommender systems and proposes several collaborative filtering models that aim at preserving user privacy from different perspectives. The empirical study on multiple classical recommendation algorithms presents the basic idea of the models and explores their performance on real-world datasets. The algorithms investigated in this study include a popularity based model, an item similarity based model, a singular value decomposition based model, and a bipartite graph model. Top-N recommendations are evaluated to examine prediction accuracy. It is apparent that with more customers' preference data, recommender systems can better profile customers' shopping patterns, which in turn produces product recommendations with higher accuracy. Precautions should therefore be taken to address the privacy issues that arise during data sharing between two vendors. The study shows that matrix factorization techniques are, by their nature, ideal choices for data privacy preservation. In this dissertation, singular value decomposition (SVD) and nonnegative matrix factorization (NMF) are adopted as the fundamental techniques for collaborative filtering to make privacy-preserving recommendations. The proposed SVD based model utilizes missing value imputation, a randomization technique, and truncated SVD to perturb the raw rating data. The NMF based models, namely iAux-NMF and iCluster-NMF, take into account the auxiliary information of users and items to help with missing value imputation and privacy preservation. Additionally, these models support efficient incremental data updates as well. A good number of online vendors allow people to leave feedback on products, which is considered users' public preferences.
However, due to the connections between users' public and private preferences, if a recommender system fails to distinguish real customers from attackers, the private preferences of real customers can be exposed. This dissertation addresses an attack model in which an attacker holds real customers' partial ratings and tries to obtain their private preferences by cheating recommender systems. To resolve this problem, trustworthiness information is incorporated into NMF based collaborative filtering techniques to detect the attackers and make reasonably different recommendations to normal users and attackers. By doing so, users' private preferences can be effectively protected.
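The three-step SVD perturbation pipeline described above (imputation, randomization, truncated SVD) can be sketched as follows; the mean-imputation strategy, noise scale, and rank are illustrative assumptions, not the dissertation's actual parameters:

```python
import numpy as np

def perturb_ratings(R, k=2, noise_scale=0.1, seed=0):
    """Release a privacy-perturbed version of a ratings matrix R
    (np.nan marks missing entries)."""
    rng = np.random.default_rng(seed)
    filled = R.copy()
    col_means = np.nanmean(R, axis=0)                 # step 1: mean imputation
    idx = np.where(np.isnan(filled))
    filled[idx] = np.take(col_means, idx[1])
    noisy = filled + rng.normal(0, noise_scale, filled.shape)  # step 2: randomization
    U, s, Vt = np.linalg.svd(noisy, full_matrices=False)       # step 3: truncated SVD
    return U[:, :k] * s[:k] @ Vt[:k, :]               # rank-k reconstruction

R = np.array([[5, 3, np.nan], [4, np.nan, 1], [1, 1, 5], [np.nan, 1, 4.]])
R_priv = perturb_ratings(R)
print(R_priv.shape)  # (4, 3)
```

The released matrix no longer contains any raw rating exactly: the noise and the discarded low-variance singular components perturb every entry while preserving the dominant preference structure that collaborative filtering relies on.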

    Cosine-based explainable matrix factorization for collaborative filtering recommendation.

    Recent years have seen an explosive growth in the amount of digital information and the number of users who interact with this information through various platforms, ranging from web services to mobile applications and smart devices. This increase in information and users has naturally led to information overload, which inherently limits the capacity of users to discover and find their needs among the staggering array of options available at any given time, the majority of which they may never become aware of. Online services have handled this information overload by using algorithmic filtering tools that can suggest relevant and personalized information to users. These filtering methods, known as Recommender Systems (RS), have become essential to recommend a range of relevant options in diverse domains ranging from friends, courses, music, and restaurants, to movies, books, and travel recommendations. Most research on recommender systems has focused on developing and evaluating models that can make predictions efficiently and accurately, without taking into account other desiderata such as fairness and transparency, which are becoming increasingly important for establishing trust with human users. For this reason, researchers have recently been pressed to develop recommendation systems that are endowed with an increased ability to explain why a recommendation is given, and hence help users make more informed decisions. Nowadays, state-of-the-art Machine Learning (ML) techniques are being used to achieve unprecedented levels of accuracy in recommender systems. Unfortunately, most such models are black box models that cannot explain their output predictions. One example is Matrix Factorization (MF), a technique that is widely used in Collaborative Filtering algorithms and that, like all black box machine learning models, is unable to explain its outputs.
This dissertation proposes a new Cosine-based explainable Matrix Factorization model (CEMF) that incorporates a user-neighborhood explanation matrix (NSE) and a cosine based penalty in the objective function to encourage predictions that are explainable. Our evaluation experiments demonstrate that CEMF can recommend items that are more explainable and diverse compared to its competitive baselines, and that it further achieves this superior performance without sacrificing the accuracy of its predictions.
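A minimal sketch of the idea, assuming a standard squared-error MF loss with SGD and using a simple pull-together term as a stand-in for the cosine penalty (pulling a user and item factor toward each other raises their cosine similarity); the explainability matrix E and all hyperparameters here are assumptions, not the CEMF objective itself:

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 3, 0], [4, 0, 1], [1, 1, 5], [0, 1, 4.]])  # 0 = unobserved
E = rng.random(R.shape)            # hypothetical explainability weights in [0, 1]
P = rng.normal(0, 0.1, (4, 2))     # user factors
Q = rng.normal(0, 0.1, (3, 2))     # item factors

def rmse(P, Q, R):
    mask = R > 0
    return np.sqrt((((R - P @ Q.T) ** 2)[mask]).mean())

def train(P, Q, R, E, epochs=200, lr=0.01, reg=0.02, beta=0.05):
    for _ in range(epochs):
        for u, i in zip(*np.nonzero(R)):
            err = R[u, i] - P[u] @ Q[i]
            pu = P[u].copy()
            # error term + L2 regularization + an explainability term that
            # nudges P[u] and Q[i] toward each other in proportion to E[u, i]
            P[u] += lr * (err * Q[i] - reg * P[u] + beta * E[u, i] * Q[i])
            Q[i] += lr * (err * pu - reg * Q[i] + beta * E[u, i] * pu)

before = rmse(P, Q, R)
train(P, Q, R, E)
after = rmse(P, Q, R)
print(after < before)  # training error decreases despite the extra penalty
```

The point the sketch illustrates is the abstract's core claim: an explainability term can be folded into the factorization objective without destroying predictive fit.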

    PICAE – Intelligent publication of audiovisual and editorial contents

    Developments in internet infrastructure and technology over the last two decades have given users and retailers the ability to buy and sell items online. This has of course broadened the horizons of what products can be offered outside of the traditional trading sense, to the point where virtually any product can be offered. These massive online markets have had a considerable impact on the habits of consumers, providing them access to a greater variety of products and information on these goods. This variety has made online commerce into a multi-billion dollar industry, but it has also put the customer in a position where it is increasingly difficult to select the products that best fit their individual needs. In the same vein, the rise of both the availability and the amounts of data that computers have been able to process in the last decades has allowed many computationally expensive solutions to exist, and recommender systems are no exception. These systems are the perfect tools to overcome the information overload problem since they provide automated and personalized suggestions to consumers. The PICAE project tackles the recommendation problem in the audiovisual sector. The vast amount of audiovisual content that is available nowadays to the user can be overwhelming, which is why recommenders have been increasingly growing in popularity in this sector, with Netflix being the biggest example. PICAE seeks to provide insightful and personalized recommendations to users in a public TV setting. The PICAE project develops new models and analytical tools for recommending audiovisual and editorial content with the aim of improving the user experience, based on their profile and environment, and the level of satisfaction and loyalty. These new tools represent a qualitative improvement in the state of the art of television and editorial content recommendation.
On the other hand, the project also improves the digital consumption index of these contents based on the identification of the products that these new forms of consumption demand and how they must be produced, distributed, and promoted to respond to the needs of this emerging market. The main challenge of the PICAE project is to resolve two aspects that differentiate it from other existing solutions: the variety and dynamism of the contents, which require real-time analysis for recommendation, and the lack of available information about the user, who in these areas is reluctant to register, making identification difficult in multi-device consumption. This document will explain the contributions made in the development of the project, which can be divided in two: the development of a recommender system that takes into account information about both users and items, and a deep analysis of the current metrics used to assess the performance of a recommender system.

    Swarm intelligence for clustering dynamic data sets for web usage mining and personalization.

    Swarm Intelligence (SI) techniques were inspired by bee swarms, ant colonies, and most recently, bird flocks. Flock-based Swarm Intelligence (FSI) has several unique features, namely decentralized control, collaborative learning, high exploration ability, and inspiration from dynamic social behavior. Thus FSI offers a natural choice for modeling dynamic social data and solving problems in such domains. One particular case of dynamic social data is online/web usage data, which is rich in information about user activities, interests, and choices. This natural analogy between SI and social behavior is the main motivation for the topic of investigation in this dissertation, with a focus on Flock-based systems which have not been well investigated for this purpose. More specifically, we investigate the use of flock-based SI to solve two related and challenging problems by developing algorithms that form critical building blocks of intelligent personalized websites, namely, (i) providing a better understanding of the online users and their activities or interests, for example using clustering techniques that can discover the groups that are hidden within the data; and (ii) reducing information overload by providing guidance to the users on websites and services, typically by using web personalization techniques, such as recommender systems. Recommender systems aim to recommend items that will be potentially liked by a user. To support a better understanding of the online user activities, we developed clustering algorithms that address two challenges of mining online usage data: the need for scalability to large data and the need to adapt clustering to dynamic data sets. To address the scalability challenge, we developed new clustering algorithms using a hybridization of traditional Flock-based clustering with faster K-Means based partitional clustering algorithms.
We tested our algorithms on synthetic data, real UCI Machine Learning Repository benchmark data, and a data set consisting of real Web user sessions. Having linear complexity with respect to the number of data records, the resulting algorithms are considerably faster than traditional Flock-based clustering (which has quadratic complexity). Moreover, our experiments demonstrate that scalability was gained without sacrificing quality. To address the challenge of adapting to dynamic data, we developed a dynamic clustering algorithm that can handle the following dynamic properties of online usage data: (1) New data records can be added at any time (for example, a new user is added on the site); (2) Existing data records can be removed at any time, for example an existing user of the site who no longer subscribes to a service or who is terminated for violating policies; (3) New parts of existing records can arrive at any time, or old parts of an existing data record can change. The user's record can change as a result of additional activity such as purchasing new products, returning a product, rating new products, or modifying the existing rating of a product. We tested our dynamic clustering algorithm on synthetic dynamic data, and on a data set consisting of real online user ratings for movies. Our algorithm was shown to handle the dynamic nature of data without sacrificing quality compared to a traditional Flock-based clustering algorithm that is re-run from scratch with each change in the data. To support reducing online information overload, we developed a Flock-based recommender system to predict the interests of users, in particular focusing on collaborative filtering or social recommender systems. Our Flock-based recommender algorithm (FlockRecom) iteratively adjusts the position and speed of dynamic flocks of agents, such that each agent represents a user, on a visualization panel.
Then it generates the top-n recommendations for a user based on the ratings of the users that are represented by its neighboring agents. Our recommendation system was tested on a real data set consisting of online user ratings for a set of jokes, and compared to traditional user-based Collaborative Filtering (CF). Our results demonstrated that our recommender system starts performing at the same level of quality as traditional CF, and then, with more iterations for exploration, surpasses CF's recommendation quality, in terms of precision and recall. Another unique advantage of our recommendation system compared to traditional CF is its ability to generate more variety or diversity in the set of recommended items. Our contributions advance the state of the art in Flock-based SI for clustering and making predictions in dynamic Web usage data, and therefore have an impact on improving the quality of online services.
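The neighborhood-based recommendation step described above can be sketched as follows; the agent coordinates, rating values, and function name are illustrative assumptions, not the FlockRecom implementation:

```python
import numpy as np

def flock_top_n(positions, ratings, user, n=2, k=3):
    """Recommend the n items rated highest, on average, by the user's
    k spatially nearest agents on the visualization panel."""
    d = np.linalg.norm(positions - positions[user], axis=1)
    neighbors = np.argsort(d)[1:k + 1]       # k nearest agents, excluding self
    scores = ratings[neighbors].mean(axis=0)
    scores[ratings[user] > 0] = -np.inf      # never re-recommend rated items
    return np.argsort(scores)[::-1][:n]

positions = np.array([[0, 0], [0.1, 0], [0.2, 0], [5, 5.]])  # agents after flocking
ratings = np.array([[5, 0, 0, 0],
                    [4, 3, 0, 1],
                    [0, 5, 2, 0],
                    [1, 1, 5, 5.]])          # 0 = unrated
print(flock_top_n(positions, ratings, user=0))  # [1 2]
```

Because the flocking iterations keep moving similar users closer together on the panel, the same nearest-neighbor query naturally yields different, increasingly refined neighborhoods over time.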

    Spectrum Sensing Security in Cognitive Radio Networks

    This thesis explores the use of unsupervised machine learning for spectrum sensing in cognitive radio (CR) networks from a security perspective. CR is an enabling technology for dynamic spectrum access (DSA) because of a CR's ability to reconfigure itself in a smart way. CR can adapt and use unoccupied spectrum with the help of spectrum sensing and DSA. DSA is an efficient way to dynamically allocate white spaces (unutilized spectrum) to other CR users in order to tackle the spectrum scarcity problem and improve spectral efficiency. So far various techniques have been developed to efficiently detect and classify signals in a DSA environment. Neural network techniques, especially those using unsupervised learning, have some key advantages over other methods, mainly because minimal preconfiguration is required to sense the spectrum. However, recent results have shown some possible security vulnerabilities, which can be exploited by adversarial users to gain unrestricted access to spectrum by fooling signal classifiers. It is very important to address these new classes of security threats and challenges in order to make CR a long-term commercially viable concept. This thesis identifies some key security vulnerabilities when unsupervised machine learning is used for spectrum sensing and also proposes mitigation techniques to counter the security threats. The simulation work demonstrates the ability of a malicious user to manipulate signals in such a way as to confuse the signal classifier. The signal classifier is forced by the malicious user to draw incorrect decision boundaries by presenting signal features which are akin to a primary user. Hence, a malicious user is able to classify itself as a primary user and thus gains unrestricted access to the spectrum. First, the performance of various classification algorithms is evaluated.
K-means and weighted classification algorithms are selected because of their robustness against the proposed attacks as compared to other classification algorithms. Second, the connection attack, point cluster attack, and random noise attack are shown to have an adverse effect on classification algorithms. In the end, some mitigation techniques are proposed to counter the effect of these attacks.
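The point cluster attack can be illustrated with a toy K-means sketch: injecting points whose features mimic a primary user drags the learned primary centroid, and hence the decision boundary, toward the attacker. The feature values, cluster locations, and initialization below are assumptions for illustration only:

```python
import numpy as np

def kmeans(X, init, iters=20):
    """Plain Lloyd's algorithm with fixed initial centroids."""
    C = np.asarray(init, dtype=float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None]) ** 2).sum(-1), axis=1)
        C = np.array([X[labels == j].mean(axis=0) for j in range(len(C))])
    return C, labels

rng = np.random.default_rng(1)
primary = rng.normal([0, 0], 0.3, (20, 2))       # primary-user signal features
secondary = rng.normal([10, 10], 0.3, (20, 2))   # secondary-user features
X_clean = np.vstack([primary, secondary])
attack = rng.normal([3, 3], 0.3, (10, 2))        # attacker mimics primary features
X_attacked = np.vstack([X_clean, attack])

init = [[0, 0], [10, 10]]
C_clean, _ = kmeans(X_clean, init)
C_attacked, _ = kmeans(X_attacked, init)
# the primary centroid drifts toward the injected cluster
print(np.linalg.norm(C_attacked[0]) > np.linalg.norm(C_clean[0]))  # True
```

After the attack, the primary centroid sits between the genuine primary signals and the injected points, so the attacker's own transmissions fall inside the primary-user region and are classified accordingly.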

    The Wits intelligent teaching system (WITS): a smart lecture theatre to assess audience engagement

    A Thesis submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy, 2017. The utility of lectures is directly related to the engagement of the students therein. To ensure the value of lectures, one needs to be certain that they are engaging to students. In small classes experienced lecturers develop an intuition of how engaged the class is as a whole and can then react appropriately to remedy the situation through various strategies such as breaks or changes in style, pace and content. As both the number of students and the size of the venue grow, this type of contingent teaching becomes increasingly difficult and less precise. Furthermore, relying on intuition alone gives no way to recall and analyse previous classes or to objectively investigate trends over time. To address these problems this thesis presents the WITS INTELLIGENT TEACHING SYSTEM (WITS) to highlight disengaged students during class. A web-based, mobile application called Engage was developed to try to elicit anonymous engagement information directly from students. The majority of students were unwilling or unable to self-report their engagement levels during class. This stems from a number of cultural and practical issues related to social display rules, unreliable internet connections, data costs, and distractions. This result highlights the need for a non-intrusive system that does not require the active participation of students. A non-intrusive, computer vision and machine learning based approach is therefore proposed. To support the development thereof, a labelled video dataset of students was built by recording a number of first year lectures. Students were labelled across a number of affects – including boredom, frustration, confusion, and fatigue – but poor inter-rater reliability meant that these labels could not be used as ground truth.
Based on manual coding methods identified in the literature, a number of actions, gestures, and postures were identified as proxies of behavioural engagement. These proxies are then used in an observational checklist to mark students as engaged or not. A Support Vector Machine (SVM) was trained on Histograms of Oriented Gradients (HOG) to classify the students based on the identified behaviours. The results suggest a high temporal correlation of a single subject’s video frames. This leads to extremely high accuracies on seen subjects. However, this approach generalised poorly to unseen subjects and more careful feature engineering is required. The use of Convolutional Neural Networks (CNNs) improved the classification accuracy substantially, both over a single subject and when generalising to unseen subjects. While more computationally expensive than the SVM, the CNN approach lends itself to parallelism using Graphics Processing Units (GPUs). With GPU hardware acceleration, the system is able to run in near real-time and with further optimisations a real-time classifier is feasible. The classifier provides engagement values, which can be displayed to the lecturer live during class. This information is displayed as an Interest Map which highlights spatial areas of disengagement. The lecturer can then make informed decisions about how to progress with the class, what teaching styles to employ, and on which students to focus. An Interest Map was presented to lecturers and professors at the University of the Witwatersrand yielding 131 responses. The vast majority of respondents indicated that they would like to receive live engagement feedback during class, that they found the Interest Map an intuitive visualisation tool, and that they would be interested in using such technology. 
Contributions of this thesis include the development of a labelled video dataset; the development of a web based system to allow students to self-report engagement; the development of cross-platform, open-source software for spatial, action and affect labelling; the application of Histogram of Oriented Gradient based Support Vector Machines, and Deep Convolutional Neural Networks to classify this data; the development of an Interest Map to intuitively display engagement information to presenters; and finally an analysis of acceptance of such a system by educators.
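A minimal numpy sketch of the HOG features underlying the SVM classifier; the cell size, bin count, and the absence of block normalization are simplifying assumptions, not the configuration used in the thesis:

```python
import numpy as np

def hog_features(img, cell=8, bins=9):
    """Histogram of Oriented Gradients: per-cell histograms of gradient
    orientation, weighted by gradient magnitude, L2-normalized globally."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180        # unsigned orientations
    h, w = img.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            m = mag[y:y + cell, x:x + cell].ravel()
            a = ang[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    f = np.concatenate(feats)
    return f / (np.linalg.norm(f) + 1e-9)

img = np.random.default_rng(0).random((16, 16))       # stand-in image patch
f = hog_features(img)
print(f.shape)  # (36,) -> 2x2 cells x 9 orientation bins
```

Feature vectors like this, computed per student region, are what the SVM separates into engaged and disengaged classes; the thesis's observation that they generalise poorly across subjects motivates the switch to CNNs, which learn their own features.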

    Manipulating the Capacity of Recommendation Models in Recall-Coverage Optimization

    Traditional approaches in Recommender Systems ignore the problem of long-tail recommendations. There is no systematic approach to control the magnitude of long-tail recommendations generated by the models, and there is not even a proper methodology to evaluate the quality of long-tail recommendations. This thesis addresses the long-tail recommendation problem from both the algorithmic and the evaluation perspective. We propose controlling the magnitude of long-tail recommendations generated by models through the manipulation of capacity hyperparameters of the learning algorithms, and we define such hyperparameters for multiple state-of-the-art algorithms. We also summarize multiple such algorithms under the common framework of the score function, which allows us to apply popularity-based regularization to all of them. We propose searching for Pareto-optimal states in the Recall-Coverage plane as the right way to search for long-tail, high-accuracy models. In an exhaustive set of experiments, we empirically demonstrate the correctness of our theory on a mixture of public and industrial datasets for 5 different algorithms and their different versions.
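Selecting Pareto-optimal models in the Recall-Coverage plane can be sketched as follows; a model is kept if no other model dominates it, i.e. is at least as good on both metrics and strictly better on one (the metric values below are made up for illustration):

```python
import numpy as np

def pareto_front(points):
    """Return indices of non-dominated (recall, coverage) pairs."""
    pts = np.asarray(points, dtype=float)
    keep = []
    for i, p in enumerate(pts):
        dominated = any(
            (q >= p).all() and (q > p).any()
            for j, q in enumerate(pts) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

# hypothetical (recall, coverage) per model configuration
models = [(0.30, 0.10), (0.28, 0.40), (0.25, 0.40), (0.20, 0.60)]
print(pareto_front(models))  # [0, 1, 3]
```

Model 2 is dropped because model 1 matches its coverage with strictly higher recall; the surviving front exposes the recall-versus-coverage trade-off from which a long-tail, high-accuracy configuration can be chosen.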