
    Understanding Shilling Attacks and Their Detection Traits: A Comprehensive Survey

    The internet is home to huge volumes of useful data that are constantly being created, making it difficult for users to find information relevant to them. A recommendation system is a special type of information filtering system adopted by online vendors to provide recommendations to their customers based on their requirements. Collaborative filtering is one of the most widely used recommendation approaches; unfortunately, it is prone to shilling/profile injection attacks. Such attacks alter the recommendation process to promote or demote a particular product. Over the years, multiple attack models and detection techniques have been developed to mitigate the problem. This paper aims to be a comprehensive survey of shilling attack models, detection attributes, and detection algorithms. Additionally, we unravel and classify the intrinsic traits of the injected profiles that are exploited by the detection algorithms, which has not been explored in previous works. We also briefly discuss recent work on the development of robust algorithms that alleviate the impact of shilling attacks, on attacks on multi-criteria systems, and on intrinsic feedback based collaborative filtering methods.
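To make the attack model concrete, here is a minimal sketch (in Python, with illustrative names and parameters, not taken from the survey) of the widely studied "average" shilling attack, in which an injected profile gives the target item the maximum rating while rating filler items near their observed means so the profile blends in with genuine users:

```python
import random

def build_average_attack_profile(item_means, target_item, filler_size, r_max=5):
    """Construct one 'average attack' shill profile: the target item gets the
    maximum rating, and a random set of filler items are rated close to their
    observed mean rating so the profile resembles a genuine user."""
    profile = {target_item: r_max}
    candidates = [i for i in item_means if i != target_item]
    for item in random.sample(candidates, filler_size):
        # Small Gaussian noise around the item mean; clamp to the rating scale.
        noisy = item_means[item] + random.gauss(0, 0.5)
        profile[item] = min(r_max, max(1, round(noisy)))
    return profile

# Illustrative item-mean statistics an attacker might estimate from public data.
item_means = {"i1": 3.2, "i2": 4.1, "i3": 2.5, "i4": 3.8, "i5": 1.9}
shill = build_average_attack_profile(item_means, target_item="i3", filler_size=3)
```

The closeness of filler ratings to item means is precisely the trait that makes such profiles hard to separate from genuine users, and it is one of the intrinsic traits detection attributes try to exploit.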

    Cross-System Personalization

    The World Wide Web provides access to a wealth of information and services to a huge and heterogeneous user population on a global scale. One important and successful design mechanism in dealing with this diversity of users is to personalize Web sites and services, i.e. to customize system content, characteristics, or appearance with respect to a specific user. Each system independently builds up user profiles and uses this information to personalize the service offering. Such isolated approaches have two major drawbacks: firstly, investments of users in personalizing a system either through explicit provision of information or through long and regular use are not transferable to other systems. Secondly, users have little or no control over the information that defines their profile, since user data are deeply buried in personalization engines running on the server side. Cross system personalization (CSP) (Mehta, Niederee, & Stewart, 2005) allows for sharing information across different information systems in a user-centric way and can overcome the aforementioned problems. Information about users, which is originally scattered across multiple systems, is combined to obtain maximum leverage and reuse of information. Our initial approaches to cross system personalization relied on each user having a unified profile which different systems can understand. The unified profile contains facets modeling aspects of a multidimensional user which is stored inside a "Context Passport" that the user carries along in his/her journey across information space. The user’s Context Passport is presented to a system, which can then understand the context in which the user wants to use the system. The basis of ’understanding’ in this approach is of a semantic nature, i.e. the semantics of the facets and dimensions of the unified profile are known, so that the latter can be aligned with the profiles maintained internally at a specific site.
The results of the personalization process are then transferred back to the user’s Context Passport via a protocol understood by both parties. The main challenge in this approach is to establish some common and globally accepted vocabulary and to create a standard every system will comply with. Machine learning techniques provide an alternative approach to enable CSP without the need for accepted semantic standards or ontologies. The key idea is that one can try to learn dependencies between profiles maintained within one system and profiles maintained within a second system, based on data provided by users who use both systems and who are willing to share their profiles across systems – which we assume is in the interest of the user. Here, instead of requiring a common semantic framework, it is only required that a sufficient number of users cross between systems and that there is enough regularity among users that one can learn within a user population, a fact that is commonly exploited in collaborative filtering. In this thesis, we aim to provide a principled approach towards achieving cross system personalization. We describe both semantic and learning approaches, with a stronger emphasis on the learning approach. We also investigate the privacy and scalability aspects of CSP and provide solutions to these problems. Finally, we also explore in detail the aspect of robustness in recommender systems. We motivate several approaches for robustifying collaborative filtering and provide the best performing algorithm for detecting malicious attacks reported so far.
The personalization of software systems is of steadily increasing importance, particularly in connection with Web applications such as search engines, community portals, or electronic commerce sites that address large, highly diversified user groups. Since explicit personalization typically involves considerable time and effort for the user, many applications fall back on implicit techniques for automatic personalization, in particular recommender systems, which typically use methods such as collaborative or social filtering. While these methods require no explicit construction of user profiles through answering questions and giving explicit feedback, the quality of the implicit personalization depends strongly on the volume of available data, such as transaction, query, or click logs. If, in this sense, little is known about a user, no reliable personal adaptations or recommendations can be made. The present dissertation addresses the question of how personalization can be enabled and supported across system boundaries ("cross-system"), discussing mainly implicit personalization techniques but, to a limited extent, also explicit approaches such as the semantic Context Passport. The dissertation thus treats an important research question of high practical relevance that has been solved only incompletely and unsatisfactorily in the recent scientific literature on this topic. Automatic recommender systems using social filtering techniques have been popularized since the mid-1990s with the advent of the first e-commerce wave, in particular through projects such as Information Tapestry, GroupLens, and Firefly. In the late 1990s and the early years of the following decade, the main focus of the research literature was on improved statistical methods and advanced inference techniques by which implicit observations can be mapped to concrete adaptation or recommendation actions. In recent years, questions of how personalization systems can be better adapted to the practical requirements of particular applications have come to the fore, in particular the suitable adaptation and extension of existing techniques. It is within this context that the present thesis positions itself.
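The learning-based CSP idea can be illustrated with a toy example. Assuming profiles are plain numeric vectors and the cross-system dependency happens to be linear, a mapping can be estimated by least squares from the users shared by both systems and then applied to a user known only to the first system (synthetic data; the thesis's actual models are richer):

```python
import numpy as np

# Sketch: learn a linear mapping W from profiles in system A to profiles in
# system B using the users present in both systems, then apply it to a new
# user known only to system A. All sizes and data here are synthetic.
rng = np.random.default_rng(0)
n_shared, dim_a, dim_b = 50, 8, 6

profiles_a = rng.normal(size=(n_shared, dim_a))  # shared users, system A
true_map = rng.normal(size=(dim_a, dim_b))       # unknown real dependency
profiles_b = profiles_a @ true_map               # the same users, system B

# Least-squares estimate of the cross-system mapping from the shared users.
W, *_ = np.linalg.lstsq(profiles_a, profiles_b, rcond=None)

new_user_a = rng.normal(size=dim_a)
predicted_b = new_user_a @ W  # personalized profile for the unseen system
```

With enough "crossing" users and enough regularity in the population, the mapping generalizes to users the second system has never seen, which is exactly the collaborative-filtering-style assumption the text describes.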

    Trust-Based Rating Prediction and Malicious Profile Detection in Online Social Recommender Systems

    Online social networks and recommender systems have become an effective channel for influencing millions of users by facilitating the exchange and spread of information. This dissertation addresses multiple challenges faced by online social recommender systems, namely: i) finding the extent of information spread; ii) predicting the rating of a product; and iii) detecting malicious profiles. Most research in this area does not capture social interactions and relies on empirical or statistical approaches without considering temporal aspects. We capture the temporal spread of information using a probabilistic model and use non-linear differential equations to model the diffusion process. To predict the rating of a product, we propose a social trust model and use the matrix factorization method to estimate a user's taste by incorporating the user-item rating matrix. The effect of the tastes of a user's friends is captured using a trust model based on similarities between users and their centralities. Similarity is modeled using the Vector Space Similarity and Pearson Correlation Coefficient algorithms, whereas degree, eigenvector, Katz, and PageRank are used to model centrality. As the rating of a product has tremendous influence on its saleability, social recommender systems are vulnerable to profile injection attacks that shift users' opinions towards favorable or unfavorable recommendations for a product. We propose a classification approach for detecting attackers based on attributes that estimate the likelihood that a user profile is that of an attacker. To evaluate the performance, we inject push and nuke attacks, and use precision and recall to identify the attackers. All proposed models have been validated using datasets from Facebook, Epinions, and Digg. Results show that the proposed models are able to better predict information spread and product ratings, and to identify malicious user profiles with high accuracy and low false positives.
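As an illustration of the similarity component only (the trust model and the centrality measures are not shown, and the names are illustrative), a minimal sketch of the Pearson Correlation Coefficient computed over co-rated items:

```python
from math import sqrt

def pearson_similarity(ratings_u, ratings_v):
    """Pearson Correlation Coefficient over items both users rated, one of
    the two similarity measures the abstract mentions (the other being
    Vector Space Similarity, i.e. cosine). Ratings are dicts item -> score."""
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0  # not enough overlap to correlate
    mu_u = sum(ratings_u[i] for i in common) / len(common)
    mu_v = sum(ratings_v[i] for i in common) / len(common)
    num = sum((ratings_u[i] - mu_u) * (ratings_v[i] - mu_v) for i in common)
    den = sqrt(sum((ratings_u[i] - mu_u) ** 2 for i in common)) * \
          sqrt(sum((ratings_v[i] - mu_v) ** 2 for i in common))
    return num / den if den else 0.0

alice = {"a": 5, "b": 3, "c": 4}
bob = {"a": 4, "b": 2, "c": 3, "d": 5}
# alice and bob deviate identically from their own means on co-rated items,
# so their similarity is 1.0 despite bob rating everything one point lower
```

Centering on each user's mean is the design choice that distinguishes Pearson from plain cosine: it cancels out systematic differences in how harshly two users rate.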

    Contributions to outlier detection and recommendation systems

    Data mining, also called "knowledge discovery in databases", is a young interdisciplinary research field. Data mining studies the processes of analyzing large data sets to extract knowledge, and of transforming that knowledge into structures that are easy for humans to understand and use. This thesis studies two important data mining tasks: outlier detection and product recommendation. Outlier detection is the identification of data that do not conform to normal observations. Product recommendation is the prediction of a customer's level of interest in products, based on past purchase data and socio-economic data. More precisely, this thesis addresses 1) outlier detection in large categorical data sets; and 2) recommendation techniques for skewed rating data. Outlier detection in large-scale categorical data is an important problem that is far from solved. Existing methods in this field suffer from low efficiency and effectiveness due to the high dimensionality of the data, the large size of the databases, the high complexity of statistical tests, and inadequate proximity measures. This thesis proposes a formal definition of outliers in categorical data, as well as two effective and efficient algorithms for outlier detection in large data sets. These algorithms require only one parameter: the number of outliers. To determine the value of this parameter, we developed a criterion based on a new concept, holo-entropy.
Much previous research on recommender systems has neglected a type of rating that is widespread in Web applications such as e-commerce (e.g. Amazon, Taobao) and content-provider sites (e.g. YouTube). The rating data collected by these sites differ from movie and music ratings in their highly skewed distribution. This thesis proposes a framework better suited to estimating ratings and higher-order quantitative preferences for skewed rating data. The framework makes it possible to build new recommendation models based on matrix factorization or on neighborhood estimation. Experimental results on skewed data sets indicate that models built with this framework outperform conventional models not only for rating prediction, but also for prediction of the Top-N product list.
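To illustrate the entropy-based idea in the spirit of the holo-entropy criterion (not its exact formulation, which the thesis defines), the following sketch scores each categorical record by how much its removal reduces the summed per-attribute entropy of the dataset; the records whose removal reduces entropy most are the outlier candidates:

```python
from collections import Counter
from math import log2

def entropy_reduction_scores(records):
    """Score each categorical record (a tuple of attribute values) by how
    much deleting it would lower the total per-attribute Shannon entropy
    of the dataset. High score = removing the record makes the remaining
    data more regular, i.e. the record is an outlier candidate."""
    def total_entropy(rows):
        if not rows:
            return 0.0
        h = 0.0
        for col in range(len(rows[0])):
            counts = Counter(r[col] for r in rows)
            n = len(rows)
            h -= sum(c / n * log2(c / n) for c in counts.values())
        return h

    base = total_entropy(records)
    return [base - total_entropy(records[:i] + records[i + 1:])
            for i in range(len(records))]

data = [("red", "small")] * 6 + [("blue", "large")]
scores = entropy_reduction_scores(data)
# removing the lone ("blue", "large") record leaves a perfectly uniform
# dataset (entropy 0), so that record receives the highest score
```

This naive version recomputes the full entropy once per record; the efficiency concerns the abstract raises are exactly about avoiding such recomputation at scale.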

    Using Data Mining for Facilitating User Contributions in the Social Semantic Web

    This thesis utilizes recommender systems to aid the user in contributing to the Social Semantic Web. In this work, we propose a framework that maps domain properties to recommendation technologies. Next, we develop novel recommendation algorithms for improving personalized tag recommendation and for recommendation of semantic relations. Finally, we introduce a framework to analyze different types of potential attacks against social tagging systems and evaluate their impact on those systems.

    Blockchain-based recommender systems: Applications, challenges and future opportunities

    Recommender systems have been widely used in different application domains including energy preservation, e-commerce, healthcare, social media, etc. Such applications require the analysis and mining of massive amounts of various types of user data, including demographics, preferences, social interactions, etc., in order to develop accurate and precise recommender systems. Such datasets often include sensitive information, yet most recommender systems focus on model accuracy and ignore issues related to security and user privacy. Despite efforts to overcome these problems using different risk reduction techniques, none has been completely successful in ensuring cryptographic security and protection of users' private information. To bridge this gap, blockchain technology is presented as a promising strategy to promote security and privacy preservation in recommender systems, not only because of its salient security and privacy features, but also due to its resilience, adaptability, fault tolerance, and trust characteristics. This paper presents a holistic review of blockchain-based recommender systems covering challenges, open issues, and solutions. Accordingly, a well-designed taxonomy is introduced to describe the security and privacy challenges, overview existing frameworks, and discuss their applications and benefits when using blockchain, before indicating opportunities for future research. © 2021 Elsevier Inc. This paper was made possible by National Priorities Research Program (NPRP) grant No. 10-0130-170288 from the Qatar National Research Fund (a member of Qatar Foundation). The statements made herein are solely the responsibility of the authors.

    Addressing the Issues of Coalitions and Collusion in Multiagent Systems

    In the field of multiagent systems, trust and reputation systems are intended to assist agents in finding trustworthy partners with whom to interact. Earlier work of ours identified in theory a number of security vulnerabilities in trust and reputation systems, weaknesses that might be exploited by malicious agents to bypass the protections offered by such systems. In this work, we begin by developing the TREET testbed, a simulation platform that allows for extensive evaluation and flexible experimentation with trust and reputation technologies. We use this testbed to experimentally validate the practicality and gravity of attacks against vulnerabilities. Of particular interest are attacks that are collusive in nature: groups of agents (coalitions) working together to improve their expected rewards. But the issue of coalitions is not unique to trust and reputation; rather, it cuts across a range of fields in multiagent systems and beyond. In some scenarios, coalitions may be unwanted or forbidden; in others they may be benign or even desirable. In this document, we propose a method for detecting coalitions and identifying coalition members, a capability that is likely to be valuable in many of the diverse fields where coalitions may be of interest. Our method makes use of clustering in benefit space (a high-dimensional space reflecting how agents benefit others in the system) in order to identify groups of agents who benefit similar sets of agents. A statistical technique is then used to identify which clusters contain coalitions. Experimentation using the TREET platform verifies the effectiveness of this approach. A series of enhancements to our method are also introduced, which improve the accuracy and robustness of the algorithm. To demonstrate how this broadly-applicable tool can be used to address domain-specific problems, we focus again on trust and reputation systems. 
We show how, by incorporating our work into one such system (the existing Beta Reputation System), we can provide resistance to collusion. We conclude with a detailed discussion of the value of our work for a wide range of environments, including a variety of multiagent systems and real-world settings.
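A toy sketch of the benefit-space idea: each agent is represented by a vector recording how much it has benefited every other agent (e.g. positive ratings given), so colluders who concentrate benefits on the same accomplices land close together and a clustering step can expose them. The data, distance threshold, and single-pass clustering below are illustrative, not the thesis's actual algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)

# 8 honest agents with diffuse benefit patterns over 5 beneficiaries,
# plus 3 colluders whose benefit vectors are concentrated and near-identical.
honest = rng.uniform(0, 10, size=(8, 5))
coalition = np.tile([5.0, 5.0, 0.0, 0.0, 0.0], (3, 1)) \
    + rng.normal(0, 0.1, size=(3, 5))
agents = np.vstack([honest, coalition])

def cluster(points, eps=1.0):
    """Greedy single-pass clustering: agents whose benefit vectors lie
    within distance eps of a cluster seed share that cluster's label."""
    labels = [-1] * len(points)
    next_label = 0
    for i in range(len(points)):
        if labels[i] != -1:
            continue
        labels[i] = next_label
        for j in range(i + 1, len(points)):
            if labels[j] == -1 and np.linalg.norm(points[i] - points[j]) < eps:
                labels[j] = next_label
        next_label += 1
    return labels

labels = cluster(agents)
# the three coalition agents (indices 8, 9, 10) end up in one shared cluster,
# while the honest agents' diffuse vectors each form singleton clusters
```

In the thesis this clustering is followed by a statistical test to decide which clusters actually contain coalitions, since tight clusters can also arise benignly.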

    Combating Threats to the Quality of Information in Social Systems

    Many large-scale social systems such as Web-based social networks, online social media sites and Web-scale crowdsourcing systems have been growing rapidly, enabling millions of human participants to generate, share and consume content on a massive scale. This reliance on users can lead to many positive effects, including large-scale growth in the size and content in the community, bottom-up discovery of “citizen-experts”, serendipitous discovery of new resources beyond the scope of the system designers, and new social-based information search and retrieval algorithms. But the relative openness and reliance on users coupled with the widespread interest and growth of these social systems carries risks and raises growing concerns over the quality of information in these systems. In this dissertation research, we focus on countering threats to the quality of information in self-managing social systems. Concretely, we identify three classes of threats to these systems: (i) content pollution by social spammers, (ii) coordinated campaigns for strategic manipulation, and (iii) threats to collective attention. To combat these threats, we propose three inter-related methods for detecting evidence of these threats, mitigating their impact, and improving the quality of information in social systems. We augment this three-fold defense with an exploration of their origins in “crowdturfing” – a sinister counterpart to the enormous positive opportunities of crowdsourcing. In particular, this dissertation research makes four unique contributions:
    ‱ The first contribution of this dissertation research is a framework for detecting and filtering social spammers and content polluters in social systems. To detect and filter individual social spammers and content polluters, we propose and evaluate a novel social honeypot-based approach.
    ‱ Second, we present a set of methods and algorithms for detecting coordinated campaigns in large-scale social systems. We propose and evaluate a content-driven framework for effectively linking free text posts with common “talking points” and extracting campaigns from large-scale social systems.
    ‱ Third, we present a dual study of the robustness of social systems to collective attention threats through both a data-driven modeling approach and deployment over a real system trace. We evaluate the effectiveness of countermeasures deployed based on the first moments of a bursting phenomenon in a real system.
    ‱ Finally, we study the underlying ecosystem of crowdturfing for engaging in each of the three threat types. We present a framework for “pulling back the curtain” on crowdturfers to reveal their underlying ecosystem on both crowdsourcing sites and social media.