72 research outputs found

    Shilling Black-box Review-based Recommender Systems through Fake Review Generation

    Review-Based Recommender Systems (RBRSs) have attracted increasing research interest due to their ability to alleviate well-known cold-start problems. RBRSs use reviews to construct user and item representations. However, in this paper, we argue that such a reliance on reviews may instead expose systems to the risk of being shilled. To explore this possibility, we propose the first generation-based model for shilling attacks against RBRSs. Specifically, we learn a fake review generator through reinforcement learning, which maliciously promotes items by forcing prediction shifts after the generated reviews are added to the system. By introducing auxiliary rewards that increase text fluency and diversity with the aid of pre-trained language models and aspect predictors, the generated reviews can be effective for shilling with high fidelity. Experimental results demonstrate that the proposed framework can successfully attack three different kinds of RBRSs on the Amazon corpus across three domains and on the Yelp corpus. Furthermore, human studies also show that the generated reviews are fluent and informative. Finally, equipped with Attack Review Generators (ARGs), RBRSs with adversarial training are much more robust to malicious reviews.

    Cross-systems Personalisierung

    The World Wide Web provides access to a wealth of information and services to a huge and heterogeneous user population on a global scale. One important and successful design mechanism in dealing with this diversity of users is to personalize Web sites and services, i.e. to customize system content, characteristics, or appearance with respect to a specific user. Each system independently builds up user profiles and uses this information to personalize the service offering. Such isolated approaches have two major drawbacks: firstly, investments of users in personalizing a system either through explicit provision of information or through long and regular use are not transferable to other systems. Secondly, users have little or no control over the information that defines their profile, since user data are deeply buried in personalization engines running on the server side. Cross system personalization (CSP) (Mehta, Niederee, & Stewart, 2005) allows for sharing information across different information systems in a user-centric way and can overcome the aforementioned problems. Information about users, which is originally scattered across multiple systems, is combined to obtain maximum leverage and reuse of information. Our initial approaches to cross system personalization relied on each user having a unified profile which different systems can understand. The unified profile contains facets modeling aspects of a multidimensional user which is stored inside a "Context Passport" that the user carries along in his/her journey across information space. The user's Context Passport is presented to a system, which can then understand the context in which the user wants to use the system. The basis of 'understanding' in this approach is of a semantic nature, i.e. the semantics of the facets and dimensions of the unified profile are known, so that the latter can be aligned with the profiles maintained internally at a specific site. 
The results of the personalization process are then transferred back to the user's Context Passport via a protocol understood by both parties. The main challenge in this approach is to establish some common and globally accepted vocabulary and to create a standard every system will comply with. Machine Learning techniques provide an alternative approach to enable CSP without the need of accepted semantic standards or ontologies. The key idea is that one can try to learn dependencies between profiles maintained within one system and profiles maintained within a second system based on data provided by users who use both systems and who are willing to share their profiles across systems, which we assume is in the interest of the user. Here, instead of requiring a common semantic framework, it is only required that a sufficient number of users cross between systems and that there is enough regularity among users that one can learn within a user population, a fact that is commonly exploited in collaborative filtering. In this thesis, we aim to provide a principled approach towards achieving cross system personalization. We describe both semantic and learning approaches, with a stronger emphasis on the learning approach. We also investigate the privacy and scalability aspects of CSP and provide solutions to these problems. Finally, we also explore in detail the aspect of robustness in recommender systems. We motivate several approaches for robustifying collaborative filtering and provide the best performing algorithm for detecting malicious attacks reported so far.
The personalization of software systems is of steadily growing importance, particularly for Web applications such as search engines, community portals, or electronic commerce sites that address large, highly diverse user groups. Because explicit personalization typically costs the user considerable time, many applications fall back on implicit techniques for automatic personalization, in particular recommender systems, which typically use methods such as collaborative or social filtering. While these methods do not require user profiles to be created explicitly by answering questions and giving explicit feedback, the quality of implicit personalization depends heavily on the volume of available data, such as transaction, query, or click logs. If little is known about a user in this sense, no reliable personal adaptations or recommendations can be made. This dissertation addresses the question of how personalization across system boundaries ("cross system") can be enabled and supported; it discusses mainly implicit personalization techniques and, to a lesser extent, explicit methods such as the semantic Context Passport. The dissertation thus addresses an important research question of high practical relevance that the recent scientific literature on the topic has resolved only incompletely and unsatisfactorily. Automatic recommender systems based on social filtering techniques have been popular since about the mid-1990s, with the rise of the first e-commerce wave, in particular through projects such as Information Tapestry, GroupLens, and Firefly. In the late 1990s and at the beginning of this decade, the research literature then focused mainly on improved statistical methods and advanced inference techniques that map implicit observations to concrete adaptation or recommendation actions. In recent years, attention has shifted to how personalization systems can be better adapted to the practical requirements of particular applications, which mainly involves suitably adapting and extending existing techniques. In this context, the present work

    Unsolved Problems in ML Safety

    Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), reducing inherent model hazards ("Alignment"), and reducing systemic hazards ("Systemic Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions. Comment: Position Paper

    Probabilistic Modeling and Inference for Obfuscated Network Attack Sequences

    Prevalent computing devices with networking capabilities have become critical network infrastructure for government, industry, academia and everyday life. As their value rises, the motivation driving network attacks on this infrastructure has shifted from the pursuit of notoriety to the pursuit of profit or political gains, leading to network attacks at various scales. Facing diverse network attack strategies and overwhelming alerts, much work has been devoted to correlating observed malicious events to pre-defined scenarios, attempting to deduce the attack plans based on expert models of how network attacks may transpire. We started the exploration of characterizing network attacks by investigating how temporal and spatial features of attack sequences can be used to describe different types of attack sources in a real data set. Attack sequence models were built from the real data set to describe different attack strategies. Based on the probabilistic attack sequence model, attack predictions were made to actively predict next possible actions. Experiments through attack predictions have revealed that sophisticated attackers can employ a number of obfuscation techniques to confuse the alert correlation engine or classifier. Unfortunately, most existing work treats attack obfuscations by developing ad-hoc fixes to specific obfuscation techniques. To this end, we developed an attack modeling framework that enables a systematic analysis of obfuscations. The proposed framework represents network attack strategies as general finite order Markov models and integrates them with different attack obfuscation models to form probabilistic graphical models. A set of algorithms is developed to infer the network attack strategies given the models and the observed sequences, which are likely to be obfuscated. 
The algorithms enable an efficient analysis of the impact of different obfuscation techniques and attack strategies, by determining the expected classification accuracy of the obfuscated sequences. The algorithms are developed by integrating the recursion concept in dynamic programming and the Monte-Carlo method. The primary contributions of this work include the development of the formal framework and the algorithms to evaluate the impact of attack obfuscations. Several knowledge-driven attack obfuscation models are developed and analyzed to demonstrate the impact of different types of commonly used obfuscation techniques. The framework and algorithms developed in this work can also be applied to other contexts beyond network security. Any behavior sequences that might suffer from noise and require matching to pre-defined models can use this work to recover the most likely original sequence or to evaluate quantitatively the expected classification accuracy one can achieve in separating the sequences.
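
The idea of matching possibly obfuscated sequences to Markov attack models, and of measuring the expected classification accuracy, can be illustrated with a small simulation. This is a minimal sketch and not the thesis's algorithms: it assumes first-order models, a simple random-substitution obfuscation, and plain Monte Carlo sampling rather than the combined dynamic-programming and Monte-Carlo approach the abstract describes. The action labels, strategy names, and all function names are hypothetical.

```python
import math
import random

ALPHABET = ["scan", "exploit", "exfil"]  # hypothetical attack action labels


def fit_markov(sequences, alphabet=ALPHABET, smoothing=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = {a: {b: smoothing for b in alphabet} for a in alphabet}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {a: {b: c / sum(row.values()) for b, c in row.items()}
            for a, row in counts.items()}


def log_likelihood(seq, model):
    """Log-probability of the transitions in seq under one model."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))


def classify(seq, models):
    """Return the name of the model under which seq is most likely."""
    return max(models, key=lambda name: log_likelihood(seq, models[name]))


def sample(model, length, rng, start="scan"):
    """Draw one action sequence from a model, starting at `start`."""
    seq = [start]
    while len(seq) < length:
        row = model[seq[-1]]
        seq.append(rng.choices(list(row), weights=list(row.values()))[0])
    return seq


def obfuscate(seq, noise_rate, rng, alphabet=ALPHABET):
    """Assumed obfuscation: replace each action by a random one with prob. noise_rate."""
    return [rng.choice(alphabet) if rng.random() < noise_rate else a for a in seq]


def expected_accuracy(models, noise_rate, trials=2000, length=12, seed=0):
    """Monte Carlo estimate of classification accuracy on obfuscated sequences."""
    rng = random.Random(seed)
    names = list(models)
    correct = 0
    for _ in range(trials):
        truth = rng.choice(names)
        observed = obfuscate(sample(models[truth], length, rng), noise_rate, rng)
        correct += classify(observed, models) == truth
    return correct / trials


# Two toy strategies, each fitted from a handful of made-up training sequences.
models = {
    "smash_and_grab": fit_markov([["scan", "exploit", "exfil", "exfil", "exfil"]] * 5),
    "slow_recon": fit_markov([["scan", "scan", "scan", "exploit", "scan", "scan"]] * 5),
}
```

Calling `expected_accuracy(models, noise_rate)` for increasing noise rates gives a rough picture of how quickly an obfuscation of this kind erodes the classifier's ability to separate the two strategies.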

    Design and Analysis of a Robust Sybil Attack Defense Algorithm Using Information Levels (정보 수준을 이용한 강건한 시빌공격 방어 알고리즘 설계 및 분석)

    Thesis (Ph.D.) -- Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2014. 김종권 (Kim Jong-kwon).
    The main function of a recommender system (RS) is to recommend commercial items to end consumers (i.e., RS users). Providing accurate information in an RS matters to both the recommendation service provider and the system's users. With the spread of online social networks, the influence of RSs is growing rapidly. At the same time, attacks on RSs are increasing from malicious users who use fake identities to manipulate information against the system's intent. Attacks that exploit such fake identities are called Sybil attacks. This thesis proposes a new robust recommender system called RobuRec, which uses an admission control concept that has not been introduced in other work. By using admission control, it can predict highly trustworthy recommendations regardless of whether a rating was produced by an honest user or is a malicious rating made with Sybil identities. To demonstrate the performance of RobuRec, the thesis reports extensive experiments using various datasets and several possible Sybil attack scenarios. In these experiments and the accompanying analysis, RobuRec clearly outperformed the comparable PCA (Principal Component Analysis) and LTSMF (Least Trimmed Squared Matrix Factorization) methods on Prediction Shift (PS) and Hit Ratio (HR).
    As the major function of Recommender Systems (RSs) is recommending commercial items to potential consumers (i.e., system users), it is crucial to both RS providers and system users that an RS provide correct information. The influence of RSs over Online Social Networks (OSNs) is expanding rapidly, whereas malicious users continuously try to attack RSs with fake identities (i.e., Sybils) by manipulating the information in the RS adversely. In this thesis, we propose a novel robust recommendation algorithm called RobuRec which exploits a distinctive feature, admission control. RobuRec provides highly trusted recommendation results, since it predicts appropriate recommendations regardless of whether the ratings are given by honest users or by Sybils, thanks to the power of admission control. To demonstrate the performance of RobuRec, we have conducted extensive experiments with various datasets as well as diverse attack scenarios. The evaluation results confirm that RobuRec significantly outperforms comparable schemes such as Principal Component Analysis (PCA) and Least Trimmed Squared Matrix Factorization (LTSMF) in terms of Prediction Shift (PS) and Hit Ratio (HR).
    Contents:
    Chapter 1 Introduction
        1.1 Background
        1.2 Goal and Contribution
        1.3 Thesis Organization
    Chapter 2 Related Work
        2.1 RS approaches
        2.2 Sybil Attack Defense
        2.3 Robust RS Approaches
    Chapter 3 System Model
        3.1 Target Applications
        3.2 Strong Attacker
        3.3 Attack Model
        3.4 Model Assumptions
    Chapter 4 RobuRec Design
        4.1 Algorithm Intuition
        4.2 Initialization Phase
        4.3 Admission Control Phase
        4.4 Rating Prediction Phase
        4.5 Dynamic Parameter Control
            4.5.1 Simplifying Control Parameters
            4.5.2 Dynamic Cmax Control
            4.5.3 Dynamic Global and Local Control
    Chapter 5 Evaluation and Analysis
        5.1 Evaluation Metrics
        5.2 Parameter (alpha) Study
        5.3 Datasets and Setup
        5.4 Results and Analysis
            5.4.1 Performance on PS
            5.4.2 Impact of Filler Size
            5.4.3 Impact of Target Selection Strategy
            5.4.4 Dynamic Parameter Control
            5.4.5 Performance on HR
            5.4.6 Analysis on Escaping Probability
    Chapter 6 Conclusion
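
The two evaluation metrics named in this abstract, Prediction Shift (PS) and Hit Ratio (HR), can be sketched as follows. This is a minimal illustration using the definitions common in the shilling-attack literature (mean change in the predicted rating of a target item, and the share of users whose top-N list contains it); the thesis may define them with additional detail. The function names and the toy data are assumptions made for the example.

```python
def prediction_shift(before, after):
    """Mean change in the predicted rating of the target item across users.

    `before` and `after` map user ids to the predicted rating of the target
    item before and after the attack profiles were injected.
    """
    users = before.keys() & after.keys()
    return sum(after[u] - before[u] for u in users) / len(users)


def hit_ratio(top_n_lists, target_item):
    """Fraction of users whose top-N recommendation list contains the target item."""
    hits = sum(target_item in items for items in top_n_lists.values())
    return hits / len(top_n_lists)


# Toy data (made up): predicted ratings for one target item, three users.
before = {"u1": 2.0, "u2": 3.0, "u3": 2.5}
after = {"u1": 3.0, "u2": 3.5, "u3": 4.0}
ps = prediction_shift(before, after)

# Toy top-2 lists after the attack; "t" is the target item.
top_n = {"u1": ["a", "t"], "u2": ["b", "c"], "u3": ["t", "a"]}
hr = hit_ratio(top_n, "t")
```

For a push attack, a robust recommender would keep both values close to their pre-attack levels, so lower PS and HR under attack indicate a stronger defense.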