72 research outputs found

Shilling Black-box Review-based Recommender Systems through Fake Review Generation

Author: Chang Jason S.
Chen Yi-Syuan
Chiang Hung-Yun
Shuai Hong-Han
Song Yun-Zhu
Publication date: 27/06/2023

Review-Based Recommender Systems (RBRSs) have attracted increasing research interest due to their ability to alleviate well-known cold-start problems. RBRSs use reviews to construct user and item representations. However, in this paper, we argue that such reliance on reviews may instead expose systems to the risk of being shilled. To explore this possibility, we propose the first generation-based model for shilling attacks against RBRSs. Specifically, we learn a fake review generator through reinforcement learning, which maliciously promotes items by forcing prediction shifts after adding generated reviews to the system. By introducing auxiliary rewards to increase text fluency and diversity with the aid of pre-trained language models and aspect predictors, the generated reviews can be effective for shilling with high fidelity. Experimental results demonstrate that the proposed framework can successfully attack three different kinds of RBRSs on the Amazon corpus with three domains and the Yelp corpus. Furthermore, human studies also show that the generated reviews are fluent and informative. Finally, equipped with Attack Review Generators (ARGs), RBRSs with adversarial training are much more robust to malicious reviews.

arXiv.org e-Print Archive

Cross-System Personalization (Cross-systems Personalisierung)

Author: Mehta Bhaskar
Publication date: 04/04/2008

The World Wide Web provides access to a wealth of information and services to a huge and heterogeneous user population on a global scale. One important and successful design mechanism in dealing with this diversity of users is to personalize Web sites and services, i.e. to customize system content, characteristics, or appearance with respect to a specific user. Each system independently builds up user profiles and uses this information to personalize the service offering. Such isolated approaches have two major drawbacks: firstly, investments of users in personalizing a system either through explicit provision of information or through long and regular use are not transferable to other systems. Secondly, users have little or no control over the information that defines their profile, since user data are deeply buried in personalization engines running on the server side. Cross system personalization (CSP) (Mehta, Niederee, & Stewart, 2005) allows for sharing information across different information systems in a user-centric way and can overcome the aforementioned problems. Information about users, which is originally scattered across multiple systems, is combined to obtain maximum leverage and reuse of information. Our initial approaches to cross system personalization relied on each user having a unified profile which different systems can understand. The unified profile contains facets modeling aspects of a multidimensional user which is stored inside a "Context Passport" that the user carries along in his/her journey across information space. The user’s Context Passport is presented to a system, which can then understand the context in which the user wants to use the system. The basis of "understanding" in this approach is of a semantic nature, i.e. the semantics of the facets and dimensions of the unified profile are known, so that the latter can be aligned with the profiles maintained internally at a specific site.
The results of the personalization process are then transferred back to the user’s Context Passport via a protocol understood by both parties. The main challenge in this approach is to establish a common, globally accepted vocabulary and to create a standard that every system will comply with. Machine learning techniques provide an alternative approach that enables CSP without the need for accepted semantic standards or ontologies. The key idea is that one can try to learn dependencies between profiles maintained within one system and profiles maintained within a second system, based on data provided by users who use both systems and who are willing to share their profiles across systems, which we assume is in the interest of the user. Here, instead of requiring a common semantic framework, it is only required that a sufficient number of users cross between systems and that there is enough regularity among users that one can learn within a user population, a fact that is commonly exploited in collaborative filtering. In this thesis, we aim to provide a principled approach towards achieving cross system personalization. We describe both semantic and learning approaches, with a stronger emphasis on the learning approach. We also investigate the privacy and scalability aspects of CSP and provide solutions to these problems. Finally, we also explore in detail the aspect of robustness in recommender systems. We motivate several approaches for robustifying collaborative filtering and provide the best performing algorithm for detecting malicious attacks reported so far.

The personalization of software systems is of steadily increasing importance, particularly in connection with web applications such as search engines, community portals, or electronic commerce sites that address large, highly diverse user groups. Because explicit personalization typically demands considerable time from the user, many applications fall back on implicit techniques for automatic personalization, in particular recommender systems, which typically use methods such as collaborative or social filtering. Although these methods do not require user profiles to be created explicitly through answering questions and giving explicit feedback, the quality of implicit personalization depends heavily on the volume of available data, such as transaction, query, or click logs. If little is known about a user in this sense, no reliable personal adaptations or recommendations can be made. This dissertation addresses the question of how personalization across system boundaries ("cross system") can be enabled and supported; it mainly discusses implicit personalization techniques but also, to a limited extent, explicit methods such as the semantic Context Passport. The dissertation thus addresses an important research question of high practical relevance, one that the recent scientific literature on the topic has resolved only incompletely and unsatisfactorily. Automatic recommender systems using social filtering techniques have been popularized since roughly the mid-1990s with the rise of the first e-commerce wave, in particular through projects such as Information Tapistery, Grouplens, and Firefly. In the late 1990s and the early part of this decade, the main focus of the research literature was on improved statistical methods and advanced inference techniques that map implicit observations to concrete adaptation or recommendation actions. In recent years, attention has turned above all to the question of how personalization systems can be better adapted to the practical requirements of particular applications, which chiefly involves suitably adapting and extending existing techniques. Within this framework, the present work

Duisburg-Essen Publications Online

Unsolved Problems in ML Safety

Author: Carlini Nicholas
Hendrycks Dan
Schulman John
Steinhardt Jacob
Publication date: 16/06/2022

Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), reducing inherent model hazards ("Alignment"), and reducing systemic hazards ("Systemic Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions. Comment: Position Paper

arXiv.org e-Print Archive

Probabilistic Modeling and Inference for Obfuscated Network Attack Sequences

Author: Du Haitao
Publication venue: RIT Scholar Works
Publication date: 01/08/2014

Prevalent computing devices with networking capabilities have become critical network infrastructure for government, industry, academia and everyday life. As their value rises, the motivation driving network attacks on this infrastructure has shifted from the pursuit of notoriety to the pursuit of profit or political gains, leading to network attacks on various scales. Facing diverse network attack strategies and overwhelming alerts, much work has been devoted to correlating observed malicious events to pre-defined scenarios, attempting to deduce the attack plans based on expert models of how network attacks may transpire. We started the exploration of characterizing network attacks by investigating how temporal and spatial features of attack sequences can be used to describe different types of attack sources in a real data set. Attack sequence models were built from the real data set to describe different attack strategies. Based on the probabilistic attack sequence model, attack predictions were made to actively predict the next possible actions. Experiments through attack predictions have revealed that sophisticated attackers can employ a number of obfuscation techniques to confuse the alert correlation engine or classifier. Unfortunately, most existing work treats attack obfuscations by developing ad-hoc fixes to specific obfuscation techniques. To this end, we developed an attack modeling framework that enables a systematic analysis of obfuscations. The proposed framework represents network attack strategies as general finite order Markov models and integrates them with different attack obfuscation models to form probabilistic graphical models. A set of algorithms is developed to infer the network attack strategies given the models and the observed sequences, which are likely to be obfuscated.
The algorithms enable an efficient analysis of the impact of different obfuscation techniques and attack strategies, by determining the expected classification accuracy of the obfuscated sequences. The algorithms are developed by integrating the recursion concept in dynamic programming and the Monte-Carlo method. The primary contributions of this work include the development of the formal framework and the algorithms to evaluate the impact of attack obfuscations. Several knowledge-driven attack obfuscation models are developed and analyzed to demonstrate the impact of different types of commonly used obfuscation techniques. The framework and algorithms developed in this work can also be applied to other contexts beyond network security. Any behavior sequences that might suffer from noise and require matching to pre-defined models can use this work to recover the most likely original sequence or to evaluate quantitatively the expected classification accuracy one can achieve in separating the sequences.
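
The idea of matching an observed sequence to pre-defined Markov models can be illustrated with a minimal sketch. It is not the thesis's framework: it assumes a first-order model only, uses add-one smoothing, ignores the obfuscation models entirely, and the action names (`scan`, `exploit`, `exfil`) and strategy labels are hypothetical.

```python
import math
from collections import defaultdict

def fit_markov(sequences, alphabet, smoothing=1.0):
    """Estimate first-order transition probabilities with additive smoothing."""
    counts = {a: defaultdict(float) for a in alphabet}
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1.0
    model = {}
    for a in alphabet:
        total = sum(counts[a].values()) + smoothing * len(alphabet)
        model[a] = {b: (counts[a][b] + smoothing) / total for b in alphabet}
    return model

def log_likelihood(seq, model):
    """Log-probability of the transitions in seq under a first-order model."""
    return sum(math.log(model[p][n]) for p, n in zip(seq, seq[1:]))

def classify(seq, models):
    """Return the name of the model that gives seq the highest likelihood."""
    return max(models, key=lambda name: log_likelihood(seq, models[name]))

# Hypothetical action alphabet and training sequences (assumptions, not thesis data).
ALPHABET = ["scan", "exploit", "exfil"]
models = {
    "noisy_scanner": fit_markov(
        [["scan", "scan", "scan", "exploit", "scan", "scan"]], ALPHABET),
    "targeted": fit_markov(
        [["scan", "exploit", "exfil", "exploit", "exfil"]], ALPHABET),
}

print(classify(["scan", "exploit", "exfil"], models))
print(classify(["scan", "scan", "scan"], models))
```

Classification accuracy under obfuscation could then be estimated by perturbing sequences (for example, inserting noise actions) before calling `classify`, which is the kind of Monte-Carlo evaluation the abstract alludes to.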

RIT Scholar Works

Design and Analysis of a Robust Sybil Attack Defense Algorithm Using Information Levels (정보 수준을 이용한 강건한 시빌공격 방어 알고리즘 설계 및 분석)

Author: 노기섭
Publication venue: Seoul National University Graduate School
Publication date: 01/08/2014

Thesis (Ph.D.) -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, August 2014. 김종권.

The main function of a Recommender System (RS) is to recommend commercial items to end consumers (that is, RS users). Providing accurate information in an RS is important to both recommendation service providers and system users. With the spread of online social networks, the influence of RSs is growing rapidly. At the same time, attacks on RSs are increasing from malicious users who use fake identities to manipulate information against the system's intent. Attacks that exploit such fake identities are called Sybil attacks. This thesis proposes a new robust recommender system called RobuRec, which uses the concept of admission control, not previously introduced in other work. By using the powerful concept of admission control, it can predict recommendations at a high level of trust regardless of whether a rating was produced by an honest user or is a malicious rating made through Sybil identities. To demonstrate RobuRec's performance, the thesis reports extensive experiments using a variety of datasets as well as several possible Sybil attack scenarios. In experiments and analysis, RobuRec showed far better performance in Prediction Shift (PS) and Hit Ratio (HR) than the comparable PCA (Principal Component Analysis) and LTSMF (Least Trimmed Squared Matrix Factorization) schemes.

As the major function of Recommender Systems (RSs) is recommending commercial items to potential consumers (i.e., system users), providing correct information in an RS is crucial to both RS providers and system users. The influence of RSs over Online Social Networks (OSNs) is expanding rapidly, whereas malicious users continuously try to attack RSs with fake identities (i.e., Sybils) by manipulating the information in the RS adversely. In this thesis, we propose a novel robust recommendation algorithm called RobuRec which exploits a distinctive feature, admission control. RobuRec provides highly trusted recommendation results, since it predicts appropriate recommendations regardless of whether the ratings are given by honest users or by Sybils, thanks to the power of admission control. To demonstrate the performance of RobuRec, we have conducted extensive experiments with various datasets as well as diverse attack scenarios. The evaluation results confirm that RobuRec significantly outperforms comparable schemes such as Principal Component Analysis (PCA) and Least Trimmed Squared Matrix Factorization (LTSMF) in terms of Prediction Shift (PS) and Hit Ratio (HR).

Chapter 1 Introduction
  1.1 Background
  1.2 Goal and Contribution
  1.3 Thesis Organization
Chapter 2 Related Work
  2.1 RS approaches
  2.2 Sybil Attack Defense
  2.3 Robust RS Approaches
Chapter 3 System Model
  3.1 Target Applications
  3.2 Strong Attacker
  3.3 Attack Model
  3.4 Model Assumptions
Chapter 4 RobuRec Design
  4.1 Algorithm Intuition
  4.2 Initialization Phase
  4.3 Admission Control Phase
  4.4 Rating Prediction Phase
  4.5 Dynamic Parameter Control
    4.5.1 Simplifying Control Parameters
    4.5.2 Dynamic Cmax Control
    4.5.3 Dynamic Global and Local Control
Chapter 5 Evaluation and Analysis
  5.1 Evaluation Metrics
  5.2 Parameter (alpha) Study
  5.3 Datasets and Setup
  5.4 Results and Analysis
    5.4.1 Performance on PS
    5.4.2 Impact of Filler Size
    5.4.3 Impact of Target Selection Strategy
    5.4.4 Dynamic Parameter Control
    5.4.5 Performance on HR
    5.4.6 Analysis on Escaping Probability
Chapter 6 Conclusion

Doctor
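
The two metrics named in the abstract, Prediction Shift (PS) and Hit Ratio (HR), can be sketched as follows. The abstract does not give formulas, so this assumes the definitions commonly used in the shilling-attack literature: PS as the mean change in a target item's predicted rating after an attack, and HR as the fraction of users whose top-N list contains the target item. The function names, user and item identifiers, and ratings are all hypothetical.

```python
def prediction_shift(before, after):
    """Mean change in the target item's predicted rating.

    before/after map user id -> predicted rating for one target item,
    measured before and after the attack profiles are injected.
    """
    users = before.keys() & after.keys()
    if not users:
        raise ValueError("no users in common")
    return sum(after[u] - before[u] for u in users) / len(users)

def hit_ratio(predictions, target, top_n):
    """Fraction of users whose top-N predicted items include the target.

    predictions maps user id -> {item id: predicted rating}.
    """
    if not predictions:
        raise ValueError("no users")
    hits = 0
    for scores in predictions.values():
        # Rank by descending score; break ties by item id for determinism.
        ranked = sorted(scores, key=lambda item: (-scores[item], item))[:top_n]
        if target in ranked:
            hits += 1
    return hits / len(predictions)

# Hypothetical predictions for target item "t" (illustrative values only).
before = {"u1": 2.0, "u2": 3.0}
after = {"u1": 3.5, "u2": 3.5}
after_all = {
    "u1": {"a": 4.0, "b": 3.0, "t": 3.5},
    "u2": {"a": 4.5, "b": 4.0, "t": 3.5},
}

print(prediction_shift(before, after))   # 1.0
print(hit_ratio(after_all, "t", top_n=2))  # 0.5
```

Under these assumed definitions, a robust recommender should keep both values close to their pre-attack levels when attack profiles are added.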

SNU Open Repository and Archive