339 research outputs found

    Combating User Misbehavior on Social Media

    Social media encourages user participation and facilitates users' self-expression like never before. While enriching user behavior in a variety of ways, many social media platforms have become breeding grounds for user misbehavior. In this dissertation we focus on understanding and combating three specific threads of user misbehavior that widely exist on social media: spamming, manipulation, and distortion. First, we address the challenge of detecting spam links. Rather than rely on traditional blacklist-based or content-based methods, we examine the behavioral factors of both who is posting the link and who is clicking on the link. The core intuition is that these behavioral signals may be more difficult to manipulate than traditional signals. We find that this purely behavioral approach can achieve good performance for robust spam link detection. Next, we deal with uncovering manipulated link-sharing behavior. We propose a four-phase approach to model, identify, characterize, and classify organic and organized groups who engage in link sharing. The key motivating insight is that group-level behavioral signals can distinguish manipulated user groups. We find that levels of organized behavior vary by link type and that the proposed approach achieves good performance as measured by commonly used metrics. Finally, we investigate a particular distortion behavior: making bullshit (BS) statements on social media. We explore the factors that affect the perception of BS and what leads users to ultimately perceive and call a post BS. We begin by preparing a crowdsourced collection of real social media posts that have been called BS. We then build a classification model that can determine which posts are more likely to be called BS. Our experiments suggest that our classifier has the potential to leverage linguistic cues for detecting social media posts that are likely to be called BS.
We complement these three studies with a cross-cutting investigation of learning user topical profiles, which can shed light on the subjects each user is associated with and thereby help in understanding the connection between users and misbehavior. Concretely, we propose a unified model for learning user topical profiles that simultaneously considers multiple footprints, and we show how these footprints can be embedded in a generalized optimization framework. Through extensive experiments on millions of real social media posts, we find that our proposed models can effectively combat user misbehavior on social media.

    Opinion spam detection: using multi-iterative graph-based model

    The demand for detecting opinion spam with opinion mining applications, in order to prevent its damaging effects on e-commerce reputations, is rising in many business sectors globally. Existing spam detection techniques consider only one or two types of spam entities, such as review, reviewer, group of reviewers, and product. They also use a limited number of features related to behaviour, content, and the relations between entities, which reduces detection accuracy. Moreover, these techniques are mostly evaluated on synthetic datasets and cannot readily be applied in real-world settings. We therefore propose a novel graph-based model called "Multi-iterative Graph-based opinion Spam Detection" (MGSD), in which all types of entities are considered simultaneously within a unified structure. Using this approach, the model reveals both implicit (i.e., similar entity's) and explicit (i.e., different entities') relationships. The MGSD model evaluates the 'spamicity' of entities more efficiently because it applies a novel multi-iterative algorithm that considers different sets of factors to update the spamicity score of entities. To enhance the accuracy of the MGSD detection model, a larger number of existing weighted features, along with novel proposed features from different categories, were selected using a combination of feature fusion techniques and machine learning (ML) algorithms. Because it employs domain-independent features, the MGSD model can also be generalised and applied to various opinionated documents. The results show that our feature selection and feature fusion techniques markedly improve spam detection.
The findings of this study show that MGSD improves the accuracy of state-of-the-art ML and graph-based techniques by around 5.6% and 4.8%, respectively, while achieving an accuracy of 93% for spam detection on our synthetic crowdsourced dataset and 95.3% on Ott's crowdsourced dataset.
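
The abstract above does not spell out the multi-iterative update rule, so the following is only a rough Python sketch of the general idea of iteratively updating spamicity scores over linked reviews, reviewers, and products. The function name `propagate_spamicity`, the averaging rule, the blending weight `alpha`, and the toy priors are all assumptions made for illustration and are not taken from MGSD.

```python
from collections import defaultdict


def propagate_spamicity(reviews, priors, alpha=0.5, iterations=20):
    """Iteratively spread spamicity over a review/reviewer/product graph.

    reviews: list of (review_id, reviewer_id, product_id) tuples.
    priors:  dict review_id -> initial spamicity in [0, 1]
             (assumed to come from content or behaviour features).
    alpha:   hypothetical weight kept on a review's own prior.
    """
    review_score = dict(priors)
    by_reviewer = defaultdict(list)
    by_product = defaultdict(list)
    for rid, uid, pid in reviews:
        by_reviewer[uid].append(rid)
        by_product[pid].append(rid)

    reviewer_score, product_score = {}, {}
    for _ in range(iterations):
        # Reviewer and product scores: mean spamicity of their reviews.
        for uid, rids in by_reviewer.items():
            reviewer_score[uid] = sum(review_score[r] for r in rids) / len(rids)
        for pid, rids in by_product.items():
            product_score[pid] = sum(review_score[r] for r in rids) / len(rids)
        # Review score: blend of its prior and its neighbours' scores.
        for rid, uid, pid in reviews:
            neighbours = 0.5 * (reviewer_score[uid] + product_score[pid])
            review_score[rid] = alpha * priors[rid] + (1 - alpha) * neighbours
    return review_score, reviewer_score, product_score


# Toy data, invented for illustration only.
reviews = [("r1", "u1", "p1"), ("r2", "u1", "p2"),
           ("r3", "u2", "p1"), ("r4", "u3", "p2")]
priors = {"r1": 0.9, "r2": 0.8, "r3": 0.1, "r4": 0.2}
review_s, reviewer_s, product_s = propagate_spamicity(reviews, priors)
```

Because every update is a weighted average of values in [0, 1], the scores stay in that range; in this toy graph the reviewer with two high-prior reviews ends up with a higher score than the others.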

    Detecting collusive spamming activities in community question answering

    Community Question Answering (CQA) portals provide rich sources of information on a variety of topics. However, the authenticity and quality of questions and answers (Q&As) have proven hard to control. In a troubling direction, the widespread growth of crowdsourcing websites has created a large-scale, potentially difficult-to-detect workforce for manipulating content in CQA. Crowd workers who join the same crowdsourcing task for a promotion campaign collusively post deceptive Q&As to promote a target (a product or service). The collusive spamming group can fully control the sentiment toward the target. How can the structure and the attributes be used to detect manipulated Q&As? How can the collusive group be detected, and how can the group information be leveraged for the detection task? To shed light on these research questions, we propose a unified framework for detecting collusive spamming activities in CQA. First, we interpret the questions and answers in CQA as two independent networks. Second, we detect collusive question groups and answer groups from these two networks, respectively, by measuring the similarity of the contents posted within a short duration. Third, using attributes (individual-level and group-level) and correlations (user-based and content-based), we propose a combined factor graph model that detects deceptive Q&As simultaneously by combining two independent factor graphs. On a large-scale practical data set, we find that the proposed framework can detect deceptive content at an early stage and outperforms a number of competitive baselines.
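
The second step described above, grouping posts whose contents are similar and that appear within a short duration, can be sketched roughly as follows. This is a minimal illustration, not the authors' method: the Jaccard token overlap, the one-hour `window`, the `threshold` of 0.6, and the function names are all assumptions.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-set Jaccard similarity between two texts."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def collusive_groups(posts, window=3600, threshold=0.6):
    """Group posts that are similar and close in time.

    posts: list of (post_id, timestamp_seconds, text) tuples.
    Returns a list of sets of post ids with at least two members.
    """
    parent = {pid: pid for pid, _, _ in posts}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (p, tp, xp), (q, tq, xq) in combinations(posts, 2):
        if abs(tp - tq) <= window and jaccard(xp, xq) >= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for pid in parent:
        groups.setdefault(find(pid), set()).add(pid)
    return [g for g in groups.values() if len(g) > 1]


# Toy answers, invented for illustration only.
posts = [
    ("a1", 0, "best phone ever buy it now"),
    ("a2", 600, "best phone ever buy it today"),
    ("a3", 90000, "best phone ever buy it now"),
    ("a4", 700, "the battery drains quickly"),
]
groups = collusive_groups(posts)
```

In this toy input only `a1` and `a2` are grouped: `a3` has the same text but falls outside the time window, and `a4` is close in time but dissimilar. The pairwise comparison is quadratic in the number of posts, so a real data set would need blocking or indexing first.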

    The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race

    Recent studies in social media spam and automation provide anecdotal argumentation for the rise of a new generation of spambots, so-called social spambots. Here, for the first time, we extensively study this novel phenomenon on Twitter and provide quantitative evidence that a paradigm shift exists in spambot design. First, we measure Twitter's current capability to detect the new social spambots. Next, we assess human performance in discriminating between genuine accounts, social spambots, and traditional spambots. Then, we benchmark several state-of-the-art techniques proposed in the academic literature. Results show that neither Twitter, nor humans, nor cutting-edge applications are currently capable of accurately detecting the new social spambots. Our results call for new approaches capable of turning the tide in the fight against this rising phenomenon. We conclude by reviewing the latest literature on spambot detection and highlight an emerging common research trend based on the analysis of collective behaviors. Insights derived from both our extensive experimental campaign and our survey shed light on the most promising directions of research and lay the foundations for the arms race against the novel social spambots. Finally, to foster research on this novel phenomenon, we make all the datasets used in this study publicly available to the scientific community. Comment: To appear in Proc. 26th WWW, 2017, Companion Volume (Web Science Track, Perth, Australia, 3-7 April, 2017)

    Latent Network Mining in Social Networks and E-commerce Platforms

    Doctoral dissertation -- Seoul National University Graduate School: Department of Computer Science and Engineering, College of Engineering, February 2023. Kwon Taekyoung.
    Following the explosive growth of online services, people are connected with each other more broadly than ever. On online platforms, people influence each other and tend to reflect one another's opinions in their decision-making. Social network services (SNSs) and e-commerce sites are typical examples of online platforms. User behavior on online platforms can be defined as a relation between a user and platform components: a user's purchase is a relationship between a user and a product, and a user's check-in is a relationship between a user and a place. Information such as action time, rating, and tags may also be included. In many studies, platform user behavior is represented as a graph whose nodes are objects within the platform, such as users, products, and places, and in which an interaction between a user and a platform element is expressed as an edge connecting two nodes. In this thesis, I present studies that identify latent networks affecting the user behavior graphs defined on these two platforms.
    In ANES, I focus on representation learning for social link inference based on user trajectory data. While traditional methods predict relations between users from hand-crafted features, recent studies first perform representation learning using network/node embedding or graph neural networks (GNNs) for downstream tasks such as node classification and link prediction. However, those approaches fail to capture individual behavioral patterns ingrained in periodic visits or long-distance movements. To better learn behavioral patterns, this work proposes a novel scheme called ANES (Aspect-oriented Network Embedding for Social link inference). ANES learns aspect-oriented relations between users and points of interest (POIs) within their contexts. ANES is the first approach that extracts the complex behavioral patterns of users from both trajectory data and the structure of user-POI bipartite graphs. Extensive experiments on several real-world datasets show that ANES outperforms state-of-the-art baselines.
    In contrast to active social networks, on some platforms, such as online shopping websites and restaurant review sites, people are connected to other users regardless of their intentions. They have no information about each other in advance; all they have in common is that they have visited or plan to visit the same place, or have purchased or plan to purchase the same product. Interestingly, users' purchase intentions tend to be influenced by review data. Unfortunately, this tendency is easily exploited by opinion spammers. In SC-Com, I focus on opinion spam detection in online shopping services. In many cases, a consumer's decision-making process is closely tied to online reviews. However, opinion spam written by hired reviewers, which aims to mislead potential customers by hiding genuine consumers' opinions, is a growing threat. Opinion spam must be posted collectively to falsify true information, and this collusiveness makes it possible to detect.
    I propose SC-Com, an optimized collusive community detection framework. It constructs a graph of reviewers from the collusiveness of their behavior and divides the graph into communities based on their mutual suspiciousness. I then extract community-based and temporal abnormality features that are critical for discriminating spammers from genuine users. I show that my method detects collusive opinion spam reviewers effectively and precisely from their collective behavioral patterns. On a real-world dataset, my approach showed strong performance while considering only primary data such as time and ratings. The implicit network inference models studied on various data in this thesis predict users who are likely to be pre-connected even in unlabeled data, so they are expected to contribute to areas such as advertising recommendation systems and malicious user detection by providing useful information.
    Contents:
    Chapter 1 Introduction
    Chapter 2 Social Link Inference in Location-based Check-in Data
        2.1 Background
        2.2 Related Work
        2.3 Location-based Social Network Service Data
        2.4 Aspect-wise Graph Decomposition
        2.5 Aspect-wise Graph Learning
        2.6 Inferring Social Relation from User Representation
        2.7 Performance Analysis
        2.8 Discussion and Implications
        2.9 Summary
    Chapter 3 Detecting Collusiveness from Reviews in Online Platforms and Its Application
        3.1 Background
        3.2 Related Work
        3.3 Online Review Data
        3.4 Collusive Graph Projection
        3.5 Reviewer Community Detection
        3.6 Review Community Feature Extraction and Spammer Detection
        3.7 Performance Analysis
        3.8 Discussion and Implications
        3.9 Summary
    Chapter 4 Conclusion

    Fake Review Detection using Data Mining

    Online spam reviews are deceptive evaluations of products and services. They are often written as a deliberate manipulation strategy to deceive readers. Recognizing such reviews is an important but challenging problem. In this work, I try to solve this problem using different data mining techniques, and I explore the strengths and weaknesses of those techniques in detecting fake reviews. I start with supervised techniques such as Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), and Multilayer Perceptron. The results show that all of these supervised techniques can detect fake reviews with more than 86% accuracy. Then, I work on a semi-supervised technique that reduces the dimensionality of the input feature vector while offering performance similar to existing approaches. I use a combination of topic modeling and SVM to implement the semi-supervised technique. I also compare the results with other approaches that use all the words of a dataset as input features. I find that topic words alone suffice as input features to reach accuracy similar to approaches that use all the words. Finally, I propose an unsupervised learning approach named Words Basket Analysis for fake review detection. I use five Amazon product review datasets for the experiments and report the performance of the proposed approach on these datasets.
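
As a rough illustration of one of the supervised baselines named above, here is a minimal from-scratch Multinomial Naive Bayes text classifier in Python. It is a sketch under stated assumptions, not the author's implementation: whitespace tokenization, add-one (Laplace) smoothing, the class name `TinyMultinomialNB`, and the four toy reviews are all invented for the example.

```python
import math
from collections import Counter, defaultdict


class TinyMultinomialNB:
    """Bag-of-words Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus smoothed log likelihood of each known token.
            logp = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + vocab_size
            for tok in doc.lower().split():
                if tok in self.vocab:  # words unseen in training are skipped
                    logp += math.log((self.word_counts[label][tok] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Toy reviews, invented for illustration only.
train_docs = [
    "amazing amazing best product ever buy now",
    "best best deal ever amazing",
    "battery lasted two days screen scratched",
    "screen is fine battery average",
]
train_labels = ["fake", "fake", "genuine", "genuine"]
model = TinyMultinomialNB().fit(train_docs, train_labels)
```

With these four toy reviews, `model.predict("amazing best deal")` returns `"fake"` and `model.predict("battery screen average")` returns `"genuine"`. A real experiment would need a labelled review corpus, proper tokenization, and a held-out test split.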