632 research outputs found

    The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race

    Full text link
    Recent studies in social media spam and automation provide anecdotal argumentation of the rise of a new generation of spambots, so-called social spambots. Here, for the first time, we extensively study this novel phenomenon on Twitter and we provide quantitative evidence that a paradigm-shift exists in spambot design. First, we measure current Twitter's capabilities of detecting the new social spambots. Later, we assess the human performance in discriminating between genuine accounts, social spambots, and traditional spambots. Then, we benchmark several state-of-the-art techniques proposed by the academic literature. Results show that neither Twitter, nor humans, nor cutting-edge applications are currently capable of accurately detecting the new social spambots. Our results call for new approaches capable of turning the tide in the fight against this raising phenomenon. We conclude by reviewing the latest literature on spambots detection and we highlight an emerging common research trend based on the analysis of collective behaviors. Insights derived from both our extensive experimental campaign and survey shed light on the most promising directions of research and lay the foundations for the arms race against the novel social spambots. Finally, to foster research on this novel phenomenon, we make publicly available to the scientific community all the datasets used in this study.Comment: To appear in Proc. 26th WWW, 2017, Companion Volume (Web Science Track, Perth, Australia, 3-7 April, 2017

    Contextual Outlier Interpretation

    Full text link
    Outlier detection plays an essential role in many data-driven applications to identify isolated instances that are different from the majority. While many statistical learning and data mining techniques have been used for developing more effective outlier detection algorithms, the interpretation of detected outliers does not receive much attention. Interpretation is becoming increasingly important to help people trust and evaluate the developed models through providing intrinsic reasons why the certain outliers are chosen. It is difficult, if not impossible, to simply apply feature selection for explaining outliers due to the distinct characteristics of various detection models, complicated structures of data in certain applications, and imbalanced distribution of outliers and normal instances. In addition, the role of contrastive contexts where outliers locate, as well as the relation between outliers and contexts, are usually overlooked in interpretation. To tackle the issues above, in this paper, we propose a novel Contextual Outlier INterpretation (COIN) method to explain the abnormality of existing outliers spotted by detectors. The interpretability for an outlier is achieved from three aspects: outlierness score, attributes that contribute to the abnormality, and contextual description of its neighborhoods. Experimental results on various types of datasets demonstrate the flexibility and effectiveness of the proposed framework compared with existing interpretation approaches

    ๊ฐœ์ธ ์‚ฌํšŒ๋ง ๋„คํŠธ์›Œํฌ ๋ถ„์„ ๊ธฐ๋ฐ˜ ์˜จ๋ผ์ธ ์‚ฌํšŒ ๊ณต๊ฒฉ์ž ํƒ์ง€

    Get PDF
    ํ•™์œ„๋…ผ๋ฌธ(๋ฐ•์‚ฌ)--์„œ์šธ๋Œ€ํ•™๊ต ๋Œ€ํ•™์› :๊ณต๊ณผ๋Œ€ํ•™ ์ปดํ“จํ„ฐ๊ณตํ•™๋ถ€,2020. 2. ๊น€์ข…๊ถŒ.In the last decade we have witnessed the explosive growth of online social networking services (SNSs) such as Facebook, Twitter, Weibo and LinkedIn. While SNSs provide diverse benefits โ€“ for example, fostering inter-personal relationships, community formations and news propagation, they also attracted uninvited nuiance. Spammers abuse SNSs as vehicles to spread spams rapidly and widely. Spams, unsolicited or inappropriate messages, significantly impair the credibility and reliability of services. Therefore, detecting spammers has become an urgent and critical issue in SNSs. This paper deals with spamming in Twitter and Weibo. Instead of spreading annoying messages to the public, a spammer follows (subscribes to) normal users, and followed a normal user. Sometimes a spammer makes link farm to increase target accounts explicit influence. Based on the assumption that the online relationships of spammers are different from those of normal users, I proposed classification schemes that detect online social attackers including spammers. I firstly focused on ego-network social relations and devised two features, structural features based on Triad Significance Profile (TSP) and relational semantic features based on hierarchical homophily in an ego-network. Experiments on real Twitter and Weibo datasets demonstrated that the proposed approach is very practical. The proposed features are scalable because instead of analyzing the whole network, they inspect user-centered ego-networks. My performance study showed that proposed methods yield significantly better performance than prior scheme in terms of true positives and false positives.์ตœ๊ทผ ์šฐ๋ฆฌ๋Š” Facebook, Twitter, Weibo, LinkedIn ๋“ฑ์˜ ๋‹ค์–‘ํ•œ ์‚ฌํšŒ ๊ด€๊ณ„๋ง ์„œ๋น„์Šค๊ฐ€ ํญ๋ฐœ์ ์œผ๋กœ ์„ฑ์žฅํ•˜๋Š” ํ˜„์ƒ์„ ๋ชฉ๊ฒฉํ•˜์˜€๋‹ค. ํ•˜์ง€๋งŒ ์‚ฌํšŒ ๊ด€๊ณ„๋ง ์„œ๋น„์Šค๊ฐ€ ๊ฐœ์ธ๊ณผ ๊ฐœ์ธ๊ฐ„์˜ ๊ด€๊ณ„ ๋ฐ ์ปค๋ฎค๋‹ˆํ‹ฐ ํ˜•์„ฑ๊ณผ ๋‰ด์Šค ์ „ํŒŒ ๋“ฑ์˜ ์—ฌ๋Ÿฌ ์ด์ ์„ ์ œ๊ณตํ•ด ์ฃผ๊ณ  ์žˆ๋Š”๋ฐ ๋ฐ˜ํ•ด ๋ฐ˜๊ฐ‘์ง€ ์•Š์€ ํ˜„์ƒ ์—ญ์‹œ ๋ฐœ์ƒํ•˜๊ณ  ์žˆ๋‹ค. ์ŠคํŒจ๋จธ๋“ค์€ ์‚ฌํšŒ ๊ด€๊ณ„๋ง ์„œ๋น„์Šค๋ฅผ ๋™๋ ฅ ์‚ผ์•„ ์ŠคํŒธ์„ ๋งค์šฐ ๋น ๋ฅด๊ณ  ๋„“๊ฒŒ ์ „ํŒŒํ•˜๋Š” ์‹์œผ๋กœ ์•…์šฉํ•˜๊ณ  ์žˆ๋‹ค. ์ŠคํŒธ์€ ์ˆ˜์‹ ์ž๊ฐ€ ์›์น˜ ์•Š๋Š” ๋ฉ”์‹œ์ง€๋“ค์„ ์ผ์ปฝ๋Š”๋ฐ ์ด๋Š” ์„œ๋น„์Šค์˜ ์‹ ๋ขฐ๋„์™€ ์•ˆ์ •์„ฑ์„ ํฌ๊ฒŒ ์†์ƒ์‹œํ‚จ๋‹ค. ๋”ฐ๋ผ์„œ, ์ŠคํŒจ๋จธ๋ฅผ ํƒ์ง€ํ•˜๋Š” ๊ฒƒ์ด ํ˜„์žฌ ์†Œ์…œ ๋ฏธ๋””์–ด์—์„œ ๋งค์šฐ ๊ธด๊ธ‰ํ•˜๊ณ  ์ค‘์š”ํ•œ ๋ฌธ์ œ๊ฐ€ ๋˜์—ˆ๋‹ค. ์ด ๋…ผ๋ฌธ์€ ๋Œ€ํ‘œ์ ์ธ ์‚ฌํšŒ ๊ด€๊ณ„๋ง ์„œ๋น„์Šค๋“ค ์ค‘ Twitter์™€ Weibo์—์„œ ๋ฐœ์ƒํ•˜๋Š” ์ŠคํŒจ๋ฐ์„ ๋‹ค๋ฃจ๊ณ  ์žˆ๋‹ค. ์ด๋Ÿฌํ•œ ์œ ํ˜•์˜ ์ŠคํŒจ๋ฐ๋“ค์€ ๋ถˆํŠน์ • ๋‹ค์ˆ˜์—๊ฒŒ ๋ฉ”์‹œ์ง€๋ฅผ ์ „ํŒŒํ•˜๋Š” ๋Œ€์‹ ์—, ๋งŽ์€ ์ผ๋ฐ˜ ์‚ฌ์šฉ์ž๋“ค์„ 'ํŒ”๋กœ์šฐ(๊ตฌ๋…)'ํ•˜๊ณ  ์ด๋“ค๋กœ๋ถ€ํ„ฐ '๋งž ํŒ”๋กœ์ž‰(๋งž ๊ตฌ๋…)'์„ ์ด๋Œ์–ด ๋‚ด๋Š” ๊ฒƒ์„ ๋ชฉ์ ์œผ๋กœ ํ•˜๊ธฐ๋„ ํ•œ๋‹ค. ๋•Œ๋กœ๋Š” link farm์„ ์ด์šฉํ•ด ํŠน์ • ๊ณ„์ •์˜ ํŒ”๋กœ์›Œ ์ˆ˜๋ฅผ ๋†’์ด๊ณ  ๋ช…์‹œ์  ์˜ํ–ฅ๋ ฅ์„ ์ฆ๊ฐ€์‹œํ‚ค๊ธฐ๋„ ํ•œ๋‹ค. ์ŠคํŒจ๋จธ์˜ ์˜จ๋ผ์ธ ๊ด€๊ณ„๋ง์ด ์ผ๋ฐ˜ ์‚ฌ์šฉ์ž์˜ ์˜จ๋ผ์ธ ์‚ฌํšŒ๋ง๊ณผ ๋‹ค๋ฅผ ๊ฒƒ์ด๋ผ๋Š” ๊ฐ€์ • ํ•˜์—, ๋‚˜๋Š” ์ŠคํŒจ๋จธ๋“ค์„ ํฌํ•จํ•œ ์ผ๋ฐ˜์ ์ธ ์˜จ๋ผ์ธ ์‚ฌํšŒ๋ง ๊ณต๊ฒฉ์ž๋“ค์„ ํƒ์ง€ํ•˜๋Š” ๋ถ„๋ฅ˜ ๋ฐฉ๋ฒ•์„ ์ œ์‹œํ•œ๋‹ค. ๋‚˜๋Š” ๋จผ์ € ๊ฐœ์ธ ์‚ฌํšŒ๋ง ๋‚ด ์‚ฌํšŒ ๊ด€๊ณ„์— ์ฃผ๋ชฉํ•˜๊ณ  ๋‘ ๊ฐ€์ง€ ์ข…๋ฅ˜์˜ ๋ถ„๋ฅ˜ ํŠน์„ฑ์„ ์ œ์•ˆํ•˜์˜€๋‹ค. ์ด๋“ค์€ ๊ฐœ์ธ ์‚ฌํšŒ๋ง์˜ Triad Significance Profile (TSP)์— ๊ธฐ๋ฐ˜ํ•œ ๊ตฌ์กฐ์  ํŠน์„ฑ๊ณผ Hierarchical homophily์— ๊ธฐ๋ฐ˜ํ•œ ๊ด€๊ณ„ ์˜๋ฏธ์  ํŠน์„ฑ์ด๋‹ค. ์‹ค์ œ Twitter์™€ Weibo ๋ฐ์ดํ„ฐ์…‹์— ๋Œ€ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ๋Š” ์ œ์•ˆํ•œ ๋ฐฉ๋ฒ•์ด ๋งค์šฐ ์‹ค์šฉ์ ์ด๋ผ๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ค€๋‹ค. ์ œ์•ˆํ•œ ํŠน์„ฑ๋“ค์€ ์ „์ฒด ๋„คํŠธ์›Œํฌ๋ฅผ ๋ถ„์„ํ•˜์ง€ ์•Š์•„๋„ ๊ฐœ์ธ ์‚ฌํšŒ๋ง๋งŒ ๋ถ„์„ํ•˜๋ฉด ๋˜๊ธฐ ๋•Œ๋ฌธ์— scalableํ•˜๊ฒŒ ์ธก์ •๋  ์ˆ˜ ์žˆ๋‹ค. ๋‚˜์˜ ์„ฑ๋Šฅ ๋ถ„์„ ๊ฒฐ๊ณผ๋Š” ์ œ์•ˆํ•œ ๊ธฐ๋ฒ•์ด ๊ธฐ์กด ๋ฐฉ๋ฒ•์— ๋น„ํ•ด true positive์™€ false positive ์ธก๋ฉด์—์„œ ์šฐ์ˆ˜ํ•˜๋‹ค๋Š” ๊ฒƒ์„ ๋ณด์—ฌ์ค€๋‹ค.1 Introduction 1 2 Related Work 6 2.1 OSN Spammer Detection Approaches 6 2.1.1 Contents-based Approach 6 2.1.2 Social Network-based Approach 7 2.1.3 Subnetwork-based Approach 8 2.1.4 Behavior-based Approach 9 2.2 Link Spam Detection 10 2.3 Data mining schemes for Spammer Detection 10 2.4 Sybil Detection 12 3 Triad Significance Profile Analysis 14 3.1 Motivation 14 3.2 Twitter Dataset 18 3.3 Indegree and Outdegree of Dataset 20 3.4 Twitter spammer Detection with TSP 22 3.5 TSP-Filtering 27 3.6 Performance Evaluation of TSP-Filtering 29 4 Hierarchical Homophily Analysis 33 4.1 Motivation 33 4.2 Hierarchical Homophily in OSN 37 4.2.1 Basic Analysis of Datasets 39 4.2.2 Status gap distribution and Assortativity 44 4.2.3 Hierarchical gap distribution 49 4.3 Performance Evaluation of HH-Filtering 53 5 Overall Performance Evaluation 58 6 Conclusion 63 Bibliography 65Docto

    Probabilistic Matching: Causal Inference under Measurement Errors

    Get PDF
    The abundance of data produced daily from large variety of sources has boosted the need of novel approaches on causal inference analysis from observational data. Observational data often contain noisy or missing entries. Moreover, causal inference studies may require unobserved high-level information which needs to be inferred from other observed attributes. In such cases, inaccuracies of the applied inference methods will result in noisy outputs. In this study, we propose a novel approach for causal inference when one or more key variables are noisy. Our method utilizes the knowledge about the uncertainty of the real values of key variables in order to reduce the bias induced by noisy measurements. We evaluate our approach in comparison with existing methods both on simulated and real scenarios and we demonstrate that our method reduces the bias and avoids false causal inference conclusions in most cases.Comment: In Proceedings of International Joint Conference Of Neural Networks (IJCNN) 201

    Bot Spammer Detection in Twitter Using Tweet Similarity and TIME Interval Entropy

    Get PDF
    The popularity of Twitter has attracted spammers to disseminate large amount of spam messages. Preliminary studies had shown that most spam messages were produced automatically by bot. Therefore bot spammer detection can reduce the number of spam messages in Twitter significantly. However, to the best of our knowledge, few researches have focused in detecting Twitter bot spammer. Thus, this paper proposes a novel approach to differentiate between bot spammer and legitimate user accounts using time interval entropy and tweet similarity. Timestamp collections are utilized to calculate the time interval entropy of each user. Uni-gram matching-based similarity will be used to calculate tweet similarity. Datasets are crawled from Twitter containing both normal and spammer accounts. Experimental results showed that legitimate user may exhibit regular behavior in posting tweet as bot spammer. Several legitimate users are also detected to post similar tweets. Therefore it is less optimal to detect bot spammer using one of those features only. However, combination of both features gives better classification result. Precision, recall, and f-measure of the proposed method reached 85,71%, 94,74% and 90% respectively. It outperforms precision, recall, and f-measure of method which only uses either time interval entropy or tweet similarity

    Leveraging Multi-level Dependency of Relational Sequences for Social Spammer Detection

    Full text link
    Much recent research has shed light on the development of the relation-dependent but content-independent framework for social spammer detection. This is largely because the relation among users is difficult to be altered when spammers attempt to conceal their malicious intents. Our study investigates the spammer detection problem in the context of multi-relation social networks, and makes an attempt to fully exploit the sequences of heterogeneous relations for enhancing the detection accuracy. Specifically, we present the Multi-level Dependency Model (MDM). The MDM is able to exploit user's long-term dependency hidden in their relational sequences along with short-term dependency. Moreover, MDM fully considers short-term relational sequences from the perspectives of individual-level and union-level, due to the fact that the type of short-term sequences is multi-folds. Experimental results on a real-world multi-relational social network demonstrate the effectiveness of our proposed MDM on multi-relational social spammer detection
    • โ€ฆ
    corecore