The paradigm-shift of social spambots: Evidence, theories, and tools for the arms race
Recent studies in social media spam and automation provide anecdotal
argumentation of the rise of a new generation of spambots, so-called social
spambots. Here, for the first time, we extensively study this novel phenomenon
on Twitter and we provide quantitative evidence that a paradigm-shift exists in
spambot design. First, we measure Twitter's current capabilities of detecting
the new social spambots. Next, we assess human performance in
discriminating between genuine accounts, social spambots, and traditional
spambots. Then, we benchmark several state-of-the-art techniques proposed by
the academic literature. Results show that neither Twitter, nor humans, nor
cutting-edge applications are currently capable of accurately detecting the new
social spambots. Our results call for new approaches capable of turning the
tide in the fight against this rising phenomenon. We conclude by reviewing the
latest literature on spambot detection and we highlight an emerging common
research trend based on the analysis of collective behaviors. Insights derived
from both our extensive experimental campaign and survey shed light on the most
promising directions of research and lay the foundations for the arms race
against the novel social spambots. Finally, to foster research on this novel
phenomenon, we make publicly available to the scientific community all the
datasets used in this study.
Comment: To appear in Proc. 26th WWW, 2017, Companion Volume (Web Science
Track, Perth, Australia, 3-7 April, 2017)
Contextual Outlier Interpretation
Outlier detection plays an essential role in many data-driven applications to
identify isolated instances that are different from the majority. While many
statistical learning and data mining techniques have been used for developing
more effective outlier detection algorithms, the interpretation of detected
outliers has not received much attention. Interpretation is becoming
increasingly important to help people trust and evaluate the developed models
by providing intrinsic reasons why certain outliers are chosen. It is
difficult, if not impossible, to simply apply feature selection for explaining
outliers due to the distinct characteristics of various detection models,
complicated structures of data in certain applications, and imbalanced
distribution of outliers and normal instances. In addition, the role of
contrastive contexts in which outliers are located, as well as the relation between
outliers and contexts, are usually overlooked in interpretation. To tackle the
issues above, in this paper, we propose a novel Contextual Outlier
INterpretation (COIN) method to explain the abnormality of existing outliers
spotted by detectors. The interpretability for an outlier is achieved from
three aspects: outlierness score, attributes that contribute to the
abnormality, and contextual description of its neighborhoods. Experimental
results on various types of datasets demonstrate the flexibility and
effectiveness of the proposed framework compared with existing interpretation
approaches.
Detection of Online Social Attackers Based on Ego-Network Analysis
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2020. 김종권.
In the last decade we have witnessed the explosive growth of online social networking services (SNSs) such as Facebook, Twitter, Weibo and LinkedIn. While SNSs provide diverse benefits, for example fostering interpersonal relationships, community formation and news propagation, they have also attracted unwelcome nuisances. Spammers abuse SNSs as vehicles to spread spam rapidly and widely. Spam, meaning unsolicited or inappropriate messages, significantly impairs the credibility and reliability of services. Therefore, detecting spammers has become an urgent and critical issue in SNSs. This thesis deals with spamming on Twitter and Weibo. Instead of spreading annoying messages to the public, a spammer of this kind follows (subscribes to) many normal users and aims to be followed back by them. Sometimes spammers build a link farm to raise the follower count and explicit influence of target accounts. Based on the assumption that the online relationships of spammers differ from those of normal users, I proposed classification schemes that detect online social attackers, including spammers. I first focused on social relations within ego-networks and devised two kinds of features: structural features based on the Triad Significance Profile (TSP) and relational semantic features based on hierarchical homophily in an ego-network. Experiments on real Twitter and Weibo datasets demonstrated that the proposed approach is very practical. The proposed features are scalable because, instead of analyzing the whole network, they inspect user-centered ego-networks. My performance study showed that the proposed methods yield significantly better performance than prior schemes in terms of true positives and false positives.
1 Introduction
2 Related Work
2.1 OSN Spammer Detection Approaches
2.1.1 Contents-based Approach
2.1.2 Social Network-based Approach
2.1.3 Subnetwork-based Approach
2.1.4 Behavior-based Approach
2.2 Link Spam Detection
2.3 Data mining schemes for Spammer Detection
2.4 Sybil Detection
3 Triad Significance Profile Analysis
3.1 Motivation
3.2 Twitter Dataset
3.3 Indegree and Outdegree of Dataset
3.4 Twitter spammer Detection with TSP
3.5 TSP-Filtering
3.6 Performance Evaluation of TSP-Filtering
4 Hierarchical Homophily Analysis
4.1 Motivation
4.2 Hierarchical Homophily in OSN
4.2.1 Basic Analysis of Datasets
4.2.2 Status gap distribution and Assortativity
4.2.3 Hierarchical gap distribution
4.3 Performance Evaluation of HH-Filtering
5 Overall Performance Evaluation
6 Conclusion
Bibliography
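The Triad Significance Profile named in the thesis abstract above is commonly defined as the vector of z-scores of triad-type counts, measured against an ensemble of randomized networks and normalized to unit length. The sketch below assumes that common definition; the thesis's exact procedure is not given in this listing. The function name, the input layout (precomputed triad counts), and the use of the population standard deviation are illustrative assumptions.

```python
import math
from statistics import mean, pstdev


def triad_significance_profile(real_counts, random_counts):
    """Return a unit-length vector of z-scores, one per triad type.

    real_counts:   counts of each triad type in the observed ego-network.
    random_counts: list of count vectors, one per randomized network
                   (hypothetical input; how the networks are randomized
                   is left to the caller).
    """
    z_scores = []
    for i, observed in enumerate(real_counts):
        samples = [counts[i] for counts in random_counts]
        mu = mean(samples)
        sigma = pstdev(samples)  # assumption: population standard deviation
        # A triad type with no variance in the ensemble carries no signal here.
        z_scores.append((observed - mu) / sigma if sigma > 0 else 0.0)

    norm = math.sqrt(sum(z * z for z in z_scores))
    return [z / norm for z in z_scores] if norm > 0 else z_scores


# Toy example with two triad types and three randomized networks.
profile = triad_significance_profile([10, 2], [[4, 2], [6, 4], [5, 3]])
print(profile)
```

Because the profile is normalized, ego-networks of different sizes can be compared directly, which fits the abstract's point that only user-centered ego-networks need to be inspected.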
Probabilistic Matching: Causal Inference under Measurement Errors
The abundance of data produced daily from a large variety of sources has
boosted the need for novel approaches to causal inference analysis from
observational data. Observational data often contain noisy or missing entries.
Moreover, causal inference studies may require unobserved high-level
information which needs to be inferred from other observed attributes. In such
cases, inaccuracies of the applied inference methods will result in noisy
outputs. In this study, we propose a novel approach for causal inference when
one or more key variables are noisy. Our method utilizes the knowledge about
the uncertainty of the real values of key variables in order to reduce the bias
induced by noisy measurements. We evaluate our approach in comparison with
existing methods both on simulated and real scenarios and we demonstrate that
our method reduces the bias and avoids false causal inference conclusions in
most cases.
Comment: In Proceedings of International Joint Conference on Neural Networks
(IJCNN) 201
Bot Spammer Detection in Twitter Using Tweet Similarity and Time Interval Entropy
The popularity of Twitter has attracted spammers, who disseminate large amounts of spam messages. Preliminary studies have shown that most spam messages are produced automatically by bots. Therefore, bot spammer detection can significantly reduce the number of spam messages on Twitter. However, to the best of our knowledge, few studies have focused on detecting Twitter bot spammers. Thus, this paper proposes a novel approach to differentiate between bot spammer and legitimate user accounts using time interval entropy and tweet similarity. Timestamp collections are used to calculate the time interval entropy of each user, and uni-gram matching-based similarity is used to calculate tweet similarity. The datasets were crawled from Twitter and contain both normal and spammer accounts. Experimental results show that legitimate users may exhibit posting behavior as regular as that of bot spammers, and several legitimate users were also found to post similar tweets. It is therefore suboptimal to detect bot spammers using only one of those features. However, combining both features gives better classification results. The precision, recall, and F-measure of the proposed method reached 85.71%, 94.74%, and 90%, respectively, outperforming methods that use only time interval entropy or only tweet similarity.
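The two features named in this abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not specify how intervals are binned or how uni-gram matching is scored, so the 60-second bins (`bin_seconds`) and the Jaccard overlap of word sets used here are assumptions, as are all function names.

```python
import math
from collections import Counter
from itertools import combinations


def time_interval_entropy(timestamps, bin_seconds=60):
    """Shannon entropy (bits) of the binned gaps between consecutive posts.

    timestamps: posting times in seconds. bin_seconds is an assumed bin width.
    Low entropy means very regular posting intervals.
    """
    ordered = sorted(timestamps)
    intervals = [b - a for a, b in zip(ordered, ordered[1:])]
    if not intervals:
        return 0.0
    bins = Counter(int(gap // bin_seconds) for gap in intervals)
    total = len(intervals)
    return -sum((c / total) * math.log2(c / total) for c in bins.values())


def unigram_similarity(tweet_a, tweet_b):
    """Jaccard overlap of lowercase word sets (an assumed uni-gram matching score)."""
    words_a = set(tweet_a.lower().split())
    words_b = set(tweet_b.lower().split())
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)


def mean_pairwise_similarity(tweets):
    """Average uni-gram similarity over all pairs of a user's tweets."""
    pairs = list(combinations(tweets, 2))
    if not pairs:
        return 0.0
    return sum(unigram_similarity(a, b) for a, b in pairs) / len(pairs)


# A perfectly periodic account versus an irregular one.
print(time_interval_entropy([0, 600, 1200, 1800]))
print(time_interval_entropy([0, 30, 700, 5000]))
print(mean_pairwise_similarity(["buy cheap pills now", "buy cheap pills today"]))
```

A classifier would take both values as input features for each account, in line with the abstract's finding that neither feature alone separates bots from legitimate users well.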
Leveraging Multi-level Dependency of Relational Sequences for Social Spammer Detection
Much recent research has shed light on the development of the
relation-dependent but content-independent framework for social spammer
detection. This is largely because the relations among users are difficult to
alter when spammers attempt to conceal their malicious intent. Our study
investigates the spammer detection problem in the context of multi-relation
social networks, and makes an attempt to fully exploit the sequences of
heterogeneous relations for enhancing the detection accuracy. Specifically, we
present the Multi-level Dependency Model (MDM). The MDM is able to exploit
users' long-term dependency hidden in their relational sequences along with
short-term dependency. Moreover, MDM fully considers short-term relational
sequences from both the individual-level and union-level perspectives, because
short-term sequences come in multiple types. Experimental results
on a real-world multi-relational social network demonstrate the effectiveness
of our proposed MDM on multi-relational social spammer detection.