Detect Spammers in Online Social Networks
Fake followers in online social networks (OSNs) are accounts created to boost the rank of some targets. These spammers can be generated by programs or by human beings, which makes them hard to identify. In this thesis, we propose a novel spammer detection method that detects near-duplicate accounts, i.e., accounts that share most of their followers. Discovering such near-duplicates is hard on large social networks that provide only limited remote access. We identify the near-duplicates and the corresponding spammers by estimating the Jaccard similarity using star sampling, a combination of uniform random sampling and breadth-first crawling. We then apply our method to Sina Weibo and Twitter. On Weibo, we find 395 near-duplicates, 12 million suspected spammers and 741 million spam links. On Twitter, we find 129 near-duplicates, 4.93 million suspected spammers and 2.608 billion spam links. Moreover, we cluster the near-duplicates and the corresponding spammers, and analyze the properties of each group.
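The Jaccard similarity of two accounts A and B is |F(A) ∩ F(B)| / |F(A) ∪ F(B)|, where F(·) is the follower set. Below is a minimal sketch of a sampling-based estimate, assuming users can be drawn uniformly at random and each sampled user's followee list can be fetched (one crawl step around each sampled node): among sampled users who follow A or B, the fraction who follow both estimates the similarity. This simplified ratio estimator is for illustration only and is not the thesis's exact star-sampling estimator; the function names and the synthetic graph are hypothetical.

```python
import random


def jaccard_exact(a: set, b: set) -> float:
    """Exact Jaccard similarity of two follower sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def estimate_jaccard(followees_of, sampled_users, target_a, target_b) -> float:
    """Ratio estimate of J(F(A), F(B)) from uniformly sampled users.

    followees_of(u) returns the set of accounts that u follows (one crawl
    step from each sampled node). A sampled user falls in the union if it
    follows A or B, and in the intersection if it follows both.
    """
    in_union = in_both = 0
    for u in sampled_users:
        out = followees_of(u)
        follows_a, follows_b = target_a in out, target_b in out
        if follows_a or follows_b:
            in_union += 1
            if follows_a and follows_b:
                in_both += 1
    # NaN signals that no sampled user touched either account.
    return in_both / in_union if in_union else float("nan")


if __name__ == "__main__":
    # Hypothetical synthetic network: 10,000 users; A and B share 5,000 followers.
    followers_a = set(range(0, 6_000))
    followers_b = set(range(1_000, 7_000))

    def followees_of(u):
        out = set()
        if u in followers_a:
            out.add("A")
        if u in followers_b:
            out.add("B")
        return out

    sample = random.Random(42).sample(range(10_000), 2_000)
    print("exact:", round(jaccard_exact(followers_a, followers_b), 3))  # exact: 0.714
    print("estimate:", round(estimate_jaccard(followees_of, sample, "A", "B"), 3))
```

Only sampled users adjacent to A or B contribute to the estimate, so its precision depends on how many samples land in the union; for accounts whose followers are a tiny fraction of the network, a much larger sample is needed.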
Look behind the Censorship: Reposting-User Characterization and Muted-Topic Restoration
The emergence of social media has largely eased the way people receive
information and participate in public discussions. However, in countries with
strict regulations on discussions in the public space, social media is no
exception to such regulation. To limit the degree of dissent or inhibit the spread of "harmful"
information, a common approach is to impose information operations such as
censorship/suspension on social media. In this paper, we focus on a study of
censorship on Weibo, the counterpart of Twitter in China. Specifically, we 1)
create a web-scraping pipeline and collect a large dataset focused solely on the
reposts from Weibo; 2) discover the characteristics of users whose reposts
contain censored information, in terms of gender, device, and account type; and
3) conduct a thematic analysis by extracting and analyzing topic information.
Note that although the original posts are no longer visible, we can use
comments users wrote when reposting the original post to infer the topic of the
original content. We find that such efforts can recover the discussions around
social events that triggered massive discussions but were later muted. Further,
we show the variations of inferred topics across different user groups and time
frames.
Comment: Accepted for publication in Proceedings of the International Workshop
on Social Sensing (SocialSens 2022): Special Edition on Belief Dynamics, 2022.
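The abstract does not specify how topic information is extracted. As a stand-in technique, here is a minimal sketch that surfaces the most distinctive terms per user group or time frame with TF-IDF, assuming the repost comments are already tokenized (Chinese text would first need a word segmenter). Each group's comments are pooled into one document, so terms that occur in every group score zero. The function name and group labels are hypothetical.

```python
import math
from collections import Counter


def distinctive_terms(comments_by_group, k=3):
    """Rank terms per group by TF-IDF, pooling each group's comments into one document.

    comments_by_group maps a group label (e.g. a time frame or account type)
    to a list of tokenized comments (each a list of strings).
    """
    tf = {
        group: Counter(tok for comment in comments for tok in comment)
        for group, comments in comments_by_group.items()
    }
    n_groups = len(tf)
    # Document frequency: the number of groups in which each term occurs.
    df = Counter(term for counts in tf.values() for term in counts)
    ranked = {}
    for group, counts in tf.items():
        total = sum(counts.values())
        scores = {t: (c / total) * math.log(n_groups / df[t]) for t, c in counts.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked[group] = sorted(scores, key=lambda t: (-scores[t], t))[:k]
    return ranked


if __name__ == "__main__":
    # Hypothetical pre-tokenized repost comments grouped by time frame.
    sample = {
        "2020-01": [["flood", "rescue"], ["flood", "weibo"]],
        "2020-02": [["vaccine", "weibo"], ["vaccine", "doctor"]],
    }
    print(distinctive_terms(sample, k=2))
    # {'2020-01': ['flood', 'rescue'], '2020-02': ['vaccine', 'doctor']}
```

Pooling per group makes the inverse document frequency compare groups rather than individual comments, which suits the question of how inferred topics vary across user groups and time frames.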
Follow Whom? Chinese Users Have Different Choice
Sina Weibo, which was launched in 2009, is the most popular Chinese
micro-blogging service. It has been reported that Sina Weibo had more than 400
million registered users by the end of the third quarter of 2012. Sina Weibo
and Twitter have a lot in common; however, in terms of following
preference, Sina Weibo users, most of whom are Chinese, behave differently
from Twitter users.
This work is based on a Sina Weibo data set containing 80.8 million user
profiles and 7.2 billion relations, and on a large Twitter data set.
First, some basic features of Sina Weibo and Twitter are analyzed, such as
degree and activeness distribution, correlation between degree and activeness,
and the degree of separation. Then the following preference is investigated by
studying the assortative mixing, friend similarities, following distribution,
edge balance ratio, and ranking correlation, where edge balance ratio is newly
proposed to measure the balance property of graphs. It is found that Sina Weibo
has a lower reciprocity rate and more positive balanced relations, and is more
disassortative. Consistent with traditional Asian culture, the following
preference of Sina Weibo users is more concentrated and hierarchical: they are
more likely to follow people at higher or the same social levels and less
likely to follow people lower than themselves. In contrast, the same kind of
following preference is weaker on Twitter. Twitter users are more open, as they
follow people from all levels, which accords with Twitter's global character and the
prevalence of western civilization. Message forwarding behavior is studied
by examining propagation levels, delays, and critical users. The following
preference derives not only from usage habits but also from underlying factors
such as personality and social morality, which are worthy of future research.
Comment: 9 pages, 13 figures
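Reciprocity rate is commonly defined as the fraction of directed follow edges u→v whose reverse v→u also exists. A minimal sketch, assuming the follow graph fits in memory as (follower, followee) pairs; the abstract does not state the paper's exact formula, and the function name is hypothetical.

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) whose reverse (v, u) is also present.

    edges is an iterable of (follower, followee) pairs; duplicate edges and
    self-loops are ignored.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for u, v in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


if __name__ == "__main__":
    # Hypothetical toy graph: a<->b is mutual; a->c and c->d are one-way.
    follows = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d")]
    print(reciprocity(follows))  # 0.5
```

Under this definition a mutual pair contributes two reciprocated edges, so a network where every follow is returned scores 1.0.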
Detection of Online Social Attackers Based on Ego-Network Analysis
Doctoral dissertation, Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2020. Advisor: 김종권.
In the last decade we have witnessed the explosive growth of online social networking services (SNSs) such as Facebook, Twitter, Weibo and LinkedIn. While SNSs provide diverse benefits, for example fostering interpersonal relationships, community formation and news propagation, they have also attracted unwanted nuisances. Spammers abuse SNSs as vehicles to spread spam rapidly and widely. Spam, i.e., unsolicited or inappropriate messages, significantly impairs the credibility and reliability of services. Therefore, detecting spammers has become an urgent and critical issue in SNSs. This paper deals with spamming on Twitter and Weibo. Instead of spreading annoying messages to the public, this type of spammer follows (subscribes to) many normal users in order to be followed back by them. Sometimes spammers also build link farms to increase the follower counts and explicit influence of target accounts. Based on the assumption that the online relationships of spammers differ from those of normal users, I proposed classification schemes that detect online social attackers, including spammers. I first focused on social relations within ego-networks and devised two kinds of features: structural features based on the Triad Significance Profile (TSP) and relational semantic features based on hierarchical homophily in an ego-network. Experiments on real Twitter and Weibo datasets demonstrated that the proposed approach is very practical. The proposed features are scalable because, instead of analyzing the whole network, they inspect user-centered ego-networks. My performance study showed that the proposed methods yield significantly better performance than prior schemes in terms of true positives and false positives.
1 Introduction
2 Related Work
2.1 OSN Spammer Detection Approaches
2.1.1 Contents-based Approach
2.1.2 Social Network-based Approach
2.1.3 Subnetwork-based Approach
2.1.4 Behavior-based Approach
2.2 Link Spam Detection
2.3 Data Mining Schemes for Spammer Detection
2.4 Sybil Detection
3 Triad Significance Profile Analysis
3.1 Motivation
3.2 Twitter Dataset
3.3 Indegree and Outdegree of Dataset
3.4 Twitter Spammer Detection with TSP
3.5 TSP-Filtering
3.6 Performance Evaluation of TSP-Filtering
4 Hierarchical Homophily Analysis
4.1 Motivation
4.2 Hierarchical Homophily in OSN
4.2.1 Basic Analysis of Datasets
4.2.2 Status Gap Distribution and Assortativity
4.2.3 Hierarchical Gap Distribution
4.3 Performance Evaluation of HH-Filtering
5 Overall Performance Evaluation
6 Conclusion
Bibliography
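The standard Triad Significance Profile (Milo et al.) assigns each connected three-node directed subgraph class i a Z-score Z_i = (N_real,i - mean(N_rand,i)) / std(N_rand,i) against randomized networks, and the profile is SP_i = Z_i / sqrt(sum_j Z_j^2). A minimal sketch, assuming small ego-networks given as sets of (u, v) follow edges and randomized-network counts supplied by the caller (the degree-preserving rewiring step is omitted); the dissertation's exact ego-network construction and null model are not described in this abstract, and all names here are hypothetical.

```python
import itertools
import math
from statistics import mean, pstdev

# The six ordered node pairs of a triad; bit k of a code is set if edge PAIRS[k] exists.
PAIRS = [(0, 1), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1)]


def triad_code(edges, trio):
    """Isomorphism-invariant code: minimum 6-bit pattern over all relabelings of trio."""
    codes = []
    for perm in itertools.permutations(trio):
        code = 0
        for k, (i, j) in enumerate(PAIRS):
            if (perm[i], perm[j]) in edges:
                code |= 1 << k
        codes.append(code)
    return min(codes)


def is_connected(code):
    """A triad is weakly connected when at least two of its three node pairs are linked."""
    return sum(1 for k in (0, 2, 4) if (code >> k) & 0b11) >= 2


def triad_census(edges):
    """Count connected triads per isomorphism class; O(n^3), meant for small ego-networks."""
    nodes = sorted({n for edge in edges for n in edge})
    counts = {}
    for trio in itertools.combinations(nodes, 3):
        code = triad_code(edges, trio)
        if is_connected(code):
            counts[code] = counts.get(code, 0) + 1
    return counts


def significance_profile(real, randomized):
    """Z-score each triad class against randomized counts, then normalize to unit length."""
    classes = sorted(set(real).union(*randomized))
    z = {}
    for c in classes:
        samples = [r.get(c, 0) for r in randomized]
        sigma = pstdev(samples)
        z[c] = (real.get(c, 0) - mean(samples)) / sigma if sigma > 0 else 0.0
    norm = math.sqrt(sum(v * v for v in z.values()))
    return {c: v / norm for c, v in z.items()} if norm > 0 else z


if __name__ == "__main__":
    # Hypothetical ego-network: ego "e" follows a and b; a follows b; b and c are mutual.
    ego = {("e", "a"), ("e", "b"), ("a", "b"), ("b", "c"), ("c", "b")}
    print(triad_census(ego))
```

The canonical code takes the minimum over all six relabelings, so isomorphic triads always map to the same key; among the 16 directed triad classes, 13 are connected, which is the usual length of a TSP vector. Normalizing the Z-scores lets ego-networks of very different sizes be compared on the same scale.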
The Web of False Information: Rumors, Fake News, Hoaxes, Clickbait, and Various Other Shenanigans
A new era of Information Warfare has arrived. Various actors, including
state-sponsored ones, are weaponizing information on Online Social Networks to
run false information campaigns with targeted manipulation of public opinion on
specific topics. These false information campaigns can have dire consequences
for the public: altering their opinions and actions, especially with respect to
critical world events like major elections. Evidently, the problem of false
information on the Web is a crucial one, and needs increased public awareness,
as well as immediate attention from law enforcement agencies, public
institutions, and in particular, the research community. In this paper, we take
a step in this direction by providing a typology of the Web's false information
ecosystem, comprising various types of false information, actors, and their
motives. We report a comprehensive overview of existing research on the false
information ecosystem by identifying several lines of work: 1) how the public
perceives false information; 2) understanding the propagation of false
information; 3) detecting and containing false information on the Web; and 4)
false information on the political stage. In this work, we pay particular
attention to political false information because: 1) it can have dire consequences
for the community (e.g., when election results are altered) and 2) previous work
shows that this type of false information propagates faster and further when
compared to other types of false information. Finally, for each of these lines
of work, we report several future research directions that can help us better
understand and mitigate the emerging problem of false information dissemination
on the Web.