Detecting collusive spamming activities in community question answering
Community Question Answering (CQA) portals provide rich sources of information on a variety of topics. However, the authenticity and quality of questions and answers (Q&As) have proven hard to control. Troublingly, the widespread growth of crowdsourcing websites has created a large-scale, potentially difficult-to-detect workforce for planting malicious content in CQA. Crowd workers who join the same crowdsourcing task for a promotion campaign collusively post deceptive Q&As to promote a target (a product or service), and such a collusive spamming group can fully control the sentiment around the target. How can the structure and attributes of CQA be used to detect manipulated Q&As? How can the collusive group be detected, and how can group information be leveraged for the detection task?
To shed light on these research questions, we propose a unified framework for detecting collusive spamming activities in CQA. First, we interpret the questions and answers in CQA as two independent networks. Second, we detect collusive question groups and answer groups from these two networks, respectively, by measuring the similarity of the contents posted within a short duration. Third, using attributes (individual-level and group-level) and correlations (user-based and content-based), we propose a combined factor graph model that detects deceptive questions and answers simultaneously by combining two independent factor graphs. On a large-scale practical data set, we find that the proposed framework can detect deceptive contents at an early stage and outperforms a number of competitive baselines.
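The second step above (grouping contents that are similar and posted within a short duration) can be illustrated with a minimal sketch. This is not the paper's method: the Jaccard token similarity, the union-find grouping, the function name `collusive_groups`, and the `window` and `min_sim` thresholds are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def collusive_groups(posts, window=3600, min_sim=0.5):
    """Group posts that are similar in content and close in time.

    posts: list of (post_id, timestamp_seconds, text) tuples.
    window and min_sim are illustrative thresholds, not values from the paper.
    """
    tokens = {pid: set(text.lower().split()) for pid, _, text in posts}
    times = {pid: ts for pid, ts, _ in posts}
    parent = {pid: pid for pid, _, _ in posts}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(list(parent), 2):
        close_in_time = abs(times[a] - times[b]) <= window
        if close_in_time and jaccard(tokens[a], tokens[b]) >= min_sim:
            parent[find(a)] = find(b)

    groups = {}
    for pid in parent:
        groups.setdefault(find(pid), []).append(pid)
    # Only groups with at least two posts are candidate collusive groups.
    return [sorted(g) for g in groups.values() if len(g) > 1]


posts = [
    ("a1", 0, "best phone ever buy now"),
    ("a2", 600, "best phone ever buy today"),
    ("a3", 90000, "best phone ever buy now"),
    ("a4", 300, "the battery drains quickly"),
]
groups = collusive_groups(posts)
```

In this toy input, `a1` and `a2` are grouped because they share most tokens and are ten minutes apart, while `a3` has the same text but falls outside the time window. A real system would likely use a stronger text similarity measure and run the procedure separately on the question and answer networks.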
Latent Network Mining in Social Network and E-commerce Platforms
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2023. 2. Kwon Taekyoung.
With the explosive growth of online services, people are connected with each other more broadly than ever. On online platforms, people influence each other and tend to reflect others' opinions in their decision-making. Social network services (SNSs) and e-commerce are typical examples of online platforms.
User behavior on online platforms can be defined as a relation between a user and platform components. A user's purchase is a relationship between a user and a product, and a user's check-in is a relationship between a user and a place. Information such as the time of the action, rating, and tags may also be included. In many studies, platform user behavior is represented as a graph. The nodes of the graph are objects within the platform, such as users, products, and places, and an interaction between a user and a platform element is expressed as an edge connecting the two nodes.
In this study, I present work that identifies latent networks affecting the user behavior graphs defined on these two platforms.
In ANES, I focus on representation learning for social link inference based on user trajectory data. While traditional methods predict relations between users from hand-crafted features, recent studies first perform representation learning using network/node embedding or graph neural networks (GNNs) for downstream tasks such as node classification and link prediction. However, those approaches fail to capture the behavioral patterns of individuals ingrained in periodic visits or long-distance movements. To better learn behavioral patterns, this paper proposes a novel scheme called ANES (Aspect-oriented Network Embedding for Social link inference). ANES learns aspect-oriented relations between users and Points of Interest (POIs) within their contexts. ANES is the first approach that extracts the complex behavioral patterns of users from both trajectory data and the structure of User-POI bipartite graphs. Extensive experiments on several real-world datasets show that ANES outperforms state-of-the-art baselines.
In contrast to active social networks, on some platforms, such as online shopping websites and restaurant review sites, people are connected to other users regardless of their intentions. They have no information about each other in advance; all they have in common is that they have visited or plan to visit the same place, or have purchased the same product. Interestingly, users' purchase intentions tend to be influenced by review data.
Unfortunately, this tendency is easily exploited by opinion spammers. In SC-Com, I focus on opinion spam detection in online shopping services. In many cases, a consumer's decision-making process is closely related to online reviews. However, opinion spam written by hired reviewers is a growing threat; it aims to mislead potential customers by hiding genuine consumers' opinions. To distort true information, opinion spam has to be posted collectively, and this collusiveness offers a way to detect it. In this paper, I propose SC-Com, an optimized collusive community detection framework. It constructs a graph of reviewers from the collusiveness of their behavior and divides the graph into communities based on their mutual suspiciousness. I then extract community-based and temporal abnormality features, which are critical for discriminating spammers from genuine users. I show that my method detects collusive opinion spam reviewers effectively and precisely from their collective behavioral patterns. On a real-world dataset, my approach showed strong performance while considering only primary data such as time and ratings.
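The first two SC-Com stages described above (building a reviewer graph from collusive behavior, then splitting it into communities) can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the thesis's algorithm: the co-review scoring rule, the thresholds `max_gap`, `max_rating_diff`, and `min_weight`, and the use of connected components as a stand-in for community detection are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def collusion_graph(reviews, max_gap=86400, max_rating_diff=1):
    """Score reviewer pairs by how often they review the same product
    at nearly the same time with nearly the same rating.

    reviews: list of (reviewer, product, timestamp_seconds, rating).
    The scoring rule and thresholds are illustrative assumptions.
    """
    by_product = defaultdict(list)
    for reviewer, product, ts, rating in reviews:
        by_product[product].append((reviewer, ts, rating))

    weights = defaultdict(int)
    for entries in by_product.values():
        for (u, tu, ru), (v, tv, rv) in combinations(entries, 2):
            similar = abs(tu - tv) <= max_gap and abs(ru - rv) <= max_rating_diff
            if u != v and similar:
                weights[frozenset((u, v))] += 1
    return weights


def communities(weights, min_weight=2):
    """Connected components over sufficiently heavy edges.

    A stand-in for a real community detection step.
    """
    adj = defaultdict(set)
    for pair, w in weights.items():
        if w >= min_weight:
            u, v = tuple(pair)
            adj[u].add(v)
            adj[v].add(u)

    seen, result = set(), []
    for start in list(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        result.append(sorted(comp))
    return result


reviews = [
    ("s1", "p1", 0, 5), ("s2", "p1", 100, 5), ("s3", "p1", 200, 5),
    ("g1", "p1", 500000, 3),
    ("s1", "p2", 1000, 5), ("s2", "p2", 1100, 5), ("s3", "p2", 1200, 5),
    ("g2", "p2", 1300, 5),
]
found = communities(collusion_graph(reviews))
```

Here the three reviewers `s1`, `s2`, and `s3` co-review two products within minutes of each other and end up in one community, while `g2` overlaps with them only once and stays below the edge threshold. The sketch uses only timestamps and ratings, in line with the abstract's remark about primary data; the feature extraction and supervised classification stages are not shown.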
The implicit network inference models studied in this thesis on various data predict which users are likely to have been connected in advance, even for unlabeled data, so they are expected to contribute to areas such as advertising recommendation systems and malicious user detection by providing useful information.
Chapter 1 Introduction
Chapter 2 Social link Inference in Location-based check-in data
2.1 Background
2.2 Related Work
2.3 Location-based Social Network Service Data
2.4 Aspect-wise Graph Decomposition
2.5 Aspect-wise Graph learning
2.6 Inferring Social Relation from User Representation
2.7 Performance Analysis
2.8 Discussion and Implications
2.9 Summary
Chapter 3 Detecting collusiveness from reviews in Online platforms and its application
3.1 Background
3.2 Related Work
3.3 Online Review Data
3.4 Collusive Graph Projection
3.5 Reviewer Community Detection
3.6 Review Community feature extraction and spammer detection
3.7 Performance Analysis
3.8 Discussion and Implications
3.9 Summary
Chapter 4 Conclusion
Signed Latent Factors for Spamming Activity Detection
Due to the increasing trend of performing spamming activities (e.g., Web
spam, deceptive reviews, fake followers, etc.) on various online platforms to
gain undeserved benefits, spam detection has emerged as a hot research issue.
Previous attempts to combat spam mainly employ features related to metadata,
user behaviors, or relational ties. These works have made considerable progress
in understanding and filtering spamming campaigns. However, this problem
remains far from fully solved. Almost all the proposed features focus on a
limited number of observed attributes or explainable phenomena, making it
difficult for existing methods to achieve further improvement. To broaden the
vision about solving the spam problem and address long-standing challenges
(class imbalance and graph incompleteness) in the spam detection area, we
propose a new attempt of utilizing signed latent factors to filter fraudulent
activities. The spam-contaminated relational datasets of multiple online
applications in this scenario are interpreted by the unified signed network.
Two competitive and highly dissimilar algorithms of latent factors mining (LFM)
models are designed based on multi-relational likelihoods estimation (LFM-MRLE)
and signed pairwise ranking (LFM-SPR), respectively. We then explore how to
apply the mined latent factors to spam detection tasks. Experiments on
real-world datasets of different kinds of Web applications (social media and
Web forum) indicate that LFM models outperform state-of-the-art baselines in
detecting spamming activities. Experiments on specifically manipulated data
validate the effectiveness of our methods in dealing with the incompleteness
and imbalance challenges.
Using Contextual Features for Online Recruitment Fraud Detection
The recent growth of online recruitment and candidate management systems has established yet another medium for fraudsters on the internet. The ever-growing size of the candidate pool has forced different industries to move to web-based candidate management systems. The advantages of such web-based systems are substantial. On the one hand, they are the best means for employers to filter through thousands of applicants; on the other hand, candidates find it convenient to apply for a position. People with fraudulent motivations exploit these systems to lure candidates into a hoax and extract sensitive information (e.g. contact information) using fake job advertisements. In this paper, we analyzed a publicly available dataset and used machine learning algorithms to classify job postings as fraudulent or legitimate. The contribution of this research is the inclusion of contextual features in the feature space, which yielded compelling improvements in accuracy, precision, and recall.
Accumulative time-based ranking method to reputation evaluation in information networks
With the rapid development of modern technology, the Web has become an
important platform for users to make friends and acquire information. However,
since information on the Web is over-abundant, information filtering becomes a
key task for online users to obtain relevant suggestions. As most Websites can
be ranked according to users' rating and preferences, relevance to queries, and
recency, how to extract the most relevant item from the over-abundant
information is always a key topic for researchers in various fields. In this
paper, we adopt tools used to analyze complex networks to evaluate user
reputation and item quality. In our proposed accumulative time-based ranking
(ATR) algorithm, we incorporate two behavioral weighting factors which are
updated when users select or rate items, to reflect the evolution of user
reputation and item quality over time. We show that our algorithm outperforms
state-of-the-art ranking algorithms in terms of precision and robustness on
empirical datasets from various online retailers and on citation datasets
of research publications.
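The idea of updating user reputation and item quality as ratings arrive over time can be illustrated with a small sketch. The abstract does not give the ATR update rules, so everything below is an assumption made for illustration: the function name `accumulate`, the agreement-based reputation update, the learning rate `lr`, and the initial reputation of 0.5.

```python
def accumulate(events, lr=0.5, max_rating=5.0, initial_reputation=0.5):
    """Process ratings in time order, updating reputation and quality.

    events: iterable of (timestamp, user, item, rating).
    The update rule is a generic illustration, not the ATR algorithm itself.
    """
    reputation = {}  # user -> weight between 0 and 1
    num = {}         # item -> sum of reputation * rating
    den = {}         # item -> sum of reputation

    for _, user, item, rating in sorted(events, key=lambda e: e[0]):
        rep = reputation.setdefault(user, initial_reputation)
        if item in den:
            # Reward users whose rating agrees with the current quality estimate.
            quality = num[item] / den[item]
            agreement = 1.0 - abs(rating - quality) / max_rating
            rep = (1.0 - lr) * rep + lr * agreement
            reputation[user] = rep
        # Item quality is the reputation-weighted mean of its ratings so far.
        num[item] = num.get(item, 0.0) + rep * rating
        den[item] = den.get(item, 0.0) + rep

    quality = {item: num[item] / den[item] for item in den}
    return reputation, quality


events = [
    (1, "u1", "i1", 5),
    (2, "u2", "i1", 5),
    (3, "u3", "i1", 1),
]
reputation, quality = accumulate(events)
```

In this toy run, `u2` agrees with the earlier rating and gains reputation, `u3` disagrees and loses it, and the item's quality estimate stays above the unweighted mean because the dissenting rating carries less weight. Note that the result depends on the order of events, which is the point of a time-based scheme but also means early raters have outsized influence in this simplified version.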
Uncovering Download Fraud Activities in Mobile App Markets
Download fraud is a prevalent threat in mobile App markets, where fraudsters
manipulate the number of downloads of Apps via various cheating approaches.
Purchased fake downloads can mislead recommendation and search algorithms and
further lead to bad user experience in App markets. In this paper, we
investigate the download fraud problem based on a company's App Market, which is
one of the most popular Android App markets. We release a honeypot App on the
App Market and purchase fake downloads from fraudster agents to track fraud
activities in the wild. Based on our interaction with the fraudsters, we
categorize download fraud activities into three types according to their
intentions: boosting front end downloads, optimizing App search ranking, and
enhancing user acquisition and retention rates. For the download fraud aimed at
optimizing App search ranking, we select, evaluate, and validate several
features in identifying fake downloads based on billions of download data. To
get a comprehensive understanding of download fraud, we further gather stances
of App marketers, fraudster agencies, and market operators on download fraud.
The ensuing analysis and suggestions shed light on ways to mitigate
download fraud in App markets and other social platforms. To the best of our
knowledge, this is the first work that investigates the download fraud problem
in mobile App markets.
Comment: Published as a conference paper in IEEE/ACM ASONAM 201
Review manipulation in the hotel industry: the case of one-time contributors
Although review manipulation has been shown to have a significant adverse impact on
consumer welfare, there is as yet little understanding of which economic incentives drive this
behavior, as most current research has focused on the characteristics that define a fake
review. The present study investigates these incentives using the innovative approach of
examining one-time contributor user reviews as an alternative measure of review manipulation.
With a sample comprising 450 hotels registered on TripAdvisor, from the cities of Amsterdam
and Brussels, two types of studies were developed, encompassing both cross-sectional and panel
data analyses. The empirical results show that review manipulation is
economically important, since agents with different economic incentives engage in review
fraud to differing extents. These incentives were found to include the type of organizational
structure, the total number of reviews, and the attributed user bubble rating.