868 research outputs found

    BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology

    This report is a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to its detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForever software, the discussion has been extended to include observations on the historical, social and practical value of spam, and proposals for other ways of dealing with spam within the repository without necessarily removing it. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research on spam with a focus on spam appearing in the weblog context, concluding with a proposal for a spam detection workflow that might form the basis of the spam detection component of the BlogForever software.
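
    The abstract does not specify the proposed workflow; the following is a minimal sketch, under assumed stage names, blacklist entries and thresholds, of a staged filter chain that flags rather than deletes, matching the report's suggestion of handling spam within the repository without removing it.

        # Toy staged spam-detection workflow: each stage only *flags* a post,
        # so flagged spam can be retained in the archive rather than deleted.
        # Stage names, blacklist hosts and thresholds are illustrative assumptions.
        from dataclasses import dataclass, field

        @dataclass
        class BlogPost:
            url: str
            text: str
            flags: list = field(default_factory=list)

        def blacklist_stage(post, blacklist=("cheap-pills.example",)):
            # Cheap first pass: known-bad hosts from a hand-maintained blacklist.
            if any(host in post.url for host in blacklist):
                post.flags.append("blacklisted-host")

        def heuristic_stage(post, max_links=10):
            # Simple content heuristic: link-stuffed text is suspicious.
            if post.text.count("http") > max_links:
                post.flags.append("excessive-links")

        def run_workflow(post):
            for stage in (blacklist_stage, heuristic_stage):
                stage(post)
            return post.flags  # downstream tools decide how to present flagged items

        print(run_workflow(BlogPost("http://cheap-pills.example/buy", "http://x " * 12)))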

    PageRank optimization applied to spam detection

    We give a new link spam detection and PageRank demotion algorithm called MaxRank. Like TrustRank and AntiTrustRank, it starts with a seed of hand-picked trusted and spam pages. We define the MaxRank of a page as the frequency with which the page is visited by a random surfer minimizing an average cost per time unit. On a given page, the random surfer selects a set of hyperlinks and clicks with uniform probability on any of these hyperlinks. The cost function penalizes spam pages and hyperlink removals, and the goal is to determine a hyperlink deletion policy that minimizes this score. The MaxRank is interpreted as a modified PageRank vector and is used to sort web pages in place of the usual PageRank vector. The bias vector of this ergodic control problem, which is unique up to an additive constant, is a measure of the "spamicity" of each page and is used to detect spam pages. We give a scalable algorithm for MaxRank computation that allowed us to run experiments on the WEBSPAM-UK2007 dataset. We show that our algorithm outperforms both TrustRank and AntiTrustRank for spam and nonspam page detection. (8 pages, 6 figures)
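
    The paper's MaxRank solves an ergodic-control problem and is not reproduced here; as a point of reference, the following is a minimal sketch of the TrustRank-style seed propagation that MaxRank is compared against, on an assumed toy graph with assumed parameters.

        # Minimal TrustRank-style power iteration: a random surfer who restarts
        # only on hand-picked trusted seed pages. Not the paper's MaxRank, which
        # additionally optimizes a hyperlink deletion policy.
        import numpy as np

        def trustrank(adj, trusted, damping=0.85, iters=100):
            """adj[i][j] = 1 if page i links to page j; trusted = seed indices."""
            A = np.array(adj, dtype=float)
            n = len(A)
            out = A.sum(axis=1, keepdims=True)
            # Row-stochastic transition matrix; dangling pages just leak mass here.
            P = np.divide(A, out, out=np.zeros_like(A), where=out > 0)
            d = np.zeros(n)
            d[list(trusted)] = 1.0 / len(trusted)  # restart on trusted seeds only
            r = d.copy()
            for _ in range(iters):
                r = damping * (r @ P) + (1 - damping) * d  # surfer with restart
            return r  # a low score means the page is far from the trusted seeds

        adj = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # assumed 3-page link cycle
        print(trustrank(adj, trusted={0}))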

    Evaluation of Spam Impact on Arabic Websites Popularity

    The expansion of the Web and its information into all aspects of life raises the concern of how to trust information published on the Web, especially when the publisher may not be known. Websites strive to be more popular and to make themselves visible to search engines and, eventually, to users. Website popularity can be measured using several metrics, such as Web traffic (e.g., a website's number of visitors and number of visited pages). Link or page popularity refers to the total number of hyperlinks referring to a certain Web page. In this study, several top-ranked Arabic websites are evaluated for possible Web spam behavior. Websites use spam techniques to boost their ranks within the Search Engine Results Page (SERP). The results of this study show that some of these popular websites use techniques that are considered spam according to Search Engine Optimization guidelines.
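
    As a small illustration of the link (page) popularity metric described above, the sketch below counts in-links per page over a made-up set of hyperlink pairs.

        # Link popularity = number of hyperlinks pointing at a page.
        # The (source, target) pairs are invented examples.
        from collections import Counter

        links = [
            ("a.example", "news.example"),
            ("b.example", "news.example"),
            ("news.example", "a.example"),
        ]

        popularity = Counter(target for _, target in links)
        print(popularity.most_common())  # pages ranked by in-link count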

    Link-based similarity search to fight web spam

    www.ilab.sztaki.hu/websearch
    We investigate the usability of similarity search in fighting Web spam, based on the assumption that an unknown spam page is more similar to certain known spam pages than to honest pages. In order to be successful, search engine spam never appears in isolation: we observe link farms and alliances built for the sole purpose of search engine ranking manipulation. This artificial nature and strong internal connectedness, however, have given rise to successful algorithms for identifying search engine spam. One example is trust and distrust propagation, an idea originating in recommender systems and P2P networks, which yields spam classifiers by spreading information along hyperlinks from whitelists and blacklists. While most previous results use PageRank variants for propagation, we form classifiers by investigating the similarity top lists of an unknown page under various measures such as co-citation, Companion, nearest neighbors in low-dimensional projections, and SimRank. We test our method over two datasets previously used to measure spam filtering algorithms.
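
    Among the similarity measures listed, SimRank is the most self-contained: two pages are similar if similar pages link to them. Below is a minimal dense-matrix sketch on an assumed toy link graph; the damping constant and iteration count are assumptions.

        # SimRank: S = C * W^T S W with the diagonal pinned to 1, where W is the
        # column-normalised in-link matrix. Pages co-cited by similar pages
        # (here, pages 1 and 2) come out similar.
        import numpy as np

        def simrank(adj, C=0.8, iters=10):
            A = np.array(adj, dtype=float)
            col = A.sum(axis=0)
            W = np.divide(A, col, out=np.zeros_like(A), where=col > 0)
            S = np.eye(len(A))
            for _ in range(iters):
                S = C * (W.T @ S @ W)     # average similarity of in-neighbour pairs
                np.fill_diagonal(S, 1.0)  # a page is maximally similar to itself
            return S

        adj = [[0, 1, 1, 0],  # adj[i][j] = 1 iff page i links to page j
               [0, 0, 0, 1],
               [0, 0, 0, 1],
               [0, 0, 0, 0]]
        print(np.round(simrank(adj), 3))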

    Web Spam Detection Using Fuzzy Clustering

    The Internet is the most widespread medium for expressing our views and ideas and a lucrative platform for delivering products. For this purpose, search engines play a key role: information about web pages is stored in a search engine's index database for use in later queries. Web spam refers to a host of techniques that challenge the ranking algorithms of web search engines and cause them to rank certain web pages higher, or that serve some other purpose beneficial to the spammer. Web spam irritates web surfers, causes disruption, and degrades the quality of web search engines. In this paper, we present an efficient clustering method to detect spam web pages effectively and accurately. We also employ various validation measures to validate the work with the clustering methods. Comparisons between the obtained charts and the validation results clearly show that the presented approach produces better results.
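
    The abstract does not give the clustering details; the sketch below shows fuzzy c-means, the standard fuzzy clustering algorithm, assigning each page a soft membership in a "spam" and a "ham" cluster. The page features, fuzzifier m and cluster count are assumptions for illustration.

        # Fuzzy c-means: each page gets a membership degree in every cluster
        # instead of a hard label. Standard alternating updates of memberships
        # and centres; the toy features are invented.
        import numpy as np

        def fuzzy_cmeans(X, c=2, m=2.0, iters=50, eps=1e-9):
            rng = np.random.default_rng(0)
            U = rng.dirichlet(np.ones(c), size=len(X))  # random memberships, rows sum to 1
            for _ in range(iters):
                Um = U ** m
                centers = (Um.T @ X) / Um.sum(axis=0)[:, None]      # weighted centres
                d = np.linalg.norm(X[:, None] - centers[None], axis=2) + eps
                p = 2.0 / (m - 1.0)
                U = (1.0 / d**p) / (1.0 / d**p).sum(axis=1, keepdims=True)
            return U, centers

        # Toy 2-feature pages, e.g. (keyword-stuffing score, outlink ratio).
        X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
        U, centers = fuzzy_cmeans(X)
        print(np.round(U, 2))  # soft spam/ham memberships per page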

    Online Social Attacker Detection Based on Ego-Network Analysis

    Ph.D. dissertation, Department of Computer Science and Engineering, College of Engineering, Seoul National University, February 2020 (advisor: Chong-kwon Kim, 김종권).
    In the last decade we have witnessed the explosive growth of online social networking services (SNSs) such as Facebook, Twitter, Weibo and LinkedIn. While SNSs provide diverse benefits (for example, fostering interpersonal relationships, community formation and news propagation), they have also attracted unwelcome nuisances. Spammers abuse SNSs as vehicles to spread spam rapidly and widely. Spam, unsolicited or inappropriate messages, significantly impairs the credibility and reliability of services, so detecting spammers has become an urgent and critical issue in SNSs. This dissertation deals with spamming in Twitter and Weibo. Instead of spreading annoying messages to the public, a spammer follows (subscribes to) many normal users and tries to elicit follow-backs from them; sometimes a spammer builds a link farm to inflate a target account's follower count and explicit influence. Based on the assumption that the online relationships of spammers differ from those of normal users, I propose classification schemes that detect online social attackers, including spammers. I first focus on ego-network social relations and devise two kinds of features: structural features based on the Triad Significance Profile (TSP) and relational-semantic features based on hierarchical homophily in an ego-network. Experiments on real Twitter and Weibo datasets demonstrate that the proposed approach is practical. The proposed features are scalable because they inspect user-centered ego-networks instead of analyzing the whole network. The performance study shows that the proposed methods yield significantly better true-positive and false-positive rates than prior schemes.
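
    The structural feature starts from directed-triad counts in a user's ego-network. Below is a minimal sketch using networkx's triadic_census on an assumed toy follow graph; the full TSP additionally normalises these counts against randomised networks, which is omitted here.

        # Count the 16 directed-triad types in one user's ego-network.
        # The follow edges are invented: a suspected spammer follows many
        # accounts while receiving almost no follow-backs.
        import networkx as nx

        def ego_triad_counts(G, user):
            ego = nx.ego_graph(G, user, undirected=True)  # user + in/out neighbours
            return nx.triadic_census(ego)

        G = nx.DiGraph()
        G.add_edges_from([("spam", u) for u in "abcde"] + [("a", "b"), ("b", "a")])
        counts = ego_triad_counts(G, "spam")
        print({k: v for k, v in counts.items() if v})     # non-zero triad types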