232 research outputs found

    Identifying Search Engine Spam Using DNS

    Web crawlers encounter both finite and infinite elements during a crawl. Pages and hosts can be generated without bound using automated scripts and DNS wildcard entries, which makes ranking such resources challenging: an entire web of pages and hosts can be created to manipulate the rank of a target resource. Differentiating genuine content from spam in real time is therefore crucial for allocating crawl budgets. In this study, ranking algorithms for hosts are designed that use the finite sets of Pay-Level Domains (PLDs) and IPv4 addresses, based on heterogeneous graphs derived from the IRLbot webgraph. The first algorithm studied is PLD Supporters (PSUPP), the number of level-2 PLD supporters of each host on the host-host-PLD graph. It is further improved by True PLD Supporters (TSUPP), which uses true egalitarian level-2 PLD supporters on the host-IP-PLD graph together with DNS blacklists. Computing TSUPP was found to eliminate support from content farms and stolen links. When TSUPP was applied to the IRLbot host graph, there was less than 1% spam among the top 100,000 hosts.
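The supporter-counting idea in this abstract can be sketched in a few lines. This is only a minimal illustration of counting distinct level-2 PLD supporters on a host-level in-link graph; the function name, the graph representation, and the exact two-hop expansion are assumptions for the sketch, not the paper's actual PSUPP implementation:

```python
def psupp(in_links, host_pld):
    """Toy level-2 PLD supporter count (illustrative, not the paper's algorithm).

    in_links:  dict mapping host -> set of hosts that link to it
    host_pld:  dict mapping host -> its pay-level domain (PLD)
    Returns a dict mapping host -> number of distinct supporting PLDs
    found within two in-link hops, excluding the host's own PLD.
    """
    scores = {}
    for host in in_links:
        # Level-1 supporters: hosts linking directly to this host.
        supporters = set(in_links.get(host, ()))
        # Level-2 supporters: hosts linking to the level-1 supporters.
        for h in list(supporters):
            supporters |= in_links.get(h, set())
        # Collapse supporters to distinct PLDs; self-support does not count.
        plds = {host_pld[s] for s in supporters if s in host_pld}
        plds.discard(host_pld.get(host))
        scores[host] = len(plds)
    return scores
```

Counting PLDs rather than raw hosts is what limits the effect of spam farms: a wildcard DNS entry can mint unlimited hosts, but each new PLD costs money to register.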

    Graph based Anomaly Detection and Description: A Survey

    Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While many techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, graph data has become ubiquitous, and techniques for structured graph data have recently come into focus. Because objects in graphs exhibit long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms, categorized under various settings: unsupervised vs. (semi-)supervised approaches, static vs. dynamic graphs, and attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness of the methods. Moreover, we stress the importance of anomaly attribution and highlight the major techniques that help dig out the root cause, or the 'why', of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer-traffic, and social networks. We conclude our survey with a discussion of open theoretical and practical challenges in the field.
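As a concrete taste of the unsupervised, plain-static-graph setting the survey categorizes, here is a deliberately minimal sketch: scoring nodes by the z-score of their degree, so that structurally unusual nodes stand out. This is a toy feature-based baseline assumed for illustration, not any specific method from the survey:

```python
import math

def degree_anomaly_scores(edges):
    """Toy unsupervised anomaly score for a plain static graph:
    the z-score of each node's degree. Nodes whose degree deviates
    strongly from the mean get large absolute scores."""
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    vals = list(degree.values())
    mean = sum(vals) / len(vals)
    var = sum((d - mean) ** 2 for d in vals) / len(vals)
    std = math.sqrt(var) or 1.0  # avoid division by zero on regular graphs
    return {n: (d - mean) / std for n, d in degree.items()}
```

In a star graph, for example, the hub receives a much higher score than the leaves. Real graph anomaly detectors combine many such structural features, or avoid hand-crafted features entirely, but the scoring-by-deviation pattern is the same.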

    Trust and Credibility in Online Social Networks

    Increasing portions of people's social and communicative activities now take place in the digital world. The growth and popularity of online social networks (OSNs) have greatly facilitated online interaction and information exchange. As OSNs enable people to communicate more effectively, a large volume of user-generated content (UGC) is produced daily. Because UGC contains valuable information, more people now turn to OSNs for news, opinions, and social networking. Besides users, companies and business owners also benefit from UGC, as they use OSNs as platforms for communicating with customers and for marketing. Hence, UGC has a powerful impact on users' opinions and decisions. However, the openness of OSNs also raises concerns about trust and credibility online. The freedom and ease of publishing information online can lead to UGC of problematic quality. It has been observed that professional spammers are hired to insert deceptive content and promote harmful information in OSNs. This is known as the spamming problem, and it jeopardizes the ecosystems of OSNs. The severity of the spamming problem has attracted the attention of researchers, and many detection approaches have been proposed. However, most existing approaches are based on behavioral patterns, and as spammers evolve to evade detection by faking normal behaviors, these approaches may fail. In this dissertation, we present our work on detecting spammers by extracting behavioral patterns that are difficult to manipulate in OSNs. We focus on two scenarios: review spamming and social bots. We first identify rating deviations and opinion deviations as invariant patterns in review spamming activities, since the goal of review spamming is to insert deceptive reviews. We use these two kinds of deviations as clues for trust propagation and propose our detection mechanisms.
    For social bot detection, we identify that the behavioral patterns among users in a neighborhood are difficult for a social bot to manipulate, and we propose a neighborhood-based detection scheme. Our work shows that the trustworthiness of a user is reflected in social relations and in the opinions expressed in review content. In addition, our proposed neighborhood features are useful for social bot detection.
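The rating-deviation signal mentioned above can be illustrated with a small sketch: a user whose ratings consistently sit far from each product's consensus rating is more suspicious. The function name and the use of mean absolute deviation are assumptions for illustration, not the dissertation's actual detection mechanism:

```python
def rating_deviation(user_ratings, product_avg):
    """Mean absolute deviation of one user's ratings from each rated
    product's average rating. A higher value is one (hypothetical)
    signal of review spamming; real systems propagate such clues
    through a trust graph rather than thresholding them directly.

    user_ratings: dict mapping product -> this user's rating
    product_avg:  dict mapping product -> average rating across all users
    """
    devs = [abs(r - product_avg[p]) for p, r in user_ratings.items()]
    return sum(devs) / len(devs) if devs else 0.0
```

A user rating everything 5.0 on products averaging near 2.0 would score far higher than an ordinary reviewer, which is exactly the kind of invariant the abstract argues spammers cannot fake without abandoning their goal.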