12 research outputs found

    Outlier Edge Detection Using Random Graph Generation Models and Applications

    Get PDF
    Outliers are samples that are generated by different mechanisms from other normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are made by scammers, malicious programs or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has merely focused on detecting outlier nodes. In this article, we study the properties of edges and propose outlier edge detection algorithms using two random graph generation models. We found that the edge-ego-network, which can be defined as the induced graph that contains two end nodes of an edge, their neighboring nodes and the edges that link these nodes, contains critical information to detect outlier edges. We evaluated the proposed algorithms by injecting outlier edges into some real-world graph data. Experiment results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently gives good performance regardless of the test graph data. Further more, the proposed algorithms are not limited in the area of outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: 1) a preprocessing tool that improves the performance of graph clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques.Comment: 14 pages, 5 figures, journal pape

    Outlier edge detection using random graph generation models and applications

    Get PDF
    Outliers are samples that are generated by different mechanisms from other normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are made by scammers, malicious programs or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has merely focused on detecting outlier nodes. In this article, we study the properties of edges and propose effective outlier edge detection algorithm. The proposed algorithms are inspired by community structures that are very common in social networks. We found that the graph structure around an edge holds critical information for determining the authenticity of the edge. We evaluated the proposed algorithms by injecting outlier edges into some real-world graph data. Experiment results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently gives good performance regardless of the test graph data. More important, by analyzing the authenticity of the edges in a graph, we are able to reveal underlying structure and properties of a graph. Thus, the proposed algorithms are not limited in the area of outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: (1) a preprocessing tool that improves the performance of graph clustering algorithms; (2) an outlier node detection algorithm; and (3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques. They also address the importance of analyzing the edges in graph mining—a topic that has been mostly neglected by researchers.Academy of Finland supported this research

    iBGP: A Bipartite Graph Propagation Approach for Mobile Advertising Fraud Detection

    Get PDF

    Goodreads: A social network site for book readers

    Get PDF
    This is an accepted manuscript of an article published by John Wiley & Sons, Inc. in Journal of the Association for Information Science and Technology on 21/12/2016, available online: https://doi.org/10.1002/asi.23733 The accepted version of the publication may differ from the final published version.Goodreads is an Amazon‐owned book‐based social web site for members to share books, read, review books, rate books, and connect with other readers. Goodreads has tens of millions of book reviews, recommendations, and ratings that may help librarians and readers to select relevant books. This article describes a first investigation of the properties of Goodreads users, using a random sample of 50,000 members. The results suggest that about three quarters of members with a public profile are female, and that there is little difference between male and female users in patterns of behavior, except for females registering more books and rating them less positively. Goodreads librarians and super‐users engage extensively with most features of the site. The absence of strong correlations between book‐based and social usage statistics (e.g., numbers of friends, followers, books, reviews, and ratings) suggests that members choose their own individual balance of social and book activities and rarely ignore one at the expense of the other. Goodreads is therefore neither primarily a book‐based website nor primarily a social network site but is a genuine hybrid, social navigation site.University of Wolverhampto

    What you think and what I think: Studying intersubjectivity in knowledge artifacts evaluation

    Get PDF
    Miscalibration, the failure to accurately evaluate one’s own work relative to others' evaluation, is a common concern in social systems of knowledge creation where participants act as both creators and evaluators. Theories of social norming hold that individual’s self-evaluation miscalibration diminishes over multiple iterations of creator-evaluator interactions and shared understanding emerges. This paper explores intersubjectivity and the longitudinal dynamics of miscalibration between creators' and evaluators' assessments in IT-enabled social knowledge creation and refinement systems. Using Latent Growth Modeling, we investigated dynamics of creator’s assessments of their own knowledge artifacts compared to peer evaluators' to determine whether miscalibration attenuates over multiple interactions. Contrary to theory, we found that creator’s self-assessment miscalibration does not attenuate over repeated interactions. Moreover, depending on the degree of difference, we found self-assessment miscalibration to amplify over time with knowledge artifact creators' diverging farther from their peers' collective opinion. Deeper analysis found no significant evidence of the influence of bias and controversy on miscalibration. Therefore, relying on social norming to correct miscalibration in knowledge creation environments (e.g., social media interactions) may not function as expected

    Graph based Anomaly Detection and Description: A Survey

    Get PDF
    Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field

    Investigating intersubjectivity in peer-review-based, technology-enabled knowledge creation and refinement social systems

    Get PDF
    In peer-based knowledge creation domains, problem complexity and subjectivity of individual understanding impedes development of actors' competencies. Prior research remains ambivalent on whether interactions between peers lead to the development of shared, intersubjective, understanding about one's own and peers' competencies. On the one hand, actors may develop this shared understanding through social learning. On the other hand, due to the Dunning-Kruger effect, both less and more competent actors may persistently miscalibrate their own performance relative to peers. This dissertation examines how creation and evaluation competencies in peer-based social knowledge creation communities, where complex-problem social knowledge artifacts are produced, change and interact over time. It hypothesizes the existence of latent classes of longitudinal trajectories of creation and evaluation competency development, and convergence of these trajectories over multiple interactions, as intersubjective understanding emerges; moreover, their trajectories may be affected by the openness of peer groups. To investigate this research problem, a peer review system was designed, instantiated, and tested in a controlled experiment study. Findings support the existence of multiple latent longitudinal trajectories. Partial evidence of the peer group openness' effect on competency change over time was also found. Results indicate that longitudinal peer interaction patterns are very complex. Practical implications of these finding for various domains are discussed and directions for further investigation are proposed
    corecore