Outlier Edge Detection Using Random Graph Generation Models and Applications
Outliers are samples that are generated by different mechanisms from other
normal data samples. Graphs, in particular social network graphs, may contain
nodes and edges that are made by scammers, malicious programs or mistakenly by
normal users. Detecting outlier nodes and edges is important for data mining
and graph analytics. However, previous research in the field has merely focused
on detecting outlier nodes. In this article, we study the properties of edges
and propose outlier edge detection algorithms using two random graph generation
models. We found that the edge-ego-network, which can be defined as the induced
graph that contains two end nodes of an edge, their neighboring nodes and the
edges that link these nodes, contains critical information to detect outlier
edges. We evaluated the proposed algorithms by injecting outlier edges into
some real-world graph data. Experiment results show that the proposed
algorithms can effectively detect outlier edges. In particular, the algorithm
based on the Preferential Attachment Random Graph Generation model consistently
gives good performance regardless of the test graph data. Furthermore, the
proposed algorithms are not limited to the area of outlier edge detection. We
demonstrate three different applications that benefit from the proposed
algorithms: 1) a preprocessing tool that improves the performance of graph
clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel
noisy data clustering algorithm. These applications show the great potential of
the proposed outlier edge detection techniques.
Comment: 14 pages, 5 figures, journal paper
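The edge-ego-network defined in the abstract above can be illustrated with a minimal Python sketch. It only follows the stated definition (the two end nodes of an edge, their neighbors, and the edges among those nodes); the function name `edge_ego_network`, the adjacency-map representation, and the toy graph are assumptions made for illustration and are not taken from the paper. The paper's scoring models are not reproduced here.

```python
def edge_ego_network(adj, u, v):
    """Return (nodes, edges) of the subgraph induced by u, v and their neighbors.

    adj is assumed to be an undirected adjacency map: node -> set of neighbors.
    """
    # Node set: both end nodes plus every neighbor of either end node.
    nodes = {u, v} | adj[u] | adj[v]
    # Edge set: every edge of the graph whose two ends both lie in the node set.
    edges = {
        frozenset((a, b))
        for a in nodes
        for b in adj[a]
        if b in nodes
    }
    return nodes, edges


# Hypothetical toy graph (not from the paper).
adj = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}

nodes, edges = edge_ego_network(adj, "a", "b")
print(sorted(nodes))
print(sorted(tuple(sorted(e)) for e in edges))
```

In this toy graph the edge between "d" and "e" is left out because "e" is not adjacent to either end node of the edge ("a", "b").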
Outlier edge detection using random graph generation models and applications
Outliers are samples that are generated by different mechanisms from other normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are made by scammers, malicious programs or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has merely focused on detecting outlier nodes. In this article, we study the properties of edges and propose effective outlier edge detection algorithms. The proposed algorithms are inspired by community structures that are very common in social networks. We found that the graph structure around an edge holds critical information for determining the authenticity of the edge. We evaluated the proposed algorithms by injecting outlier edges into some real-world graph data. Experiment results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently gives good performance regardless of the test graph data. More importantly, by analyzing the authenticity of the edges in a graph, we are able to reveal the underlying structure and properties of a graph. Thus, the proposed algorithms are not limited to the area of outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: (1) a preprocessing tool that improves the performance of graph clustering algorithms; (2) an outlier node detection algorithm; and (3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques. They also address the importance of analyzing the edges in graph mining, a topic that has been mostly neglected by researchers.
Academy of Finland supported this research.
Goodreads: A social network site for book readers
This is an accepted manuscript of an article published by John Wiley & Sons, Inc. in Journal of the Association for Information Science and Technology on 21/12/2016, available online: https://doi.org/10.1002/asi.23733
The accepted version of the publication may differ from the final published version.
Goodreads is an Amazon-owned book-based social web site for members to share books, read, review books, rate books, and connect with other readers. Goodreads has tens of millions of book reviews, recommendations, and ratings that may help librarians and readers to select relevant books. This article describes a first investigation of the properties of Goodreads users, using a random sample of 50,000 members. The results suggest that about three quarters of members with a public profile are female, and that there is little difference between male and female users in patterns of behavior, except for females registering more books and rating them less positively. Goodreads librarians and super-users engage extensively with most features of the site. The absence of strong correlations between book-based and social usage statistics (e.g., numbers of friends, followers, books, reviews, and ratings) suggests that members choose their own individual balance of social and book activities and rarely ignore one at the expense of the other. Goodreads is therefore neither primarily a book-based website nor primarily a social network site but is a genuine hybrid, social navigation site.
University of Wolverhampton
What you think and what I think: Studying intersubjectivity in knowledge artifacts evaluation
Miscalibration, the failure to accurately evaluate one's own work relative to others' evaluation, is a common concern in social systems of knowledge creation where participants act as both creators and evaluators. Theories of social norming hold that an individual's self-evaluation miscalibration diminishes over multiple iterations of creator-evaluator interactions and shared understanding emerges. This paper explores intersubjectivity and the longitudinal dynamics of miscalibration between creators' and evaluators' assessments in IT-enabled social knowledge creation and refinement systems. Using Latent Growth Modeling, we investigated the dynamics of creators' assessments of their own knowledge artifacts compared to peer evaluators' to determine whether miscalibration attenuates over multiple interactions. Contrary to theory, we found that creators' self-assessment miscalibration does not attenuate over repeated interactions. Moreover, depending on the degree of difference, we found self-assessment miscalibration to amplify over time, with knowledge artifact creators diverging farther from their peers' collective opinion. Deeper analysis found no significant evidence of the influence of bias and controversy on miscalibration. Therefore, relying on social norming to correct miscalibration in knowledge creation environments (e.g., social media interactions) may not function as expected.
Graph based Anomaly Detection and Description: A Survey
Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the "why", of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.
Investigating intersubjectivity in peer-review-based, technology-enabled knowledge creation and refinement social systems
In peer-based knowledge creation domains, problem complexity and subjectivity of individual understanding impede development of actors' competencies. Prior research remains ambivalent on whether interactions between peers lead to the development of shared, intersubjective understanding about one's own and peers' competencies. On the one hand, actors may develop this shared understanding through social learning. On the other hand, due to the Dunning-Kruger effect, both less and more competent actors may persistently miscalibrate their own performance relative to peers. This dissertation examines how creation and evaluation competencies in peer-based social knowledge creation communities, where complex-problem social knowledge artifacts are produced, change and interact over time. It hypothesizes the existence of latent classes of longitudinal trajectories of creation and evaluation competency development, and convergence of these trajectories over multiple interactions, as intersubjective understanding emerges; moreover, their trajectories may be affected by the openness of peer groups. To investigate this research problem, a peer review system was designed, instantiated, and tested in a controlled experiment study. Findings support the existence of multiple latent longitudinal trajectories. Partial evidence of the effect of peer group openness on competency change over time was also found. Results indicate that longitudinal peer interaction patterns are very complex. Practical implications of these findings for various domains are discussed and directions for further investigation are proposed.