649 research outputs found

    Discovering Informative Connection Subgraphs in Multi-Relational Graphs

    Discovering patterns in graphs has long been an area of interest. Most approaches to such pattern discovery measure the interestingness of a pattern by quantitative anomalies, substructure frequency, or maximum flow. In this paper we introduce heuristics that guide a subgraph discovery algorithm away from banal paths and towards more informative ones. Given an RDF graph, a user might pose a question of the form "What are the most relevant ways in which entity X is related to entity Y?", the response to which is a subgraph connecting X to Y. We use our heuristics to discover informative subgraphs within RDF graphs. The heuristics are based on weighting mechanisms derived from edge semantics suggested by the RDF schema. We present an analysis of the quality of the generated subgraphs with respect to path ranking metrics. We conclude by presenting intuitions about which of our weighting schemes and heuristics produce higher-quality subgraphs.

    A Multi-Relational Network to Support the Scholarly Communication Process

    The general purpose of the scholarly communication process is to support the creation and dissemination of ideas within the scientific community. At a finer granularity, there exist multiple stages which, when confronted by a member of the community, have different requirements and therefore different solutions. In order to take a researcher's idea from an initial inspiration to a community resource, the scholarly communication infrastructure may be required to 1) provide a scientist with initial seed ideas; 2) form a team of well-suited collaborators; 3) locate the most appropriate venue to publish the formalized idea; 4) determine the most appropriate peers to review the manuscript; and 5) disseminate the end product to the most interested members of the community. Through the various delineations of this process, the requirements of each stage are tied solely to the multi-functional resources of the community: its researchers, its journals, and its manuscripts. It is within the collection of these resources and their inherent relationships that the solutions to scholarly communication are to be found. This paper describes an associative network composed of multiple scholarly artifacts that can be used as a medium for supporting the scholarly communication process. Comment: keywords: digital libraries and scholarly communication

    Mixed membership stochastic blockmodels

    Observations consisting of measurements on relationships for pairs of objects arise in many settings, such as protein interaction and gene regulatory networks, collections of author-recipient email, and social networks. Analyzing such data with probabilistic models can be delicate because the simple exchangeability assumptions underlying many boilerplate models no longer hold. In this paper, we describe a latent variable model of such data called the mixed membership stochastic blockmodel. This model extends blockmodels for relational data to ones that capture mixed membership latent relational structure, thus providing an object-specific low-dimensional representation. We develop a general variational inference algorithm for fast approximate posterior inference. We explore applications to social and protein interaction networks. Comment: 46 pages, 14 figures, 3 tables

    On the Nature and Types of Anomalies: A Review

    Anomalies are occurrences in a dataset that are in some way unusual and do not fit the general patterns. The concept of the anomaly is generally ill-defined and perceived as vague and domain-dependent. Moreover, despite some 250 years of publications on the topic, no comprehensive and concrete overviews of the different types of anomalies have hitherto been published. By means of an extensive literature review, this study therefore offers the first theoretically principled and domain-independent typology of data anomalies, and presents a full overview of anomaly types and subtypes. To concretely define the concept of the anomaly and its different manifestations, the typology employs five dimensions: data type, cardinality of relationship, anomaly level, data structure and data distribution. These fundamental and data-centric dimensions naturally yield 3 broad groups, 9 basic types and 61 subtypes of anomalies. The typology facilitates the evaluation of the functional capabilities of anomaly detection algorithms, contributes to explainable data science, and provides insights into relevant topics such as local versus global anomalies. Comment: 38 pages (30 pages content), 10 figures, 3 tables. Preprint; review comments will be appreciated. Improvements in version 2: explicit mention of fifth anomaly dimension; added section on explainable anomaly detection; added section on variations on the anomaly concept; various minor additions and improvements