649 research outputs found
Discovering Informative Connection Subgraphs in Multi-Relational Graphs
Discovering patterns in graphs has long been an area of interest. Most approaches to such pattern discovery measure the interestingness of a pattern by quantitative anomalies, frequency of substructure, or maximum flow. In this paper we introduce heuristics that guide a subgraph discovery algorithm away from banal paths and towards more informative ones. Given an RDF graph, a user might pose a question of the form: What are the most relevant ways in which entity X is related to entity Y? The response is a subgraph connecting X to Y. We use our heuristics to discover informative subgraphs within RDF graphs. Our heuristics are based on weighting mechanisms derived from edge semantics suggested by the RDF schema. We present an analysis of the quality of the subgraphs generated with respect to path ranking metrics. We then conclude by presenting intuitions about which of our weighting schemes and heuristics produce higher quality subgraphs.
A Multi-Relational Network to Support the Scholarly Communication Process
The general purpose of the scholarly communication process is to support the
creation and dissemination of ideas within the scientific community. At a finer
granularity, there exist multiple stages which, when confronted by a member of
the community, have different requirements and therefore different solutions.
In order to take a researcher's idea from an initial inspiration to a community
resource, the scholarly communication infrastructure may be required to 1)
provide a scientist with initial seed ideas; 2) form a team of well-suited
collaborators; 3) locate the most appropriate venue to publish the formalized
idea; 4) determine the most appropriate peers to review the manuscript; and 5)
disseminate the end product to the most interested members of the community.
Through the various delineations of this process, the requirements of each
stage are tied solely to the multi-functional resources of the community: its
researchers, its journals, and its manuscripts. It is within the collection of
these resources and their inherent relationships that the solutions to
scholarly communication are to be found. This paper describes an associative
network composed of multiple scholarly artifacts that can be used as a medium
for supporting the scholarly communication process.
Comment: keywords: digital libraries and scholarly communication
Mixed membership stochastic blockmodels
Observations consisting of measurements on relationships for pairs of objects
arise in many settings, such as protein interaction and gene regulatory
networks, collections of author-recipient email, and social networks. Analyzing
such data with probabilistic models can be delicate because the simple
exchangeability assumptions underlying many boilerplate models no longer hold.
In this paper, we describe a latent variable model of such data called the
mixed membership stochastic blockmodel. This model extends blockmodels for
relational data to ones which capture mixed membership latent relational
structure, thus providing an object-specific low-dimensional representation. We
develop a general variational inference algorithm for fast approximate
posterior inference. We explore applications to social and protein interaction
networks.
Comment: 46 pages, 14 figures, 3 tables
On the Nature and Types of Anomalies: A Review
Anomalies are occurrences in a dataset that are in some way unusual and do
not fit the general patterns. The concept of the anomaly is generally
ill-defined and perceived as vague and domain-dependent. Moreover, despite some
250 years of publications on the topic, no comprehensive and concrete overviews
of the different types of anomalies have hitherto been published. By means of
an extensive literature review this study therefore offers the first
theoretically principled and domain-independent typology of data anomalies, and
presents a full overview of anomaly types and subtypes. To concretely define
the concept of the anomaly and its different manifestations, the typology
employs five dimensions: data type, cardinality of relationship, anomaly level,
data structure and data distribution. These fundamental and data-centric
dimensions naturally yield 3 broad groups, 9 basic types and 61 subtypes of
anomalies. The typology facilitates the evaluation of the functional
capabilities of anomaly detection algorithms, contributes to explainable data
science, and provides insights into relevant topics such as local versus global
anomalies.
Comment: 38 pages (30 pages content), 10 figures, 3 tables. Preprint; review
comments will be appreciated. Improvements in version 2: Explicit mention of
fifth anomaly dimension; Added section on explainable anomaly detection;
Added section on variations on the anomaly concept; Various minor additions
and improvements