Progresses and Challenges in Link Prediction
Link prediction is a paradigmatic problem in network science that aims to
estimate the likelihood that unobserved links exist, based on the known
topology. After briefly introducing the standard problem and the metrics of
link prediction, this Perspective summarizes representative progress on
local similarity indices, link predictability, network embedding, matrix
completion, ensemble learning and other approaches, drawn mainly from thousands
of related publications over the last decade. Finally, this Perspective
outlines some long-standing challenges for future studies.
Comment: 45 pages, 1 table
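The local similarity indices and the evaluation metric mentioned above can be illustrated with a minimal Python sketch. It assumes an undirected, unweighted graph given as an edge list and implements three classic local indices (common neighbours, Adamic-Adar, resource allocation) together with the sampled AUC often used to evaluate them, in which a hidden "probe" link is compared with pairs that are not linked. The function names and the toy graph are hypothetical and are not taken from the Perspective.

```python
import math
import random
from itertools import combinations


def build_adjacency(edges):
    """Map each node to the set of its neighbours in an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def common_neighbours(adj, x, y):
    """CN index: number of neighbours shared by x and y."""
    return len(adj.get(x, set()) & adj.get(y, set()))


def adamic_adar(adj, x, y):
    """AA index: shared neighbours weighted by 1/log(degree).

    A shared neighbour of two distinct nodes has degree >= 2, so log(degree) > 0.
    """
    return sum(1.0 / math.log(len(adj[z])) for z in adj.get(x, set()) & adj.get(y, set()))


def resource_allocation(adj, x, y):
    """RA index: each shared neighbour z passes 1/degree(z) units of resource."""
    return sum(1.0 / len(adj[z]) for z in adj.get(x, set()) & adj.get(y, set()))


def auc(score, adj, probe, non_edges, n_samples=10_000, seed=0):
    """Sampled AUC: chance that a hidden (probe) link outscores a non-existent link.

    Ties count as 1/2, as in the usual definition.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        s_probe = score(adj, *rng.choice(probe))
        s_absent = score(adj, *rng.choice(non_edges))
        total += 1.0 if s_probe > s_absent else 0.5 if s_probe == s_absent else 0.0
    return total / n_samples


# Hypothetical toy example: hide the link (1, 4) and score it against every absent pair.
train = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5)]
probe = [(1, 4)]
adj = build_adjacency(train)
known = {frozenset(e) for e in train + probe}
non_edges = [p for p in combinations(sorted(adj), 2) if frozenset(p) not in known]

for index in (common_neighbours, adamic_adar, resource_allocation):
    print(index.__name__, index(adj, 1, 4), auc(index, adj, probe, non_edges))
```

On this toy graph each index scores the hidden link (1, 4) above every absent pair, so the sampled AUC is 1.0 for all three; on real networks the indices rank candidates differently, which is why they are compared on the same probe set with metrics such as AUC or precision.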
Explainable Representations for Relation Prediction in Knowledge Graphs
Knowledge graphs represent real-world entities and their relations in a
semantically-rich structure supported by ontologies. Exploring this data with
machine learning methods often relies on knowledge graph embeddings, which
produce latent representations of entities that preserve structural and local
graph neighbourhood properties, but sacrifice explainability. However, in tasks
such as link or relation prediction, understanding which specific features
better explain a relation is crucial to support complex or critical
applications.
We propose SEEK, a novel approach for explainable representations to support
relation prediction in knowledge graphs. It is based on identifying relevant
shared semantic aspects (i.e., subgraphs) between entities and learning
representations for each subgraph, producing a multi-faceted and explainable
representation.
We evaluate SEEK on two highly complex real-world relation prediction tasks:
protein-protein interaction prediction and gene-disease association prediction.
Our extensive analysis using established benchmarks demonstrates that SEEK
achieves significantly better performance than standard representation learning
methods while identifying both sufficient and necessary explanations based on
shared semantic aspects.
Comment: 16 pages, 3 figures
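The core idea described for SEEK is to describe an entity pair through several shared semantic aspects rather than one opaque vector. The sketch below only illustrates that multi-faceted idea and does not reproduce SEEK's learned per-subgraph representations: it assumes a hypothetical toy ontology whose three branch roots stand in for semantic aspects, and it replaces representation learning with a hand-crafted Jaccard similarity of shared ancestor classes, computed separately per aspect. All names (ONTOLOGY, ASPECTS, aspect_profile and the class labels) are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical toy ontology (child -> direct parents), loosely modelled on the
# three Gene Ontology branches; each branch root stands in for one semantic aspect.
ONTOLOGY: Dict[str, List[str]] = {
    "process": [], "function": [], "component": [],
    "signalling": ["process"], "kinase_signalling": ["signalling"],
    "transport": ["process"],
    "kinase_activity": ["function"], "binding": ["function"],
    "nucleus": ["component"], "membrane": ["component"],
}
ASPECTS = ["process", "function", "component"]


def ancestors(cls: str) -> Set[str]:
    """Upward closure of one class in the DAG, including the class itself."""
    seen: Set[str] = set()
    stack = [cls]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(ONTOLOGY.get(c, []))
    return seen


def closure(annotations: Iterable[str]) -> Set[str]:
    """All ontology classes implied by an entity's annotations."""
    return set().union(*(ancestors(a) for a in annotations))


def aspect_profile(annots_a: Iterable[str], annots_b: Iterable[str]) -> Dict[str, float]:
    """One Jaccard similarity per aspect, restricted to classes under that aspect's root.

    This hand-crafted score is a stand-in for a learned per-aspect representation.
    """
    ca, cb = closure(annots_a), closure(annots_b)
    profile = {}
    for root in ASPECTS:
        sa = {c for c in ca if root in ancestors(c)}
        sb = {c for c in cb if root in ancestors(c)}
        union = sa | sb
        profile[root] = len(sa & sb) / len(union) if union else 0.0
    return profile


# Hypothetical annotations for two proteins.
protein_a = {"kinase_signalling", "kinase_activity", "nucleus"}
protein_b = {"signalling", "binding", "nucleus"}
print(aspect_profile(protein_a, protein_b))
```

For the two toy proteins the profile is 2/3 for process, 1/3 for function and 1.0 for component. Because each entry is tied to a single aspect, a downstream classifier's reliance on any one aspect can be inspected directly, which is the kind of explainability the abstract is after.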
Reconstructing networks
Complex network datasets often suffer from missing information: interaction
data may not have been measured or discovered, may be affected by errors, or
may simply be hidden for privacy reasons. This Element provides an overview of
the ideas, methods and techniques that deal with this problem and together
define the field of network reconstruction. Given the extent of the subject,
we shall focus on the inference methods rooted in statistical physics and
information theory. The discussion will be organized according to the
different scales of the reconstruction task, that is, whether the goal is to
reconstruct the macroscopic structure of the network, to infer its mesoscale
properties, or to predict the individual microscopic connections.
Comment: 107 pages, 25 figures
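As one concrete example of a statistical-physics reconstruction method, a technique widely used in this literature is the fitness (density-corrected gravity) model: the probability of a link between nodes i and j is p_ij = z s_i s_j / (1 + z s_i s_j), where s_i is a known node fitness such as its strength and the single parameter z is tuned so that the expected number of links matches the observed total. The minimal Python sketch below assumes an undirected network, strictly positive fitnesses, and a bisection bracket of [-50, 50] for ln z; the function names are hypothetical, and this is not presented as the Element's own code.

```python
import math
from typing import List


def link_probabilities(strengths: List[float], z: float) -> List[List[float]]:
    """Fitness-model probabilities p_ij = z*s_i*s_j / (1 + z*s_i*s_j), with p_ii = 0."""
    n = len(strengths)
    p = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w = z * strengths[i] * strengths[j]
            p[i][j] = p[j][i] = w / (1.0 + w)
    return p


def expected_links(strengths: List[float], z: float) -> float:
    """Expected number of undirected links, i.e. the sum of p_ij over pairs i < j."""
    p = link_probabilities(strengths, z)
    n = len(strengths)
    return sum(p[i][j] for i in range(n) for j in range(i + 1, n))


def calibrate_z(strengths: List[float], n_links: float, tol: float = 1e-12) -> float:
    """Find z such that the expected number of links equals n_links.

    The expected link count increases monotonically with z, so bisection on
    ln z converges whenever the solution lies inside the assumed bracket.
    """
    n = len(strengths)
    if any(s <= 0 for s in strengths):
        raise ValueError("strengths must be positive")
    if not 0 < n_links < n * (n - 1) / 2:
        raise ValueError("n_links must lie strictly between 0 and n(n-1)/2")
    lo, hi = -50.0, 50.0  # assumed bracket for ln z; widen it for extreme strengths
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_links(strengths, math.exp(mid)) < n_links:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


# Hypothetical input: node strengths are known, plus the total number of links.
strengths = [1.0, 2.0, 3.0, 4.0, 5.0]
z = calibrate_z(strengths, n_links=4)
p = link_probabilities(strengths, z)
print(z, expected_links(strengths, z))
```

Only node-level information and one global count enter the model, which is what makes it usable when individual connections are hidden; the calibrated p_ij can then rank candidate links or be used to sample plausible networks from the ensemble.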
Discovering weak community structures in large biological networks
Identifying intrinsic structures in large networks is a fundamental problem in many fields, such as biology, engineering and social sciences. Motivated by biological applications, in this paper we are concerned with identifying community structures, which are densely connected subgraphs, in large biological networks. We address several critical issues in finding community structures. First, biological networks constructed directly from experimental data often contain spurious edges and may also miss genuine connections. As a result, community structures in biological networks are often weak. We introduce simple operations that capture local neighborhood structures in order to identify weak communities. Second, we consider the issue of automatically determining the most appropriate number of communities, a crucial problem for all clustering methods. This requires a proper way to evaluate the quality of community structures, so we extend an existing modularity function for evaluating community structures to weighted graphs. Third, we propose a spectral clustering algorithm to optimize the modularity function, and a greedy partitioning method that approximates the spectral algorithm with much reduced running time. We evaluate our methods on many networks of known structure, and apply them to three real-world networks that have different types of network communities: a yeast protein-protein interaction network, a co-expression network of yeast cell-cycle genes, and a collaboration network of bioinformaticians. The results show that our methods find high-quality community structures and the correct numbers of communities. Our results also reveal several interesting network structures that have not been reported previously.
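The weighted modularity function and the greedy strategy mentioned in this abstract can be sketched in Python. The function below computes Newman-Girvan modularity on a weighted, undirected edge list without self-loops, Q = sum over communities c of [W_c/m - (S_c/2m)^2], where m is the total edge weight, W_c the weight of edges inside c and S_c the summed node strengths in c. The greedy routine is Newman's agglomerative heuristic (repeatedly merge the pair of communities that raises Q most) and is not necessarily the authors' greedy partitioning method; their spectral algorithm is not reproduced. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable, float]  # (u, v, weight), undirected, no self-loops


def modularity(edges: List[Edge], comm: Dict[Hashable, Hashable]) -> float:
    """Weighted modularity Q = sum_c [W_c/m - (S_c/2m)^2]."""
    m = sum(w for _, _, w in edges)
    internal: Dict[Hashable, float] = {}  # W_c: weight of edges inside community c
    strength: Dict[Hashable, float] = {}  # S_c: summed node strengths in community c
    for u, v, w in edges:
        cu, cv = comm[u], comm[v]
        strength[cu] = strength.get(cu, 0.0) + w
        strength[cv] = strength.get(cv, 0.0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0.0) + w
    return sum(internal.get(c, 0.0) / m - (s / (2 * m)) ** 2 for c, s in strength.items())


def greedy_communities(edges: List[Edge]):
    """Start from singletons; apply the merge that raises Q most; stop when none does.

    Q is recomputed from scratch for each candidate merge, so this is only
    suitable for small graphs. Community labels must be sortable.
    """
    comm = {x: x for u, v, _ in edges for x in (u, v)}
    best_q = modularity(edges, comm)
    while True:
        best = None
        for a, b in combinations(sorted(set(comm.values())), 2):
            trial = {x: (a if c == b else c) for x, c in comm.items()}
            q = modularity(edges, trial)
            if q > best_q + 1e-12:  # small tolerance avoids merges driven by rounding
                best_q, best = q, trial
        if best is None:
            return comm, best_q
        comm = best


# Hypothetical example: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
         (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 1.0)]
comm, q = greedy_communities(edges)
print(comm, q)
```

On the two joined triangles the merges recover the two triangles with Q = 5/14 (about 0.357); merging the triangles as well would lower Q, so the procedure stops at two communities, illustrating how maximizing modularity also selects the number of communities.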
- …