Subgraph covers -- An information theoretic approach to motif analysis in networks

Author: Anatol E. Wegner
Publication venue: American Physical Society (APS)
Publication date: 01/10/2014

Many real-world networks contain a statistically surprising number of certain subgraphs, called network motifs. In the prevalent approach to motif analysis, network motifs are detected by comparing subgraph frequencies in the original network with those in a statistical null model. In this paper we propose an alternative approach to motif analysis where network motifs are defined to be connectivity patterns that occur in a subgraph cover that represents the network using minimal total information. A subgraph cover is defined to be a set of subgraphs such that every edge of the graph is contained in at least one of the subgraphs in the cover. Some recently introduced random graph models that can incorporate significant densities of motifs have natural formulations in terms of subgraph covers, and the presented approach can be used to match networks with such models. To prove the practical value of our approach we also present a heuristic for the resulting NP-hard optimization problem and give results for several real-world networks.
Comment: 10 pages, 7 tables, 1 figure
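The definition of a subgraph cover in the abstract above can be illustrated with a small check. This is only a sketch of the covering condition, not the paper's heuristic or its information measure; the function names `edge` and `is_subgraph_cover` and the edge-list representation are assumptions made for illustration.

```python
def edge(u, v):
    """Represent an undirected edge as an unordered pair."""
    return frozenset((u, v))


def is_subgraph_cover(graph_edges, subgraphs):
    """Return True if `subgraphs` is a subgraph cover of the graph.

    graph_edges: iterable of (u, v) pairs (undirected graph).
    subgraphs:   iterable of edge lists, one list per subgraph.
    Condition checked: each subgraph uses only edges of the graph, and
    every edge of the graph lies in at least one subgraph.
    """
    graph = {edge(u, v) for u, v in graph_edges}
    covered = set()
    for sub in subgraphs:
        sub_edges = {edge(u, v) for u, v in sub}
        if not sub_edges <= graph:  # not a subgraph of the input graph
            return False
        covered |= sub_edges
    return covered == graph


# A triangle with a pendant edge, covered by one triangle and one single edge.
g = [(1, 2), (2, 3), (1, 3), (3, 4)]
triangle = [(1, 2), (2, 3), (3, 1)]
print(is_subgraph_cover(g, [triangle, [(3, 4)]]))
print(is_subgraph_cover(g, [triangle]))
```

Overlapping subgraphs are allowed here, since the definition only asks that each edge appear in at least one member of the cover.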

arXiv.org e-Print Archive

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Directory of Open Access Journals

Qucosa - Publikationsserver der Universität Leipzig

Information content of colored motifs in complex networks

Author: Arend Hintze
Christoph Adami
Jifeng Qian
Matthew Rupp
Publication venue: MIT Press - Journals
Publication date: 01/01/2011

We study complex networks in which the nodes are tagged with different colors depending on their functionality (colored graphs), using information theory applied to the distribution of motifs in such networks. We find that colored motifs can be viewed as the building blocks of the networks (much more so than the uncolored structural motifs can be) and that the relative frequency with which these motifs appear in the network can be used to define the information content of the network. This information is defined in such a way that a network with random coloration (but keeping the relative number of nodes with different colors the same) has zero color information content. Thus, colored motif information captures the exceptionality of coloring in the motifs that is maintained via selection. We study the motif information content of the C. elegans brain as well as the evolution of colored motif information in networks that reflect the interaction between instructions in genomes of digital life organisms. While we find that colored motif information appears to capture essential functionality in the C. elegans brain (where the color assignment of nodes is straightforward), it is not obvious whether the colored motif information content always increases during evolution, as would be expected from a measure that captures network complexity. For a single choice of color assignment of instructions in the digital life form Avida, we find rather that colored motif information content increases or decreases during evolution, depending on how the genomes are organized, and therefore could be an interesting tool to dissect genomic rearrangements.
Comment: 21 pages, 8 figures, to appear in Artificial Life

arXiv.org e-Print Archive

CiteSeerX

Crossref

The Graph Motif problem parameterized by the structure of the input graph

Author: Édouard Bonnet
Florian Sikora
Publication venue: Elsevier BV
Publication date: 01/01/2017

The Graph Motif problem was introduced in 2006 in the context of biological networks. It consists of deciding whether or not a multiset of colors occurs in a connected subgraph of a vertex-colored graph. Graph Motif has been mostly analyzed from the standpoint of parameterized complexity. The main parameters that have been considered are the size of the multiset and the number of colors. However, in many applications of Graph Motif, the input graph originates from real life and has structure. Motivated by this prosaic observation, we systematically study its complexity relative to graph structural parameters. For a wide range of parameters, we give new or improved FPT algorithms, or show that the problem remains intractable. For the FPT cases, we also give some kernelization lower bounds as well as some ETH-based lower bounds on the worst-case running time. Interestingly, we establish that Graph Motif is W[1]-hard (while in W[P]) for parameter max leaf number, which is, to the best of our knowledge, the first problem to behave this way.
Comment: 24 pages, accepted in DAM, conference version in IPEC 201
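The decision problem stated above can be made concrete with a brute-force checker. This is a sketch for tiny instances only, with exponential running time; it is not one of the FPT algorithms from the paper. The names `graph_motif` and `is_connected`, and the adjacency-dict input format, are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations


def is_connected(vertices, adj):
    """Check that `vertices` induces a connected subgraph of `adj`."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def graph_motif(adj, color, motif):
    """Return a vertex tuple whose colors equal the multiset `motif`
    and which induces a connected subgraph, or None if none exists.

    adj:   dict mapping every vertex to an iterable of its neighbours.
    color: dict mapping every vertex to its color.
    motif: iterable of colors (a multiset, repetitions allowed).
    """
    target = Counter(motif)
    k = sum(target.values())
    for subset in combinations(sorted(adj), k):
        if Counter(color[v] for v in subset) == target and is_connected(subset, adj):
            return subset
    return None


# Path 1-2-3-4 colored r, b, r, g.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
color = {1: "r", 2: "b", 3: "r", 4: "g"}
print(graph_motif(adj, color, ["r", "r", "b"]))
print(graph_motif(adj, color, ["b", "g"]))
```

In the example, the multiset {b, g} has no occurrence because the only vertices with those colors, 2 and 4, are not adjacent.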

arXiv.org e-Print Archive

Coping with new Challenges in Clustering and Biomedical Imaging

Author: Annahita Oswald
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 21/07/2011

Recent years have seen a tremendous increase in data acquisition in scientific fields such as molecular biology, bioinformatics and biomedicine. Novel methods are therefore needed for automatic processing and analysis of this large amount of data. Data mining is the process of applying methods like clustering or classification to large databases in order to uncover hidden patterns. Clustering is the task of partitioning the points of a data set into distinct groups so as to maximize the intra-cluster similarity and minimize the inter-cluster similarity. In contrast to unsupervised learning such as clustering, classification is a supervised learning problem that aims to predict the group membership of data objects on the basis of rules learned from a training set where the group membership is known. Specialized methods have been proposed for hierarchical and partitioning clustering. However, these methods suffer from several drawbacks. In the first part of this work, new clustering methods are proposed that cope with problems of conventional clustering algorithms. ITCH (Information-Theoretic Cluster Hierarchies) is a hierarchical clustering method based on a hierarchical variant of the Minimum Description Length (MDL) principle, which finds hierarchies of clusters without requiring input parameters. As ITCH may converge only to a local optimum, we propose GACH (Genetic Algorithm for Finding Cluster Hierarchies), which combines the benefits of genetic algorithms with information theory. In this way the search space is explored more effectively. Furthermore, we propose INTEGRATE, a novel clustering method for data with mixed numerical and categorical attributes. Supported by the MDL principle, our method integrates the information provided by heterogeneous numerical and categorical attributes and thus naturally balances the influence of both sources of information.
A comparative evaluation illustrates that INTEGRATE is more effective than existing clustering methods for mixed-type data. Besides clustering methods for single data objects, we provide a solution for clustering different data sets that are represented by their skylines. The skyline operator is a well-established database primitive for finding database objects which minimize two or more attributes with an unknown weighting between these attributes. In this thesis, we define a similarity measure, called SkyDist, for comparing skylines of different data sets that can be directly integrated into different data mining tasks such as clustering or classification. The experiments show that SkyDist in combination with different clustering algorithms can give useful insights into many applications. In the second part, we focus on the analysis of high-resolution magnetic resonance images (MRI) that are clinically relevant and may allow for early detection and diagnosis of several diseases. In particular, we propose a framework for the classification of Alzheimer's disease in MR images combining the data mining steps of feature selection, clustering and classification. As a result, a set of highly selective features discriminating between patients with Alzheimer's disease and healthy people has been identified. However, the analysis of the high-dimensional MR images is extremely time-consuming. Therefore we developed JGrid, a scalable distributed computing solution designed to allow for large-scale analysis of MRI and thus an optimized prediction of diagnosis. In another study we apply efficient algorithms for motif discovery to task-fMRI scans in order to identify patterns in the brain that are characteristic of patients with somatoform pain disorder. We find groups of brain compartments that occur frequently within the brain networks and discriminate well between healthy and diseased people.
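The skyline operator mentioned in the abstract can be sketched in a few lines. The code below shows only the standard dominance-based skyline with smaller values preferred in every attribute; it does not implement SkyDist, whose definition is not given here. The function names `dominates` and `skyline` are illustrative assumptions.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every attribute and strictly
    better in at least one (smaller values are better)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Return the points not dominated by any other point.

    Quadratic-time sketch; every point in the result is optimal for
    some weighting of the attributes or is a trade-off no other point
    beats on all attributes at once.
    """
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Attributes might be, for example, (price, distance); both minimized.
pts = [(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]
print(skyline(pts))
```

Here (3, 3) and (4, 4) drop out because (2, 2) is at least as good in both attributes and strictly better in one.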

Digitale Hochschulschriften der LMU

Temporal Networks

Author: Petter Holme
Jari Saramäki
Publication venue: Elsevier BV
Publication date: 15/12/2011

A great variety of systems in nature, society and technology -- from the web of sexual contacts to the Internet, from the nervous system to power grids -- can be modeled as graphs of vertices coupled by edges. The network structure, describing how the graph is wired, helps us understand, predict and optimize the behavior of dynamical systems. In many cases, however, the edges are not continuously active. As an example, in networks of communication via email, text messages, or phone calls, edges represent sequences of instantaneous or practically instantaneous contacts. In some cases, edges are active for non-negligible periods of time: e.g., the proximity patterns of inpatients at hospitals can be represented by a graph where an edge between two individuals is on throughout the time they are in the same ward. Like network topology, the temporal structure of edge activations can affect the dynamics of systems interacting through the network, from disease contagion on the network of patients to information diffusion over an e-mail network. In this review, we present the emergent field of temporal networks, and discuss methods for analyzing topological and temporal structure and models for elucidating their relation to the behavior of dynamical systems. In the light of traditional network theory, one can see this framework as moving the information of when things happen from the dynamical system on the network to the network itself. Since fundamental properties, such as the transitivity of edges, do not necessarily hold in temporal networks, many of these methods need to be quite different from those for static networks.
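The remark that transitivity need not hold in temporal networks can be illustrated with time-respecting reachability. The sketch below assumes instantaneous, undirected contacts given as `(u, v, t)` triples and requires strictly increasing times along a path; the function name `earliest_arrival` and these conventions are assumptions for illustration, not definitions taken from the review.

```python
from itertools import groupby


def earliest_arrival(contacts, source, start_time=float("-inf")):
    """Earliest time at which each vertex can be reached from `source`
    along a path whose contact times strictly increase.

    contacts: iterable of (u, v, t) triples, undirected and instantaneous.
    Returns a dict {vertex: arrival_time}; unreachable vertices are absent.
    """
    arrival = {source: start_time}
    ordered = sorted(contacts, key=lambda c: c[2])
    for t, group in groupby(ordered, key=lambda c: c[2]):
        updates = {}
        for u, v, _ in group:
            for a, b in ((u, v), (v, u)):
                # `a` must have been reached strictly before time t.
                if a in arrival and arrival[a] < t and b not in arrival:
                    updates[b] = t
        # Apply after the whole group so simultaneous contacts do not chain.
        arrival.update(updates)
    return arrival


contacts = [("A", "B", 1), ("B", "C", 2)]
print(earliest_arrival(contacts, "A"))
print(earliest_arrival(contacts, "C"))
```

With contacts A-B at time 1 and B-C at time 2, A reaches C but C cannot reach A, although the underlying static graph is a connected path.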

arXiv.org e-Print Archive

Crossref

Publikationer från Umeå universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

CERN Document Server

Balanced Connected Subgraph Problem in Geometric Intersection Graphs

Author: HL Bodlaender
MR Fellows
MR Garey
N Alon
S Bhore
T Kikuno
V Lacroix
É Bonnet
Publication venue
Publication date: 09/09/2019

We study the Balanced Connected Subgraph (BCS) problem on geometric intersection graphs such as interval, circular-arc, permutation, unit-disk, outer-string graphs, etc. Given a vertex-colored graph $G=(V,E)$, where each vertex in $V$ is colored either "red" or "blue", the BCS problem seeks a maximum-cardinality induced connected subgraph $H$ of $G$ such that $H$ is color-balanced, i.e., $H$ contains an equal number of red and blue vertices. We study the computational complexity landscape of the BCS problem on geometric intersection graphs. On the one hand, we prove that the BCS problem is NP-hard on unit-disk, outer-string, complete grid, and unit-square graphs. On the other hand, we design polynomial-time algorithms for the BCS problem on interval, circular-arc and permutation graphs. In particular, we give an algorithm for the Steiner Tree problem on both interval graphs and circular-arc graphs, which is used as a subroutine for solving the BCS problem on the same graph classes. Finally, we present an FPT algorithm for the BCS problem on general graphs.
Comment: 17 pages, 3 figures
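The BCS objective can be spelled out with an exhaustive search on a tiny graph. This sketch only restates the problem definition and runs in exponential time; it is not the polynomial-time or FPT algorithm from the paper. The names `balanced_connected_subgraph` and `induces_connected`, and the adjacency-dict input, are illustrative assumptions.

```python
from itertools import combinations


def induces_connected(vertices, adj):
    """Check that `vertices` induces a connected subgraph of `adj`."""
    vertices = set(vertices)
    if not vertices:
        return False
    start = next(iter(vertices))
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w in vertices and w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == vertices


def balanced_connected_subgraph(adj, color):
    """Return a largest vertex set inducing a connected subgraph with
    equally many "red" and "blue" vertices (empty set if none exists).

    adj:   dict mapping every vertex to an iterable of its neighbours.
    color: dict mapping every vertex to "red" or "blue".
    """
    nodes = sorted(adj)
    reds = sum(1 for v in nodes if color[v] == "red")
    max_half = min(reds, len(nodes) - reds)
    # Try the largest balanced sizes first, so the first hit is optimal.
    for half in range(max_half, 0, -1):
        for subset in combinations(nodes, 2 * half):
            n_red = sum(1 for v in subset if color[v] == "red")
            if n_red == half and induces_connected(subset, adj):
                return set(subset)
    return set()


# Path 1-2-3-4-5 colored red, blue, blue, red, red.
adj = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
color = {1: "red", 2: "blue", 3: "blue", 4: "red", 5: "red"}
print(balanced_connected_subgraph(adj, color))
```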

arXiv.org e-Print Archive

Crossref

Mining subjectively interesting patterns in rich data

Author: Junning Deng
Publication venue: Universiteit Gent. Faculteit Ingenieurswetenschappen en Architectuur
Publication date: 01/01/2021

Ghent University Academic Bibliography

Towards comprehensive structural motif mining for better fold annotation in the "twilight zone" of sequence dissimilarity

Author: Jintao Zhang
Jun Huan
Leonidas N. Carayannopoulos
Vincent Buhr
Yi Jia
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Automatic identification of structure fingerprints from a group of diverse protein structures is challenging, especially for proteins whose divergent amino acid sequences may fall into the "twilight" or "midnight" zones, where pairwise sequence identities to known sequences fall below 25% and sequence-based functional annotations often fail. Results: Here we report a novel graph database mining method and demonstrate its application to protein structure pattern identification and structure classification. The biological motivation of our study is to recognize common structure patterns in "immunoevasins", proteins mediating virus evasion of host immune defense. Our experimental study, using both viral and non-viral proteins, demonstrates the efficiency and efficacy of the proposed method. Conclusions: We present a theoretical framework, offer a practical software implementation for incorporating prior domain knowledge, such as the substitution matrices studied here, and devise an efficient algorithm to identify approximately matched frequent subgraphs. By doing so, we significantly expand the analytical power of sophisticated data mining algorithms in dealing with large volumes of complicated and noisy protein structure data. Without loss of generality, the choice of appropriate compatibility matrices allows our method to be easily employed in domains where subgraph labels have some uncertainty.

CiteSeerX

Crossref

Springer - Publisher Connector

KU ScholarWorks

PubMed Central

Digital Commons@Becker