An Introductory Guide to Aligning Networks Using SANA, the Simulated Annealing Network Aligner.
Sequence alignment has had an enormous impact on our understanding of biology, evolution, and disease. The alignment of biological networks holds similar promise. Biological networks generally model interactions between biomolecules such as proteins, genes, metabolites, or mRNAs. There is strong evidence that the network topology (the "structure" of the network) is correlated with the functions performed, so that network topology can be used to help predict or understand function. However, unlike sequence comparison and alignment, which is an essentially solved problem, network comparison and alignment is an NP-complete problem for which heuristic algorithms must be used. Here we introduce SANA, the Simulated Annealing Network Aligner. SANA is one of many algorithms proposed for biological network alignment. In the context of global network alignment, SANA stands out for its speed, memory efficiency, ease of use, and flexibility in producing alignments between two or more networks. SANA produces better alignments in minutes on a laptop than most other algorithms can produce in hours or days of CPU time on large server-class machines. We walk the user through how to use SANA for several types of biomolecular networks.
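As a rough illustration of the simulated annealing idea named in the abstract (this is not SANA's actual implementation or interface), the sketch below aligns two small undirected graphs by repeatedly swapping node images and occasionally accepting a worse alignment, with a probability that shrinks as the temperature falls. The objective (number of conserved edges), the geometric cooling schedule, and the names `conserved_edges` and `anneal_alignment` are assumptions made for this example only.

```python
import math
import random


def conserved_edges(edges1, edge_set2, mapping):
    """Count edges of graph 1 whose images under `mapping` are edges of graph 2.

    `edge_set2` is a set of frozensets, one per undirected edge of graph 2.
    """
    return sum(1 for u, v in edges1
               if frozenset((mapping[u], mapping[v])) in edge_set2)


def anneal_alignment(nodes1, edges1, nodes2, edges2,
                     steps=20000, t_start=1.0, t_end=0.001, seed=0):
    """Toy simulated annealing aligner (illustrative only, not SANA itself).

    Assumes len(nodes1) <= len(nodes2). Returns (mapping, score) for the best
    injective mapping seen, where score is the number of conserved edges.
    """
    rng = random.Random(seed)
    e1 = [tuple(e) for e in edges1]
    e2 = {frozenset(e) for e in edges2}
    n1 = len(nodes1)

    # Random injective start: node i of graph 1 maps to targets[i].
    targets = list(nodes2)
    rng.shuffle(targets)
    mapping = dict(zip(nodes1, targets))
    score = conserved_edges(e1, e2, mapping)
    best, best_score = dict(mapping), score

    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        i = rng.randrange(n1)
        j = rng.randrange(len(targets))
        if i == j:
            continue
        # Propose: swap two images, or swap an image with an unused node.
        targets[i], targets[j] = targets[j], targets[i]
        candidate = dict(zip(nodes1, targets))
        new_score = conserved_edges(e1, e2, candidate)
        delta = new_score - score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            mapping, score = candidate, new_score
            if score > best_score:
                best, best_score = dict(mapping), score
        else:
            # Rejected: undo the swap.
            targets[i], targets[j] = targets[j], targets[i]
    return best, best_score
```

The sketch recomputes the full score at every step, which keeps it short but would be far too slow for real protein interaction networks; an incremental score update would be the natural next refinement.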
Matched Filters for Noisy Induced Subgraph Detection
The problem of finding the vertex correspondence between two noisy graphs
with different numbers of vertices where the smaller graph is still large has
many applications in social networks, neuroscience, and computer vision. We
propose a solution to this problem via a graph matching matched filter:
centering and padding the smaller adjacency matrix and applying graph matching
methods to align it to the larger network. The centering and padding schemes
can be incorporated into any algorithm that matches using adjacency matrices.
Under a statistical model for correlated pairs of graphs, which yields a noisy
copy of the small graph within the larger graph, the resulting optimization
problem can be guaranteed to recover the true vertex correspondence between the
networks.
However, there are currently no efficient algorithms for solving this
problem. To illustrate the possibilities and challenges of such problems, we
use an algorithm that can exploit a partially known correspondence and show via
varied simulations and applications to Drosophila and human connectomes
that this approach can achieve good performance.
Comment: 41 pages, 7 figures
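The centering and padding step described in this abstract can be sketched on a toy example. The code below assumes one common reading of "centering" (mapping 0/1 adjacency entries to -1/+1 off the diagonal) and zero padding of the smaller matrix, then finds the best correspondence by brute force. The brute-force search stands in for a real graph matching algorithm and is feasible only for tiny graphs; all function names are hypothetical.

```python
from itertools import permutations


def center(adj):
    """Map a 0/1 adjacency matrix to -1/+1 off the diagonal; diagonal stays 0.

    (Assumption of this sketch: the diagonal adds only a constant to the
    objective, so it is simply left at zero.)
    """
    n = len(adj)
    return [[0 if i == j else 2 * adj[i][j] - 1 for j in range(n)]
            for i in range(n)]


def pad(mat, size):
    """Zero-pad a square matrix to size x size."""
    n = len(mat)
    return [[mat[i][j] if i < n and j < n else 0 for j in range(size)]
            for i in range(size)]


def match_score(a, b, perm):
    """Sum of a[i][j] * b[perm[i]][perm[j]] over all ordered pairs (i, j)."""
    n = len(a)
    return sum(a[i][j] * b[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def best_embedding(small, large):
    """Brute-force the centered-and-padded objective (tiny graphs only).

    Returns a tuple giving, for each vertex of `small`, its matched vertex
    in `large`.
    """
    m, n = len(small), len(large)
    a = pad(center(small), n)
    b = center(large)
    best = max(permutations(range(n)), key=lambda p: match_score(a, b, p))
    return best[:m]
```

With centering, a non-edge in the small graph matched to an edge in the large graph is penalized, so the objective favors induced copies. Without centering, the score counts only matched edges, so mapping a three-vertex path into a triangle would tie with mapping it onto a genuinely induced path.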
Matched filters for noisy induced subgraph detection
First author draft. We consider the problem of finding the vertex correspondence between two graphs with different numbers of vertices where the smaller graph is still potentially large. We propose a solution to this problem via a graph matching matched filter: padding the smaller graph in different ways and then using graph matching methods to align it to the larger network. Under a statistical model for correlated pairs of graphs, which yields a noisy copy of the small graph within the larger graph, the resulting optimization problem can be guaranteed to recover the true vertex correspondence between the networks, though there are currently no efficient algorithms for solving this problem. We consider an approach that exploits a partially known correspondence and show via varied simulations and applications to the Drosophila connectome that in practice this approach can achieve good performance. https://arxiv.org/abs/1803.02423
Parallel Maximum Clique Algorithms with Applications to Network Analysis and Storage
We propose a fast, parallel maximum clique algorithm for large sparse graphs
that is designed to exploit characteristics of social and information networks.
The method exhibits a roughly linear runtime scaling over real-world networks
ranging from 1000 to 100 million nodes. In a test on a social network with 1.8
billion edges, the algorithm finds the largest clique in about 20 minutes. Our
method employs a branch and bound strategy with novel and aggressive pruning
techniques. For instance, we use the core number of a vertex in combination
with a good heuristic clique finder to efficiently remove the vast majority of
the search space. In addition, we parallelize the exploration of the search
tree. During the search, processes immediately communicate changes to upper and
lower bounds on the size of maximum clique, which occasionally results in a
super-linear speedup because vertices with large search spaces can be pruned by
other processes. We apply the algorithm to two problems: to compute temporal
strong components and to compress graphs.
Comment: 11 pages
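The pruning idea in this abstract (core numbers combined with a heuristic clique finder inside a branch and bound search) can be sketched sequentially as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the parallel exploration and the sharing of bounds between processes are omitted, the core decomposition uses a simple quadratic-time peeling loop, and all function names are hypothetical. The graph is assumed to be a dict mapping each vertex to a set of neighbours, with no self-loops.

```python
def core_numbers(adj):
    """Core number of each vertex, by repeatedly peeling a minimum-degree vertex."""
    degree = {v: len(ns) for v, ns in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def greedy_clique(adj, core):
    """Heuristic lower bound: grow a clique greedily from a highest-core vertex."""
    start = max(adj, key=core.__getitem__)
    clique = [start]
    candidates = set(adj[start])
    while candidates:
        v = max(candidates, key=core.__getitem__)
        clique.append(v)
        candidates &= adj[v]
    return clique


def max_clique(adj):
    """Branch and bound maximum clique with core-number pruning (sequential sketch)."""
    core = core_numbers(adj)
    best = greedy_clique(adj, core)

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = list(clique)
        for v in sorted(candidates, key=core.__getitem__, reverse=True):
            # Bound: even using every remaining candidate cannot beat the incumbent.
            if len(clique) + len(candidates) <= len(best):
                return
            candidates = candidates - {v}
            expand(clique + [v], candidates & adj[v])

    for v in sorted(adj, key=core.__getitem__, reverse=True):
        # A vertex with core number c lies only in cliques of size at most c + 1.
        if core[v] + 1 <= len(best):
            break
        # Only neighbours with a large enough core number can extend the incumbent.
        expand([v], {u for u in adj[v] if core[u] >= len(best)})
    return best
```

When the greedy clique already has size equal to the largest core number plus one, the outer loop exits immediately, which is the kind of early termination that pruning by core number is meant to provide.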
Maximum Common Subgraph Isomorphism Algorithms
Maximum common subgraph (MCS) isomorphism algorithms play an important role in chemoinformatics by providing an effective mechanism for the alignment of pairs of chemical structures. This article discusses the various types of MCS that can be identified when two graphs are compared and reviews some of the algorithms that are available for this purpose, focusing on those that are, or may be, applicable to the matching of chemical graphs.
A Linear Network Code Construction for General Integer Connections Based on the Constraint Satisfaction Problem
The problem of finding network codes for general connections is inherently
difficult in capacity constrained networks. Resource minimization for general
connections with network coding is further complicated. Existing methods for
identifying solutions mainly rely on highly restricted classes of network
codes, and are almost all centralized. In this paper, we introduce linear
network mixing coefficients for code constructions of general connections that
generalize random linear network coding (RLNC) for multicast connections. For
such code constructions, we pose the problem of cost minimization for the
subgraph involved in the coding solution and relate this minimization to a
path-based Constraint Satisfaction Problem (CSP) and an edge-based CSP. While
CSPs are NP-complete in general, we present a path-based probabilistic
distributed algorithm and an edge-based probabilistic distributed algorithm
with almost sure convergence in finite time by applying Communication Free
Learning (CFL). Our approach allows fairly general coding across flows,
guarantees no greater cost than routing, and shows a possible distributed
implementation. Numerical results illustrate the performance improvement of our
approach over existing methods.
Comment: submitted to TON (conference version published at IEEE GLOBECOM 2015)
Detection of large exact subgraph isomorphisms with a topology-only graphlet index built using deterministic walks
We introduce the first algorithm to perform topology-only local graph
matching (a.k.a. local network alignment or subgraph isomorphism): BLANT, for
Basic Local Alignment of Network Topology. BLANT first creates a limited,
high-specificity index of a single graph containing connected k-node induced
subgraphs called k-graphlets, for k=6-15. The index is constructed in a
deterministic way such that, if significant common network topology exists
between two networks, their indexes are likely to overlap. This is the key
insight which allows BLANT to discover alignments using only topological
information. To align two networks, BLANT queries their respective indexes to
form large, high quality local alignments. BLANT is able to discover highly
topologically similar alignments (S3 >= 0.95) of up to 150 node-pairs for which
up to 50% of node pairs differ from their "assigned" global counterpart. These
results compare favorably against the baseline, a state-of-the-art local
alignment algorithm which was adapted to be topology-only. Such alignments are
3x larger and differ 30% more (additive) from the global alignment than
alignments of similar topological similarity (S3 >= 0.95) discovered by the
baseline. We hope that such regions of high local similarity and low global
similarity may provide complementary insights to global alignment algorithms.
Comment: 13 pages, 11 figures, 4 tables
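The abstract uses the threshold S3 >= 0.95 without defining S3. The sketch below assumes the commonly used symmetric substructure score: the number of edges conserved by the alignment, divided by the number of edges among the aligned nodes of the first graph plus the number of edges induced on their images in the second graph, minus the conserved edges. Whether BLANT computes it in exactly this way for local alignments is an assumption of this example, and the name `s3_score` is hypothetical.

```python
def s3_score(edges1, edges2, alignment):
    """Symmetric substructure score of a (possibly partial) node alignment.

    `alignment` maps nodes of graph 1 to nodes of graph 2 and is assumed to be
    injective; both graphs are assumed simple and undirected.
    """
    image = set(alignment.values())
    # Edges of graph 1 among aligned nodes, translated into graph 2's node names.
    mapped = {frozenset((alignment[u], alignment[v]))
              for u, v in edges1 if u in alignment and v in alignment}
    # Edges of graph 2 induced on the image of the alignment.
    induced2 = {frozenset((a, b)) for a, b in edges2
                if a in image and b in image}
    shared = len(mapped & induced2)
    denom = len(mapped) + len(induced2) - shared
    return shared / denom if denom else 0.0
```

Under this definition a score of 1 means the two aligned regions have identical edge sets, and an extra edge on either side lowers the score, which is why the measure is described as symmetric.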
Graph-Based Approaches to Protein Structure Comparison - From Local to Global Similarity
The comparative analysis of protein structure data is a central aspect of structural bioinformatics. Drawing upon structural information allows the inference of function for unknown proteins even in cases where no apparent homology can be found on the sequence level.
Regarding the function of an enzyme, the overall fold topology might be less important than the specific structural conformation of the catalytic site or the surface region of a protein, where the interaction with other molecules, such as binding partners, substrates, and ligands, occurs. Thus, a comparison of these regions is especially interesting for functional inference, since structural constraints imposed by the demands of the catalyzed biochemical function make them more likely to exhibit structural similarity. Moreover, the comparative analysis of protein binding sites is of special interest in pharmaceutical chemistry, in order to predict cross-reactivities and gain a deeper understanding of the catalysis mechanism.
From an algorithmic point of view, the comparison of structured data, or, more generally, complex objects, can be attempted based on different methodological principles. Global methods aim at comparing structures as a whole, while local methods transfer the problem to multiple comparisons of local substructures. In the context of protein structure analysis, it is not a priori clear which strategy is more suitable.
In this thesis, several conceptually different algorithmic approaches have been developed, based on local, global, and semi-global strategies, for the task of comparing protein structure data, more specifically protein binding pockets. The use of graphs for the modeling of protein structure data has a long-standing tradition in structural bioinformatics. Recently, graphs have been used to model the geometric constraints of protein binding sites. The algorithms developed in this thesis are based on this modeling concept; hence, from a computer scientist's point of view, they can also be regarded as global, local, and semi-global approaches to graph comparison. The developed algorithms were designed mainly to allow a more approximate comparison of protein binding sites, in order to account for the molecular flexibility of the protein structures. A main motivation was to allow for the detection of more remote similarities, which are not apparent when using more rigid methods. Subsequently, the developed approaches were applied to different problems typically encountered in the field of structural bioinformatics in order to assess and compare their performance and suitability for different problems.
Each of the approaches developed during this work was capable of improving upon the performance of existing methods in the field. Another major aspect of the experiments was the question of which methodological concept (local, global, or a combination of both) offers the most benefits for the specific task of protein binding site comparison, a question that is addressed throughout this thesis.