15,735 research outputs found
Graph theoretic methods for the analysis of structural relationships in biological macromolecules
Subgraph isomorphism and maximum common subgraph isomorphism algorithms from graph theory provide an effective and an efficient way of identifying structural relationships between biological macromolecules. They thus provide a natural complement to the pattern matching algorithms that are used in bioinformatics to identify sequence relationships. Examples are provided of the use of graph theory to analyze proteins for which three-dimensional crystallographic or NMR structures are available, focusing on the use of the Bron-Kerbosch clique detection algorithm to identify common folding motifs and of the Ullmann subgraph isomorphism algorithm to identify patterns of amino acid residues. Our methods are also applicable to other types of biological macromolecule, such as carbohydrate and nucleic acid structures
Loom: Query-aware Partitioning of Online Graphs
As with general graph processing systems, partitioning data over a cluster of
machines improves the scalability of graph database management systems.
However, these systems will incur additional network cost during the execution
of a query workload, due to inter-partition traversals. Workload-agnostic
partitioning algorithms typically minimise the likelihood of any edge crossing
partition boundaries. However, these partitioners are sub-optimal with respect
to many workloads, especially queries, which may require more frequent
traversal of specific subsets of inter-partition edges. Furthermore, they
largely unsuited to operating incrementally on dynamic, growing graphs.
We present a new graph partitioning algorithm, Loom, that operates on a
stream of graph updates and continuously allocates the new vertices and edges
to partitions, taking into account a query workload of graph pattern
expressions along with their relative frequencies.
First we capture the most common patterns of edge traversals which occur when
executing queries. We then compare sub-graphs, which present themselves
incrementally in the graph update stream, against these common patterns.
Finally we attempt to allocate each match to single partitions, reducing the
number of inter-partition edges within frequently traversed sub-graphs and
improving average query performance.
Loom is extensively evaluated over several large test graphs with realistic
query workloads and various orderings of the graph updates. We demonstrate
that, given a workload, our prototype produces partitionings of significantly
better quality than existing streaming graph partitioning algorithms Fennel and
LDG
Pattern matching and pattern discovery algorithms for protein topologies
We describe algorithms for pattern matching and pattern
learning in TOPS diagrams (formal descriptions of protein topologies).
These problems can be reduced to checking for subgraph isomorphism
and finding maximal common subgraphs in a restricted class of ordered
graphs. We have developed a subgraph isomorphism algorithm for
ordered graphs, which performs well on the given set of data. The
maximal common subgraph problem then is solved by repeated
subgraph extension and checking for isomorphisms. Despite the
apparent inefficiency such approach gives an algorithm with time
complexity proportional to the number of graphs in the input set and is
still practical on the given set of data. As a result we obtain fast
methods which can be used for building a database of protein
topological motifs, and for the comparison of a given protein of known
secondary structure against a motif database
Domain discovery method for topological profile searches in protein structures
We describe a method for automated domain discovery for topological profile searches in protein
structures. The method is used in a system TOPStructure for fast prediction of CATH classification
for protein structures (given as PDB files). It is important for profile searches in multi-domain
proteins, for which the profile method by itself tends to perform poorly. We also present an
O(C(n)k +nk2) time algorithm for this problem, compared to the O(C(n)k +(nk)2) time used by
a trivial algorithm (where n is the length of the structure, k is the number of profiles and C(n) is the
time needed to check for a presence of a given motif in a structure of length n). This method has
been developed and is currently used for TOPS representations of protein structures and prediction
of CATH classification, but may be applied to other graph-based representations of protein or RNA
structures and/or other prediction problems. A protein structure prediction system incorporating
the domain discovery method is available at http://bioinf.mii.lu.lv/tops/
- …