Graphs in molecular biology
Graph-theoretical concepts are useful for describing and analyzing interactions and relationships in biological systems. We give a brief introduction to some of these concepts and their areas of application in molecular biology. We discuss software that is available through the Bioconductor project and present a simple example application: the integration of a protein-protein interaction network and a co-expression network.
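One simple way to picture the integration of two networks is as a set operation on undirected edges. The sketch below is a minimal illustration in plain Python, not the Bioconductor workflow the abstract refers to; the gene names, the edge lists, and the helper names `normalize` and `degree` are all hypothetical.

```python
# Hypothetical toy data: gene names and edges are invented for illustration.

def normalize(edges):
    """Treat edges as undirected by storing each one as a frozenset of endpoints."""
    return {frozenset(edge) for edge in edges}

def degree(edges):
    """Count how many edges touch each node."""
    counts = {}
    for edge in edges:
        for node in edge:
            counts[node] = counts.get(node, 0) + 1
    return counts

ppi_edges = normalize([("A", "B"), ("B", "C"), ("C", "D")])
coexpr_edges = normalize([("B", "A"), ("C", "D"), ("D", "E")])

# Edges supported by both the interaction network and the co-expression network.
shared = ppi_edges & coexpr_edges

# Node degrees within the shared-edge network.
shared_degree = degree(shared)
```

Storing edges as frozensets makes ("A", "B") and ("B", "A") compare equal, which is the property an undirected intersection needs.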
Principal manifolds and graphs in practice: from molecular biology to dynamical systems
We present several applications of non-linear data modeling, using principal
manifolds and principal graphs constructed using the metaphor of elasticity
(elastic principal graph approach). These approaches are generalizations of
Kohonen's self-organizing maps, a class of artificial neural networks. Using
several examples, we show the advantages of non-linear objects over linear
ones for data approximation. We propose four numerical
criteria for comparing linear and non-linear mappings of datasets into
lower-dimensional spaces. The examples are taken from comparative political
science, from the analysis of high-throughput data in molecular biology, and
from the analysis of dynamical systems. Comment: 12 pages, 9 figures
Error Graphs and the Reconstruction of Elements in Groups
Packing and covering problems for metric spaces, and graphs in particular,
are of essential interest in combinatorics and coding theory. They are
formulated in terms of metric balls of vertices. We consider a new problem in
graph theory which is also based on the consideration of metric balls of
vertices, but which is distinct from the traditional packing and covering
problems. This problem is motivated by applications in information transmission
when redundancy of messages is not sufficient for their exact reconstruction,
and applications in computational biology when one wishes to restore an
evolutionary process. It can be defined as the reconstruction, or
identification, of an unknown vertex in a given graph from a minimal number of
vertices (erroneous or distorted patterns) in a metric ball of a given radius r
around the unknown vertex. For this problem it is required to find minimum
restrictions for such a reconstruction to be possible and also to find
efficient reconstruction algorithms under such minimal restrictions.
In this paper we define error graphs and investigate their basic properties.
A particular class of error graphs occurs when the vertices of the graph are
the elements of a group, and when the path metric is determined by a suitable
set of group elements. These are the undirected Cayley graphs. Of particular
interest is the transposition Cayley graph on the symmetric group which occurs
in connection with the analysis of transpositional mutations in molecular
biology. We obtain a complete solution of the above problems for the
transposition Cayley graph on the symmetric group. Comment: Journal of Combinatorial Theory A 200
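The reconstruction problem can be made concrete for a very small symmetric group. The sketch below is a brute-force illustration, not the paper's method: it assumes the Cayley graph is generated by all transpositions, uses the standard fact that the distance between two permutations p and q is then n minus the number of cycles of p^{-1}q, and lists every vertex whose ball of radius r contains all observed vertices. The function names are hypothetical.

```python
from itertools import permutations

def cycle_count(perm):
    """Number of cycles (fixed points included) of a permutation given as a tuple."""
    seen = [False] * len(perm)
    count = 0
    for start in range(len(perm)):
        if not seen[start]:
            count += 1
            j = start
            while not seen[j]:
                seen[j] = True
                j = perm[j]
    return count

def transposition_distance(p, q):
    """Path distance in the Cayley graph generated by all transpositions."""
    n = len(p)
    inv_p = [0] * n
    for i, v in enumerate(p):
        inv_p[v] = i
    rel = tuple(inv_p[q[i]] for i in range(n))  # the permutation p^{-1} q
    return n - cycle_count(rel)

def candidates(observed, r):
    """All vertices x with every observed vertex inside the ball of radius r around x."""
    n = len(observed[0])
    return [x for x in permutations(range(n))
            if all(transposition_distance(x, y) <= r for y in observed)]

# Toy case in S_3: the three transpositions, each at distance 1 from the identity.
observed = [(1, 0, 2), (0, 2, 1), (2, 1, 0)]
result = candidates(observed, 1)
```

In this toy case `result` holds three permutations (the identity and both 3-cycles), so these three observations do not identify the unknown vertex uniquely at radius 1. The enumeration is factorial in n and is only meant for tiny examples.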
Toll Based Measures for Dynamical Graphs
Biological networks are among the most studied objects in computational
biology. Several methods have been developed for studying qualitative
properties of biological networks. The last decade has seen improvements in
molecular techniques that make quantitative analyses feasible. One of the
major goals of biological modelling is therefore to deal with the quantitative
aspect of biological graphs. We propose a probabilistic model suited to
this quantitative aspect. Our model combines graphs with several dynamical
sources. It emphasizes various asymptotic statistical properties that might be
useful for giving biological insights. Comment: 11 pages
Analysis of Three-Dimensional Protein Images
A fundamental goal of research in molecular biology is to understand protein
structure. Protein crystallography is currently the most successful method for
determining the three-dimensional (3D) conformation of a protein, yet it
remains labor intensive and relies on an expert's ability to derive and
evaluate a protein scene model. In this paper, the problem of protein structure
determination is formulated as an exercise in scene analysis. A computational
methodology is presented in which a 3D image of a protein is segmented into a
graph of critical points. Bayesian and certainty factor approaches are
described and used to analyze critical point graphs and identify meaningful
substructures, such as alpha-helices and beta-sheets. Results of applying the
methodologies to protein images at low and medium resolution are reported. The
research is related to approaches to representation, segmentation and
classification in vision, as well as to top-down approaches to protein
structure prediction. Comment: See http://www.jair.org/ for any accompanying files
Solving Hard Computational Problems Efficiently: Asymptotic Parametric Complexity 3-Coloring Algorithm
Many practical problems in almost all scientific and technological
disciplines have been classified as computationally hard (NP-hard or even
NP-complete). In life sciences, combinatorial optimization problems frequently
arise in molecular biology, e.g., genome sequencing; global alignment of
multiple genomes; identifying siblings; or discovering dysregulated pathways. In
almost all of these problems, one needs to prove a hypothesis about
a certain property of an object, a property that can be present only when the
object adopts some particular admissible structure (an NP-certificate) and is
otherwise absent (no admissible structure). However, none of the standard
approaches can discard the hypothesis when no solution is found, since none can
provide a proof that no admissible structure exists. This article presents an
algorithm that introduces a novel type of solution method to "efficiently"
solve the graph 3-coloring problem, an NP-complete problem. The proposed method
provides certificates (proofs) in both cases, present or absent, so it is
possible to accept or reject the hypothesis on the basis of a rigorous proof.
It provides exact solutions and is polynomial-time (i.e., efficient), though
parametric. The only
requirement is sufficient computational power, which is controlled by the
parameter . Nevertheless, here it is proved that the
probability of requiring a value of to obtain a solution for a
random graph decreases exponentially: , making
tractable almost all problem instances. Thorough experimental analyses were
performed. The algorithm was tested on random graphs, planar graphs and
4-regular planar graphs. The experimental results agree
with the theoretically expected results. Comment: Working paper
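For readers unfamiliar with the problem itself, the sketch below shows graph 3-coloring solved by plain exhaustive backtracking. It is not the parametric algorithm described in the abstract: backtracking is exponential in the worst case, and its only evidence of non-colorability is that the search space was exhausted. The function names and the vertex numbering 0..n-1 are assumptions made for illustration.

```python
def three_color(n, edges):
    """Return a list of colors (0, 1, 2) for vertices 0..n-1, or None if none exists.

    Assumes a simple graph (no self-loops). Plain backtracking, exponential worst case.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    colors = [None] * n

    def assign(v):
        if v == n:
            return True  # every vertex has a color
        for c in range(3):
            # A color is allowed if no already-colored neighbor uses it.
            if all(colors[w] != c for w in adj[v]):
                colors[v] = c
                if assign(v + 1):
                    return True
        colors[v] = None  # undo before backtracking
        return False

    return colors if assign(0) else None

def is_proper(coloring, edges):
    """Check that no edge joins two vertices of the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

triangle_coloring = three_color(3, triangle)  # a proper 3-coloring exists
k4_coloring = three_color(4, k4)              # the complete graph on 4 vertices has none
```

A returned coloring acts as a certificate that can be checked with `is_proper`; a `None` result here carries no compact proof, which is the gap the abstract says its method addresses.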
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiver-initiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to-linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a non-dedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids.