27 research outputs found

    Comparing biological networks via graph compression

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Comparison of various kinds of biological data is one of the main problems in bioinformatics and systems biology. Data compression methods have been applied to comparison of large sequence data and protein structure data. Since it is still difficult to compare global structures of large biological networks, it is reasonable to try to apply data compression methods to comparison of biological networks. In existing compression methods, the uniqueness of compression results is not guaranteed because there is some ambiguity in selection of overlapping edges.</p> <p>Results</p> <p>This paper proposes novel efficient methods, CompressEdge and CompressVertices, for comparing large biological networks. In the proposed methods, an original network structure is compressed by iteratively contracting identical edges and sets of connected edges. Then, the similarity of two networks is measured by a compression ratio of the concatenated networks. The proposed methods are applied to comparison of metabolic networks of several organisms, <it>H. sapiens, M. musculus, A. thaliana, D. melanogaster, C. elegans, E. coli, S. cerevisiae,</it> and <it>B. subtilis,</it> and are compared with an existing method. These results suggest that our methods can efficiently measure the similarities between metabolic networks.</p> <p>Conclusions</p> <p>Our proposed algorithms, which compress node-labeled networks, are useful for measuring the similarity of large biological networks.</p

    Retrieval, alignment, and clustering of computational models based on semantic annotations

    Get PDF
    As the number of computational systems biology models increases, new methods are needed to explore their content and build connections with experimental data. In this Perspective article, the authors propose a flexible semantic framework that can help achieve these aims

    Defining genes: a computational framework

    Get PDF
    The precise elucidation of the gene concept has become the subject of intense discussion in light of results from several, large high-throughput surveys of transcriptomes and proteomes. In previous work, we proposed an approach for constructing gene concepts that combines genomic heritability with elements of function. Here, we introduce a definition of the gene within a computational framework of cellular interactions. The definition seeks to satisfy the practical requirements imposed by annotation, capture logical aspects of regulation, and encompass the evolutionary property of homology

    Metabolic pathway alignment between species using a comprehensive and flexible similarity measure

    Get PDF
    Comparative analysis of metabolic networks in multiple species yields important information on their evolution, and has great practical value in metabolic engineering, human disease analysis, drug design etc. In this work, we aim to systematically search for conserved pathways in two species, quantify their similarities, and focus on the variations between themElectrical Engineering, Mathematics and Computer Scienc

    Semantic Similarity for Automatic Classification of Chemical Compounds

    Get PDF
    With the increasing amount of data made available in the chemical field, there is a strong need for systems capable of comparing and classifying chemical compounds in an efficient and effective way. The best approaches existing today are based on the structure-activity relationship premise, which states that biological activity of a molecule is strongly related to its structural or physicochemical properties. This work presents a novel approach to the automatic classification of chemical compounds by integrating semantic similarity with existing structural comparison methods. Our approach was assessed based on the Matthews Correlation Coefficient for the prediction, and achieved values of 0.810 when used as a prediction of blood-brain barrier permeability, 0.694 for P-glycoprotein substrate, and 0.673 for estrogen receptor binding activity. These results expose a significant improvement over the currently existing methods, whose best performances were 0.628, 0.591, and 0.647 respectively. It was demonstrated that the integration of semantic similarity is a feasible and effective way to improve existing chemical compound classification systems. Among other possible uses, this tool helps the study of the evolution of metabolic pathways, the study of the correlation of metabolic networks with properties of those networks, or the improvement of ontologies that represent chemical information

    Propagating semantic information in biochemical network models

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>To enable automatic searches, alignments, and model combination, the elements of systems biology models need to be compared and matched across models. Elements can be identified by machine-readable biological annotations, but assigning such annotations and matching non-annotated elements is tedious work and calls for automation.</p> <p>Results</p> <p>A new method called "semantic propagation" allows the comparison of model elements based not only on their own annotations, but also on annotations of surrounding elements in the network. One may either propagate feature vectors, describing the annotations of individual elements, or quantitative similarities between elements from different models. Based on semantic propagation, we align partially annotated models and find annotations for non-annotated model elements.</p> <p>Conclusions</p> <p>Semantic propagation and model alignment are included in the open-source library semanticSBML, available on sourceforge. Online services for model alignment and for annotation prediction can be used at <url>http://www.semanticsbml.org</url>.</p

    Optimized ancestral state reconstruction using Sankoff parsimony

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Parsimony methods are widely used in molecular evolution to estimate the most plausible phylogeny for a set of characters. Sankoff parsimony determines the minimum number of changes required in a given phylogeny when a cost is associated to transitions between character states. Although optimizations exist to reduce the computations in the number of taxa, the original algorithm takes time <it>O</it>(<it>n</it><sup>2</sup>) in the number of states, making it impractical for large values of <it>n</it>.</p> <p>Results</p> <p>In this study we introduce an optimization of Sankoff parsimony for the reconstruction of ancestral states when ultrametric or additive cost matrices are used. We analyzed its performance for randomly generated matrices, Jukes-Cantor and Kimura's two-parameter models of DNA evolution, and in the reconstruction of elongation factor-1<it>α </it>and ancestral metabolic states of a group of eukaryotes, showing that in all cases the execution time is significantly less than with the original implementation.</p> <p>Conclusion</p> <p>The algorithms here presented provide a fast computation of Sankoff parsimony for a given phylogeny. Problems where the number of states is large, such as reconstruction of ancestral metabolism, are particularly adequate for this optimization. Since we are reducing the computations required to calculate the parsimony cost of a single tree, our method can be combined with optimizations in the number of taxa that aim at finding the most parsimonious tree.</p

    Trends in life science grid: from computing grid to knowledge grid

    Get PDF
    BACKGROUND: Grid computing has great potential to become a standard cyberinfrastructure for life sciences which often require high-performance computing and large data handling which exceeds the computing capacity of a single institution. RESULTS: This survey reviews the latest grid technologies from the viewpoints of computing grid, data grid and knowledge grid. Computing grid technologies have been matured enough to solve high-throughput real-world life scientific problems. Data grid technologies are strong candidates for realizing "resourceome" for bioinformatics. Knowledge grids should be designed not only from sharing explicit knowledge on computers but also from community formulation for sharing tacit knowledge among a community. CONCLUSION: Extending the concept of grid from computing grid to knowledge grid, it is possible to make use of a grid as not only sharable computing resources, but also as time and place in which people work together, create knowledge, and share knowledge and experiences in a community

    SubMAP: Aligning Metabolic Pathways with Subnetwork Mappings

    No full text
    We consider the problem of aligning two metabolic pathways. Unlike traditional approaches, we do not restrict the alignment to one-to-one mappings between the molecules (nodes) of the input pathways (graphs). We follow the observation that, in nature, different organisms can perform the same or similar functions through different sets of reactions and molecules. The number and the topology of the molecules in these alternative sets often vary from one organism to another. With the motivation that an accurate biological alignment should be able to reveal these functionally similar molecule sets across different species, we develop an algorithm that first measures the similarities between different nodes using a mixture of homology and topological similarity. We combine the two metrics by employing an eigenvalue formulation. We then search for an alignment between the two input pathways that maximizes a similarity score, evaluated as the sum of the similarities of the mapped subnetworks of size at most a given integer k, and also does not contain any conflicting mappings. Here we prove that this maximization is NP-hard by a reduction from the maximum weight independent set (MWIS) problem. We then convert our problem to an instance of MWIS and use an efficient vertex-selection strategy to extract the mappings that constitute our alignment. We name our algorithm SubMAP (Subnetwork Mappings in Alignment of Pathways). We evaluate its accuracy and performance on real datasets. Our empirical results demonstrate that SubMAP can identify biologically relevant mappings that are missed by traditional alignment methods. Furthermore, we observe that SubMAP is scalable for metabolic pathways of arbitrary topology, including searching for a query pathway of size 70 against the complete KEGG database of 1,842 pathways. Implementation in C++ is available at http://bioinformatics.cise.ufl.edu/SubMAP.html.National Science Foundation (U.S.) (Grant CCF-0829867)National Science Foundation (U.S.) (Grant IIS-0845439
    corecore