2,147 research outputs found

    Accelerated search for biomolecular network models to interpret high-throughput experimental data

    Get PDF
    Background: The functions of human cells are carried out by biomolecular networks, which include proteins, genes, and regulatory sites within DNA that encode and control protein expression. Models of biomolecular network structure and dynamics can be inferred from high-throughput measurements of gene and protein expression. We build on our previously developed fuzzy logic method for bridging quantitative and qualitative biological data to address the challenges of noisy, low-resolution high-throughput measurements, i.e., from gene expression microarrays. We employ an evolutionary search algorithm to accelerate the search for hypothetical fuzzy biomolecular network models consistent with a biological data set. We also develop a method to estimate the probability of a potential network model fitting a set of data by chance. The resulting metric provides an estimate of both model quality and dataset quality, identifying data that are too noisy to reveal meaningful correlations between the measured variables.
    Results: Optimal parameters for the evolutionary search were identified on artificial data, and the algorithm showed scalable, consistent performance for as many as 150 variables. The method was tested on previously published human cell cycle gene expression microarray data sets. The evolutionary search was found to converge to the results of exhaustive search, and the randomized search converged on a set of similar best-fitting network models on different training data sets after 30 generations running 30 models per generation. Consistent results were found regardless of which of the published data sets were used to train or verify the quantitative predictions of the best-fitting models for cell cycle gene dynamics.
    Conclusion: Our results demonstrate the capability of scalable evolutionary search for fuzzy network models to address the problem of inferring models from complex, noisy biomolecular data sets. This approach yields multiple alternative models that are consistent with the data, providing a constrained set of hypotheses that can be used to optimally design subsequent experiments.
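The evolutionary search described above (30 models per generation, 30 generations, selection of best-fitting models) can be sketched as a minimal genetic algorithm. Everything below is illustrative: the toy "model" is a vector of regulatory weights and the fitness function is a stand-in for agreement with expression data, not the authors' actual fuzzy-model encoding.

```python
import random

random.seed(0)

# Toy "network model": a vector of regulatory weights. Fitness measures how
# well it reproduces a hypothetical data-derived target (names are illustrative).
TARGET = [0.2, -0.5, 0.9, 0.0, 0.4]

def fitness(model):
    # Negative squared error against the target; higher (closer to 0) is better.
    return -sum((m - t) ** 2 for m, t in zip(model, TARGET))

def mutate(model, rate=0.3, scale=0.2):
    # Perturb each weight with probability `rate` by Gaussian noise.
    return [m + random.gauss(0, scale) if random.random() < rate else m
            for m in model]

def evolve(pop_size=30, generations=30):
    # Mirrors the paper's setup: 30 models per generation, 30 generations.
    population = [[random.uniform(-1, 1) for _ in range(len(TARGET))]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[:pop_size // 3]          # elitist selection
        population = parents + [mutate(random.choice(parents))
                                for _ in range(pop_size - len(parents))]
    return max(population, key=fitness)

best = evolve()
print(fitness(best))  # near 0 after convergence
```

Because the best parents are carried over unchanged (elitism), the best fitness in the population never decreases between generations, which is why repeated randomized runs can converge on similar best-fitting models.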

    Tackling Exascale Software Challenges in Molecular Dynamics Simulations with GROMACS

    Full text link
    GROMACS is a widely used package for biomolecular simulation, and over the last two decades it has evolved from small-scale efficiency to advanced heterogeneous acceleration and multi-level parallelism targeting some of the largest supercomputers in the world. Here, we describe some of the ways we have been able to realize this through parallelization on all levels, combined with a constant focus on absolute performance. Release 4.6 of GROMACS uses SIMD acceleration on a wide range of architectures, GPU offloading acceleration, and both OpenMP and MPI parallelism within and between nodes, respectively. The recent work on acceleration made it necessary to revisit the fundamental algorithms of molecular simulation, including the concept of neighbor searching, and we discuss the present and future challenges we see for exascale simulation, in particular very fine-grained task parallelism. We also discuss the software management, code peer review, and continuous integration testing required for a project of this complexity. Comment: EASC 2014 conference proceedings
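Neighbor searching, mentioned above as one of the fundamental algorithms revisited for acceleration, is classically done with cell lists: particles are binned into grid cells at least as large as the interaction cutoff, so each particle only needs to be compared against particles in its own and adjacent cells. The 2D sketch below is a generic textbook version, not GROMACS's actual cluster-based kernels.

```python
import math
import random

random.seed(1)

CUTOFF = 0.1   # interaction cutoff
BOX = 1.0      # box side length
points = [(random.random(), random.random()) for _ in range(200)]

def brute_force(pts, rc):
    # O(N^2) reference: compare every pair.
    return {(i, j)
            for i in range(len(pts)) for j in range(i + 1, len(pts))
            if math.dist(pts[i], pts[j]) < rc}

def cell_list(pts, rc, box):
    n = int(box / rc)              # cells per side; cell size >= cutoff
    size = box / n
    cells = {}
    for idx, (x, y) in enumerate(pts):
        cells.setdefault((int(x / size), int(y / size)), []).append(idx)
    pairs = set()
    for (cx, cy), members in cells.items():
        # Only the 3x3 neighborhood of each cell needs to be scanned.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in cells.get(((cx + dx) % n, (cy + dy) % n), []):
                    for i in members:
                        if i < j and math.dist(pts[i], pts[j]) < rc:
                            pairs.add((i, j))
    return pairs

assert cell_list(points, CUTOFF, BOX) == brute_force(points, CUTOFF)
print(len(cell_list(points, CUTOFF, BOX)), "pairs within cutoff")
```

The cell-list pass is O(N) for fixed density, which is what makes short-range force evaluation scalable; production codes further reorganize these lists into SIMD- and GPU-friendly clusters.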

    Computational Approaches To Anti-Toxin Therapies And Biomarker Identification

    Get PDF
    This work describes the fundamental study of two bacterial toxins with computational methods, the rational design of a potent inhibitor using molecular dynamics, and the development of two bioinformatic methods for mining genomic data. Clostridium difficile is an opportunistic bacillus which produces two large glucosylating toxins. These toxins, TcdA and TcdB, cause severe intestinal damage. As Clostridium difficile harbors considerable antibiotic resistance, one treatment strategy is to prevent the tissue damage that the toxins cause. The catalytic glucosyltransferase domain of TcdA and TcdB was studied using molecular dynamics in the presence of both a protein-protein binding partner and several substrates. These experiments were combined with lead optimization techniques to create a potent irreversible inhibitor which protects 95% of cells in vitro. Dynamics studies on a TcdB cysteine protease domain were performed to identify an allosteric communication pathway. Comparative analysis of the static and dynamic properties of the TcdA and TcdB glucosyltransferase domains was carried out to determine the basis for the differential lethality of these toxins. Large-scale biological data are readily available in the post-genomic era, but they can be difficult to use effectively. Two bioinformatic methods were developed to process whole-genome data. Software was developed to return all genes containing a motif in a single genome. This provides a list of genes which may be within the same regulatory network or targeted by a specific DNA-binding factor. A second bioinformatic method was created to link the data from genome-wide association studies (GWAS) to specific genes. GWAS are frequently subjected to statistical analysis, but mutations are rarely investigated structurally. HyDn-SNP-S allows a researcher to find mutations in a gene that correlate with a GWAS-studied phenotype. Across human DNA polymerases, this resulted in strongly predictive haplotypes for breast and prostate cancer. Molecular dynamics applied to DNA polymerase lambda suggested a structural explanation for the decrease in polymerase fidelity with that mutant. When applied to histone deacetylases, mutations were found that alter substrate binding and post-translational modification.
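The first bioinformatic tool described above returns all genes whose sequence contains a motif. A common way to implement such a scan is to translate the motif's IUPAC degenerate codes into a regular expression and search each gene's sequence. The mini-genome and motif below are made up for illustration; this is not the thesis software itself.

```python
import re

# Subset of IUPAC degenerate nucleotide codes mapped to regex character classes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "N": "[ACGT]"}

def motif_to_regex(motif):
    return "".join(IUPAC[base] for base in motif)

def genes_with_motif(sequences, motif):
    # Return all genes whose sequence contains at least one motif match.
    pattern = re.compile(motif_to_regex(motif))
    return sorted(g for g, seq in sequences.items() if pattern.search(seq))

# Hypothetical gene -> upstream sequence map.
upstream = {
    "geneA": "TTGACAATTGCCTATAAT",
    "geneB": "CCCCGGGGAAAACCCC",
    "geneC": "ATTGACTTTTTATACT",
}
print(genes_with_motif(upstream, "TTGACW"))  # -> ['geneA', 'geneC']
```

The returned gene list is exactly the kind of candidate set described in the abstract: genes plausibly sharing a regulatory network or a common DNA-binding factor.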

    Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling

    Full text link
    The U.S. Department of Energy recently announced the first five grants for the Genomes to Life (GTL) Program. The goal of this program is to "achieve the most far-reaching of all biological goals: a fundamental, comprehensive, and systematic understanding of life." While more information about the program can be found at the GTL website (www.doegenomestolife.org), this paper provides an overview of one of the five GTL projects funded, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling." This project is a combined experimental and computational effort emphasizing the development, prototyping, and application of new computational tools and methods to elucidate the biochemical mechanisms of carbon sequestration in Synechococcus Sp., an abundant marine cyanobacterium known to play an important role in the global carbon cycle. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. The project includes five subprojects: an experimental investigation, three computational biology efforts, and a fifth that addresses computational infrastructure challenges relevant to this project and to the Genomes to Life program as a whole.
    Our experimental effort is designed to provide biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Our computational efforts include coupling molecular simulation methods with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes, and developing a set of novel capabilities for inferring regulatory pathways in microbial genomes from multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. Furthermore, given that the ultimate goal of this effort is to develop a systems-level understanding of how the Synechococcus genome affects carbon fixation at the global scale, we will develop and apply a set of tools for capturing the carbon fixation behavior of Synechococcus at different levels of resolution. Finally, because the explosion of data produced by high-throughput experiments requires data analysis and models that are more computationally complex, more heterogeneous, and coupled to ever-increasing amounts of experimentally obtained data in varying formats, we have also established a companion computational infrastructure to support this effort as well as the Genomes to Life program as a whole. Peer reviewed: http://deepblue.lib.umich.edu/bitstream/2027.42/63164/1/153623102321112746.pd

    Pathway and network analysis in proteomics

    Get PDF
    Proteomics is inherently a systems science that studies not only measured proteins and their expression in a cell, but also the interplay of proteins, protein complexes, signaling pathways, and network modules. Proteomics data have accumulated rapidly in recent years. However, proteomics data are highly variable, with results sensitive to data preparation methods, sample condition, instrument types, and analytical methods. To address the challenges of proteomics data analysis, we review current tools being developed to incorporate biological function and network topological information. We categorize these tools into four types: tools with basic functional information and few topological features (e.g., GO category analysis), tools with rich functional information and few topological features (e.g., GSEA), tools with basic functional information and rich topological features (e.g., Cytoscape), and tools with rich functional information and rich topological features (e.g., PathwayExpress). We first review the potential application of these tools to proteomics; we then review tools that can achieve automated learning of pathway modules and features, and tools that help perform integrated network visual analytics.
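The first category above, GO category analysis, typically asks whether a protein list is enriched for a functional annotation using a one-sided hypergeometric (Fisher-style) test. A minimal stdlib sketch, with made-up counts, is:

```python
from math import comb

def hypergeom_enrichment_p(N, K, n, k):
    """P(X >= k): probability of observing at least k annotated proteins when
    drawing n proteins from a background of N, of which K carry the annotation."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

# Illustrative numbers: 5,000 background proteins, 100 in a pathway,
# 50 detected in the experiment, 10 of which fall in the pathway.
p = hypergeom_enrichment_p(N=5000, K=100, n=50, k=10)
print(f"enrichment p-value: {p:.3e}")
```

With the expected overlap being only n*K/N = 1 protein, observing 10 gives a very small p-value; in practice such tests are repeated per category and corrected for multiple testing, which is one reason dedicated tools exist.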

    Predicting biomolecular function from 3D dynamics : sequence-sensitive coarse-grained elastic network model coupled to machine learning

    Full text link
    The structural dynamics of biomolecules are intimately tied to their functions but experimentally elusive, making their computational study attractive. When modelling the effects of thousands of mutations, time-stepping methods such as classical or enhanced-sampling molecular dynamics, whether atomistic or coarse-grained, are too costly for most applications. On the other hand, normal mode analysis of coarse-grained elastic network models (ENMs) provides fast analytical dynamics spanning all timescales. However, the vast majority of ENMs consider backbone geometry alone, making them a poor choice for studying point mutations that do not affect the equilibrium structure. The Elastic Network Contact Model (ENCoM) is the first sequence-sensitive ENM, enabling its use for the efficient exploration of full conformational spaces of sequence variants. The present work introduces the ENCoM-DynaSig-ML computational pipeline, in which ENCoM conformational spaces are reduced to Dynamical Signatures and coupled to simple machine learning algorithms. ENCoM-DynaSig-ML predicts the function of sequence variants with significant accuracy, is complementary to all existing methods, and can generate new hypotheses about which dynamical features are important for the studied biomolecule's function. Three sequence-dynamics-function relationships are studied as examples: the maturation efficiency of microRNA variants, the activation potential of mu-opioid receptor ligands, and the effect of point mutations on the enzymatic efficiency of VIM-2 lactamase. This novel application of normal mode analysis is very fast, taking a few seconds of CPU time per sequence variant, and is generalizable to any biomolecule for which experimental mutagenesis data exist.
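To make the elastic-network idea concrete, the sketch below builds the Kirchhoff (connectivity) matrix of a Gaussian-network-style model, the simplest relative of models like ENCoM, from bead coordinates and a contact cutoff, and extracts its stiffest mode by power iteration. The bead chain, cutoff, and spring constants are all illustrative; ENCoM's actual potential additionally weights contacts by residue identity, which is what makes it sequence-sensitive.

```python
import math
import random

random.seed(2)

def kirchhoff(coords, cutoff=1.5):
    # Beads within `cutoff` are connected by unit springs; diagonal entries
    # hold each bead's contact count, so every row sums to zero.
    n = len(coords)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                G[i][j] = G[j][i] = -1.0
    for i in range(n):
        G[i][i] = -sum(G[i][j] for j in range(n) if j != i)
    return G

def power_iteration(M, iters=200):
    # Dominant eigenvalue of M: the stiffest normal mode of the network.
    v = [random.random() for _ in M]
    for _ in range(iters):
        w = [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(v))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return sum(v[i] * sum(M[i][j] * v[j] for j in range(len(v)))
               for i in range(len(v)))

# An 8-bead chain with unit spacing: each bead contacts only its neighbors.
chain = [(float(i), 0.0, 0.0) for i in range(8)]
G = kirchhoff(chain)
assert all(abs(sum(row)) < 1e-9 for row in G)  # Laplacian rows sum to zero
print(round(power_iteration(G), 3))
```

In a full pipeline one would diagonalize the matrix completely and summarize the low-frequency (soft) modes per residue; that per-residue fluctuation profile is the kind of "Dynamical Signature" fed to the machine learning stage.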

    Bridging molecular docking to molecular dynamics in exploring ligand-protein recognition process: An overview

    Get PDF
    Computational techniques have been applied in the drug discovery pipeline since the 1980s. Given the low computational resources of the time, the first molecular modeling strategies relied on a rigid view of the ligand-target binding process. Over the years, the evolution of hardware technologies has gradually made it possible to simulate the dynamic nature of the binding event. In this work, we present an overview of the evolution of structure-based drug discovery techniques in the study of the ligand-target recognition phenomenon, moving from static molecular docking toward enhanced molecular dynamics strategies.
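The "rigid view" of docking mentioned above reduces to sampling ligand placements and ranking them with a scoring function. The toy loop below samples random rigid translations and scores each pose with a Lennard-Jones-like pairwise term; the coordinates, parameters, and sampling scheme are all made up for illustration (real docking adds rotations, torsional flexibility, and far richer scoring).

```python
import math
import random

random.seed(3)

# Hypothetical "protein" and "ligand" represented as a few interaction points.
protein = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 1.5, 0.0)]
ligand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

def lj(r, epsilon=1.0, sigma=1.2):
    # 12-6 Lennard-Jones potential: repulsive at short range, attractive beyond.
    return 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)

def score(pose):
    # Sum pairwise energies; clamp tiny distances to avoid division blow-ups.
    return sum(lj(max(math.dist(p, q), 0.1)) for p in protein for q in pose)

def dock(trials=2000):
    # Rigid-body search: random translations only, keep the lowest-energy pose.
    best_pose, best_e = None, float("inf")
    for _ in range(trials):
        t = [random.uniform(-4, 4) for _ in range(3)]
        pose = [(x + t[0], y + t[1], z + t[2]) for x, y, z in ligand]
        e = score(pose)
        if e < best_e:
            best_pose, best_e = pose, e
    return best_pose, best_e

pose, energy = dock()
print(round(energy, 2))  # a negative (favorable) binding score
```

Enhanced molecular dynamics strategies replace this static snapshot with trajectories of both partners, which is exactly the methodological shift the review traces.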

    Computer-Aided Drug Design and Drug Discovery: A Prospective Analysis

    Get PDF
    In the dynamic landscape of drug discovery, Computer-Aided Drug Design (CADD) emerges as a transformative force, bridging the realms of biology and technology. This paper overviews CADD's historical evolution, its categorization into structure-based and ligand-based approaches, and its crucial role in rationalizing and expediting drug discovery. As CADD advances, incorporating diverse biological data and ensuring data privacy become paramount. Challenges persist, demanding the optimization of algorithms and robust ethical frameworks. Integrating Machine Learning and Artificial Intelligence amplifies CADD's predictive capabilities, yet ethical considerations and scalability challenges linger. Collaborative efforts and global initiatives, exemplified by platforms like Open-Source Malaria, underscore the democratization of drug discovery. The convergence of CADD with personalized medicine offers tailored therapeutic solutions, though ethical dilemmas and accessibility concerns must be navigated. Emerging technologies like quantum computing, immersive technologies, and green chemistry promise to redefine the future of CADD. The trajectory of CADD, marked by rapid advancements, anticipates challenges in ensuring accuracy, addressing biases in AI, and incorporating sustainability metrics. This paper concludes by highlighting the need for proactive measures in navigating the ethical, technological, and educational frontiers of CADD to shape a healthier, brighter future in drug discovery.