Doctor of Philosophy
dissertation
Over 40 years ago, the first computer simulation of a protein was reported: the atomic motions of a 58 amino acid protein were simulated for a few picoseconds. With today's supercomputers, simulations of large biomolecular systems with hundreds of thousands of atoms can reach biologically significant timescales. Through dynamics information, biomolecular simulations can provide new insights into molecular structure and function to support the development of new drugs or therapies. While the recent advances in high-performance computing hardware and computational methods have enabled scientists to run longer simulations, they also created new challenges for data management. Investigators need to use local and national resources to run these simulations and store their output, which can reach terabytes of data on disk. Because of the wide variety of computational methods and software packages available to the community, no standard data representation has been established to describe the computational protocol and the output of these simulations, preventing data sharing and collaboration. Data exchange is also limited due to the lack of repositories and tools to summarize, index, and search biomolecular simulation datasets. In this dissertation, a common data model for biomolecular simulations is proposed to guide the design of future databases and APIs. The data model was then extended to a controlled vocabulary that can be used in the context of the semantic web. Two different approaches to data management are also proposed. The iBIOMES repository offers a distributed environment where input and output files are indexed via common data elements. The repository includes a dynamic web interface to summarize, visualize, search, and download published data. A simpler tool, iBIOMES Lite, was developed to generate summaries of datasets hosted at remote sites where user privileges and/or IT resources might be limited. These two informatics-based approaches to data management offer new means for the community to keep track of distributed and heterogeneous biomolecular simulation data and create collaborative networks.
Synthetic molecular dynamics for efficient trajectory generation
We propose that synthetic molecular dynamics (synMD) trajectories from
learned generative models may be a highly useful addition to the biomolecular
simulation toolbox. The computational expense of explicitly integrating the
equations of motion in molecular dynamics currently is a severe limit on the
number and length of trajectories which can be generated. Approximate, but more
computationally efficient, generative models can be used in place of explicit
integration of the equations of motion, and can produce meaningful trajectories
at greatly reduced computational cost. A very simple example demonstrated here
is a Markov state model (MSM) with states mapped to specific atomistic
configurations, but more sophisticated MSM variants and true coordinate-based
generative models could also be used. We anticipate at least three applications
for synMD trajectories: (i) testing of new methods via generation of arbitrary
amounts of data in highly non-trivial models, which may be exactly solvable;
(ii) generation of large numbers of instances of mechanistic processes of
interest, such as rare transitions, with the goals of characterizing,
assessing, and potentially correcting the underlying model, e.g., by comparison
to experimental data; and (iii) in the long term, acting as a partial
replacement for numerical integration of equations of motion based on ongoing
advances in statistical modeling and machine learning. We demonstrate the use
of an MSM to generate atomistic synMD trajectories for the fast-folding
miniprotein Trp-cage, at a rate of over 200 milliseconds per day on a standard
workstation. We also sketch a number of improvements to the present simple
pipeline.
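To make the MSM-based idea in this abstract concrete, here is a minimal sketch, not the authors' pipeline: a discrete trajectory is sampled from a row-stochastic transition matrix, and each sampled state is mapped to one stored configuration. The 3-state matrix, the placeholder configuration labels, and the function names are all assumptions made up for illustration.

```python
import random


def sample_msm_trajectory(transition_matrix, n_steps, start_state, seed=None):
    """Sample a list of state indices from a row-stochastic transition matrix."""
    rng = random.Random(seed)
    n_states = len(transition_matrix)
    for row in transition_matrix:
        if len(row) != n_states or abs(sum(row) - 1.0) > 1e-9:
            raise ValueError("transition matrix must be square with rows summing to 1")
    if n_steps < 1:
        raise ValueError("n_steps must be at least 1")
    states = [start_state]
    for _ in range(n_steps - 1):
        # The next state depends only on the current state (Markov property).
        weights = transition_matrix[states[-1]]
        states.append(rng.choices(range(n_states), weights=weights, k=1)[0])
    return states


def states_to_frames(states, representatives):
    """Map each sampled state to its stored representative configuration."""
    return [representatives[s] for s in states]


# Hypothetical 3-state model; direct 0 <-> 2 jumps have zero probability.
T = [
    [0.90, 0.10, 0.00],
    [0.05, 0.90, 0.05],
    [0.00, 0.10, 0.90],
]
# Placeholder labels standing in for atomistic coordinates of each state.
representatives = ["unfolded_frame", "intermediate_frame", "folded_frame"]

trajectory = sample_msm_trajectory(T, n_steps=1000, start_state=0, seed=42)
frames = states_to_frames(trajectory, representatives)
```

Each step costs one random draw rather than a force evaluation, which is the source of the speed-up the abstract describes; the price is that the synthetic frames carry no information beyond what the state definitions and transition matrix encode.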
Atomic detail visualization of photosynthetic membranes with GPU-accelerated ray tracing
The cellular process responsible for providing energy for most life on Earth, namely, photosynthetic light-harvesting, requires the cooperation of hundreds of proteins across an organelle, involving length and time scales spanning several orders of magnitude over quantum and classical regimes. Simulation and visualization of this fundamental energy conversion process pose many unique methodological and computational challenges. We present, in two accompanying movies, light-harvesting in the photosynthetic apparatus found in purple bacteria, the so-called chromatophore. The movies are the culmination of three decades of modeling efforts, featuring the collaboration of theoretical, experimental, and computational scientists. We describe the techniques that were used to build, simulate, analyze, and visualize the structures shown in the movies, and we highlight cases where scientific needs spurred the development of new parallel algorithms that efficiently harness GPU accelerators and petascale computers.
Challenges and frontiers of computational modelling of biomolecular recognition
Biomolecular recognition, including binding of small molecules, peptides and proteins to their target receptors, plays a key role in cellular function and has been targeted for therapeutic drug design. However, the high flexibility of biomolecules and slow binding and dissociation processes have presented challenges for computational modelling. Here, we review the challenges and computational approaches developed to characterize biomolecular binding, including molecular docking, molecular dynamics simulations (especially enhanced sampling) and machine learning. Further improvements are still needed in order to accurately and efficiently characterise binding structures, mechanisms, thermodynamics and kinetics of biomolecules in the future.
Thirty years of molecular dynamics simulations on posttranslational modifications of proteins
Posttranslational modifications (PTMs) are an integral component to how cells
respond to perturbation. While experimental advances have enabled improved PTM
identification capabilities, the same throughput for characterizing how
structural changes caused by PTMs equate to altered physiological function has
not been maintained. In this Perspective, we cover the history of computational
modeling and molecular dynamics simulations which have characterized the
structural implications of PTMs. We distinguish results from different
molecular dynamics studies based upon the timescales simulated and analysis
approaches used for PTM characterization. Lastly, we offer insights into how
opportunities for modern research efforts on in silico PTM characterization may
proceed given current state-of-the-art computing capabilities and
methodological advancements.
Comment: 64 pages, 11 figures
Markov state models of biomolecular conformational dynamics
It has recently become practical to construct Markov state models (MSMs) that reproduce the long-time statistical conformational dynamics of biomolecules using data from molecular dynamics simulations. MSMs can predict both stationary and kinetic quantities on long timescales (e.g. milliseconds) using a set of atomistic molecular dynamics simulations that are individually much shorter, thus addressing the well-known sampling problem in molecular dynamics simulation. In addition to providing predictive quantitative models, MSMs greatly facilitate both the extraction of insight into biomolecular mechanism (such as folding and functional dynamics) and quantitative comparison with single-molecule and ensemble kinetics experiments. A variety of methodological advances and software packages now bring the construction of these models closer to routine practice. Here, we review recent progress in this field, considering theoretical and methodological advances, new software tools, and recent applications of these approaches in several domains of biochemistry and biophysics, commenting on remaining challenges.
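The way an MSM combines many short simulations can be sketched as follows, assuming the trajectories have already been discretized into state indices. This is a plain transition-count estimate with row normalization and a power-iteration stationary distribution; it is not taken from the review and omits refinements such as reversible estimation or lag-time validation. All names and the toy trajectories are illustrative.

```python
def estimate_transition_matrix(trajectories, n_states, lag=1):
    """Count transitions at the given lag time and row-normalize the counts."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        # Pair each frame with the frame `lag` steps later.
        for i, j in zip(traj[:-lag], traj[lag:]):
            counts[i][j] += 1.0
    matrix = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0.0:
            # State never left in the data: use a self-loop so the row is valid.
            matrix.append([1.0 if j == i else 0.0 for j in range(n_states)])
        else:
            matrix.append([c / total for c in row])
    return matrix


def stationary_distribution(matrix, n_iter=10000, tol=1e-12):
    """Approximate the stationary distribution by repeated propagation.

    Assumes the chain is ergodic and aperiodic; otherwise this may not converge.
    """
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        new = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    return pi


# Two short, independent toy trajectories over states 0 and 1.
short_trajectories = [[0, 0, 1, 1, 0, 0, 1], [1, 1, 1, 0, 0]]
msm = estimate_transition_matrix(short_trajectories, n_states=2, lag=1)
populations = stationary_distribution(msm)
```

Because transitions are counted within each trajectory separately, no single trajectory has to span the slow process; the long-time behaviour comes from the estimated matrix.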
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences.
Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biology
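For readers unfamiliar with the quantity named at the end of this abstract, the Hamming distance between two equal-length DNA sequences is the number of positions at which they differ. The function below is a minimal sketch written for this listing, not code from the primer; the function name and the case-insensitive comparison are assumptions.

```python
def hamming_distance(seq_a, seq_b):
    """Return the number of positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    # Compare case-insensitively so 'acgt' and 'ACGT' count as identical.
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


distance = hamming_distance("GATTACA", "GACTATA")
```

In this example the two sequences differ at the third and sixth positions.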
Make Research Data Public? -- Not Always so Simple: A Dialogue for Statisticians and Science Editors
Putting data into the public domain is not the same thing as making those
data accessible for intelligent analysis. A distinguished group of editors and
experts who were already engaged in one way or another with the issues inherent
in making research data public came together with statisticians to initiate a
dialogue about policies and practicalities of requiring published research to
be accompanied by publication of the research data. This dialogue carried
beyond the broad issues of the advisability, the intellectual integrity, the
scientific exigencies to the relevance of these issues to statistics as a
discipline and the relevance of statistics, from inference to modeling to data
exploration, to science and social science policies on these issues.
Comment: Published at http://dx.doi.org/10.1214/10-STS320 in Statistical
Science (http://www.imstat.org/sts/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Roadmap on semiconductor-cell biointerfaces.
This roadmap outlines the role semiconductor-based materials play in understanding the complex biophysical dynamics at multiple length scales, as well as the design and implementation of next-generation electronic, optoelectronic, and mechanical devices for biointerfaces. The roadmap emphasizes the advantages of semiconductor building blocks in interfacing, monitoring, and manipulating the activity of biological components, and discusses the possibility of using active semiconductor-cell interfaces for discovering new signaling processes in the biological world.