Development of selected mesoscopic physical models with the aid of machine learning methods and their applications in studies of molecular systems
This dissertation is concerned with the development and application of unsupervised machine learning methods in the field of theoretical biophysics and bioinformatics. The machine learning approach offers a powerful framework for extracting and purifying valuable information from large, multi-dimensional sets of data generated in simulations and experiments of biomolecular systems. Ready-made machine learning methods do not, however, offer infallible means of dealing with all sorts of complex and partially chaotic data encountered in computational biophysics and structural biology. A large portion of this work is therefore devoted to adapting unsupervised machine learning techniques to our particular purposes.
In this dissertation, we employed unsupervised machine learning strategies to address two problems arising in theoretical biophysics and bioinformatics. The first problem was the identification of quasi-rigid structural parts in proteins, whereas the second was the discovery of the internal cooperation of molecular subsystems that propels a conformational transition. Both problems involved dynamical properties of molecular systems, and the analyses presented in this dissertation allowed for a simplified description of these phenomena.
We demonstrate how the unsupervised machine learning approach can help in explaining intricacies hidden within seemingly chaotic molecular dynamics simulation data. The methods developed in this thesis increase our ability to understand complex molecular phenomena. We also point out potential problems associated with applying unsupervised machine learning algorithms in the field of molecular biophysics.
Understanding virtual solvent through large-scale ligand discovery
Predicting new ligands and their binding poses for a protein target relies on an understanding of the physical forces that exist between the water-submerged protein and ligand. The relative favorability of these molecular and atomic interactions between the protein and ligand, compared with their interactions with water, determines the binding affinity, which in turn can be converted into a binding free energy. Protein-ligand binding energetics are, with varying levels of success, encoded into scoring functions, which at their best can only partially emulate the true binding affinity of a protein-ligand binding event. In the context of virtually screening millions or hundreds of millions of drug-like ligands, molecular docking algorithms take advantage of scoring functions to rank the binding energies of these molecules relative to one another to help prioritize the most promising ligands. The focus of this dissertation is the balance between scoring function energy terms with an emphasis on water energetics, specifically the desolvation of the protein upon ligand binding. It is thought that our limited understanding of water is largely responsible for our limitations in discovering and designing drugs. This is due to the large number of roles that water can play, as well as its significant, and even dominant, contribution to protein-ligand binding energetics, which in the realm of molecular docking is typically under-modeled or completely neglected. First, I focus on the incorporation of receptor desolvation into the standard DOCK3.7 scoring function to more accurately model protein-ligand binding interactions by including further contributions of water. This is the original implementation of Grid Inhomogeneous Solvation Theory (GIST), applied to the model cavity, cytochrome c peroxidase, and spearheaded by Trent Balius and Marcus Fischer.
Second, I discuss an extension of GIST in DOCK3.7, a new implementation that relies on pre-computed Gaussian-weighted GIST receptor desolvation enthalpies. This results in negligible slowdown of the standard DOCK3.7 scoring function, similar performance to the original implementation of GIST, and the identification of new ligands for the drug-like model system, AmpC β-lactamase. The work on receptor desolvation contained within these two chapters inspired the name of this thesis; it was started in my rotation and continued until the end. Third, I focus on the use of property-matched and property-unmatched decoys in retrospective enrichment calculations prior to running a large-scale molecular docking virtual screen. Decoy molecules share the same physical properties as ligands that bind a protein but are topologically dissimilar to ensure that they do not actually bind the protein. We found that charge mismatching between ligands and decoys could bias one’s docking setup towards artifactually strong performance. Chapter 3 focuses on how we both decreased and increased the property space of decoys relative to ligands to safeguard against these docking setup biases. Fourth, I employ this knowledge of protein-ligand binding affinities to identify novel selective melatonin receptor ligands that are active in in vivo circadian rhythm assays. Finally, I discuss my current project on the CB1 cannabinoid receptor in the context of analgesia, followed by future directions.
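A retrospective enrichment calculation of the kind mentioned above can be illustrated with a minimal Python sketch. It uses a plain ROC AUC (the probability that a randomly chosen ligand scores better than a randomly chosen decoy) as the enrichment summary; the abstract does not state which metric the thesis uses, so this choice, the function name `roc_auc`, and all scores below are assumptions made up for illustration. Lower docking scores are assumed to be better.

```python
def roc_auc(ligand_scores, decoy_scores):
    """Fraction of (ligand, decoy) pairs in which the ligand scores better.

    Assumes lower (more negative) docking scores are better; ties count half.
    """
    wins = 0.0
    for lig in ligand_scores:
        for dec in decoy_scores:
            if lig < dec:
                wins += 1.0
            elif lig == dec:
                wins += 0.5
    return wins / (len(ligand_scores) * len(decoy_scores))


# Hypothetical docking scores, invented for this example only.
ligands = [-42.0, -38.5, -35.0, -30.1]
decoys = [-39.0, -33.0, -31.0, -28.0, -25.5, -20.0]

auc = roc_auc(ligands, decoys)
print(f"ROC AUC = {auc:.3f}")
```

An AUC near 0.5 means the setup cannot tell ligands from decoys. If the decoys differ from the ligands in a property such as net charge, a high AUC may reflect that mismatch rather than genuine recognition, which is the bias the abstract warns about.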
Characterizing Interdisciplinarity of Researchers and Research Topics Using Web Search Engines
Researchers' networks have been subject to active modeling and analysis.
Earlier literature mostly focused on citation or co-authorship networks
reconstructed from annotated scientific publication databases, which have
several limitations. Recently, general-purpose web search engines have also
been utilized to collect information about social networks. Here we
reconstructed, using web search engines, a network representing the relatedness
of researchers to their peers as well as to various research topics.
Relatedness between researchers and research topics was characterized by
visibility boost, i.e., the increase of a researcher's visibility obtained by focusing on a
particular topic. It was observed that researchers who had high visibility
boosts by the same research topic tended to be close to each other in their
network. We calculated correlations between visibility boosts by research
topics and researchers' interdisciplinarity at individual level (diversity of
topics related to the researcher) and at social level (his/her centrality in
the researchers' network). We found that visibility boosts by certain research
topics were positively correlated with researchers' individual-level
interdisciplinarity despite their negative correlations with the general
popularity of researchers. It was also found that visibility boosts by
network-related topics had positive correlations with researchers' social-level
interdisciplinarity. Research topics' correlations with researchers'
individual- and social-level interdisciplinarities were found to be nearly
independent from each other. These findings suggest that the notion of
"interdisciplinarity" of a researcher should be understood as a
multi-dimensional concept that should be evaluated using multiple assessment
means. Comment: 20 pages, 7 figures. Accepted for publication in PLoS On
Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?
The organization and mining of malaria genomic and post-genomic data is
highly motivated by the necessity to predict and characterize new biological
targets and new drugs. Biological targets are sought in a biological space
designed from the genomic data of Plasmodium falciparum, but also using the
millions of genomic data from other species. Drug candidates are sought in a
chemical space containing the millions of small molecules stored in public and
private chemolibraries. Data management should therefore be as reliable and
versatile as possible. In this context, we examined five aspects of the
organization and mining of malaria genomic and post-genomic data: 1) the
comparison of protein sequences including compositionally atypical malaria
sequences, 2) the high throughput reconstruction of molecular phylogenies, 3)
the representation of biological processes particularly metabolic pathways, 4)
the versatile methods to integrate genomic data, biological representations and
functional profiling obtained from X-omic experiments after drug treatments and
5) the determination and prediction of protein structures and their molecular
docking with drug candidate structures. Progress toward a grid-enabled
chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journa
Reliable estimation of prediction uncertainty for physico-chemical property models
The predictions of parametric property models and their uncertainties are
sensitive to systematic errors such as inconsistent reference data, parametric
model assumptions, or inadequate computational methods. Here, we discuss the
calibration of property models in the light of bootstrapping, a sampling method
akin to Bayesian inference that can be employed for identifying systematic
errors and for reliable estimation of the prediction uncertainty. We apply
bootstrapping to assess a linear property model linking the 57Fe Moessbauer
isomer shift to the contact electron density at the iron nucleus for a diverse
set of 44 molecular iron compounds. The contact electron density is calculated
with twelve density functionals across Jacob's ladder (PWLDA, BP86, BLYP, PW91,
PBE, M06-L, TPSS, B3LYP, B3PW91, PBE0, M06, TPSSh). We provide systematic-error
diagnostics and reliable, locally resolved uncertainties for isomer-shift
predictions. Pure and hybrid density functionals yield average prediction
uncertainties of 0.06-0.08 mm/s and 0.04-0.05 mm/s, respectively, the latter
being close to the average experimental uncertainty of 0.02 mm/s. Furthermore,
we show that both model parameters and prediction uncertainty depend
significantly on the composition and number of reference data points.
Accordingly, we suggest that rankings of density functionals based on
performance measures (e.g., the coefficient of correlation, r2, or the
root-mean-square error, RMSE) should not be inferred from a single data set.
This study presents the first statistically rigorous calibration analysis for
theoretical Moessbauer spectroscopy, which is of general applicability for
physico-chemical property models and not restricted to isomer-shift
predictions. We provide the statistically meaningful reference data set MIS39
and a new calibration of the isomer shift based on the PBE0 functional. Comment: 49 pages, 9 figures, 7 table
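The bootstrapping idea described in this abstract can be sketched in a few lines of Python. The sketch resamples the reference pairs with replacement, refits a straight line each time, and reports the spread of the resulting predictions. It is an illustrative assumption rather than the authors' procedure: the function names, the synthetic data, and the number of resamples are all invented here, and the spread shown reflects only the uncertainty of the fitted parameters.

```python
import numpy as np


def bootstrap_linear_fit(x, y, n_boot=2000, seed=0):
    """Refit y = a*x + b on n_boot resamples of the (x, y) pairs.

    Returns an array of shape (n_boot, 2) holding (slope, intercept) per resample.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    params = np.empty((n_boot, 2))
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # draw n indices with replacement
        params[i] = np.polyfit(x[idx], y[idx], deg=1)
    return params


def predict_with_uncertainty(params, x_new):
    """Mean and standard deviation of the bootstrap predictions at x_new."""
    preds = params[:, 0] * x_new + params[:, 1]
    return preds.mean(), preds.std(ddof=1)


# Synthetic reference data (not from the paper): y = 2x + 1 plus small noise.
data_rng = np.random.default_rng(1)
x_ref = np.linspace(0.0, 1.0, 30)
y_ref = 2.0 * x_ref + 1.0 + data_rng.normal(0.0, 0.05, size=x_ref.size)

params = bootstrap_linear_fit(x_ref, y_ref)
mean_pred, std_pred = predict_with_uncertainty(params, 0.5)
print(f"prediction at x = 0.5: {mean_pred:.3f} +/- {std_pred:.3f}")
```

Repeating the procedure on a different subset of reference points would shift both the fitted parameters and the spread, which illustrates the abstract's caution against ranking methods from a single data set.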
Investigation of Membrane Receptors’ Oligomers Using Fluorescence Resonance Energy Transfer and Multiphoton Microscopy in Living Cells
Investigating quaternary structure (oligomerization) of macromolecules (such as proteins and nucleic acids) in living systems (in vivo) has been a great challenge in biophysics, due to molecular diffusion, fluctuations in several biochemical parameters such as pH, quenching of fluorescence by oxygen (when fluorescence methods are used), etc.
We studied oligomerization of membrane receptors in living cells by means of Fluorescence (Förster) Resonance Energy Transfer (FRET) using fluorescent markers and two-photon excitation fluorescence micro-spectroscopy. Using suitable FRET models, we determined the stoichiometry and quaternary structure of various macromolecular complexes. The proteins of interest for this work, (1) the sigma-1 receptor and (2) rhodopsin, are described below.
(1) Sigma-1 receptors (S1R) are molecular chaperone proteins, which also regulate ion channels. S1R seems to be involved in substance abuse, as well as in several diseases such as Alzheimer’s. We studied S1R in the presence and absence of its ligands haloperidol (an antagonist) and pentazocine +/- (an agonist), and found that at low concentrations the receptors exist as a mixture of monomers and dimers and that they may form higher-order oligomers at higher concentrations.
(2) Rhodopsin is a prototypical G protein-coupled receptor (GPCR) and is directly involved in vision. GPCRs form a large family of receptors that participate in cell signaling by responding to external stimuli such as drugs, which makes them a major drug target (more than 40% of drugs target GPCRs). Their oligomerization has been largely controversial. Resolving this controversy may help clarify the functional role of GPCR oligomerization, and may lead to the discovery of more drugs targeting GPCR oligomers. It may also contribute toward finding a cure for Retinitis Pigmentosa, a disease that causes blindness, is caused by a mutation (G188R) in rhodopsin, and has no cure so far. Comparing the oligomeric structure of healthy rhodopsin with that of the mutant may give clues toward a cure.
Modeling single microtubules as a colloidal system to measure the harmonic interactions between tubulin dimers in bovine brain derived versus cancer cell derived microtubules
The local properties of tubulin dimers dictate the properties of the larger microtubule assembly. In order to elucidate this connection, tubulin-tubulin interactions are modeled as harmonic interactions to map the stiffness matrix along the length of the microtubule. The strength of the interactions is measured by imaging and tracking the movement of segments along the microtubule over time, and then performing a Fourier transform to extract the natural vibrational frequencies. Using this method, the first experimental phonon spectrum of the microtubule is reported. This method can also be applied to other biological materials, and opens new doors for structural analysis in the life sciences.
Methods used in colloidal soft matter physics were also adapted to the study of the microtubule to develop new methods to measure local stiffness in biological materials. Using this method, it is shown that there is local variability in the mechanical properties of bovine brain derived versus cancer cell derived microtubules. This provides insight into how local changes affect the dynamic instability of microtubules of different types.
Finally, a nanofluidic device to isolate single microtubules is also reported, and is designed to be used for the study of any biological polymer. It can also be adapted to incorporate nano-scale electrodes for the sensing and actuation of single isolated proteins.
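The Fourier-transform step described in this abstract, going from a tracked segment displacement to its vibrational frequencies, can be illustrated with a short Python sketch. Everything in it is an assumption for illustration: the function name `dominant_frequency`, the sampling interval, and the synthetic 5 Hz signal stand in for real tracking data, and the abstract does not specify how the authors process their spectra.

```python
import numpy as np


def dominant_frequency(displacement, dt):
    """Return the frequency with the largest spectral power in a displacement trace.

    displacement: 1-D sequence of segment positions sampled every dt seconds.
    """
    d = np.asarray(displacement, dtype=float)
    d = d - d.mean()                        # remove the static offset
    power = np.abs(np.fft.rfft(d)) ** 2     # one-sided power spectrum
    freqs = np.fft.rfftfreq(len(d), d=dt)   # matching frequency axis in Hz
    # Skip the zero-frequency bin, which carries no oscillation.
    return freqs[1:][np.argmax(power[1:])]


# Synthetic trace (not experimental data): a 5 Hz oscillation plus noise.
dt = 0.01                                   # assumed frame interval in seconds
t = np.arange(1000) * dt
noise_rng = np.random.default_rng(0)
trace = np.sin(2.0 * np.pi * 5.0 * t) + 0.3 * noise_rng.normal(size=t.size)

peak = dominant_frequency(trace, dt)
print(f"dominant frequency: {peak:.2f} Hz")
```

In a harmonic model, such peak frequencies relate to the effective spring constants between neighbouring segments, which is how a spectrum of this kind can be mapped back to local stiffness.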
Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models
Large Language Models (LLMs), with their remarkable task-handling
capabilities and innovative outputs, have catalyzed significant advancements
across a spectrum of fields. However, their proficiency within specialized
domains such as biomolecular studies remains limited. To address this
challenge, we introduce Mol-Instructions, a meticulously curated, comprehensive
instruction dataset expressly designed for the biomolecular realm.
Mol-Instructions is composed of three pivotal components: molecule-oriented
instructions, protein-oriented instructions, and biomolecular text
instructions, each curated to enhance the understanding and prediction
capabilities of LLMs concerning biomolecular features and behaviors. Through
extensive instruction tuning experiments on the representative LLM, we
underscore the potency of Mol-Instructions to enhance the adaptability and
cognitive acuity of large models within the complex sphere of biomolecular
studies, thereby promoting advancements in the biomolecular research community.
Mol-Instructions is made publicly accessible for future research endeavors and
will be subjected to continual updates for enhanced applicability. Comment: Project homepage: https://github.com/zjunlp/Mol-Instructions. Add
quantitative evaluation