2,184 research outputs found

    Development of selected mesoscopic physical models with the aid of machine learning methods and their applications in studies of molecular systems

    This dissertation is concerned with the development and application of unsupervised machine learning methods in the field of theoretical biophysics and bioinformatics. The machine learning approach offers a powerful framework for extracting and purifying valuable information from large, multi-dimensional sets of data generated in simulations and experiments of biomolecular systems. Ready-made machine learning methods do not, however, offer infallible means of dealing with all sorts of complex and partially chaotic data encountered in computational biophysics and structural biology. A large portion of this work is devoted to the adaptation of unsupervised machine learning techniques to our particular purposes. In this dissertation, we employed unsupervised machine learning strategies to deal with two problems arising in theoretical biophysics and bioinformatics. The first problem was the identification of quasi-rigid structural parts in proteins, whereas the second was the discovery of the internal cooperation of molecular subsystems that propels a conformational transition. Both problems involved dynamical properties of molecular systems, and the analyses presented in this dissertation allowed for a simplified description of these phenomena. We demonstrate how the unsupervised machine learning approach can help in explaining intricacies hidden within seemingly chaotic molecular dynamics simulation data. The methods developed in this thesis increase our ability to understand complex molecular phenomena. We also point out potential problems associated with applying unsupervised machine learning algorithms in the field of molecular biophysics.

    Characterizing Interdisciplinarity of Researchers and Research Topics Using Web Search Engines

    Researchers' networks have been subject to active modeling and analysis. Earlier literature mostly focused on citation or co-authorship networks reconstructed from annotated scientific publication databases, which have several limitations. Recently, general-purpose web search engines have also been utilized to collect information about social networks. Here we reconstructed, using web search engines, a network representing the relatedness of researchers to their peers as well as to various research topics. Relatedness between researchers and research topics was characterized by the visibility boost, that is, the increase in a researcher's visibility obtained by focusing on a particular topic. It was observed that researchers who had high visibility boosts from the same research topic tended to be close to each other in their network. We calculated correlations between visibility boosts by research topics and researchers' interdisciplinarity at the individual level (diversity of topics related to the researcher) and at the social level (his/her centrality in the researchers' network). We found that visibility boosts by certain research topics were positively correlated with researchers' individual-level interdisciplinarity despite their negative correlations with the general popularity of researchers. It was also found that visibility boosts by network-related topics had positive correlations with researchers' social-level interdisciplinarity. Research topics' correlations with researchers' individual- and social-level interdisciplinarities were found to be nearly independent of each other. These findings suggest that the notion of "interdisciplinarity" of a researcher should be understood as a multi-dimensional concept that should be evaluated using multiple assessment means. Comment: 20 pages, 7 figures. Accepted for publication in PLoS On

    Integration and mining of malaria molecular, functional and pharmacological data: how far are we from a chemogenomic knowledge space?

    The organization and mining of malaria genomic and post-genomic data is highly motivated by the necessity to predict and characterize new biological targets and new drugs. Biological targets are sought in a biological space designed from the genomic data from Plasmodium falciparum, but also using the millions of genomic data from other species. Drug candidates are sought in a chemical space containing the millions of small molecules stored in public and private chemolibraries. Data management should therefore be as reliable and versatile as possible. In this context, we examined five aspects of the organization and mining of malaria genomic and post-genomic data: 1) the comparison of protein sequences, including compositionally atypical malaria sequences, 2) the high-throughput reconstruction of molecular phylogenies, 3) the representation of biological processes, particularly metabolic pathways, 4) the versatile methods to integrate genomic data, biological representations and functional profiling obtained from X-omic experiments after drug treatments, and 5) the determination and prediction of protein structures and their molecular docking with drug candidate structures. Progress toward a grid-enabled chemogenomic knowledge space is discussed. Comment: 43 pages, 4 figures, to appear in Malaria Journa

    Reliable estimation of prediction uncertainty for physico-chemical property models

    The predictions of parametric property models and their uncertainties are sensitive to systematic errors such as inconsistent reference data, parametric model assumptions, or inadequate computational methods. Here, we discuss the calibration of property models in the light of bootstrapping, a sampling method akin to Bayesian inference that can be employed for identifying systematic errors and for reliable estimation of the prediction uncertainty. We apply bootstrapping to assess a linear property model linking the 57Fe Moessbauer isomer shift to the contact electron density at the iron nucleus for a diverse set of 44 molecular iron compounds. The contact electron density is calculated with twelve density functionals across Jacob's ladder (PWLDA, BP86, BLYP, PW91, PBE, M06-L, TPSS, B3LYP, B3PW91, PBE0, M06, TPSSh). We provide systematic-error diagnostics and reliable, locally resolved uncertainties for isomer-shift predictions. Pure and hybrid density functionals yield average prediction uncertainties of 0.06-0.08 mm/s and 0.04-0.05 mm/s, respectively, the latter being close to the average experimental uncertainty of 0.02 mm/s. Furthermore, we show that both model parameters and prediction uncertainty depend significantly on the composition and number of reference data points. Accordingly, we suggest that rankings of density functionals based on performance measures (e.g., the coefficient of correlation, r2, or the root-mean-square error, RMSE) should not be inferred from a single data set. This study presents the first statistically rigorous calibration analysis for theoretical Moessbauer spectroscopy, which is of general applicability for physico-chemical property models and not restricted to isomer-shift predictions. We provide the statistically meaningful reference data set MIS39 and a new calibration of the isomer shift based on the PBE0 functional. Comment: 49 pages, 9 figures, 7 table
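
The bootstrapping idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' procedure or data: the function names (`fit_line`, `bootstrap_prediction`), the resample count, and the synthetic reference points are all assumptions made for illustration. The sketch refits a linear model on resampled reference sets and reports the spread of the resulting predictions at one point.

```python
import random


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def bootstrap_prediction(xs, ys, x_new, n_resamples=2000, seed=0):
    """Mean and standard deviation of the prediction at x_new over
    linear fits to reference sets resampled with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip degenerate resamples (no slope can be fitted)
        a, b = fit_line(bx, by)
        preds.append(a * x_new + b)
    mean = sum(preds) / len(preds)
    var = sum((p - mean) ** 2 for p in preds) / (len(preds) - 1)
    return mean, var ** 0.5


# Synthetic reference data (hypothetical, NOT the paper's data set):
# a noisy line with slope -0.3 and intercept 1.0.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
noise = [0.02, -0.03, 0.01, 0.04, -0.02, -0.01, 0.03, -0.04]
ys = [1.0 - 0.3 * x + e for x, e in zip(xs, noise)]

mean, std = bootstrap_prediction(xs, ys, x_new=3.5)
print(f"prediction at x=3.5: {mean:.3f} +/- {std:.3f}")
```

Because each resample changes the composition of the reference set, the spread of the refitted predictions gives a direct view of how strongly the calibration depends on which reference points are included.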

    Investigation of Membrane Receptors’ Oligomers Using Fluorescence Resonance Energy Transfer and Multiphoton Microscopy in Living Cells

    Investigating the quaternary structure (oligomerization) of macromolecules (such as proteins and nucleic acids) in living systems (in vivo) has been a great challenge in biophysics, due to molecular diffusion, fluctuations in several biochemical parameters such as pH, quenching of fluorescence by oxygen (when fluorescence methods are used), etc. We studied oligomerization of membrane receptors in living cells by means of Fluorescence (Förster) Resonance Energy Transfer (FRET) using fluorescent markers and two-photon excitation fluorescence micro-spectroscopy. Using suitable FRET models, we determined the stoichiometry and quaternary structure of various macromolecular complexes. The proteins of interest for this work are (1) the sigma-1 receptor (S1R) and (2) rhodopsin, described below. (1) Sigma-1 receptors are molecular chaperone proteins, which also regulate ion channels. S1R seems to be involved in substance abuse, as well as in several diseases such as Alzheimer’s. We studied S1R in the presence and absence of its ligands haloperidol (an antagonist) and pentazocine +/- (an agonist), and found that at low concentration the receptors exist as a mixture of monomers and dimers and that they may form higher-order oligomers at higher concentrations. (2) Rhodopsin is a prototypical G protein-coupled receptor (GPCR) and is directly involved in vision. GPCRs form a large family of receptors that participate in cell signaling by responding to external stimuli such as drugs, thus being a major drug target (more than 40% of drugs target GPCRs). Their oligomerization has been largely controversial. Resolving it may help to understand the functional role of GPCR oligomerization, and may lead to the discovery of more drugs targeting GPCR oligomers. It may also contribute toward finding a cure for Retinitis Pigmentosa, a disease that causes blindness, has no cure so far, and is caused by a mutation (G188R) in rhodopsin. Comparing the oligomeric structure of healthy rhodopsin with that of the mutant may give clues toward a cure.

    Modeling single microtubules as a colloidal system to measure the harmonic interactions between tubulin dimers in bovine brain derived versus cancer cell derived microtubules

    The local properties of tubulin dimers dictate the properties of the larger microtubule assembly. In order to elucidate this connection, tubulin-tubulin interactions are modeled as harmonic interactions to map the stiffness matrix along the length of the microtubule. The strength of the interactions is measured by imaging and tracking the movement of segments along the microtubule over time, and then performing a Fourier transform to extract the natural vibrational frequencies. Using this method, the first experimental phonon spectrum of the microtubule is reported. This method can also be applied to other biological materials, and opens new doors for structural analysis in the life sciences. Methods used in colloidal soft matter physics were also adapted to the study of the microtubule to develop new methods to measure local stiffness in biological materials. Using these methods, it is shown that there is local variability in the mechanical properties of bovine brain derived versus cancer cell derived microtubules. This provides insight into how local changes affect the dynamic instability of microtubules of different types. Finally, a nanofluidic device to isolate single microtubules is also reported; it is designed to be usable for the study of any biological polymer. It can also be adapted to incorporate nano-scale electrodes for the sensing and actuation of single isolated proteins.
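
The Fourier-transform step mentioned in the abstract above can be sketched as follows. This is a simplified illustration under stated assumptions, not the thesis's analysis pipeline: the function name `dominant_frequency`, the sampling interval, and the synthetic 5 Hz displacement trace are hypothetical, and a plain discrete Fourier transform stands in for whatever transform the author actually used.

```python
import cmath
import math


def dominant_frequency(samples, dt):
    """Return the frequency (in 1/units of dt) with the largest spectral
    power in an evenly sampled displacement trace, using a direct DFT."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the static offset
    best_k, best_power = 0, 0.0
    for k in range(1, n // 2 + 1):  # positive frequencies up to Nyquist
        coeff = sum(
            c * cmath.exp(-2j * math.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return best_k / (n * dt)


# Synthetic tracked displacement of one segment (hypothetical values):
# a 5 Hz oscillation sampled every 10 ms for 2 s, with a constant offset.
dt = 0.01
trace = [0.3 + math.sin(2 * math.pi * 5.0 * i * dt) for i in range(200)]
print("dominant frequency:", dominant_frequency(trace, dt))
```

In a real measurement the trace would come from tracked segment positions, and the frequency resolution is limited to 1/(n*dt), so longer recordings resolve closer-lying modes.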

    Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models

    Large Language Models (LLMs), with their remarkable task-handling capabilities and innovative outputs, have catalyzed significant advancements across a spectrum of fields. However, their proficiency within specialized domains such as biomolecular studies remains limited. To address this challenge, we introduce Mol-Instructions, a meticulously curated, comprehensive instruction dataset expressly designed for the biomolecular realm. Mol-Instructions is composed of three pivotal components: molecule-oriented instructions, protein-oriented instructions, and biomolecular text instructions, each curated to enhance the understanding and prediction capabilities of LLMs concerning biomolecular features and behaviors. Through extensive instruction tuning experiments on the representative LLM, we underscore the potency of Mol-Instructions to enhance the adaptability and cognitive acuity of large models within the complex sphere of biomolecular studies, thereby promoting advancements in the biomolecular research community. Mol-Instructions is made publicly accessible for future research endeavors and will be subjected to continual updates for enhanced applicability. Comment: Project homepage: https://github.com/zjunlp/Mol-Instructions. Add quantitative evaluation