
    The application of Hadoop in structural bioinformatics

    The paper reviews the use of the Hadoop platform in structural bioinformatics applications. For structural bioinformatics, Hadoop provides a new framework to analyse large fractions of the Protein Data Bank, which is key for high-throughput studies of, for example, protein-ligand docking, clustering of protein-ligand complexes and structural alignment. Specifically, we review a number of Hadoop implementations of high-throughput analyses reported in the literature, together with their scalability. We find that these deployments mostly call existing executables from MapReduce rather than rewriting the algorithms. Their scalability varies relative to other batch schedulers, and this is hard to assess because direct comparisons of Hadoop with batch schedulers on the same platform are absent from the literature. We note, however, some evidence that Message Passing Interface (MPI) implementations scale better than Hadoop. A significant barrier to the use of the Hadoop ecosystem is the difficulty of its interface and of configuring a resource to use Hadoop. This will improve over time as interfaces to Hadoop (e.g. Spark) improve, as usage of cloud platforms (e.g. Azure and Amazon Web Services (AWS)) increases, and as standardised approaches such as workflow languages (e.g. Workflow Description Language, Common Workflow Language and Nextflow) are taken up.
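The pattern the review describes, in which MapReduce calls an existing executable instead of a rewritten algorithm, can be sketched as a Hadoop Streaming-style mapper. This is a minimal sketch, not code from any of the reviewed deployments: it assumes one record (here, a PDB ID) per input line, and the program passed in `command` stands in for whatever docking, clustering or alignment tool a real deployment wraps. The names `streaming_mapper` and `score_structure` are hypothetical.

```python
import subprocess
import sys
from typing import Iterable, Iterator, List


def streaming_mapper(lines: Iterable[str], command: List[str]) -> Iterator[str]:
    """Map step that delegates the real work to an existing executable.

    Each non-blank input line is one record (assumed to be a PDB ID). The record is
    appended to `command` as its last argument, and the program's stdout becomes the
    value of a tab-separated key/value line, the default record format that Hadoop
    Streaming expects a mapper to write.
    """
    for line in lines:
        record = line.strip()
        if not record:
            continue  # ignore blank lines in the input split
        result = subprocess.run(
            command + [record], capture_output=True, text=True, check=True
        )
        yield f"{record}\t{result.stdout.strip()}"


def main() -> None:
    # Under Hadoop Streaming the mapper reads its input split on stdin and writes
    # key/value lines to stdout. "score_structure" is a hypothetical executable name.
    for pair in streaming_mapper(sys.stdin, ["score_structure"]):
        print(pair)
```

A script that calls `main()` could be handed to Hadoop Streaming through its `-mapper` option; Hadoop then groups the emitted lines by key before any reducer runs. The design keeps the scientific code untouched, which matches the review's observation, at the cost of one process launch per record.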

    Folding@home: achievements from over twenty years of citizen science herald the exascale era

    Simulations of biomolecules have enormous potential to inform our understanding of biology but require extremely demanding calculations. For over twenty years, the Folding@home distributed computing project has pioneered a massively parallel approach to biomolecular simulation, harnessing the resources of citizen scientists across the globe. Here, we summarize the scientific and technical advances this approach has enabled. As the project's name implies, the early years of Folding@home focused on driving advances in our understanding of protein folding by developing statistical methods for capturing long-timescale processes and facilitating insight into complex dynamical processes. Success laid a foundation for broadening the scope of Folding@home to address other functionally relevant conformational changes, such as receptor signaling, enzyme dynamics, and ligand binding. Continued algorithmic advances, hardware developments such as GPU-based computing, and the growing scale of Folding@home have enabled the project to focus on new areas where massively parallel sampling can be impactful. While previous work sought to expand toward larger proteins with slower conformational changes, new work focuses on large-scale comparative studies of different protein sequences and chemical compounds to better understand biology and inform the development of small-molecule drugs. Progress on these fronts enabled the community to pivot quickly in response to the COVID-19 pandemic, expanding to become the world's first exascale computer and deploying this massive resource to provide insight into the inner workings of the SARS-CoV-2 virus and aid the development of new antivirals. This success provides a glimpse of what's to come as exascale supercomputers come online, and Folding@home continues its work.
    Comment: 24 pages, 6 figures
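Markov state models (MSMs) are the best-known example of statistical methods that turn many short, independently generated trajectories into a description of long-timescale kinetics; the abstract does not name the specific methods, so treat this as an illustrative assumption. The minimal pure-Python sketch below assumes trajectories are already discretized into integer state labels: transition counts at a lag time are pooled across all trajectories, row-normalized into a transition matrix, and equilibrium populations are found by power iteration. Function names are hypothetical, and this simple row-normalized estimator does not enforce detailed balance.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def count_transitions(trajectories: Sequence[Sequence[int]], lag: int) -> Dict[int, Dict[int, int]]:
    """Pool i -> j transition counts at lag time `lag` (in frames, >= 1) over all trajectories."""
    if lag < 1:
        raise ValueError("lag must be at least one frame")
    counts: Dict[int, Dict[int, int]] = defaultdict(lambda: defaultdict(int))
    for traj in trajectories:
        # Each short trajectory contributes independently; runs are never stitched together.
        for a, b in zip(traj[:-lag], traj[lag:]):
            counts[a][b] += 1
    return counts


def transition_matrix(counts: Dict[int, Dict[int, int]], states: List[int]) -> List[List[float]]:
    """Row-normalize the counts; assumes every state was left at least once."""
    matrix = []
    for a in states:
        total = sum(counts[a].values())
        matrix.append([counts[a][b] / total for b in states])
    return matrix


def stationary_distribution(T: List[List[float]], tol: float = 1e-12, max_iter: int = 100_000) -> List[float]:
    """Equilibrium populations pi satisfying pi = pi T, found by power iteration."""
    n = len(T)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * T[i][j] for i in range(n)) for j in range(n)]
        if max(abs(x - y) for x, y in zip(new, pi)) < tol:
            return new
        pi = new
    return pi
```

Pooling counts is what lets this kind of model exploit distributed computing: each short trajectory adds transitions independently, so no single run has to reach the long timescales on its own.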

    Acceleration and Verification of Virtual High-throughput Multiconformer Docking

    The work in this dissertation explores the use of the massive computational power available through modern supercomputers as a virtual laboratory to aid drug discovery. As of November 2013, Tianhe-2, the fastest supercomputer in the world, has a theoretical peak performance of 54,902 TFlop/s, or nearly 55 thousand trillion calculations per second. The Titan supercomputer located at Oak Ridge National Laboratory has 560,640 computing cores that can work in parallel to solve scientific problems. To harness this computational power for drug discovery, tools are developed to aid in the preparation and analysis of high-throughput virtual docking screens, which predict how, and how well, small molecules bind to disease-associated proteins and could therefore serve as novel drug candidates. Methods and software for performing large screens on high-performance computer systems are developed. The future potential and benefits of using these tools to study polypharmacology and to revolutionize the pharmaceutical industry are also discussed.
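The analysis side of a large virtual screen often reduces to distributing a ligand library across nodes and collecting the best-scoring hits. A minimal sketch of that bookkeeping, not the dissertation's software: it assumes lower docking scores are better (as with energy-like scores in kcal/mol), and `score_fn` is a hypothetical stand-in for a call to a docking program. Because every global top-k hit is also in the top k of the chunk that produced it, each worker only needs to return k hits.

```python
import heapq
from typing import Callable, Iterable, List, Sequence, Tuple

Hit = Tuple[float, str]  # (docking score, ligand identifier)


def partition(ligands: Sequence[str], n_workers: int) -> List[List[str]]:
    """Round-robin split of the library so each worker gets a similar share."""
    return [list(ligands[w::n_workers]) for w in range(n_workers)]


def screen_chunk(ligands: Iterable[str], score_fn: Callable[[str], float], k: int) -> List[Hit]:
    """Dock one chunk (via the hypothetical score_fn) and keep its k lowest scores."""
    return heapq.nsmallest(k, ((score_fn(lig), lig) for lig in ligands))


def merge_top_hits(per_worker: Iterable[List[Hit]], k: int) -> List[Hit]:
    """Combine the per-worker shortlists into the global top k."""
    return heapq.nsmallest(k, (hit for hits in per_worker for hit in hits))
```

In a real deployment each `screen_chunk` call would run on a separate node or MPI rank. The round-robin split is only one option and does not balance ligands whose docking cost differs widely.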

    Estimation of binding free energies with Monte Carlo atomistic simulations and enhanced sampling

    Advances in computing power have motivated the hope that computational methods can accelerate the pace of drug discovery pipelines. For this, fast, reliable and user-friendly tools are required. One of the areas that has received the most attention is the prediction of binding affinities. Two main problems have been identified for such methods: insufficient sampling and inaccurate models. This thesis focuses on tackling the first problem. To this end, we present the development of efficient methods for the estimation of protein-ligand binding free energies. We have developed a protocol that combines enhanced sampling with more standard simulation methods to achieve higher efficiency. Enhanced sampling methods are a class of tools that apply some kind of external perturbation to the system under study in order to accelerate its sampling. First, we run an exploratory enhanced sampling simulation, starting from the bound conformation and partially biased towards unbound poses. Then we leverage the information gained from this short simulation to run longer, unbiased simulations to collect statistics. Thanks to the modularity and automation that the protocol offers, we were able to test three different methods for the long simulations: PELE, molecular dynamics and AdaptivePELE. PELE and molecular dynamics showed similar results, although PELE used fewer computational resources. Both seemed to work well with small protein-fragment systems or with proteins whose binding sites are not very flexible. Both failed to accurately reproduce the binding of a kinase, Mitogen-activated protein kinase 1 (ERK2). AdaptivePELE, on the other hand, did not show a great improvement over PELE, with positive results for Urokinase-type plasminogen activator (URO) and a clear lack of sampling for the Progesterone receptor (PR). We demonstrated the importance of a well-designed suite of test systems for the development of new methods. Through the use of a diverse benchmark of protein systems, we have established the cases in which the protocol is expected to give accurate results, and which areas require further development.
This benchmark consisted of four proteins and over 30 ligands, much larger than the test systems typically used in the development of pathway-based free energy methods. In summary, the methodology developed in this work can contribute to the drug discovery process for a limited range of protein systems. For many others, we have observed that regular unbiased simulations are not efficient enough and more sophisticated enhanced sampling methods are required.
    Postprint (published version)
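The title names Monte Carlo atomistic simulation as the sampling engine, and the protocol's last step turns unbiased populations into a free energy. The sketch below illustrates both ideas on a hypothetical one-dimensional double well rather than a protein-ligand system: Metropolis moves, an optional linear bias standing in for the exploratory biased stage, and ΔG = -kT ln(p_bound / p_unbound) from unbiased state counts. The potential, step size and function names are assumptions for illustration; the result is a free-energy difference between two basins, not an absolute binding free energy with a standard-state correction.

```python
import math
import random
from typing import Dict


def metropolis_accept(delta_e: float, kt: float, rng: random.Random) -> bool:
    """Metropolis criterion: accept downhill moves, uphill moves with probability exp(-dE/kT)."""
    return delta_e <= 0.0 or rng.random() < math.exp(-delta_e / kt)


def toy_energy(x: float) -> float:
    """Hypothetical double well: 'bound' basin near x = -1, 'unbound' basin near x = +1."""
    return 2.0 * (x * x - 1.0) ** 2 + 0.5 * x


def run_mc(n_steps: int, kt: float, step: float = 0.3, bias: float = 0.0, seed: int = 0) -> Dict[str, int]:
    """Metropolis Monte Carlo starting from the bound basin.

    A negative `bias` adds bias * x to the energy, pushing sampling towards the
    unbound side (x > 0) as in the exploratory stage; use bias = 0 for production.
    Returns how many steps were spent on each side of the dividing surface x = 0.
    """
    rng = random.Random(seed)
    x = -1.0
    counts = {"bound": 0, "unbound": 0}
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)
        delta_e = (toy_energy(x_new) + bias * x_new) - (toy_energy(x) + bias * x)
        if metropolis_accept(delta_e, kt, rng):
            x = x_new
        counts["bound" if x < 0.0 else "unbound"] += 1
    return counts


def delta_g(counts: Dict[str, int], kt: float) -> float:
    """G_bound - G_unbound = -kT ln(p_bound / p_unbound); only valid for unbiased counts.

    Raises ZeroDivisionError if the unbound state was never visited.
    """
    return -kt * math.log(counts["bound"] / counts["unbound"])
```

With kT ≈ 0.596 kcal/mol (300 K), an unbiased run must cross the barrier many times before the population ratio converges; that slow crossing is the sampling problem the thesis targets.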

    Spectral approaches for identifying kinetic features in molecular dynamics simulations of globular proteins

    Proteins live in an environment of random thermal vibrations, yet they convert this constant disorder into selective biological function. As data acquisition methods for resolving protein motions improve, more of the randomness is also captured; there is thus a parallel need for analysis methods that filter out the disorder and clarify functionally relevant protein behavior. Few behaviors are more relevant than folding in the first place, and this thesis opens by addressing which conformational states are kinetically relevant for promoting or inhibiting attainment of the folded native state. Our modeling approach discretizes simulation data into a network whose nodes represent different protein conformations and whose edges represent observed conformational transitions. A perturbative strategy is then invoked to quantify the importance of each node, i.e. conformational substate, with regard to theoretical folding rates. On a test of 10 proteins, this framework identifies unique ‘kinetic traps’ and ‘facilitator substates’ that sometimes evade detection with traditional RMSD-based analysis. We then apply spectral approaches and autoregressive models to (1) address efficiency concerns for more general networks and (2) mimic protein flexibility with compact linear models.
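The perturbative idea can be illustrated with a small Markov-chain calculation. The abstract does not give the thesis's exact formula, so this sketch assumes one simple variant: the mean first-passage time (MFPT) from an unfolded state to the folded state is computed from a transition matrix, each intermediate node is deleted in turn (remaining rows renormalized), and the change in MFPT is recorded. A node whose removal speeds folding behaves like a kinetic trap; one whose removal slows it behaves like a facilitator. All names are hypothetical, and the code assumes the folded state stays reachable after every deletion.

```python
from typing import Dict, List

Matrix = List[List[float]]


def solve(A: Matrix, b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mfpt(T: Matrix, target: int) -> Dict[int, float]:
    """Mean first-passage time (in lag-time steps) from each state into `target`.

    Solves m_i = 1 + sum_{j != target} T_ij m_j for every i != target.
    """
    others = [s for s in range(len(T)) if s != target]
    A = [[(1.0 if i == j else 0.0) - T[i][j] for j in others] for i in others]
    m = solve(A, [1.0] * len(others))
    result = dict(zip(others, m))
    result[target] = 0.0
    return result


def node_importance(T: Matrix, source: int, target: int) -> Dict[int, float]:
    """Change in source->target MFPT when each intermediate node is deleted.

    Negative values (folding gets faster without the node) flag trap-like nodes;
    positive values flag facilitator-like nodes.
    """
    base = mfpt(T, target)[source]
    changes = {}
    for k in range(len(T)):
        if k in (source, target):
            continue
        keep = [s for s in range(len(T)) if s != k]
        sub = [[T[i][j] for j in keep] for i in keep]
        sub = [[v / sum(row) for v in row] for row in sub]  # renormalize remaining rows
        changes[k] = mfpt(sub, keep.index(target))[keep.index(source)] - base
    return changes
```

Deleting a node is the crudest possible perturbation; scaling a node's transition probabilities by a small amount instead would give a derivative-like sensitivity for each substate.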

    Investigation of cytotoxic properties of some heterocyclic derivatives by molecular modeling approaches

    Currently, many technologies have been adopted to boost the efficiency of drug development and overcome obstacles in the drug discovery pipeline. Their applications span a wide range, from bioactivity prediction, de novo compound synthesis and target identification to hit discovery and lead optimization. This dissertation comprises two studies. First, we proposed an original approach based on statistical considerations and dedicated to k-means clustering analysis, in order to define a set of rules for structural features that would help in designing novel anticancer drug candidates. It was applied successfully to classify 500 cytotoxic compounds, described by 21 molecular descriptors, into distinct clusters. The percentages of molecules in clusters 1, 2 and 3 are 50%, 24.88% and 25.12%, respectively. Each cluster groups a homogeneous class of molecules with respect to their molecular descriptors. Silhouette analysis, used as a cluster validation approach, shows that the molecules are very well clustered, with no molecules placed in the wrong cluster. In silico screening of ADME pharmacological properties and evaluation of drug-likeness were performed for all molecules. A quantitative analysis of the molecular electrostatic potential was performed to identify the nucleophilic and electrophilic sites in the representative molecule of each cluster. In addition, a molecular docking study was carried out to investigate the interactions of these representative (paragon) molecules with the active binding sites of six different targets. Our findings provide a guide to help chemists select and test only the most promising molecules with good pharmacokinetic profiles, in order to improve the clinical outcomes of drug therapies.
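The clustering and validation steps can be sketched in pure Python. This is a minimal illustration, not the authors' code: descriptors are z-scored so that no single descriptor dominates the Euclidean distance, k-means is seeded deterministically with farthest-first selection (an assumption; the abstract does not state the initialization), and the mean silhouette coefficient s = (b - a) / max(a, b) measures how much closer each molecule is to its own cluster (a) than to the nearest other cluster (b). Values near 1 indicate well-separated clusters, and negative values flag molecules that sit closer to another cluster.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # rows = molecules, columns = descriptors


def standardize(X: Matrix) -> Matrix:
    """Z-score every descriptor column (constant columns are left centred at 0)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    sds = [math.sqrt(sum((row[j] - means[j]) ** 2 for row in X) / n) or 1.0 for j in range(d)]
    return [[(row[j] - means[j]) / sds[j] for j in range(d)] for row in X]


def kmeans(X: Matrix, k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means with farthest-first seeding; returns labels (math.dist needs Python 3.8+)."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        centroids.append(list(max(X, key=lambda p: min(math.dist(p, c) for c in centroids))))
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in X]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(X, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def mean_silhouette(X: Matrix, labels: List[int]) -> float:
    """Average of s_i = (b_i - a_i) / max(a_i, b_i); needs at least two clusters."""
    scores = []
    for i, p in enumerate(X):
        same = [math.dist(p, q) for j, q in enumerate(X) if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # silhouette of a singleton cluster is defined as 0
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(X) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def cluster_percentages(labels: List[int]) -> Dict[int, float]:
    """Share of molecules in each cluster, in percent."""
    return {c: 100.0 * labels.count(c) / len(labels) for c in sorted(set(labels))}
```

`cluster_percentages` reports cluster sizes in the same form as the percentages quoted in the abstract; a production analysis would typically also try several k values and initializations before settling on three clusters.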
Second, a simulation-based investigation was conducted to examine the CHK1 inhibitory activity of cytotoxic xanthone derivatives, using a hierarchical workflow of molecular docking, MD simulation, ADME-TOX prediction and MEP analysis. Forty-three xanthone derivatives, along with the reference inhibitor Prexasertib, were docked into the selected CHK1 protein structures 7AKM and 7AKO. MD simulations supported the docking results and confirmed the stability of the studied complexes under physiological conditions. In silico ADME-TOX studies were then used to predict the pharmacokinetic, pharmacodynamic and toxicological properties of eight selected xanthones and of Prexasertib. A quantitative analysis of the electrostatic potential was performed for the lead compound L36 to identify its reactive sites and possible noncovalent interactions. Our study provides new insights into xanthones as CHK1 inhibitors and identifies L36 as a potential drug candidate for further in vivo assays and optimization, laying a solid foundation for the development of CHK1 inhibitors and for cancer drug discovery. To the best of our knowledge, this is the first computational study of xanthones as CHK1 inhibitors.
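Complex stability in MD studies of this kind is commonly judged from the root-mean-square deviation (RMSD) of atomic positions relative to a reference structure, RMSD = sqrt((1/N) Σ_i |r_i - r_i^ref|²). The abstract does not say which stability metric or threshold was used, so this is a hedged sketch: it assumes the frames have already been superposed on the reference (no fitting is done here), and the 2 Å cut-off on the second half of the run is a hypothetical rule of thumb, not the authors' criterion.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsd(frame: Sequence[Coord], reference: Sequence[Coord]) -> float:
    """RMSD (same units as the input, e.g. Å) between two equally ordered atom sets."""
    if len(frame) != len(reference) or not frame:
        raise ValueError("coordinate sets must be non-empty and the same length")
    squared = sum(
        (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        for (x1, y1, z1), (x2, y2, z2) in zip(frame, reference)
    )
    return math.sqrt(squared / len(frame))


def rmsd_series(frames: Sequence[Sequence[Coord]], reference: Sequence[Coord]) -> List[float]:
    """RMSD of every (pre-aligned) trajectory frame against the reference."""
    return [rmsd(f, reference) for f in frames]


def looks_stable(series: Sequence[float], threshold: float = 2.0, tail_fraction: float = 0.5) -> bool:
    """Hypothetical criterion: mean RMSD over the final `tail_fraction` of the run < threshold."""
    start = int(len(series) * (1.0 - tail_fraction))
    tail = series[start:]
    return sum(tail) / len(tail) < threshold
```

Real trajectories would first be least-squares fitted on the protein backbone (for example with the Kabsch algorithm) before ligand RMSD is measured; averaging only the later part of the run avoids penalizing the initial equilibration.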
