28 research outputs found

    Automated QuantMap for rapid quantitative molecular network topology analysis

    Summary: The previously disclosed QuantMap method for grouping chemicals by biological activity used online services for much of the data gathering and some of the numerical analysis. The present work attempts to streamline this process by using local copies of the databases and in-house analysis. Using computational methods similar or identical to those used in the previous work, a qualitatively equivalent result was found in just a few seconds on the same dataset (collection of 18 drugs). We use the user-friendly Galaxy framework to enable users to analyze their own datasets. Hopefully, this will make the QuantMap method more practical and accessible and help achieve its goals to provide substantial assistance to drug repositioning, pharmacology evaluation and toxicology risk assessment. Availability: http:

    Computational Studies of HIV-1 Protease Inhibitors

    Human Immunodeficiency Virus (HIV) is the causative agent of the pandemic disease Acquired Immune Deficiency Syndrome (AIDS). HIV acts to disrupt the immune system, which makes the body susceptible to opportunistic infections. Untreated, AIDS is generally fatal. Twenty years of research by countless scientists around the world has led to the discovery and exploitation of several targets in the replication cycle of HIV. Many lives have been saved, prolonged and improved as a result of this massive effort. One particularly successful target has been the inhibition of HIV protease. In combination with the inhibition of HIV reverse transcriptase, protease inhibitors have helped to reduce viral loads and partially restore the immune system. Unfortunately, viral mutations leading to drug resistance and harmful side-effects of the current medicines have identified the need for new drugs to combat HIV. This study presents computational efforts to understand the interaction of inhibitors with HIV protease. The first part of this study has used molecular modelling and Comparative Molecular Field Analysis (CoMFA) to help explain the structure-activity relationship of a novel series of protease inhibitors. The inhibitors are sulfamide derivatives structurally similar to the cyclic urea candidate drug mozenavir (DMP-450). The central ring of the sulfamides twists to adopt a nonsymmetrical binding mode distinct from that of the cyclic ureas. The energetics of this twist has been studied with ab initio calculations to develop improved empirical force field parameters for use in molecular modelling. The second part of this study has focused on an analysis of the association and dissociation kinetics of a broad collection of HIV protease inhibitors. Quantitative models have been derived using CoMFA which relate the dissociation rate back to the chemical structures. Efforts have also been made to improve the models by systematically varying the parameters used to generate them.

    Tracking the NGS revolution : managing life science research on shared high-performance computing clusters

    Background: Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. Results: The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. Conclusions: Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.
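
The abstract mentions usage and efficiency metrics without defining them. One plausible reading, offered purely as an assumption, is the ratio of resources actually used to resources booked. The `Job` record, its field names, and both functions below are hypothetical and are not taken from the UPPMAX databases.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical accounting record for one cluster job."""
    cores_booked: int
    wall_seconds: float   # elapsed run time
    cpu_seconds: float    # CPU time summed over all booked cores
    mem_booked_gb: float
    mem_peak_gb: float


def core_efficiency(job: Job) -> float:
    """Fraction of the booked core-seconds that did CPU work (assumed metric)."""
    return job.cpu_seconds / (job.cores_booked * job.wall_seconds)


def memory_efficiency(job: Job) -> float:
    """Peak memory use as a fraction of the memory booked (assumed metric)."""
    return job.mem_peak_gb / job.mem_booked_gb


# Invented example: a 16-core booking that keeps only two cores busy for an hour.
job = Job(cores_booked=16, wall_seconds=3600, cpu_seconds=7200,
          mem_booked_gb=128, mem_peak_gb=96)
print(core_efficiency(job), memory_efficiency(job))
```

A job like this one would score low on core efficiency while still using most of its booked memory, which is the kind of pattern the abstract attributes to NGS workloads.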

    Automated QuantMap for rapid quantitative molecular network topology analysis

    SUMMARY: The previously disclosed QuantMap method for grouping chemicals by biological activity used online services for much of the data gathering and some of the numerical analysis. The present work attempts to streamline this process by using local copies of the databases and in-house analysis. Using computational methods similar or identical to those used in the previous work, a qualitatively equivalent result was found in just a few seconds on the same dataset (collection of 18 drugs). We use the user-friendly Galaxy framework to enable users to analyze their own datasets. Hopefully, this will make the QuantMap method more practical and accessible and help achieve its goals to provide substantial assistance to drug repositioning, pharmacology evaluation and toxicology risk assessment. AVAILABILITY: http://galaxy.predpharmtox.org CONTACT: [email protected] or [email protected] SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online

    Rapid Increase in Carriage Rates of Enterobacteriaceae Producing Extended-Spectrum β-Lactamases in Healthy Preschool Children, Sweden

    By collecting and analyzing diapers, we identified a >6-fold increase in carriage of extended-spectrum β-lactamase (ESBL)-producing Enterobacteriaceae for healthy preschool children in Sweden (p<0.0001). We analyzed samples from 334 children and found 56 containing ≥1 ESBL producer. The prevalence in the study population increased from 2.6% in 2010 to 16.8% in 2016 (p<0.0001), and for 6 of the 50 participating preschools, the carriage rate was >40%. Furthermore, 58% of the ESBL producers were multidrug resistant, and transmission of ESBL-producing and non-ESBL-producing strains was observed at several of the preschools. Toddlers appear to be major carriers of ESBL producers in Sweden.
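
The headline figures can be cross-checked from the counts the abstract gives. The short calculation below assumes that the 334 sampled children are the 2016 cohort, which the abstract implies but does not state outright.

```python
# Figures quoted in the abstract.
children_sampled = 334
esbl_positive = 56
prevalence_2010 = 2.6    # percent
prevalence_2016 = 16.8   # percent

# Carriage rate implied by the raw counts, as a percentage.
carriage_from_counts = 100 * esbl_positive / children_sampled

# Fold increase between the two survey years.
fold_increase = prevalence_2016 / prevalence_2010

print(f"carriage from counts: {carriage_from_counts:.1f}%")
print(f"fold increase: {fold_increase:.2f}")
```

The counts reproduce the reported 2016 prevalence, and the ratio of the two prevalences is consistent with the ">6-fold" claim.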

    Migrating to Long-Read Sequencing for Clinical Routine BCR-ABL1 TKI Resistance Mutation Screening

    OBJECTIVE: The aim of this project was to implement long-read sequencing for BCR-ABL1 TKI resistance mutation screening in a clinical setting for patients undergoing treatment for chronic myeloid leukemia. MATERIALS AND METHODS: Processes were established for registering and transferring samples from the clinic to an academic sequencing facility for long-read sequencing. An automated analysis pipeline for detecting mutations was established, and an information system was implemented comprising features for data management, analysis and visualization. Clinical validation was performed by identifying BCR-ABL1 TKI resistance mutations by Sanger and long-read sequencing in parallel. The developed software is available as open source via GitHub at https://github.com/pharmbio/clamp RESULTS: The information system enabled traceable transfer of samples from the clinic to the sequencing facility, robust and automated analysis of the long-read sequence data, and communication of results from sequence analysis in a reporting format that could be easily interpreted and acted upon by clinical experts. In a validation study, all 17 resistance mutations found by Sanger sequencing were also detected by long-read sequencing. An additional 16 mutations were found only by long-read sequencing, all of them with frequencies below the limit of detection for Sanger sequencing. The clonal distributions of co-existing mutations were automatically resolved through the long-read data analysis. After the implementation and validation, the clinical laboratory switched their routine protocol from using Sanger to long-read sequencing for this application. CONCLUSIONS: Long-read sequencing delivers results with higher sensitivity compared to Sanger sequencing and enables earlier detection of emerging TKI resistance mutations. The developed processes, analysis workflow, and software components lower barriers for adoption and could be extended to other applications.
    KEYWORDS: Long-read sequencing, SMRT sequencing, drug resistance, chronic myeloid leukemia, BCR-ABL1, CML, mutation screening

    Efficient iterative virtual screening with Apache Spark and conformal prediction.

    BACKGROUND: Docking and scoring large libraries of ligands against target proteins forms the basis of structure-based virtual screening. The problem is trivially parallelizable, and calculations are generally carried out on computer clusters or on large workstations in a brute force manner, by docking and scoring all available ligands. CONTRIBUTION: In this study we propose a strategy that is based on iteratively docking a set of ligands to form a training set, training a ligand-based model on this set, and predicting the remainder of the ligands to exclude those predicted as 'low-scoring' ligands. Then, another set of ligands is docked, the model is retrained and the process is repeated until a certain model efficiency level is reached. Thereafter, the remaining ligands are docked or excluded based on this model. We use SVM and conformal prediction to deliver valid prediction intervals for ranking the predicted ligands, and Apache Spark to parallelize both the docking and the modeling. RESULTS: We show on 4 different targets that conformal prediction based virtual screening (CPVS) is able to reduce the number of docked molecules by 62.61% while retaining an accuracy for the top 30 hits of 94% on average and a speedup of 3.7. The implementation is available as open source via GitHub (https://github.com/laeeq80/spark-cpvs) and can be run on high-performance computers as well as on cloud resources.
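
The dock, train, predict and exclude loop described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the CPVS implementation: docking is replaced by a hypothetical `mock_dock` function, each ligand has a single invented numeric feature, the SVM is swapped for a nearest-centroid nonconformity score, Spark is left out, and the loop runs for a fixed number of rounds instead of stopping at a model efficiency level. All function and parameter names are made up for the example.

```python
import random
from statistics import mean


def mock_dock(x, rng):
    """Hypothetical stand-in for a docking run: noisy score, higher is better."""
    return x + rng.gauss(0.0, 0.3)


def p_value(calibration, score):
    """Conformal p-value: share of calibration scores at least as nonconforming."""
    return (sum(c >= score for c in calibration) + 1) / (len(calibration) + 1)


def fit(labelled):
    """Fit class centroids on one half; keep calibration scores from the other."""
    half = len(labelled) // 2
    train, calib = labelled[:half], labelled[half:]
    model = {}
    for label in ("high", "low"):
        xs = [x for x, y in train if y == label]
        cs = [x for x, y in calib if y == label]
        if not xs or not cs:
            return None  # a class is missing, so skip pruning this round
        centre = mean(xs)
        model[label] = (centre, [abs(x - centre) for x in cs])
    return model


def predicted_low(model, x, eps):
    """True when the prediction set at significance eps is exactly {'low'}."""
    p = {label: p_value(calibration, abs(x - centre))
         for label, (centre, calibration) in model.items()}
    return p["high"] <= eps < p["low"]


def iterative_screen(features, batch=200, cutoff=1.0, eps=0.2, rounds=5, seed=0):
    rng = random.Random(seed)
    remaining = list(range(len(features)))
    rng.shuffle(remaining)
    docked = {}  # ligand index -> docking score
    for _ in range(rounds):
        # Dock the next batch and add it to the training data.
        new, remaining = remaining[:batch], remaining[batch:]
        for i in new:
            docked[i] = mock_dock(features[i], rng)
        labelled = [(features[i], "high" if s >= cutoff else "low")
                    for i, s in docked.items()]
        rng.shuffle(labelled)
        # Retrain, then drop ligands confidently predicted as low-scoring.
        model = fit(labelled)
        if model is not None:
            remaining = [i for i in remaining
                         if not predicted_low(model, features[i], eps)]
    for i in remaining:  # dock whatever the model could not rule out
        docked[i] = mock_dock(features[i], rng)
    return docked


rng = random.Random(1)
features = [rng.gauss(0.0, 1.0) for _ in range(2000)]
docked = iterative_screen(features)
print(f"docked {len(docked)} of {len(features)} ligands")
```

A ligand is excluded only when the 'high' label is rejected and the 'low' label is not, so ligands the model is unsure about are still docked. That conservative rule is a design choice of this sketch.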

    Large-scale ligand-based predictive modelling using support vector machines

    The increasing size of datasets in drug discovery makes it challenging to build robust and accurate predictive models within a reasonable amount of time. In order to investigate the effect of dataset sizes on predictive performance and modelling time, ligand-based regression models were trained on open datasets of varying sizes of up to 1.2 million chemical structures. For modelling, two implementations of support vector machines (SVM) were used. Chemical structures were described by the signatures molecular descriptor. Results showed that for the larger datasets, the LIBLINEAR SVM implementation performed on par with the well-established libsvm with a radial basis function kernel, but with dramatically less time for model building even on modest computer resources. Using a non-linear kernel proved to be infeasible for large data sizes, even with substantial computational resources on a computer cluster. To deploy the resulting models, we extended the Bioclipse decision support framework to support models from LIBLINEAR and made our models of logD and solubility available from within Bioclipse

    Predicting protein network topology clusters from chemical structure using deep learning

    Comparing chemical structures to infer protein targets and functions is a common approach, but basing comparisons on chemical similarity alone can be misleading. Here we present a methodology for predicting target protein clusters using deep neural networks. The model is trained on clusters of compounds based on similarities calculated from combined compound-protein and protein-protein interaction data using a network topology approach. We compare several deep learning architectures including both convolutional and recurrent neural networks. The best performing method, the recurrent neural network architecture MolPMoFiT, achieved an F1 score approaching 0.9 on a held-out test set of 8907 compounds. In addition, in-depth analysis on a set of eleven well-studied chemical compounds with known functions showed that predictions were justifiable for all but one of the chemicals. Four of the compounds, similar in their molecular structure but with dissimilarities in their function, revealed advantages of our method compared to using chemical similarity

    RDF Dataset for article: A confidence predictor for logD using conformal regression and a support-vector machine

    RDF dataset described in article: "A confidence predictor for logD using conformal regression and a support-vector machine" (Manuscript in preparation). The dataset contains conformal logD values at 90% confidence level, computed for 91M compounds from PubChem, in RDF format. The .hdt.gz version contains the dataset in RDF HDT format (http://www.rdfhdt.org/), compressed with tar and gzip. The archive contains both the .hdt file, and an index file, generated by the hdtSearch C++ tool. The .ttl.gz file is a gzipped file in RDF Turtle format (https://www.w3.org/TR/turtle/)
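
A gzipped Turtle file of this size is usually inspected as a stream and not unpacked to disk first. The sketch below previews the first lines of a `.ttl.gz` file using only the Python standard library. The helper name `head_turtle`, the sample file, and the `ex:` prefix and predicates are all invented for illustration and do not reflect the dataset's actual vocabulary.

```python
import gzip
import os
import tempfile


def head_turtle(path, n=5):
    """Return the first n non-blank lines of a gzipped Turtle file, read as a stream."""
    lines = []
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                lines.append(line.rstrip("\n"))
                if len(lines) == n:
                    break
    return lines


# Tiny stand-in file; the prefix and predicate names are made up.
sample = (
    "@prefix ex: <http://example.org/> .\n"
    "\n"
    "ex:compound1 ex:logDLower 1.2 ;\n"
    "    ex:logDUpper 2.9 .\n"
)
demo_path = os.path.join(tempfile.mkdtemp(), "sample.ttl.gz")
with gzip.open(demo_path, mode="wt", encoding="utf-8") as handle:
    handle.write(sample)

preview = head_turtle(demo_path, n=2)
print(preview)
```

The same function works on a real archive by passing its path, since `gzip.open` decompresses on the fly and reading stops after `n` lines.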