28 research outputs found

    INDIGO - INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles.

    Background: The next generation sequencing technologies substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. Results: We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources starting from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is aimed at being populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea - extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of utilizing the system to gain new insights into specific aspects on the unique lifestyle and adaptations of these organisms to extreme environments. Conclusions: We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is aimed to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline. 
    The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo. IA and AAK were supported by the KAUST CBRC Base Fund of VBB. WBa and VBB were supported by the KAUST Base Funds of VBB. US was supported by the KAUST Base Fund of US. This study was partly supported by the Saudi Economic and Development Company (SEDCO) Research Excellence award to US and VBB. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

    The implications of model–informed drug discovery and development for tuberculosis

    The research leading to these results received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement n°115337, the resources of which comprise financial contributions from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in kind contribution. Despite promising advances in the field and highly effective first-line treatment, an estimated 9.6 million people are still infected with tuberculosis (TB). Innovative methods are required to effectively transition the growing number of compounds into novel combination regimens. However, progression of compounds into patients occurs despite the lack of clear understanding of the pharmacokinetic-pharmacodynamic (PK/PD) relations. The PreDiCT-TB consortium was established in response to the existing gaps in TB drug development. The aim of the consortium is to develop new preclinical tools in concert with an in silico model-based approach, grounded in PK/PD principles. Here, we highlight the potential impact of such an integrated framework on various stages in TB drug development and on the dose rationale for drug combinations. Postprint. Peer reviewed.

    Creating reproducible pharmacogenomic analysis pipelines

    The field of Pharmacogenomics presents great challenges for researchers that are willing to make their studies reproducible and shareable. This is attributed to the generation of large volumes of high-throughput multimodal data, and the lack of standardized workflows that are robust, scalable, and flexible to perform large-scale analyses. To address this issue, we developed pharmacogenomic workflows in the Common Workflow Language to process two breast cancer datasets in a reproducible and transparent manner. Our pipelines combine both pharmacological and molecular profiles into a portable data object that can be used for future analyses in cancer research. Our data objects and workflows are shared on Harvard Dataverse and Code Ocean where they have been assigned a unique Digital Object Identifier, providing a level of data provenance and a persistent location to access and share our data with the community. Document type: Preprint

    Evaluation of statistical approaches for association testing in noisy drug screening data

    Background: Identifying associations among biological variables is a major challenge in modern quantitative biological research, particularly given the systemic and statistical noise endemic to biological systems. Drug sensitivity data has proven to be a particularly challenging field for identifying associations to inform patient treatment. Results: To address this, we introduce two semi-parametric variations on the commonly used concordance index: the robust concordance index (rCI) and the kernelized concordance index (kCI), which incorporate measurements about the noise distribution from the data. We demonstrate that common statistical tests applied to the concordance index and its variations fail to control for false positives, and introduce efficient implementations to compute p-values using adaptive permutation testing. We then evaluate the statistical power of these coefficients under simulation and compare with Pearson and Spearman correlation coefficients. Finally, we evaluate the various statistics in matching drugs across pharmacogenomic datasets. Conclusions: We observe that the rCI and kCI are better powered than the concordance index in simulation and show some improvement on real data. Surprisingly, we observe that the Pearson correlation was the most robust to measurement noise among the different metrics. Peer reviewed.
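
    As a rough illustration of the statistic discussed in this abstract, the sketch below computes a plain concordance index and a permutation p-value for it. It is not the authors' rCI or kCI, and it uses a fixed number of permutations rather than the adaptive scheme the abstract mentions. The function names, the choice to skip tied pairs, and the two-sided test around 0.5 are all assumptions made for this example.

```python
import random


def concordance_index(x, y):
    """Fraction of comparable pairs that x and y order the same way.

    Assumption for this sketch: pairs tied in either variable are skipped.
    """
    concordant = 0
    comparable = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 or dy == 0:
                continue  # tied pair: not comparable
            comparable += 1
            if (dx > 0) == (dy > 0):
                concordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for the concordance index.

    Plain fixed-count permutation test (hypothetical helper), with the
    usual +1 correction so the p-value is never exactly zero.
    """
    rng = random.Random(seed)
    observed = abs(concordance_index(x, y) - 0.5)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(concordance_index(x, y_perm) - 0.5) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

    A value of 0.5 corresponds to no association, which is why the test measures distance from 0.5 in both directions.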

    Creating reproducible pharmacogenomic analysis pipelines

    This dataset contains the following data that were generated through our reproducible PharmacoGx CWL workflows: (1) GRAY (2013, 2017) and UHNBreast (2017, 2019) PharmacoSets (PSets); (2) a Research Object for each respective PSet. A PSet is a data object that possesses cell line and drug curations, processed drug sensitivity, and molecular profile data for a pharmacogenomic dataset. We have created PSets for multiple updates of the Oregon Health and Science University (OHSU) breast cancer screen generated within Dr. Joe Gray's laboratory, and the University Health Network (UHN) breast cancer screen (UHNBreast).

    MOESM1 of DASPfind: new efficient method to predict drug–target interactions

    Additional file 1. This file includes the following: a) pseudocode of the DASPfind algorithm; b) 10-fold cross validation for different methods; c) detailed comparison between NRWRH and DASPfind; d) all ‘top 1’ predictions for each data set used in our study.

    Mining Chemical Activity Status from High-Throughput Screening Assays

    High-throughput screening (HTS) experiments provide a valuable resource that reports biological activity of numerous chemical compounds relative to their molecular targets. Building computational models that accurately predict such activity status (active vs. inactive) in specific assays is a challenging task given the large volume of data and frequently small proportion of active compounds relative to the inactive ones. We developed a method, DRAMOTE, to predict activity status of chemical compounds in HTP activity assays. For a class of HTP assays, our method achieves considerably better results than the current state-of-the-art solutions. We achieved this by modification of a minority oversampling technique. To demonstrate that DRAMOTE is performing better than the other methods, we performed a comprehensive comparison analysis with several other methods and evaluated them on data from 11 PubChem assays through 1,350 experiments that involved approximately 500,000 interactions between chemicals and their target proteins. As an example of potential use, we applied DRAMOTE to develop robust models for predicting FDA approved drugs that have high probability to interact with the thyroid stimulating hormone receptor (TSHR) in humans. Our findings are further partially and indirectly supported by 3D docking results and literature information. The results based on approximately 500,000 interactions suggest that DRAMOTE has performed the best and that it can be used for developing robust virtual screening models. The datasets and implementation of all solutions are available as a MATLAB toolbox online at http://www.cbrc.kaust.edu.sa/dramote and can be found on Figshare.

    Illustration of generating synthetic instances.

    A) SMOTE generates the light blue samples by interpolation between a randomly chosen minority sample and its k-nearest neighbors. B) DRAMOTE generates the light blue samples by choosing a minority sample based on its importance (i.e. contribution to precision) and the direction towards a safe region. A minority sample (red colored) that is very close to the majority negative circles will probably be misclassified as a negative one and hence, it should get more support compared to the green colored minority samples. Once a minority sample is chosen, another point needs to be chosen for interpolation. The direction of interpolation can be controlled by choosing a nearest neighbor which is not overlapping with the negative class. This, in turn, helps in providing support for the red colored point while not harming the classifier performance in its surrounding region.
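
    The interpolation step described for panel A can be sketched as follows. This covers only the basic SMOTE idea (a random minority sample, one of its k nearest minority neighbors, and a random point on the segment between them). It does not implement DRAMOTE's importance-based sample choice or its safe-region direction, and the function name, parameters, and Euclidean distance are assumptions made for this example.

```python
import math
import random


def smote_samples(minority, k=2, n_new=5, seed=0):
    """Generate synthetic minority points by SMOTE-style interpolation.

    minority: list of equal-length numeric tuples (at least two points).
    Hypothetical helper for illustration, not the DRAMOTE toolbox code.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick a minority sample uniformly at random.
        base = rng.choice(minority)
        # Find its k nearest minority neighbours (Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        # Interpolate at a random position between the two points.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (q - b) for b, q in zip(base, neighbour))
        )
    return synthetic
```

    Because each synthetic point lies on a segment between two existing minority points, it always falls inside the bounding box of the minority class. Plain SMOTE does not check whether that segment crosses a region dominated by the negative class, which is the issue panel B addresses.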

    Boxplot over free energy of binding and RMSD values for experimental, random and DRAMOTE docking results.

    The random set is based on choosing 10 random drugs from the approved drugs list in the DrugBank database. The experimental set includes the top 10 drugs as listed in the original BioAssay AID 938 of the PubChem database.

    Workflow of annotation process and data warehousing.

    Here, the section marked (A) shows the steps in the annotation process. Section (B) shows a Perl-based conversion of annotations into an XML schema, validated using the class attributes and data types defined in the genomic model. Finally, section (C) shows the data warehouse development steps.