Search CORE

28 research outputs found

INDIGO - INtegrated Data Warehouse of MIcrobial GenOmes with Examples from the Red Sea Extremophiles.

Author: Alam Intikhab
Antunes André
Ba Alawi Wail
Bajic Vladimir B
Kalkatawi Manal
Kamau Allan Anthony
Stingl Ulrich
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013
Field of study

Background: Next-generation sequencing technologies have substantially increased the throughput of microbial genome sequencing. To functionally annotate newly sequenced microbial genomes, a variety of experimental and computational methods are used. Integration of information from different sources is a powerful approach to enhance such annotation. Functional analysis of microbial genomes, necessary for downstream experiments, crucially depends on this annotation, but it is hampered by the current lack of suitable information integration and exploration systems for microbial genomes. Results: We developed a data warehouse system (INDIGO) that enables the integration of annotations for exploration and analysis of newly sequenced microbial genomes. INDIGO offers an opportunity to construct complex queries and combine annotations from multiple sources, from genomic sequence to protein domain, gene ontology and pathway levels. This data warehouse is intended to be populated with information from genomes of pure cultures and uncultured single cells of Red Sea bacteria and Archaea. Currently, INDIGO contains information from Salinisphaera shabanensis, Haloplasma contractile, and Halorhabdus tiamatea, extremophiles isolated from deep-sea anoxic brine lakes of the Red Sea. We provide examples of using the system to gain new insights into specific aspects of the unique lifestyle and adaptations of these organisms to extreme environments. Conclusions: We developed a data warehouse system, INDIGO, which enables comprehensive integration of information from various resources to be used for annotation, exploration and analysis of microbial genomes. It will be regularly updated and extended with new genomes. It is intended to serve as a resource dedicated to the Red Sea microbes. In addition, through INDIGO, we provide our Automatic Annotation of Microbial Genomes (AAMG) pipeline.
The INDIGO web server is freely available at http://www.cbrc.kaust.edu.sa/indigo. IA and AAK were supported from the KAUST CBRC Base Fund of VBB. WBa and VBB were supported from the KAUST Base Funds of VBB. US was supported by the KAUST Base Fund of US. This study was partly supported by the Saudi Economic and Development Company (SEDCO) Research Excellence award to US and VBB. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

CiteSeerX

Public Library of Science (PLOS)

Universidade do Minho: RepositoriUM

Crossref

Directory of Open Access Journals

Edge Hill University Research Information Repository

PubMed Central

The implications of model–informed drug discovery and development for tuberculosis

Author: Magbubah Essack (419839)
Moataz Afeef (837121)
Othman Soufan (696313)
Panos Kalnis (463815)
Vladimir Bajic (3479747)
Wail Ba-Alawi (3761047)
Publication venue
Publication date: 01/01/2016
Field of study

The research leading to these results received support from the Innovative Medicines Initiative Joint Undertaking under grant agreement n°115337, the resources of which comprise financial contributions from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies' in-kind contribution. Despite promising advances in the field and highly effective first-line treatment, an estimated 9.6 million people are still infected with tuberculosis (TB). Innovative methods are required to effectively transition the growing number of compounds into novel combination regimens. However, progression of compounds into patients occurs despite the lack of clear understanding of the pharmacokinetic-pharmacodynamic (PK/PD) relations. The PreDiCT-TB consortium was established in response to the existing gaps in TB drug development. The aim of the consortium is to develop new preclinical tools in concert with an in silico model-based approach, grounded in PK/PD principles. Here, we highlight the potential impact of such an integrated framework on various stages in TB drug development and on the dose rationale for drug combinations. Postprint. Peer reviewed.

Crossref

University of St. Andrews - Pure

St Andrews Research Repository

FigShare

Creating reproducible pharmacogenomic analysis pipelines

Author: Ba-alawi Wail
Haibe-Kains Benjamin
Mammoliti Anthony
Safikhani Zhaleh
Smirnov Petr
Publication venue
Publication date: 02/09/2019
Field of study

The field of pharmacogenomics presents great challenges for researchers who are willing to make their studies reproducible and shareable. This is attributed to the generation of large volumes of high-throughput multimodal data, and the lack of standardized workflows that are robust, scalable, and flexible enough to perform large-scale analyses. To address this issue, we developed pharmacogenomic workflows in the Common Workflow Language to process two breast cancer datasets in a reproducible and transparent manner. Our pipelines combine both pharmacological and molecular profiles into a portable data object that can be used for future analyses in cancer research. Our data objects and workflows are shared on Harvard Dataverse and Code Ocean, where they have been assigned a unique Digital Object Identifier, providing a level of data provenance and a persistent location to access and share our data with the community. Document type: Preprint

Scipedia

Evaluation of statistical approaches for association testing in noisy drug screening data

Author: Aittokallio Tero
Ba-alawi Wail
Hafner Marc
Haibe-Kains Benjamin
Khodakarami Farnoosh
Lin Eva
Martin Scott
Ortmann Janosch
Safikhani Zhaleh
Smirnov Petr
Smith Ian
Yu Yihong
Publication venue
Publication date: 01/01/2022
Field of study

Background: Identifying associations among biological variables is a major challenge in modern quantitative biological research, particularly given the systemic and statistical noise endemic to biological systems. Drug sensitivity data has proven to be a particularly challenging field for identifying associations to inform patient treatment. Results: To address this, we introduce two semi-parametric variations on the commonly used concordance index: the robust concordance index and the kernelized concordance index (rCI, kCI), which incorporate measurements about the noise distribution from the data. We demonstrate that common statistical tests applied to the concordance index and its variations fail to control for false positives, and introduce efficient implementations to compute p-values using adaptive permutation testing. We then evaluate the statistical power of these coefficients under simulation and compare with Pearson and Spearman correlation coefficients. Finally, we evaluate the various statistics in matching drugs across pharmacogenomic datasets. Conclusions: We observe that the rCI and kCI are better powered than the concordance index in simulation and show some improvement on real data. Surprisingly, we observe that the Pearson correlation was the most robust to measurement noise among the different metrics. Peer reviewed.
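To make the baseline statistic in this abstract concrete, the following is a minimal sketch of the plain concordance index with a permutation p-value. It is not the authors' implementation: it covers neither the rCI nor the kCI, it uses a fixed number of permutations rather than the adaptive scheme the abstract mentions, and the function names, the tie-skipping convention, and the two-sided test around 0.5 are all assumptions made for illustration.

```python
import random


def concordance_index(x, y):
    """Fraction of comparable pairs that x and y order the same way.

    Pairs tied in either variable are skipped (one common convention,
    assumed here).
    """
    concordant = comparable = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 or dy == 0:
                continue  # tied pair: not comparable
            comparable += 1
            if dx * dy > 0:
                concordant += 1
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def permutation_p_value(x, y, n_perm=1000, rng=None):
    """Two-sided permutation p-value for the concordance index.

    Fixed-size permutation test (not adaptive): y is shuffled n_perm
    times and the distance of the index from 0.5 is compared with the
    observed distance.
    """
    rng = rng or random.Random()
    observed = abs(concordance_index(x, y) - 0.5)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(concordance_index(x, y_perm) - 0.5) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

A call such as `permutation_p_value(drug_a, drug_b, n_perm=2000)` would compare two hypothetical sensitivity vectors measured on the same cell lines.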

PubMed Central

Helsingin yliopiston digitaalinen arkisto

NORA - Norwegian Open Research Archives

Creating reproducible pharmacogenomic analysis pipelines

Author: Ba-Alawi Wail
Haibe-Kains Benjamin
Mammoliti Anthony
Safikhani Zhaleh
Smirnov Petr
Publication venue: Harvard Dataverse
Publication date
Field of study

This dataset contains the following data that were generated through our reproducible PharmacoGx CWL workflows:
1. GRAY (2013, 2017), UHNBreast (2017, 2019) PharmacoSet (PSet)
2. Research Object for each respective PSet
A PSet is a data object that possesses cell line and drug curations, processed drug sensitivity, and molecular profile data for a pharmacogenomic dataset. We have created PSets for multiple updates of the Oregon Health and Science University (OHSU) breast cancer screen generated within Dr. Joe Gray's laboratory, and the University Health Network (UHN) breast cancer screen (UHNBreast).

Harvard Dataverse Network

MOESM1 of DASPfind: new efficient method to predict drug–target interactions

Author: Magbubah Essack (419839)
Othman Soufan (696313)
Panos Kalnis (463815)
Vladimir Bajic (3479747)
Wail Ba-alawi (837120)
Publication venue
Publication date
Field of study

Additional file 1. This file includes the following: a) Pseudocode of DASPfind algorithm; b) 10-fold cross validation for different methods; c) detailed comparison between NRWRH and DASPfind; d) all “top 1” predictions for each data set used in our study.

FigShare

Mining Chemical Activity Status from High-Throughput Screening Assays

Author: Magbubah Essack (419839)
Moataz Afeef (837121)
Othman Soufan (696313)
Panos Kalnis (463815)
Valentin Rodionov (837122)
Vladimir B. Bajic (8687)
Wail Ba-alawi (837120)
Publication venue
Publication date: 01/01/2015
Field of study

High-throughput screening (HTS) experiments provide a valuable resource that reports biological activity of numerous chemical compounds relative to their molecular targets. Building computational models that accurately predict such activity status (active vs. inactive) in specific assays is a challenging task given the large volume of data and frequently small proportion of active compounds relative to the inactive ones. We developed a method, DRAMOTE, to predict activity status of chemical compounds in HTP activity assays. For a class of HTP assays, our method achieves considerably better results than the current state-of-the-art solutions. We achieved this by modification of a minority oversampling technique. To demonstrate that DRAMOTE is performing better than the other methods, we performed a comprehensive comparison analysis with several other methods and evaluated them on data from 11 PubChem assays through 1,350 experiments that involved approximately 500,000 interactions between chemicals and their target proteins. As an example of potential use, we applied DRAMOTE to develop robust models for predicting FDA approved drugs that have high probability to interact with the thyroid stimulating hormone receptor (TSHR) in humans. Our findings are further partially and indirectly supported by 3D docking results and literature information. The results based on approximately 500,000 interactions suggest that DRAMOTE has performed the best and that it can be used for developing robust virtual screening models. The datasets and implementation of all solutions are available as a MATLAB toolbox online at http://www.cbrc.kaust.edu.sa/dramote and can be found on Figshare.

Directory of Open Access Journals

FigShare

Illustration of generating synthetic instances.

Author: Magbubah Essack (419839)
Moataz Afeef (837121)
Othman Soufan (696313)
Panos Kalnis (463815)
Valentin Rodionov (837122)
Vladimir B. Bajic (8687)
Wail Ba-alawi (837120)
Publication venue
Publication date
Field of study

A) SMOTE generates the light blue samples by interpolation between a randomly chosen minority sample and its k-nearest neighbors. B) DRAMOTE generates the light blue samples by choosing a minority sample based on its importance (i.e. contribution to precision) and the direction towards a safe region. A minority sample (red colored) that is very close to the majority negative circles will probably be misclassified as a negative one and hence should get more support than the green colored minority samples. Once a minority sample is chosen, another point needs to be chosen for interpolation. The direction of interpolation can be controlled by choosing a nearest neighbor that does not overlap with the negative class. This, in turn, helps in providing support for the red colored point while not harming the classifier performance in its surrounding region.
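The interpolation step described for panel A can be sketched as follows. This is a minimal illustration of classic SMOTE-style sampling only, under stated assumptions: points are plain coordinate tuples, distance is Euclidean, and the function name `smote_sample` is hypothetical. It does not implement DRAMOTE's importance-based sample selection or its choice of a neighbor away from the negative class.

```python
import math
import random


def smote_sample(minority, k=3, rng=None):
    """Create one synthetic minority point by SMOTE-style interpolation.

    minority: list of equal-length coordinate tuples (at least two points).
    k: number of nearest minority neighbours considered for the base point.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority points")
    rng = rng or random.Random()
    base = rng.choice(minority)  # uniformly random base sample
    # k nearest minority neighbours of the base point, excluding itself
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment, in [0, 1)
    # The synthetic point lies on the segment from base to neighbour.
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
```

Calling `smote_sample(points, k=5)` repeatedly would yield as many synthetic minority instances as needed; a DRAMOTE-like variant would replace the two `rng.choice` calls with importance-weighted and direction-aware selection.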

FigShare

Boxplot over free energy of binding and RMSD values for experimental, random and DRAMOTE docking results.

Author: Magbubah Essack (419839)
Moataz Afeef (837121)
Othman Soufan (696313)
Panos Kalnis (463815)
Valentin Rodionov (837122)
Vladimir B. Bajic (8687)
Wail Ba-alawi (837120)
Publication venue
Publication date
Field of study

The random set is based on choosing 10 random drugs from the approved drugs list in the DrugBank database. The experimental set includes the top 10 drugs as listed in the original BioAssay AID 938 of the PubChem database.

FigShare

Workflow of annotation process and data warehousing.

Author: Allan Anthony Kamau (494206)
André Antunes (5654182)
Intikhab Alam (40779)
Manal Kalkatawi (494208)
Ulrich Stingl (119514)
Vladimir B. Bajic (8687)
Wail Ba alawi (494207)
Publication venue
Publication date
Field of study

Here, the section marked (A) shows the steps in the annotation process. Section (B) shows a Perl-based conversion of annotations into an XML schema, validated using the class attributes and data types defined in the genomic model. Finally, section (C) shows the steps of the data warehouse development process.

FigShare