35 research outputs found

    Similarities and Differences Between Variants Called With Human Reference Genome HG19 or HG38

    Get PDF
    Background: Reference genome selection is a prerequisite for successful analysis of next generation sequencing (NGS) data. Current practice employs one of the two most recent human reference genome versions: HG19 or HG38. To date, the impact of genome version on SNV identification has not been rigorously assessed. Results: We conducted analysis comparing the SNVs identified based on HG19 vs HG38, leveraging whole genome sequencing (WGS) data from the genome-in-a-bottle (GIAB) project. First, SNVs were called using 26 different bioinformatics pipelines with either HG19 or HG38. Next, two tools were used to convert the called SNVs between HG19 and HG38. Lastly we calculated conversion rates, analyzed discordant rates between SNVs called with HG19 or HG38, and characterized the discordant SNVs. Conclusions: The conversion rates from HG38 to HG19 (average 95%) were lower than the conversion rates from HG19 to HG38 (average 99%). The conversion rates varied slightly among the various calling pipelines. Around 1.5% SNVs were discordantly converted between HG19 or HG38. The conversions from HG38 to HG19 had more SNVs which failed conversion and more discordant SNVs than the opposite conversion (HG19 to HG38). Most of the discordant SNVs had low read depth, were low confidence SNVs as defined by GIAB, and/or were predominated by G/C alleles (52% observed versus 42% expected)

    Modeling Chemical Interaction Profiles: II. Molecular Docking, Spectral Data-Activity Relationship, and Structure-Activity Relationship Models for Potent and Weak Inhibitors of Cytochrome P450 CYP3A4 Isozyme

    Get PDF
    Polypharmacy increasingly has become a topic of public health concern, particularly as the U.S. population ages. Drug labels often contain insufficient information to enable the clinician to safely use multiple drugs. Because many of the drugs are bio-transformed by cytochrome P450 (CYP) enzymes, inhibition of CYP activity has long been associated with potentially adverse health effects. In an attempt to reduce the uncertainty pertaining to CYP-mediated drug-drug/chemical interactions, an interagency collaborative group developed a consensus approach to prioritizing information concerning CYP inhibition. The consensus involved computational molecular docking, spectral data-activity relationship (SDAR), and structure-activity relationship (SAR) models that addressed the clinical potency of CYP inhibition. The models were built upon chemicals that were categorized as either potent or weak inhibitors of the CYP3A4 isozyme. The categorization was carried out using information from clinical trials because currently available in vitro high-throughput screening data were not fully representative of the in vivo potency of inhibition. During categorization it was found that compounds, which break the Lipinski rule of five by molecular weight, were about twice more likely to be inhibitors of CYP3A4 compared to those, which obey the rule. Similarly, among inhibitors that break the rule, potent inhibitors were 2–3 times more frequent. The molecular docking classification relied on logistic regression, by which the docking scores from different docking algorithms, CYP3A4 three-dimensional structures, and binding sites on them were combined in a unified probabilistic model. The SDAR models employed a multiple linear regression approach applied to binned 1D 13C-NMR and 1D 15N-NMR spectral descriptors. Structure-based and physical-chemical descriptors were used as the basis for developing SAR models by the decision forest method. Thirty-three potent inhibitors and 88 weak inhibitors of CYP3A4 were used to train the models. Using these models, a synthetic majority rules consensus classifier was implemented, while the confidence of estimation was assigned following the percent agreement strategy. The classifier was applied to a testing set of 120 inhibitors not included in the development of the models. Five compounds of the test set, including known strong inhibitors dalfopristin and tioconazole, were classified as probable potent inhibitors of CYP3A4. Other known strong inhibitors, such as lopinavir, oltipraz, quercetin, raloxifene, and troglitazone, were among 18 compounds classified as plausible potent inhibitors of CYP3A4. The consensus estimation of inhibition potency is expected to aid in the nomination of pharmaceuticals, dietary supplements, environmental pollutants, and occupational and other chemicals for in-depth evaluation of the CYP3A4 inhibitory activity. It may serve also as an estimate of chemical interactions via CYP3A4 metabolic pharmacokinetic pathways occurring through polypharmacy and nutritional and environmental exposures to chemical mixtures

    Assessing Reproducibility of Inherited Variants Detected With Short-Read Whole Genome Sequencing

    Get PDF
    Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when \u3e 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30×. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS

    Assessing batch effects of genotype calling algorithm BRLMM for the Affymetrix GeneChip Human Mapping 500 K array set using 270 HapMap samples

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Genome-wide association studies (GWAS) aim to identify genetic variants (usually single nucleotide polymorphisms [SNPs]) across the entire human genome that are associated with phenotypic traits such as disease status and drug response. Highly accurate and reproducible genotype calling are paramount since errors introduced by calling algorithms can lead to inflation of false associations between genotype and phenotype. Most genotype calling algorithms currently used for GWAS are based on multiple arrays. Because hundreds of gigabytes (GB) of raw data are generated from a GWAS, the samples are typically partitioned into batches containing subsets of the entire dataset for genotype calling. High call rates and accuracies have been achieved. However, the effects of batch size (i.e., number of chips analyzed together) and of batch composition (i.e., the choice of chips in a batch) on call rate and accuracy as well as the propagation of the effects into significantly associated SNPs identified have not been investigated. In this paper, we analyzed both the batch size and batch composition for effects on the genotype calling algorithm BRLMM using raw data of 270 HapMap samples analyzed with the Affymetrix Human Mapping 500 K array set.</p> <p>Results</p> <p>Using data from 270 HapMap samples interrogated with the Affymetrix Human Mapping 500 K array set, three different batch sizes and three different batch compositions were used for genotyping using the BRLMM algorithm. Comparative analysis of the calling results and the corresponding lists of significant SNPs identified through association analysis revealed that both batch size and composition affected genotype calling results and significantly associated SNPs. Batch size and batch composition effects were more severe on samples and SNPs with lower call rates than ones with higher call rates, and on heterozygous genotype calls compared to homozygous genotype calls.</p> <p>Conclusion</p> <p>Batch size and composition affect the genotype calling results in GWAS using BRLMM. The larger the differences in batch sizes, the larger the effect. The more homogenous the samples in the batches, the more consistent the genotype calls. The inconsistency propagates to the lists of significantly associated SNPs identified in downstream association analysis. Thus, uniform and large batch sizes should be used to make genotype calls for GWAS. In addition, samples of high homogeneity should be placed into the same batch.</p

    Assessing reproducibility of inherited variants detected with short-read whole genome sequencing

    Get PDF
    Background: Reproducible detection of inherited variants with whole genome sequencing (WGS) is vital for the implementation of precision medicine and is a complicated process in which each step affects variant call quality. Systematically assessing reproducibility of inherited variants with WGS and impact of each step in the process is needed for understanding and improving quality of inherited variants from WGS. Results: To dissect the impact of factors involved in detection of inherited variants with WGS, we sequence triplicates of eight DNA samples representing two populations on three short-read sequencing platforms using three library kits in six labs and call variants with 56 combinations of aligners and callers. We find that bioinformatics pipelines (callers and aligners) have a larger impact on variant reproducibility than WGS platform or library preparation. Single-nucleotide variants (SNVs), particularly outside difficult-to-map regions, are more reproducible than small insertions and deletions (indels), which are least reproducible when > 5 bp. Increasing sequencing coverage improves indel reproducibility but has limited impact on SNVs above 30x. Conclusions: Our findings highlight sources of variability in variant detection and the need for improvement of bioinformatics pipelines in the era of precision medicine with WGS.Peer reviewe

    Consensus Modeling for Prediction of Estrogenic Activity of Ingredients Commonly Used in Sunscreen Products

    No full text
    Sunscreen products are predominantly regulated as over-the-counter (OTC) drugs by the US FDA. The “active” ingredients function as ultraviolet filters. Once a sunscreen product is generally recognized as safe and effective (GRASE) via an OTC drug review process, new formulations using these ingredients do not require FDA review and approval, however, the majority of ingredients have never been tested to uncover any potential endocrine activity and their ability to interact with the estrogen receptor (ER) is unknown, despite the fact that this is a very extensively studied target related to endocrine activity. Consequently, we have developed an in silico model to prioritize single ingredient estrogen receptor activity for use when actual animal data are inadequate, equivocal, or absent. It relies on consensus modeling to qualitatively and quantitatively predict ER binding activity. As proof of concept, the model was applied to ingredients commonly used in sunscreen products worldwide and a few reference chemicals. Of the 32 chemicals with unknown ER binding activity that were evaluated, seven were predicted to be active estrogenic compounds. Five of the seven were confirmed by the published data. Further experimental data is needed to confirm the other two predictions

    Pathway Analysis Revealed Potential Diverse Health Impacts of Flavonoids that Bind Estrogen Receptors

    No full text
    Flavonoids are frequently used as dietary supplements in the absence of research evidence regarding health benefits or toxicity. Furthermore, ingested doses could far exceed those received from diet in the course of normal living. Some flavonoids exhibit binding to estrogen receptors (ERs) with consequential vigilance by regulatory authorities at the U.S. EPA and FDA. Regulatory authorities must consider both beneficial claims and potential adverse effects, warranting the increases in research that has spanned almost two decades. Here, we report pathway enrichment of 14 targets from the Comparative Toxicogenomics Database (CTD) and the Herbal Ingredients’ Targets (HIT) database for 22 flavonoids that bind ERs. The selected flavonoids are confirmed ER binders from our earlier studies, and were here found in mainly involved in three types of biological processes, ER regulation, estrogen metabolism and synthesis, and apoptosis. Besides cancers, we conjecture that the flavonoids may affect several diseases via apoptosis pathways. Diseases such as amyotrophic lateral sclerosis, viral myocarditis and non-alcoholic fatty liver disease could be implicated. More generally, apoptosis processes may be importantly evolved biological functions of flavonoids that bind ERs and high dose ingestion of those flavonoids could adversely disrupt the cellular apoptosis process

    Structural Changes Due to Antagonist Binding in Ligand Binding Pocket of Androgen Receptor Elucidated Through Molecular Dynamics Simulations

    No full text
    When a small molecule binds to the androgen receptor (AR), a conformational change can occur which impacts subsequent binding of co-regulator proteins and DNA. In order to accurately study this mechanism, the scientific community needs a crystal structure of the Wild type AR (WT-AR) ligand binding domain, bound with antagonist. To address this open need, we leveraged molecular docking and molecular dynamics (MD) simulations to construct a structure of the WT-AR ligand binding domain bound with antagonist bicalutamide. The structure of mutant AR (Mut-AR) bound with this same antagonist informed this study. After molecular docking analysis pinpointed the suitable binding orientation of a ligand in AR, the model was further optimized through 1 μs of MD simulations. Using this approach, three molecular systems were studied: (1) WT-AR bound with agonist R1881, (2) WT-AR bound with antagonist bicalutamide, and (3) Mut-AR bound with bicalutamide. Our structures were very similar to the experimentally determined structures of both WT-AR with R1881 and Mut-AR with bicalutamide, demonstrating the trustworthiness of this approach. In our model, when WT-AR is bound with bicalutamide, Val716/Lys720/Gln733, or Met734/Gln738/Glu897 move and thus disturb the positive and negative charge clumps of the AF2 site. This disruption of the AF2 site is key for understanding the impact of antagonist binding on subsequent co-regulator binding. In conclusion, the antagonist induced structural changes in WT-AR detailed in this study will enable further AR research and will facilitate AR targeting drug discovery
    corecore