30 research outputs found

    Testosterone Pathway Genetic Polymorphisms in Relation to Primary Open-Angle Glaucoma: An Analysis in Two Large Datasets

    Purpose: Sex hormones may be associated with primary open-angle glaucoma (POAG), although the mechanisms are unclear. We previously observed that gene variants involved with estrogen metabolism were collectively associated with POAG in women but not men; here we assessed whether gene variants related to testosterone metabolism are collectively associated with POAG risk. Methods: We used two datasets: one from the United States (3853 cases and 33,480 controls) and another from Australia (1155 cases and 1992 controls). Both datasets contained densely called genotypes imputed to the 1000 Genomes reference panel. We used pathway- and gene-based approaches with Pathway Analysis by Randomization Incorporating Structure (PARIS) software to assess the overall association between a panel of single nucleotide polymorphisms (SNPs) in testosterone metabolism genes and POAG. In sex-stratified analyses, we evaluated POAG overall and POAG subtypes defined by maximum intraocular pressure (IOP): high-tension glaucoma (HTG) or normal-tension glaucoma (NTG). Results: In the US dataset, the SNP panel was not associated with POAG (permuted P = 0.77), although there was an association in the Australian sample (permuted P = 0.018). In both datasets, the SNP panel was associated with POAG in men (permuted P ≤ 0.033) and not women (permuted P ≥ 0.42), but gene-based analyses did not consistently identify the main genes responsible for these findings. In both datasets, the testosterone pathway association with HTG was significant (permuted P ≤ 0.011), but again, gene-based analyses showed no consistent driver gene associations. Conclusions: Collectively, testosterone metabolism pathway SNPs were consistently associated with the high-tension subtype of POAG in two datasets.
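The "permuted P" values quoted above are empirical p-values. As a rough illustration only, a generic empirical p-value can be computed from an observed statistic and a set of statistics obtained under randomization. This sketch is not the PARIS algorithm, which the abstract does not describe; the function name, the one-sided comparison, and the add-one correction are assumptions made for the example.

```python
def empirical_p_value(observed, null_statistics):
    """One-sided empirical p-value from randomized (null) statistics.

    Assumption: larger statistics mean stronger association. The +1 in the
    numerator and denominator counts the observed value as one draw, so the
    result is never exactly zero.
    """
    null = list(null_statistics)
    if not null:
        raise ValueError("at least one null statistic is required")
    at_least_as_extreme = sum(1 for s in null if s >= observed)
    return (at_least_as_extreme + 1) / (len(null) + 1)


# Hypothetical numbers: one of four randomized statistics reaches the observed value.
p = empirical_p_value(5.0, [1.0, 2.0, 3.0, 6.0])
```

With more randomizations the smallest reportable p-value shrinks, which is why the number of permutations limits the resolution of results such as P = 0.011.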

    GPU-accelerated machine learning techniques enable QSAR modeling of large HTS data

    Quantitative structure activity relationship (QSAR) modeling using high-throughput screening (HTS) data is a powerful technique which enables the construction of predictive models. These models are utilized for the in silico screening of libraries of molecules for which experimental screening methods are both cost- and time-expensive. Machine learning techniques excel in QSAR modeling where the relationship between structure and activity is often complex and non-linear. As these HTS data sets continue to increase in number of compounds screened, extensive feature selection and cross validation become computationally expensive. Leveraging massively parallel architectures such as graphics processing units (GPUs) to accelerate the training algorithms for these machine learning techniques is a cost-efficient way to combat this problem. In this work, several machine learning techniques are ported to OpenCL for GPU acceleration to enable construction of QSAR ensemble models using HTS data. We report computational performance numbers using several HTS data sets freely available from the PubChem database. We also report results of a case study using HTS data for a target of pharmacological and pharmaceutical relevance, cytochrome P450 3A4, for which an enrichment of 94% of the theoretical maximum is achieved.

    Shared Genetic Etiology of Autoimmune Diseases in Patients from a Biorepository Linked to De-identified Electronic Health Records

    Autoimmune diseases represent a significant medical burden affecting up to 5-8% of the U.S. population. While genetics is known to play a role, studies of common autoimmune diseases are complicated by phenotype heterogeneity, limited sample sizes, and a single disease approach. Here we performed a targeted genetic association study for cases of multiple sclerosis (MS), rheumatoid arthritis (RA), and Crohn’s disease (CD) to assess which common genetic variants contribute individually and pleiotropically to disease risk. Joint modeling and pathway analysis combining the three phenotypes were performed to identify common underlying mechanisms of risk of autoimmune conditions. European American cases of MS, RA, and CD (n=119, 53, and 129, respectively) and 1,924 controls were identified using de-identified electronic health records (EHRs) through a combination of International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) billing codes, Current Procedural Terminology (CPT) codes, medication lists, and text matching. As expected, hallmark SNPs in MS, such as DQA1 rs9271366 (OR=1.91; p=0.008), replicated in the present study. Both MS and CD were associated with TIMMDC1 rs2293370 (OR=0.27, p=0.01; OR=0.25, p=0.02; respectively). Additionally, PDE2A rs3781913 was significantly associated with both CD and RA (OR=0.46, p=0.02; OR=0.32, p=0.02; respectively). Joint modeling and pathway analysis identified variants within the KEGG NOD-like receptor signaling pathway and Shigellosis pathway as being correlated with the combined autoimmune phenotype. Our study replicated previously reported genetic associations for MS and CD in a population derived from de-identified EHRs. We found evidence to support a shared genetic etiology between CD/MS and CD/RA outside of the major histocompatibility complex region and identified KEGG pathways indicative of a bacterial pathogenesis risk for autoimmunity in a joint model. Future work to elucidate this shared etiology will be key in the development of risk models as envisioned in the era of precision medicine.

    BCL::EMAS — Enantioselective Molecular Asymmetry Descriptor for 3D-QSAR

    Stereochemistry is an important determinant of a molecule’s biological activity. Stereoisomers can have different degrees of efficacy or even opposing effects when interacting with a target protein. Stereochemistry is a molecular property difficult to represent in 2D-QSAR as it is an inherently three-dimensional phenomenon. A major drawback of most proposed descriptors for 3D-QSAR that encode stereochemistry is that they require a heuristic for defining all stereocenters and rank-ordering their substituents. Here we propose a novel 3D-QSAR descriptor termed Enantioselective Molecular ASymmetry (EMAS) that is capable of distinguishing between enantiomers in the absence of such heuristics. The descriptor aims to measure the deviation from an overall symmetric shape of the molecule. A radial-distribution function (RDF) determines a signed volume of tetrahedrons of all triplets of atoms and the molecule center. The descriptor can be enriched with atom-centric properties such as partial charge. This descriptor showed good predictability when tested with a dataset of thirty-one steroids commonly used to benchmark stereochemistry descriptors (r² = 0.89, q² = 0.78). Additionally, EMAS improved enrichment to 4.38, versus 3.94 without EMAS, in a simulated virtual high-throughput screening (vHTS) for inhibitors and substrates of cytochrome P450 (PubChem AID 891).
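The geometric primitive mentioned above, the signed volume of a tetrahedron spanned by three atoms and the molecule center, can be sketched with a scalar triple product. This is a minimal illustration of why such a quantity can separate mirror images. It is not the EMAS descriptor itself: the function name is hypothetical, and how EMAS orders atom triplets and bins the volumes is not stated in the abstract.

```python
import numpy as np


def signed_tetrahedron_volume(a, b, c, center):
    """Signed volume of the tetrahedron (a, b, c, center).

    Computed as the scalar triple product of the edge vectors from the
    center, divided by 6. Swapping two atoms or mirroring the coordinates
    flips the sign.
    """
    a, b, c, center = (np.asarray(p, dtype=float) for p in (a, b, c, center))
    return float(np.dot(a - center, np.cross(b - center, c - center)) / 6.0)


# Hypothetical coordinates: three atoms on the axes, center at the origin.
origin = (0.0, 0.0, 0.0)
volume = signed_tetrahedron_volume((1, 0, 0), (0, 1, 0), (0, 0, 1), origin)

# Mirror image: reflect the third atom through the xy-plane (z -> -z).
mirrored = signed_tetrahedron_volume((1, 0, 0), (0, 1, 0), (0, 0, -1), origin)
```

The two volumes have equal magnitude and opposite sign, which is the property that lets a descriptor built on them tell enantiomers apart without assigning stereocenters.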

    Benchmarking Ligand-Based Virtual High-Throughput Screening with the PubChem Database

    With the rapidly increasing availability of High-Throughput Screening (HTS) data in the public domain, such as the PubChem database, methods for ligand-based computer-aided drug discovery (LB-CADD) have the potential to accelerate and reduce the cost of probe development and drug discovery efforts in academia. We assemble nine data sets from realistic HTS campaigns representing major families of drug target proteins for benchmarking LB-CADD methods. Each data set is in the public domain through PubChem and carefully collated through confirmation screens validating active compounds. These data sets provide the foundation for benchmarking a new cheminformatics framework, BCL::ChemInfo, which is freely available for non-commercial use. Quantitative structure activity relationship (QSAR) models are built using Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Decision Trees (DTs), and Kohonen networks (KNs). Problem-specific descriptor optimization protocols are assessed including Sequential Feature Forward Selection (SFFS) and various information content measures. Measures of predictive power and confidence are evaluated through cross-validation, and a consensus prediction scheme is tested that combines orthogonal machine learning algorithms into a single predictor. Enrichments ranging from 15 to 101 for a TPR cutoff of 25% are observed.
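To make "enrichment for a TPR cutoff of 25%" concrete, the sketch below ranks compounds by predicted score, selects from the top until 25% of the known actives are recovered, and divides the fraction of actives in that selection by the fraction of actives in the whole set. The abstract does not give the exact formula, so this definition, the function name, and the toy data are assumptions for illustration.

```python
import math


def enrichment_at_tpr(scores, labels, tpr_cutoff=0.25):
    """Enrichment at the rank where `tpr_cutoff` of the actives are recovered.

    Assumed definition: (actives found / compounds selected) divided by
    (total actives / total compounds). `labels` are truthy for actives.
    """
    if not 0.0 < tpr_cutoff <= 1.0:
        raise ValueError("tpr_cutoff must be in (0, 1]")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total = len(ranked)
    actives = sum(1 for _, is_active in ranked if is_active)
    if actives == 0:
        raise ValueError("at least one active compound is required")
    needed = max(1, math.ceil(tpr_cutoff * actives))  # actives to recover
    found = 0
    for selected, (_, is_active) in enumerate(ranked, start=1):
        if is_active:
            found += 1
        if found >= needed:
            precision = found / selected
            return precision / (actives / total)
    raise RuntimeError("unreachable: needed never exceeds the active count")


# Toy data: 8 compounds, 2 actives, the best-scored compound is active.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
labels = [1, 0, 0, 1, 0, 0, 0, 0]
enrichment = enrichment_at_tpr(scores, labels)
```

Under this definition the maximum possible enrichment is the inverse of the active fraction, so data sets with very few actives can reach large values.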

    Introduction to the BioChemical Library (BCL): An Application-Based Open-Source Toolkit for Integrated Cheminformatics and Machine Learning in Computer-Aided Drug Discovery

    The BioChemical Library (BCL) cheminformatics toolkit is an application-based academic open-source software package designed to integrate traditional small molecule cheminformatics tools with machine learning-based quantitative structure-activity/property relationship (QSAR/QSPR) modeling. In this pedagogical article we provide a detailed introduction to core BCL cheminformatics functionality, showing how traditional tasks (e.g., computing chemical properties, estimating druglikeness) can be readily combined with machine learning. In addition, we have included multiple examples covering areas of advanced use, such as reaction-based library design. We anticipate that this manuscript will be a valuable resource for researchers in computer-aided drug discovery looking to integrate modular cheminformatics and machine learning tools into their pipelines.