249 research outputs found

    Functional annotation signatures of disease susceptibility loci improve SNP association analysis.

    BACKGROUND: Genetic association studies are conducted to discover genetic loci that contribute to an inherited trait, identify the variants behind these associations and ascertain their functional role in determining the phenotype. To date, functional annotations of the genetic variants have rarely played more than an indirect role in assessing evidence for association. Here, we demonstrate how these data can be systematically integrated into an association study's analysis plan. RESULTS: We developed a Bayesian statistical model for the prior probability of phenotype-genotype association that incorporates data from past association studies and publicly available functional annotation data regarding the susceptibility variants under study. The model takes the form of a binary regression of association status on a set of annotation variables whose coefficients were estimated through an analysis of associated SNPs in the GWAS Catalog (GC). The functional predictors examined included measures that have been demonstrated to correlate with the association status of SNPs in the GC and some whose utility in this regard is speculative: summaries of the UCSC Human Genome Browser ENCODE super-track data, dbSNP function class, sequence conservation summaries, proximity to genomic variants in the Database of Genomic Variants and known regulatory elements in the Open Regulatory Annotation database, PolyPhen-2 probabilities and RegulomeDB categories. Because we expected that only a fraction of the annotations would contribute to predicting association, we employed a penalized likelihood method to reduce the impact of non-informative predictors and evaluated the model's ability to predict GC SNPs not used to construct the model. We show that the functional data alone are predictive of a SNP's presence in the GC. 
Further, using data from a genome-wide study of ovarian cancer, we demonstrate that their use as prior data when testing for association is practical at the genome-wide scale and improves power to detect associations. CONCLUSIONS: We show how diverse functional annotations can be efficiently combined to create 'functional signatures' that predict the a priori odds of a variant's association with a trait, and how these signatures can be integrated into a standard genome-wide-scale association analysis, resulting in improved power to detect truly associated variants.
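
The prior model described above can be sketched as a penalized binary (logistic) regression of association status on annotation variables, whose fitted probability is then converted to prior odds and multiplied by a per-SNP Bayes factor. This is a minimal sketch under stated assumptions: the abstract says only "a penalized likelihood method", so the L1 (lasso) penalty, the proximal-gradient optimizer, the synthetic data, the Bayes factor of 10 and the function names `fit_l1_logistic` and `posterior_odds` are all hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrinks each coefficient toward zero by t
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_l1_logistic(X, y, lam=0.05, lr=0.5, n_iter=2000):
    """Hypothetical L1-penalized logistic regression fit by proximal gradient (ISTA).

    X: (n, p) matrix of annotation variables; y: 0/1 association status.
    The intercept is left unpenalized.
    """
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_iter):
        # Residuals = gradient of the mean negative log-likelihood w.r.t. the linear predictor
        r = sigmoid(X @ w + b) - y
        w = soft_threshold(w - lr * (X.T @ r) / n, lr * lam)
        b -= lr * r.mean()
    return w, b

def posterior_odds(prior_prob, bayes_factor):
    # Bayes' rule on the odds scale: posterior odds = prior odds x Bayes factor
    return prior_prob / (1.0 - prior_prob) * bayes_factor

# Synthetic example: 6 hypothetical annotations, only the first two informative
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
true_w = np.array([1.5, -1.0, 0.0, 0.0, 0.0, 0.0])
y = (rng.random(500) < sigmoid(X @ true_w - 1.0)).astype(float)

w, b = fit_l1_logistic(X, y)
prior = sigmoid(X[:3] @ w + b)                   # annotation-based prior probability for 3 SNPs
post = posterior_odds(prior, bayes_factor=10.0)  # combined with a hypothetical BF of 10
```

The L1 penalty is one way to "reduce the impact of non-informative predictors": with enough shrinkage, coefficients of uninformative annotations are set exactly to zero, so those annotations drop out of the prior.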

    Scalable HPC & AI infrastructure for COVID-19 therapeutics

    COVID-19 has claimed more than 2.7 × 10⁶ lives and resulted in over 124 × 10⁶ infections. There is an urgent need to identify drugs that can inhibit SARS-CoV-2. We discuss innovations in computational infrastructure and methods that are accelerating and advancing drug design. Specifically, we describe several methods that integrate artificial intelligence and simulation-based approaches, and the design of computational infrastructure to support these methods at scale. We discuss their implementation, characterize their performance, and highlight the scientific advances that these capabilities have enabled.

    Dual Mechanism for the Translation of Subgenomic mRNA from Sindbis Virus in Infected and Uninfected Cells

    Infection of BHK cells by Sindbis virus (SV) gives rise to a profound inhibition of cellular protein synthesis, whereas translation of the viral subgenomic mRNA, which encodes the viral structural proteins, continues for hours. To learn more about the mechanism by which this subgenomic mRNA is translated, the requirements for some eukaryotic initiation factors (eIFs) and for the initiator AUG were examined in both infected and uninfected cells. To this end, BHK cells were transfected with different SV replicons or with in vitro synthesized SV subgenomic mRNAs after inactivation of some eIFs. Specifically, eIF4G was cleaved by expression of the poliovirus 2A protease (2Apro), and the alpha subunit of eIF2 was inactivated by phosphorylation induced by arsenite treatment. In addition, the cellular location of these and other translation components was analyzed in infected BHK cells by confocal microscopy. Cleavage of eIF4G by poliovirus 2Apro does not hamper translation of subgenomic mRNA in SV-infected cells, but cleavage of this factor blocks subgenomic mRNA translation in uninfected cells or in cell-free systems. SV infection induces phosphorylation of eIF2α, a process that is enhanced by arsenite treatment. Under these conditions, translation of subgenomic mRNA in infected cells occurs almost to the same extent as in controls, but it is drastically inhibited in uninfected cells. Notably, only in infected cells is the correct initiation site on the subgenomic mRNA still partially recognized when the AUG initiation codon is changed to other codons. Finally, immunolocalization of different eIFs reveals that eIF2α and eIF4G are excluded from the foci where viral RNA replication occurs, whereas eIF3, eEF2 and ribosomes concentrate in these regions.
These findings support the notion that canonical initiation takes place when the subgenomic mRNA is translated outside the context of infection, whereas in infected cells initiation can occur without some eIFs and even at non-AUG codons.

    Pandemic Drugs at Pandemic Speed: Infrastructure for Accelerating COVID-19 Drug Discovery with Hybrid Machine Learning- and Physics-based Simulations on High Performance Computers

    The race to meet the challenges of the global pandemic has served as a reminder that the existing drug discovery process is expensive, inefficient and slow. A major bottleneck is screening the vast number of potential small molecules to shortlist lead compounds for antiviral drug development. New opportunities to accelerate drug discovery lie at the interface between machine learning methods (in this case developed for linear accelerators) and physics-based methods. The two in silico approaches each have their own advantages and limitations, which, interestingly, complement each other. Here, we present an innovative infrastructural development that combines both approaches to accelerate drug discovery. The potential scale of the resulting workflow is such that it depends on supercomputing to achieve extremely high throughput. We have demonstrated the viability of this workflow for the study of inhibitors of four COVID-19 target proteins, and our ability to perform the large-scale calculations required to identify lead antiviral compounds through repurposing on a variety of supercomputers.
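
One common way to let a fast learned model and a slow physics-based method complement each other is a screening funnel: the cheap surrogate ranks every compound, and only the top fraction is re-scored by the expensive method. The toy sketch below assumes exactly that two-stage design; `surrogate_score`, `physics_score`, the compound names and the 5% keep fraction are hypothetical stand-ins, not the components of the workflow described above.

```python
import heapq
from typing import Callable, Sequence

def two_stage_screen(
    compounds: Sequence[str],
    surrogate_score: Callable[[str], float],
    physics_score: Callable[[str], float],
    keep_fraction: float,
    n_leads: int,
) -> list[tuple[float, str]]:
    """Rank all compounds with a cheap surrogate, re-score the top fraction with
    an expensive method, and return the n_leads best (lower score = better)."""
    n_keep = max(1, int(len(compounds) * keep_fraction))
    # Stage 1: cheap score for every compound; keep the n_keep best predictions
    shortlist = heapq.nsmallest(n_keep, compounds, key=surrogate_score)
    # Stage 2: expensive score only for the shortlist
    rescored = sorted((physics_score(c), c) for c in shortlist)
    return rescored[:n_leads]

calls = {"physics": 0}

def physics_score(c: str) -> float:
    # Hypothetical stand-in for a costly physics-based estimate (e.g. a binding energy)
    calls["physics"] += 1
    return float((int(c[1:]) * 37) % 1000)

def surrogate_score(c: str) -> float:
    # Hypothetical cheap approximation of physics_score with a small error
    i = int(c[1:])
    return float((i * 37) % 1000) + (i % 7)

library = [f"c{i}" for i in range(1000)]
leads = two_stage_screen(library, surrogate_score, physics_score,
                         keep_fraction=0.05, n_leads=5)
```

Here only 50 of the 1000 compounds reach the expensive scorer; the design bet is that the surrogate is accurate enough that the true best compounds survive the first cut.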

    Association between DNA Damage Response and Repair Genes and Risk of Invasive Serous Ovarian Cancer

    BACKGROUND: We analyzed the association between 53 genes related to DNA repair and p53-mediated damage response and serous ovarian cancer risk using data from the North Carolina Ovarian Cancer Study (NCOCS), a population-based case-control study. METHODS/PRINCIPAL FINDINGS: The analysis was restricted to 364 invasive serous ovarian cancer cases and 761 controls of white, non-Hispanic race. The statistical analysis had two stages: a screen using marginal Bayes factors (BFs) for 484 SNPs, and a modeling stage in which we calculated multivariate adjusted posterior probabilities of association for the 77 SNPs that passed the screen. These probabilities were conditional on subject age at diagnosis/interview, batch, a DNA quality metric and the genotypes of other SNPs, and they allowed for uncertainty in the genetic parameterizations of the SNPs and in the number of associated SNPs. Six SNPs had BFs greater than 10 in favor of an association with invasive serous ovarian cancer. These included rs5762746 (median per-allele odds ratio (OR) = 0.66; 95% credible interval (CI) = 0.44–1.00) and rs6005835 (median per-allele OR = 0.69; 95% CI = 0.53–0.91) in CHEK2; rs2078486 (median per-allele OR = 1.65; 95% CI = 1.21–2.25) and rs12951053 (median per-allele OR = 1.65; 95% CI = 1.20–2.26) in TP53; rs411697 (median rare-homozygote OR = 0.53; 95% CI = 0.35–0.79) in BACH1; and rs10131 (median rare-homozygote OR not estimable) in LIG4. The six most highly associated SNPs are either predicted to be functionally significant or are in linkage disequilibrium (LD) with such a variant. The variants in TP53 were confirmed to be associated in a large follow-up study. CONCLUSIONS/SIGNIFICANCE: Based on our findings, further follow-up of the DNA repair and response pathways in a larger dataset is warranted to confirm these results.
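
The screening stage can be illustrated with a sketch that swaps in Wakefield's approximate Bayes factor, computed from a SNP's estimated log odds ratio and its standard error under a normal prior on the log OR. The study's own marginal BFs may have been computed differently, so the prior standard deviation of 0.2, the function names and the SNP estimates below are hypothetical. A SNP passes when the BF in favor of association exceeds a threshold (10 in this example).

```python
import math

def bf_association(log_or: float, se: float, prior_sd: float = 0.2) -> float:
    """Approximate BF for association vs. no association (Wakefield-style).

    Under H0 the estimate ~ N(0, V); under H1 it ~ N(0, V + W),
    with V = se**2 and W = prior_sd**2 (an assumed prior variance).
    """
    V, W = se ** 2, prior_sd ** 2
    z2 = (log_or / se) ** 2
    return math.sqrt(V / (V + W)) * math.exp(z2 * W / (2 * (V + W)))

def screen(snps: dict[str, tuple[float, float]], threshold: float) -> list[str]:
    # Keep SNPs whose BF in favor of association exceeds the threshold
    return [name for name, (b, se) in snps.items() if bf_association(b, se) > threshold]

# Hypothetical (log OR, SE) estimates for three SNPs
snps = {"snp_a": (0.5, 0.12), "snp_b": (0.05, 0.10), "snp_c": (-0.3, 0.15)}
passed = screen(snps, threshold=10.0)
```

Note that `snp_c` has z = -2 (nominal p ≈ 0.05) yet a BF of only about 2, which is why a BF threshold of 10 is a considerably stricter screen than p < 0.05.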

    Bayesian Wavelet Shrinkage of the Haar-Fisz Transformed Wavelet Periodogram.

    It is increasingly being realised that many real-world time series are not stationary and exhibit evolving second-order autocovariance or spectral structure. This article introduces a Bayesian approach for modelling the evolving wavelet spectrum of a locally stationary wavelet time series. Our new method works by combining the advantages of a Haar-Fisz transformed spectrum with a simple but powerful Bayesian wavelet shrinkage method. It produces excellent and stable spectral estimates, as we demonstrate on simulated data and on differenced infant electrocardiogram data. A major additional benefit of the Bayesian paradigm is that we obtain rigorous and useful credible intervals for the evolving spectral structure. We show how the Bayesian credible intervals provide extra insight into the infant electrocardiogram data.
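
The Haar-Fisz step can be sketched as follows: take a Haar decomposition of a non-negative sequence (for example, wavelet periodogram values at one scale), divide each detail coefficient by its local smooth (mean) coefficient, and reconstruct with these Fisz-normalized coefficients in place of the raw details. This sketch covers only the transform and its exact inverse, not the Bayesian shrinkage that the abstract combines it with; the function names are hypothetical, and the input length is assumed to be a power of two.

```python
import numpy as np

def haar_fisz(x):
    """Haar-Fisz transform of a non-negative sequence whose length is a power of 2."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    s, fisz = x.copy(), []            # Fisz coefficients, finest level first
    while s.size > 1:
        a, b = s[0::2], s[1::2]
        smooth, detail = (a + b) / 2, (a - b) / 2
        # Fisz normalization: detail / local mean, with 0/0 defined as 0
        fisz.append(np.divide(detail, smooth, out=np.zeros_like(detail), where=smooth > 0))
        s = smooth
    u = s                             # coarsest smooth = overall mean
    for f in reversed(fisz):          # inverse Haar using Fisz coefficients as details
        out = np.empty(2 * u.size)
        out[0::2], out[1::2] = u + f, u - f
        u = out
    return u

def inverse_haar_fisz(u):
    """Undo haar_fisz: a forward Haar step on u recovers the Fisz coefficients."""
    s, fisz = np.asarray(u, dtype=float).copy(), []
    while s.size > 1:
        a, b = s[0::2], s[1::2]
        fisz.append((a - b) / 2)
        s = (a + b) / 2
    x = s
    for f in reversed(fisz):          # original detail = f * smooth
        out = np.empty(2 * x.size)
        out[0::2], out[1::2] = x + f * x, x - f * x
        x = out
    return x

x = np.array([1.0, 3.0, 2.0, 2.0, 5.0, 1.0, 0.0, 4.0])
u = haar_fisz(x)
```

Dividing by the local mean is what makes the transform useful here: for periodogram-like data whose variance grows with the mean, the normalized coefficients have a roughly stable variance, so a Gaussian-noise shrinkage method becomes more appropriate after transforming.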

    Australian Library Job Advertisements: Seeking Inclusion and Diversity

    A growing body of literature is drawing attention to diversity in librarianship, arguing for improved diversity through better recruitment, retention, and career advancement of minority professionals. While much of the discussion about diversity in libraries is taking place in the United States, this article attempts to extend the discussion by analysing Australian library job advertisements to bring attention to diversity in Australian librarianship. It uses content analysis of 96 Australian job ads posted from 22 January to 3 February 2018 on key Australian library job search engines. The analysis focuses on how diversity is reflected in these ads, examining wording that invites diversity in terms of ability/disability, ethnicity and language, and gender and sexuality.

    IMPECCABLE: Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads

    The drug discovery process currently employed in the pharmaceutical industry typically requires about 10 years and $2–3 billion to deliver one new drug. This is both too expensive and too slow, especially in emergencies like the COVID-19 pandemic. In silico methodologies need to be improved both to select better lead compounds, so as to improve the efficiency of later stages in the drug discovery protocol, and to identify those lead compounds more quickly. No known methodological approach can deliver this combination of higher quality and speed. Here, we describe an Integrated Modeling PipelinE for COVID Cure by Assessing Better LEads (IMPECCABLE) that employs multiple methodological innovations to overcome this fundamental limitation. We also describe the computational framework that we have developed to support these innovations at scale, and characterize the performance of this framework in terms of throughput, peak performance, and scientific results. We show that individual workflow components deliver 100× to 1000× improvement over traditional methods, and that the integration of methods, supported by scalable infrastructure, speeds up drug discovery by orders of magnitude. IMPECCABLE has screened ∼10¹¹ ligands and has been used to discover a promising drug candidate. These capabilities have been used by the US DOE National Virtual Biotechnology Laboratory and the EU Centre of Excellence in Computational Biomedicine.