
    Revisiting Some Useful Statistical Guidelines in Circulation Research in Response to a Changing Landscape

    In the 40 years since “Some Statistical Methods Useful in Circulation Research” was published, many of the same statistical tests and concepts, such as t-tests, ANOVA, p-values, effect sizes, and standard errors, are still widely employed in hypothesis-driven research. Newer methods have also emerged to address the challenges of big data analysis. Methods now routinely employed to extract insights from data include regression analysis; supervised and unsupervised machine learning for clustering, density estimation, and dimensionality reduction (e.g., viSNE); prediction modeling; and enrichment analyses. Additionally, hypothesis-free analyses are now common in basic science research, in marked contrast to traditional statistical analyses that begin with an explicit hypothesis. To encourage reproducibility, rigor, interpretability, and transparency, many editorial teams, including those of Circulation Research and the AHA journals, have developed statistical guidelines for authors. Given the rapidly changing data landscape, such guidelines must extend beyond “What statistical test should I use?” (a question that can often be answered with a decision-tree diagram from an applied statistics textbook) to address higher-level challenges that authors frequently face, including multiple testing, reporting standards, robustness to violations of assumptions, and the limitations of conventional measures of significance. To better support authors and readers, we have assembled topics that warrant particular attention in basic and clinical scientific publications such as Circulation Research. These guidelines are intended to complement those outlined by the American Heart Association’s Statistical Taskforce in the concurrent “Guidelines for Statistical Reporting in Cardiovascular Medicine: A Special Report from the American Heart Association”.
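
    Multiple testing is one of the higher-level challenges named above. As a minimal sketch (the guidelines summarized here do not prescribe a specific procedure), the Python below contrasts two common adjustments: Bonferroni, which controls the family-wise error rate, and the Benjamini-Hochberg step-up procedure, which controls the false discovery rate. The p-values are hypothetical.

```python
def bonferroni(pvalues):
    """Bonferroni: multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Step down from the largest p-value so adjusted values stay monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = [0.01, 0.04, 0.03, 0.005]  # hypothetical p-values from four tests
print("Bonferroni:", bonferroni(pvals))
print("BH:        ", benjamini_hochberg(pvals))
```

    With these made-up values, Bonferroni retains two of the four tests at the 0.05 level, whereas Benjamini-Hochberg retains all four, illustrating how much the choice of error rate can change which results are called significant.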

    Novel diabetes gene discovery through comprehensive characterization and integrative analysis of longitudinal gene expression changes

    Type 2 diabetes is a complex, systemic disease affected by both genetic and environmental factors. Previous research has identified genetic variants associated with type 2 diabetes risk; however, the gene regulatory changes underlying progression to metabolic dysfunction remain largely unknown. We investigated RNA expression changes that occur during diabetes progression using a two-stage approach. In the discovery stage, we used a novel analytical network approach to compare changes in gene expression between two longitudinally collected blood samples, contrasting subjects whose fasting blood glucose transitioned to a level consistent with a type 2 diabetes diagnosis between the two time points with subjects whose glucose did not. Our network methodology identified 17 networks, one of which was significantly associated with transition status. This 822-gene network harbors many genes not previously reported in the type 2 diabetes literature but is also significantly enriched for genes previously associated with type 2 diabetes. In the validation stage, we queried associations of genetically determined expression with diabetes-related traits in a large biobank with linked electronic health records. We observed a significant enrichment of genes in our identified network whose genetically determined expression is associated with type 2 diabetes and other metabolic traits, and we validated 31 genes that are not near previously reported type 2 diabetes loci. Finally, we provide additional functional support suggesting that the genes in this network are regulated by enhancers that operate in human pancreatic islet cells. We present an innovative and systematic approach that identified and validated key gene expression changes associated with type 2 diabetes transition status and demonstrated their translational relevance in a large clinical resource.
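
    The abstract reports that the 822-gene network is enriched for genes previously associated with type 2 diabetes but does not name the enrichment test. A common choice for this kind of overlap question is the one-sided hypergeometric test, sketched below in Python as an assumption; the background size, the number of known genes, and the overlap count are hypothetical and are not the study's data.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the background universe
    K: background genes with a prior type 2 diabetes association
    n: genes in the network of interest
    k: network genes with a prior association (the observed overlap)
    """
    tail = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1))
    return tail / comb(N, n)  # exact integer arithmetic until this final division


# Hypothetical counts for illustration only (not the study's data):
# 20,000 background genes, 500 known T2D genes, the 822-gene network, 40 overlapping
expected = 822 * 500 / 20000
p = hypergeom_upper_tail(20000, 500, 822, 40)
print(f"expected overlap {expected:.1f}, observed 40, P = {p:.2e}")
```

    Working with exact integer binomial coefficients avoids underflow in the individual terms; a real analysis would also need a carefully chosen background gene set, which strongly affects the result.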

    IMMerge: merging imputation data at scale

    SUMMARY: Genomic data are often processed in batches and analyzed together to save time. However, it is challenging to combine multiple large VCFs and properly handle imputation quality and missing variants due to the limitations of available tools. To address these concerns, we developed IMMerge, a Python-based tool that takes advantage of multiprocessing to reduce running time. For the first time in a publicly available tool, imputation quality scores are correctly combined with Fisher's z transformation. AVAILABILITY AND IMPLEMENTATION: IMMerge is an open-source project under MIT license. Source code and user manual are available at https://github.com/belowlab/IMMerge
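
    IMMerge combines imputation quality scores across batches with Fisher's z transformation. The Python below is a minimal sketch of that idea under our own assumptions, not IMMerge's code: each batch's r^2 is converted to r, mapped to z = atanh(r), averaged with assumed weights of n - 3 (the usual inverse variance of Fisher's z), and mapped back. The function name combine_rsq, the weighting, and the clipping constant eps are hypothetical; the IMMerge user manual documents the tool's actual behavior.

```python
import math


def combine_rsq(rsq_values, sample_sizes, eps=1e-6):
    """Combine per-batch imputation r^2 scores through Fisher's z transformation.

    Each r^2 is converted to r, mapped to z = atanh(r), averaged with weights
    n - 3 (an assumed inverse-variance weighting), then back-transformed.
    """
    if len(rsq_values) != len(sample_sizes):
        raise ValueError("need one sample size per batch")
    weighted_sum = 0.0
    total_weight = 0.0
    for rsq, n in zip(rsq_values, sample_sizes):
        if n <= 3:
            raise ValueError("each batch needs more than 3 samples")
        # Clip r^2 into [0, 1 - eps] so atanh stays finite when r^2 == 1
        r = math.sqrt(min(max(rsq, 0.0), 1.0 - eps))
        weighted_sum += (n - 3) * math.atanh(r)
        total_weight += n - 3
    return math.tanh(weighted_sum / total_weight) ** 2


# Hypothetical batches: the same variant imputed in two cohorts of different size
print(combine_rsq([0.95, 0.60], [5000, 1000]))
```

    Averaging in z space rather than averaging raw r^2 values works on a scale where the per-batch estimates are approximately normal with stabilized variance, which is the usual motivation for Fisher's transformation.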

    Comparison of Breast Cancer Molecular Features and Survival by African and European Ancestry in The Cancer Genome Atlas

    Importance: African Americans have the highest breast cancer mortality rate. Although racial differences in the distribution of intrinsic breast cancer subtypes are known, it is unclear whether other inherent genomic differences contribute to the survival disparities. Objectives: To investigate racial differences in breast cancer molecular features and survival and to estimate the heritability of breast cancer subtypes. Design, Setting, and Participants: Among a convenience cohort of patients with invasive breast cancer, breast tumor and matched normal tissue sample data (as of September 18, 2015) were obtained from The Cancer Genome Atlas. Main Outcomes and Measures: Breast cancer–free interval, tumor molecular features, and genetic variants. Results: Participants were 930 patients with breast cancer, including 154 black patients of African ancestry (mean [SD] age at diagnosis, 55.66 [13.01] years; 98.1% [n = 151] female) and 776 white patients of European ancestry (mean [SD] age at diagnosis, 59.51 [13.11] years; 99.0% [n = 768] female). Compared with white patients, black patients had a worse breast cancer–free interval (hazard ratio [HR], 1.67; 95% CI, 1.02-2.74; P = .043). They had a higher likelihood of basal-like (odds ratio, 3.80; 95% CI, 2.46-5.87; P < .001) and human epidermal growth factor receptor 2 (ERBB2 [formerly HER2])–enriched (odds ratio, 2.22; 95% CI, 1.10-4.47; P = .027) breast cancer subtypes, with the Luminal A subtype as the reference. Blacks had more TP53 mutations and fewer PIK3CA mutations than whites. Although most molecular differences were eliminated after adjusting for intrinsic subtype, the study found differences in 16 DNA methylation probes, 4 DNA copy number segments, 1 protein, and 142 differentially expressed genes, and the gene-based signature had an excellent capacity for distinguishing breast tumors from black vs white patients (cross-validation C index, 0.878). Using germline genotypes, the heritability of breast cancer subtypes (basal vs nonbasal) was estimated to be 0.436 (P = 1.5 × 10⁻¹⁴). The estrogen receptor–positive polygenic risk score built from 89 known susceptibility variants was higher in blacks than in whites (difference, 0.24; P = 2.3 × 10⁻⁵), and the estrogen receptor–negative polygenic risk score was much higher in blacks than in whites (difference, 0.48; P = 2.8 × 10⁻¹¹). Conclusions and Relevance: On the molecular level, after adjusting for intrinsic subtype frequency differences, this study found a modest number of genomic differences but a significant difference in clinical survival outcome between blacks and whites in The Cancer Genome Atlas data set. Moreover, more than 40% of breast cancer subtype frequency differences could be explained by genetic variants. These data could form the basis for the development of molecular targeted therapies to improve clinical outcomes for the specific subtypes of breast cancers that disproportionately affect black women. Findings also indicate that personalized risk assessment and optimal treatment could reduce deaths from aggressive breast cancers for black women.
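
    Ratio estimates such as the hazard and odds ratios above are usually reported with confidence intervals built on the log scale, so the interval width implies a standard error and an approximate Wald p-value. The Python sketch below back-calculates those p-values from the reported numbers as a consistency check; it assumes log-symmetric 95% Wald intervals, which the abstract does not state, and the study's own P values may come from a different test.

```python
import math
from statistics import NormalDist


def wald_p_from_ratio_ci(estimate, lower, upper, level=0.95):
    """Two-sided Wald p-value implied by a ratio estimate (HR or OR) and its CI.

    Assumes the interval was built symmetrically on the log scale,
    i.e. exp(log(estimate) +/- z_crit * SE).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se_log
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Estimates and 95% CIs as reported in the abstract above
for label, est, lo, hi in [
    ("HR, breast cancer-free interval", 1.67, 1.02, 2.74),
    ("OR, basal-like", 3.80, 2.46, 5.87),
    ("OR, ERBB2-enriched", 2.22, 1.10, 4.47),
]:
    print(f"{label}: P ~ {wald_p_from_ratio_ci(est, lo, hi):.3g}")
```

    The back-calculated values (roughly 0.042, far below 0.001, and roughly 0.026) agree with the reported P = .043, P < .001, and P = .027 to within the rounding of the published interval bounds.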

    Functionally oriented analysis of cardiometabolic traits in a trans-ethnic sample

    Interpretation of genetic association results is difficult because signals often lack biological context. To generate hypotheses about the functional genetic etiology of complex cardiometabolic traits, we used PrediXcan (1) to estimate the genetically determined component of gene expression from common variants and identified genes with trait-associated differences in predicted expression. PrediXcan imputes tissue-specific expression levels from genetic variation using variant-level effects on gene expression estimated in transcriptome data. To explore the value of imputed genetically regulated gene expression (GReX) models across ancestral populations, we evaluated the genome-wide predictive accuracy of imputed expression levels against RNA sequencing data in samples drawn from European-ancestry and African-ancestry populations, and found that European-derived models retained substantial predictive power in a non-European target population. We then tested the association of GReX with 15 cardiometabolic traits, including blood lipid levels, body mass index, height, blood pressure, fasting glucose and insulin, RR interval, fibrinogen level, factor VII level, and white blood cell and platelet counts, in 15 755 individuals across three ancestry groups, yielding 20 novel gene-phenotype associations that reached experiment-wide significance across ancestries. In addition, we identified 18 significant novel gene-phenotype associations in ancestry-specific analyses. Top associations were assessed for additional support by querying S-PrediXcan (2) results derived from publicly available genome-wide association study summary data. Collectively, these findings illustrate the utility of transcriptome-based imputation models for discovering cardiometabolic effect genes in a diverse dataset.
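
    In PrediXcan, a gene's genetically regulated expression (GReX) is a weighted sum of allele dosages over the SNPs in that gene's prediction model, and the GReX is then tested for association with a trait. The Python sketch below shows both steps with a hypothetical two-SNP model and made-up individuals; the SNP identifiers, weights, and trait values are illustrative assumptions, real model weights come from models trained on reference transcriptome data, and a real association test would also adjust for covariates such as ancestry, which this simple regression omits.

```python
from statistics import fmean


def predict_grex(dosages, weights):
    """Genetically regulated expression (GReX) of one gene per individual:
    a weighted sum of allele dosages (0-2) over the SNPs in the gene's model.

    dosages: one dict {snp_id: dosage} per individual
    weights: dict {snp_id: model weight} for the gene
    """
    return [sum(w * person[snp] for snp, w in weights.items()) for person in dosages]


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (no covariates)."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical two-SNP model and five individuals (illustration only)
weights = {"snp_a": 0.5, "snp_b": -0.2}
dosages = [
    {"snp_a": 0, "snp_b": 2},
    {"snp_a": 1, "snp_b": 1},
    {"snp_a": 2, "snp_b": 0},
    {"snp_a": 1, "snp_b": 0},
    {"snp_a": 2, "snp_b": 2},
]
trait = [1.2, 1.9, 2.6, 2.0, 2.3]  # hypothetical trait values
grex = predict_grex(dosages, weights)
print(grex, ols_slope(grex, trait))
```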

    The genetic architecture of neuropsychiatric traits: mechanism, polygenicity, and genome function

    Recent progress in genome science has advanced our understanding of the molecular basis of susceptibility to a broad spectrum of complex traits, including neuropsychiatric disorders. Methodologically, genome-wide association studies have been remarkably successful in identifying trait-associated variation, but the ever-increasing repository of reproducible genetic associations has highlighted an important gap: for most of the identified loci, the underlying mechanisms remain unknown. Here we develop analytic methods that improve on existing approaches and present molecular data that may serve as intermediate phenotypes for higher-order clinical traits, with the goal of dissecting more precisely the functional consequences of the discovered variants. We explore the challenges, both methodological and translational, that emerge from recognizing the polygenicity of neuropsychiatric disease predisposition, and the implications for the types of analyses that can advance our understanding of the genetic architecture of neuropsychiatric disorders. Finally, we present research that leverages our current understanding of regulatory variation in the genome and develops an integrative approach to systematize this knowledge and elucidate the cellular and biological consequences of trait-associated variation.

    Post-GWAS analysis of six substance use traits improves the identification and functional interpretation of genetic risk loci

    Background: Little is known about the functional mechanisms through which genetic loci associated with substance use traits exert their effects. This study aims to identify and functionally annotate loci associated with substance use traits based on their role in the genetic regulation of gene expression. Methods: We evaluated expression quantitative trait loci (eQTLs) from 13 brain regions and whole blood in the Genotype-Tissue Expression (GTEx) database, and from whole blood in the Depression Genes and Networks (DGN) database. The role of single eQTLs was examined for six substance use traits: alcohol consumption (N = 537,349), cigarettes per day (CPD; N = 263,954), former vs. current smoker (N = 312,821), age of smoking initiation (N = 262,990), ever smoker (N = 632,802), and cocaine dependence (N = 4,769). Subsequently, we conducted a gene-level association analysis of predicted gene expression with these substance use traits using S-PrediXcan. Results: Using an FDR-adjusted p-value < 0.05, we found 2,976 novel candidate genetic loci for substance use traits and identified genes and tissues through which these loci potentially exert their effects. Using S-PrediXcan, we identified significantly associated genes for all six substance use traits. Discussion: Annotating genes based on transcriptomic regulation improves the identification and functional characterization of candidate loci and genes for substance use traits.
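
    S-PrediXcan, used above for the gene-level analysis, approximates the PrediXcan association z-score from GWAS summary statistics as z_g ≈ Σ_l w_l (σ_l / σ_g) (β̂_l / se(β̂_l)), where w_l is the model weight of SNP l, σ_l is its genotype standard deviation, and σ_g² = Σ_l Σ_m w_l w_m Γ_lm is the variance of predicted expression computed from the reference-panel SNP covariance Γ. The Python below is a minimal sketch of that formula only, with hypothetical inputs; it omits the allele harmonization and missing-SNP handling a real analysis requires.

```python
import math


def s_predixcan_z(weights, gwas_z, snp_cov):
    """Approximate gene-level z-score from GWAS summary statistics.

    weights: model weight w_l for each SNP in the gene's prediction model
    gwas_z:  GWAS z-score (beta_hat / se) for the same SNPs, same order and allele
    snp_cov: genotype covariance matrix of those SNPs from a reference panel
    """
    k = len(weights)
    # Variance of predicted expression: w^T * Gamma * w
    var_g = sum(weights[i] * weights[j] * snp_cov[i][j] for i in range(k) for j in range(k))
    sigma_g = math.sqrt(var_g)
    return sum(
        weights[i] * math.sqrt(snp_cov[i][i]) / sigma_g * gwas_z[i] for i in range(k)
    )


# Hypothetical three-SNP gene model (illustration only)
w = [0.30, -0.15, 0.10]
z = [3.1, -2.4, 0.8]
gamma = [
    [0.42, 0.10, 0.02],
    [0.10, 0.35, 0.05],
    [0.02, 0.05, 0.48],
]
print(s_predixcan_z(w, z, gamma))
```

    The resulting gene-level p-values would then be adjusted for the number of genes and tissues tested, for example with an FDR procedure like the one used in this study.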

    Author Correction: Genetic architecture of host proteins involved in SARS-CoV-2 infection.

    The original version of this Article cited “Mehra, M. R., Desai, S. S., Kuy, S., Henry, T. D. & Patel, A. N. Cardiovascular disease, drug therapy, and mortality in Covid-19. N. Engl. J. Med. 382, e102 (2020)” as Ref. 20. The cited paper was retracted; accordingly, Ref. 20 has been replaced with “Grasselli G et al. Risk factors associated with mortality among patients with COVID-19 in intensive care units in Lombardy, Italy. JAMA Intern. Med. 180, 1345–1355 (2020)”. This has been corrected in the PDF and HTML versions of the article.