34 research outputs found

    Revisiting Some Useful Statistical Guidelines in Circulation Research in Response to a Changing Landscape

    In the 40 years since “Some Statistical Methods Useful in Circulation Research” was published, many of the same battery of statistical tests and concepts, such as t-tests, ANOVA, p-values, effect sizes, and standard errors, are still abundantly employed in hypothesis-driven research. Newer methods, too, have emerged to address the challenges of big data analysis. Some methods now routinely employed to extract insights from data include regression analysis, supervised and unsupervised machine learning for clustering, density estimation, and dimensionality reduction (e.g., viSNE), as well as prediction modeling and enrichment analyses. Additionally, in basic science research, it is now common to encounter hypothesis-free analyses, in marked contrast to traditional statistical analyses that begin with an explicit hypothesis. To encourage reproducibility, rigor, interpretability, and transparency, many editorial teams, including those of Circulation Research and the AHA journals, have developed statistical guidelines for authors. Given the rapidly changing data landscape, such guidelines must extend beyond “What statistical test should I use?” (a question that often can be addressed by a decision tree diagram in applied statistical analysis textbooks), to address higher-level challenges that frequently face authors including multiple testing, standards of reporting, robustness to violations of assumptions, and the limitations of conventional measures of significance. To better support authors and readers, we have assembled some topics that warrant particular attention in basic and clinical scientific publications such as those published in Circulation Research. These guidelines are intended to complement those outlined by the American Heart Association’s Statistical Taskforce in their concurrent “Guidelines for Statistical Reporting in Cardiovascular Medicine: A Special Report from the American Heart Association”

    Novel diabetes gene discovery through comprehensive characterization and integrative analysis of longitudinal gene expression changes

    Type 2 diabetes is a complex, systemic disease affected by both genetic and environmental factors. Previous research has identified genetic variants associated with type 2 diabetes risk; however, gene regulatory changes underlying progression to metabolic dysfunction are still largely unknown. We investigated RNA expression changes that occur during diabetes progression using a two-stage approach. In our discovery stage, we compared changes in gene expression using two longitudinally collected blood samples from subjects whose fasting blood glucose transitioned to a level consistent with type 2 diabetes diagnosis between the time points against those who did not with a novel analytical network approach. Our network methodology identified 17 networks, one of which was significantly associated with transition status. This 822-gene network harbors many genes novel to the type 2 diabetes literature but is also significantly enriched for genes previously associated with type 2 diabetes. In the validation stage, we queried associations of genetically determined expression with diabetes-related traits in a large biobank with linked electronic health records. We observed a significant enrichment of genes in our identified network whose genetically determined expression is associated with type 2 diabetes and other metabolic traits and validated 31 genes that are not near previously reported type 2 diabetes loci. Finally, we provide additional functional support, which suggests that the genes in this network are regulated by enhancers that operate in human pancreatic islet cells. We present an innovative and systematic approach that identified and validated key gene expression changes associated with type 2 diabetes transition status and demonstrated their translational relevance in a large clinical resource

    IMMerge: merging imputation data at scale

    SUMMARY: Genomic data are often processed in batches and analyzed together to save time. However, it is challenging to combine multiple large VCFs and properly handle imputation quality and missing variants due to the limitations of available tools. To address these concerns, we developed IMMerge, a Python-based tool that takes advantage of multiprocessing to reduce running time. For the first time in a publicly available tool, imputation quality scores are correctly combined with Fisher's z transformation. AVAILABILITY AND IMPLEMENTATION: IMMerge is an open-source project under MIT license. Source code and user manual are available at https://github.com/belowlab/IMMerge
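A minimal sketch of how per-batch imputation quality scores might be combined with Fisher's z transformation, as the summary above describes. This is not IMMerge's source code: the function name `combine_rsq_fisher_z`, the use of `n - 3` weights (the conventional choice when averaging correlations), and the clamping constant are all assumptions made for illustration.

```python
import math


def combine_rsq_fisher_z(rsq_values, sample_sizes):
    """Combine per-batch imputation R^2 values into one R^2.

    Hypothetical helper, not part of IMMerge. Each R^2 is converted to a
    correlation r, mapped to z = atanh(r), averaged with weights (n - 3),
    and mapped back with tanh before squaring.
    """
    if len(rsq_values) != len(sample_sizes):
        raise ValueError("rsq_values and sample_sizes must have equal length")
    if not rsq_values:
        raise ValueError("at least one batch is required")

    weighted_sum = 0.0
    weight_total = 0.0
    for rsq, n in zip(rsq_values, sample_sizes):
        weight = n - 3  # assumed weighting; requires n > 3
        if weight <= 0:
            raise ValueError("each batch needs more than 3 samples")
        # Clamp so that atanh stays finite when R^2 is reported as exactly 1.
        r = math.sqrt(min(max(rsq, 0.0), 0.999999))
        weighted_sum += weight * math.atanh(r)
        weight_total += weight

    r_combined = math.tanh(weighted_sum / weight_total)
    return r_combined ** 2


# Illustrative (made-up) values for one variant imputed in two batches.
combined = combine_rsq_fisher_z([0.64, 0.90], [500, 500])
```

Averaging on the z scale rather than averaging R² directly reduces the bias that arises because the sampling distribution of a correlation is skewed near 1.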

    Population-based genetic effects for developmental stuttering

    Despite a lifetime prevalence of at least 5%, developmental stuttering, characterized by prolongations, blocks, and repetitions of speech sounds, remains a largely idiopathic speech disorder. Family, twin, and segregation studies overwhelmingly support a strong genetic influence on stuttering risk; however, its complex mode of inheritance combined with thus-far underpowered genetic studies contribute to the challenge of identifying and reproducing genes implicated in developmental stuttering susceptibility. We conducted a trans-ancestry genome-wide association study (GWAS) and meta-analysis of developmental stuttering in two primary datasets: The International Stuttering Project comprising 1,345 clinically ascertained cases from multiple global sites and 6,759 matched population controls from the biobank at Vanderbilt University Medical Center (VUMC), and 785 self-reported stuttering cases and 7,572 controls ascertained from The National Longitudinal Study of Adolescent to Adult Health (Add Health). Meta-analysis of these genome-wide association studies identified a genome-wide significant (GWS) signal for clinically reported developmental stuttering in the general population: a protective variant in the intronic or genic upstream region of SSUH2 (rs113284510, protective allele frequency = 7.49%, Z = −5.576, p = 2.46 × 10⁻⁸) that acts as an expression quantitative trait locus (eQTL) in esophagus-muscularis tissue by reducing its gene expression. In addition, we identified 15 loci reaching suggestive significance (p < 5 × 10⁻⁶). This foundational population-based genetic study of a common speech disorder reports the findings of a clinically ascertained study of developmental stuttering and highlights the need for further research.
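As a quick check on how the reported Z statistic relates to its p-value, the sketch below converts a Z score to a two-sided p-value under a standard normal null. The function name and the genome-wide threshold constant (the conventional 5 × 10⁻⁸) are assumptions for illustration; the abstract does not state how the authors computed the value.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a Z statistic under a standard normal null.

    Uses the identity 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)), which stays
    accurate in the far tail where 1 - Phi(|z|) would lose precision.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


# Conventional genome-wide significance threshold (assumed, not from the text).
GWS_THRESHOLD = 5e-8

p_value = two_sided_p_from_z(-5.576)  # Z reported for rs113284510
is_genome_wide_significant = p_value < GWS_THRESHOLD
```

With the reported Z of −5.576, this yields a p-value of roughly 2.46 × 10⁻⁸, consistent with the figure in the abstract.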

    Functionally oriented analysis of cardiometabolic traits in a trans-ethnic sample

    Interpretation of genetic association results is difficult because signals often lack biological context. To generate hypotheses of the functional genetic etiology of complex cardiometabolic traits, we estimated the genetically determined component of gene expression from common variants using PrediXcan (1) and determined genes with differential predicted expression by trait. PrediXcan imputes tissue-specific expression levels from genetic variation using variant-level effect on gene expression in transcriptome data. To explore the value of imputed genetically regulated gene expression (GReX) models across different ancestral populations, we evaluated imputed expression levels for predictive accuracy genome-wide in RNA sequence data in samples drawn from European-Ancestry and African-Ancestry populations and identified substantial predictive power using European-derived models in a non-European target population. We then tested the association of GReX on 15 cardiometabolic traits including blood lipid levels, body mass index, height, blood pressure, fasting glucose and insulin, RR interval, fibrinogen level, factor VII level and white blood cell and platelet counts in 15 755 individuals across three ancestry groups, resulting in 20 novel gene-phenotype associations reaching experiment-wide significance across ancestries. In addition, we identified 18 significant novel gene-phenotype associations in our ancestry-specific analyses. Top associations were assessed for additional support via query of S-PrediXcan (2) results derived from publicly available genome-wide association studies summary data. Collectively, these findings illustrate the utility of transcriptome-based imputation models for discovery of cardiometabolic effect genes in a diverse dataset.

    Multi-ethnic GWAS and fine-mapping of glycaemic traits identify novel loci in the PAGE Study

    Aims/hypothesis: Type 2 diabetes is a growing global public health challenge. Investigating quantitative traits, including fasting glucose, fasting insulin and HbA1c, that serve as early markers of type 2 diabetes progression may lead to a deeper understanding of the genetic aetiology of type 2 diabetes development. Previous genome-wide association studies (GWAS) have identified over 500 loci associated with type 2 diabetes, glycaemic traits and insulin-related traits. However, most of these findings were based only on populations of European ancestry. To address this research gap, we examined the genetic basis of fasting glucose, fasting insulin and HbA1c in participants of the diverse Population Architecture using Genomics and Epidemiology (PAGE) Study. Methods: We conducted a GWAS of fasting glucose (n = 52,267), fasting insulin (n = 48,395) and HbA1c (n = 23,357) in participants without diabetes from the diverse PAGE Study (23% self-reported African American, 46% Hispanic/Latino, 40% European, 4% Asian, 3% Native Hawaiian, 0.8% Native American), performing transethnic and population-specific GWAS meta-analyses, followed by fine-mapping to identify and characterise novel loci and independent secondary signals in known loci. Results: Four novel associations were identified (p < 5 × 10⁻⁹), including three loci associated with fasting insulin, and a novel, low-frequency African American-specific locus associated with fasting glucose. Additionally, seven secondary signals were identified, including novel independent secondary signals for fasting glucose at the known GCK locus and for fasting insulin at the known PPP1R3B locus in transethnic meta-analysis. Conclusions/interpretation: Our findings provide new insights into the genetic architecture of glycaemic traits and highlight the continued importance of conducting genetic studies in diverse populations. Data availability: Full summary statistics from each of the population-specific and transethnic results are available at NHGRI-EBI GWAS catalog (https://www.ebi.ac.uk/gwas/downloads/summary-statistics).

    GWAS of QRS duration identifies new loci specific to Hispanic/Latino populations

    Background: The electrocardiographically quantified QRS duration measures ventricular depolarization and conduction. QRS prolongation has been associated with poor heart failure prognosis and cardiovascular mortality, including sudden death. While previous genome-wide association studies (GWAS) have identified 32 QRS SNPs across 26 loci among European, African, and Asian-descent populations, the genetics of QRS among Hispanics/Latinos has not been previously explored. Methods: We performed a GWAS of QRS duration among Hispanic/Latino ancestry populations (n = 15,124) from four studies using 1000 Genomes imputed genotype data (adjusted for age, sex, global ancestry, clinical and study-specific covariates). Study-specific results were combined using fixed-effects, inverse variance-weighted meta-analysis. Results: We identified six loci associated with QRS (P < 5 × 10⁻⁸), including two novel loci: MYOCD, a nuclear protein expressed in the heart, and SYT1, an integral membrane protein. The top SNP in the MYOCD locus, intronic SNP rs16946539, was found in Hispanics/Latinos with a minor allele frequency (MAF) of 0.04, but is monomorphic in European and African descent populations. The most significant QRS duration association was with intronic SNP rs3922344 (P = 1.19 × 10⁻²⁴) in SCN5A/SCN10A. Three other previously identified loci, CDKN1A, VTI1A, and HAND1, also exceeded the GWAS significance threshold among Hispanics/Latinos. A total of 27 of 32 previously identified QRS duration SNPs were shown to generalize in Hispanics/Latinos. Conclusions: Our QRS duration GWAS, the first in Hispanic/Latino populations, identified two new loci, underscoring the utility of extending large scale genomic studies to currently under-examined populations.
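The fixed-effects, inverse variance-weighted meta-analysis named in the Methods can be sketched as follows. This illustrates the standard textbook estimator, not the authors' pipeline; the function name and the per-study effect sizes and standard errors are made up for illustration.

```python
import math


def ivw_fixed_effects(betas, standard_errors):
    """Fixed-effects inverse variance-weighted meta-analysis for one SNP.

    Each study is weighted by 1 / SE^2. Returns the pooled effect, its
    standard error, the Z statistic, and a two-sided p-value.
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need matching, non-empty betas and standard errors")

    weights = [1.0 / se ** 2 for se in standard_errors]
    weight_total = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / weight_total
    pooled_se = math.sqrt(1.0 / weight_total)
    z = pooled_beta / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return pooled_beta, pooled_se, z, p_value


# Hypothetical per-study estimates (ms per allele) from four studies.
study_betas = [0.9, 1.2, 1.0, 0.8]
study_ses = [0.30, 0.40, 0.25, 0.50]
pooled_beta, pooled_se, z, p_value = ivw_fixed_effects(study_betas, study_ses)
```

Because the weights are inverse variances, the most precisely estimated study contributes most to the pooled effect, and the pooled standard error is never larger than the smallest study-level standard error.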

    Evaluating the contribution of rare variants to type 2 diabetes and related traits using pedigrees

    A major challenge in evaluating the contribution of rare variants to complex disease is identifying enough copies of the rare alleles to permit informative statistical analysis. To investigate the contribution of rare variants to the risk of type 2 diabetes (T2D) and related traits, we performed deep whole-genome analysis of 1,034 members of 20 large Mexican-American families with high prevalence of T2D. If rare variants of large effect accounted for much of the diabetes risk in these families, our experiment was powered to detect association. Using gene expression data on 21,677 transcripts for 643 pedigree members, we identified evidence for large-effect rare-variant cis-expression quantitative trait loci that could not be detected in population studies, validating our approach. However, we did not identify any rare variants of large effect associated with T2D, or the related traits of fasting glucose and insulin, suggesting that large-effect rare variants account for only a modest fraction of the genetic risk of these traits in this sample of families. Reliable identification of large-effect rare variants will require larger samples of extended pedigrees or different study designs that further enrich for such variants

    New insights into the genetic etiology of Alzheimer's disease and related dementias.

    Characterization of the genetic landscape of Alzheimer's disease (AD) and related dementias (ADD) provides a unique opportunity for a better understanding of the associated pathophysiological processes. We performed a two-stage genome-wide association study totaling 111,326 clinically diagnosed/'proxy' AD cases and 677,663 controls. We found 75 risk loci, of which 42 were new at the time of analysis. Pathway enrichment analyses confirmed the involvement of amyloid/tau pathways and highlighted microglia implication. Gene prioritization in the new loci identified 31 genes that were suggestive of new genetically associated processes, including the tumor necrosis factor alpha pathway through the linear ubiquitin chain assembly complex. We also built a new genetic risk score associated with the risk of future AD/dementia or progression from mild cognitive impairment to AD/dementia. The improvement in prediction led to a 1.6- to 1.9-fold increase in AD risk from the lowest to the highest decile, in addition to effects of age and the APOE ε4 allele.