27 research outputs found

    Revisiting Some Useful Statistical Guidelines in Circulation Research in Response to a Changing Landscape

    In the 40 years since “Some Statistical Methods Useful in Circulation Research” was published, many of the same battery of statistical tests and concepts, such as t-tests, ANOVA, p-values, effect sizes, and standard errors, are still abundantly employed in hypothesis-driven research. Newer methods, too, have emerged to address the challenges of big data analysis. Some methods now routinely employed to extract insights from data include regression analysis, supervised and unsupervised machine learning for clustering, density estimation, and dimensionality reduction (e.g., viSNE), as well as prediction modeling and enrichment analyses. Additionally, in basic science research, it is now common to encounter hypothesis-free analyses, in marked contrast to traditional statistical analyses that begin with an explicit hypothesis. To encourage reproducibility, rigor, interpretability, and transparency, many editorial teams, including those of Circulation Research and the AHA journals, have developed statistical guidelines for authors. Given the rapidly changing data landscape, such guidelines must extend beyond “What statistical test should I use?” (a question that often can be addressed by a decision tree diagram in applied statistical analysis textbooks), to address higher-level challenges that frequently face authors including multiple testing, standards of reporting, robustness to violations of assumptions, and the limitations of conventional measures of significance. To better support authors and readers, we have assembled some topics that warrant particular attention in basic and clinical scientific publications such as those published in Circulation Research. These guidelines are intended to complement those outlined by the American Heart Association’s Statistical Taskforce in their concurrent “Guidelines for Statistical Reporting in Cardiovascular Medicine: A Special Report from the American Heart Association”

    Comparison of adaptive multiple phenotype association tests using summary statistics in genome-wide association studies

    Genome-wide association studies have been successful mapping loci for individual phenotypes, but few studies have comprehensively interrogated evidence of shared genetic effects across multiple phenotypes simultaneously. Statistical methods have been proposed for analyzing multiple phenotypes using summary statistics, which enables studies of shared genetic effects while avoiding challenges associated with individual-level data sharing. Adaptive tests have been developed to maintain power against multiple alternative hypotheses because the most powerful single-alternative test depends on the underlying structure of the associations between the multiple phenotypes and a single nucleotide polymorphism (SNP). Here we compare the performance of six such adaptive tests: two adaptive sum of powered scores (aSPU) tests, the unified score association test (metaUSAT), the adaptive test in a mixed-models framework (mixAda) and two principal-component-based adaptive tests (PCAQ and PCO). Our simulations highlight practical challenges that arise when multivariate distributions of phenotypes do not satisfy assumptions of multivariate normality. Previous reports in this context focus on low minor allele count (MAC) and omit the aSPU test, which relies less than other methods on asymptotic and distributional assumptions. When these assumptions are not satisfied, particularly when MAC is low and/or phenotype covariance matrices are singular or nearly singular, aSPU better preserves type I error, sometimes at the cost of decreased power. We illustrate this trade-off with multiple phenotype analyses of six quantitative electrocardiogram traits in the Population Architecture using Genomics and Epidemiology (PAGE) study
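
The adaptive sum of powered scores idea mentioned above can be illustrated with a simplified sketch. It is not the implementation compared in the study: the function names, the default set of powers, and the Monte Carlo null (Z-scores drawn from a multivariate normal with a positive-definite phenotype correlation matrix `corr`) are assumptions made for illustration only.

```python
import math
import random


def cholesky(a):
    """Lower-triangular Cholesky factor of a positive-definite matrix (list of lists)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def spu(z, gamma):
    """Sum of powered scores: |sum_j z_j^gamma|, or max_j |z_j| when gamma is infinite."""
    if math.isinf(gamma):
        return max(abs(v) for v in z)
    return abs(sum(v ** gamma for v in z))


def aspu(z, corr, gammas=(1, 2, 3, 4, math.inf), n_sim=2000, seed=0):
    """Return (adaptive p-value, per-gamma p-values) for one SNP's Z-scores."""
    rng = random.Random(seed)
    chol = cholesky(corr)
    m = len(z)

    # Simulate null Z-score vectors with the assumed correlation structure.
    null_stats = [[] for _ in gammas]
    for _ in range(n_sim):
        e = [rng.gauss(0.0, 1.0) for _ in range(m)]
        z0 = [sum(chol[i][k] * e[k] for k in range(i + 1)) for i in range(m)]
        for idx, g in enumerate(gammas):
            null_stats[idx].append(spu(z0, g))

    # Monte Carlo p-value of each SPU(gamma) statistic for the observed data.
    obs = [spu(z, g) for g in gammas]
    p_obs = [(1 + sum(t >= o for t in ns)) / (n_sim + 1)
             for o, ns in zip(obs, null_stats)]

    # Rank-based p-value of every null replicate, then its minimum over gammas.
    min_p_null = [1.0] * n_sim
    for ns in null_stats:
        order = sorted(range(n_sim), key=lambda b: ns[b], reverse=True)
        for rank, b in enumerate(order, start=1):
            min_p_null[b] = min(min_p_null[b], rank / n_sim)

    # The adaptive statistic is the smallest per-gamma p-value; calibrate it
    # against the same simulated replicates.
    min_p_obs = min(p_obs)
    p_adaptive = (1 + sum(p <= min_p_obs for p in min_p_null)) / (n_sim + 1)
    return p_adaptive, dict(zip(gammas, p_obs))
```

Because the p-value comes from simulation rather than an asymptotic distribution, its resolution is limited by `n_sim`; the smallest value this sketch can return is 1 / (`n_sim` + 1). A singular or nearly singular `corr` would make the Cholesky step fail, so this sketch does not cover that case.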

    IMMerge: merging imputation data at scale

    SUMMARY: Genomic data are often processed in batches and analyzed together to save time. However, it is challenging to combine multiple large VCFs and properly handle imputation quality and missing variants due to the limitations of available tools. To address these concerns, we developed IMMerge, a Python-based tool that takes advantage of multiprocessing to reduce running time. For the first time in a publicly available tool, imputation quality scores are correctly combined with Fisher's z transformation. AVAILABILITY AND IMPLEMENTATION: IMMerge is an open-source project under MIT license. Source code and user manual are available at https://github.com/belowlab/IMMerge
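
As a rough illustration of combining imputation quality scores with Fisher's z transformation, the sketch below averages per-batch r² values on the z scale and transforms back. It is an assumption-laden sketch, not IMMerge's code: the function name, the optional per-batch weights, and the clipping constant are all hypothetical.

```python
import math


def combine_rsq_fisher_z(rsq_values, weights=None):
    """Combine per-batch imputation r-squared values via Fisher's z.

    Each r-squared is converted to r, mapped to z = atanh(r), averaged
    (optionally weighted, e.g. by batch sample size), then mapped back.
    """
    if not rsq_values:
        raise ValueError("need at least one r-squared value")
    if weights is None:
        weights = [1.0] * len(rsq_values)
    if len(weights) != len(rsq_values):
        raise ValueError("weights and rsq_values must have the same length")

    eps = 1e-12  # keeps atanh finite when r-squared is exactly 1
    z_sum = 0.0
    w_sum = 0.0
    for rsq, w in zip(rsq_values, weights):
        if not 0.0 <= rsq <= 1.0:
            raise ValueError("r-squared must lie in [0, 1]")
        r = min(math.sqrt(rsq), 1.0 - eps)
        z_sum += w * math.atanh(r)
        w_sum += w
    if w_sum <= 0:
        raise ValueError("weights must sum to a positive value")

    r_combined = math.tanh(z_sum / w_sum)
    return r_combined ** 2
```

Averaging on the z scale rather than averaging r² directly reflects the fact that correlation coefficients are not additive; the combined value always lies between the smallest and largest input.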

    Neonatal diabetes, gallbladder agenesis, duodenal atresia, and intestinal malrotation caused by a novel homozygous mutation in RFX6

    Recently, bi-allelic mutations in the transcription factor RFX6 were described as the cause of a rare condition characterized by neonatal diabetes with pancreatic and biliary hypoplasia and duodenal/jejunal atresia. A male infant developed severe hyperglycemia (446 mg/dL) within 24 h of birth. Acute abdominal concerns by day five necessitated exploratory surgery that revealed duodenal atresia, gallbladder agenesis, annular pancreas and intestinal malrotation. He also exhibited chronic diarrhea and feeding intolerance, cholestatic jaundice, and subsequent liver failure. He died of sepsis at four months old while awaiting liver transplantation. The phenotype of neonatal diabetes with intestinal atresia and biliary agenesis clearly pointed to RFX6 as the causative gene; indeed, whole exome sequencing revealed a novel homozygous RFX6 mutation c.779A>C; p.Lys260Thr (K260T). This missense mutation also changes the consensus 5′ splice donor site before intron 7 and is thus predicted to cause disruption in splicing. Both parents, who were not known to be related, were heterozygous carriers. Targeted genetic testing based on consideration of phenotypic features may reveal a cause among the many genes now associated with heterogeneous forms of monogenic neonatal diabetes. Our study demonstrates the feasibility of using modern sequencing technology to identify one such rare cause. Continued research is needed to determine the possible cost-effectiveness of this approach, especially when clear phenotypic clues are absent. Further study of patients with RFX6 mutations should clarify its role in pancreatic, intestinal and enteroendocrine cellular development and explain features such as the diarrhea exhibited in our case.

    Continued lessons from the INS gene: An intronic mutation causing diabetes through a novel mechanism

    Background: Diabetes in neonates usually has a monogenic aetiology; however, the cause remains unknown in 20-30%. Heterozygous INS mutations represent one of the most common gene causes of neonatal diabetes mellitus. Methods: Clinical and functional characterisation of a novel homozygous intronic mutation (c.187+241G>A) in the insulin gene in a child identified through the Monogenic Diabetes Registry (http://monogenicdiabetes.uchicago.edu). Results: The proband had insulin-requiring diabetes from birth. Ultrasonography revealed a structurally normal pancreas and C-peptide was undetectable despite readily detectable amylin, suggesting the presence of dysfunctional β cells. Whole-exome sequencing revealed the novel mutation. In silico analysis predicted a mutant mRNA product resulting from preferential recognition of a newly created splice site. Wild-type and mutant human insulin gene constructs were derived and transiently expressed in INS-1 cells. We confirmed the predicted transcript and found an additional transcript created via an ectopic splice acceptor site. Conclusions: Dominant INS mutations cause diabetes via a mutated translational product causing endoplasmic reticulum stress. We describe a novel mechanism of diabetes, without β cell death, due to creation of two unstable mutant transcripts predicted to undergo nonsense and non-stop-mediated decay, respectively. Our discovery may have broader implications for those with insulin deficiency later in life.

    Microcephaly, epilepsy, and neonatal diabetes due to compound heterozygous mutations in IER3IP1: Insights into the natural history of a rare disorder

    Neonatal diabetes mellitus is known to have over 20 different monogenic causes. A syndrome of permanent neonatal diabetes along with primary microcephaly with simplified gyral pattern associated with severe infantile epileptic encephalopathy was recently described in two independent reports in which disease-causing homozygous mutations were identified in the immediate early response-3 interacting protein-1 (IER3IP1) gene. We report here an affected male born to a non-consanguineous couple who was noted to have insulin-requiring permanent neonatal diabetes, microcephaly, and generalized seizures. He was also found to have cortical blindness, severe developmental delay and numerous dysmorphic features. He experienced a slow improvement but not abrogation of seizure frequency and severity on numerous anti-epileptic agents. His clinical course was further complicated by recurrent respiratory tract infections and he died at 8 years of age. Whole exome sequencing was performed on DNA from the proband and parents. He was found to be a compound heterozygote with two different mutations in IER3IP1: p.Val21Gly (V21G) and a novel frameshift mutation p.Phe27fsSer*25. IER3IP1 is a highly conserved protein with marked expression in the cerebral cortex and in beta cells. This is the first reported case of compound heterozygous mutations within IER3IP1 resulting in neonatal diabetes. The triad of microcephaly, generalized seizures, and permanent neonatal diabetes should prompt screening for mutations in IER3IP1. As mutations in genes such as NEUROD1 and PTF1A could cause a similar phenotype, next-generation sequencing approaches, such as the exome sequencing reported here, may be an efficient means of uncovering a diagnosis in future cases.

    Population-based genetic effects for developmental stuttering

    Despite a lifetime prevalence of at least 5%, developmental stuttering, characterized by prolongations, blocks, and repetitions of speech sounds, remains a largely idiopathic speech disorder. Family, twin, and segregation studies overwhelmingly support a strong genetic influence on stuttering risk; however, its complex mode of inheritance combined with thus-far underpowered genetic studies contribute to the challenge of identifying and reproducing genes implicated in developmental stuttering susceptibility. We conducted a trans-ancestry genome-wide association study (GWAS) and meta-analysis of developmental stuttering in two primary datasets: The International Stuttering Project comprising 1,345 clinically ascertained cases from multiple global sites and 6,759 matched population controls from the biobank at Vanderbilt University Medical Center (VUMC), and 785 self-reported stuttering cases and 7,572 controls ascertained from The National Longitudinal Study of Adolescent to Adult Health (Add Health). Meta-analysis of these genome-wide association studies identified a genome-wide significant (GWS) signal for clinically reported developmental stuttering in the general population: a protective variant in the intronic or genic upstream region of SSUH2 (rs113284510, protective allele frequency = 7.49%, Z = −5.576, p = 2.46 × 10⁻⁸) that acts as an expression quantitative trait locus (eQTL) in esophagus-muscularis tissue by reducing its gene expression. In addition, we identified 15 loci reaching suggestive significance (p < 5 × 10⁻⁶). This foundational population-based genetic study of a common speech disorder reports the findings of a clinically ascertained study of developmental stuttering and highlights the need for further research.
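
The relationship between the Z statistic and the p-value reported for rs113284510 can be checked with a short calculation. The sketch below assumes a two-sided test against a standard normal null; the function names are hypothetical, the 5 × 10⁻⁸ cutoff is the conventional genome-wide threshold, and the 5 × 10⁻⁶ cutoff is the suggestive threshold quoted in the abstract.

```python
import math

GENOME_WIDE_THRESHOLD = 5e-8   # conventional GWAS significance cutoff
SUGGESTIVE_THRESHOLD = 5e-6    # suggestive cutoff quoted in the abstract


def two_sided_p_from_z(z):
    """Two-sided p-value for a Z statistic under a standard normal null."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate
    # in the far tail, where 1 - Phi(|z|) would lose precision.
    return math.erfc(abs(z) / math.sqrt(2.0))


def classify_signal(p):
    """Label a p-value against the two thresholds above."""
    if p < GENOME_WIDE_THRESHOLD:
        return "genome-wide significant"
    if p < SUGGESTIVE_THRESHOLD:
        return "suggestive"
    return "not significant"


p_value = two_sided_p_from_z(-5.576)
print(f"p = {p_value:.3g} ({classify_signal(p_value)})")
```

Under these assumptions, Z = −5.576 gives a p-value of about 2.46 × 10⁻⁸, consistent with the reported figure and below the genome-wide threshold.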

    Strengthening Causal Inference in Exposomics Research: Application of Genetic Data and Methods

    Advances in technologies to measure a broad set of exposures have led to a range of exposome research efforts. Yet, these efforts have insufficiently integrated methods that incorporate genetic data to strengthen causal inference, despite evidence that many exposome-associated phenotypes are heritable. OBJECTIVE: We demonstrate how integration of methods and study designs that incorporate genetic data can strengthen causal inference in exposomics research by helping address six challenges: reverse causation and unmeasured confounding, comprehensive examination of phenotypic effects, low efficiency, replication, multilevel data integration, and characterization of tissue-specific effects. Examples are drawn from studies of biomarkers and health behaviors, exposure domains where the causal inference methods we describe are most often applied. DISCUSSION: Technological, computational, and statistical advances in genotyping, imputation, and analysis, combined with broad data sharing and cross-study collaborations, offer multiple opportunities to strengthen causal inference in exposomics research. Full application of these opportunities will require an expanded understanding of genetic variants that predict exposome phenotypes as well as an appreciation that the utility of genetic variants for causal inference will vary by exposure and may depend on large sample sizes. However, several of these challenges can be addressed through international scientific collaborations that prioritize data sharing. Ultimately, we anticipate that efforts to better integrate methods that incorporate genetic data will extend the reach of exposomics research by helping address the challenges of comprehensively measuring the exposome and its health effects across studies, the life course, and in varied contexts and diverse populations

    Natural selection of immune and metabolic genes associated with health in two lowland Bolivian populations

    A growing body of work has addressed human adaptations to diverse environments using genomic data, but few studies have connected putatively selected alleles to phenotypes, much less among underrepresented populations such as Amerindians. Studies of natural selection and genotype-phenotype relationships in underrepresented populations hold potential to uncover previously undescribed loci underlying evolutionarily and biomedically relevant traits. Here, we worked with the Tsimane and the Moseten, two Amerindian populations inhabiting the Bolivian lowlands. We focused most intensively on the Tsimane, because long-term anthropological work with this group has shown that they have a high burden of both macro and microparasites, as well as minimal cardiometabolic disease or dementia. We therefore generated genome-wide genotype data for Tsimane individuals to study natural selection, and paired this with blood mRNA-seq as well as cardiometabolic and immune biomarker data generated from a larger sample that included both populations. In the Tsimane, we identified 21 regions that are candidates for selective sweeps, as well as 5 immune traits that show evidence for polygenic selection (e.g., C-reactive protein levels and the response to coronaviruses). Genes overlapping candidate regions were strongly enriched for known involvement in immune-related traits, such as abundance of lymphocytes and eosinophils. Importantly, we were also able to draw on extensive phenotype information for the Tsimane and Moseten and link five regions (containing PSD4, MUC21 and MUC22, TOX2, ANXA6, and ABCA1) with biomarkers of immune and metabolic function. Together, our work highlights the utility of pairing evolutionary analyses with anthropological and biomedical data to gain insight into the genetic basis of health-related traits

    Ancestry-specific associations identified in genome-wide combined-phenotype study of red blood cell traits emphasize benefits of diversity in genomics

    Background: Quantitative red blood cell (RBC) traits are highly polygenic clinically relevant traits, with approximately 500 reported GWAS loci. The majority of RBC trait GWAS have been performed in European- or East Asian-ancestry populations, despite evidence that rare or ancestry-specific variation contributes substantially to RBC trait heritability. Recently developed combined-phenotype methods which leverage genetic trait correlation to improve statistical power have not yet been applied to these traits. Here we leveraged correlation of seven quantitative RBC traits in performing a combined-phenotype analysis in a multi-ethnic study population. Results: We used the adaptive sum of powered scores (aSPU) test to assess combined-phenotype associations between ~21 million SNPs and seven RBC traits in a multi-ethnic population (maximum n = 67,885 participants; 24% African American, 30% Hispanic/Latino, and 43% European American; 76% female). Thirty-nine loci in our multi-ethnic population contained at least one significant association signal (p 5%) across all ancestral populations. Nineteen additional independent association signals were identified at seven known loci (HFE, KIT, HBS1L/MYB, CITED2/FILNC1, ABO, HBA1/2, and PLIN4/5). For example, the HBA1/2 locus contained 14 conditionally independent association signals, 11 of which were previously unreported and are specific to African and Amerindian ancestries. One variant in this region was common in all ancestries, but exhibited a narrower LD block in African Americans than European Americans or Hispanics/Latinos. GTEx eQTL analysis of all independent lead SNPs yielded 31 significant associations in relevant tissues, over half of which were not at the gene immediately proximal to the lead SNP. Conclusion: This work identified seven loci containing multiple independent association signals for RBC traits using a combined-phenotype approach, which may improve discovery in genetically correlated traits. Highly complex genetic architecture at the HBA1/2 locus was only revealed by the inclusion of African Americans and Hispanics/Latinos, underscoring the continued importance of expanding large GWAS to include ancestrally diverse populations. © 2020 The Author(s)