Search CORE

318 research outputs found

Alternative contingency table measures improve the power and detection of multifactor dimensionality reduction

Author: Bush William S
Dudek Scott M
Edwards Todd L
McKinney Brett A
Ritchie Marylyn D
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background Multifactor Dimensionality Reduction (MDR) has been introduced previously as a non-parametric statistical method for detecting gene-gene interactions. MDR performs a dimensional reduction by assigning multi-locus genotypes to either high- or low-risk groups and measuring the percentage of cases and controls incorrectly labelled by this classification – the classification error. The combination of variables that produces the lowest classification error is selected as the best or most fit model. The correctly and incorrectly labelled cases and controls can be expressed as a two-way contingency table. We sought to improve the ability of MDR to detect gene-gene interactions by replacing classification error with a different measure to score model quality. Results In this study, we compare the detection and power of MDR using a variety of measures for two-way contingency table analysis. We simulated 40 genetic models, varying the number of disease loci in the model (2 – 5), allele frequencies of the disease loci (.2/.8 or .4/.6) and the broad-sense heritability of the model (.05 – .3). Overall, detection using NMI was 65.36% across all models, and specific detection was 59.4% versus detection using classification error at 62% and specific detection was 52.2%. Conclusion Of the 10 measures evaluated, the likelihood ratio and normalized mutual information (NMI) are measures that consistently improve the detection and power of MDR in simulated data over using classification error. These measures also reduce the inclusion of spurious variables in a multi-locus model. Thus, MDR, which has already been demonstrated as a powerful tool for detecting gene-gene interactions, can be improved with the use of alternative fitness functions.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Synthesis-View: visualization and interpretation of SNP association results for multi-cohort, multi-phenotype data and meta-analysis

Author: CJ Willer
Dana C Crawford
JC Barrett
Marylyn D Ritchie
Sarah A Pendergrass
Scott M Dudek
WS Bush
Publication venue: BioMed Central
Publication date: 01/12/2010
Field of study

Abstract Background Initial genome-wide association study (GWAS) discoveries are being further explored through the use of large cohorts across multiple and diverse populations involving meta-analyses within large consortia and networks. Many of the additional studies characterize less than 100 single nucleotide polymorphisms (SNPs), often include multiple and correlated phenotypic measurements, and can include data from multiple-sites, multiple-studies, as well as multiple race/ethnicities. New approaches for visualizing resultant data are necessary in order to fully interpret results and obtain a broad view of the trends between DNA variation and phenotypes, as well as provide information on specific SNP and phenotype relationships. Results The Synthesis-View software tool was designed to visually synthesize the results of the aforementioned types of studies. Presented herein are multiple examples of the ways Synthesis-View can be used to report results from association studies of DNA variation and phenotypes, including the visual integration of p-values or other metrics of significance, allele frequencies, sample sizes, effect size, and direction of effect. Conclusions To truly allow a user to visually integrate multiple pieces of information typical of a genetic association study, innovative views are needed to integrate multiple pieces of information. As a result, we have created "Synthesis-View" software for the visualization of genotype-phenotype association data in multiple cohorts. Synthesis-View is freely available for non-commercial research institutions, for full details see <url>https://chgr.mc.vanderbilt.edu/synthesisview</url>.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

The effects of linkage disequilibrium in large scale SNP datasets for MDR

Author: Benjamin J Grady
Eric S Torstenson
FJ Richards
JK Yang
LR Cardon
M Slatkin
Marylyn D Ritchie
MD Ritchie
MD Ritchie
P Gasso
S Dudek
SD Turner
SH Nordgard
SM Naushad
TL Edwards
Publication venue: BioMed Central
Publication date: 01/01/2011
Field of study

Abstract Background In the analysis of large-scale genomic datasets, an important consideration is the power of analytical methods to identify accurate predictive models of disease. When trying to assess sensitivity from such analytical methods, a confounding factor up to this point has been the presence of linkage disequilibrium (LD). In this study, we examined the effect of LD on the sensitivity of the Multifactor Dimensionality Reduction (MDR) software package. Results Four relative amounts of LD were simulated in multiple one- and two-locus scenarios for which the position of the functional SNP(s) within LD blocks varied. Simulated data was analyzed with MDR to determine the sensitivity of the method in different contexts, where the sensitivity of the method was gauged as the number of times out of 100 that the method identifies the correct one- or two-locus model as the best overall model. As the amount of LD increases, the sensitivity of MDR to detect the correct functional SNP drops but the sensitivity to detect the disease signal and find an indirect association increases. Conclusions Higher levels of LD begin to confound the MDR algorithm and lead to a drop in sensitivity with respect to the identification of a direct association; it does not, however, affect the ability to detect indirect association. Careful examination of the solution models generated by MDR reveals that MDR can identify loci in the correct LD block; though it is not always the functional SNP. As such, the results of MDR analysis in datasets with LD should be carefully examined to consider the underlying LD structure of the dataset.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Recommended from our members

An application of conditional logistic regression and multifactor dimensionality reduction for detecting gene-gene Interactions on risk of myocardial infarction: The importance of model validation

Author: Brown Nancy J
Coffey Christopher S
Gaziano J Michael
Hebert Patricia R
Krumholz Harlan M
Moore Jason H
Ridker Paul M
Ritchie Marylyn D
Vaughan Douglas E
Publication venue: BioMed Central
Publication date: 30/04/2004
Field of study

BACKGROUND: To examine interactions among the angiotensin converting enzyme (ACE) insertion/deletion, plasminogen activator inhibitor-1 (PAI-1) 4G/5G, and tissue plasminogen activator (t-PA) insertion/deletion gene polymorphisms on risk of myocardial infarction using data from 343 matched case-control pairs from the Physicians Health Study. We examined the data using both conditional logistic regression and the multifactor dimensionality reduction (MDR) method. One advantage of the MDR method is that it provides an internal prediction error for validation. We summarize our use of this internal prediction error for model validation. RESULTS: The overall results for the two methods were consistent, with both suggesting an interaction between the ACE I/D and PAI-1 4G/5G polymorphisms. However, using ten-fold cross validation, the 46% prediction error for the final MDR model was not significantly lower than that expected by chance. CONCLUSIONS: The significant interaction initially observed does not validate and may represent a type I error. As data-driven analytic methods continue to be developed and used to examine complex genetic interactions, it will become increasingly important to stress model validation in order to ensure that significant effects represent true relationships rather than chance findings

Harvard University - DASH

Springer - Publisher Connector

PubMed Central

Tamkang University Institutional Repository

FAM-MDR: A Flexible Family-Based Multifactor Dimensionality Reduction Technique to Detect Epistasis Using Related Individuals

Author: Adam C. Naj
Haiqing Shen
Jestinah M. Mahachie John
Kristel Van Steen
Lizzy De Lobel
M. Luz Calle
Mao Fu
Marylyn D. Ritchie
Todd L. Edwards
Tom Cattaert
Vanessa De Wit
Víctor Urrea
Zoltán Bochdanovits
Publication venue: Public Library of Science
Publication date: 01/01/2010
Field of study

We propose a novel multifactor dimensionality reduction method for epistasis detection in small or extended pedigrees, FAM-MDR. It combines features of the Genome-wide Rapid Association using Mixed Model And Regression approach (GRAMMAR) with Model-Based MDR (MB-MDR). We focus on continuous traits, although the method is general and can be used for outcomes of any type, including binary and censored traits. When comparing FAM-MDR with Pedigree-based Generalized MDR (PGMDR), which is a generalization of Multifactor Dimensionality Reduction (MDR) to continuous traits and related individuals, FAM-MDR was found to outperform PGMDR in terms of power, in most of the considered simulated scenarios. Additional simulations revealed that PGMDR does not appropriately deal with multiple testing and consequently gives rise to overly optimistic results. FAM-MDR adequately deals with multiple testing in epistasis screens and is in contrast rather conservative, by construction. Furthermore, simulations show that correcting for lower order (main) effects is of utmost importance when claiming epistasis. As Type 2 Diabetes Mellitus (T2DM) is a complex phenotype likely influenced by gene-gene interactions, we applied FAM-MDR to examine data on glucose area-under-the-curve (GAUC), an endophenotype of T2DM for which multiple independent genetic associations have been observed, in the Amish Family Diabetes Study (AFDS). This application reveals that FAM-MDR makes more efficient use of the available data than PGMDR and can deal with multi-generational pedigrees more easily. In conclusion, we have validated FAM-MDR and compared it to PGMDR, the current state-of-the-art MDR method for family data, using both simulations and a practical dataset. FAM-MDR is found to outperform PGMDR in that it handles the multiple testing issue more correctly, has increased power, and efficiently uses all available information

Lirias

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Ghent University Academic Bibliography

Directory of Open Access Journals

PubMed Central

Open Repository and Bibliography - Liège

RIUVic

Diverse Convergent Evidence in the Genetic Analysis of Complex Disease: Coordinating Omic, Informatic, and Experimental Evidence to Better Identify and Validate Risk Factors

Author: Amos Christopher I
Bartlett Jacquelaine
Ciesielski Timothy H
Gui Jiang
Huang Minjun
Kodaman Nuri
Li Jing
Moore Jason H
Pan Qinxin
Pendergrass Sarah A
Ritchie Marylyn D
Selleck Scott B
Sobota Rafal S
White Marquitta J
Williams Scott M
Publication venue: Dartmouth Digital Commons
Publication date: 01/01/2014
Field of study

In omic research, such as genome wide association studies, researchers seek to repeat their results in other datasets to reduce false positive findings and thus provide evidence for the existence of true associations. Unfortunately this standard validation approach cannot completely eliminate false positive conclusions, and it can also mask many true associations that might otherwise advance our understanding of pathology. These issues beg the question: How can we increase the amount of knowledge gained from high throughput genetic data? To address this challenge, we present an approach that complements standard statistical validation methods by drawing attention to both potential false negative and false positive conclusions, as well as providing broad information for directing future research. The Diverse Convergent Evidence approach (DiCE) we propose integrates information from multiple sources (omics, informatics, and laboratory experiments) to estimate the strength of the available corroborating evidence supporting a given association. This process is designed to yield an evidence metric that has utility when etiologic heterogeneity, variable risk factor frequencies, and a variety of observational data imperfections might lead to false conclusions. We provide proof of principle examples in which DiCE identified strong evidence for associations that have established biological importance, when standard validation methods alone did not provide support. If used as an adjunct to standard validation methods this approach can leverage multiple distinct data types to improve genetic risk factor discovery/validation, promote effective science communication, and guide future research directions

Springer - Publisher Connector

PubMed Central

Dartmouth Digital Commons (Dartmouth College)

Novel EDGE encoding method enhances ability to identify genetic interactions

Author: Bradford Yuki
Brilliant Murray
Hakonarson Hakon
Hall Molly A.
Harley John B.
Jiang Beibei
Kiryluk Krzysztof
Lucas Anastasia M.
McGuigan John
Moore Jason H.
Mueller-Myhsok Bertram
Passero Kristin
Peissig Peggy
Pendergrass Sarah A.
Ritchie Marylyn D.
Sleiman Patrick
Van Steen Kristel
Verma Shefali S.
Wallace John
Zhang Yanfei
Zhou Jiayan
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 01/01/2021
Field of study

Assumptions are made about the genetic model of single nucleotide polymorphisms (SNPs) when choosing a traditional genetic encoding: additive, dominant, and recessive. Furthermore, SNPs across the genome are unlikely to demonstrate identical genetic models. However, running SNP-SNP interaction analyses with every combination of encodings raises the multiple testing burden. Here, we present a novel and flexible encoding for genetic interactions, the elastic data-driven genetic encoding (EDGE), in which SNPs are assigned a heterozygous value based on the genetic model they demonstrate in a dataset prior to interaction testing. We assessed the power of EDGE to detect genetic interactions using 29 combinations of simulated genetic models and found it outperformed the traditional encoding methods across 10%, 30%, and 50% minor allele frequencies (MAFs). Further, EDGE maintained a low false-positive rate, while additive and dominant encodings demonstrated inflation. We evaluated EDGE and the traditional encodings with genetic data from the Electronic Medical Records and Genomics (eMERGE) Network for five phenotypes: age-related macular degeneration (AMD), age-related cataract, glaucoma, type 2 diabetes (T2D), and resistant hypertension. A multi-encoding genome-wide association study (GWAS) for each phenotype was performed using the traditional encodings, and the top results of the multi-encoding GWAS were considered for SNP-SNP interaction using the traditional encodings and EDGE. EDGE identified a novel SNP-SNP interaction for age-related cataract that no other method identified: rs7787286 (MAF: 0.041;intergenic region of chromosome 7)-rs4695885 (MAF: 0.34;intergenic region of chromosome 4) with a Bonferroni LRT p of 0.018. A SNP-SNP interaction was found in data from the UK Biobank within 25 kb of these SNPs using the recessive encoding: rs60374751 (MAF: 0.030) and rs6843594 (MAF: 0.34) (Bonferroni LRT p: 0.026). We recommend using EDGE to flexibly detect interactions between SNPs exhibiting diverse action. Author summary Although traditional genetic encodings are widely implemented in genetics research, including in genome-wide association studies (GWAS) and epistasis, each method makes assumptions that may not reflect the underlying etiology. Here, we introduce a novel encoding method that estimates and assigns an individualized data-driven encoding for each single nucleotide polymorphism (SNP): the elastic data-driven genetic encoding (EDGE). With simulations, we demonstrate that this novel method is more accurate and robust than traditional encoding methods in estimating heterozygous genotype values, reducing the type I error, and detecting SNP-SNP interactions. We further applied the traditional encodings and EDGE to biomedical data from the Electronic Medical Records and Genomics (eMERGE) Network for five phenotypes, and EDGE identified a novel interaction for age-related cataract not detected by traditional methods, which replicated in data from the UK Biobank. EDGE provides an alternative approach to understanding and modeling diverse SNP models and is recommended for studying complex genetics in common human phenotypes

Open Access LMU

LD-Spline: Mapping SNPs on genotyping platforms to genomic regions using patterns of linkage disequilibrium

Author: A Torkamani
B Devlin
BS Weir
C Li
C Schmegner
CS Carlson
D Fallin
Eric S Torstenson
Guanhua Chen
IB Borecki
International HapMap Consortium
J Aubert
J Cohen
JC Barrett
JP Lewinger
K Hao
M Slatkin
MA Eberle
MA Eberle
MA Province
Marylyn D Ritchie
N Wang
NE Morton
PC Sabeti
RC Lewontin
S Purcell
SB Gabriel
TA Manolio
TD Wickens
TJ Hubbard
TL Edwards
W Zhang
William S Bush
Publication venue: BioMed Central
Publication date: 01/01/2009
Field of study

Abstract Background Gene-centric analysis tools for genome-wide association study data are being developed both to annotate single locus statistics and to prioritize or group single nucleotide polymorphisms (SNPs) prior to analysis. These approaches require knowledge about the relationships between SNPs on a genotyping platform and genes in the human genome. SNPs in the genome can represent broader genomic regions via linkage disequilibrium (LD), and population-specific patterns of LD can be exploited to generate a data-driven map of SNPs to genes. Methods In this study, we implemented LD-Spline, a database routine that defines the genomic boundaries a particular SNP represents using linkage disequilibrium statistics from the International HapMap Project. We compared the LD-Spline haplotype block partitioning approach to that of the four gamete rule and the Gabriel et al. approach using simulated data; in addition, we processed two commonly used genome-wide association study platforms. Results We illustrate that LD-Spline performs comparably to the four-gamete rule and the Gabriel et al. approach; however as a SNP-centric approach LD-Spline has the added benefit of systematically identifying a genomic boundary for each SNP, where the global block partitioning approaches may falter due to sampling variation in LD statistics. Conclusion LD-Spline is an integrated database routine that quickly and effectively defines the genomic region marked by a SNP using linkage disequilibrium, with a SNP-centric block definition algorithm.</p

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Carolina Digital Repository

Neural networks for genetic epidemiology: past, present, and future

During the past two decades, the field of human genetics has experienced an information explosion. The completion of the human genome project and the development of high throughput SNP technologies have created a wealth of data; however, the analysis and interpretation of these data have created a research bottleneck. While technology facilitates the measurement of hundreds or thousands of genes, statistical and computational methodologies are lacking for the analysis of these data. New statistical methods and variable selection strategies must be explored for identifying disease susceptibility genes for common, complex diseases. Neural networks (NN) are a class of pattern recognition methods that have been successfully implemented for data mining and prediction in a variety of fields. The application of NN for statistical genetics studies is an active area of research. Neural networks have been applied in both linkage and association analysis for the identification of disease susceptibility genes

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central