
    Comparison of information-theoretic to statistical methods for gene-gene interactions in the presence of genetic heterogeneity

    Background: Multifactorial diseases such as cancer and cardiovascular disease are caused by a complex interplay between genes and environment. Detecting these interactions remains challenging because of computational limitations. Information-theoretic approaches use computationally efficient directed search strategies and thus provide a feasible solution to this problem. However, the power of information-theoretic methods for interaction analysis has not been systematically evaluated. In this work, we compare the power and Type I error of an information-theoretic approach with those of existing interaction analysis methods. Methods: The k-way interaction information (KWII) metric for identifying variable combinations involved in gene-gene interactions (GGI) was assessed using several simulated data sets under models of genetic heterogeneity driven by susceptibility-increasing loci with varying allele frequency, penetrance values and heritability. The power and proportion of false positives of the KWII were compared with those of multifactor dimensionality reduction (MDR), the restricted partitioning method (RPM) and logistic regression. Results: The power of the KWII was considerably greater than that of MDR on all six simulation models examined. For a given disease prevalence at high values of heritability, the power of both RPM and KWII was greater than 95%. For models with low heritability and/or genetic heterogeneity, the power of the KWII was consistently greater than that of RPM; the improvements in power for the KWII over RPM ranged from 4.7% to 14.2% at α = 0.001 in the three models at the lowest heritability values examined. KWII performed similarly to logistic regression. Conclusions: Information-theoretic models are flexible and have excellent power to detect GGI under a variety of conditions that characterize complex diseases.
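    A minimal sketch of how a KWII value can be computed from discrete genotype and phenotype data, assuming the inclusion-exclusion form KWII(S) = -Σ_{T⊆S} (-1)^{|S|-|T|} H(T) taken over all non-empty subsets T of the variable set S (with the phenotype included in S) and plug-in entropy estimates from observed frequencies. The function names `entropy` and `kwii` and the toy XOR data are hypothetical; the sketch does not reproduce the paper's directed search or its significance testing.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, cols):
    """Plug-in Shannon entropy (bits) of the joint distribution of `cols`."""
    if not cols:
        return 0.0
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def kwii(rows, cols):
    """KWII(S) = -sum over non-empty subsets T of S of (-1)^(|S|-|T|) * H(T)."""
    s = len(cols)
    total = 0.0
    for size in range(1, s + 1):
        for subset in combinations(cols, size):
            total += (-1) ** (s - size) * entropy(rows, subset)
    return -total


# Hypothetical toy data: two independent binary loci (columns 0 and 1) and a
# phenotype (column 2) that is their XOR, i.e. a purely epistatic effect.
rows = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]
print(kwii(rows, (0, 1)))     # the two loci alone share no information
print(kwii(rows, (0, 1, 2)))  # the three-way term captures the interaction
```

    Under this sign convention a positive KWII indicates synergy: in the XOR example the two-way KWII of the loci is 0, while the three-way KWII including the phenotype is 1 bit. For k = 2 the formula reduces to the mutual information H(A) + H(B) - H(A,B).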

    Using a higher criticism statistic to detect modest effects in a genome-wide study of rheumatoid arthritis

    In high-dimensional studies such as genome-wide association studies, correcting for multiple testing to control the overall type I error reduces the power to detect modest effects. We present a new analytical approach based on the higher criticism statistic that allows the presence of modest effects to be identified. We apply our method to the genome-wide study of rheumatoid arthritis provided in the Genetic Analysis Workshop 16 Problem 1 data set. There is evidence of unknown bias in this study that could be explained by the presence of undetected modest effects. We compared the asymptotic and empirical thresholds for the higher criticism statistic. Using the asymptotic threshold, we detected the presence of modest effects genome-wide. We also detected modest effects using the 90th percentile of the empirical null distribution as a threshold; however, there was no such evidence when the 95th and 99th percentiles were used. While the higher criticism method suggests that there is some evidence for modest effects, the interpretation of individual single-nucleotide polymorphisms with significant higher criticism statistics is of uncertain value. The goal of higher criticism is to alert the researcher that genetic effects remain to be discovered and to promote the use of more targeted and powerful studies to detect the remaining effects.
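    A minimal sketch of the higher criticism statistic and an empirical threshold, assuming the commonly used form HC_i = sqrt(n) (i/n - p_(i)) / sqrt(p_(i)(1 - p_(i))) over the sorted p-values p_(1) ≤ … ≤ p_(n), maximized over the smallest fraction `alpha0` of them. The names `higher_criticism` and `empirical_threshold` are hypothetical, and the null distribution here is simulated from independent uniform p-values, which is an assumption; a GWAS analysis would more likely build it by permuting phenotypes so that linkage disequilibrium between SNPs is preserved.

```python
import math
import random


def higher_criticism(pvalues, alpha0=0.5):
    """Maximum HC_i over the smallest alpha0 * n sorted p-values."""
    n = len(pvalues)
    p = sorted(pvalues)
    best = -math.inf
    for i in range(1, max(1, int(alpha0 * n)) + 1):
        pi = p[i - 1]
        if pi <= 0.0 or pi >= 1.0:
            continue  # the standardization is undefined at 0 and 1
        hc = math.sqrt(n) * (i / n - pi) / math.sqrt(pi * (1.0 - pi))
        best = max(best, hc)
    return best


def empirical_threshold(n, quantile=0.95, reps=1000, seed=0, alpha0=0.5):
    """Quantile of HC under a simulated null of n independent uniform p-values."""
    rng = random.Random(seed)
    null = sorted(
        higher_criticism([rng.random() for _ in range(n)], alpha0)
        for _ in range(reps)
    )
    return null[min(reps - 1, int(quantile * reps))]


observed = higher_criticism([0.01, 0.5, 0.6, 0.9])
for q in (0.90, 0.95, 0.99):
    print(q, empirical_threshold(100, q, reps=200))
```

    Comparing an observed HC value against the 90th, 95th and 99th percentiles of such a null mirrors the kind of threshold comparison described above; higher percentiles give stricter thresholds.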

    Pathway-based analysis of a genome-wide case-control association study of rheumatoid arthritis

    Evaluation of the association between single-nucleotide polymorphisms (SNPs) and disease outcomes is widely used to identify genetic risk factors for complex diseases. Although this analysis paradigm has made significant progress in many genetic studies, many challenges remain, such as the requirement of a large sample size to achieve adequate power. Here we use rheumatoid arthritis (RA) as an example and explore a new analysis strategy: pathway-based analysis to search for related genes and SNPs contributing to the disease.
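    The abstract does not state which pathway test was used. As one common, generic form of pathway-based analysis (not necessarily the authors' method), a hypergeometric over-representation test asks whether a pathway contains more disease-associated genes than expected by chance. The sketch assumes SNPs have already been mapped to genes; the function name `enrichment_pvalue` and the counts are hypothetical.

```python
from math import comb


def enrichment_pvalue(total_genes, pathway_genes, selected, overlap):
    """P(X >= overlap) for X ~ Hypergeometric(total_genes, pathway_genes, selected).

    total_genes:   genes tested genome-wide
    pathway_genes: genes annotated to the pathway
    selected:      genes called significant (e.g. containing an associated SNP)
    overlap:       significant genes that fall in the pathway
    """
    denom = comb(total_genes, selected)
    upper = min(pathway_genes, selected)
    hits = sum(
        comb(pathway_genes, k) * comb(total_genes - pathway_genes, selected - k)
        for k in range(overlap, upper + 1)
    )
    return hits / denom


# Hypothetical example: 20 genes tested, 5 in the pathway, 4 significant,
# 3 of which lie in the pathway.
print(enrichment_pvalue(20, 5, 4, 3))
```

    A design note: the hypergeometric test ignores gene size and LD structure, which is one reason pathway analyses of GWAS data often use permutation-based gene-set statistics instead.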

    Filtering Genes for Cluster and Network Analysis

    Background: Prior to cluster analysis or genetic network analysis, it is customary to filter, that is, to remove genes considered irrelevant from the set of genes to be analyzed. Often, genes whose variation across samples is less than an arbitrary threshold value are deleted. This can improve interpretability and reduce bias. Results: This paper introduces modular models for representing network structure in order to study the relative effects of different filtering methods. We show that cluster analysis and principal components are strongly affected by filtering. Filtering methods intended specifically for cluster and network analysis are introduced and compared by simulating modular networks with known statistical properties. To study more realistic situations, we analyze simulated "real" data based on well-characterized E. coli and S. cerevisiae regulatory networks. Conclusion: The methods introduced apply very generally to any similarity matrix describing gene expression. One of the proposed methods, SUMCOV, performed well for all models simulated.
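    A minimal sketch of the customary variance-threshold filter described in the Background, in which genes whose variation across samples falls below a chosen cutoff are dropped before clustering. It does not implement SUMCOV or the paper's other proposed filters. The function name `variance_filter`, the use of population variance, and the toy data are assumptions.

```python
from statistics import pvariance


def variance_filter(expression, threshold):
    """Keep genes whose (population) variance across samples is >= threshold.

    expression: mapping gene -> list of expression values, one per sample
    """
    return {
        gene: values
        for gene, values in expression.items()
        if pvariance(values) >= threshold
    }


# Hypothetical expression profiles over three samples.
expression = {
    "g1": [1.0, 1.0, 1.0],   # flat profile, removed by the filter
    "g2": [0.0, 2.0, 4.0],   # variance 8/3, kept
}
print(sorted(variance_filter(expression, 0.5)))
```

    Because the cutoff is arbitrary, the genes that survive, and therefore the resulting clusters, can change with the threshold, which is the sensitivity the paper examines.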

    Cluster analysis of protein array results via similarity of Gene Ontology annotation

    BACKGROUND: With the advent of high-throughput proteomic experiments such as arrays of purified proteins comes the need to analyse sets of proteins as an ensemble, as opposed to the traditional one-protein-at-a-time approach. Although there are several publicly available tools that facilitate the analysis of protein sets, they do not display integrated results in an easily interpreted image or do not allow the user to specify the proteins to be analysed. RESULTS: We developed a novel computational approach to analyse the annotation of sets of molecules. As proof of principle, we analysed two sets of proteins identified in published protein array screens. The distance between any two proteins was measured as the graph similarity between their Gene Ontology (GO) annotations. These distances were then clustered to highlight subsets of proteins sharing related GO annotation. In the first set of proteins, found to bind small-molecule inhibitors of rapamycin, we identified three subsets containing four or five proteins each that may help to elucidate how rapamycin affects cell growth, whereas the original authors chose only one novel protein from the array results for further study. In a set of phosphoinositide-binding proteins, we identified subsets of proteins associated with different intracellular structures that were not highlighted by the analysis performed in the original publication. CONCLUSION: By determining the distances between annotations, our methodology reveals trends and enrichment of proteins of particular functions within high-throughput datasets at a higher sensitivity than perusal of end-point annotations. In an era of increasingly complex datasets, such tools will help in the formulation of new, testable hypotheses from high-throughput experimental data.
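    A minimal sketch of clustering proteins by the similarity of their GO annotation graphs. As a stand-in for the paper's graph similarity, it uses a union-intersection measure (often called simUI): each protein's terms are expanded to include all their ancestors in the GO graph, and similarity is the size of the shared node set divided by the size of the combined node set. Proteins are then grouped by single-linkage clustering at a distance cutoff. The measure, the cutoff, the function names and the toy GO graph are all assumptions, not the authors' exact method.

```python
def ancestors(term, parents):
    """Return `term` plus every ancestor reachable through `parents`."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def induced(terms, parents):
    """Union of the ancestor sets of all annotated terms."""
    nodes = set()
    for term in terms:
        nodes |= ancestors(term, parents)
    return nodes


def sim_ui(a, b, parents):
    """|shared GO nodes| / |all GO nodes| for two annotation sets."""
    ga, gb = induced(a, parents), induced(b, parents)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def cluster(annotations, parents, max_dist):
    """Single-linkage clusters: join pairs with 1 - simUI <= max_dist."""
    names = sorted(annotations)
    root = {name: name for name in names}

    def find(x):
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if 1.0 - sim_ui(annotations[a], annotations[b], parents) <= max_dist:
                root[find(a)] = find(b)
    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Hypothetical GO-like graph (child -> parents) and protein annotations.
parents = {"t_a": ["t_ab"], "t_b": ["t_ab"], "t_ab": ["root"], "t_c": ["root"], "root": []}
annotations = {"P1": {"t_a"}, "P2": {"t_b"}, "P3": {"t_c"}}
print(cluster(annotations, parents, max_dist=0.6))
```

    P1 and P2 share the parent term `t_ab` (distance 0.5), whereas P3 shares only the root with either (distance 0.75), so at a cutoff of 0.6 they form two groups.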

    A Molecular Signature of Proteinuria in Glomerulonephritis

    Proteinuria is the most important predictor of outcome in glomerulonephritis, and experimental data suggest that the tubular cell response to proteinuria is an important determinant of progressive fibrosis in the kidney. However, it is unclear whether proteinuria is a marker of disease severity or has a direct effect on tubular cells in the kidneys of patients with glomerulonephritis. Accordingly, we studied an in vitro model of proteinuria and identified 231 “albumin-regulated genes” differentially expressed by primary human kidney tubular epithelial cells exposed to albumin. We translated these findings to human disease by using microarrays to study mRNA levels of these genes in the tubulo-interstitial compartment of kidney biopsies from patients with IgA nephropathy (IgAN). Biopsies from patients with IgAN (n = 25) could be distinguished from those of control subjects (n = 6) based solely upon the expression of these 231 “albumin-regulated genes.” The expression of an 11-transcript subset correlated with the degree of proteinuria, and this 11-mRNA subset was also sufficient to distinguish biopsies of subjects with IgAN from control biopsies. We tested whether these findings could be extrapolated to proteinuric diseases beyond IgAN and found that all forms of primary glomerulonephritis (n = 33) can be distinguished from controls (n = 21) based solely on the expression levels of these 11 genes derived from our in vitro proteinuria model. Pathway analysis suggests common regulatory elements shared by these 11 transcripts. In conclusion, we have identified an albumin-regulated 11-gene signature shared between all forms of primary glomerulonephritis. Our findings support the hypothesis that albuminuria may directly promote injury in the tubulo-interstitial compartment of the kidney in patients with glomerulonephritis.
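    The abstract does not say which classifier was used to separate biopsies by signature expression. As one simple, generic illustration (not the authors' method), a nearest-centroid rule assigns a sample to the group whose mean expression profile is closest in Euclidean distance. The function names, the two-gene toy profiles and the group labels are hypothetical; a real signature would use one value per signature transcript.

```python
from math import dist
from statistics import fmean


def centroid(samples):
    """Per-gene mean over a list of equal-length expression vectors."""
    return [fmean(column) for column in zip(*samples)]


def train(groups):
    """groups: mapping label -> list of expression vectors."""
    return {label: centroid(samples) for label, samples in groups.items()}


def predict(centroids, x):
    """Label of the centroid nearest to x (Euclidean distance)."""
    return min(centroids, key=lambda label: dist(centroids[label], x))


# Hypothetical two-gene profiles for control and glomerulonephritis biopsies.
groups = {
    "control": [[0.0, 0.0], [0.2, 0.1]],
    "GN": [[2.0, 2.0], [1.8, 2.1]],
}
centroids = train(groups)
print(predict(centroids, [1.9, 1.9]), predict(centroids, [0.1, 0.0]))
```

    With small groups like the control set here, any such classifier should be evaluated with cross-validation rather than on the samples used to build it.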