
    Effects of dependence in high-dimensional multiple testing problems

    BACKGROUND: We consider the effects of dependence among variables of high-dimensional data in multiple hypothesis testing problems, in particular on False Discovery Rate (FDR) control procedures. Recent simulation studies consider only simple correlation structures among variables, which are hardly inspired by real data features. Our aim is to systematically study the effects of several network features, such as sparsity and correlation strength, by imposing dependence structures among variables using random correlation matrices. RESULTS: We study the robustness against dependence of several FDR procedures that are popular in microarray studies, such as Benjamini-Hochberg FDR, Storey's q-value, SAM and resampling-based FDR procedures. False Non-discovery Rates and estimates of the number of null hypotheses are computed from those methods and compared. Our simulation study shows that methods such as SAM and the q-value do not adequately control the FDR at the claimed level under dependence conditions. On the other hand, the adaptive Benjamini-Hochberg procedure seems to be the most robust while remaining conservative. Finally, the estimates of the number of true null hypotheses under various dependence conditions are variable. CONCLUSION: We discuss a new method for efficient guided simulation of dependent data that satisfy imposed network constraints as conditional independence structures. Our simulation set-up allows for a structural study of the effect of dependencies on multiple testing criteria and is useful for testing a potentially new method for π0 or FDR estimation in a dependency context.
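    The abstract above names the Benjamini-Hochberg procedure but does not spell it out. As a rough illustration only, the following is a minimal sketch of the standard (non-adaptive) step-up rule: sort the m p-values, find the largest rank k with p_(k) <= k * alpha / m, and reject the k smallest. The function name `benjamini_hochberg` and the example p-values are hypothetical and not taken from the paper; the paper's adaptive variant and its simulation set-up are not reproduced here.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Standard Benjamini-Hochberg step-up rule (illustrative sketch).

    Returns a list of booleans, True where the hypothesis is rejected.
    Assumes the usual setting of independent (or positively dependent)
    p-values; this is not the adaptive variant discussed in the abstract.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max (step-up behaviour).
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical p-values, for illustration only.
    example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(benjamini_hochberg(example, alpha=0.05))
```

    Because the rule is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold; this is the design choice that distinguishes it from a simple per-rank cutoff.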

    Differential analysis for high density tiling microarray data

    BACKGROUND: High density oligonucleotide tiling arrays are an effective and powerful platform for conducting unbiased genome-wide studies. The ab initio probe selection method employed in tiling arrays is unbiased, and thus ensures consistent sampling across coding and non-coding regions of the genome. These arrays are increasingly used to study the associated processes of transcription, transcription factor binding, chromatin structure and their association. Studies of differential expression and/or regulation provide critical insight into the mechanics of transcription and regulation that occur during the developmental program of a cell. The time-course experiment, which comprises an in vivo system and the proposed analyses, is used to determine whether annotated and un-annotated portions of the genome manifest a coordinated differential response to the induced developmental program. RESULTS: We have proposed a novel approach, based on a piece-wise function, to analyze genome-wide differential response. This enables segmentation of the response based on protein-coding and non-coding regions; for genes, the methodology also partitions differential response with a 5' versus 3' versus intra-genic bias. CONCLUSION: The algorithm, built upon the framework of Significance Analysis of Microarrays, uses a generalized logic to define regions/patterns of coordinated differential change. By not adhering to the gene-centric paradigm, discordant differential expression patterns between exons and introns have been identified at an FDR of less than 12 percent. A co-localization of differential binding between RNA Polymerase II and tetra-acetylated histone has been quantified at a p-value < 0.003; it is most significant at the 5' end of genes, at a p-value < 10^-13. The prototype R code has been made available as supplementary material (see Additional file 1). Additional file 1: gsam_prototypercode.zip (1471-2105-8-359-S1.zip). File archive comprising prototype R code for the gSAM implementation, including readme and examples.

    GeneTools – application for functional annotation and statistical hypothesis testing

    BACKGROUND: Modern biology has shifted from "one gene" approaches to methods for genomic-scale analysis like microarray technology, which allow simultaneous measurement of thousands of genes. This has created a need for tools facilitating interpretation of biological data in "batch" mode. However, such tools often leave the investigator with large volumes of apparently unorganized information. To meet this interpretation challenge, gene-set, or cluster, testing has become a popular analytical tool. Many gene-set testing methods and software packages are now available, most of which use a variety of statistical tests to assess the genes in a set for biological information. However, the field is still evolving, and there is a great need for "integrated" solutions. RESULTS: GeneTools is a web service providing access to a database that brings together information from a broad range of resources. The annotation data are updated weekly, so that users get the most recently available data. Data submitted by the user are stored in the database, where they can easily be updated, shared between users and exported in various formats. GeneTools provides three different tools: i) NMC Annotation Tool, which offers annotations from several databases like UniGene, Entrez Gene, SwissProt and GeneOntology, in both single- and batch-search mode. ii) GO Annotator Tool, where users can add new gene ontology (GO) annotations to genes of interest. These user-defined GO annotations can be used in further analysis or exported for public distribution. iii) eGOn, a tool for visualization and statistical hypothesis testing of GO category representation. As the first GO tool to do so, eGOn supports hypothesis testing for three different situations (master-target situation, mutually exclusive target-target situation and intersecting target-target situation). An important additional function is an evidence-code filter that allows users to select the GO annotations for the analysis. CONCLUSION: GeneTools is the first "all in one" annotation tool, providing users with rapid extraction of highly relevant gene annotation data for, e.g., thousands of genes or clones at once. It allows a user to define and archive new GO annotations, and it supports hypothesis testing related to GO category representations. GeneTools is freely available through www.genetools.n

    Cardiovascular Response to Beta-Adrenergic Blockade or Activation in 23 Inbred Mouse Strains

    We report the characterisation of 27 cardiovascular-related traits in 23 inbred mouse strains. Mice were phenotyped either in response to chronic administration of a single dose of the β-adrenergic receptor blocker atenolol or under a low and a high dose of the β-agonist isoproterenol and compared to baseline condition. The robustness of our data is supported by high trait heritabilities (typically H² > 0.7) and significant correlations of trait values measured in baseline condition with independent multistrain datasets of the Mouse Phenome Database. We then focused on the drug-, dose-, and strain-specific responses to β-stimulation and β-blockade of a selection of traits including heart rate, systolic blood pressure, cardiac weight indices, ECG parameters and body weight. Because of the wealth of data accumulated, we applied integrative analyses such as comprehensive bi-clustering to investigate the structure of the response across the different phenotypes, strains and experimental conditions. Information extracted from these analyses is discussed in terms of novelty and biological implications. For example, we observe that traits related to ventricular weight in most strains respond only to the high dose of isoproterenol, while heart rate and atrial weight are already affected by the low dose. Finally, we observe little concordance between strain similarity based on the phenotypes and genotypic relatedness computed from genomic SNP profiles. This indicates that cardiovascular phenotypes are unlikely to segregate according to global phylogeny, but rather be governed by smaller, local differences in the genetic architecture of the various strains.

    Methylomic Signatures of High Grade Serous Ovarian Cancer


    Approaches to multiplicity issues in complex research in microarray analysis

    The multiplicity problem is evident in the simplest form of statistical analysis of gene expression data: the identification of differentially expressed genes. In more complex analysis, the problem is compounded by the multiplicity of hypotheses per gene. Thus, in some cases, it may be necessary to consider testing millions of hypotheses. We present three general approaches for addressing multiplicity in large research problems: (a) use the scalability of false discovery rate (FDR) controlling procedures; (b) apply FDR-controlling procedures to a selected subset of hypotheses; (c) apply hierarchical FDR-controlling procedures. We also offer a general framework for ensuring reproducible results in complex research, where a researcher faces more than just one large research problem. We demonstrate these approaches by analyzing the results of a complex experiment involving the study of gene expression levels in different brain regions across multiple mouse strains.