
    The PowerAtlas: a power and sample size atlas for microarray experimental design and research

    BACKGROUND: Microarrays permit biologists to simultaneously measure the mRNA abundance of thousands of genes. An important issue facing investigators planning microarray experiments is how to estimate the sample size required for good statistical power. What is the projected sample size or number of replicate chips needed to address the multiple hypotheses with acceptable accuracy? Statistical methods exist for calculating power based upon a single hypothesis, using estimates of the variability in data from pilot studies. There is, however, a need for methods to estimate power and/or required sample sizes in situations where multiple hypotheses are being tested, such as in microarray experiments. In addition, investigators frequently do not have pilot data to estimate the sample sizes required for microarray studies. RESULTS: To address this challenge, we have developed a Microarray PowerAtlas [1]. The atlas enables estimation of statistical power by allowing investigators to plan studies appropriately, building upon previous studies that have similar experimental characteristics. Currently, there are sample sizes and power estimates based on 632 experiments from Gene Expression Omnibus (GEO). The PowerAtlas also permits investigators to upload their own pilot data and derive power and sample size estimates from these data. This resource will be updated regularly with new datasets from GEO and other databases such as The Nottingham Arabidopsis Stock Center (NASC). CONCLUSION: This resource provides a valuable tool for investigators who are planning efficient microarray studies and estimating required sample sizes.
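To make the single-hypothesis case mentioned in the abstract concrete, here is a minimal sketch of a textbook sample size calculation for a two-group comparison under a normal approximation. It is not the PowerAtlas method, which the abstract does not specify; the function name `samples_per_group`, the standardized effect size `d`, and the Bonferroni division by the number of tests are all illustrative assumptions.

```python
import math
from statistics import NormalDist


def samples_per_group(d, alpha=0.05, power=0.80, n_tests=1):
    """Approximate per-group sample size for a two-sided, two-sample z-test.

    d       : assumed standardized effect size (mean difference / SD)
    alpha   : family-wise significance level
    n_tests : number of hypotheses; alpha is divided by this value
              (a plain Bonferroni correction, used here only as an
              illustration of how multiple testing inflates n)
    """
    per_test_alpha = alpha / n_tests
    z_alpha = NormalDist().inv_cdf(1.0 - per_test_alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_beta) / d) ** 2)


single = samples_per_group(d=1.0)                   # one hypothesis
many = samples_per_group(d=1.0, n_tests=10_000)     # e.g. one test per gene
print(single, many)
```

Dividing the significance level across thousands of genes raises the required number of replicates well above the single-hypothesis figure, which is the planning problem the abstract describes.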

    HDBStat!: A platform-independent software suite for statistical analysis of high dimensional biology data

    BACKGROUND: Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for analysis of high dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other aspects of genomics. HDBStat! provides statisticians and biologists a flexible and easy-to-use interface to analyze complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing. RESULTS: Results generated from data preprocessing methods, quality control analysis and hypothesis testing methods are output in the form of Excel CSV tables, graphs and an HTML report summarizing the data analysis. CONCLUSION: HDBStat! is platform-independent software that is freely available to academic institutions and non-profit organizations. It can be downloaded from our website.

    A Low Dose of Dietary Resveratrol Partially Mimics Caloric Restriction and Retards Aging Parameters in Mice

    Resveratrol in high doses has been shown to extend lifespan in some studies in invertebrates and to prevent early mortality in mice fed a high-fat diet. We fed mice from middle age (14 months) to old age (30 months) either a control diet, a low dose of resveratrol (4.9 mg kg−1 day−1), or a calorie restricted (CR) diet and examined genome-wide transcriptional profiles. We report a striking transcriptional overlap of CR and resveratrol in heart, skeletal muscle and brain. Both dietary interventions inhibit gene expression profiles associated with cardiac and skeletal muscle aging, and prevent age-related cardiac dysfunction. Dietary resveratrol also mimics the effects of CR in insulin-mediated glucose uptake in muscle. Gene expression profiling suggests that both CR and resveratrol may retard some aspects of aging through alterations in chromatin structure and transcription. Resveratrol, at doses that can be readily achieved in humans, fulfills the definition of a dietary compound that mimics some aspects of CR.

    How accurate are the extremely small P-values used in genomic research: An evaluation of numerical libraries

    In the fields of genomics and high-dimensional biology (HDB), massive multiple testing prompts the use of extremely small significance levels. Because tail areas of statistical distributions are needed for hypothesis testing, the accuracy of these areas is important to confidently make scientific judgments. Previous work on accuracy was primarily focused on evaluating professionally written statistical software, like SAS, on the Statistical Reference Datasets (StRD) provided by the National Institute of Standards and Technology (NIST) and on the accuracy of tail areas in statistical distributions. The goal of this paper is to provide guidance to investigators who are developing their own custom scientific software built upon numerical libraries written by others. Specifically, we evaluate the accuracy of small tail areas from cumulative distribution functions (CDF) of the chi-square and t-distributions by comparing several open-source, free, or commercially licensed numerical libraries in Java, C, and R to widely accepted standards of comparison like ELV and DCDFLIB. In our evaluation, the C libraries and R functions are consistently accurate up to six significant digits. Amongst the evaluated Java libraries, Colt is the most accurate. These languages and libraries are popular choices among programmers developing scientific software, so the results herein can be useful to programmers in choosing libraries for CDF accuracy.
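One reason small tail areas are numerically delicate can be shown with a short sketch. It does not reproduce the paper's evaluation or any of the libraries named above; it uses the chi-square distribution with 2 degrees of freedom only because its CDF has the closed form 1 - exp(-x/2), and the function names are illustrative. Computing the upper tail as one minus the CDF loses all precision once the tail falls below machine epsilon, whereas computing the tail directly does not.

```python
import math


def chi2_df2_cdf(x):
    """CDF of the chi-square distribution with 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0)


def upper_tail_naive(x):
    """Upper tail as 1 - CDF: suffers cancellation when the tail is tiny."""
    return 1.0 - chi2_df2_cdf(x)


def upper_tail_direct(x):
    """Upper tail from its own closed form, exp(-x/2): no cancellation."""
    return math.exp(-x / 2.0)


for x in (10.0, 50.0, 100.0):
    print(f"x={x:6.1f}  naive={upper_tail_naive(x):.6e}  "
          f"direct={upper_tail_direct(x):.6e}")
```

At x = 100 the naive form returns exactly zero in double precision, while the direct form still returns a positive value on the order of 1e-22. A library that exposes a dedicated upper-tail (survival) function avoids this particular failure mode.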