161 research outputs found

    Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates

    Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, most especially microarray and genomewide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets publicly available on an unprecedented scale. Coupling such data resources with a science of plasmode use would allow statistical methodologists to vet proposed techniques empirically (as opposed to only theoretically) and with data that are by definition realistic and representative. We illustrate the technique of empirical statistics by consideration of a common task when analyzing high-dimensional data: the simultaneous testing of hundreds or thousands of hypotheses to determine which, if any, show statistical significance warranting follow-on research. The now-common practice of multiple testing in high-dimensional experiment (HDE) settings has generated new methods for detecting statistically significant results. Although such methods have heretofore been subject to comparative performance analysis using simulated data, simulating data that realistically reflect data from an actual HDE remains a challenge. We describe a simulation procedure using actual data from an HDE where some truth regarding parameters of interest is known. We use the procedure to compare estimates for the proportion of true null hypotheses, the false discovery rate (FDR), and a local version of FDR obtained from 15 different statistical methods.
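
One quantity named in the abstract above, the proportion of true null hypotheses, can be illustrated with a short sketch. The estimator below is a simple Storey-type estimate; it is an assumption chosen for illustration and is not claimed to be one of the 15 methods the paper compares. The function name `estimate_pi0` and the tuning parameter `lam` are hypothetical.

```python
def estimate_pi0(p_values, lam=0.5):
    """Storey-type estimate of the proportion of true null hypotheses.

    Illustrative only: `lam` is a tuning threshold in [0, 1).
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    # Null p-values are uniform on [0, 1], so roughly pi0 * m * (1 - lam)
    # of them are expected to exceed lam.
    n_above = sum(1 for p in p_values if p > lam)
    # The raw ratio can exceed 1, so cap it.
    return min(1.0, n_above / (m * (1.0 - lam)))


# Made-up p-values: 12 small ones and 4 spread above lam.
example_p_values = [0.001] * 12 + [0.6, 0.7, 0.8, 0.9]
example_pi0 = estimate_pi0(example_p_values)
```

With a plasmode data set the true proportion is known by construction, so an estimate like this could be compared directly against it.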

    Increased entropy of signal transduction in the cancer metastasis phenotype

    Studies into the statistical properties of biological networks have led to important biological insights, such as the presence of hubs and hierarchical modularity. There is also a growing interest in studying the statistical properties of networks in the context of cancer genomics. However, relatively little is known as to what network features differ between the cancer and normal cell physiologies, or between different cancer cell phenotypes. Based on the observation that frequent genomic alterations underlie a more aggressive cancer phenotype, we asked if such an effect could be detectable as an increase in the randomness of local gene expression patterns. Using a breast cancer gene expression data set and a model network of protein interactions we derive constrained weighted networks defined by a stochastic information flux matrix reflecting expression correlations between interacting proteins. Based on this stochastic matrix we propose and compute an entropy measure that quantifies the degree of randomness in the local pattern of information flux around single genes. By comparing the local entropies in the non-metastatic versus metastatic breast cancer networks, we here show that breast cancers that metastasize are characterised by a small yet significant increase in the degree of randomness of local expression patterns. We validate this result in three additional breast cancer expression data sets and demonstrate that local entropy better characterises the metastatic phenotype than other non-entropy based measures. We show that increases in entropy can be used to identify genes and signalling pathways implicated in breast cancer metastasis. Further exploration of such integrated cancer expression and protein interaction networks will therefore be a fruitful endeavour. Comment: 5 figures, 2 Supplementary Figures and Table
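
The abstract above does not state the entropy formula. As a minimal sketch, assume the local entropy of a gene is the Shannon entropy of its row of the stochastic matrix, normalised by the logarithm of its number of neighbours. The function name `local_entropy`, the convention for degenerate cases, and the example weights are all hypothetical.

```python
import math


def local_entropy(weights):
    """Normalised Shannon entropy of the flux distribution around one gene.

    `weights` holds non-negative edge weights from the gene to its
    neighbours (an assumed input format). Returns a value in [0, 1].
    """
    k = len(weights)
    total = sum(weights)
    if k < 2 or total <= 0:
        # Degenerate case: treated as zero entropy by convention here.
        return 0.0
    # Row-normalise so the weights form a probability distribution.
    probs = [w / total for w in weights]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    # Divide by the maximum possible entropy for k neighbours.
    return entropy / math.log(k)


uniform_flux = local_entropy([1.0, 1.0, 1.0, 1.0])   # maximally random
focused_flux = local_entropy([1.0, 0.0, 0.0, 0.0])   # fully concentrated
```

Under this assumption, a gene whose flux is spread evenly over its neighbours scores near 1 and a gene whose flux goes to a single neighbour scores 0, which matches the abstract's reading of entropy as local randomness.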

    Genome-wide DNA methylation analysis for diabetic nephropathy in type 1 diabetes mellitus

    BACKGROUND: Diabetic nephropathy is a serious complication of diabetes mellitus and is associated with considerable morbidity and high mortality. There is increasing evidence to suggest that dysregulation of the epigenome is involved in diabetic nephropathy. We assessed whether epigenetic modification of DNA methylation is associated with diabetic nephropathy in a case-control study of 192 Irish patients with type 1 diabetes mellitus (T1D). Cases had T1D and nephropathy whereas controls had T1D but no evidence of renal disease. METHODS: We performed DNA methylation profiling in bisulphite converted DNA from cases and controls using the recently developed Illumina Infinium® HumanMethylation27 BeadChip, which enables the direct investigation of 27,578 individual cytosines at CpG loci throughout the genome, focused on the promoter regions of 14,495 genes. RESULTS: Singular Value Decomposition (SVD) analysis indicated that significant components of DNA methylation variation correlated with patient age, time to onset of diabetic nephropathy, and sex. Adjusting for confounding factors using multivariate Cox-regression analyses, and with a false discovery rate (FDR) of 0.05, we observed 19 CpG sites that demonstrated correlations with time to development of diabetic nephropathy. Of note, this included one CpG site located 18 bp upstream of the transcription start site of UNC13B, a gene in which the first intronic SNP rs13293564 has recently been reported to be associated with diabetic nephropathy. CONCLUSION: This high throughput platform was able to successfully interrogate the methylation state of individual cytosines and identified 19 prospective CpG sites associated with risk of diabetic nephropathy. These differences in DNA methylation are worthy of further follow-up in replication studies using larger cohorts of diabetic patients with and without nephropathy.

    Improved annotation of 3' untranslated regions and complex loci by combination of strand-specific direct RNA sequencing, RNA-seq and ESTs

    The reference annotations made for a genome sequence provide the framework for all subsequent analyses of the genome. Correct annotation is particularly important when interpreting the results of RNA-seq experiments where short sequence reads are mapped against the genome and assigned to genes according to the annotation. Inconsistencies in annotations between the reference and the experimental system can lead to incorrect interpretation of the effect on RNA expression of an experimental treatment or mutation in the system under study. Until recently, the genome-wide annotation of 3-prime untranslated regions received less attention than coding regions and the delineation of intron/exon boundaries. In this paper, data produced for samples in Human, Chicken and A. thaliana by the novel single-molecule, strand-specific, Direct RNA Sequencing technology from Helicos Biosciences, which locates 3-prime polyadenylation sites to within +/- 2 nt, were combined with archival EST and RNA-Seq data. Nine examples are illustrated where this combination of data allowed: (1) gene and 3-prime UTR re-annotation (including extension of one 3-prime UTR by 5.9 kb); (2) disentangling of gene expression in complex regions; (3) clearer interpretation of small RNA expression and (4) identification of novel genes. While the specific examples displayed here may become obsolete as genome sequences and their annotations are refined, the principles laid out in this paper will be of general use both to those annotating genomes and those seeking to interpret existing publicly available annotations in the context of their own experimental data. Comment: 44 pages, 9 figures

    Seasonal changes in patterns of gene expression in avian song control brain regions.

    This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Photoperiod and hormonal cues drive dramatic seasonal changes in structure and function of the avian song control system. Little is known, however, about the patterns of gene expression associated with seasonal changes. Here we address this issue by altering the hormonal and photoperiodic conditions in seasonally-breeding Gambel's white-crowned sparrows and extracting RNA from the telencephalic song control nuclei HVC and RA across multiple time points that capture different stages of growth and regression. We chose HVC and RA because while both nuclei change in volume across seasons, the cellular mechanisms underlying these changes differ. We thus hypothesized that different genes would be expressed between HVC and RA. We tested this by using the extracted RNA to perform a cDNA microarray hybridization developed by the SoNG initiative. We then validated these results using qRT-PCR. We found that 363 genes varied by more than 1.5-fold (|log2 fold change| > 0.585) in expression in HVC and/or RA. Supporting our hypothesis, only 59 of these 363 genes were found to vary in both nuclei, while 132 gene expression changes were HVC specific and 172 were RA specific. We then assigned many of these genes to functional categories relevant to the different mechanisms underlying seasonal change in HVC and RA, including neurogenesis, apoptosis, cell growth, dendrite arborization and axonal growth, angiogenesis, endocrinology, growth factors, and electrophysiology. This revealed categorical differences in the kinds of genes regulated in HVC and RA. These results show that different molecular programs underlie seasonal changes in HVC and RA, and that gene expression is time specific across different reproductive conditions. Our results provide insights into the complex molecular pathways that underlie adult neural plasticity.
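
The fold-change threshold and the gene counts quoted in the abstract above are consistent with each other, as a quick check shows (1.5-fold corresponds to a base-2 logarithm of about 0.585, and the three gene groups add up to the reported total):

```latex
\[
\log_2 1.5 \approx 0.585,
\qquad
\underbrace{59}_{\text{both nuclei}} + \underbrace{132}_{\text{HVC only}} + \underbrace{172}_{\text{RA only}} = 363 .
\]
```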

    A new multitest correction (SGoF) that increases its statistical power when increasing the number of tests

    BACKGROUND: The detection of true significant cases under multiple testing is becoming a fundamental issue when analyzing high-dimensional biological data. Unfortunately, known multitest adjustments lose statistical power as the number of tests increases. We propose a new multitest adjustment, based on a sequential goodness of fit metatest (SGoF), which increases its statistical power with the number of tests. The method is compared with Bonferroni and FDR-based alternatives by simulating a multitest context via two different kinds of tests: 1) one-sample t-test, and 2) homogeneity G-test. RESULTS: It is shown that SGoF behaves especially well with small sample sizes when 1) the alternative hypothesis is weakly to moderately deviated from the null model, 2) there are widespread effects through the family of tests, and 3) the number of tests is large. CONCLUSION: Therefore, SGoF should become an important tool for multitest adjustment when working with high-dimensional biological data.
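
The two baseline adjustments that the abstract above compares SGoF against can be sketched briefly. This does not implement SGoF itself, and taking Benjamini-Hochberg as the "FDR-based alternative" is an assumption. Function names and the example p-values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis when its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    reject = [False] * m
    for i in order[:k_max]:
        reject[i] = True
    return reject


# Made-up p-values for ten tests.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042,
             0.060, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = sum(bonferroni(example_p))
n_bh = sum(benjamini_hochberg(example_p))
```

Both thresholds shrink as the number of tests m grows, which is the loss of power that the abstract says SGoF is designed to avoid.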

    Two chemically similar stellar overdensities on opposite sides of the plane of the Galaxy

    Our Galaxy is thought to have undergone an active evolutionary history dominated by star formation, the accretion of cold gas, and, in particular, mergers up to 10 gigayears ago. The stellar halo reveals rich fossil evidence of these interactions in the form of stellar streams, substructures, and chemically distinct stellar components. The impact of dwarf galaxy mergers on the content and morphology of the Galactic disk is still being explored. Recent studies have identified kinematically distinct stellar substructures and moving groups, which may have extragalactic origin. However, there is mounting evidence that stellar overdensities at the outer disk/halo interface could have been caused by the interaction of a dwarf galaxy with the disk. Here we report detailed spectroscopic analysis of 14 stars drawn from two stellar overdensities, each lying about 5 kiloparsecs above and below the Galactic plane - locations suggestive of association with the stellar halo. However, we find that the chemical compositions of these stars are almost identical, both within and between these groups, and closely match the abundance patterns of the Milky Way disk stars. This study hence provides compelling evidence that these stars originate from the disk and the overdensities they are part of were created by tidal interactions of the disk with passing or merging dwarf galaxies. Comment: accepted for publication in Nature

    ChIPseqR: analysis of ChIP-seq experiments

    BACKGROUND: The use of high-throughput sequencing in combination with chromatin immunoprecipitation (ChIP-seq) has enabled the study of genome-wide protein binding at high resolution. While the amount of data generated from such experiments is steadily increasing, the methods available for their analysis remain limited. Although several algorithms for the analysis of ChIP-seq data have been published, they focus almost exclusively on transcription factor studies and are usually not well suited for the analysis of other types of experiments. RESULTS: Here we present ChIPseqR, an algorithm for the analysis of nucleosome positioning and histone modification ChIP-seq experiments. The performance of this novel method is studied on short read sequencing data of Arabidopsis thaliana mononucleosomes as well as on simulated data. CONCLUSIONS: ChIPseqR is shown to improve sensitivity and spatial resolution over existing methods while maintaining high specificity. Further analysis of predicted nucleosomes reveals characteristic patterns in nucleosome sequences and placement.

    Sample size calculation for microarray experiments with blocked one-way design

    BACKGROUND: One of the main objectives of microarray analysis is to identify differentially expressed genes for different types of cells or treatments. Many statistical methods have been proposed to assess the treatment effects in microarray experiments. RESULTS: In this paper, we consider discovery of the genes that are differentially expressed among K (> 2) treatments when each set of K arrays forms a block. In this case, the array data among K treatments tend to be correlated because of block effect. We propose to use the blocked one-way ANOVA F-statistic to test if each gene is differentially expressed among K treatments. The marginal p-values are calculated using a permutation method accounting for the block effect, adjusting for the multiplicity of the testing procedure by controlling the false discovery rate (FDR). We propose a sample size calculation method for microarray experiments with a blocked one-way design. With FDR level and effect sizes of genes specified, our formula provides a sample size for a given number of true discoveries. CONCLUSION: The calculated sample size is shown via simulations to provide an accurate number of true discoveries while controlling the FDR at the desired level.

    Biological-effective versus conventional dose volume histograms correlated with late genitourinary and gastrointestinal toxicity after external beam radiotherapy for prostate cancer: a matched pair analysis

    BACKGROUND: To determine whether the dose-volume histograms (DVH's) for the rectum and bladder constructed using biological-effective dose (BED-DVH's) better correlate with late gastrointestinal (GI) and genitourinary (GU) toxicity after treatment with external beam radiotherapy for prostate cancer than conventional DVH's (C-DVH's). METHODS: The charts of 190 patients treated with external beam radiotherapy with a minimum follow-up of 2 years were reviewed. Six patients (3.2%) were found to have RTOG grade 3 GI toxicity, and similarly 6 patients (3.2%) were found to have RTOG grade 3 GU toxicity. Average late C-DVH's and BED-DVH's of the bladder and rectum were computed for these patients as well as for matched-pair control patients. For each matched pair the following measures of normalized difference in the DVH's were computed: (a) δ(AUC) = (Area Under Curve [AUC] in grade 3 patient – AUC in grade 0 patient)/(AUC in grade 0 patient) and (b) δ(V60) = (Percent volume receiving ≥ 60 Gy [V60] in grade 3 patient – V60 in grade 0 patient)/(V60 in grade 0 patient). RESULTS: As expected, the grade 3 curve is to the right of and above the grade 0 curve for all four sets of average DVH's – suggesting that both the C-DVH and the BED-DVH can be used for predicting late toxicity. δ(AUC) was higher for the BED-DVH's than for the C-DVH's – 0.27 vs 0.23 (p = 0.036) for the rectum and 0.24 vs 0.20 (p = 0.065) for the bladder. δ(V60) was also higher for the BED-DVH's than for the C-DVH's – 2.73 vs 1.49 for the rectum (p = 0.021) and 1.64 vs 0.71 (p = 0.021) for the bladder. CONCLUSIONS: When considering well-established dosimetric endpoints used in evaluating treatment plans, BED-DVH's for the rectum and bladder correlate better with late toxicity than C-DVH's and should be considered when attempting to minimize late GI and GU toxicity after external beam radiotherapy for prostate cancer.
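
The two normalized-difference measures defined in the abstract above can be sketched as follows. Trapezoidal integration for the area under the curve is an assumption, since the abstract does not say how AUC was computed. The function names and the dose and volume values are hypothetical, not data from the study.

```python
def area_under_curve(doses, volumes):
    """Trapezoidal area under a cumulative DVH.

    `doses` are increasing dose levels (Gy); `volumes` are the percent
    volumes receiving at least each dose.
    """
    if len(doses) != len(volumes) or len(doses) < 2:
        raise ValueError("need matching dose and volume lists of length >= 2")
    return sum(
        (doses[i + 1] - doses[i]) * (volumes[i] + volumes[i + 1]) / 2.0
        for i in range(len(doses) - 1)
    )


def normalized_difference(grade3_value, grade0_value):
    """(grade 3 value - grade 0 value) / grade 0 value, as in the abstract."""
    return (grade3_value - grade0_value) / grade0_value


# Hypothetical matched pair on a shared dose grid.
doses = [0.0, 20.0, 40.0, 60.0, 80.0]
grade0_volumes = [100.0, 80.0, 50.0, 20.0, 0.0]
grade3_volumes = [100.0, 90.0, 60.0, 30.0, 0.0]

delta_auc = normalized_difference(
    area_under_curve(doses, grade3_volumes),
    area_under_curve(doses, grade0_volumes),
)
# V60 is the percent volume at the 60 Gy grid point (index 3 here).
delta_v60 = normalized_difference(grade3_volumes[3], grade0_volumes[3])
```

The same two functions would apply unchanged to BED-based curves, since only the dose axis differs between the two kinds of DVH.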