
    Mulcom: a multiple comparison statistical test for microarray data in Bioconductor

    Many microarray experiments compare a common control group with several "test" groups, as in, for example, time-course experiments where time zero serves as a common reference point. The MulCom package described here implements Dunnett's t-test, which was developed specifically to handle multiple comparisons against a common reference, in a version tailored for genomic data analysis that we named the MulCom (Multiple Comparisons) test. The implementation includes two test parameters, namely the t value and an optional minimal fold-change value, m, with automated, permutation-based estimation of the False Discovery Rate (FDR) for parameter combinations of choice. The package permits automated optimization of the test parameters to obtain the maximum number of significant genes at a given FDR value. In this vignette we present the rationale, implementation and usage of the MulCom package, plus a practical application on a time-course microarray …
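
    A minimal Python sketch of the two-parameter rule and the permutation-based FDR described above. The rule and the FDR estimator are assumptions for illustration, not taken from the MulCom code. A (gene, test group) pair is called significant when its absolute difference from the control mean exceeds m + t × SE. SE is computed Dunnett-style from the gene's pooled within-group variance. The FDR is the median number of calls over label permutations divided by the observed number of calls. All function names are hypothetical.

```python
import random
import statistics
from math import sqrt


def group_values(row, labels, n_groups):
    """Split one gene's values by sample label; label 0 is the common control."""
    groups = [[] for _ in range(n_groups)]
    for value, label in zip(row, labels):
        groups[label].append(value)
    return groups


def count_calls(matrix, labels, t, m):
    """Count (gene, test group) pairs with |mean difference| > m + t * SE.

    Assumed rule for illustration: SE is Dunnett-style, built from the pooled
    within-group variance (ANOVA mean squared error) of each gene.
    Requires replicates, i.e. more samples than groups.
    """
    n_groups = max(labels) + 1
    calls = 0
    for row in matrix:
        groups = group_values(row, labels, n_groups)
        means = [statistics.fmean(g) for g in groups]
        ss = sum((x - mu) ** 2 for g, mu in zip(groups, means) for x in g)
        mse = ss / (len(row) - n_groups)
        n0 = len(groups[0])
        for g, mu in zip(groups[1:], means[1:]):
            se = sqrt(mse * (1 / len(g) + 1 / n0))
            if abs(mu - means[0]) > m + t * se:
                calls += 1
    return calls


def permutation_fdr(matrix, labels, t, m, n_perm=100, seed=0):
    """Return (estimated FDR, observed calls); FDR = median permuted calls / observed."""
    rng = random.Random(seed)
    observed = count_calls(matrix, labels, t, m)
    null_counts = []
    for _ in range(n_perm):
        permuted = labels[:]
        rng.shuffle(permuted)  # one relabelling shared by all genes
        null_counts.append(count_calls(matrix, permuted, t, m))
    if observed == 0:
        return 0.0, 0
    return min(1.0, statistics.median(null_counts) / observed), observed


def optimize(matrix, labels, t_grid, m_grid, max_fdr=0.05, n_perm=50):
    """Grid search for the (t, m) pair giving the most calls at FDR <= max_fdr."""
    best = None
    for t in t_grid:
        for m in m_grid:
            fdr, calls = permutation_fdr(matrix, labels, t, m, n_perm)
            if fdr <= max_fdr and (best is None or calls > best[2]):
                best = (t, m, calls, fdr)
    return best
```

    Permuting the sample labels once per round, rather than separately for each gene, keeps the correlation between genes intact in the null distribution.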

    KC-SMARTR: An R package for detection of statistically significant aberrations in multi-experiment aCGH data

    Background: Most approaches used to find recurrent or differential DNA Copy Number Alterations (CNA) in array Comparative Genomic Hybridization (aCGH) data from groups of tumour samples depend on discretizing the aCGH data into gain, loss or no-change states. This causes a loss of valuable biological information in tumour samples, which are frequently heterogeneous. We have previously developed an algorithm, KC-SMART, that bases its estimate of the magnitude of the CNA at a given genomic location on kernel convolution (Klijn et al., 2008). This accounts for the intensity of the probe signal, its local genomic environment and the signal distribution across multiple samples. Results: Here we extend the approach to allow comparative analyses of two groups of samples and introduce the R implementation of these two approaches. The comparative module allows a supervised analysis to be performed, enabling the identification of regions that are differentially aberrated between two user-defined classes. We analyzed data from a series of B- and T-cell lymphomas and were able to retrieve all positive control regions (VDJ regions) in addition to a number of new regions. A t-test on segmented data, which we implemented, also located all the positive control regions and a number of new regions, but these regions were highly fragmented. Conclusions: KC-SMARTR offers recurrent CNA and class-specific CNA detection, at different genomic scales, in a single package without the need for additional segmentation. It is memory efficient and runs on a wide range of machines. Most importantly, it does not rely on data discretization and therefore maximally exploits the biological information in the aCGH data.
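
    The kernel-convolution idea can be sketched as follows. This is a simplified illustration, not the KC-SMARTR implementation. Several choices are assumptions made here: the Gaussian kernel, summing gains (positive log2 ratios) and losses (negative log2 ratios) separately across samples, and dividing by the total kernel weight. Significance testing is omitted, and the function names are hypothetical.

```python
from math import exp


def kc_score(positions, samples, x, width, gains=True):
    """Simplified kernel-convolution aberration score at genomic position x.

    positions: probe positions (bp) on one chromosome.
    samples:   one list of log2 ratios per sample, aligned with positions.
    Assumptions: Gaussian kernel; gains (positive values) and losses
    (negative values) are summed separately over samples; the result is
    divided by the total kernel weight.
    """
    numerator = 0.0
    total_weight = 0.0
    for pos, values in zip(positions, zip(*samples)):
        w = exp(-0.5 * ((x - pos) / width) ** 2)
        if gains:
            s = sum(v for v in values if v > 0)
        else:
            s = sum(v for v in values if v < 0)
        numerator += w * s
        total_weight += w
    return numerator / total_weight if total_weight > 0 else 0.0


def kc_profile(positions, samples, grid, width, gains=True):
    """Evaluate the score along a grid of genomic positions."""
    return [kc_score(positions, samples, x, width, gains) for x in grid]
```

    The raw log2 ratios are used directly, with no gain/loss/no-change calls, so the score keeps the signal intensity that discretization would discard.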

    A powerful method for detecting differentially expressed genes from GeneChip arrays that does not require replicates

    BACKGROUND: Studies of differential expression that use Affymetrix GeneChip arrays are often carried out with a limited number of replicates. Reasons for this include financial considerations and limits on the amount of RNA available for sample preparation. In addition, failed hybridizations are not uncommon, leading to a further reduction in the number of replicates available for analysis. Most existing methods for studying differential expression rely on the availability of replicates, and the demand for alternative methods that require few or no replicates is high. RESULTS: We describe a statistical procedure for performing differential expression analysis without replicates. The procedure relies on a Bayesian integrated approach (BGX) to the analysis of Affymetrix GeneChips. The BGX method estimates a posterior distribution of expression for each gene and condition by simultaneously considering the available probe intensities representing the gene in that condition. Importantly, posterior distributions of expression are obtained regardless of the number of replicates available. We exploit these posterior distributions to create ranked gene lists that take into account the estimated expression difference as well as its associated uncertainty. We estimate the proportion of non-differentially expressed genes empirically, adapting an approach proposed by Efron, which allows an informed choice of cut-off for the ranked gene list. We assess the performance of the method, and compare it with that of other methods, on publicly available spike-in data sets as well as in a real biological setting. CONCLUSION: The method presented is a powerful tool for extracting information on differential expression from GeneChip expression studies with limited or no replicates.
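
    Given posterior draws, one simple ranking divides the posterior mean difference by its posterior standard deviation. This uses both the estimated difference and its uncertainty. The score is chosen for illustration and is not necessarily the one the BGX procedure uses. The input format (paired MCMC draws per gene and condition) and the function name are also assumptions.

```python
import statistics


def rank_genes(posterior_samples):
    """Rank genes by |posterior mean difference| / posterior SD of the difference.

    posterior_samples maps gene -> (draws for condition 1, draws for condition 2),
    assumed to be paired MCMC draws of equal length (at least two draws).
    """
    scores = {}
    for gene, (draws1, draws2) in posterior_samples.items():
        diffs = [b - a for a, b in zip(draws1, draws2)]
        sd = statistics.stdev(diffs)
        scores[gene] = abs(statistics.fmean(diffs)) / sd if sd > 0 else float("inf")
    return sorted(scores, key=scores.get, reverse=True)
```

    A gene whose posterior difference is large but very uncertain can rank below a gene with a smaller, tightly estimated difference. That is the intended effect of accounting for uncertainty.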

    Patterns of subnet usage reveal distinct scales of regulation in the transcriptional regulatory network of Escherichia coli

    The set of regulatory interactions between genes, mediated by transcription factors, forms a species' transcriptional regulatory network (TRN). By comparing this network with measured gene expression data, one can identify functional properties of the TRN and gain general insight into transcriptional control. We define the subnet of a node as the subgraph consisting of all nodes topologically downstream of the node, including the node itself. Using a large set of microarray expression data from the bacterium Escherichia coli, we find that gene expression in different subnets exhibits a structured pattern in response to environmental changes and genotypic mutations. Subnets with fewer changes in their expression pattern have a higher fraction of feed-forward loop motifs and a lower fraction of small RNA targets. Our study implies that the TRN consists of several scales of regulatory organization: 1) subnets with more variable gene expression, controlled by both transcription factors and post-transcriptional RNA regulation, and 2) subnets with less variable gene expression, having more feed-forward loops and less post-transcriptional RNA regulation.
    Comment: 14 pages, 8 figures, to be published in PLoS Computational Biology.
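
    The subnet definition translates directly into a graph traversal. The sketch assumes the TRN is stored as a dictionary that maps each transcription factor to its target genes. It collects every node reachable downstream of a given node, including the node itself. It also counts feed-forward loops (X regulates Y and Z, and Y regulates Z) whose three nodes all lie in the subnet. The function names and the node names in the example are hypothetical.

```python
from collections import deque


def subnet(graph, node):
    """Return the set of nodes topologically downstream of `node`, including itself."""
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for target in graph.get(current, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def count_ffls(graph, nodes):
    """Count feed-forward loops X->Y, Y->Z, X->Z with three distinct nodes in `nodes`."""
    count = 0
    for x in nodes:
        x_targets = set(graph.get(x, ())) & nodes
        for y in x_targets:
            if y == x:
                continue
            for z in set(graph.get(y, ())) & nodes:
                if z in x_targets and z != x and z != y:
                    count += 1
    return count


# Hypothetical toy network: TF1 -> TF2 -> g1 with TF1 -> g1 forms one feed-forward loop.
graph = {"TF1": ["TF2", "g1"], "TF2": ["g1", "g2"], "TF3": ["TF1"]}
sub = subnet(graph, "TF1")
print(sorted(sub), count_ffls(graph, sub))
```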

    Extended analysis of benchmark datasets for Agilent two-color microarrays

    Background: As part of its broad and ambitious mission, the MicroArray Quality Control (MAQC) project reported the results of experiments using External RNA Controls (ERCs) on five microarray platforms. For most platforms, several different methods of data processing were considered. However, different methods for processing data from the Agilent two-color platform were not similarly considered. While this omission is understandable given the scale of the project, it can create the false impression that there is consensus about the best way to process Agilent two-color data. It is also important to consider whether ERCs are representative of all the probes on a microarray. Results: A comparison of different methods of processing Agilent two-color data shows substantial differences among methods for low-intensity genes. The sensitivity and specificity for detecting differentially expressed genes vary substantially across methods. Analysis also reveals that the ERCs in the MAQC data span only the upper half of the intensity range and therefore cannot be representative of all genes on the microarray. Conclusion: Although ERCs demonstrate good agreement between observed and expected log-ratios on the Agilent two-color platform, such an analysis is incomplete. Simple loess normalization outperformed data processing with Agilent's Feature Extraction software for accurate identification of differentially expressed genes. Results from studies using ERCs should not be over-generalized when ERCs are not representative of all probes on a microarray.
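
    The sketch shows within-array loess normalization of two-color data in its usual form. For each spot, compute M = log2(R/G) and A = (log2 R + log2 G)/2. Fit a local linear, tricube-weighted curve of M against A, then subtract it from M. The span and the single-pass fit without robustness iterations are simplifying assumptions. This is not the exact configuration evaluated in the study, and the function names are hypothetical. In R, limma's normalizeWithinArrays offers a loess method for the same purpose.

```python
import math


def log_ma(red, green):
    """Convert red/green intensities to M = log2(R/G) and A = (log2 R + log2 G) / 2."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a


def loess_fit(x, y, span=0.4):
    """Local linear fit with tricube weights (single pass, no robustness iterations)."""
    n = len(x)
    k = max(2, math.ceil(span * n))
    fitted = []
    for x0 in x:
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)  # > 0, because x0 itself gets weight 1
        xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ym + slope * (x0 - xm))
    return fitted


def loess_normalize(red, green, span=0.4):
    """Return (normalized M, A): M minus its loess trend in A."""
    m, a = log_ma(red, green)
    trend = loess_fit(a, m, span)
    return [mi - ti for mi, ti in zip(m, trend)], a
```

    Intensity-dependent dye bias appears as a trend of M with A, and subtracting the fitted curve removes it. This matters most at low intensities, where the abstract reports the largest differences among methods.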

    Examining smoking-induced differential gene expression changes in buccal mucosa

    Background: Gene expression changes resulting from conditions such as disease, environmental stimuli and drug use can be monitored in the blood. However, a less invasive method of sample collection is of interest, because blood sampling causes discomfort and requires specialized personnel, especially if multiple samples are being collected. Buccal mucosa cells are easily collected and may be an alternative sample material for biomarker testing. A limited number of studies, primarily in the smoker/oral cancer literature, address the suitability of this tissue as an RNA source for expression analysis. The current study was undertaken to determine whether total RNA isolated from buccal mucosa could be used as an alternative tissue source to assay relative gene expression. Methods: Total RNA was isolated from swabs, reverse transcribed and amplified. The amplified cDNA was used in RT-qPCR and microarray analyses to evaluate gene expression in buccal cells. First, RT-qPCR was used to assess relative transcript levels of four genes in whole blood and buccal cells collected concurrently from the same seven individuals. Second, buccal cell RNA was used for microarray-based differential gene expression studies comparing gene expression between a group of female smokers and nonsmokers. Results: An amplification protocol allowed the use of less buccal cell total RNA (50 ng) than had previously been reported with human microarrays. Total RNA isolated from buccal cells was degraded but was of sufficient quality to be used with RT-qPCR to detect expression of specific genes. We report a small number of statistically significant differentially expressed genes between smokers and nonsmokers, using buccal cells as starting material. Gene Set Enrichment Analysis confirmed that these genes had an expression pattern similar to results from another study. Conclusions: Our results suggest that, despite a high degree of degradation, RNA from buccal cells of the cheek mucosa could be used to detect differential gene expression between smokers and nonsmokers. However, the RNA degradation, increased sample variability and microarray failure rate show that buccal samples should be used with caution as source material in expression studies.

    A close examination of double filtering with fold change and t test in microarray analysis

    Get PDF
    Background: Many researchers use the double filtering procedure with fold change and t test to identify differentially expressed genes, in the hope that double filtering will provide extra confidence in the results. Because of its simplicity, the double filtering procedure has been popular with applied researchers despite the development of more sophisticated methods. Results: This paper, for the first time to our knowledge, provides theoretical insight into the drawbacks of the double filtering procedure. We show that fold change assumes all genes have a common variance, while the t statistic assumes gene-specific variances; the two statistics are therefore based on contradictory assumptions. Under the assumption that gene variances arise from a mixture of a common variance and gene-specific variances, we develop the theoretically most powerful likelihood ratio test statistic. We further demonstrate that posterior inference based on a Bayesian mixture model and the widely used significance analysis of microarrays (SAM) statistic are better approximations to the likelihood ratio test than the double filtering procedure. Conclusion: We demonstrate, through hypothesis testing theory, simulation studies and real data examples, that well-constructed shrinkage testing methods, which can be unified under the mixture gene variance assumption, can considerably outperform the double filtering procedure.
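
    A small sketch makes the contrast between the statistics concrete. For two groups of log2 values, the fold change ignores the gene's variance. The t statistic divides by a gene-specific standard error. A SAM-style statistic adds a constant s0 to that standard error as a compromise. The fixed s0 = 0.1, the cut-offs in double_filter and the function names are hypothetical; SAM itself chooses s0 from the data.

```python
import statistics
from math import sqrt


def gene_stats(x, y, s0=0.1):
    """Return (log2 fold change, pooled-variance t, SAM-style d) for two groups of log2 values."""
    diff = statistics.fmean(y) - statistics.fmean(x)
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    s = sqrt(pooled_var * (1 / nx + 1 / ny))  # gene-specific standard error
    t = diff / s if s > 0 else float("inf")
    d = diff / (s + s0)  # s0 pulls tiny standard errors toward a common value
    return diff, t, d


def double_filter(x, y, fc_cut=1.0, t_cut=3.0):
    """Call a gene only if it passes both the fold-change and the t cut-off."""
    diff, t, _ = gene_stats(x, y)
    return abs(diff) >= fc_cut and abs(t) >= t_cut
```

    A gene with a tiny variance and a small fold change gets a huge t but fails the fold-change filter. A gene with a large fold change and a large variance passes the fold-change filter but fails the t filter. The SAM-style d moderates the first case without discarding the second.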