
    beadarrayFilter: an R package to filter beads

    Microarrays enable the expression levels of thousands of genes to be measured simultaneously. However, only a small fraction of these genes are expected to be expressed under different experimental conditions. Filtering has recently been introduced as a step in the microarray preprocessing pipeline: it aims to reduce the dimensionality of the data by removing redundant features before the actual statistical analysis. Previous filtering methods focus on the Affymetrix platform and cannot easily be ported to the Illumina platform. We therefore developed a filtering method for Illumina bead arrays and implemented it in an R package, beadarrayFilter. In this paper, we highlight the main functions in the package and use several examples to illustrate how beadarrayFilter can be used to filter bead arrays.

    BASH: a tool for managing BeadArray spatial artefacts

    Summary: With their many replicates and random layouts, Illumina BeadArrays provide greater scope for detecting spatial artefacts than other microarray technologies do. They are also robust to artefact exclusion, yet there is a lack of tools that can perform these tasks for Illumina. We present BASH, a tool for this purpose. BASH adopts the concepts of Harshlight but implements them in a way that exploits the unique characteristics of the Illumina technology. Using bead-level data, spatial artefacts of various kinds can thus be identified and excluded from further analyses.

    Spike-in validation of an Illumina-specific variance-stabilizing transformation

    BACKGROUND: Variance-stabilizing techniques have been used for some time in the analysis of gene expression microarray data. A new adaptation, the variance-stabilizing transformation (VST), has recently been developed to take advantage of the unique features of Illumina BeadArrays. VST has been shown to perform well in comparison with the widely used approach of taking a log2 transformation, but has not been validated on a spike-in experiment. We apply VST to the data from a recently published spike-in experiment and compare it both to a regular log2 analysis and to a recently recommended analysis that can be applied if all raw data are available. FINDINGS: VST provides more power to detect differentially expressed genes than a log2 transformation. However, the gain in power is roughly the same as that achieved by using the raw data from an experiment and weighting observations accordingly. VST is still advantageous when large changes in expression are anticipated, while a weighted log2 approach performs better for smaller changes. CONCLUSION: VST can be recommended for summarized Illumina data regardless of which Illumina pre-processing options have been used. However, using the raw data is still encouraged whenever possible.

    SAMQA: error classification and validation of high-throughput sequenced read data

    Background: Advances in high-throughput sequencing technologies and the growth in data sizes have highlighted the need for scalable tools to perform quality assurance (QA) testing. These tests are necessary to ensure that data meet a minimum standard for use in downstream analysis. In this paper we present the SAMQA tool to rapidly and robustly identify errors in population-scale sequence data. Results: SAMQA has been used on samples from three separate sets of cancer genome data from The Cancer Genome Atlas (TCGA) project. Using technical standards provided by the SAM specification and biological standards defined by researchers, we have classified errors in these sequence data sets relative to individual reads within a sample. Owing to an observed linearithmic speedup from using a high-performance computing (HPC) framework for the majority of tasks, poor-quality data were identified prior to secondary analysis in significantly less time on the HPC framework than when the same data were run using alternative parallelization strategies on a single server. Conclusions: The SAMQA toolset validates a minimum set of data quality standards across whole-genome and exome sequences. It is tuned to run on a high-performance computational framework, enabling QA across hundreds of gigabytes of samples regardless of coverage or sample type.
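
    To make "technical standards provided by the SAM specification" concrete, the sketch below implements one such rule: when neither SEQ nor CIGAR is "*", the summed lengths of the M, I, S, = and X operations must equal the length of SEQ. This is a minimal illustration written for this listing, not SAMQA's code; the function names are hypothetical and only the Python standard library is used.

```python
import re

# CIGAR operations that consume read (query) bases under the SAM specification.
# D, N, H and P do not consume bases from SEQ.
QUERY_CONSUMING_OPS = set("MIS=X")
CIGAR_OP_RE = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL_RE = re.compile(r"(\d+[MIDNSHP=X])+")


def cigar_query_length(cigar: str) -> int:
    """Hypothetical helper: number of read bases implied by a CIGAR string."""
    if not CIGAR_FULL_RE.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return sum(int(n) for n, op in CIGAR_OP_RE.findall(cigar)
               if op in QUERY_CONSUMING_OPS)


def check_seq_cigar(seq: str, cigar: str) -> list[str]:
    """Hypothetical per-record check; returns error messages (empty list = pass)."""
    # '*' marks an unavailable field; the length rule applies only when both are present.
    if seq == "*" or cigar == "*":
        return []
    try:
        expected = cigar_query_length(cigar)
    except ValueError as exc:
        return [str(exc)]
    if expected != len(seq):
        return [f"CIGAR implies {expected} bases but SEQ has {len(seq)}"]
    return []
```

    For example, `check_seq_cigar("ACGTACGT", "5M2D4M")` reports a mismatch because the CIGAR implies 9 read bases while SEQ has 8. Returning a list of messages rather than raising makes it easy to tally error classes across many reads, which is the kind of per-read classification the abstract describes.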

    Statistical expression deconvolution from mixed tissue samples

    Motivation: Global expression patterns within cells are used for purposes ranging from the identification of disease biomarkers to the basic understanding of cellular processes. Unfortunately, tissue samples used in cancer studies are usually composed of multiple cell types, and the non-cancerous portions can significantly affect expression profiles. This severely limits the conclusions that can be drawn about the specificity of gene expression in the cell type of interest. However, statistical analysis can be used to identify differentially expressed genes that are related to the biological question being studied.

    Transcript length bias in RNA-seq data confounds systems biology

    BACKGROUND: Several recent studies have demonstrated the effectiveness of deep sequencing for transcriptome analysis (RNA-seq) in mammals. As RNA-seq becomes more affordable, whole-genome transcriptional profiling is likely to become the platform of choice for species with good genomic sequences. As yet, a rigorous analysis methodology has not been developed, and we are still at the stage of exploring the features of the data. RESULTS: We investigated the effect of transcript length bias in RNA-seq data using three different published data sets. For standard analyses using aggregated tag counts for each gene, the ability to call differentially expressed genes between samples is strongly associated with the length of the transcript. CONCLUSION: Transcript length bias for calling differentially expressed genes is a general feature of current protocols for RNA-seq technology. This has implications for the ranking of differentially expressed genes, and in particular may introduce bias in gene set testing for pathway analysis and other multi-gene systems biology analyses. REVIEWERS: This article was reviewed by Rohan Williams (nominated by Gavin Huttley), Nicole Cloonan (nominated by Mark Ragan) and James Bullard (nominated by Sandrine Dudoit).
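
    Why aggregated counts favour long transcripts can be seen with a simplified calculation. The sketch assumes that each gene's read count is Poisson with mean proportional to library depth × transcript length × expression level, that depth is equal in both conditions, and that a Wald-type statistic (X_a − X_b) / sqrt(X_a + X_b) is evaluated at the expected counts. These are illustrative assumptions, not the analysis pipeline used in the paper, and `expected_z` is a hypothetical name.

```python
import math


def expected_z(length_kb: float, expr_a: float, expr_b: float,
               depth: float = 1.0) -> float:
    """Expected Poisson z-statistic for one gene compared across two conditions.

    Assumes counts X ~ Poisson(depth * length * expression) with equal depth
    in both conditions; the statistic is evaluated at the expected counts.
    """
    mean_a = depth * length_kb * expr_a
    mean_b = depth * length_kb * expr_b
    return (mean_a - mean_b) / math.sqrt(mean_a + mean_b)


# Same two-fold change in expression, different transcript lengths.
for length in (0.5, 2.0, 8.0):
    print(length, round(expected_z(length, expr_a=20.0, expr_b=10.0), 2))
```

    Because both the numerator and the variance scale with length, the statistic grows with the square root of transcript length: quadrupling the length doubles it, even though the fold change is identical. A long transcript therefore reaches significance more easily than a short one with the same relative change, which is the bias the abstract describes.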

    Influence of ARHGEF3 and RHOA Knockdown on ACTA2 and Other Genes in Osteoblasts and Osteoclasts

    Osteoporosis is a common bone disease that has a strong genetic component. Genome-wide linkage studies have identified the chromosomal region 3p14-p22 as a quantitative trait locus for bone mineral density (BMD). We have previously identified associations between variation in two related genes located in 3p14-p22, ARHGEF3 and RHOA, and BMD in women. In this study we performed knockdown of these genes using small interfering RNA (siRNA) in human osteoblast-like and osteoclast-like cells in culture, followed by microarray analysis to identify differentially regulated genes from a list of 264 candidate genes. Selected findings were then validated in additional human cell lines/cultures using quantitative real-time PCR (qRT-PCR). The qRT-PCR results showed significant down-regulation of the ACTA2 gene, encoding the cytoskeletal protein alpha 2 actin, in response to RHOA knockdown in both osteoblast-like (P<0.001) and osteoclast-like cells (P = 0.002). RHOA knockdown also caused up-regulation of the PTH1R gene, encoding the parathyroid hormone 1 receptor, in Saos-2 osteoblast-like cells (P<0.001). Other findings included down-regulation of the TNFRSF11B gene, encoding osteoprotegerin, in response to ARHGEF3 knockdown in the Saos-2 and hFOB 1.19 osteoblast-like cells (P = 0.003–0.02), and down-regulation of ARHGDIA, encoding the Rho GDP dissociation inhibitor alpha, in response to RHOA knockdown in osteoclast-like cells (P<0.001). These studies identify ARHGEF3 and RHOA as potential regulators of a number of genes in bone cells, including TNFRSF11B, ARHGDIA, PTH1R and ACTA2, with influences on the latter evident in both osteoblast-like and osteoclast-like cells. This adds further evidence to previous studies suggesting a role for the ARHGEF3 and RHOA genes in bone metabolism.

    BeadArray Expression Analysis Using Bioconductor

    Illumina whole-genome expression BeadArrays are a popular choice in gene profiling studies. Aside from the vendor-provided software tools for analyzing BeadArray expression data (GenomeStudio/BeadStudio), there is a comprehensive set of open-source analysis tools in the Bioconductor project, many of which have been tailored to exploit the unique properties of this platform. In this article, we explore a number of these software packages and demonstrate how to perform a complete analysis of BeadArray data in various formats. We cover the key steps of importing data, quality assessment, preprocessing and annotation in the common setting of assessing differential expression in designed experiments.

    BeadDataPackR: A Tool to Facilitate the Sharing of Raw Data from Illumina BeadArray Studies

    Microarray technologies have been an increasingly important tool in cancer research in the last decade, and a number of initiatives have stressed the importance of providing and sharing raw microarray data. Illumina BeadArrays pose a particular problem in this regard, as their random construction simultaneously adds value to analysis of the raw data and obstructs the sharing of those data.

    The cerebellum ages slowly according to the epigenetic clock

    Studies that elucidate why some human tissues age faster than others may shed light on how we age, and ultimately suggest what interventions may be possible. Here we use a recent biomarker of aging (referred to as the epigenetic clock) to assess the epigenetic ages of up to 30 anatomic sites from supercentenarians (subjects who reached an age of 110 or older) and younger subjects. Using three novel and three published human DNA methylation data sets, we demonstrate that the cerebellum ages more slowly than other parts of the human body. We used both transcriptional data and genetic data to elucidate molecular mechanisms that may explain this finding. The two largest superfamilies of helicases (SF1 and SF2) are significantly over-represented (p = 9.2 × 10^-9) among gene transcripts that are over-expressed in the cerebellum compared to other brain regions from the same subject. Furthermore, SNPs that are associated with epigenetic age acceleration in the cerebellum tend to be located near genes from helicase superfamilies SF1 and SF2 (enrichment p = 5.8 × 10^-3). Our genetic and transcriptional studies of epigenetic age acceleration support the hypothesis that the slow aging rate of the cerebellum is due to processes that involve RNA helicases.