Pre-processing and differential expression analysis of Agilent microRNA arrays using the AgiMicroRna Bioconductor library
<p>Abstract</p> <p>Background</p> <p>The main research tool for identifying microRNAs involved in specific cellular processes is gene expression profiling using microarray technology. Agilent is one of the major producers of microRNA arrays, and microarray data are commonly analyzed using R and the functions and packages collected in the Bioconductor project. However, an analytical package that integrates the specific characteristics of Agilent microRNA arrays has been lacking.</p> <p>Results</p> <p>This report presents the new bioinformatic tool <it>AgiMicroRna</it> for the pre-processing and differential expression analysis of Agilent microRNA array data. The software is implemented in the open-source statistical scripting language R and is integrated in the Bioconductor project (<url>http://www.bioconductor.org</url>) under the GPL license. For the pre-processing of the microRNA signal, <it>AgiMicroRna</it> incorporates the <it>robust multiarray average algorithm</it>, a method that produces a summary measure of the microRNA expression using a linear model that takes into account the probe affinity effect. To obtain a normalized microRNA signal useful for the statistical analysis, <it>AgiMicroRna</it> offers the possibility of employing either the processed signal estimated by the <it>robust multiarray average algorithm</it> or the processed signal produced by the Agilent image analysis software. The <it>AgiMicroRna</it> package also incorporates different graphical utilities to assess the quality of the data.
<it>AgiMicroRna</it> uses the linear model features implemented in the <it>limma</it> package to assess the differential expression between different experimental conditions and provides links to <it>miRBase</it> for those microRNAs that have been declared significant in the statistical analysis.</p> <p>Conclusions</p> <p><it>AgiMicroRna</it> is a rational collection of Bioconductor functions that have been wrapped into specific functions in order to ease and systematize the pre-processing and statistical analysis of Agilent microRNA data. The development of this package contributes to the Bioconductor project by filling the gap in microRNA array data analysis.</p>
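The summary step of the robust multiarray average, as commonly described, fits an additive probe-plus-array model to log2 intensities by Tukey's median polish. The sketch below illustrates only that step in plain Python; it is not the AgiMicroRna implementation, it omits the background correction and normalization that precede summarization, and the function names and toy intensities are assumptions made for illustration.

```python
from statistics import median


def median_polish(matrix, max_iter=10, tol=1e-9):
    """Tukey median polish of a probes x arrays matrix of log2 intensities.

    Fits log2(intensity) = overall + probe_effect + array_effect + residual
    and returns (overall, probe_effects, array_effects).
    """
    resid = [list(row) for row in matrix]
    n_probes, n_arrays = len(resid), len(resid[0])
    overall = 0.0
    probe_eff = [0.0] * n_probes
    array_eff = [0.0] * n_arrays
    for _ in range(max_iter):
        # Sweep probe (row) medians out of the residuals.
        row_med = [median(row) for row in resid]
        for i in range(n_probes):
            probe_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        shift = median(array_eff)
        overall += shift
        array_eff = [a - shift for a in array_eff]
        # Sweep array (column) medians out of the residuals.
        col_med = [median(resid[i][j] for i in range(n_probes))
                   for j in range(n_arrays)]
        for j in range(n_arrays):
            array_eff[j] += col_med[j]
        resid = [[row[j] - col_med[j] for j in range(n_arrays)] for row in resid]
        shift = median(probe_eff)
        overall += shift
        probe_eff = [p - shift for p in probe_eff]
        # Stop once neither sweep changes anything.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, probe_eff, array_eff


def summarize(matrix):
    """One expression value per array: overall level plus the array effect."""
    overall, _, array_eff = median_polish(matrix)
    return [overall + a for a in array_eff]


# Hypothetical microRNA with three probes measured on two arrays.
probes = [[8.0, 10.0], [9.0, 11.0], [7.0, 9.0]]
print(summarize(probes))  # [8.0, 10.0]
```

Because the toy data are exactly additive, the probe affinity differences are absorbed into the probe effects and the two arrays differ by the 2 log2 units built into the example.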
Probe set algorithms: is there a rational best bet?
Affymetrix microarrays have become a standard experimental platform for studies of mRNA expression profiling. Their success is due, in part, to the multiple oligonucleotide features (probes) against each transcript (probe set). This multiple testing allows for more robust background assessments and gene expression measures, and has permitted the development of many computational methods to translate image data into a single normalized "signal" for mRNA transcript abundance. There are now many probe set algorithms that have been developed, with a gradual movement away from chip-by-chip methods (MAS5), to project-based model-fitting methods (dCHIP, RMA, others). Data interpretation is often profoundly changed by choice of algorithm, with disoriented biologists questioning what the "accurate" interpretation of their experiment is. Here, we summarize the debate concerning probe set algorithms. We provide examples of how changes in mismatch weight, normalizations, and construction of expression ratios each dramatically change data interpretation. All interpretations can be considered as computationally appropriate, but with varying biological credibility. We also illustrate the performance of two new hybrid algorithms (PLIER, GC-RMA) relative to more traditional algorithms (dCHIP, MAS5, Probe Profiler PCA, RMA) using an interactive power analysis tool. PLIER appears superior to other algorithms in avoiding false positives with poorly performing probe sets. Based on our interpretation of the literature, and examples presented here, we suggest that the variability in performance of probe set algorithms is more dependent upon assumptions regarding "background", than on calculations of "signal". We argue that "background" is an enormously complex variable that can only be vaguely quantified, and thus the "best" probe set algorithm will vary from project to project
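To make the point about mismatch weight concrete, the following sketch computes a log2 ratio for one probe set while varying how strongly the mismatch (MM) intensity is subtracted from the perfect-match (PM) intensity. The weighted subtraction is a deliberate simplification chosen for illustration and is not the formula used by MAS5, dCHIP, RMA, or any other named algorithm; the intensities are hypothetical.

```python
from math import log2
from statistics import mean


def probe_set_signal(pm, mm, mm_weight):
    """Mean of PM - w * MM over probe pairs, floored at 1 so log2 stays defined."""
    return max(mean(p - mm_weight * m for p, m in zip(pm, mm)), 1.0)


def log2_ratio(sample_a, sample_b, mm_weight):
    """log2(signal_b / signal_a) for two (pm, mm) samples at a given MM weight."""
    signal_a = probe_set_signal(sample_a[0], sample_a[1], mm_weight)
    signal_b = probe_set_signal(sample_b[0], sample_b[1], mm_weight)
    return log2(signal_b / signal_a)


# Hypothetical PM and MM intensities for one probe set in two samples.
control = ([200.0, 220.0], [150.0, 160.0])
treated = ([400.0, 420.0], [160.0, 170.0])

for w in (0.0, 0.5, 1.0):
    print(w, round(log2_ratio(control, treated, w), 3))
```

With these numbers the apparent change grows from roughly a 2-fold to roughly a 4.5-fold increase as the weight goes from 0 to 1, although the underlying intensities are identical.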
Error, reproducibility and sensitivity : a pipeline for data processing of Agilent oligonucleotide expression arrays
Background
Expression microarrays are increasingly used to obtain large scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, to design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples.
Results
We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while inter-array variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (6% of the mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables that define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators.
Conclusions
This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and inter-array variability, and of a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.
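One plausible way to arrive at figures like "SD of around 0.5 log2 units (6% of mean)" is to take the standard deviation of each gene's log2 signal across replicate arrays, average those SDs, and express the result relative to the grand mean log2 signal. The sketch below follows that reading; the abstract does not give the exact estimator, so the function name, the averaging choice and the toy values are assumptions.

```python
from statistics import mean, stdev


def interarray_variability(log2_signals):
    """Summarise replicate-to-replicate spread.

    log2_signals holds one list per gene, containing that gene's log2 signal
    on each replicate array. Returns (mean per-gene SD in log2 units,
    that SD as a percentage of the grand mean log2 signal).
    """
    per_gene_sd = [stdev(values) for values in log2_signals]
    grand_mean = mean(v for values in log2_signals for v in values)
    sd = mean(per_gene_sd)
    return sd, 100.0 * sd / grand_mean


# Hypothetical log2 signals for two genes on two replicate arrays.
sd, percent = interarray_variability([[8.0, 9.0], [10.0, 11.0]])
print(round(sd, 3), round(percent, 1))
```

Read the other way round, an SD of 0.5 log2 units amounting to about 6% of the mean implies a mean log2 signal of roughly 0.5 / 0.06, or a little over 8 log2 units.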
TiArA: A Virtual Appliance for the Analysis of Tiling Array Data
Genomic tiling arrays have been described in the scientific literature since 2003, yet there is a shortage of user-friendly applications available for their analysis. Tiling Array Analyzer (TiArA) is a software program that provides a user-friendly graphical interface for the background subtraction, normalization, and summarization of data acquired through the Affymetrix tiling array platform. The background signal is empirically measured using a group of nonspecific probes with varying levels of GC content, and normalization is performed to enforce a common dynamic range. TiArA is implemented as a standalone program for Linux systems and is available as a cross-platform virtual machine that will run under most modern operating systems using virtualization software such as Sun VirtualBox or VMware. The software is available as a Debian package or a virtual appliance at http://purl.org/NET/tiara
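A GC-stratified background estimate of the kind described above can be sketched as follows: nonspecific control probes are grouped by GC count, the median intensity of each group serves as the background for that GC level, and each tiling probe has the background of its nearest GC group subtracted. This is an illustration under stated assumptions, not TiArA's code; the use of the median, the nearest-group fallback, and the short toy sequences are all choices made here.

```python
from collections import defaultdict
from statistics import median


def gc_count(seq):
    """Number of G or C bases in a probe sequence."""
    return sum(base in "GC" for base in seq.upper())


def background_by_gc(control_probes):
    """Median intensity of nonspecific control probes at each GC count."""
    bins = defaultdict(list)
    for seq, intensity in control_probes:
        bins[gc_count(seq)].append(intensity)
    return {gc: median(values) for gc, values in bins.items()}


def subtract_background(probes, background):
    """Subtract the background of the nearest available GC group from each probe."""
    corrected = []
    for seq, intensity in probes:
        gc = gc_count(seq)
        nearest = min(background, key=lambda g: abs(g - gc))
        corrected.append(intensity - background[nearest])
    return corrected


# Hypothetical 4-mer "probes" keep the example short; real probes are longer.
controls = [("AATT", 100), ("AATT", 120), ("GCAT", 200), ("GGCC", 300), ("GCGC", 340)]
background = background_by_gc(controls)
print(subtract_background([("ATGC", 500), ("GGGC", 1000)], background))
```

Corrected values can be negative when a probe is dimmer than its background group, so a real pipeline would need a rule for such probes before any log transformation.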
Serum microRNA array analysis identifies miR-140-3p, miR-33b-3p and miR-671-3p as potential osteoarthritis biomarkers involved in metabolic processes.
Background: MicroRNAs (miRNAs) in circulation have emerged as promising biomarkers. In this study, we aimed to identify a circulating miRNA signature for osteoarthritis (OA) patients and, in combination with bioinformatics analysis, to evaluate the utility of selected differentially expressed miRNAs in the serum as potential OA biomarkers. Methods: Serum samples collected from 12 primary OA patients and 12 healthy individuals were screened using the Agilent Human miRNA Microarray platform interrogating 2549 miRNAs. Receiver Operating Characteristic (ROC) curves were constructed to evaluate the diagnostic performance of the deregulated miRNAs. Expression levels of selected miRNAs were validated by quantitative real-time PCR (qRT-PCR) in all serum samples and in articular cartilage samples from OA patients (n = 12) and healthy individuals (n = 7). Bioinformatics analysis was used to investigate the involved pathways and target genes for the above miRNAs. Results: We identified 279 differentially expressed miRNAs in the serum of OA patients compared to controls. Two hundred and five miRNAs (73.5%) were upregulated and 74 (26.5%) downregulated. ROC analysis revealed that 77 miRNAs had area under the curve (AUC) > 0.8 and p < 0.05. Bioinformatics analysis of the 77 miRNAs revealed that their target genes were involved in multiple signaling pathways associated with OA, including the FoxO, mTOR, Wnt, PI3K/Akt, and TGF-β signaling pathways, ECM-receptor interaction, and fatty acid biosynthesis. qRT-PCR validation of seven miRNAs selected from the 77 revealed three significantly downregulated miRNAs (hsa-miR-33b-3p, hsa-miR-671-3p, and hsa-miR-140-3p) in the serum of OA patients, which were predicted in silico to be enriched in pathways involved in metabolic processes.
Target-gene analysis of hsa-miR-140-3p, hsa-miR-33b-3p, and hsa-miR-671-3p revealed that InsR and IGFR1 were common targets of all three miRNAs, highlighting their involvement in regulation of metabolic processes that contribute to OA pathology. Hsa-miR-140-3p and hsa-miR-671-3p expression levels were consistently downregulated in articular cartilage of OA patients compared to healthy individuals. Conclusions: A serum miRNA signature was established for the first time using high density resolution miR-arrays in OA patients. We identified a three-miRNA signature, hsa-miR-140-3p, hsa-miR-671-3p, and hsa-miR-33b-3p, in the serum of OA patients, predicted to regulate metabolic processes, which could serve as a potential biomarker for the evaluation of OA risk and progression.
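The AUC used to screen miRNAs in the ROC analysis has a simple rank interpretation: it is the probability that a randomly chosen patient has a higher value than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition; the function name and the expression values are hypothetical and are not taken from the study.

```python
def roc_auc(cases, controls):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case value is
    higher, scoring ties as half a win.
    """
    wins = 0.0
    for case_value in cases:
        for control_value in controls:
            if case_value > control_value:
                wins += 1.0
            elif case_value == control_value:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical normalised expression of one miRNA in patients and controls.
patients = [3.0, 4.0, 5.0]
healthy = [1.0, 2.0, 3.0]
print(round(roc_auc(patients, healthy), 3))  # 0.944
```

For a downregulated miRNA the value falls below 0.5, and 1 minus the AUC gives the discrimination achieved when lower expression is treated as the positive signal.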
Determining gene expression on a single pair of microarrays
<p>Abstract</p> <p>Background</p> <p>In microarray experiments the numbers of replicates are often limited due to factors such as cost, availability of sample or poor hybridization. There are currently few choices for the analysis of a pair of microarrays where N = 1 in each condition. In this paper, we demonstrate the effectiveness of a new algorithm called PINC (PINC is Not Cyber-T) that can analyze Affymetrix microarray experiments.</p> <p>Results</p> <p>PINC treats each pair of probes within a probeset as an independent measure of gene expression using the Bayesian framework of the Cyber-T algorithm and then assigns a corrected p-value for each gene comparison.</p> <p>The p-values generated by PINC accurately control the false discovery rate on Affymetrix control data sets, but are small enough that family-wise error rate corrections (such as Holm's step-down method) can be used as a conservative alternative to false discovery rate control with little loss of sensitivity on control data sets.</p> <p>Conclusion</p> <p>PINC outperforms previously published methods for determining differentially expressed genes when comparing Affymetrix microarrays with N = 1 in each condition. When applied to biological samples, PINC can be used to assess the degree of variability observed among biological replicates in addition to analyzing isolated pairs of microarrays.</p>
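The contrast drawn above between family-wise error control and false discovery rate control can be illustrated with the two standard adjustments, Holm's step-down method and the Benjamini-Hochberg procedure. The sketch below is generic and is not part of PINC; the p-values are hypothetical.

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (controls the family-wise error rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for k, idx in enumerate(order):
        rank = m - k  # 1-based rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values.
p = [0.01, 0.04, 0.03]
print([round(v, 3) for v in holm_adjust(p)])  # [0.03, 0.06, 0.06]
print([round(v, 3) for v in bh_adjust(p)])    # [0.03, 0.04, 0.04]
```

Holm's adjusted values are never smaller than the Benjamini-Hochberg ones, which is why it is the more conservative choice and only costs little sensitivity when the raw p-values are very small.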
Definition of the σW regulon of Bacillus subtilis in the absence of stress
Bacteria employ extracytoplasmic function (ECF) sigma factors for their responses to environmental stresses. Despite intensive research, the molecular dissection of ECF sigma factor regulons has remained a major challenge due to overlaps in the ECF sigma factor-regulated genes and the stimuli that activate the different ECF sigma factors. Here we have employed tiling arrays to single out the ECF σW regulon of the Gram-positive bacterium Bacillus subtilis from the overlapping ECF σX, σY, and σM regulons. For this purpose, we profiled the transcriptome of a B. subtilis sigW mutant under non-stress conditions to select candidate genes that are strictly σW-regulated. Under these conditions, σW exhibits a basal level of activity. Subsequently, we verified the σW-dependency of candidate genes by comparing their transcript profiles to transcriptome data obtained with the parental B. subtilis strain 168 grown under 104 different conditions, including relevant stress conditions, such as salt shock. In addition, we investigated the transcriptomes of rasP or prsW mutant strains that lack the proteases involved in the degradation of the σW anti-sigma factor RsiW and subsequent activation of the σW-regulon. Taken together, our studies identify 89 genes as being strictly σW-regulated, including several genes for non-coding RNAs. The effects of rasP or prsW mutations on the expression of σW-dependent genes were relatively mild, which implies that σW-dependent transcription under non-stress conditions is not strictly related to RasP and PrsW. Lastly, we show that the pleiotropic phenotype of rasP mutant cells, which have defects in competence development, protein secretion and membrane protein production, is not mirrored in the transcript profile of these cells. This implies that RasP is not only important for transcriptional regulation via σW, but that this membrane protease also exerts other important post-transcriptional regulatory functions
Examining smoking-induced differential gene expression changes in buccal mucosa
<p>Abstract</p> <p>Background</p> <p>Gene expression changes resulting from conditions such as disease, environmental stimuli, and drug use, can be monitored in the blood. However, a less invasive method of sample collection is of interest because of the discomfort and specialized personnel necessary for blood sampling especially if multiple samples are being collected. Buccal mucosa cells are easily collected and may be an alternative sample material for biomarker testing. A limited number of studies, primarily in the smoker/oral cancer literature, address this tissue's efficacy as an RNA source for expression analysis. The current study was undertaken to determine if total RNA isolated from buccal mucosa could be used as an alternative tissue source to assay relative gene expression.</p> <p>Methods</p> <p>Total RNA was isolated from swabs, reverse transcribed and amplified. The amplified cDNA was used in RT-qPCR and microarray analyses to evaluate gene expression in buccal cells. Initially, RT-qPCR was used to assess relative transcript levels of four genes from whole blood and buccal cells collected from the same seven individuals, concurrently. Second, buccal cell RNA was used for microarray-based differential gene expression studies by comparing gene expression between a group of female smokers and nonsmokers.</p> <p>Results</p> <p>An amplification protocol allowed use of less buccal cell total RNA (50 ng) than had been reported previously with human microarrays. Total RNA isolated from buccal cells was degraded but was of sufficient quality to be used with RT-qPCR to detect expression of specific genes. We report here the finding of a small number of statistically significant differentially expressed genes between smokers and nonsmokers, using buccal cells as starting material. 
Gene Set Enrichment Analysis confirmed that these genes had a similar expression pattern to results from another study.</p> <p>Conclusions</p> <p>Our results suggest that despite a high degree of degradation, RNA from buccal cells from cheek mucosa could be used to detect differential gene expression between smokers and nonsmokers. However, the RNA degradation, increase in sample variability and microarray failure rate show that buccal samples should be used with caution as source material in expression studies.</p>
Detection of Perturbation Phases and Developmental Stages in Organisms from DNA Microarray Time Series Data
Available DNA microarray time series that record gene expression along the developmental stages of multicellular eukaryotes, or in unicellular organisms subject to external perturbations such as stress and diauxie, are analyzed. By pairwise comparison of the gene expression profiles on the basis of a translation-invariant and scale-invariant distance measure corresponding to least-rectangle regression, it is shown that peaks in the average distance values are noticeable and are localized around specific time points. These points systematically coincide with the transition points between developmental phases or just follow the external perturbations. This approach can thus be used to identify automatically, from microarray time series alone, the presence of external perturbations or the succession of developmental stages in arbitrary cell systems. Moreover, our results show that there is a striking similarity between the gene expression responses to these a priori very different phenomena. In contrast, the cell cycle does not involve a perturbation-like phase, but rather continuous gene expression remodeling. Similar analyses were conducted using three other standard distance measures, showing that the one we introduced was superior. Based on these findings, we set up an adapted clustering method that uses this distance measure and classifies the genes on the basis of their expression profiles within each developmental stage or between perturbation phases
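The idea of locating transitions from peaks in a shift- and scale-invariant distance can be sketched with a stand-in measure. The code below uses the Pearson correlation distance, 1 - r, which is unchanged by adding a constant to a profile or multiplying it by a positive factor; it is not the least-rectangle distance introduced in the abstract, whose formula is not given here. Comparing whole expression vectors at consecutive time points is likewise an assumed reading, and the data are hypothetical.

```python
from math import sqrt


def correlation_distance(x, y):
    """1 - Pearson r; invariant to translation and positive rescaling.

    Undefined (division by zero) if either profile is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - sxy / sqrt(sxx * syy)


def successive_distances(time_series):
    """Distance between the expression vectors at each pair of consecutive time points."""
    return [correlation_distance(a, b) for a, b in zip(time_series, time_series[1:])]


# Hypothetical expression of four genes at four time points.
series = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],  # rescaled copy of the first time point
    [8.0, 6.0, 4.0, 2.0],  # expression pattern reversed: a "transition"
    [9.0, 7.0, 5.0, 3.0],  # shifted copy of the third time point
]
distances = successive_distances(series)
print(distances.index(max(distances)))  # 1, the step from time point 1 to 2
```

The rescaled and shifted time points contribute essentially zero distance, so the single peak marks the step at which the expression pattern itself changes.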
The PathOlogist: an automated tool for pathway-centric analysis
<p>Abstract</p> <p>Background</p> <p>The PathOlogist is a new tool designed to transform large sets of gene expression data into quantitative descriptors of pathway-level behavior. The tool aims to provide a robust alternative to the search for single-gene-to-phenotype associations by accounting for the complexity of molecular interactions.</p> <p>Results</p> <p>Molecular abundance data are used to calculate two metrics - 'activity' and 'consistency' - for each pathway in a set of more than 500 canonical molecular pathways (source: Pathway Interaction Database, <url>http://pid.nci.nih.gov</url>). The tool then allows a detailed exploration of these metrics through integrated visualization of pathway components and structure, hierarchical clustering of pathways and samples, and statistical analyses designed to detect associations between pathway behavior and clinical features.</p> <p>Conclusions</p> <p>The PathOlogist provides a straightforward means to identify the functional processes, rather than individual molecules, that are altered in disease. The statistical power and biologic significance of this approach are made easily accessible to laboratory researchers and informatics analysts alike. Here we show, as an example, how the PathOlogist can be used to establish pathway signatures that robustly differentiate breast cancer cell lines based on response to treatment.</p>