
    Error, reproducibility and sensitivity : a pipeline for data processing of Agilent oligonucleotide expression arrays

    Background: Expression microarrays are increasingly used to obtain large-scale transcriptomic information on a wide range of biological samples. Nevertheless, there is still much debate on the best ways to process data, design experiments and analyse the output. Furthermore, many of the more sophisticated mathematical approaches to data analysis in the literature remain inaccessible to much of the biological research community. In this study we examine ways of extracting and analysing a large data set obtained using the Agilent long oligonucleotide transcriptomics platform, applied to a set of human macrophage and dendritic cell samples. Results: We describe and validate a series of data extraction, transformation and normalisation steps which are implemented via a new R function. Analysis of replicate normalised reference data demonstrates that intra-array variability is small (only around 2% of the mean log signal), while inter-array variability from replicate array measurements has a standard deviation (SD) of around 0.5 log2 units (around 6% of the mean). The common practice of working with ratios of Cy5/Cy3 signal offers little further improvement in terms of reducing error. Comparison to expression data obtained using Arabidopsis samples demonstrates that the large number of genes in each sample showing a low level of transcription reflects the real complexity of the cellular transcriptome. Multidimensional scaling is used to show that the processed data identify an underlying structure which reflects some of the key biological variables that define the data set. This structure is robust, allowing reliable comparison of samples collected over a number of years and by a variety of operators. Conclusions: This study outlines a robust and easily implemented pipeline for extracting, transforming, normalising and visualising transcriptomic array data from the Agilent expression platform. 
The analysis is used to obtain quantitative estimates of the SD arising from experimental (non-biological) intra- and inter-array variability, and a lower threshold for determining whether an individual gene is expressed. The study provides a reliable basis for further, more extensive studies of the systems biology of eukaryotic cells.
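The replicate-based error estimates described above can be illustrated with a short Python sketch. It is not the authors' R function: it assumes signals have already been extracted and normalised, uses a hypothetical offset of 1 before the log2 transform, and estimates the SD of a single measurement from two replicate arrays via Var(a - b) = 2 * sigma^2. All function names are hypothetical.

```python
import math
import statistics


def log2_signal(raw, offset=1.0):
    """Log2-transform raw signals; the offset (an assumption) avoids log2(0)."""
    return [math.log2(x + offset) for x in raw]


def replicate_sd(rep_a, rep_b):
    """SD of one measurement, estimated from two replicate arrays.

    If each value carries independent error with SD sigma, then
    Var(a - b) = 2 * sigma**2, so sigma = SD(a - b) / sqrt(2).
    """
    diffs = [a - b for a, b in zip(rep_a, rep_b)]
    return statistics.stdev(diffs) / math.sqrt(2)


def relative_sd(rep_a, rep_b):
    """Replicate SD expressed as a fraction of the mean log signal."""
    mean_log = statistics.mean(list(rep_a) + list(rep_b))
    return replicate_sd(rep_a, rep_b) / mean_log


# Made-up log2 values for three genes on two replicate arrays
array_1 = [8.0, 9.0, 10.0]
array_2 = [8.5, 8.5, 10.0]
print(round(replicate_sd(array_1, array_2), 3))  # prints 0.354
```

Dividing by the mean log signal gives the kind of relative figure quoted in the abstract (an SD expressed as a percentage of the mean).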

    Profound influence of microarray scanner characteristics on gene expression ratios: analysis and procedure for correction

    BACKGROUND: High-throughput gene expression data from spotted cDNA microarrays are collected by scanning the signal intensities of the corresponding spots with dedicated fluorescence scanners. The major scanner settings for increasing spot intensities are the laser power and the voltage of the photomultiplier tube (PMT). The expression ratios should be independent of these settings. We have investigated the relationships between PMT voltage, spot intensities and expression ratios for different scanners in order to define an optimal scanning procedure. RESULTS: All scanners showed a limited intensity range, from 200 to 50 000 (mean spot intensity), within which the expression ratios were independent of PMT voltage. This usable intensity range was considerably narrower than the maximum detection range of the PMTs. Using spot and background intensities outside this range led to errors in the ratios. The errors at high intensities were caused by saturation of pixel intensities within the spots. An algorithm was developed to correct the intensities of these spots and hence extend the upper limit of the usable intensity range. CONCLUSIONS: We suggest that the PMT voltage be increased so that even the weakest spots lie within the usable range, allowing the brightest spots to reach saturation. A second set of images should then be acquired at a lower PMT setting, such that no pixels are saturated. Reliable data for spots that are saturated in the first set of images can easily be extracted from the second set by using our algorithm. Compared with traditional procedures, this approach would increase both the accuracy of the data and the number of data points obtained in each experiment.
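The two-scan procedure suggested above can be sketched as follows. This is a simplified stand-in, not the authors' correction algorithm: it assumes the two scans differ by a single multiplicative factor within the usable range (200 to 50 000, as reported above), estimates that factor as the median high/low ratio over spots that lie within the range in both scans, and replaces spots above the range in the high-PMT scan with rescaled values from the low-PMT scan. The name `merge_scans` is hypothetical.

```python
import statistics

# Usable mean-spot-intensity range reported in the abstract above
USABLE_LOW, USABLE_HIGH = 200, 50_000


def merge_scans(high_scan, low_scan):
    """Combine a high-PMT scan with a low-PMT scan of the same slide.

    Assumes the scans differ by one multiplicative factor inside the usable
    range. Spots above the usable range in the high-PMT scan (where pixel
    saturation distorts the mean) are replaced by rescaled low-PMT values.
    """
    ratios = [h / l for h, l in zip(high_scan, low_scan)
              if USABLE_LOW <= h <= USABLE_HIGH and USABLE_LOW <= l <= USABLE_HIGH]
    if not ratios:
        raise ValueError("no spots lie in the usable range in both scans")
    scale = statistics.median(ratios)
    return [l * scale if h > USABLE_HIGH else h
            for h, l in zip(high_scan, low_scan)]


high = [1000, 4000, 60000]  # made-up spot means; the last spot is saturated
low = [250, 1000, 16000]
print(merge_scans(high, low))  # prints [1000, 4000, 64000.0]
```

The median makes the scale estimate less sensitive to a few poorly measured spots than a mean would be.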

    Microarray scanner calibration curves: characteristics and implications

    BACKGROUND: Microarray-based measurement of mRNA abundance assumes a linear relationship between fluorescence intensity and dye concentration. In reality, however, the calibration curve can be nonlinear. RESULTS: By scanning a microarray scanner calibration slide containing known concentrations of fluorescent dyes under 18 PMT gains, we were able to evaluate the differences in calibration characteristics of Cy5 and Cy3. First, the calibration curve for the same dye under the same PMT gain is nonlinear at both the high and the low intensity ends. Second, the degree of nonlinearity of the calibration curve depends on the PMT gain. Third, the two PMTs (for Cy5 and Cy3) behave differently even under the same gain. Fourth, the background intensity of the Cy3 channel is higher than that of the Cy5 channel. The impact of these characteristics on the accuracy and reproducibility of measured mRNA abundance and of the calculated ratios was demonstrated. Combined with simulation results, these findings explain the existence of ratio underestimation, the intensity dependence of ratio bias, and the anti-correlation of ratios in dye-swap replicates. We further demonstrated that although Lowess normalization effectively eliminates the intensity dependence of ratio bias, the systematic deviation from true ratios largely remains. A method of calculating ratios from concentrations estimated via the calibration curves was proposed for correcting ratio bias. CONCLUSION: It is preferable to scan microarray slides at fixed, optimal gain settings under which the linearity between concentration and intensity is maximized. Although normalization methods improve the reproducibility of microarray measurements, they appear less effective in improving accuracy.
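The idea of computing ratios from concentrations rather than raw intensities can be sketched in Python as below. The calibration points are made up, and piecewise-linear interpolation of the inverse curve is our assumption, not necessarily the authors' fitting method; the function names are hypothetical. With a curve that flattens at high intensity, the raw-intensity ratio understates the concentration ratio, matching the ratio underestimation described above.

```python
from bisect import bisect_left


def invert_calibration(intensity, curve):
    """Estimate dye concentration from intensity by linear interpolation.

    `curve` is a list of (concentration, intensity) points measured at one
    fixed PMT gain, sorted by strictly increasing intensity. Intensities
    outside the calibrated range are clamped to the end points (a lowest
    concentration of 0 would then make a ratio undefined).
    """
    intensities = [i for _, i in curve]
    concentrations = [c for c, _ in curve]
    if intensity <= intensities[0]:
        return concentrations[0]
    if intensity >= intensities[-1]:
        return concentrations[-1]
    k = bisect_left(intensities, intensity)  # first point with intensity >= query
    i0, i1 = intensities[k - 1], intensities[k]
    c0, c1 = concentrations[k - 1], concentrations[k]
    return c0 + (c1 - c0) * (intensity - i0) / (i1 - i0)


def concentration_ratio(cy5, cy3, cy5_curve, cy3_curve):
    """Cy5/Cy3 ratio computed from estimated concentrations, not raw intensities."""
    return invert_calibration(cy5, cy5_curve) / invert_calibration(cy3, cy3_curve)


# Made-up calibration curve that flattens at high intensity
curve = [(0.0, 100.0), (1.0, 1100.0), (2.0, 1600.0)]
print(1350.0 / 600.0, concentration_ratio(1350.0, 600.0, curve, curve))  # prints 2.25 3.0
```

Separate curves are passed for each dye because, as the abstract notes, the two PMTs behave differently even under the same gain.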

    Data analysis issues for allele-specific expression using Illumina's GoldenGate assay.

    BACKGROUND: High-throughput measurement of allele-specific expression (ASE) is a relatively new and exciting application area for array-based technologies. In this paper, we explore several data sets which make use of Illumina's GoldenGate BeadArray technology to measure ASE. This platform exploits coding SNPs to obtain relative expression measurements for alleles at approximately 1500 positions in the genome. RESULTS: We analyze data from a mixture experiment in which genomic DNA samples from pairs of individuals of known genotypes are pooled to create allelic imbalances at varying levels for the majority of SNPs on the array. We observe that GoldenGate is less sensitive in detecting subtle allelic imbalances (around 1.3-fold) than extreme imbalances, and note the benefit of applying local background correction to the data. Analysis of data from a dye-swap control experiment allowed us to quantify dye bias, which can be reduced considerably by careful normalization. We also demonstrate the need to filter the data before further downstream analysis, in order to remove non-responding probes, which show either weak or non-specific signal for each allele. Throughout this paper, we find that a linear model analysis of the data from each SNP is a flexible modelling strategy that allows allelic imbalances to be tested in each sample when replicate hybridizations are available. CONCLUSIONS: Our analysis shows that local background correction carried out by Illumina's software, together with quantile normalization of the red and green channels within each array, provides optimal performance in terms of false positive rates. In addition, we strongly encourage intensity-based filtering to remove SNPs which only measure non-specific signal. 
We anticipate that a similar analysis strategy will prove useful when quantifying ASE on Illumina's higher-density Infinium BeadChips.

    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
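Within-array quantile normalization of the red and green channels, which the authors found to perform well together with local background correction, can be sketched as below. This is a generic textbook version in Python, not Illumina's software or the authors' code; in practice it would be applied to background-corrected (often log-transformed) intensities, ties are broken by position as a simplification, and the function name is hypothetical.

```python
def quantile_normalize_channels(red, green):
    """Quantile-normalize the red and green channels of one array.

    Both channels are mapped onto a common distribution: the element-wise
    mean of the two sorted channels. Ties are broken by position.
    """
    n = len(red)
    if len(green) != n:
        raise ValueError("channels must have the same number of probes")
    target = [(r + g) / 2 for r, g in zip(sorted(red), sorted(green))]

    def assign(values):
        order = sorted(range(n), key=lambda i: values[i])  # probe indices by rank
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = target[rank]
        return out

    return assign(red), assign(green)


red_norm, green_norm = quantile_normalize_channels([5, 1, 3], [2, 6, 4])
print(red_norm, green_norm)  # prints [5.5, 1.5, 3.5] [1.5, 5.5, 3.5]
```

After normalization both channels share exactly the same set of values, so any remaining red/green difference for a probe reflects its rank within each channel rather than a global dye bias.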

    A framework for the informed normalization of printed microarrays

    Microarray technology has become an essential part of contemporary molecular biological research. Central to any microarray experiment is normalization, a form of data processing aimed at removing technical noise while preserving biological meaning, thereby allowing more accurate interpretation of the data. The statistics underlying many normalization methods can appear overwhelming to microarray newcomers, a situation further compounded by a lack of accessible, non-statistical descriptions of common approaches to normalization. Normalization strategies significantly affect the analytical outcome of a microarray experiment, so it is important that the statistical assumptions underlying normalization algorithms are understood and met before researchers begin processing raw microarray data. Many of these assumptions hold only for whole-genome arrays and are not valid for custom or directed microarrays. A thorough diagnostic evaluation of the nature and extent of the technical noise affecting individual arrays is paramount to the success of any chosen normalization strategy. Here we suggest an approach to normalization based on extensive stepwise exploration and diagnostic assessment of the data before and after normalization. Common data visualization and diagnostic approaches are highlighted, followed by descriptions of popular normalization methods and the assumptions underlying them, within the context of removing general technical artefacts associated with microarray data.
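As one concrete example of the diagnostic transforms mentioned above, the M (log-ratio) versus A (average log-intensity) representation underlies the MA plots commonly used to inspect two-channel arrays. The sketch below computes M and A and applies the simplest global normalization, median centering. The choice of these two steps as the illustration is ours, and the function names are hypothetical; the docstring flags the assumption that the abstract warns may not hold for custom arrays.

```python
import math
import statistics


def ma_values(red, green):
    """M = log2(R/G) (log-ratio) and A = 0.5 * log2(R*G) (average log-intensity)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def median_center(m):
    """Global normalization: shift log-ratios so their median is zero.

    Valid only if most spots are not differentially expressed, an
    assumption that may fail on custom or directed arrays.
    """
    med = statistics.median(m)
    return [x - med for x in m]


m, a = ma_values([400.0, 1600.0, 900.0], [100.0, 400.0, 900.0])
print(m, median_center(m))  # prints [2.0, 2.0, 0.0] [0.0, 0.0, -2.0]
```

Plotting M against A before and after such a step is a quick check on whether the normalization assumption is plausible for a given array.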

    Methodological study of affine transformations of gene expression data with proposed robust non-parametric multi-dimensional normalization method

    BACKGROUND: Low-level processing and normalization of microarray data are among the most important steps in microarray analysis and have a profound impact on downstream analysis. Multiple methods have been suggested to date, but it is not clear which is best. It is therefore important to study the different normalization methods, and the nature of microarray data in general, in more detail. RESULTS: A methodological study of affine models for gene expression data is carried out. The focus is on two-channel comparative studies, but the findings also generalize to single- and multi-channel data. The discussion applies to spotted as well as in-situ synthesized microarray data. Existing normalization methods such as curve-fit ("lowess") normalization, parallel and perpendicular translation normalization, and quantile normalization, as well as dye-swap normalization, are revisited in the light of the affine model, and their strengths and weaknesses are investigated in this context. As a direct result of this study, we propose a robust non-parametric multi-dimensional affine normalization method, which can be applied to any number of microarrays with any number of channels, either individually or all at once. A high-quality cDNA microarray data set with spike-in controls is used to demonstrate the power of the affine model and the proposed normalization method. CONCLUSION: We find that an affine model can explain non-linear intensity-dependent systematic effects in observed log-ratios. Affine normalization removes such artifacts for non-differentially expressed genes and ensures symmetry between negative and positive log-ratios, which is fundamental when identifying differentially expressed genes. In addition, affine normalization makes the empirical distributions in different channels more equal, which is the purpose of quantile normalization, and may also explain why dye-swap normalization works or fails. 
All methods are made available in the aroma package, a platform-independent package for R.
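The affine model can be made concrete with a small simulation. Assuming observed intensity = offset + scale × true signal, a channel offset alone makes the log-ratio of non-differentially expressed genes intensity-dependent (large at low intensity, near zero at high intensity). The correction step below is a deliberately simplified stand-in for the robust multi-dimensional affine method proposed in the paper: it assumes equal channel scales and estimates only the relative offset as a median difference. It does not reproduce the aroma implementation, and all names are hypothetical.

```python
import math
import statistics


def affine(x, offset, scale=1.0):
    """Observed signal under an affine model: offset + scale * true signal."""
    return offset + scale * x


def remove_relative_offset(red, green):
    """Subtract the median red-minus-green difference from the red channel.

    Simplified illustration: assumes equal channel scales and that most
    spots are not differentially expressed, so R - G equals the offset
    difference for those spots.
    """
    delta = statistics.median([r - g for r, g in zip(red, green)])
    return [r - delta for r in red], list(green)


true_signal = [100, 1000, 10000]  # one non-differentially expressed gene per level
red = [affine(x, offset=50) for x in true_signal]
green = [affine(x, offset=0) for x in true_signal]

before = [math.log2(r / g) for r, g in zip(red, green)]
red_c, green_c = remove_relative_offset(red, green)
after = [math.log2(r / g) for r, g in zip(red_c, green_c)]
print([round(v, 3) for v in before], after)
```

Before correction the log-ratios shrink as intensity grows, which is the intensity-dependent curvature that the abstract attributes to affine transformations; after removing the offset they are all zero.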

    DNA microarray experimental design and software based data normalization and analysis

    [no abstract]

    Low-level analysis of microarray data

    This thesis consists of an extensive introduction followed by seven papers (A-G) on low-level analysis of microarray data. The focus is on calibration and normalization of observed data. The introduction gives a brief background on microarray technology and its applications so that readers unfamiliar with the field can follow the thesis. Formal definitions of calibration and normalization are given. Paper A illustrates a typical statistical analysis of microarray data, with background correction, normalization, and identification of differentially expressed genes (among thousands of candidates). A short comparison of the final results for different numbers of replicates and different image-analysis software is also given. Paper B introduces a novel way of displaying microarray data, the print-order plot, which displays data in the order in which the corresponding spots were printed on the array. Using these plots, so-called (microtiter-)plate effects are identified. Then, based on a simple variability measure for replicated spots across arrays, different normalization sequences are tested and evidence is put forward for the existence of plate effects. Paper C presents an object-oriented extension to the R language with transparent reference variables. It provides the necessary foundation for implementing the microarray analysis package described in Paper F. Paper D is on affine transformations of two-channel microarray data and their effects on the log-ratio log-intensity transform. Affine transformations, that is, the existence of channel biases, can explain commonly observed intensity-dependent effects in the log-ratios. In the light of the affine transformation, several normalization methods are revisited. At the end of the paper, a new robust affine normalization is suggested that relies on iteratively reweighted principal component analysis. 
Paper E suggests a multiscan calibration method in which each array is scanned at several sensitivity levels in order to uniquely identify the affine transformation that the scanner and the image-analysis methods introduce into the signals. Observed data strongly support this method. In addition, multiscan-calibrated data have an extended dynamic range and higher signal-to-noise levels. This is real-world evidence for the existence of affine transformations of microarray data. Paper F describes the aroma package (An R Object-oriented Microarray Analysis environment), which is implemented in R and provides easy access to our own and others' low-level analysis methods. Paper G provides a calibration method for spotted microarrays with dilution series or spike-ins. The method is based on a heteroscedastic affine stochastic model, and the parameter estimates are robust against model misspecification.
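Displaying log-ratios in print order, as Paper B proposes, makes plate effects visible; a simple follow-up is to remove a per-plate location estimate. The sketch below is our illustration, not the normalization sequence evaluated in the thesis: it assumes each spot carries its print order, a plate identifier and a log-ratio (a hypothetical tuple layout), and that most spots on each plate are not differentially expressed, and it subtracts each plate's median log-ratio.

```python
import statistics
from collections import defaultdict


def plate_medians(spots):
    """Median log-ratio per print plate.

    `spots` holds (print_order, plate_id, log_ratio) tuples; this layout is
    a hypothetical choice for illustration.
    """
    by_plate = defaultdict(list)
    for _, plate, m in spots:
        by_plate[plate].append(m)
    return {plate: statistics.median(ms) for plate, ms in by_plate.items()}


def plate_normalize(spots):
    """Return spots in print order with each plate's median log-ratio removed."""
    medians = plate_medians(spots)
    return [(order, plate, m - medians[plate]) for order, plate, m in sorted(spots)]


spots = [(2, "P1", 0.5), (1, "P1", 0.3), (3, "P2", -0.2), (4, "P2", 0.0)]
for order, plate, m in plate_normalize(spots):
    print(order, plate, round(m, 3))
```

Sorting by print order before output keeps the result ready for a print-order plot, so the same display can be used to check whether the plate effect has been removed.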