7 research outputs found

    A proposed metric for assessing the measurement quality of individual microarrays

    BACKGROUND: High-density microarray technology is increasingly applied to study gene expression levels on a large scale. Microarray experiments rely on several critical steps that may introduce error and uncertainty into analyses. These steps include mRNA sample extraction, amplification and labeling, hybridization, and scanning. In some cases, such error manifests as systematic spatial variation, in which expression measurements within an individual array vary as a function of physical position on the array surface. RESULTS: We hypothesized that an index of how strongly gene expression measurements depend on their physical locations on an array could summarize the physical reliability of that microarray. We introduced a novel way to formulate this index using a statistical analysis tool. Our approach regressed gene expression intensity measurements on a polynomial response surface of the microarray's Cartesian coordinates. We demonstrated this method using a fixed model and presented results from real and simulated datasets. CONCLUSION: We demonstrated the potential of such a quantitative metric for assessing the reliability of individual arrays. Moreover, we showed that this procedure can be incorporated into laboratory practice as a means to set quality control specifications and as a tool to determine whether an array has sufficient quality to be retained in terms of spatial correlation of gene expression measurements.
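The regression step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' index: the function name `spatial_r2`, the choice of a full polynomial of total degree 2, and the use of R² as the summary value are all assumptions made here for the example.

```python
import numpy as np

def spatial_r2(x, y, intensity, degree=2):
    """R^2 of a least-squares polynomial surface in (x, y) fitted to intensity.

    Hypothetical helper: a value near 0 suggests little spatial structure,
    a value near 1 suggests intensities are largely explained by position.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(intensity, dtype=float)

    # Rescale coordinates to [-1, 1] to keep the design matrix well conditioned.
    xs = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    ys = 2.0 * (y - y.min()) / (y.max() - y.min()) - 1.0

    # All monomials xs^i * ys^j with i + j <= degree (i = j = 0 is the intercept).
    cols = [xs**i * ys**j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]
    design = np.column_stack(cols)

    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    resid = z - design @ coef
    ss_res = np.sum(resid**2)
    ss_tot = np.sum((z - z.mean())**2)
    return 1.0 - ss_res / ss_tot
```

Under these assumptions, an array whose intensities are pure noise should score close to 0, while one with a left-to-right intensity gradient should score much higher.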

    Evaluating Statistical Methods Using Plasmode Data Sets in the Age of Massive Public Databases: An Illustration Using False Discovery Rates

    Plasmode is a term coined several years ago to describe data sets that are derived from real data but for which some truth is known. Omic techniques, most especially microarray and genomewide association studies, have catalyzed a new zeitgeist of data sharing that is making data and data sets publicly available on an unprecedented scale. Coupling such data resources with a science of plasmode use would allow statistical methodologists to vet proposed techniques empirically (as opposed to only theoretically) and with data that are by definition realistic and representative. We illustrate the technique of empirical statistics by consideration of a common task when analyzing high-dimensional data: the simultaneous testing of hundreds or thousands of hypotheses to determine which, if any, show statistical significance warranting follow-on research. The now-common practice of multiple testing in high-dimensional experiment (HDE) settings has generated new methods for detecting statistically significant results. Although such methods have heretofore been subject to comparative performance analysis using simulated data, simulating data that realistically reflect data from an actual HDE remains a challenge. We describe a simulation procedure using actual data from an HDE where some truth regarding parameters of interest is known. We use the procedure to compare estimates for the proportion of true null hypotheses, the false discovery rate (FDR), and a local version of FDR obtained from 15 different statistical methods.
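The idea of a plasmode, real data with a known truth planted in it, can be illustrated with a small sketch. Everything here is an assumption for illustration rather than the procedure used in the paper: the helper names, the additive effect `delta`, the normal approximation for p-values, and the choice of the Benjamini-Hochberg step-up rule as the method under evaluation.

```python
from math import erfc, sqrt

import numpy as np

def make_plasmode(data, n_effect, delta, rng):
    """Split one homogeneous (genes x samples) matrix into two random groups
    and add a known shift `delta` to `n_effect` randomly chosen genes in one group."""
    data = np.asarray(data, dtype=float)
    n_genes, n_samples = data.shape
    group = np.zeros(n_samples, dtype=bool)
    group[rng.permutation(n_samples)[: n_samples // 2]] = True
    truth = np.zeros(n_genes, dtype=bool)
    truth[rng.choice(n_genes, size=n_effect, replace=False)] = True
    out = data.copy()
    out[np.ix_(truth, group)] += delta  # planted, hence known, effects
    return out, group, truth

def approx_pvalues(data, group):
    """Two-sided p-values from a Welch-type statistic, using a normal
    approximation (an assumption made to keep the sketch dependency-free)."""
    a, b = data[:, group], data[:, ~group]
    se = np.sqrt(a.var(axis=1, ddof=1) / a.shape[1]
                 + b.var(axis=1, ddof=1) / b.shape[1])
    z = (a.mean(axis=1) - b.mean(axis=1)) / se
    return np.array([erfc(abs(v) / sqrt(2.0)) for v in z])

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean rejection mask from the Benjamini-Hochberg step-up rule."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()
        reject[order[: k + 1]] = True
    return reject

def false_discovery_proportion(reject, truth):
    """Share of rejections that are true nulls; computable only because truth is known."""
    return (reject & ~truth).sum() / max(reject.sum(), 1)
```

Because `truth` is known by construction, the realized false discovery proportion of any method can be compared directly with the FDR it claims to control.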

    The impact of measurement errors in the identification of regulatory networks

    BACKGROUND: Several studies in the literature describe measurement error in gene expression data, and several others describe regulatory network models. However, only a small fraction combine measurement error with mathematical regulatory networks and show how to identify these networks under different levels of noise. RESULTS: This article investigates the effects of measurement error on the estimation of the parameters in regulatory networks. Simulation studies indicate that, in both time series (dependent) and non-time series (independent) data, measurement error strongly affects the estimated parameters of the regulatory network models, biasing them as predicted by the theory. Moreover, when testing the parameters of the regulatory network models, p-values computed by ignoring the measurement error are not reliable, since the rate of false positives is not controlled under the null hypothesis. To overcome these problems, we present an improved version of the ordinary least squares (OLS) estimator for independent (regression models) and dependent (autoregressive models) data when the variables are subject to noise. Measurement error estimation procedures for microarrays are also described. Simulation results show that both corrected methods perform better than the standard ones (i.e., those ignoring measurement error). The proposed methodologies are illustrated using microarray data from lung cancer patients and mouse liver time series data. CONCLUSIONS: Measurement error seriously affects the identification of regulatory network models; it must therefore be reduced or taken into account in order to avoid erroneous conclusions. This could be one of the reasons for the high biological false positive rates identified in actual regulatory network models.
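The bias mentioned in this abstract can be seen in the simplest setting: a single regressor observed with additive noise. The sketch below uses the classical method-of-moments (reliability ratio) correction and assumes the measurement-error variance is known. It is not necessarily the corrected estimator proposed in the paper, and the function names are hypothetical.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def corrected_slope(x_obs, y, error_var):
    """Slope corrected for additive measurement error in x.

    Assumes the error variance `error_var` is known and the error is
    independent of the true signal. The naive slope is attenuated by the
    reliability ratio var(true x) / var(observed x), so we divide by it.
    """
    x_obs = np.asarray(x_obs, dtype=float)
    var_obs = x_obs.var(ddof=1)
    reliability = (var_obs - error_var) / var_obs
    return ols_slope(x_obs, y) / reliability
```

With a true slope of 2, unit-variance signal, and error variance 0.5, the naive estimate should shrink toward 2 / 1.5 ≈ 1.33, while the corrected estimate should recover a value close to 2 in large samples.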

    A proposed metric for assessing the measurement quality of individual microarrays-2

    Copyright information: Taken from "A proposed metric for assessing the measurement quality of individual microarrays". BMC Bioinformatics 2006;7:35. Published online 23 Jan 2006. PMCID: PMC1373606. Copyright © 2006 Kim et al; licensee BioMed Central Ltd.
    36 blocks per chip (i.e., N = 36). The X-axis indicates a chip identification number. b. A boxplot of 36 blocks: Each box shows the distribution of blockwise GEODEX for 25 chips per block. The X-axis indicates a block identification number within the chip. Crosses represent outliers.

    A proposed metric for assessing the measurement quality of individual microarrays-0

    and model (2), respectively. Black rectangles and white diamonds represent GEODEX computed by model (2), using measurements in the log scale on base 2 and trimmed measurements after removing four blocks on the corners, respectively. b. For comparisons between different measures of gene expression, black circles, white circles, and black upper triangles represent GEODEX of difference (PM-MM) measurements of pairs of the perfect match (PM) and mismatch (MM) probes, of PM-only measurements, and of MM-only measurements, respectively. GEODEX were computed by model (2) (N = 36). The X-axis indicates a chip identification number and the Y-axis indicates the degree of GEODEX.

    A proposed metric for assessing the measurement quality of individual microarrays-1

    t corner of chip 12 is shown enlarged

    A proposed metric for assessing the measurement quality of individual microarrays-3

    riation (CV) of measurements for experiment A (a simulation study). Scatter plot b (d) presents the relationship between pairwise GEODEX computed by equation (5) and the degree of inter-chip discrepancy computed by equation (4) for every pair of technical replicates for biological samples in experiment A (for every pair of the sample array and simulated replicates for a simulation study). GEODEX was computed by model (2).