3,809 research outputs found

    maigesPack: A Computational Environment for Microarray Data Analysis

    Microarray technology is still an important way to assess gene expression in molecular biology, mainly because it measures expression profiles for thousands of genes simultaneously, which makes it a good option for some studies focused on systems biology. One of its main problems is the complexity of the experimental procedure, which presents several sources of variability and hinders statistical modeling. So far, there is no standard protocol for the generation and evaluation of microarray data. To ease the analysis process, this paper presents an R package, named maigesPack, that helps with data organization. It also makes the data analysis process more robust, reliable and reproducible. In addition, maigesPack aggregates several data analysis procedures reported in the literature, for instance: cluster analysis, differential expression, supervised classifiers, relevance networks and functional classification of gene groups or gene networks

    Study of meta-analysis strategies for network inference using information-theoretic approaches

    © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Reverse engineering of gene regulatory networks (GRNs) from gene expression data is a classical challenge in systems biology. Thanks to high-throughput technologies, a massive amount of gene-expression data has been accumulated in the public repositories. Modelling GRNs from multiple experiments (also called integrative analysis) has therefore naturally become a standard procedure in modern computational biology. Indeed, such analysis is usually more robust than the traditional approaches focused on individual datasets, which typically suffer from some experimental bias and a small number of samples. To date, there are mainly two strategies for the problem of interest: the first one ("data merging") merges all datasets together and then infers a GRN, whereas the other ("networks ensemble") infers GRNs from every dataset separately and then aggregates them using some ensemble rules (such as ranksum or weightsum). Unfortunately, a thorough comparison of these two approaches is lacking. In this paper, we evaluate the performances of the various meta-analysis approaches mentioned above with a systematic set of experiments based on in silico benchmarks. Furthermore, we present a new meta-analysis approach for inferring GRNs from multiple studies. Our proposed approach, adapted to methods based on pairwise measures such as correlation or mutual information, consists of two steps: aggregating matrices of the pairwise measures from every dataset, followed by extracting the network from the meta-matrix. Peer Reviewed. Postprint (author's final draft)
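    The two-step idea in this abstract (aggregate the pairwise-measure matrices, then extract a network from the meta-matrix) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: absolute Pearson correlation as the pairwise measure, an unweighted mean as the aggregation rule, a simple top-k edge selection as the extraction step, and the hypothetical function names `pairwise_matrix`, `meta_matrix` and `top_edges`.

```python
import numpy as np

def pairwise_matrix(expr):
    """Absolute Pearson correlation between genes; expr is samples x genes."""
    m = np.abs(np.corrcoef(expr, rowvar=False))
    np.fill_diagonal(m, 0.0)  # ignore self-edges
    return m

def meta_matrix(datasets):
    """Aggregate per-dataset matrices with an unweighted mean (assumed rule)."""
    return np.mean([pairwise_matrix(d) for d in datasets], axis=0)

def top_edges(matrix, k):
    """Return the k highest-scoring undirected gene pairs as (i, j) tuples."""
    i, j = np.triu_indices_from(matrix, k=1)
    order = np.argsort(matrix[i, j])[::-1][:k]
    return [(int(i[n]), int(j[n])) for n in order]

# Synthetic data: three studies over the same 5 genes, gene 1 tracks gene 0.
rng = np.random.default_rng(0)
datasets = []
for n_samples in (20, 30, 25):
    x = rng.normal(size=(n_samples, 5))
    x[:, 1] = x[:, 0] + 0.1 * rng.normal(size=n_samples)
    datasets.append(x)

meta = meta_matrix(datasets)
edges = top_edges(meta, k=2)
print(edges)
```

    Averaging requires that every dataset measures the same genes in the same column order; a real analysis would also need to choose the measure and the aggregation rule deliberately.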

    Normalization and Gene p-Value Estimation: Issues in Microarray Data Processing

    Introduction: Numerous methods exist for basic processing, e.g. normalization, of microarray gene expression data. These methods have an important effect on the final analysis outcome. Therefore, it is crucial to select methods appropriate for a given dataset in order to assure the validity and reliability of expression data analysis. Furthermore, biological interpretation requires expression values for genes, which are often represented by several spots or probe sets on a microarray. How to best integrate spot/probe set values into gene values has so far been a somewhat neglecte

    The Reproducibility of Lists of Differentially Expressed Genes in Microarray Studies

    Reproducibility is a fundamental requirement in scientific experiments and clinical contexts. Recent publications raise concerns about the reliability of microarray technology because of the apparent lack of agreement between lists of differentially expressed genes (DEGs). In this study we demonstrate that (1) such discordance may stem from ranking and selecting DEGs solely by statistical significance (P) derived from widely used simple t-tests; (2) when fold change (FC) is used as the ranking criterion, the lists become much more reproducible, especially when fewer genes are selected; and (3) the instability of short DEG lists based on P cutoffs is an expected mathematical consequence of the high variability of the t-values. We recommend the use of FC ranking plus a non-stringent P cutoff as a baseline practice in order to generate more reproducible DEG lists. The FC criterion enhances reproducibility while the P criterion balances sensitivity and specificity
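    The recommended baseline (fold-change ranking combined with a non-stringent P cutoff) can be sketched as follows. This is only an illustration under stated assumptions: the log fold changes and P values are taken as precomputed inputs, the cutoff of 0.05 is an arbitrary example value, and `rank_degs` is a hypothetical helper name, not something defined in the study.

```python
import numpy as np

def rank_degs(log_fc, p_values, p_cutoff=0.05, top_n=None):
    """Keep genes with P below the cutoff, then rank them by |log fold change|.

    Returns gene indices, largest absolute fold change first.
    """
    log_fc = np.asarray(log_fc, dtype=float)
    p_values = np.asarray(p_values, dtype=float)
    keep = np.flatnonzero(p_values < p_cutoff)           # non-stringent P filter
    order = keep[np.argsort(-np.abs(log_fc[keep]), kind="stable")]
    return order[:top_n].tolist()                        # top_n=None keeps all

# Toy values for five genes (made up for illustration).
log_fc = [2.5, -3.0, 0.2, 1.1, -0.4]
p_values = [0.01, 0.04, 0.001, 0.20, 0.03]
selected = rank_degs(log_fc, p_values, p_cutoff=0.05, top_n=3)
print(selected)
```

    In this toy input, gene 2 has the smallest P value but a tiny fold change, so it ranks last among the genes that pass the filter, while gene 3 is excluded by the P cutoff despite a moderate fold change.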

    Data mining cDNA microarray experiment with a GEE approach

    The use of microarray technology provides access to the simultaneous expression of thousands of genes and is revolutionizing the scientific community of functional genomics. This thesis investigates a cDNA microarray experiment with the goal of discovering differentially expressed genes across several factors. The analysis first "normalizes" the data through the VSN package, which is a robust calibration and variance stabilization software that removes systematic bias which could impair the analysis. A generalized estimating equation (GEE) approach is used to model the data and investigate the null hypothesis of no difference in expression levels. To accommodate the numerous hypotheses being tested, we used the q-value method to control the false discovery rate of the analysis. The analytical procedures are performed by using the statistical software packages SAS® and R

    Graphical technique for identifying a monotonic variance stabilizing transformation for absolute gene intensity signals

    BACKGROUND: The usefulness of log2 transformation for cDNA microarray data has led to its widespread application to Affymetrix data. For Affymetrix data, where absolute intensities are indicative of number of transcripts, there is a systematic relationship between variance and magnitude of measurements. Application of the log2 transformation expands the scale of genes with low intensities while compressing the scale of genes with higher intensities, thus reversing the mean by variance relationship. The usefulness of these transformations needs to be examined. RESULTS: Using an Affymetrix GeneChip® dataset, problems associated with applying the log2 transformation to absolute intensity data are demonstrated. Use of the spread-versus-level plot to identify an appropriate variance stabilizing transformation is presented. For the data presented, the spread-versus-level plot identified a power transformation that successfully stabilized the variance of probe set summaries. CONCLUSION: The spread-versus-level plot is helpful to identify transformations for variance stabilization. This is robust against outliers and avoids assumption of models and maximizations
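    A spread-versus-level analysis can be sketched in a few lines. The version below follows the textbook form of the technique, which may differ in detail from the paper's procedure: for each group, the log of the interquartile range is regressed on the log of the median, and a slope b suggests the power transformation x^(1-b), with a power of 0 read as the logarithm. The simulated data and the name `spread_level_power` are assumptions for illustration only.

```python
import numpy as np

def spread_level_power(groups):
    """Fit log(IQR) against log(median) across groups; return (power, slope)."""
    levels = np.array([np.median(g) for g in groups])
    spreads = np.array(
        [np.percentile(g, 75) - np.percentile(g, 25) for g in groups]
    )
    slope, _intercept = np.polyfit(np.log(levels), np.log(spreads), 1)
    return 1.0 - slope, slope

# Simulated probe sets whose standard deviation grows like sqrt(mean),
# so the suggested power should be close to 0.5 (a square-root transform).
rng = np.random.default_rng(42)
means = [100, 200, 400, 800, 1600, 3200, 6400]
groups = [rng.normal(loc=m, scale=0.5 * np.sqrt(m), size=500) for m in means]
power, slope = spread_level_power(groups)
print(round(power, 2), round(slope, 2))
```

    Medians and interquartile ranges are used rather than means and standard deviations because they are less sensitive to outliers, which is in line with the robustness the abstract attributes to the plot.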

    An introduction to low-level analysis methods of DNA microarray data

    This article gives an overview of the methods used in the low-level analysis of gene expression data generated using DNA microarrays. This type of experiment makes it possible to determine relative levels of nucleic acid abundance in a set of tissues or cell populations for thousands of transcripts or loci simultaneously. Careful statistical design and analysis are essential to improve the efficiency and reliability of microarray experiments throughout the data acquisition and analysis process. This includes the design of probes, the experimental design, the image analysis of microarray scanned images, the normalization of fluorescence intensities, the assessment of the quality of microarray data and incorporation of quality information in subsequent analyses, the combination of information across arrays and across sets of experiments, the discovery and recognition of patterns in expression at the single gene and multiple gene levels, and the assessment of significance of these findings, considering the fact that there is a lot of noise and thus random features in the data. For all of these components, access to a flexible and efficient statistical computing environment is an essential aspect

    Does Logarithm Transformation of Microarray Data Affect Ranking Order of Differentially Expressed Genes?

    A common practice in microarray analysis is to transform the raw microarray data (light intensity) by a logarithmic transformation; the justification for this transformation is to make the distribution more symmetric and Gaussian-like. Since this transformation is not universally practiced in microarray analysis, we examined whether this discrepancy in the treatment of raw data affects the "high level" analysis result, in particular whether the differentially expressed genes obtained by t-test, regularized t-test, or logistic regression have altered rank orders due to the presence or absence of the transformation. We show that as much as 20%-40% of significant genes are "discordant" (significant only in one form of the data and not in both), depending on the test being used and the threshold value for claiming significance. The t-test is more likely to be affected by logarithmic transformation than logistic regression, and the regularized t-test is more affected than the t-test. On the other hand, the very top ranking genes (e.g. up to the top 20-50 genes, depending on the test) are not affected by the logarithmic transformation. Comment: submitted to IEEE/EMBS Conference'0

    Using Generalized Procrustes Analysis (GPA) for normalization of cDNA microarray data

    BACKGROUND: Normalization is essential in dual-labelled microarray data analysis to remove non-biological variations and systematic biases. Many normalization methods have been used to remove such biases within slides (Global, Lowess) and across slides (Scale, Quantile and VSN). However, all these popular approaches make critical assumptions about data distribution, which are often not valid in practice. RESULTS: In this study, we propose a novel assumption-free normalization method based on the Generalized Procrustes Analysis (GPA) algorithm. Using experimental and simulated normal microarray data and boutique array data, we systematically evaluate the ability of the GPA method in normalization compared with six other popular normalization methods including Global, Lowess, Scale, Quantile, VSN, and one boutique array-specific housekeeping gene method. The assessment of these methods is based on three different empirical criteria: across-slide variability, the Kolmogorov-Smirnov (K-S) statistic and the mean square error (MSE). Compared with other methods, the GPA method performs effectively and consistently better in reducing across-slide variability and removing systematic bias. CONCLUSION: The GPA method is an effective normalization approach for microarray data analysis. In particular, it is free from the statistical and biological assumptions inherent in other normalization methods that are often difficult to validate. Therefore, the GPA method has a major advantage in that it can be applied to diverse types of array sets, especially to the boutique array where the majority of genes may be differentially expressed.

    Three-parameter lognormal distribution ubiquitously found in cDNA microarray data and its application to parametric data treatment

    BACKGROUND: To cancel experimental variations, microarray data must be normalized prior to analysis. Where an appropriate model for statistical data distribution is available, a parametric method can normalize a group of data sets that have common distributions. Although such models have been proposed for microarray data, they have not always fit the distribution of real data and thus have been inappropriate for normalization. Consequently, microarray data in most cases have been normalized with non-parametric methods that adjust data in a pair-wise manner. However, data analysis and the integration of resultant knowledge among experiments have been difficult, since such normalization concepts lack a universal standard. RESULTS: A three-parameter lognormal distribution model was tested on over 300 sets of microarray data. The model treats the hybridization background, which is difficult to identify from images of hybridization, as one of the parameters. A rigorous coincidence of the model to data sets was found, proving the model's appropriateness for microarray data. In fact, a closer fitting to Northern analysis was obtained. The model showed inconsistency only at very strong or weak data intensities. Measurement of z-scores as well as calculated ratios was reproducible only among data in the model-consistent intensity range; also, the ratios were independent of signal intensity at the corresponding range. CONCLUSION: The model could provide a universal standard for data, simplifying data analysis and knowledge integration. It was deduced that the ranges of inconsistency were caused by experimental errors or additive noise in the data; therefore, excluding the data corresponding to those marginal ranges will prevent misleading analytical conclusions
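    Under a three-parameter lognormal model, log(x - γ) is normally distributed, where γ plays the role of the background parameter. The sketch below shows how z-scores follow once γ is known. It is an illustration only: the abstract does not describe how the parameters are fitted, so γ is simply supplied by the caller here, the data are simulated, and `lognormal3_zscores` is a hypothetical name.

```python
import numpy as np

def lognormal3_zscores(intensities, gamma):
    """z-scores of log(x - gamma); values at or below gamma become NaN."""
    x = np.asarray(intensities, dtype=float)
    z = np.full(x.shape, np.nan)
    valid = x > gamma                      # log is undefined at or below background
    logs = np.log(x[valid] - gamma)
    z[valid] = (logs - logs.mean()) / logs.std(ddof=1)
    return z

# Simulated intensities: an assumed background of 50 plus lognormal signal.
rng = np.random.default_rng(1)
background = 50.0
signal = background + rng.lognormal(mean=6.0, sigma=1.2, size=1000)
z = lognormal3_zscores(signal, gamma=background)
print(round(float(np.nanmean(z)), 6), round(float(np.nanstd(z, ddof=1)), 6))
```

    Because the z-scores are expressed in units of the fitted distribution rather than raw intensity, they can in principle be compared across arrays, which is the kind of common scale the abstract argues for.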