49 research outputs found

    Evaluation of normalization methods for microarray data

    BACKGROUND: Microarray technology allows the expression levels of thousands of genes to be monitored simultaneously. This technique helps us to understand gene regulation and gene-by-gene interactions more systematically. In microarray experiments, however, many undesirable systematic variations are observed, even between replicated experiments. Normalization is the process of removing some of the sources of variation that affect the measured gene expression levels. Although a number of normalization methods have been proposed, it has been difficult to decide which methods perform best. Normalization plays an important role in the early stages of microarray data analysis, and the results of subsequent analyses are highly dependent on it. RESULTS: In this paper, we use the variability among replicated slides to compare the performance of normalization methods. We also compare normalization methods with regard to bias and mean square error using simulated data. CONCLUSIONS: Our results show that intensity-dependent normalization often performs better than global normalization methods, and that linear and nonlinear normalization methods perform similarly. These conclusions are based on analysis of 36 cDNA microarrays of 3,840 genes obtained in an experiment to search for changes in gene expression profiles during neuronal differentiation of cortical stem cells. Simulation studies confirm our findings.
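The contrast between global and intensity-dependent normalization can be illustrated with a minimal sketch. It is not the authors' code: the function names are hypothetical, the two-channel log-ratio M and mean log-intensity A are the usual cDNA microarray quantities, and a least-squares line stands in for the linear intensity-dependent methods, which the abstract does not specify.

```python
import math
from statistics import mean, median

def ma_values(red, green):
    """Return (M, A): log2 ratio and mean log2 intensity for each spot."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a

def global_normalize(m):
    """Global normalization: subtract a single constant (the median) from every M."""
    c = median(m)
    return [x - c for x in m]

def linear_intensity_normalize(m, a):
    """Linear intensity-dependent normalization (assumed form):
    fit M = b0 + b1 * A by least squares and subtract the fitted line."""
    a_bar, m_bar = mean(a), mean(m)
    sxx = sum((x - a_bar) ** 2 for x in a)
    sxy = sum((x - a_bar) * (y - m_bar) for x, y in zip(a, m))
    b1 = sxy / sxx
    b0 = m_bar - b1 * a_bar
    return [y - (b0 + b1 * x) for x, y in zip(a, m)]

# Toy slide with a dye bias that grows with intensity (made-up numbers).
green = [2.0 ** k for k in (6, 8, 10, 12)]
red = [2.0 ** (1.1 * k) for k in (6, 8, 10, 12)]
M, A = ma_values(red, green)
print(global_normalize(M))               # bias remains, only centered
print(linear_intensity_normalize(M, A))  # bias removed along A
```

In this toy case the global method only centers the log-ratios, while the intensity-dependent fit removes the trend along A. That is the kind of difference the comparison in the abstract is concerned with.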

    Gene-Gene Interaction Analysis for the Accelerated Failure Time Model Using a Unified Model-Based Multifactor Dimensionality Reduction Method

    Although a large number of genetic variants have been identified as associated with common diseases through genome-wide association studies, there still exist limitations in explaining the missing heritability. One approach to solving this missing heritability problem is to investigate gene-gene interactions rather than taking a single-locus approach. For gene-gene interaction analysis, the multifactor dimensionality reduction (MDR) method has been widely applied, since the constructive induction algorithm of MDR efficiently reduces high-order dimensions into one dimension by classifying multi-level genotypes into high- and low-risk groups. The MDR method has been extended to various phenotypes and has been improved to provide a significance test for gene-gene interactions. In this paper, we propose a simple method, called accelerated failure time (AFT) UM-MDR, in which the idea of a unified model-based MDR is extended to the survival phenotype by incorporating AFT-MDR into the classification step. The proposed AFT UM-MDR method is compared with AFT-MDR through simulation studies, and a short discussion is given.
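The classification step that MDR relies on can be sketched for the classic case-control setting. This is an illustration only, not the AFT UM-MDR method proposed in the paper: the function name, the two-locus toy data, and the rule for cells without controls are assumptions.

```python
from collections import defaultdict

def mdr_classify(genotypes, status):
    """Label each multi-locus genotype cell as 'high' or 'low' risk.

    genotypes: list of tuples, one genotype combination per subject.
    status:    list of 1 (case) or 0 (control); must contain at least one control.
    A cell is high risk when its case/control ratio is at least the
    overall case/control ratio in the sample.
    """
    n_cases = sum(status)
    n_controls = len(status) - n_cases
    threshold = n_cases / n_controls
    counts = defaultdict(lambda: [0, 0])  # cell -> [cases, controls]
    for g, s in zip(genotypes, status):
        counts[g][0 if s == 1 else 1] += 1
    labels = {}
    for g, (cases, controls) in counts.items():
        # Assumption: a cell with no controls counts as high risk.
        ratio = float("inf") if controls == 0 else cases / controls
        labels[g] = "high" if ratio >= threshold else "low"
    return labels

# Hypothetical two-locus data for eight subjects.
genotypes = [(0, 0), (0, 0), (0, 1), (0, 1), (1, 1), (1, 1), (1, 1), (0, 0)]
status = [0, 0, 1, 0, 1, 1, 1, 0]
print(mdr_classify(genotypes, status))
```

The resulting binary high/low label is the single dimension that replaces the multi-locus genotype. Survival extensions such as AFT-MDR replace the case/control ratio with a quantity derived from the survival model.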

    A comparative study on gene-set analysis methods for assessing differential expression associated with the survival phenotype

    Background: Many gene-set analysis methods have been proposed and compared through simulation studies and analysis of real datasets for binary phenotypes. We focused on the survival phenotype and compared the performance of the Gene Set Enrichment Analysis (GSEA), Global Test (GT), Wald-type Test (WT) and Global Boost Test (GBST) methods in a simulation study and on two ovarian cancer data sets. We considered two versions of GSEA that use different weights: GSEA1 uses equal weights, yielding results similar to the Kolmogorov-Smirnov test, while GSEA2's weights are based on the correlation between genes and the phenotype. Results: We compared GSEA1, GSEA2, GT, WT and GBST in a simulation study with various settings for the correlation structure of the genes and the association parameter between the survival outcome and the genes. Simulation results indicated that GT, WT and GBST consistently have higher power than GSEA1 and GSEA2 across all scenarios. However, the power of the five tests depends on the combination of correlation structure and association parameter. For the ovarian cancer data set, using the FDR threshold of q [...] Conclusion: Simulation studies and a real data example indicate that GT, WT and GBST tend to have high power, whereas GSEA1 and GSEA2 have lower power. We also found that, when survival is positively associated with genes, the power of the five tests is much higher when genes are correlated than when genes are independent. There seems to be a synergistic effect in detecting significant gene sets when significant genes have within-class correlation and the association between survival and genes is positive or negative (i.e., one-direction correlation).
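The equal-weight variant (GSEA1) can be pictured as a Kolmogorov-Smirnov-like running sum over a ranked gene list. The sketch below shows that general idea under assumed names and toy data. It is not the implementation evaluated in the paper, and it omits the survival-based ranking and the permutation step needed for significance.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted enrichment score for a ranked gene list.

    Walk down the ranking, stepping up by 1/n_hit at each gene in the set
    and down by 1/n_miss otherwise; return the running-sum value with the
    largest absolute deviation from zero.
    Assumes the set contains some, but not all, of the ranked genes.
    """
    hits = [g in gene_set for g in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking, most strongly associated gene first.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g1", "g2"}))  # set concentrated at the top
print(enrichment_score(ranked, {"g5", "g6"}))  # set concentrated at the bottom
```

A weighted variant in the spirit of GSEA2 would scale each upward step by the strength of the gene's association with the phenotype instead of using a constant step.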

    Analysis of significant protein abundance from multiple reaction-monitoring data

    Background: Discovering reliable protein biomarkers is one of the most important issues in biomedical research. ELISA is a traditional technique for accurate quantitation of well-known proteins. Recently, multiple reaction-monitoring (MRM) mass spectrometry has been proposed for quantifying newly discovered proteins and has become a popular alternative to ELISA. Linear mixed modeling (LMM) has been used to analyze MRM data, and MSstats is one of the most widely used LMM-based tools for this purpose. However, LMMs often give differing significance results depending on model specification, and it can be difficult to specify a correct LMM for the analysis of MRM data. Here, we propose a new logistic regression-based method for Significance Analysis of Multiple Reaction Monitoring (LR-SAM). Results: Through simulation studies, we demonstrate that LMM methods may not preserve the type I error, thus yielding high false-positive rates, depending on how random effects are specified. Our simulation study also shows that the LR-SAM approach performs about as well as LMM approaches in most cases. However, LR-SAM performs better than the LMMs, particularly when the effect sizes of peptides from the same protein are heterogeneous. Our proposed method was applied to MRM data for identification of proteins associated with clinical responses to treatment of 115 hepatocellular carcinoma (HCC) patients with the tyrosine kinase inhibitor sorafenib. Of 124 candidate proteins, LMM approaches provided 6 results varying in significance, while LR-SAM, by contrast, yielded 18 significant results that were reproducibly consistent. Conclusion: As exemplified by an application to an HCC data set, LR-SAM identified proteins associated with clinical responses to treatment more effectively than LMM did. This research was supported by a grant of the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HI16C2037, HI15C2165). Publication of this article was sponsored by HI16C2037 grant.

    Testing for and adjusting for dependent censoring in survival analysis.

    In survival analysis, it is very common to have censored data in which the occurrence times of the event of interest are not completely observed for some subjects under study. Censoring can occur for different reasons, but it is generally assumed that the censoring time is independent of the survival time. In some cases this assumption may not hold. Nevertheless, many statistical methods for analyzing survival data have been developed on the basis of the assumption of independent censoring. The assumption of independent censoring cannot be tested when only right-censored data are available. In this paper, we propose a method for testing the assumption of independent censoring, which involves further follow-up of a randomly selected subset of lost-to-follow-up censored subjects until a prearranged time. This further follow-up provides additional information for testing the independence assumption. On the basis of the method suggested here, a log rank test statistic for two samples is derived by way of the multivariate counting process and multiplicative intensity model proposed by Gill (1980) and Aalen (1978). The asymptotic properties of the log rank statistic are derived using the result of Andersen et al. (1982). In addition, it is shown that the log rank statistic is equivalent to the score statistic in the time-dependent Cox model. Furthermore, an adjusted estimator of the survivor function is given with an asymptotic variance estimator for the dependent censoring model. Finally, results are reported of a simulation study for calculating the power under proportional hazards alternatives. In summary, this research provides a method for testing the assumption of independent censoring, and a method for adjusting the survivor function estimate if the independence assumption is violated.
    Ph.D., Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/105230/1/9116233.pdf
    Description of 9116233.pdf: Restricted to UM users only
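For orientation, the standard two-sample log rank statistic under independent censoring can be sketched as below. This is the textbook form, not the modified statistic derived in the thesis for the extra follow-up design. The function name and the toy data are assumptions.

```python
def logrank_statistic(times, events, groups):
    """Standard two-sample log rank chi-square statistic (1 degree of freedom).

    times:  follow-up time for each subject
    events: 1 if the event was observed, 0 if censored
    groups: 0 or 1, the sample each subject belongs to
    Assumes at least one event time at which both groups are at risk.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    o_minus_e, var = 0.0, 0.0
    for t in event_times:
        # Risk set sizes just before time t.
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        # Events at time t, overall and in group 1.
        d = sum(1 for x, e in zip(times, events) if x == t and e == 1)
        d1 = sum(1 for x, e, g in zip(times, events, groups)
                 if x == t and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected in group 1
        if n > 1:
            # Hypergeometric variance of d1 given the risk set.
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var

# Hypothetical data: group 1 fails early, group 0 fails late, no censoring.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
groups = [1, 1, 1, 0, 0, 0]
print(logrank_statistic(times, events, groups))
```

Under the null hypothesis of equal hazards, this statistic is compared with a chi-square distribution with one degree of freedom.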