103 research outputs found

    A statistical method for the conservative adjustment of false discovery rate (q-value)

    Background: The q-value is a widely used statistical method for estimating the false discovery rate (FDR), a conventional significance measure in the analysis of genome-wide expression data. The q-value is a random variable and may underestimate the FDR in practice. An underestimated FDR can lead to unexpected false discoveries in follow-up validation experiments. This issue has not been well addressed in the literature, especially when a permutation procedure is necessary for p-value calculation. Results: We propose a statistical method for the conservative adjustment of the q-value. In practice, it is usually necessary to calculate p-values by a permutation procedure, and our adjustment method takes this into account. We used simulated data as well as experimental microarray or sequencing data to illustrate the usefulness of our method. Conclusions: The conservativeness of our approach has been mathematically confirmed in this study. We have demonstrated the importance of conservative adjustment of the q-value, particularly when the proportion of differentially expressed genes is small or the overall differential expression signal is weak.
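
    For orientation, the two ingredients named in this abstract (permutation p-values and FDR-style q-values) can be sketched as follows. This is a minimal illustration under stated assumptions, not the conservative adjustment proposed in the paper: the function names are hypothetical, the test statistic is a plain difference in means, and the q-values are standard Benjamini-Hochberg adjusted p-values, which correspond to assuming that the proportion of true nulls is 1.

```python
import random


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Hypothetical helper for illustration; the statistic is an assumption.
    """
    rng = random.Random(seed)
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted p-values (q-values with pi0 fixed at 1)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q
```

    Because these q-values are themselves estimates computed from noisy p-values, they can fall below the true FDR in a given data set, which is the situation the abstract's conservative adjustment is meant to address.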

    Change-point analysis of paired allele-specific copy number variation data

    Recent genome-wide allele-specific copy number variation data enable us to explore two types of genomic information: chromosomal genotype variations and DNA copy number variations. In a cancer study, it is common to collect data for paired normal and tumor samples, so two types of paired data can be obtained for each disease subject. However, there is a lack of methods for the simultaneous analysis of these four sequences of data. In this study, we propose a statistical framework based on the change-point analysis approach. The validity and usefulness of our proposed framework are demonstrated through simulation studies and applications based on an experimental data set.

    Detecting discordance enrichment among a series of two-sample genome-wide expression data sets

    Background: With current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest. Methods: In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient, with a parameter space that grows linearly as the number of data sets increases. The model-based probability of discordance enrichment can be calculated for gene set detection. Results: We apply our method to a microarray expression data set collected from forty-five matched tumor/non-tumor pairs of tissues for studying pancreatic cancer. We divided the data set into a series of non-overlapping subsets according to the tumor/non-tumor paired expression ratio of gene PNLIP (pancreatic lipase, recently shown to be associated with pancreatic cancer). The log-ratio ranges from a negative value (i.e. more expressed in non-tumor tissue) to a positive value (i.e. more expressed in tumor tissue). Our purpose is to understand whether any gene sets are enriched in discordant behaviors among these subsets (as the log-ratio increases from negative to positive). We focus on KEGG pathways. The detected pathways will be useful for further understanding the role of gene PNLIP in pancreatic cancer research. Among the top detected pathways, the neuroactive ligand receptor interaction and olfactory transduction pathways are the two most significant. Then, we consider gene TP53, which is well known for its role as a tumor suppressor in cancer research. The log-ratio also ranges from a negative value (i.e. more expressed in non-tumor tissue) to a positive value (i.e. more expressed in tumor tissue). We divided the microarray data set again according to the expression ratio of gene TP53. After the discordance enrichment analysis, we observed overall similar results, and the above two pathways are still the most significant detections. More interestingly, only these two pathways have been identified for their association with pancreatic cancer in a pathway analysis of genome-wide association study (GWAS) data. Conclusions: This study illustrates that some disease-related pathways can be enriched in discordant molecular behaviors when an important disease-related gene changes its expression. Our proposed statistical method is useful in the detection of these pathways. Furthermore, our method can also be applied to genome-wide expression data collected by the recent RNA-seq technology.

    Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets

    Background: Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. Methods: We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. Results: We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previously published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also of all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. The two results were also consistent with each other. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. Conclusions: This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.

    miR-671-5p inhibits epithelial-to-mesenchymal transition by downregulating FOXM1 expression in breast cancer.

    MicroRNA (miRNA) dysfunction is associated with a variety of human diseases, including cancer. Our previous study showed that miR-671-5p was deregulated throughout breast cancer progression. Here, we report for the first time that miR-671-5p is a tumor-suppressor miRNA in breast tumorigenesis. We found that expression of miR-671-5p was significantly decreased in invasive ductal carcinoma (IDC) compared to normal tissue in microdissected formalin-fixed, paraffin-embedded (FFPE) samples. Forkhead Box M1 (FOXM1), an oncogenic transcription factor, was predicted to be one of the direct targets of miR-671-5p, which was subsequently confirmed by luciferase assays. Forced expression of miR-671-5p in breast cancer cell lines downregulated FOXM1 expression and attenuated proliferation and invasion. Notably, overexpression of miR-671-5p resulted in a shift from epithelial-to-mesenchymal transition (EMT) to mesenchymal-to-epithelial transition (MET) phenotypes in MDA-MB-231 breast cancer cells and induced S-phase arrest. Moreover, miR-671-5p sensitized breast cancer cells to cisplatin, 5-fluorouracil (5-FU) and epirubicin exposure. Host cell reactivation (HCR) assays showed that miR-671-5p reduces DNA repair capability in post-drug exposed breast cancer cells. cDNA microarray data revealed that genes differentially expressed upon miR-671-5p transfection are associated with cell proliferation, invasion, cell cycle, and EMT. These data indicate that miR-671-5p functions as a tumor suppressor miRNA in breast cancer by directly targeting FOXM1. Hence, miR-671-5p may serve as a novel therapeutic target for breast cancer management.

    Length of Latency with Preterm Premature Rupture of Membranes before 32 Weeks' Gestation

    To describe latency for patients with preterm premature membrane rupture (PPROM) between 24 0/7 and 31 6/7 weeks’ gestation

    Genomic profiling reveals the potential role of TCL1A and MDR1 Deficiency in chemotherapy-induced cardiotoxicity

    Background: Anthracyclines, such as doxorubicin (Adriamycin), are highly effective chemotherapeutic agents, but are well known to cause myocardial dysfunction and life-threatening congestive heart failure (CHF) in some patients. Methods: To generate new hypotheses about its etiology, genome-wide transcript analysis was performed on whole blood RNA from women who received doxorubicin-based chemotherapy and either did or did not develop CHF, as defined by ejection fractions (EF) ≤40%. Women with non-ischemic cardiomyopathy unrelated to chemotherapy were compared to breast cancer patients prior to chemo with normal EF to identify heart failure-related transcripts in women not receiving chemotherapy. Byproducts of oxidative stress in plasma were measured in a subset of patients. Results: The results indicate that patients treated with doxorubicin showed sustained elevations in oxidative byproducts in plasma. At the RNA level, women who exhibited low EFs after chemotherapy had 260 transcripts that differed >2-fold (p[…] In vitro studies confirmed that inhibition of MDR1 by verapamil in rat H9C2 cardiomyocytes increased their susceptibility to doxorubicin-induced toxicity. Conclusions: It is proposed that chemo-induced cardiomyopathy may be due to a reduction in TCL1A levels, thereby causing increased apoptotic sensitivity, and leading to reduced cardiac MDR1 levels, causing higher cardiac levels of doxorubicin and intracellular free radicals. If so, screening for TCL1A and MDR1 SNPs or expression levels in blood might identify women at greatest risk of chemo-induced heart failure.

    A statistical framework for integrating two microarray data sets in differential expression analysis

    Background: Different microarray data sets can be collected for studying the same or similar diseases. We expect to achieve a more efficient analysis of differential expression if an efficient statistical method can be developed for integrating different microarray data sets. Although many statistical methods have been proposed for data integration, the genome-wide concordance of different data sets has not been well considered in the analysis. Results: Before considering data integration, it is necessary to evaluate the genome-wide concordance so that misleading results can be avoided. Based on the test results, different subsequent actions are suggested. The evaluation of genome-wide concordance and the data integration can be achieved based on normal-distribution-based mixture models. Conclusion: The results from our simulation study suggest that misleading results can be generated if the genome-wide concordance issue is not appropriately considered. Our method provides a rigorous parametric solution. The results also show that our method is robust to certain model misspecification and is practically useful for the integrative analysis of differential expression.

    Prediction of Spontaneous Preterm Birth Among Nulliparous Women With a Short Cervix

    To evaluate whether demographic and sonographic factors associated with spontaneous preterm birth (sPTB) among nulliparous women with a cervical length (CL) < 30 mm could be combined into an accurate prediction model for sPTB