121 research outputs found

    Detecting discordance enrichment among a series of two-sample genome-wide expression data sets

    Background With the current microarray and RNA-seq technologies, two-sample genome-wide expression data have been widely collected in biological and medical studies. The related differential expression analysis and gene set enrichment analysis have been frequently conducted. Integrative analysis can be conducted when multiple data sets are available. In practice, discordant molecular behaviors among a series of data sets can be of biological and clinical interest. Methods In this study, a statistical method is proposed for detecting discordance gene set enrichment. Our method is based on a two-level multivariate normal mixture model. It is statistically efficient, with a parameter space that grows linearly as the number of data sets increases. The model-based probability of discordance enrichment can be calculated for gene set detection. Results We applied our method to a microarray expression data set collected from forty-five matched tumor/non-tumor pairs of tissues for studying pancreatic cancer. We divided the data set into a series of non-overlapping subsets according to the tumor/non-tumor paired expression ratio of gene PNLIP (pancreatic lipase, recently shown to be associated with pancreatic cancer). The log-ratio ranges from a negative value (i.e. more expressed in non-tumor tissue) to a positive value (i.e. more expressed in tumor tissue). Our purpose is to understand whether any gene sets are enriched in discordant behaviors among these subsets (when the log-ratio is increased from negative to positive). We focus on KEGG pathways. The detected pathways will be useful for our further understanding of the role of gene PNLIP in pancreatic cancer research. Among the top list of detected pathways, the neuroactive ligand receptor interaction and olfactory transduction pathways are the two most significant. Then, we consider gene TP53, which is well known for its role as a tumor suppressor in cancer research. The log-ratio also ranges from a negative value (i.e. more expressed in non-tumor tissue) to a positive value (i.e. more expressed in tumor tissue). We divided the microarray data set again according to the expression ratio of gene TP53. After the discordance enrichment analysis, we observed overall similar results, and the above two pathways are still the most significant detections. More interestingly, only these two pathways have been identified for their association with pancreatic cancer in a pathway analysis of genome-wide association study (GWAS) data. Conclusions This study illustrates that some disease-related pathways can be enriched in discordant molecular behaviors when an important disease-related gene changes its expression. Our proposed statistical method is useful in the detection of these pathways. Furthermore, our method can also be applied to genome-wide expression data collected by the recent RNA-seq technology.

    Concordant integrative gene set enrichment analysis of multiple large-scale two-sample expression data sets

    Background Gene set enrichment analysis (GSEA) is an important approach to the analysis of coordinate expression changes at a pathway level. Although many statistical and computational methods have been proposed for GSEA, the issue of a concordant integrative GSEA of multiple expression data sets has not been well addressed. Among different related data sets collected for the same or similar study purposes, it is important to identify pathways or gene sets with concordant enrichment. Methods We categorize the underlying true states of differential expression into three representative categories: no change, positive change and negative change. Due to data noise, what we observe from experiments may not indicate the underlying truth. Although these categories are not observed in practice, they can be considered in a mixture model framework. Then, we define the mathematical concept of concordant gene set enrichment and calculate its related probability based on a three-component multivariate normal mixture model. The related false discovery rate can be calculated and used to rank different gene sets. Results We used three published lung cancer microarray gene expression data sets to illustrate our proposed method. One analysis based on the first two data sets was conducted to compare our result with a previously published result based on a GSEA conducted separately for each individual data set. This comparison illustrates the advantage of our proposed concordant integrative gene set enrichment analysis. Then, with a relatively new and larger pathway collection, we used our method to conduct an integrative analysis of the first two data sets and also of all three data sets. Both results showed that many gene sets could be identified with low false discovery rates. A consistency between both results was also observed. A further exploration based on the KEGG cancer pathway collection showed that a majority of these pathways could be identified by our proposed method. Conclusions This study illustrates that we can improve detection power and discovery consistency through a concordant integrative analysis of multiple large-scale two-sample gene expression data sets.
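    The abstract above says that a false discovery rate can be computed from the model-based probabilities and used to rank gene sets, but it does not state the estimator. As a minimal sketch, assuming the common "direct posterior probability" approach (an assumption, not necessarily the authors' method), the estimated FDR of the top-k list is the average of (1 - probability) over those k gene sets. The function name `posterior_fdr` and the example probabilities are hypothetical.

```python
def posterior_fdr(probabilities):
    """Rank items by posterior probability and estimate the FDR of each top-k list.

    Assumption: FDR(top k) = mean of (1 - p_i) over the k highest-probability items.
    Returns a list of (original_index, probability, estimated_fdr) tuples.
    """
    # Sort by probability, highest first, remembering each item's original index.
    ranked = sorted(enumerate(probabilities), key=lambda pair: pair[1], reverse=True)
    results = []
    cumulative_error = 0.0
    for rank, (index, prob) in enumerate(ranked, start=1):
        # Each selected item contributes its probability of being a false call.
        cumulative_error += 1.0 - prob
        results.append((index, prob, cumulative_error / rank))
    return results


# Hypothetical posterior probabilities of concordant enrichment for four gene sets.
example = posterior_fdr([0.99, 0.40, 0.95, 0.80])
```

    A gene set list would then be reported up to the largest k whose estimated FDR stays below a chosen threshold.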

    VlincRNAs controlled by retroviral elements are a hallmark of pluripotency and cancer

    Background The function of the non-coding portion of the human genome remains one of the most important questions of our time. Its vast complexity is exemplified by the recent identification of an unusual and notable component of the transcriptome - very long intergenic non-coding RNAs, termed vlincRNAs. Results Here we identify 2,147 vlincRNAs covering 10 percent of our genome. We show they are present not only in cancerous cells, but also in primary cells and normal human tissues, and are controlled by canonical promoters. Furthermore, vlincRNA promoters frequently originate from within endogenous retroviral sequences. Strikingly, the number of vlincRNAs expressed from endogenous retroviral promoters strongly correlates with pluripotency or the degree of malignant transformation. These results suggest a previously unknown connection between the pluripotent state and cancer via retroviral repeat-driven expression of vlincRNAs. Finally, we show that vlincRNAs can be syntenically conserved in humans and mouse and their depletion using RNAi can cause apoptosis in cancerous cells. Conclusions These intriguing observations suggest that vlincRNAs could create a framework that combines many existing short ESTs and lincRNAs into a landscape of very long transcripts functioning in the regulation of gene expression in the nucleus. Certain types of vlincRNAs participate at specific stages of normal development and, based on analysis of a limited set of cancerous and primary cell lines, they appear to be co-opted by cancer-associated transcriptional programs. This provides additional understanding of transcriptome regulation during the malignant state, and could lead to additional targets and options for its reversal

    Intronic RNAs constitute the major fraction of the non-coding RNA in mammalian cells

    Background The function of RNA from the non-coding (the so-called “dark matter”) regions of the genome has been a subject of considerable recent debate. Perhaps the greatest controversy concerns the function of RNAs found in introns of annotated transcripts, where most of the reads that map outside of exons are usually found. However, it has been reported that the levels of RNA in introns are minor relative to those of the corresponding exons, and that changes in the levels of intronic RNAs correlate tightly with those of adjacent exons. This would suggest that RNAs produced from the vast expanse of intronic space are just pieces of pre-mRNAs or excised introns en route to degradation. Results We present data that challenge the notion that intronic RNAs are mere bystanders in the cell. By performing a highly quantitative RNAseq analysis of transcriptome changes during an inflammation time course, we show that intronic RNAs have a number of features that would be expected from functional, standalone RNA species. We show that there are thousands of introns in the mouse genome that generate RNAs whose overall abundance, which changes throughout the inflammation time course, and other properties suggest that they function in yet unknown ways. Conclusions So far, the focus of non-coding RNA discovery has shied away from intronic regions as those were believed to simply encode parts of pre-mRNAs. Results presented here suggest a very different situation: the sequences encoded in the introns appear to harbor a yet unexplored reservoir of novel, functional RNAs. As such, they should not be ignored in surveys of functional transcripts or other genomic studies.

    Benchmark Evaluation of True Single Molecular Sequencing to Determine Cystic Fibrosis Airway Microbiome Diversity

    Cystic fibrosis (CF) is an autosomal recessive disease associated with recurrent lung infections that can lead to morbidity and mortality. The impact of antibiotics for treatment of acute pulmonary exacerbations on the CF airway microbiome remains unclear with prior studies giving conflicting results and being limited by their use of 16S ribosomal RNA sequencing. Our primary objective was to validate the use of true single molecular sequencing (tSMS) and PathoScope in the analysis of the CF airway microbiome. Three control samples were created with differing amounts of Burkholderia cepacia, Pseudomonas aeruginosa, and Prevotella melaninogenica, three common bacteria found in cystic fibrosis lungs. Paired sputa were also obtained from three study participants with CF before and >6 days after initiation of antibiotics. Antibiotic resistant B. cepacia and P. aeruginosa were identified in concurrently obtained respiratory cultures. Direct sequencing was performed using tSMS, and filtered reads were aligned to reference genomes from NCBI using PathoScope and Kraken and unique clade-specific marker genes using MetaPhlAn. A total of 180-518K of 6-12 million filtered reads were aligned for each sample. Detection of known pathogens in control samples was most successful using PathoScope. In the CF sputa, alpha diversity measures varied based on the alignment method used, but similar trends were found between pre- and post-antibiotic samples. PathoScope outperformed Kraken and MetaPhlAn in our validation study of artificial bacterial community controls and also has advantages over Kraken and MetaPhlAn of being able to determine bacterial strains and the presence of fungal organisms. PathoScope can be confidently used when evaluating metagenomic data to determine CF airway microbiome diversity.
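    The abstract above reports alpha diversity measures without naming them. As a minimal sketch, assuming the Shannon index as one commonly used alpha diversity measure (an assumption; the study may have used others), diversity can be computed from per-taxon aligned read counts. The function name `shannon_index` and the example counts are hypothetical.

```python
import math


def shannon_index(read_counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over taxa with non-zero counts.

    `read_counts` holds the number of aligned reads assigned to each taxon
    in one sample; taxa with zero reads contribute nothing to the index.
    """
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("at least one read is required")
    proportions = [count / total for count in read_counts if count > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical read counts: an even two-taxon community versus a single-taxon one.
even_community = shannon_index([500, 500])
single_taxon = shannon_index([1000, 0, 0])
```

    Because different aligners assign reads to taxa differently, the same sample can yield different index values per method, which is consistent with the variation the abstract describes.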

    Predictors of High On-Aspirin Platelet Reactivity in Elderly Patients with Coronary Artery Disease.

    Objectives Previous studies have illustrated the link between high on-aspirin platelet reactivity (HAPR) and increased thrombotic risk. The aim of our study was to investigate related risk factors for HAPR in elderly patients with coronary artery disease. Methods Elderly, hospitalized coronary artery disease patients on regular aspirin treatment were enrolled from January 2014 to September 2016. Medical records of each patient were collected, including demographic information, cardiovascular risk factors, concomitant drugs and routine biological parameters. Arachidonic acid (AA, 0.5 mg/mL) and adenosine diphosphate (ADP, 5 µmol/L) induced platelet aggregation were measured via light transmission assay (LTA) to evaluate antiplatelet responses, referred to as LTA–AA and LTA–ADP. Results A total of 275 elderly patients were included, with a mean age of 77.2±8.1 years, and males accounted for 81.8%. HAPR was defined as LTA–AA in the upper quartile of the enrolled population. HAPR patients tended to have lower renal function (P=0.052). Higher serum uric acid (SUA) level, as well as lower platelet count, hemoglobin and hematocrit, were observed in HAPR patients, along with a higher proportion of diuretic use (P<0.05). Multivariate analysis revealed that SUA (OR: 1.004, 95% CI: 1.000–1.007, P=0.048), platelet count (OR: 0.994, 95% CI: 0.989–1.000, P=0.045), hematocrit (OR: 0.921, 95% CI: 0.864–0.981, P=0.011) and concomitant P2Y12 receptor inhibitor use (OR: 1.965, 95% CI: 1.075–3.592, P=0.028) were correlated with HAPR. Spearman’s correlation analysis demonstrated an inverse association of LTA–AA with hematocrit (r=−0.234, P<0.001), hemoglobin (r=−0.209, P<0.001) and estimated glomerular filtration rate (r=−0.132, P=0.031). Conclusion SUA, platelet count, hematocrit and P2Y12 receptor inhibitor use were independently correlated with HAPR. These parameters might provide novel therapeutic targets for optimizing antiplatelet therapy.

    Role of deregulated microRNAs in breast cancer progression using FFPE tissue

    MicroRNAs (miRNAs) contribute to cancer initiation and progression by silencing the expression of their target genes, causing either mRNA molecule degradation or translational inhibition. Intraductal epithelial proliferations of the breast are histologically and clinically classified into normal, atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC). To better understand the progression of ductal breast cancer development, we attempted to identify deregulated miRNAs in this process using Formalin-Fixed, Paraffin-Embedded (FFPE) tissues from breast cancer patients. Following tissue microdissection, we obtained 8 normal, 4 ADH, 6 DCIS and 7 IDC samples, which were subjected to RNA isolation and miRNA expression profiling analysis. We found that miR-21, miR-200b/c, miR-141, and miR-183 were consistently up-regulated in ADH, DCIS and IDC compared to normal, while miR-557 was uniquely down-regulated in DCIS. Interestingly, the most significant miRNA deregulations occurred during the transition from normal to ADH. However, the data did not reveal a step-wise miRNA alteration among discrete steps along tumor progression, which is in accordance with previous reports of mRNA profiling of different stages of breast cancer. Furthermore, the expression of MSH2 and SMAD7, two important molecules involved in the TGF-β pathway, was restored following miR-21 knockdown in both MCF-7 and Hs578T breast cancer cells. In this study, we have not only identified a number of potential candidate miRNAs for breast cancer, but also found that deregulation of miRNA expression during breast tumorigenesis might be an early event since it occurred significantly during the normal to ADH transition. Consequently, we have demonstrated the feasibility of miRNA expression profiling analysis using archived FFPE tissues, typically with rich clinical information, as a means of miRNA biomarker discovery.

    Metataxonomic and Metagenomic Approaches vs. Culture-Based Techniques for Clinical Pathology.

    Diagnoses that are both timely and accurate are critically important for patients with life-threatening or drug resistant infections. Technological improvements in High-Throughput Sequencing (HTS) have led to its use in pathogen detection and its application in clinical diagnoses of infectious diseases. The present study compares two HTS methods, 16S rRNA marker gene sequencing (metataxonomics) and whole metagenomic shotgun sequencing (metagenomics), in their respective abilities to match the same diagnosis as traditional culture methods (culture inference) for patients with ventilator associated pneumonia (VAP). The metagenomic analysis was able to produce the same diagnosis as culture methods at the species-level for five of the six samples, while the metataxonomic analysis was only able to produce results with the same species-level identification as culture for two of the six samples. These results indicate that metagenomic analyses have the accuracy needed for a clinical diagnostic tool, but full integration in diagnostic protocols is contingent on technological improvements to decrease turnaround time and lower costs