23,241 research outputs found

Differential meta-analysis of RNA-seq data from multiple studies

Authors: Jaffrézic Florence; Marot Guillemette; Rau Andrea
Publication date: 14/06/2013

High-throughput sequencing is now regularly used for studies of the transcriptome (RNA-seq), particularly for comparisons among experimental conditions. For the time being, a limited number of biological replicates are typically considered in such experiments, leading to low detection power for differential expression. As their cost continues to decrease, it is likely that additional follow-up studies will be conducted to re-address the same biological question. We demonstrate how p-value combination techniques previously used for microarray meta-analyses can be used for the differential analysis of RNA-seq data from multiple related studies. These techniques are compared to a negative binomial generalized linear model (GLM) including a fixed study effect on simulated data and real data on human melanoma cell lines. The GLM with fixed study effect performed well for low inter-study variation and small numbers of studies, but was outperformed by the meta-analysis methods for moderate to large inter-study variability and larger numbers of studies. To conclude, the p-value combination techniques illustrated here are a valuable tool to perform differential meta-analyses of RNA-seq data by appropriately accounting for biological and technical variability within studies as well as additional study-specific effects. An R package metaRNASeq is available on the R Forge
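As a rough illustration of the p-value combination idea described in the abstract above, the sketch below combines per-study p-values for a single gene with Fisher's method and with a weighted inverse normal method. It is not the metaRNASeq implementation: the function names, the example p-values, and the weights (square roots of hypothetical replicate counts) are assumptions made for illustration only.

```python
import math
from statistics import NormalDist


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with even df (closed form)."""
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i          # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues):
    """Fisher's method: -2 * sum(log p) follows chi-square with 2k df under the null."""
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))


def inverse_normal_combine(pvalues, weights):
    """Weighted inverse normal method for one-sided p-values in (0, 1)."""
    nd = NormalDist()
    norm = math.sqrt(sum(w * w for w in weights))
    # -inv_cdf(p) equals inv_cdf(1 - p) but stays accurate for very small p.
    z = sum(-w * nd.inv_cdf(p) for w, p in zip(weights, pvalues)) / norm
    return nd.cdf(-z)


if __name__ == "__main__":
    # Hypothetical p-values for one gene in three studies.
    pvals = [0.01, 0.04, 0.20]
    # Hypothetical replicate counts per study, used only to build weights.
    replicates = [3, 4, 6]
    weights = [math.sqrt(n) for n in replicates]
    print("Fisher:", fisher_combine(pvals))
    print("Inverse normal:", inverse_normal_combine(pvals, weights))
```

Both functions take the per-study p-values as given, so any study-specific normalization or dispersion estimation would happen before this step.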

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

INRIA a CCSD electronic archive server

PubMed Central

HAL Descartes

Integrative approaches for differential analysis of transcriptome data

Author: Baik Bukyung
Publication venue: Ulsan National Institute of Science and Technology
Publication date: 01/08/2022

Department of Biological Sciences

High-throughput sequencing technologies have produced a huge amount of omics data. Myriad computational methods have been developed to analyze such data efficiently and accurately. In particular, recently developed single-cell sequencing technologies produce highly sparse and noisy data, further necessitating the development of data analysis methods. As large amounts of omics data accumulate in public repositories, it has become common practice to collect multiple datasets with the same theme (e.g., disease) and integrate them to increase the power of analysis. Because the data from individual studies differ in size, technologies, experimenters, and many other environmental factors, they often exhibit systematic differences in distribution, which are called batch effects. Thus, how to handle batch effects is crucial in integrative omics data analysis. This dissertation investigates computational methods to identify genes differentially expressed between biological conditions from transcriptome data and how to integrate the analyses across different samples (batches). In Chapter 2, the performance of 12 differential expression (DE) analysis methods for RNA sequencing (RNA-seq) data was compared. These methods include widely used R packages such as edgeR, DESeq2 and limma as well as their recent variants. The benchmark data include RNA spike-in data, simulated read counts, and real RNA-seq data. A wide range of conditions, such as the proportion of DE genes, sample sizes, the presence of random outliers, and mean and dispersion estimates, were tested for simulated data. We analyzed the impact of each factor on the overall performance of DE analysis and suggested suitable methods for each test condition. DESeq2, a robust version of edgeR, and voom with TMM normalization exhibited good overall performance. In Chapter 3, two novel meta-analysis methods capable of capturing "incomplete association" were proposed.
Incomplete association represents the coexistence of "associated" and "unassociated" statistics in the list of summary statistics obtained from different studies in integrative analysis. Meta-analysis integrates the summary statistics from different individual studies to increase statistical power. We demonstrated that the power of conventional meta-analysis methods rapidly decreased as the number of unassociated statistics increased. The classical Fisher's method and the newly proposed weighted Fisher's method (wFisher) effectively detected these incomplete associations. Another method, dubbed ordmeta, employed the joint distribution of ordered p-values and also outperformed conventional methods in detecting incomplete associations. wFisher and ordmeta exclusively detected genes with high biological relevance in a meta-analysis of prostate cancer gene expression data. Lastly, integrative DE analysis methods for single-cell RNA-seq (scRNA-seq) data were compared. In total, 41 computational pipelines that combine batch-effect correction methods, covariate modeling, and DE analysis methods were tested using simulated and real data. In particular, the single-cell RNA-seq data for seven patients with lung adenocarcinoma were analyzed. Remarkably, analysis of epithelial cells in scRNA-seq data outperformed the analysis of large-scale bulk RNA-seq data available from the Cancer Genome Atlas in detecting known lung cancer genes and prognostic genes. Furthermore, GSEA revealed distinct aspects of enriched pathways between epithelial cell and bulk RNA-seq data analyses.
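The notion of incomplete association in the abstract above can be made concrete with a small numerical sketch. It combines one strongly associated p-value with a growing number of unassociated ones (fixed at 0.5) and compares the classical Fisher's method with an unweighted Stouffer z-score combination. Using Stouffer's method as the stand-in for a "conventional" method is an assumption, as are the function names and example values; wFisher and ordmeta are not implemented here.

```python
import math
from statistics import NormalDist


def fisher_p(pvalues):
    """Classical Fisher combination; chi-square tail in closed form (even df)."""
    half = -sum(math.log(p) for p in pvalues)   # half of the Fisher statistic
    term = 1.0
    total = 1.0
    for i in range(1, len(pvalues)):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_p(pvalues):
    """Unweighted Stouffer combination of one-sided p-values."""
    nd = NormalDist()
    z = sum(-nd.inv_cdf(p) for p in pvalues) / math.sqrt(len(pvalues))
    return nd.cdf(-z)


def compare(n_unassociated, p_signal=1e-6, p_null=0.5):
    """Combined p-values when one study carries signal and the rest do not."""
    pvals = [p_signal] + [p_null] * n_unassociated
    return fisher_p(pvals), stouffer_p(pvals)


if __name__ == "__main__":
    for n in (0, 1, 4, 9):
        f, s = compare(n)
        print(f"unassociated={n}: Fisher={f:.3g}, Stouffer={s:.3g}")
```

In this toy setting both combined p-values grow as unassociated studies are added, but the Stouffer value grows much faster, which is the qualitative behavior the abstract attributes to incomplete association.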

ScholarWorks@UNIST

Differential expression and feature selection in the analysis of multiple omics studies

Author: Ma Tianzhou
Publication date: 28/06/2018

With the rapid advances of high-throughput technologies in the past decades, various kinds of omics data have been generated by many labs and accumulated in the public domain. These studies have been designed for different biological purposes, including the identification of differentially expressed genes, the selection of predictive biomarkers, etc. Effective meta-analysis of omics data from multiple studies can improve on the statistical power, accuracy and reproducibility of a single study. This dissertation covers a few methods for differential expression (Chapters 2 and 3) and feature selection (Chapter 4) in the analysis of multiple omics studies. In Chapter 2, we proposed a full Bayesian hierarchical model for RNA-seq meta-analysis by modeling count data, integrating information across genes and across studies, and modeling differential signals across studies via latent variables. A Dirichlet process mixture prior was further applied on the latent variables to provide categorization of detected biomarkers according to their differential expression patterns across studies. We used both simulations and a real application on multiple brain region HIV-1 transgenic rats to demonstrate improved sensitivity, accuracy and biological findings of our method. In Chapter 3, we extended the previous Bayesian model to jointly integrate transcriptomic data from two platforms: microarray and RNA-seq. In Chapter 4, we considered a general framework for variable screening with multiple omics studies and further proposed a novel two-step screening procedure for high-dimensional regression analysis in this framework. Compared to the one-step procedure and rank-based sure independence screening procedure, our procedure greatly reduced false negative errors while keeping a low false positive rate. 
Theoretically, we showed that our procedure possesses the sure screening property with weaker assumptions on signal strengths and allows the number of features to grow at an exponential rate of the sample size. Public health significance: The proposed methods are useful in detecting important biomarkers that are either differentially expressed or predictive of clinical outcomes. This is essential for searching for potential drug targets and understanding the disease mechanism. Such findings in basic science can be translated into preventive medicine or potential treatment for disease to promote human health and improve the global healthcare system

D-Scholarship@Pitt

An atlas of cortical circular RNA expression in Alzheimer disease brains demonstrates clinical and pathological associations.

Authors: Bateman Randall J; Budde John P; Chhatwal Jasmeer P; Cruchaga Carlos; Del-Aguila Jorge L; Dominantly Inherited Alzheimer Network (DIAN); Dube Umber; Farias Fabiana; Fernandez Maria Victoria; Gentsch Jen; Graff-Radford Neill R; Harari Oscar; Hsu Simon; Ibanez Laura; Jiang Shan; Karch Celeste M; Lee Jae-Hong; Li Zeran; Masters Colin L; Morris John C; Norton Joanne; Salloway Stephen; Wang Fengxian
Publication venue: eScholarship, University of California
Publication date: 01/11/2019

Parietal cortex RNA-sequencing (RNA-seq) data were generated from individuals with and without Alzheimer disease (AD; n_control = 13; n_AD = 83) from the Knight Alzheimer Disease Research Center (Knight ADRC). Using this and an independent (Mount Sinai Brain Bank (MSBB)) AD RNA-seq dataset, cortical circular RNA (circRNA) expression was quantified in the context of AD. Significant associations were identified between circRNA expression and AD diagnosis, clinical dementia severity and neuropathological severity. It was demonstrated that most circRNA-AD associations are independent of changes in cognate linear messenger RNA expression or estimated brain cell-type proportions. Evidence was provided for circRNA expression changes occurring early in presymptomatic AD and in autosomal dominant AD. It was also observed that AD-associated circRNAs co-expressed with known AD genes. Finally, potential microRNA-binding sites were identified in AD-associated circRNAs for miRNAs predicted to target AD genes. Together, these results highlight the importance of analyzing non-linear RNAs and support future studies exploring the potential roles of circRNAs in AD pathogenesis.

eScholarship - University of California

The ROS wheel: refining ROS transcriptional footprints

Authors: Gevaert Kris; Kerchev Pavel; M'Hamdi Amna; Noctor Graham; Stael Simon; Storme Veronique; Van Breusegem Frank; Willems Patrick
Publication venue: 'American Society of Plant Biologists (ASPB)'
Publication date: 01/01/2016

In the last decade, microarray studies have delivered extensive inventories of transcriptome-wide changes in messenger RNA levels provoked by various types of oxidative stress in Arabidopsis (Arabidopsis thaliana). Previous cross-study comparisons indicated how different types of reactive oxygen species (ROS) and their subcellular accumulation sites are able to reshape the transcriptome in specific manners. However, these analyses often employed simplistic statistical frameworks that are not compatible with large-scale analyses. Here, we reanalyzed a total of 79 Affymetrix ATH1 microarray studies of redox homeostasis perturbation experiments. To create hierarchy in such a high number of transcriptomic data sets, all transcriptional profiles were clustered on the overlap extent of their differentially expressed transcripts. Subsequently, meta-analysis determined a single magnitude of differential expression across studies and identified common transcriptional footprints per cluster. The resulting transcriptional footprints revealed the regulation of various metabolic pathways and gene families. The RESPIRATORY BURST OXIDASE HOMOLOG F-mediated respiratory burst had a major impact and was a converging point among several studies. Conversely, the timing of the oxidative stress response was a determining factor in shaping different transcriptome footprints. Our study emphasizes the need to interpret transcriptomic data sets in a systematic context, where initial, specific stress triggers can converge to common, aspecific transcriptional changes. We believe that these refined transcriptional footprints provide a valuable resource for assessing the involvement of ROS in biological processes in plants

HAL Evry

Crossref

Ghent University Academic Bibliography

PubMed Central

HAL Descartes

Hal-Diderot

Allele-specific NKX2-5 binding underlies multiple genetic associations with human electrocardiographic traits.

Authors: Benaglio Paola; D'Antonio Matteo; D'Antonio-Chronowska Agnieszka; DeBoever Christopher; Donovan Margaret KR; Drees Frauke; Frazer Kelly A; Gaulton Kyle J; Li He; Ma Wubin; Matsui Hiroko; Rosenfeld Michael G; Singhal Sanghamitra; Smith Erin N; Sotoodehnia Nona; van Setten Jessica; Yang Feng; Young Greenwald William W
Publication venue: eScholarship, University of California
Publication date: 01/10/2019

The cardiac transcription factor (TF) gene NKX2-5 has been associated with electrocardiographic (EKG) traits through genome-wide association studies (GWASs), but the extent to which differential binding of NKX2-5 at common regulatory variants contributes to these traits has not yet been studied. We analyzed transcriptomic and epigenomic data from induced pluripotent stem cell-derived cardiomyocytes from seven related individuals, and identified ~2,000 single-nucleotide variants associated with allele-specific effects (ASE-SNVs) on NKX2-5 binding. NKX2-5 ASE-SNVs were enriched for altered TF motifs, for heart-specific expression quantitative trait loci and for EKG GWAS signals. Using fine-mapping combined with epigenomic data from induced pluripotent stem cell-derived cardiomyocytes, we prioritized candidate causal variants for EKG traits, many of which were NKX2-5 ASE-SNVs. Experimentally characterizing two NKX2-5 ASE-SNVs (rs3807989 and rs590041) showed that they modulate the expression of target genes via differential protein binding in cardiac cells, indicating that they are functional variants underlying EKG GWAS signals. Our results show that differential NKX2-5 binding at numerous regulatory variants across the genome contributes to EKG phenotypes

eScholarship - University of California

Essential guidelines for computational method benchmarking

Authors: Boulesteix Anne-Laure; Cannoodt Robrecht; Gardner Paul P.; Hapfelmeier Alexander; Robinson Mark D.; Saelens Wouter; Saeys Yvan; Soneson Charlotte; Weber Lukas M.
Publication date: 01/01/2019

In computational biology and other sciences, researchers are frequently faced with a choice between several computational methods for performing data analyses. Benchmarking studies aim to rigorously compare the performance of different methods using well-characterized benchmark datasets, to determine the strengths of each method or to provide recommendations regarding suitable choices of methods for an analysis. However, benchmarking studies must be carefully designed and implemented to provide accurate, unbiased, and informative results. Here, we summarize key practical guidelines and recommendations for performing high-quality benchmarking analyses, based on our experiences in computational biology.

arXiv.org e-Print Archive

Ghent University Academic Bibliography

Open Access LMU

ZORA

Common CHD8 Genomic Targets Contrast With Model-Specific Transcriptional Impacts of CHD8 Haploinsufficiency.

Authors: Catta-Preta Rinaldo; Lim Kenneth; Nord Alex S; Wade A Ayanna
Publication venue: eScholarship, University of California
Publication date: 01/01/2018

The packaging of DNA into chromatin determines the transcriptional potential of cells and is central to eukaryotic gene regulation. Case sequencing studies have revealed mutations to proteins that regulate chromatin state, known as chromatin remodeling factors, with causal roles in neurodevelopmental disorders. Chromodomain helicase DNA binding protein 8 (CHD8) encodes a chromatin remodeling factor with among the highest de novo loss-of-function mutation rates in patients with autism spectrum disorder (ASD). However, mechanisms associated with CHD8 pathology have yet to be elucidated. We analyzed published transcriptomic data across CHD8 in vitro and in vivo knockdown and knockout models and CHD8 binding across published ChIP-seq datasets to identify convergent mechanisms of gene regulation by CHD8. Differentially expressed genes (DEGs) across models varied, but overlap was observed between downregulated genes involved in neuronal development and function, cell cycle, chromatin dynamics, and RNA processing, and between upregulated genes involved in metabolism and immune response. Considering the variability in transcriptional changes and the cells and tissues represented across ChIP-seq analysis, we found a surprisingly consistent set of high-affinity CHD8 genomic interactions. CHD8 was enriched near promoters of genes involved in basic cell functions and gene regulation. Overlap between high-affinity CHD8 targets and DEGs shows that reduced dosage of CHD8 directly relates to decreased expression of cell cycle, chromatin organization, and RNA processing genes, but only in a subset of studies. This meta-analysis verifies CHD8 as a master regulator of gene expression and reveals a consistent set of high-affinity CHD8 targets across human, mouse, and rat in vivo and in vitro studies. These conserved regulatory targets include many genes that are also implicated in ASD. 
Our findings suggest a model where perturbation to dosage-sensitive CHD8 genomic interactions with a highly-conserved set of regulatory targets leads to model-specific downstream transcriptional impacts

eScholarship - University of California

FigShare