TWO-SIGMA: A novel two-component single cell model-based association method for single-cell RNA-seq data

Author: Hu M.
Jin F.
Li Y.
Li Y.
Van Buren E.
Weng C.
Wu D.
Publication venue: Wiley-Liss Inc.
Publication date: 01/01/2021

In this paper, we develop TWO-SIGMA, a TWO-component SInGle cell Model-based Association method for differential expression (DE) analyses in single-cell RNA-seq (scRNA-seq) data. The first component models the probability of “drop-out” with a mixed-effects logistic regression model and the second component models the (conditional) mean expression with a mixed-effects negative binomial regression model. TWO-SIGMA is extremely flexible in that it: (i) does not require a log-transformation of the outcome, (ii) allows for overdispersed and zero-inflated counts, (iii) accommodates a correlation structure between cells from the same individual via random effect terms, (iv) can analyze unbalanced designs (in which the number of cells does not need to be identical for all samples), (v) can control for additional sample-level and cell-level covariates including batch effects, (vi) provides interpretable effect size estimates, and (vii) enables general tests of DE beyond two-group comparisons. To our knowledge, TWO-SIGMA is the only method for analyzing scRNA-seq data that offers all of these features simultaneously. Simulation studies show that TWO-SIGMA outperforms alternative regression-based approaches in both type-I error control and power when the data contain even moderate within-sample correlation. A real data analysis using pancreatic islet single cells illustrates the flexibility of TWO-SIGMA and demonstrates that failing to include random effect terms can have dramatic impacts on scientific conclusions. TWO-SIGMA is implemented in the R package twosigma available at https://github.com/edvanburen/twosigma
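The two-component structure described in this abstract can be illustrated with a minimal sketch. It assumes the usual zero-inflated negative binomial form: a logistic model for the drop-out probability and a log-link negative binomial model for the mean. The random effect terms are deliberately omitted, and every function and parameter name below (`zinb_logpmf`, `cell_loglik`, `alpha`, `beta`, `phi`) is hypothetical. This is not the twosigma package's API.

```python
import math


def nb_logpmf(y, mu, phi):
    """Log-probability of count y under a negative binomial with
    mean mu > 0 and size parameter phi > 0 (variance = mu + mu**2 / phi)."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (phi + mu)))


def zinb_logpmf(y, p_drop, mu, phi):
    """Log-probability of count y under a zero-inflated negative binomial.
    p_drop (strictly between 0 and 1) is the drop-out probability."""
    log_nb = nb_logpmf(y, mu, phi)
    if y == 0:
        # A zero arises either from drop-out or from the count component.
        return math.log(p_drop + (1.0 - p_drop) * math.exp(log_nb))
    return math.log(1.0 - p_drop) + log_nb


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cell_loglik(y, x, alpha, beta, phi):
    """Log-likelihood of one cell with a single covariate x.
    alpha and beta are hypothetical (intercept, slope) pairs for the
    drop-out and mean components; random effects are omitted here."""
    p_drop = sigmoid(alpha[0] + alpha[1] * x)   # logistic component
    mu = math.exp(beta[0] + beta[1] * x)        # log-link mean component
    return zinb_logpmf(y, p_drop, mu, phi)
```

In the mixed-effects setting described by the abstract, sample-specific random intercepts would be added to each linear predictor and integrated out, which this sketch does not attempt.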

PubMed Central

Carolina Digital Repository

De novo prediction of PTBP1 binding and splicing targets reveals unexpected features of its RNA recognition and function.

Author: Black Douglas L
Fu Xiang-Dong
Han Areum
Linares Anthony J
Stoilov Peter
Zhou Yu
Publication venue: eScholarship, University of California
Publication date: 01/01/2014

The splicing regulator Polypyrimidine Tract Binding Protein (PTBP1) has four RNA binding domains that each bind a short pyrimidine element, allowing recognition of diverse pyrimidine-rich sequences. This variation makes it difficult to evaluate PTBP1 binding to particular sites based on sequence alone and thus to identify target RNAs. Conversely, transcriptome-wide binding assays such as CLIP identify many in vivo targets, but do not provide a quantitative assessment of binding and are informative only for the cells where the analysis is performed. A general method of predicting PTBP1 binding and possible targets in any cell type is needed. We developed computational models that predict the binding and splicing targets of PTBP1. A Hidden Markov Model (HMM), trained on CLIP-seq data, was used to score probable PTBP1 binding sites. Scores from this model are highly correlated (ρ = -0.9) with experimentally determined dissociation constants. Notably, we find that the protein is not strictly pyrimidine specific, as interspersed guanosine residues are well tolerated within PTBP1 binding sites. This model identifies many previously unrecognized PTBP1 binding sites, and can score PTBP1 binding across the transcriptome in the absence of CLIP data. Using this model to examine the placement of PTBP1 binding sites in controlling splicing, we trained a multinomial logistic model on sets of PTBP1-regulated and unregulated exons. Applying this model to rank exons across the mouse transcriptome identifies known PTBP1 targets and many new exons that were confirmed as PTBP1-repressed by RT-PCR and RNA-seq after PTBP1 depletion. We find that PTBP1-dependent exons are diverse in structure and do not all fit previous descriptions of the placement of PTBP1 binding sites. Our study uncovers new features of RNA recognition and splicing regulation by PTBP1. This approach can be applied to other multi-RRM domain proteins to assess binding site degeneracy and multifactorial splicing regulation

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

The Research Repository @ WVU (West Virginia University)

Computational search for UV radiation resistance strategies in Deinococcus swuensis isolated from Paramo ecosystems

Author: Acosta Iván Camilo
Díaz-Riaño Jorge
García-Castillo Catalina
Posada Leonardo
Reyes Alejandro
Ruíz-Pérez Carlos
Zambrano María Mercedes
Publication venue: Digital Commons@Becker
Publication date: 01/01/2019

Ultraviolet radiation (UVR) is widely known to be deleterious for many organisms since it can damage biomolecules either directly or indirectly via the formation of reactive oxygen species. The goal of this study was to analyze the capacity of high-mountain Espeletia hartwegiana plant phyllosphere microorganisms to survive UVR and to identify genes related to resistance strategies. A strain of Deinococcus swuensis showed a high survival rate of up to 60% after UVR treatment at 800 J/m² and was used for differential expression analysis using RNA-seq after exposing cells to 400 J/m² of UVR (with >95% survival rate). Differentially expressed genes were identified using the R-Bioconductor package NOISeq and compared with other resistance strategies reported for this genus. Genes identified as being overexpressed included transcriptional regulators and genes involved in protection against damage by UVR. Non-coding (nc)RNAs were also differentially expressed, some of which have not been previously implicated. This study characterized the immediate radiation response of D. swuensis and indicates the involvement of ncRNAs in the adaptation to extreme environmental conditions

Directory of Open Access Journals

Digital Commons@Becker

aFold – using polynomial uncertainty modelling for differential gene expression estimation from RNA sequencing data

Author: Rosenstiel P.
Schulenburg H.
Yang W.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2019

Data normalization and identification of significant differential expression represent crucial steps in RNA-Seq analysis. Many available tools rely on assumptions that are often not met by real data, including the common assumption of a symmetrical distribution of up- and down-regulated genes, the presence of only a few differentially expressed genes and/or few outliers. Moreover, the cut-off for selecting significantly differentially expressed genes for further downstream analysis often depends on arbitrary choices

MPG.PuRe

FigShare

Dysregulated protocadherin-pathway activity as an intrinsic defect in induced pluripotent stem cell-derived cortical interneurons from subjects with schizophrenia.

Author: Apud Jose
Berman Karen F
Bin Kim Woong
Cho Jun-Hyeong
Chung Sangmi
Cohen Bruce M
Cote Sarah E
Coyle Joseph T
Eggan Kevin C
Eisenberg Leonard M
Fukuda Emi
Ghosh Sulagna
Hirayama Teruyoshi
Huang Weihua
Kim Hae-Young
Lanz Thomas A
McPhie Donna L
Moghadam Alexander A
Nguyen Christine
Ni Peiyan
Noh Haneul
Noyes Elizabeth
Ongur Dost
Park James M
Park Joshua J
Parsons Teagan
Perlis Roy H
Rapoport Judith L
Shao Zhicheng
Stanton Patric K
Straub Richard E
Weinberger Daniel R
Xi Hualin Simon
Yagi Takeshi
Yin Changhong
Zhao Joyce
Zheng Kelvin
Publication venue: eScholarship, University of California
Publication date: 01/02/2019

We generated cortical interneurons (cINs) from induced pluripotent stem cells derived from 14 healthy controls and 14 subjects with schizophrenia. Both healthy control cINs and schizophrenia cINs were authentic, fired spontaneously, received functional excitatory inputs from host neurons, and induced GABA-mediated inhibition in host neurons in vivo. However, schizophrenia cINs had dysregulated expression of protocadherin genes, which lie within documented schizophrenia loci. Mice lacking protocadherin-α showed defective arborization and synaptic density of prefrontal cortex cINs and behavioral abnormalities. Schizophrenia cINs similarly showed defects in synaptic density and arborization that were reversed by inhibitors of protein kinase C, a downstream kinase in the protocadherin pathway. These findings reveal an intrinsic abnormality in schizophrenia cINs in the absence of any circuit-driven pathology. They also demonstrate the utility of homogeneous and functional populations of a relevant neuronal subtype for probing pathogenesis mechanisms during development

Crossref

eScholarship - University of California

Sequential stopping for high-throughput experiments

Author: Armstrong
Campo Dell'orto
D. Rossell
P. Muller
Tibshirani
Yang
Zien
Publication venue: Oxford University Press (OUP)
Publication date: 20/08/2012

In high-throughput experiments, the sample size is typically chosen informally. Most formal sample-size calculations depend critically on prior knowledge. We propose a sequential strategy that, by updating knowledge when new data are available, depends less critically on prior assumptions. Experiments are stopped or continued based on the potential benefits of obtaining additional data. The underlying decision-theoretic framework guarantees that the design proceeds in a coherent fashion. We propose intuitively appealing, easy-to-implement utility functions. As in most sequential design problems, an exact solution is computationally prohibitive. We propose a simulation-based approximation that uses decision boundaries. We apply the method to RNA-seq, microarray, and reverse-phase protein array studies and show its potential advantages. The approach has been added to the Bioconductor package gaga

Crossref

PubMed Central

Warwick Research Archives Portal Repository

Genetic determinants of co-accessible chromatin regions in activated T cells across humans.

Author: A Barrie
A Battle
A Franke
AA Shabalin
AM Klein
AR Quinlan
Atsede Siba
Aviv Regev
Aviva P. Aiden
B Li
BE Stranger
C Hou
Christine S. Cheng
Christophe Benoist
Chun J. Ye
CJ Ye
CK Stroud
D Hnisz
D Lee
D Sakata
DE Speiser
Dmytro Lituiev
E Elinav
E Splinter
EM Schmidt
Erez Lieberman Aiden
EZ Macosko
G Jun
G McVicker
H Kilpinen
H Li
H Li
HK Finucane
HM Kang
Howard Y. Chang
Ido Machol
Ivo Wortman
J Yang
JD Buenrostro
JD Buenrostro
JD Storey
JE Phillips
JF Degner
JN Hirschhorn
JS Delisle
K Enjyoji
Kendrick L. Hougen
KK Farh
L Chen
L Plesner
M Feuerer
M Ghandi
M Kasowski
M Kronenberg
M Kurachi
M. Grace Gordon
Marcin Tabaka
MB Gerstein
Meena Subramaniam
MI Love
MI McCarthy
Michael A. Beer
MN Lee
MT Maurano
Muhammad Shamim
MY Donath
N Kumasaka
NC Durand
Neva C. Durand
NP Restifo
P Cauchy
P Li
PC Hollenhorst
Philip L. De Jager
PM Visscher
PS Ohashi
R Satija
Rachel E. Gate
RE Thurman
RM Samstein
Roadmap Epigenomics Consortium
S Deaglio
S Heinz
S Neph
SM Waszak
SS Rao
Su-Chen Huang
T Lappalainen
T Raj
The ENCODE Project Consortium.
Ting Feng
TL Murphy
UM Marigorta
WA Whyte
WJ Astle
X Chen
X Sun
Y Belkaid
Y Zhang
YY Fan
Publication venue: eScholarship, University of California
Publication date: 01/08/2018

Over 90% of genetic variants associated with complex human traits map to non-coding regions, but little is understood about how they modulate gene regulation in health and disease. One possible mechanism is that genetic variants affect the activity of one or more cis-regulatory elements leading to gene expression variation in specific cell types. To identify such cases, we analyzed ATAC-seq and RNA-seq profiles from stimulated primary CD4+ T cells in up to 105 healthy donors. We found that regions of accessible chromatin (ATAC-peaks) are co-accessible at kilobase and megabase resolution, consistent with the three-dimensional chromatin organization measured by in situ Hi-C in T cells. Fifteen percent of genetic variants located within ATAC-peaks affected the accessibility of the corresponding peak (local-ATAC-QTLs). Local-ATAC-QTLs have the largest effects on co-accessible peaks, are associated with gene expression and are enriched for autoimmune disease variants. Our results provide insights into how natural genetic variants modulate cis-regulatory elements, in isolation or in concert, to influence gene expression

Crossref

eScholarship - University of California

On the utility of RNA sample pooling to optimize cost and statistical power in RNA sequencing experiments

Author: Assefa Alemu Takele
Thas Olivier
Vandesompele Jo
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Background: In gene expression studies, RNA sample pooling is sometimes considered because of budget constraints or lack of sufficient input material. Using microarray technology, RNA sample pooling strategies have been reported to optimize both the cost of data generation and the statistical power for differential gene expression (DGE) analysis. For RNA sequencing, with its different quantitative output in terms of counts and tunable dynamic range, the adequacy and empirical validation of RNA sample pooling strategies have not yet been evaluated. In this study, we comprehensively assessed the utility of pooling strategies in RNA-seq experiments using empirical and simulated RNA-seq datasets. Results: The data generating model in pooled experiments is defined mathematically to evaluate the mean and variability of gene expression estimates. The model is further used to examine the trade-off between the statistical power of testing for DGE and the data generating costs. Empirical assessment of pooling strategies is done through analysis of RNA-seq datasets under various pooling and non-pooling experimental settings. A simulation study is also used to rank experimental scenarios with respect to the rate of false and true discoveries in DGE analysis. The results demonstrate that pooling strategies in RNA-seq studies can be both cost-effective and powerful when the number of pools, pool size and sequencing depth are optimally defined. Conclusion: For high within-group gene expression variability, small pools of RNA samples are effective in reducing the variability and compensating for the loss of replicates. Unlike typical cost-saving strategies, such as reducing sequencing depth or the number of RNA samples (replicates), an adequate pooling strategy is effective in maintaining the power of testing DGE for genes with low to medium abundance levels, along with a substantial reduction of the total cost of the experiment. In general, pooling RNA samples, alone or in conjunction with a moderate reduction of the sequencing depth, can be a good option to optimize the cost and maintain the power
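The trade-off between variability and cost in pooled designs can be illustrated with a small sketch. It assumes the classic additive-variance model on a continuous (for example, log-expression) scale, in which every sample contributes equally to its pool. This is not the count-based data generating model the abstract refers to. The function names, parameters, and cost structure are hypothetical.

```python
def pooled_group_mean_variance(sigma2_bio, sigma2_tech, pool_size, n_pools):
    """Variance of a group-mean expression estimate from n_pools pools,
    each mixing pool_size biological samples.

    Assumption: pooling averages biological variation (sigma2_bio) across
    the pooled samples, while technical variation (sigma2_tech) is incurred
    once per sequenced pool."""
    if pool_size < 1 or n_pools < 1:
        raise ValueError("pool_size and n_pools must be at least 1")
    per_pool = sigma2_bio / pool_size + sigma2_tech
    return per_pool / n_pools


def experiment_cost(n_pools, pool_size, cost_per_sample, cost_per_library):
    """Hypothetical cost model: each biological sample has a collection
    cost, and each pool requires one library preparation and sequencing run."""
    return n_pools * (pool_size * cost_per_sample + cost_per_library)


# Compare 12 individually sequenced samples with 3 pools of 4 samples each.
unpooled = pooled_group_mean_variance(1.0, 0.5, pool_size=1, n_pools=12)
pooled = pooled_group_mean_variance(1.0, 0.5, pool_size=4, n_pools=3)
```

Under these assumptions, pooling lowers the number of libraries at the price of a larger variance per group mean, and the loss is smaller when biological variance dominates technical variance.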

Ghent University Academic Bibliography

Research Online