Detecting differential usage of exons from RNA-Seq data
RNA-Seq is a powerful tool for the study of alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires comparisons between treatments, tissues or conditions. For the analysis of such experiments, we present _DEXSeq_, a statistical method to test for differential exon usage in RNA-Seq data. _DEXSeq_ employs generalized linear models and offers good detection power and reliable control of false discoveries by taking biological variation into account. An implementation is available as an R/Bioconductor package.
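To make "differential exon usage" concrete: an exon's usage is its share of the reads falling on its gene, and the question is whether that share differs between conditions. The sketch below is a deliberately simplified stand-in, assuming counts pooled over replicates: it applies a G-test (likelihood ratio test) to a 2x2 table of exon reads versus reads on the rest of the gene. The function name `exon_usage_gtest` is hypothetical, and because pooling hides replicate-to-replicate variation, this test ignores exactly the biological variation that _DEXSeq_'s generalized linear models account for.

```python
import math

def exon_usage_gtest(exon_a, rest_a, exon_b, rest_b):
    """Simplified differential exon usage test between conditions A and B.

    exon_*: reads on the exon of interest (pooled over replicates)
    rest_*: reads on the remaining exons of the same gene
    Returns (usage_a, usage_b, G, p_value). Hypothetical helper; it ignores
    biological (overdispersion) variation, unlike DEXSeq's GLMs.
    """
    # 2x2 table: rows = conditions, columns = (this exon, rest of gene)
    table = [[exon_a, rest_a], [exon_b, rest_b]]
    total = exon_a + rest_a + exon_b + rest_b
    row_sums = [exon_a + rest_a, exon_b + rest_b]
    col_sums = [exon_a + exon_b, rest_a + rest_b]
    g = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # 0 * log(0) is taken as 0
                expected = row_sums[i] * col_sums[j] / total
                g += observed * math.log(observed / expected)
    g = max(2.0 * g, 0.0)  # guard against tiny negative rounding error
    # Upper tail of a chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(g / 2.0))
    usage_a = exon_a / row_sums[0]
    usage_b = exon_b / row_sums[1]
    return usage_a, usage_b, g, p_value
```

For example, `exon_usage_gtest(200, 300, 50, 450)` compares an exon carrying 40% of its gene's reads in one condition with 10% in the other. With real replicates, a per-sample model is preferable, since pooled tests tend to overstate significance.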
Differential expression analysis for sequence count data
*Motivation:* High-throughput nucleotide sequencing provides quantitative readouts in assays for RNA expression (RNA-Seq), protein-DNA binding (ChIP-Seq) or cell counting (barcode sequencing). Statistical inference of differential signal in such data requires estimation of their variability throughout the dynamic range. When the number of replicates is small, error modelling is needed to achieve statistical power.

*Results:* We propose an error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data. The method controls type-I error and provides good detection power. 

*Availability:* A free open-source R software package, _DESeq_, is available from the Bioconductor project and from http://www-huber.embl.de/users/anders/DESeq.
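The error model above ties each gene's variance to its mean, so that genes with few replicates can borrow strength from genes of similar expression. A minimal sketch, assuming counts are already normalized across samples: `fit_dispersion` fits one overdispersion constant in `var = mu + alpha * mu**2` across all genes, and `nb_pmf` gives negative binomial probabilities with that mean and variance. The quadratic form is a swapped-in parametric simplification; the abstract describes linking variance and mean by local regression instead. Both function names are hypothetical.

```python
import math
from statistics import mean, variance

def fit_dispersion(count_rows):
    """Fit a single alpha in var = mu + alpha * mu**2 across genes.

    Parametric stand-in for a local regression of variance on mean.
    count_rows: list of per-gene lists of normalized counts (>= 2 samples each).
    """
    num = den = 0.0
    for row in count_rows:
        mu, var = mean(row), variance(row)
        if mu > 0:
            # Least-squares slope of (var - mu) on mu**2, through the origin
            num += (var - mu) * mu ** 2
            den += mu ** 4
    # Variance below the Poisson level is clamped to alpha = 0
    return max(num / den, 0.0) if den else 0.0

def nb_pmf(k, mu, alpha):
    """Negative binomial P(K = k) with mean mu > 0 and variance mu + alpha*mu**2."""
    if alpha == 0:  # Poisson limit
        return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))
    r = 1.0 / alpha          # size parameter
    p = r / (r + mu)         # success probability, chosen so the mean is mu
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(p) + k * math.log(1 - p))
    return math.exp(log_p)
```

With `alpha = 0.2` and `mu = 10`, the variance is 30 rather than the Poisson value of 10, which is the kind of extra spread between biological replicates that a Poisson model would miss.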
Evaluation of experimental design and computational parameter choices affecting analyses of ChIP-seq and RNA-seq data in undomesticated poplar trees.
Background: One of the great advantages of next generation sequencing is the ability to generate large genomic datasets for virtually all species, including non-model organisms. It should be possible, in turn, to apply advanced computational approaches to these datasets to develop models of biological processes. In a practical sense, working with non-model organisms presents unique challenges. In this paper we discuss some of these challenges for ChIP-seq and RNA-seq experiments using the undomesticated tree species of the genus Populus.
Results: We describe specific challenges associated with experimental design in Populus, including selection of optimal genotypes for different technical approaches and development of antibodies against Populus transcription factors. Execution of the experimental design included the generation and analysis of chromatin immunoprecipitation sequencing (ChIP-seq) data for RNA polymerase II and transcription factors involved in wood formation. We discuss criteria for analyzing the resulting datasets, determination of appropriate control sequencing libraries, evaluation of sequencing coverage needs, and optimization of parameters. We also describe the evaluation of ChIP-seq data from Populus, and discuss the comparison between ChIP-seq and RNA-seq data and biological interpretations of these comparisons.
Conclusions: These and other "lessons learned" highlight the challenges but also the potential insights to be gained from extending next generation sequencing-supported network analyses to undomesticated non-model species.
NBLDA: Negative Binomial Linear Discriminant Analysis for RNA-Seq Data
RNA-sequencing (RNA-Seq) has become a powerful technology to characterize
gene expression profiles because it is more accurate and comprehensive than
microarrays. Although statistical methods that have been developed for
microarray data can be applied to RNA-Seq data, they are not ideal due to the
discrete nature of RNA-Seq data. The Poisson distribution and negative binomial
distribution are commonly used to model count data. Recently, Witten (2011)
proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson
assumption may not be as appropriate as the negative binomial distribution when
biological replicates are available and in the presence of overdispersion
(i.e., when the variance is larger than the mean). However, it is more
complicated to model negative binomial variables because they involve a
dispersion parameter that needs to be estimated. In this paper, we propose a
negative binomial linear discriminant analysis for RNA-Seq data. By Bayes'
rule, we construct the classifier by fitting a negative binomial model, and
propose some plug-in rules to estimate the unknown parameters in the
classifier. The relationship between the negative binomial classifier and the
Poisson classifier is explored, with a numerical investigation of the impact of
dispersion on the discriminant score. Simulation results show the superiority
of our proposed method. We also analyze four real RNA-Seq data sets to
demonstrate the advantage of our method in real-world applications.
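The Bayes-rule construction can be made explicit. If gene g in class k has negative binomial mean mu_gk and dispersion phi_g, dropping every term of the log-likelihood that does not depend on k leaves the score log(pi_k) + sum_g [x_g log(mu_gk) - (x_g + 1/phi_g) log(1 + phi_g mu_gk)], and the sample is assigned to the highest-scoring class. As phi_g approaches 0, the second term tends to mu_gk, recovering the Poisson discriminant. The sketch below assumes the plug-in estimates of mu, phi and the priors are already available (the paper's plug-in rules are not reproduced), and `nb_discriminant` is a hypothetical name.

```python
import math

def nb_discriminant(x, mu_by_class, phi, log_prior):
    """Assign one sample to a class with a negative binomial Bayes score.

    x: counts for the test sample, one per gene
    mu_by_class[k][g]: plug-in mean of gene g in class k for this sample (> 0)
    phi[g]: plug-in dispersion of gene g (0 gives the Poisson limit)
    log_prior[k]: log prior probability of class k
    Returns (best_class_index, list_of_scores).
    """
    scores = []
    for k, mu in enumerate(mu_by_class):
        s = log_prior[k]
        for xg, mg, pg in zip(x, mu, phi):
            if pg > 0:
                # Class-dependent part of the NB log-likelihood
                s += xg * math.log(mg) - (xg + 1.0 / pg) * math.log1p(pg * mg)
            else:
                # Poisson log-likelihood, up to terms constant in k
                s += xg * math.log(mg) - mg
        scores.append(s)
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores
```

Comparing scores for small and large `phi` on the same data is one way to see numerically how dispersion changes the discriminant score.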
Production and analysis of synthetic Cascade variants
CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR associated) is an
adaptive immune system of Archaea and Bacteria. It is able to target and destroy foreign genetic
material with ribonucleoprotein complexes consisting of CRISPR RNAs (crRNAs) and certain Cas proteins.
CRISPR-Cas systems are classified into two major classes and multiple types, according to the involved Cas
proteins. In type I systems, a ribonucleoprotein complex called Cascade (CRISPR associated complex for
antiviral defence) scans for invading viral DNA during a recurring infection and binds the sequence
complementary to the incorporated crRNA. After target recognition, the nuclease/helicase Cas3 is
recruited and subsequently destroys the viral DNA in a step termed interference.
Multiple subtypes of type I exist that show differences in the Cascade composition. This work focuses on
a minimal Cascade variant found in Shewanella putrefaciens CN-32. In comparison to the well-studied
type I-E Cascade from Escherichia coli, this complex is missing two proteins usually required for target
recognition, yet it is still able to provide immunity. Recombinant I-Fv Cascade was previously purified
from E. coli and it was possible to modulate the complex by extending or shortening the backbone,
resulting in synthetic variants with altered protein stoichiometry.
In the present study, I-Fv Cascade was further analyzed by in vitro methods. Target binding was
observed and the 3D structure revealed structural variations that replace the missing subunits,
potentially to evade viral anti-CRISPR proteins. The nuclease/helicase of this system, Cas2/3fv, is a fusion
of the Cas3 protein with the interference-unrelated protein Cas2. A standalone Cas3fv was purified
without the Cas2 domain, and in vitro cleavage assays showed that Cas3fv degrades both free ssDNA and
Cascade-bound substrates. The complete Cas2/3fv protein forms a complex with the protein
Cas1 and shows reduced cleavage of free ssDNA, potentially as a regulatory mechanism against
nonspecific cleavage.
Furthermore, we established a process termed “RNA wrapping”. Synthetic Cascade assemblies can be
created by directing the general RNA-binding ability of the characteristic Cas7fv backbone protein onto an
RNA of choice, such as reporter gene transcripts. Specific complex formation can be initiated in vivo by
including a repeat sequence from the crRNA upstream of a given target sequence and by binding of the
Cas5fv protein. The created complexes contain the initial 100 nt of the tagged RNA, which can be
isolated afterwards. While incorporated in complexes, RNA is stabilized and protected from degradation
by RNases. Complex formation can be used to silence reporter gene transcripts. Furthermore, we
provided initial indications that the backbone of synthetic complexes can be modified by addition of
reporter proteins.
Integrated genomics and proteomics define huntingtin CAG length-dependent networks in mice.
To gain insight into how mutant huntingtin (mHtt) CAG repeat length modifies Huntington's disease (HD) pathogenesis, we profiled mRNA in over 600 brain and peripheral tissue samples from HD knock-in mice with increasing CAG repeat lengths. We found repeat length-dependent transcriptional signatures to be prominent in the striatum, less so in cortex, and minimal in the liver. Coexpression network analyses revealed 13 striatal and 5 cortical modules that correlated highly with CAG length and age, and that were preserved in HD models and sometimes in patients. Top striatal modules implicated mHtt CAG length and age in graded impairment in the expression of identity genes for striatal medium spiny neurons and in dysregulation of cyclic AMP signaling, cell death and protocadherin genes. We used proteomics to confirm 790 genes and 5 striatal modules with CAG length-dependent dysregulation at the protein level, and validated 22 striatal module genes as modifiers of mHtt toxicities in vivo.
Network-based approaches to explore complex biological systems towards network medicine
Network medicine relies on different types of networks: from the molecular level of protein–protein interactions to gene regulatory networks and correlation studies of gene expression. Among network approaches based on the analysis of the topological properties of protein–protein interaction (PPI) networks, we discuss the widespread DIAMOnD (disease module detection) algorithm. Starting from the assumption that PPI networks can be viewed as maps where diseases can be identified with localized perturbations within a specific neighborhood (i.e., disease modules), DIAMOnD performs a systematic analysis of the human PPI network to uncover new disease-associated genes by exploiting connectivity significance rather than connection density. The past few years have witnessed an increasing interest in understanding the molecular mechanisms of post-transcriptional regulation, with a special emphasis on non-coding RNAs, since they are emerging as key regulators of many cellular processes in both physiological and pathological states. Recent findings show that coding genes are not the only targets that microRNAs interact with. In fact, there is a pool of different RNAs, including long non-coding RNAs (lncRNAs), competing with each other to attract microRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The framework of regulatory networks provides a powerful tool to gather new insights into ceRNA regulatory mechanisms. Here, we describe a data-driven model recently developed to explore the lncRNA-associated ceRNA activity in breast invasive carcinoma. On the other hand, a very promising example of a co-expression network is the one implemented by the software SWIM (switch miner), which combines topological properties of correlation networks with gene expression data in order to identify a small pool of genes, called switch genes, critically associated with drastic changes in cell phenotype.
Here, we describe the SWIM tool along with its applications to cancer research and compare its predictions with DIAMOnD disease genes.
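The phrase "connectivity significance instead of connection density" can be illustrated with a hypergeometric test. A node with k links, ks of which reach the current disease module, is scored by the probability of seeing at least ks module links by chance, and the most significant candidate is added greedily. This is a minimal sketch assuming an unweighted, undirected PPI network stored as a dict of neighbour sets; the function names and the toy network are hypothetical, and the published DIAMOnD tool includes details not reproduced here.

```python
import math

def connectivity_pvalue(k, ks, n_total, n_module):
    """P(X >= ks) for X ~ Hypergeometric(n_total nodes, n_module in module, k draws)."""
    denom = math.comb(n_total, k)
    tail = sum(math.comb(n_module, i) * math.comb(n_total - n_module, k - i)
               for i in range(ks, min(k, n_module) + 1))
    return tail / denom

def diamond_sketch(adj, seeds, n_iter):
    """Greedy module expansion by connectivity significance.

    adj: dict mapping each node to the set of its neighbours (symmetric)
    seeds: initial disease-associated genes
    Returns a list of (added_node, p_value) in the order they were added.
    """
    module = set(seeds)
    n_total = len(adj)
    added = []
    for _ in range(n_iter):
        # Candidates: nodes adjacent to the module but not yet in it
        candidates = set().union(*(adj[m] for m in module)) - module
        best, best_p = None, None
        for node in sorted(candidates):
            k = len(adj[node])
            ks = len(adj[node] & module)
            p = connectivity_pvalue(k, ks, n_total, len(module))
            if best_p is None or p < best_p:
                best, best_p = node, p
        if best is None:
            break
        module.add(best)
        added.append((best, best_p))
    return added
```

In a toy network where node "c" links only to the two seeds and node "d" has one seed link out of three, "c" is added first: its two links are both to the module, which is far less likely by chance than "d"'s single link, even though both nodes touch the module.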