14,259 research outputs found

    Detecting differential usage of exons from RNA-Seq data

    RNA-Seq is a powerful tool for the study of alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires comparisons between treatments, tissues or conditions. For the analysis of such experiments, we present _DEXSeq_, a statistical method to test for differential exon usage in RNA-Seq data. _DEXSeq_ employs generalized linear models and offers good detection power and reliable control of false discoveries by taking biological variation into account. An implementation is available as an R/Bioconductor package.

    Differential expression analysis for sequence count data

    *Motivation:* High-throughput nucleotide sequencing provides quantitative readouts in assays for RNA expression (RNA-Seq), protein-DNA binding (ChIP-Seq) or cell counting (barcode sequencing). Statistical inference of differential signal in such data requires estimation of their variability throughout the dynamic range. When the number of replicates is small, error modelling is needed to achieve statistical power.

*Results:* We propose an error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data. The method controls type-I error and provides good detection power. 

*Availability:* A free open-source R software package, _DESeq_, is available from the Bioconductor project and from http://www-huber.embl.de/users/anders/DESeq

    Evaluation of experimental design and computational parameter choices affecting analyses of ChIP-seq and RNA-seq data in undomesticated poplar trees.

    *Background:* One of the great advantages of next generation sequencing is the ability to generate large genomic datasets for virtually all species, including non-model organisms. It should be possible, in turn, to apply advanced computational approaches to these datasets to develop models of biological processes. In a practical sense, working with non-model organisms presents unique challenges. In this paper we discuss some of these challenges for ChIP-seq and RNA-seq experiments using the undomesticated tree species of the genus Populus. *Results:* We describe specific challenges associated with experimental design in Populus, including selection of optimal genotypes for different technical approaches and development of antibodies against Populus transcription factors. Execution of the experimental design included the generation and analysis of chromatin immunoprecipitation-sequencing (ChIP-seq) data for RNA polymerase II and transcription factors involved in wood formation. We discuss criteria for analyzing the resulting datasets, determination of appropriate control sequencing libraries, evaluation of sequencing coverage needs, and optimization of parameters. We also describe the evaluation of ChIP-seq data from Populus, and discuss the comparison between ChIP-seq and RNA-seq data and biological interpretations of these comparisons. *Conclusions:* These and other "lessons learned" highlight the challenges but also the potential insights to be gained from extending next generation sequencing-supported network analyses to undomesticated non-model species.

    NBLDA: Negative Binomial Linear Discriminant Analysis for RNA-Seq Data

    RNA-sequencing (RNA-Seq) has become a powerful technology to characterize gene expression profiles because it is more accurate and comprehensive than microarrays. Although statistical methods that have been developed for microarray data can be applied to RNA-Seq data, they are not ideal due to the discrete nature of RNA-Seq data. The Poisson distribution and negative binomial distribution are commonly used to model count data. Recently, Witten (2011) proposed a Poisson linear discriminant analysis for RNA-Seq data. The Poisson assumption may not be as appropriate as the negative binomial distribution when biological replicates are available and in the presence of overdispersion (i.e., when the variance is larger than the mean). However, it is more complicated to model negative binomial variables because they involve a dispersion parameter that needs to be estimated. In this paper, we propose a negative binomial linear discriminant analysis for RNA-Seq data. By Bayes' rule, we construct the classifier by fitting a negative binomial model, and propose some plug-in rules to estimate the unknown parameters in the classifier. The relationship between the negative binomial classifier and the Poisson classifier is explored, with a numerical investigation of the impact of dispersion on the discriminant score. Simulation results show the superiority of our proposed method. We also analyze four real RNA-Seq data sets to demonstrate the advantage of our method in real-world applications.
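
The overdispersion mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common mean-dispersion parameterization of the negative binomial, in which the variance equals mean + dispersion * mean^2; the abstract does not state which parameterization the paper uses. The function names and the numeric values are hypothetical and chosen only for illustration.

```python
import math


def nb_variance(mean, dispersion):
    """Variance of a negative binomial count, assuming the mean-dispersion
    parameterization var = mean + dispersion * mean**2 (an assumption here)."""
    return mean + dispersion * mean ** 2


def nb_log_pmf(count, mean, dispersion):
    """Log-probability of `count` under the same assumed parameterization."""
    size = 1.0 / dispersion          # the "size" (r) parameter
    prob = size / (size + mean)      # success probability
    return (
        math.lgamma(count + size) - math.lgamma(size) - math.lgamma(count + 1)
        + size * math.log(prob) + count * math.log1p(-prob)
    )


if __name__ == "__main__":
    mean, dispersion = 4.0, 0.5      # illustrative values, not from the paper
    # A Poisson count with mean 4.0 has variance 4.0; the negative binomial
    # adds the extra term dispersion * mean**2.
    print(nb_variance(mean, dispersion))  # 12.0
```

As the dispersion approaches zero, the extra variance term vanishes and the variance approaches the Poisson value, which is why a Poisson model can be viewed as the limiting case without overdispersion.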

    Production and analysis of synthetic Cascade variants

    CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR associated) is an adaptive immune system of Archaea and Bacteria. It is able to target and destroy foreign genetic material with ribonucleoprotein complexes consisting of CRISPR RNAs (crRNAs) and certain Cas proteins. CRISPR-Cas systems are classified into two major classes and multiple types, according to the involved Cas proteins. In type I systems, a ribonucleoprotein complex called Cascade (CRISPR associated complex for antiviral defence) scans for invading viral DNA during a recurring infection and binds the sequence complementary to the incorporated crRNA. After target recognition, the nuclease/helicase Cas3 is recruited and subsequently destroys the viral DNA in a step termed interference. Multiple subtypes of type I exist that show differences in the Cascade composition. This work focuses on a minimal Cascade variant found in Shewanella putrefaciens CN-32. In comparison to the well-studied type I-E Cascade from Escherichia coli, this complex is missing two proteins usually required for target recognition, yet it is still able to provide immunity. Recombinant I-Fv Cascade was previously purified from E. coli and it was possible to modulate the complex by extending or shortening the backbone, resulting in synthetic variants with altered protein stoichiometry. In the present study, I-Fv Cascade was further analyzed by in vitro methods. Target binding was observed and the 3D structure revealed structural variations that replace the missing subunits, potentially to evade viral anti-CRISPR proteins. The nuclease/helicase of this system, Cas2/3fv, is a fusion of the Cas3 protein with the interference-unrelated protein Cas2. A standalone Cas3fv was purified without the Cas2 domain and in vitro cleavage assays showed that Cas3fv degrades both free ssDNA and Cascade-bound substrates. The complete Cas2/3fv protein forms a complex with the protein Cas1 and shows reduced cleavage of free ssDNA, potentially as a regulatory mechanism against unspecific cleavage. Furthermore, we established a process termed "RNA wrapping". Synthetic Cascade assemblies can be created by directing the general RNA-binding ability of the characteristic Cas7fv backbone protein onto an RNA of choice, such as reporter gene transcripts. Specific complex formation can be initiated in vivo by including a repeat sequence from the crRNA upstream of a given target sequence and by binding of the Cas5fv protein. The created complexes contain the initial 100 nt of the tagged RNA, which can be isolated afterwards. While incorporated in complexes, RNA is stabilized and protected from degradation by RNases. Complex formation can also be used to silence reporter gene transcripts. Furthermore, we provided initial indications that the backbone of synthetic complexes can be modified by fusion with additional reporter proteins.

    Integrated genomics and proteomics define huntingtin CAG length-dependent networks in mice.

    To gain insight into how mutant huntingtin (mHtt) CAG repeat length modifies Huntington's disease (HD) pathogenesis, we profiled mRNA in over 600 brain and peripheral tissue samples from HD knock-in mice with increasing CAG repeat lengths. We found repeat length-dependent transcriptional signatures to be prominent in the striatum, less so in cortex, and minimal in the liver. Coexpression network analyses revealed 13 striatal and 5 cortical modules that correlated highly with CAG length and age, and that were preserved in HD models and sometimes in patients. Top striatal modules implicated mHtt CAG length and age in graded impairment in the expression of identity genes for striatal medium spiny neurons and in dysregulation of cyclic AMP signaling, cell death and protocadherin genes. We used proteomics to confirm 790 genes and 5 striatal modules with CAG length-dependent dysregulation at the protein level, and validated 22 striatal module genes as modifiers of mHtt toxicities in vivo.

    Network-based approaches to explore complex biological systems towards network medicine

    Network medicine relies on different types of networks: from the molecular level of protein–protein interactions to gene regulatory networks and correlation studies of gene expression. Among network approaches based on the analysis of the topological properties of protein–protein interaction (PPI) networks, we discuss the widespread DIAMOnD (disease module detection) algorithm. Starting from the assumption that PPI networks can be viewed as maps where diseases can be identified with localized perturbations within a specific neighborhood (i.e., disease modules), DIAMOnD performs a systematic analysis of the human PPI network to uncover new disease-associated genes by exploiting connectivity significance instead of connection density. The past few years have witnessed an increasing interest in understanding the molecular mechanisms of post-transcriptional regulation, with a special emphasis on non-coding RNAs, since they are emerging as key regulators of many cellular processes in both physiological and pathological states. Recent findings show that coding genes are not the only targets that microRNAs interact with. In fact, there is a pool of different RNAs, including long non-coding RNAs (lncRNAs), competing with each other to attract microRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The framework of regulatory networks provides a powerful tool to gather new insights into ceRNA regulatory mechanisms. Here, we describe a data-driven model recently developed to explore the lncRNA-associated ceRNA activity in breast invasive carcinoma. On the other hand, a very promising example of a co-expression network is the one implemented by the software SWIM (switch miner), which combines topological properties of correlation networks with gene expression data in order to identify a small pool of genes, called switch genes, critically associated with drastic changes in cell phenotype. Here, we describe the SWIM tool along with its applications to cancer research and compare its predictions with DIAMOnD disease genes.