621 research outputs found

    Differential expression analysis for sequence count data

    Motivation: High-throughput nucleotide sequencing provides quantitative readouts in assays for RNA expression (RNA-Seq), protein-DNA binding (ChIP-Seq) or cell counting (barcode sequencing). Statistical inference of differential signal in such data requires estimation of their variability throughout the dynamic range. When the number of replicates is small, error modelling is needed to achieve statistical power.

    Results: We propose an error model that uses the negative binomial distribution, with variance and mean linked by local regression, to model the null distribution of the count data. The method controls type-I error and provides good detection power.

    Availability: A free open-source R software package, DESeq, is available from the Bioconductor project and from http://www-huber.embl.de/users/anders/DESeq
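
    The negative binomial error model named in the abstract above can be illustrated with a minimal sketch. This is not DESeq's implementation: it omits library-size normalization and the local regression that links variance to mean, and the function names `mean_and_variance` and `nb_pmf` are hypothetical. It only shows how a negative binomial distribution can be parametrized directly by a mean and a variance estimated from replicate counts.

```python
from math import exp, lgamma, log


def mean_and_variance(counts):
    """Sample mean and unbiased sample variance of replicate counts."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    return mean, var


def nb_pmf(k, mean, var):
    """Negative binomial probability of count k, given mean and variance.

    Requires var > mean (overdispersion); otherwise the negative
    binomial parameters are undefined.
    """
    if var <= mean:
        raise ValueError("negative binomial requires var > mean")
    p = mean / var                   # success probability
    r = mean * mean / (var - mean)   # size (inverse dispersion)
    log_pmf = (lgamma(k + r) - lgamma(r) - lgamma(k + 1)
               + r * log(p) + k * log(1.0 - p))
    return exp(log_pmf)


# Hypothetical replicate counts for one gene.
m, v = mean_and_variance([8, 12, 10])
```

    In this parametrization the variance always exceeds the mean, which is why the negative binomial is a common choice for count data that are more variable than a Poisson model allows.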

    Chromosomal-level assembly of the Asian Seabass genome using long sequence reads and multi-layered scaffolding

    We report here the ~670 Mb genome assembly of the Asian seabass (Lates calcarifer), a tropical marine teleost. We used long-read sequencing augmented by transcriptomics, optical and genetic mapping along with shared synteny from closely related fish species to derive a chromosome-level assembly with a contig N50 size over 1 Mb and scaffold N50 size over 25 Mb that span ~90% of the genome. The population structure of the L. calcarifer species complex was analyzed by re-sequencing 61 individuals representing various regions across the species' native range. SNP analyses identified high levels of genetic diversity and confirmed earlier indications of a population stratification comprising three clades with signs of admixture apparent in the South-East Asian population. The quality of the Asian seabass genome assembly far exceeds that of any other fish species, and will serve as a new standard for fish genomics.
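
    The contig and scaffold N50 figures quoted above follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. A minimal sketch of that calculation, with a hypothetical function name `n50` and made-up lengths (not data from the seabass assembly):

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example: total length 54, and 10 + 9 + 8 = 27 reaches half of it.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```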

    RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

    Background: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. Results: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously-mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. Conclusions: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.

    Saltatory remodeling of Hox chromatin in response to rostrocaudal patterning signals

    Hox genes controlling motor neuron subtype identity are expressed in rostrocaudal patterns that are spatially and temporally collinear with their chromosomal organization. Here we demonstrate that Hox chromatin is subdivided into discrete domains that are controlled by rostrocaudal patterning signals that trigger rapid, domain-wide clearance of repressive histone H3 Lys27 trimethylation (H3K27me3) polycomb modifications. Treatment of differentiating mouse neural progenitors with retinoic acid leads to activation and binding of retinoic acid receptors (RARs) to the Hox1–Hox5 chromatin domains, which is followed by a rapid domain-wide removal of H3K27me3 and acquisition of cervical spinal identity. Wnt and fibroblast growth factor (FGF) signals induce expression of the Cdx2 transcription factor that binds and clears H3K27me3 from the Hox1–Hox9 chromatin domains, leading to specification of brachial or thoracic spinal identity. We propose that rapid clearance of repressive modifications in response to transient patterning signals encodes global rostrocaudal neural identity and that maintenance of these chromatin domains ensures the transmission of positional identity to postmitotic motor neurons later in development. Leona M. and Harry B. Helmsley Charitable Trust; National Institutes of Health (U.S.) (Grant P01 NS055923); Smith Family Foundatio

    HMMSplicer: A Tool for Efficient and Sensitive Discovery of Known and Novel Splice Junctions in RNA-Seq Data

    Background: High-throughput sequencing of an organism’s transcriptome, or RNA-Seq, is a valuable and versatile new strategy for capturing snapshots of gene expression. However, transcriptome sequencing creates a new class of alignment problem: mapping short reads that span exon-exon junctions back to the reference genome, especially in the case where a splice junction is previously unknown. Methodology/Principal Findings: Here we introduce HMMSplicer, an accurate and efficient algorithm for discovering canonical and non-canonical splice junctions in short read datasets. HMMSplicer identifies more splice junctions than currently available algorithms when tested on publicly available A. thaliana, P. falciparum, and H. sapiens datasets without a reduction in specificity. Conclusions/Significance: HMMSplicer was found to perform especially well in compact genomes and on genes with low expression levels, alternative splice isoforms, or non-canonical splice junctions. Because HMMSplicer does not rely on prebuilt gene models, the products of inexact splicing are also detected. For H. sapiens, we find 3.6% of 3′ splice sites and 1.4% of 5′ splice sites are inexact, typically differing by 3 bases in either direction. In addition, HMMSplicer provides a score for every predicted junction allowing the user to set a threshold to tune false positive rates depending on the needs of the experiment. HMMSplicer is implemented in Python. Code and documentation are freely available a

    Transcriptomic analysis of crustacean neuropeptide signaling during the moult cycle in the green shore crab, Carcinus maenas

    Background: Ecdysis is an innate behaviour programme by which all arthropods moult their exoskeletons. The complex suite of interacting neuropeptides that orchestrate ecdysis is well studied in insects, but details of the crustacean ecdysis cassette are fragmented and our understanding of this process is comparatively crude, preventing a meaningful evolutionary comparison. To begin to address this issue we identified transcripts coding for neuropeptides and their putative receptors in the central nervous system (CNS) and Y-organs (YO) within the crab, Carcinus maenas, and mapped their expression profiles across accurately defined stages of the moult cycle using RNA-sequencing. We also studied gene expression within the epidermally-derived YO, the only defined role for which is the synthesis of ecdysteroid moulting hormones, to elucidate peptides and G protein-coupled receptors (GPCRs) that might have a function in ecdysis. Results: Transcriptome mining of the CNS transcriptome yielded neuropeptide transcripts representing 47 neuropeptide families and 66 putative GPCRs. Neuropeptide transcripts that were differentially expressed across the moult cycle included carcikinin, crustacean hyperglycemic hormone-2, and crustacean cardioactive peptide, whilst a single putative neuropeptide receptor, proctolin R1, was differentially expressed. Carcikinin mRNA in particular exhibited dramatic increases in expression pre-moult, suggesting a role in ecdysis regulation. Crustacean hyperglycemic hormone-2 mRNA expression was elevated post- and pre-moult whilst that for crustacean cardioactive peptide, which regulates insect ecdysis and plays a role in stereotyped motor activity during crustacean ecdysis, was elevated in pre-moult. In the YO, several putative neuropeptide receptor transcripts were differentially expressed across the moult cycle, as was the mRNA for the neuropeptide, neuroparsin-1. Whilst differential gene expression of putative neuropeptide receptors was expected, the discovery and differential expression of neuropeptide transcripts was surprising. Analysis of GPCR transcript expression between YO and epidermis revealed 11 to be upregulated in the YO; these are now candidates for peptide control of ecdysis. Conclusions: The data presented represent a comprehensive survey of the deduced C. maenas neuropeptidome and putative GPCRs. Importantly, we have described the differential expression profiles of these transcripts across accurately staged moult cycles in tissues key to the ecdysis programme. This study provides important avenues for the future exploration of functionality of receptor-ligand pairs in crustaceans.

    Accurate Genome Relative Abundance Estimation Based on Shotgun Metagenomic Reads

    Accurate estimation of microbial community composition based on metagenomic sequencing data is fundamental for subsequent metagenomics analysis. Prevalent estimation methods are mainly based on directly summarizing alignment results or variants thereof, and often result in biased and/or unstable estimates. We have developed a unified probabilistic framework (named GRAMMy) by explicitly modeling read assignment ambiguities, genome size biases and read distributions along the genomes. A maximum likelihood method is employed to compute Genome Relative Abundance of microbial communities using the Mixture Model theory (GRAMMy). GRAMMy has been demonstrated to give estimates that are accurate and robust across both simulated and real read benchmark datasets. We applied GRAMMy to a collection of 34 metagenomic read sets from four metagenomics projects and identified 99 frequent species (minimally 0.5% abundant in at least 50% of the datasets) in the human gut samples. Our results show substantial improvements over previous studies, such as adjusting the over-estimated abundance for Bacteroides species for human gut samples, by providing a new reference-based strategy for metagenomic sample comparisons. GRAMMy can be used flexibly with many read assignment tools (mapping, alignment or composition-based) even with low-sensitivity mapping results from huge short-read datasets. It will be increasingly useful as an accurate and robust tool for abundance estimation with the growing size of read sets and the expanding database of reference genomes.
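
    The general idea described above (a mixture model fitted by maximum likelihood, with ambiguous read assignments and a genome size correction) can be sketched with a simple expectation-maximization loop. This is a simplified illustration, not GRAMMy's implementation: it assumes each read is equally likely to come from any of its candidate genomes before weighting by the mixing proportions, it ignores read distributions along the genomes, and the function name `estimate_abundance` and the toy data are hypothetical.

```python
def estimate_abundance(read_hits, genome_lengths, iterations=200):
    """Estimate genome relative abundance from ambiguous read assignments.

    read_hits: list of sets, each holding the genomes a read could map to.
    genome_lengths: dict mapping genome name to its length.
    """
    read_hits = [hits for hits in read_hits if hits]  # drop unassigned reads
    if not read_hits:
        raise ValueError("no assigned reads")
    genomes = sorted(genome_lengths)
    # Mixing proportions: expected share of reads drawn from each genome.
    pi = {g: 1.0 / len(genomes) for g in genomes}
    for _ in range(iterations):
        expected = {g: 0.0 for g in genomes}
        # E-step: split each read across its candidate genomes.
        for hits in read_hits:
            norm = sum(pi[g] for g in hits)
            for g in hits:
                expected[g] += pi[g] / norm
        # M-step: update mixing proportions from the expected read counts.
        pi = {g: expected[g] / len(read_hits) for g in genomes}
    # Longer genomes yield more reads per cell, so divide by genome length.
    weights = {g: pi[g] / genome_lengths[g] for g in genomes}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Toy data: 60 reads unique to A, 20 unique to B, 20 shared by both.
hits = [{"A"}] * 60 + [{"B"}] * 20 + [{"A", "B"}] * 20
abundance = estimate_abundance(hits, {"A": 2.0, "B": 1.0})
```

    In the toy data, genome A attracts three quarters of the reads once the shared reads are apportioned, but because it is twice as long as B its estimated relative abundance is lower than its read share.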

    Synergistic binding of transcription factors to cell-specific enhancers programs motor neuron identity

    Efficient transcriptional programming promises to open new frontiers in regenerative medicine. However, mechanisms by which programming factors transform cell fate are unknown, preventing more rational selection of factors to generate desirable cell types. Three transcription factors, Ngn2, Isl1 and Lhx3, were sufficient to program rapidly and efficiently spinal motor neuron identity when expressed in differentiating mouse embryonic stem cells. Replacement of Lhx3 by Phox2a led to specification of cranial, rather than spinal, motor neurons. Chromatin immunoprecipitation–sequencing analysis of Isl1, Lhx3 and Phox2a binding sites revealed that the two cell fates were programmed by the recruitment of Isl1-Lhx3 and Isl1-Phox2a complexes to distinct genomic locations characterized by a unique grammar of homeodomain binding motifs. Our findings suggest that synergistic interactions among transcription factors determine the specificity of their recruitment to cell type–specific binding sites and illustrate how a single transcription factor can be repurposed to program different cell types. Project ALS Foundation; National Institutes of Health (U.S.) (Grant P01 NS055923)

    Virulence Regulator EspR of Mycobacterium tuberculosis Is a Nucleoid-Associated Protein

    The principal virulence determinant of Mycobacterium tuberculosis (Mtb), the ESX-1 protein secretion system, is positively controlled at the transcriptional level by EspR. Depletion of EspR reportedly affects a small number of genes, both positively and negatively, including a key ESX-1 component, the espACD operon. EspR is also thought to be an ESX-1 substrate. Using EspR-specific antibodies in ChIP-Seq experiments (chromatin immunoprecipitation followed by ultra-high throughput DNA sequencing) we show that EspR binds to at least 165 loci on the Mtb genome. Included in the EspR regulon are genes encoding not only EspA, but also EspR itself, the ESX-2 and ESX-5 systems, a host of diverse cell wall functions, such as production of the complex lipid PDIM (phenolthiocerol dimycocerosate) and the PE/PPE cell-surface proteins. EspR binding sites are not restricted to promoter regions and can be clustered. This suggests that rather than functioning as a classical regulatory protein, EspR acts globally as a nucleoid-associated protein capable of long-range interactions consistent with a recently established structural model. EspR expression was shown to be growth phase-dependent, peaking in the stationary phase. Overexpression in Mtb strain H37Rv revealed that EspR influences target gene expression both positively and negatively, leading to growth arrest. At no stage was EspR secreted into the culture filtrate. Thus, rather than serving as a specific activator of a virulence locus, EspR is a novel nucleoid-associated protein, with both architectural and regulatory roles, that impacts cell wall functions and pathogenesis through multiple genes.