
    Do human transposable element small RNAs serve primarily as genome defenders or genome regulators?

    It is currently thought that small RNA (sRNA)-based repression mechanisms are primarily employed to mitigate the mutagenic threat posed by the activity of transposable elements (TEs). This can be achieved by the sRNA-guided processing of TE transcripts via Dicer-dependent (e.g., siRNA) or Dicer-independent (e.g., piRNA) mechanisms. For example, potentially active human L1 elements are silenced by mRNA cleavage induced by element-encoded siRNAs, leading to a negative correlation between element mRNA and siRNA levels. On the other hand, there is emerging evidence that TE-derived sRNAs can also be used to regulate the host genome. Here, we evaluated these two hypotheses for human TEs by comparing the levels of TE-derived mRNA and TE sRNA across six tissues. The genome defense hypothesis predicts a negative correlation between TE mRNA and TE sRNA levels, whereas the genome regulatory hypothesis predicts a positive correlation. On average, TE mRNA and TE sRNA levels are positively correlated across human tissues. These correlations are higher than those seen for human genes or for randomly permuted control data sets. Overall, Alu subfamilies show the highest positive correlations of element mRNA and sRNA levels across tissues, although a few of the youngest, and potentially most active, Alu subfamilies do show negative correlations. Thus, Alu-derived sRNAs may be related to both genome regulation and genome defense. These results are inconsistent with a simple model whereby TE-derived sRNAs reduce levels of standing TE mRNA via transcript cleavage, and suggest that human cells efficiently process TE transcripts into sRNA based on the available message levels. This may point to a widespread role for processed TE transcripts in genome regulation or to alternative roles of TE-to-sRNA processing, including the mitigation of TE transcript cytotoxicity.
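The comparison described in this abstract (the sign of the mRNA/sRNA correlation across tissues, judged against a permuted control) can be sketched roughly as follows. This is a minimal illustration only, not the authors' pipeline: the choice of Pearson correlation, the function names, and the six expression values are assumptions made up for the example.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def permutation_control(mrna, srna, n_perm=1000, seed=0):
    """Return the observed correlation and the fraction of tissue-label
    permutations whose correlation is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = pearson(mrna, srna)
    shuffled = list(srna)
    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the tissue pairing
        if pearson(mrna, shuffled) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_perm


# Hypothetical expression levels for one TE subfamily in six tissues.
te_mrna = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
te_srna = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]

r, frac = permutation_control(te_mrna, te_srna)
# r > 0 is the direction predicted by the genome regulatory hypothesis,
# r < 0 the direction predicted by the genome defense hypothesis.
print(f"r = {r:.3f}, fraction of permutations >= r: {frac:.3f}")
```

With only six tissues there are just 720 distinct pairings, so the permuted fraction is coarse; it is meant as a control for the direction of the effect rather than a precise p-value.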

    Data from: Genotyping-by-Sequencing for Populus Population Genomics: An Assessment of Genome Sampling Patterns and Filtering Approaches

    Continuing advances in nucleotide sequencing technology are inspiring a suite of genomic approaches in studies of natural populations. Researchers are faced with data management and analytical scales that are increasing by orders of magnitude. With such dramatic advances comes a need to understand biases and error rates, which can be propagated and magnified in large-scale data acquisition and processing. Here we assess genomic sampling biases and the effects of various population-level data filtering strategies in a genotyping-by-sequencing (GBS) protocol. We focus on data from two species of Populus, because this genus has a relatively small genome and is emerging as a target for population genomic studies. We estimate the proportions and patterns of genomic sampling by examining the Populus trichocarpa genome (Nisqually-1), and demonstrate a pronounced bias towards coding regions when using the methylation-sensitive ApeKI restriction enzyme in this species. Using population-level data from a closely related species (P. tremuloides), we also investigate various approaches for filtering GBS data to retain high-depth, informative SNPs that can be used for population genetic analyses. We find that a data filter that includes the designation of ambiguous alleles resulted in metrics of population structure and Hardy-Weinberg equilibrium that were most consistent with previous studies of the same populations based on other genetic markers. Analyses of the filtered data (27,910 SNPs) also resulted in patterns of heterozygosity and population structure similar to a previous study using microsatellites. Our application demonstrates that technically and analytically simple approaches can readily be developed for population genomics of natural populations.
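As a rough illustration of the kind of filtering and evaluation this abstract describes, the sketch below keeps SNPs above a depth threshold and then computes a Hardy-Weinberg chi-square statistic for each retained SNP. It is not the authors' filter: the record layout, the `min_depth` value, and the example SNPs are hypothetical, and the ambiguous-allele designation mentioned in the abstract is not modelled.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic for departure from Hardy-Weinberg proportions
    at a biallelic SNP, given the three observed genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def filter_by_depth(snps, min_depth=10):
    """Keep SNP records whose mean read depth reaches the threshold."""
    return [s for s in snps if s["mean_depth"] >= min_depth]


# Hypothetical SNP records: id, mean depth, and (AA, AB, BB) genotype counts.
snps = [
    {"id": "snp1", "mean_depth": 25.0, "counts": (25, 50, 25)},
    {"id": "snp2", "mean_depth": 4.0, "counts": (40, 10, 50)},
    {"id": "snp3", "mean_depth": 18.0, "counts": (50, 0, 50)},
]

for snp in filter_by_depth(snps):
    chi2 = hwe_chi_square(*snp["counts"])
    # 3.84 is the 5% critical value of chi-square with one degree of freedom.
    flag = "departs from HWE" if chi2 > 3.84 else "consistent with HWE"
    print(snp["id"], round(chi2, 2), flag)
```

Here the Hardy-Weinberg statistic is used as a summary of the filtered data, which mirrors how the abstract uses it to compare filters rather than as a filter in its own right.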

    Transcriptional landscape of repetitive elements in normal and cancer human cells

    BACKGROUND: Repetitive elements comprise at least 55% of the human genome with more recent estimates as high as two-thirds. Most of these elements are retrotransposons, DNA sequences that can insert copies of themselves into new genomic locations by a “copy and paste” mechanism. These mobile genetic elements play important roles in shaping genomes during evolution, and have been implicated in the etiology of many human diseases. Despite their abundance and diversity, few studies have investigated the regulation of endogenous retrotransposons at the genome-wide scale, primarily because of the technical difficulties of uniquely mapping high-throughput sequencing reads to repetitive DNA. RESULTS: Here we develop a new computational method called RepEnrich to study genome-wide transcriptional regulation of repetitive elements. We show that many of the Long Terminal Repeat retrotransposons in humans are transcriptionally active in a cell line-specific manner. Cancer cell lines display greater RNA Polymerase II binding to retrotransposons than cell lines derived from normal tissue. Consistent with increased transcriptional activity of retrotransposons in cancer cells, we found significantly higher levels of L1 retrotransposon RNA expression in prostate tumors compared to normal-matched controls. CONCLUSIONS: Our results support increased transcription of retrotransposons in transformed cells, which may explain the somatic retrotransposition events recently reported in several types of cancers. ELECTRONIC SUPPLEMENTARY MATERIAL: Supplementary material is available for this article at 10.1186/1471-2164-15-583 and is accessible for authorized users.

    Index-Free De Novo Assembly and Deconvolution of Mixed Mitochondrial Genomes

    Second-generation sequencing technology has allowed a very large increase in sequencing throughput. In order to make use of this high throughput, we have developed a pipeline for sequencing and de novo assembly of multiple mitochondrial genomes without the costs of indexing. Simulation studies on a mixture of diverse animal mitochondrial genomes showed that mitochondrial genomes could be reassembled from a high coverage of short (35 nt) reads, such as those generated by a second-generation Illumina Genome Analyzer. We then assessed this experimentally with long-range polymerase chain reaction products from mitochondria of a human, a rat, a bird, a frog, an insect, and a mollusc. Comparison with reference genomes was used for deconvolution of the assembled contigs rather than for mapping of sequence reads. As proof of concept, we report the complete mollusc mitochondrial genome of an olive shell (Amalda northlandica). It has a very unusual putative control region, which contains a structure that would probably only be detectable by next-generation sequencing. The general approach has considerable potential, especially when combined with indexed sequencing of different groups of genomes.
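The deconvolution step, in which assembled contigs are sorted by comparison with reference genomes, might be sketched as below. This is an assumption-laden toy and not the published pipeline: shared k-mer fraction stands in for whatever sequence comparison the authors used, the sequences are invented, and reverse-complement matches and divergent references are ignored.

```python
def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def assign_contig(contig, references, k=4):
    """Assign a contig to the reference sharing the largest fraction of its
    k-mers. Returns (reference name or None, shared fraction)."""
    contig_kmers = kmers(contig, k)
    best_name, best_score = None, 0.0
    if not contig_kmers:
        return best_name, best_score
    for name, ref_seq in references.items():
        shared = len(contig_kmers & kmers(ref_seq, k))
        score = shared / len(contig_kmers)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


# Hypothetical toy "reference genomes" and one assembled contig.
references = {
    "human": "ACGTTGCAAGGCTTAACCGGTA",
    "rat": "TTTTTTTTGGGGGGGGCCCCCCCC",
}
contig = "TGCAAGGCTTAA"

print(assign_contig(contig, references))
```

Because the reads themselves are never mapped, only the much smaller set of assembled contigs has to be compared against the references.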

    RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

    BACKGROUND: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments. RESULTS: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene. CONCLUSIONS: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has enabled valuable guidance for cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
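The central difficulty named in this abstract, reads that map to more than one transcript, is commonly handled with an expectation-maximization (EM) scheme. The toy below shows the idea in its simplest form and should not be read as RSEM's actual model or interface: it ignores transcript length, read quality, and fragment information, and the function name and read data are hypothetical.

```python
def em_abundances(read_alignments, transcripts, n_iter=100):
    """Estimate relative transcript abundances from reads that may align to
    several transcripts. read_alignments is a list of candidate-transcript
    lists, one per read."""
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    for _ in range(n_iter):
        # E-step: split each read among its candidates in proportion to theta.
        counts = {t: 0.0 for t in transcripts}
        for candidates in read_alignments:
            total = sum(theta[t] for t in candidates)
            if total == 0:
                continue
            for t in candidates:
                counts[t] += theta[t] / total
        # M-step: renormalize the fractional counts into new abundances.
        n = sum(counts.values())
        if n == 0:
            break
        theta = {t: c / n for t, c in counts.items()}
    return theta


# Hypothetical data: 3 reads unique to A, 1 unique to B, 4 ambiguous.
reads = [["A"]] * 3 + [["B"]] + [["A", "B"]] * 4
theta = em_abundances(reads, ["A", "B"])
print({t: round(v, 3) for t, v in theta.items()})
```

In this toy case the ambiguous reads end up split according to the ratio set by the uniquely mapping reads, which is the intuition behind using multi-mapping reads instead of discarding them.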