    Global properties of assembled transcriptomes.

    <p><b>a)</b> Percentage of annotated and novel genes and transcripts using strand-specific deep polyA+ RNA sequencing. Classification is based on the comparison to reference gene annotations in Ensembl v.75. 70.65 and 87.77% of annotated genes in human and mouse are classified as protein-coding, respectively. Number of genes identified: human 34,188; chimpanzee 35,915; macaque 34,427; mouse 31,043. Number of transcripts identified: human 99,670; chimpanzee 102,262; macaque 93,860; mouse 85,688. <b>b)</b> Cumulative density of nucleotide length in annotated and novel assembled transcripts. <b>c)</b> Cumulative density of expression values in logarithmic scale in annotated and novel assembled transcripts. Expression is measured in fragments per kilobase per million mapped reads (FPKM) values, selecting the maximum value across all samples.</p>
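
The expression measure named in panel c can be illustrated with a short sketch. It uses the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments) and then takes the maximum across samples, as the caption describes. The function names and the input layout are hypothetical, chosen only for illustration; this is not the pipeline used in the study.

```python
def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Standard FPKM: fragments / (length in kb * mapped fragments in millions)."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def max_fpkm_across_samples(samples, transcript_length_bp):
    """Maximum FPKM of one transcript over several samples.

    `samples` is a hypothetical list of (fragments, total_mapped_fragments)
    pairs, one pair per sample.
    """
    return max(fpkm(f, transcript_length_bp, total) for f, total in samples)


if __name__ == "__main__":
    # One 2 kb transcript measured in two samples of different depth.
    example = [(100, 10_000_000), (50, 2_000_000)]
    print(max_fpkm_across_samples(example, 2000))
```

Taking the maximum rather than the mean keeps transcripts that are strongly expressed in a single sample from being diluted by samples where they are silent.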

    Origins of <i>De Novo</i> Genes in Human and Chimpanzee

    <div><p>The birth of new genes is an important motor of evolutionary innovation. Whereas many new genes arise by gene duplication, others originate at genomic regions that did not contain any genes or gene copies. Some of these newly expressed genes may acquire coding or non-coding functions and be preserved by natural selection. However, the prevalence of <i>de novo</i> gene emergence and its underlying mechanisms remain unclear. To obtain a comprehensive view of this process, we performed in-depth sequencing of the transcriptomes of four mammalian species (human, chimpanzee, macaque, and mouse) and compared the assembled transcripts and the corresponding syntenic genomic regions. This resulted in the identification of over five thousand new multiexonic transcriptional events in human and/or chimpanzee that are not observed in the other species. Using comparative genomics, we show that the expression of these transcripts is associated with the gain of regulatory motifs upstream of the transcription start site (TSS) and of U1 snRNP sites downstream of the TSS. In general, these transcripts show little evidence of purifying selection, suggesting that many of them are not functional. However, we find signatures of selection in a subset of <i>de novo</i> genes that have evidence of protein translation. Taken together, the data support a model in which frequently occurring new transcriptional events in the genome provide the raw material for the evolution of new proteins.</p></div>

    Coding potential of <i>de novo</i> genes.

    <p><b>a-d)</b> ORF length and coding score for ORFs in different sequence types. <i>De novo</i> gene, longest ORF in <i>de novo</i> transcripts (n = 1,933); CodRNA (all), annotated coding sequences from Ensembl v.75 (n = 8,462); CodRNA (short), annotated coding sequences sampled to have the same transcript length distribution as <i>de novo</i> transcripts (n = 1,952); Intron, longest ORF in intronic sequences from annotated genes sampled to have the same transcript length distribution as <i>de novo</i> transcripts (n = 5,000); Proteogenomics, ORFs in <i>de novo</i> transcripts with peptide evidence by mass-spectrometry; Ribosome profiling, ORFs in <i>de novo</i> transcripts with ribosome association evidence in brain. <b>e)</b> Example of hominoid-specific <i>de novo</i> gene with evidence of protein expression from proteogenomics, with RNA-Seq read profiles in two human samples. <b>f)</b> Example of hominoid-specific <i>de novo</i> gene with RNA-Seq and ribosome profiling read profiles. Predicted coding sequences are highlighted with red boxes and the putative encoded protein sequences displayed.</p>

    Identification and characterization of <i>de novo</i> genes in human and chimpanzee.

    <p><b>a)</b> Simplified phylogenetic tree indicating the nine species considered in this study. In all species we had RNA-Seq data from several tissues. Chimpanzee, human, macaque and mouse were the species for which we performed strand-specific deep polyA+ RNA sequencing. We indicate the branches in which <i>de novo</i> genes were defined, together with the number of genes. <b>b)</b> Categories of transcripts in <i>de novo</i> genes based on genomic location. Intergenic, transcripts that do not overlap any other gene; Overlapping antisense, transcripts that overlap exons from other genes on the opposite strand; Overlapping intronic, transcripts that overlap introns from other genes on the opposite strand, with no exonic overlap. <b>c)</b> Classification of <i>de novo</i> genes based on existing evidence in databases. Annotated, genes classified as annotated in Ensembl v.75; EST/nr, non-annotated genes with BLAST hits (10<sup>−4</sup>) to expressed sequence tags (EST) and/or non-redundant protein (nr) sequences in the same species; Novel, all remaining genes. <b>d)</b> Patterns of gene expression in four tissues. Brain refers to frontal cortex. Transcripts with FPKM > 0 in a tissue are considered as expressed in that tissue. In red boxes, fraction of transcripts whose expression is restricted to that tissue (τ > 0.85, see <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005721#sec009" target="_blank">Methods</a>). Chimp conserved, transcripts assembled in chimpanzee not classified as <i>de novo</i>. Human conserved, transcripts assembled in human not classified as <i>de novo</i>. <b>e)</b> Number of testis GTEx samples with expression of <i>de novo</i> and conserved genes. We considered all annotated genes with FPKM > 0 in at least one testis sample. Conserved, genes sampled from the total pool of annotated genes analyzed in GTEx with the same distribution of FPKM values as annotated <i>de novo</i> genes (n = 200).</p>
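
The tissue-restriction threshold in panel d (τ > 0.85) can be illustrated with the commonly used tissue-specificity index τ = Σ(1 − x_i / x_max) / (n − 1), where x_i is the expression in tissue i and n is the number of tissues. The caption defers the exact definition to the Methods, so the formula below is an assumption based on the usual form of the index, and the function names are hypothetical.

```python
def tau(expression):
    """Tissue-specificity index: 0 = uniform expression, 1 = single tissue.

    `expression` holds one non-negative value (e.g. FPKM) per tissue.
    Assumed form: sum(1 - x_i / x_max) / (n - 1).
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two tissues")
    x_max = max(expression)
    if x_max <= 0:
        raise ValueError("transcript is not expressed in any tissue")
    return sum(1 - x / x_max for x in expression) / (n - 1)


def is_tissue_restricted(expression, threshold=0.85):
    """Apply the tau > 0.85 cut-off quoted in the caption."""
    return tau(expression) > threshold


if __name__ == "__main__":
    # Hypothetical FPKM values in four tissues.
    print(tau([8.0, 2.0, 0.0, 0.0]), is_tissue_restricted([8.0, 2.0, 0.0, 0.0]))
    print(tau([5.0, 5.0, 5.0, 5.0]), is_tissue_restricted([5.0, 5.0, 5.0, 5.0]))
```

Some implementations log-transform expression values before computing τ; the sketch uses raw values for simplicity.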

    Divergence with macaque syntenic regions.

    <p>Estimated number of substitutions per kb (PAML). Dataset 3 corresponds to the genes in <a href="http://www.plosgenetics.org/article/info:doi/10.1371/journal.pgen.1005721#pgen.1005721.t001" target="_blank">Table 1</a>. ORF in datasets 1 and 2 is the longest ORF in the transcript. Introns refers to sampled intronic regions of size 500 bp from the same set of transcripts. We tested for differences between complete exons and introns, and ORF and introns with the Fisher test.</p>

    Recent signatures of transcription in <i>de novo</i> genes.

    <p><b>a)</b> Overrepresented transcription factor binding sites (TFBS) in the region -100 to 0 with respect to the transcription start site (TSS) in <i>de novo</i> genes. The region from -300 to +300 with respect to the TSS was analysed (n = 3,875). Color code relates to normalized values (highest value is yellow). <b>b)</b> Fine-grained motif density 200 bp upstream of the TSS is shown. <b>c)</b> Comparison of motif density in genomic syntenic regions in macaque for <i>de novo</i> transcripts (n = 3,116) and conserved transcripts (n = 4,323, randomly taken human and chimpanzee annotated transcripts not classified as <i>de novo</i>). Significant differences between human/chimpanzee and macaque are indicated; Fisher test; *, p-value < 0.05; **, p-value < 0.01. <b>d)</b> Density of the main human transposable element (TE) families around the TSS of <i>de novo</i> and conserved transcripts. Regions -3 kb to +3 kb with respect to the TSS were analyzed. LTR frequency is higher in the region -100 to +100 in <i>de novo</i> genes when compared to conserved genes (Fisher test p-value < 10<sup>−18</sup>). <b>e)</b> Comparison of motif density in promoters with and without long terminal repeat (LTR) in the region -500 to 0 with respect to the TSS. Significant differences in motif density in the -100 bp window are indicated. <b>f)</b> Signatures of transcription elongation in <i>de novo</i> and conserved genes. Density of U1 and PAS motifs in the 500 bp region upstream and downstream of the TSS. Comparison of U1 and PAS motif density in genomic syntenic regions in macaque for <i>de novo</i> transcripts (n = 3,116) and conserved transcripts (n = 4,323). There is an increase of U1 motifs in <i>de novo</i> transcripts when compared to macaque (indicated by a black arrow, Fisher test, p-value = 0.016 for the region +100 to +200).</p>

    miRNA conservation in clustered and non-clustered miRNAs.

    <p>(<b>A</b>) Single nucleotide variant (SNV) density in clustered or non-clustered miRNAs, calculated as the average number of fixed substitutions in any of the great ape populations across the precursor miRNA. (<b>B</b>) Molecular age of clustered and non-clustered miRNAs. Molecular age is taken from Iwama et al. [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0154194#pone.0154194.ref017" target="_blank">17</a>], where each integer represents a period of origin, with the oldest miRNAs having a value of -1 (right after the split between mammals and birds) and the youngest a value of 13 (after the split between humans and chimpanzees). (<b>C</b>) Correlation between SNV density and expression, calculated as the average expression values for miRNAs across five human tissues (cerebellum, brain, heart, kidney and testis) taken from Meunier et al. [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0154194#pone.0154194.ref016" target="_blank">16</a>].</p>

    Quantification of intensity of 56 kD band in SNA staining on SDS-PAGE of trachea and bronchus tissue samples from cynomolgus and rhesus monkeys.

    <p>Signal intensity is shown as a percentage of the signal from the 62 kD band of fetuin (positive control).</p>