
    Selection on CTCF motif sites.

    <p>(A) Proportion of binding sites with conserved motifs. The bar plots show the proportions of <i>D. melanogaster</i>–specific (pink) and shared (green) binding sites that have conserved motifs between each species pair. A binding site is defined as having conserved motifs if there is at least one species-specific motif identified in the corresponding orthologous sequences. The <i>p</i> value cutoff for FIMO motif searching here is 0.005. For every species pair, the proportion of conserved (here, shared) binding sites with conserved motifs is significantly higher than that of diverged (here, <i>D. melanogaster</i>–specific) binding sites. Significance levels: * <i>p</i><0.05; ** <i>p</i><0.01, two-sided Fisher's exact test. (B) Mean Tajima's D values for CTCF-motif sites. Tajima's D values were calculated from polymorphism data for 37 North American <i>D. melanogaster</i> strains for various groups of CTCF-motif sites, for the synonymous and nonsynonymous sites of the nearest genes, and for randomly sampled 9 bp sites in 3′UTRs, 5′UTRs, and intergenic regions. The center of each filled circle depicts the mean Tajima's D value for each group, with the error bar indicating 2 standard deviations. (C and D) Estimated shared proportion of adaptation (alpha), using as the neutral reference either the synonymous sites of the nearest genes (C) or a set of small introns (D). <i>D. yakuba</i> sequences were used as the out-group for estimating alpha values for different groups of CTCF-motif sites using an extension of the MK test framework. The filled colored circles depict the shared alpha value estimated within each group, with the error bar indicating the 95% confidence interval. Label abbreviations: Syn, synonymous sites of the nearest genes of CTCF binding sites; Nonsyn, nonsynonymous sites of the nearest genes of CTCF binding sites; TWOB, CTCF-motif sites associated with two-way orthologous binding events between <i>D. melanogaster</i> and the out-group; conserved TWOB, CTCF-motif sites associated with conserved two-way orthologous binding events; diverged TWOB, CTCF-motif sites associated with <i>D. melanogaster</i>–specific two-way orthologous binding events; FWOB, sites associated with four-way orthologous binding events; Young FWOB, sites associated with FWOBs whose age is estimated to be <2.5 Myr; Old FWOB, sites associated with FWOBs whose age is estimated to be >6 Myr.</p>
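Panel B reports Tajima's D for groups of sites. As a reference point, here is a minimal Python sketch of the standard Tajima's D statistic for one set of aligned haplotypes. It assumes gap-free, equal-length sequences, and the function name `tajimas_d` is hypothetical; it does not reproduce how values were computed or averaged across the 9 bp motif-site groups in the figure.

```python
import math


def tajimas_d(seqs):
    """Tajima's D for aligned, gap-free, equal-length sequences (sketch only)."""
    n = len(seqs)
    if n < 4:
        # The variance constants c1 and c2 are both zero for n = 2 and n = 3
        raise ValueError("need at least four sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) alignment columns
    seg_sites = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if seg_sites == 0:
        raise ValueError("D is undefined without segregating sites")

    # pi: mean number of pairwise differences over all n(n-1)/2 pairs
    n_pairs = n * (n - 1) / 2
    total_diffs = sum(
        sum(a != b for a, b in zip(seqs[i], seqs[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )
    pi = total_diffs / n_pairs

    # Standard normalizing constants of Tajima's D
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

A negative D indicates more rare variants than expected under neutrality (for example, an alignment where every polymorphism is a singleton), while intermediate-frequency variants push D above zero.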

    Functional consequences of CTCF binding evolution.

    <p>(A–B) CTCF binding evolution is associated with gene expression evolution. The bar plots show the proportion of genes with diverged expression in the (A) <i>D. melanogaster/D. simulans</i> and (B) <i>D. melanogaster/D. yakuba</i> comparisons, for genes associated with different groups of CTCF binding sites: Genome-wide (black), Conserved TWOB (pink), Diverged TWOB (green), Old FWOB (orange), and Young FWOB (light purple). The table below each bar plot shows the numbers of genes with diverged and conserved expression in the corresponding comparison that are associated with the corresponding CTCF binding sites. For each group of CTCF binding sites, the associated genes are the union of the nearest genes of all binding sites in the group. The evolutionary status of gene expression (conserved or diverged) was determined from triplicate WPP mRNA-seq data using a generalized linear regression framework. Label abbreviations are the same as described in <a href="http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1001420#pbio-1001420-g003" target="_blank">Figure 3</a>. Significance levels: * <i>p</i><0.05; ** <i>p</i><0.01; one-sided Fisher's exact test. (C–E) CTCF binding evolution is correlated with new gene origination. The four colored wiggle tracks in each plot show the ChIP CDP enrichment scores of the four species (<i>D. melanogaster</i>, blue; <i>D. simulans</i>, green; <i>D. yakuba</i>, orange; <i>D. pseudoobscura</i>, purple) across different genomic regions. CTCF binding peaks are observed in <i>D. melanogaster</i>, <i>D. simulans</i>, and <i>D. yakuba</i> in the genomic regions flanking the newly evolved genes <i>TFII-A-S2</i> (C) and <i>CheB93a</i> (D). Both genes originated after the split of the <i>melanogaster</i> group from the <i>pseudoobscura</i> group. A CTCF binding peak is observed only in the <i>D. melanogaster</i> genome, in the regions flanking the <i>D. melanogaster</i> lineage-specific gene <i>sphinx</i> (E).</p>
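Panels A and B use one-sided Fisher's exact tests. A minimal sketch of the upper-tail test for a 2×2 table, computed directly from the hypergeometric distribution with the standard library, follows. The caption does not spell out how each table was built, so the layout below (genes near one group of binding sites versus all other genes) and the name `fisher_one_sided` are assumptions for illustration.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """Upper-tail Fisher's exact test p value for the 2x2 table

                 diverged  conserved
        group        a          b
        other        c          d

    Hypothetical layout: 'group' = genes near one class of binding sites,
    'other' = remaining genes. Returns P(X >= a), where X is hypergeometric
    with all row and column totals held fixed.
    """
    if min(a, b, c, d) < 0:
        raise ValueError("counts must be non-negative")
    row1 = a + b        # genes in the group of interest
    col1 = a + c        # genes with diverged expression
    n = a + b + c + d   # all genes in the table
    denom = comb(n, row1)
    upper = min(row1, col1)
    # Sum the probabilities of tables at least as extreme as the observed one
    return sum(
        comb(col1, x) * comb(n - col1, row1 - x) for x in range(a, upper + 1)
    ) / denom
```

`math.comb` returns 0 when asked for more items than are available, so impossible tables contribute nothing to the sum.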

    Diverged CTCF binding between <i>Drosophila</i> species.

    <p>(A) Evolutionary dynamics of CTCF binding profiles at the <i>Bithorax complex</i> region. The four colored wiggle tracks show the ChIP CDP enrichment scores estimated by our quantitative analysis pipeline for the four species: <i>D. melanogaster</i> (blue), <i>D. simulans</i> (green), <i>D. yakuba</i> (orange), and <i>D. pseudoobscura</i> (purple). The four tracks are on the same scale, with the height of each curve at each coordinate denoting the enrichment score. In the top panel, blue arrows point to examples of binding events conserved across the four species, and red arrows point to examples of binding events diverged between species. The fifth track shows the boundaries of previously identified insulator elements (sky blue). The last track shows the genes in the genomic region. (B) Numbers of conserved and diverged binding events. From left to right, the three bar plots show the numbers of <i>D. melanogaster</i>–specific (pink), shared (blue), and non–<i>D. melanogaster</i> species–specific (D.xxx, yellow) binding events for each species pair (<i>D. melanogaster/D. simulans</i>, <i>D. melanogaster/D. yakuba</i>, and <i>D. melanogaster/D. pseudoobscura</i>), for all identified binding events (All, left), Two-Way Orthologous Binding events (TWOB, middle), and Four-Way Orthologous Binding events (FWOB, right). A TWOB is defined as a binding event identified in a region where the sequence identity between the two compared species is >50%. An FWOB is defined as a binding event identified in a region where the sequence identity across all four species is >50%. (C) Linear increase of pairwise binding divergence with species divergence time. Binding divergence is calculated as the percentage of <i>D. melanogaster</i> binding events not shared with the non–<i>D. melanogaster</i> species in each pairwise comparison. Differently shaped and colored points represent different groups of binding events, as indicated in the legend.
The red dashed line is the fitted linear regression of TWOB binding divergence on divergence time. (D) Evolutionary groups of CTCF binding events. Top panel, representative dynamic binding profiles in the four <i>Drosophila</i> species (<i>D. melanogaster</i>, blue; <i>D. simulans</i>, green; <i>D. yakuba</i>, orange; <i>D. pseudoobscura</i>, purple) illustrating examples of the 15 mutually exclusive evolutionary groups of binding status. The height of each binding curve denotes the ChIP CDP enrichment score estimated by our analysis pipeline. For each evolutionary group, the <i>y</i>-axes of the four binding curves are on the same scale. The first row of the lower table shows the Boolean conservation score corresponding to the binding profiles, where 0 indicates the absence and 1 the presence of a binding event. The second and third rows of the lower table give the numbers of all binding events (second row) and FWOB events (third row) falling into each evolutionary group. The last row of the lower table shows the evolutionary age inferred by parsimony for the different groups of <i>D. melanogaster</i> binding events. *No instance of the evolutionary group with Boolean conservation score 0,1,1,1 was identified in our analyses, so the representative binding profile shown for this category was generated by artificially modifying another binding profile.</p>
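Panel C defines binding divergence as the percentage of D. melanogaster binding events not shared with the other species and fits a line against divergence time; panel D sorts events into 15 presence/absence groups, which is every non-empty pattern over four species (2^4 - 1 = 15). The sketch below uses hypothetical helper names. The caption does not say whether the published regression included an intercept, so the ordinary least-squares fit with an intercept is an assumption.

```python
from itertools import product


def binding_divergence(mel_specific, shared):
    """Percent of D. melanogaster binding events not shared with the other species."""
    total = mel_specific + shared
    if total == 0:
        raise ValueError("no D. melanogaster binding events")
    return 100.0 * mel_specific / total


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept.

    Assumption: an intercept is fitted; the caption does not state this."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def evolutionary_groups():
    """All non-empty binding patterns for
    (D. melanogaster, D. simulans, D. yakuba, D. pseudoobscura)."""
    return [pattern for pattern in product((0, 1), repeat=4) if any(pattern)]
```

In use, one divergence value per species pair (computed from the panel B counts) would be paired with that pair's divergence time and passed to `fit_line`.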

    Transcending ambivalence: Overcoming the ambiguity of theory and practices

    Ambivalence is a deeply ambiguous concept. Taken together, the contributions to the present book exemplify that verdict, as they are situated in the intellectual space among theory, phenomena, research practices, and basic assumptions about the world.

    Strand-specific RNA-Seq in five rhesus tissues reveals clear transcript structure for <i>de novo</i> genes.

    <p>(A) An example of the <i>de novo</i> gene <i>ENST00000315302</i>, which partially overlaps the pre-existing gene <i>ODZ3</i> transcribed from the opposite DNA strand. The rhesus macaque ortholog of <i>ENST00000315302</i> was aligned according to the UCSC genome-wide multiple alignments. Junction reads generated by strand-specific RNA-Seq assays are highlighted as bold black lines, with the fragments of each junction read that cross a splicing junction connected by thinner lines. The mapped reads strongly support transcription of the target <i>de novo</i> gene from the reverse strand, as most reads appear in the track for ‘reads transcribed from the minus-strand’. The regions of all four splicing junctions are highlighted in dotted boxes and expanded in (B): three in <i>ENST00000315302</i>, transcribed from the minus strand, and one from the plus strand. All of these splicing junctions are well supported by RNA-Seq reads mapped to the corresponding DNA strand. Vertical dotted lines in brown or blue mark the exon boundaries of transcripts on the minus or plus strand, respectively. (C) Example of a candidate <i>de novo</i> gene discarded during manual curation because the RNA-Seq data in rhesus macaque were inconsistent with the splicing pattern predicted from human gene models. The common disabler, marked with a red star, is actually spliced out in rhesus macaque, as indicated by the junction reads. The scale bar indicates gene size.</p>
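Panel B relies on junction reads, that is, reads whose alignments span an intron. In SAM/BAM alignments such reads carry an `N` (skipped reference region) operation in their CIGAR string. Below is a minimal sketch of extracting and tallying junctions per strand. It assumes each read's transcript strand has already been resolved (how read orientation maps to strand depends on the strand-specific library protocol), and the helper names are hypothetical.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM specification)
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance along the reference sequence
CONSUMES_REFERENCE = set("MDN=X")


def junctions_from_cigar(pos, cigar):
    """Return (intron_start, intron_end) pairs, 1-based and inclusive, for
    every N (skipped region) in an alignment starting at SAM POS `pos`."""
    ref = pos
    junctions = []
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length - 1))
        if op in CONSUMES_REFERENCE:
            ref += length
    return junctions


def count_junctions(reads):
    """Tally junction-supporting reads per (strand, start, end).

    Assumption: `reads` yields (strand, pos, cigar) tuples whose strand has
    already been converted to the transcript strand for the protocol used."""
    counts = Counter()
    for strand, pos, cigar in reads:
        for start, end in junctions_from_cigar(pos, cigar):
            counts[(strand, start, end)] += 1
    return counts
```

Soft clips (`S`) and insertions (`I`) do not move the reference coordinate, so a clipped read still reports the same junction as an unclipped one.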

    Non-coding orthologs of human <i>de novo</i> protein-coding genes in rhesus macaque and chimpanzee show tissue expression profiles similar to those in human.

    <p>(A) Hierarchical clustering chart of tissue expression proportions. For each gene in each species, tissue expression proportions were calculated by normalizing RPKM scores by the total expression level of the gene in that species. The proportions were then clustered by similarity using complete-linkage hierarchical clustering. For each gene, the cross-tissue correlation coefficients between human and chimpanzee (H–C), chimpanzee and rhesus macaque (C–R), and human and rhesus macaque (H–R) are shown. (B) Correlation coefficients of tissue expression profiles between human and rhesus macaque. Correlation coefficients for <i>de novo</i> genes (brown histograms) are shown against a background distribution from 10,000 <i>Monte Carlo</i> simulations that ignore ortholog relationships (blue histograms; mean scores are shown). (C) Spearman correlation coefficients computed separately for each pair of tissues, showing the extent of tissue-specific differences in <i>de novo</i> gene expression. Dotted lines highlight comparisons between corresponding tissues in different species. Grey boxes: missing data. <sup>*</sup>Correlation coefficient not available because of low tissue expression in one or both species. <sup>#</sup>Gene reported in a previous study as a human-specific <i>de novo</i> protein-coding gene.</p>
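Panel A normalizes each gene's RPKM values to proportions, and panels B and C compare correlation coefficients against a Monte Carlo background that ignores ortholog relationships. The sketch below uses Spearman correlation (as named in panel C) and assumes that `human[k]` and `macaque[k]` hold the per-tissue profiles of one ortholog pair. It builds the background by correlating randomly chosen non-orthologous gene pairs; the helper names and this resampling scheme are assumptions, not the published procedure.

```python
import random


def tissue_proportions(rpkm):
    """Divide a gene's per-tissue RPKM values by its total expression."""
    total = sum(rpkm)
    if total == 0:
        raise ValueError("gene not expressed in any tissue")
    return [value / total for value in rpkm]


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant (e.g. unexpressed) profile
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman coefficient: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def monte_carlo_background(human, macaque, n_iter=10_000, seed=0):
    """Correlations for randomly drawn non-orthologous gene pairs.

    Assumption: human[k] and macaque[k] are the profiles of ortholog pair k."""
    if len(human) != len(macaque) or len(human) < 2:
        raise ValueError("need at least two ortholog pairs")
    rng = random.Random(seed)
    background = []
    for _ in range(n_iter):
        i, j = rng.sample(range(len(human)), 2)  # distinct indices: never a true ortholog pair
        background.append(spearman(human[i], macaque[j]))
    return background
```

Dividing a gene's values by its total does not change the ranking of its tissues, so Spearman coefficients are identical for raw RPKM and for proportions; the normalization matters for the clustering in panel A.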

    Orthologs of human <i>de novo</i> protein-coding genes encode structure-matched non-coding RNAs in rhesus macaque or chimpanzee.

    <p>(A) Summed RPKM scores (log<sub>2</sub> transformed) of <i>de novo</i> genes in seven tissues from human and rhesus macaque. The human genes were ordered by decreasing expression level as a reference, and the rhesus genes were aligned accordingly. (B) For each <i>de novo</i> gene in Classes I and II, the base-level densities of RNA-Seq reads across the transcript (red) and across the upstream/downstream regions (grey; 50% of the transcript length) are shown. The raw density scores computed from RNA-Seq read coverage were normalized by the total reads across the region. (C) Sequence motifs near the donor and acceptor sites, summarized across all splicing junctions in human <i>de novo</i> genes. (D) Venn diagram showing the numbers of human splicing junctions also detected in chimpanzee or rhesus macaque. Pie charts further illustrate the detailed status of human splicing junctions in chimpanzee and rhesus macaque.</p>
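Panel B normalizes base-level read density over each transcript and its flanking regions by the total reads in that region. A minimal sketch follows, assuming a per-base coverage list, 0-based half-open transcript coordinates, and flanks that are each half the transcript length (the caption's "50%" could also be read as the combined flank length). `normalized_density` is a hypothetical name.

```python
def normalized_density(coverage, tx_start, tx_end, strand="+"):
    """Per-base read density over a transcript and its flanks, normalized by
    the total coverage in the whole region (flanks included).

    coverage: per-base read depth along the contig (0-based list index).
    tx_start, tx_end: 0-based, half-open transcript coordinates.
    Returns (upstream, transcript, downstream) lists in 5'->3' transcript order.
    """
    flank = (tx_end - tx_start) // 2  # assumption: each flank is 50% of transcript length
    lo = max(0, tx_start - flank)
    hi = min(len(coverage), tx_end + flank)
    total = sum(coverage[lo:hi])
    if total == 0:
        raise ValueError("no reads in region")
    density = [depth / total for depth in coverage[lo:hi]]
    left = density[: tx_start - lo]
    body = density[tx_start - lo : tx_end - lo]
    right = density[tx_end - lo :]
    if strand == "-":
        # For minus-strand genes, the 5' (upstream) side lies at higher coordinates
        return right[::-1], body[::-1], left[::-1]
    return left, body, right
```

Because every value is divided by the same regional total, the three returned lists sum to 1, which makes profiles of genes with very different expression levels directly comparable.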

    Genome-wide identification of hominoid-specific <i>de novo</i> protein-coding genes.

    <p>(A) Hominoid-specific <i>de novo</i> protein-coding genes were identified on the basis of gene locus and ORF age assignments. Regions within the dotted red lines indicate the steps repeated for each out-group species. We further filtered this list using stringent inclusion criteria, yielding a smaller, high-confidence list of 24 <i>de novo</i> genes. (B) Distribution of protein length for the 24 <i>de novo</i> genes, compared with the human genome as background. (C) Distribution of summed RPKM scores of the 24 <i>de novo</i> genes in seven human tissues, compared with the human genome as background. (D) Pie chart showing the distribution of the 24 <i>de novo</i> protein-coding genes by reuse of pre-existing transcriptional context. Gene numbers in each category are marked. None: no evidence of reuse of transcriptional context; bi: located downstream of a bidirectional promoter; +: overlapping pre-existing genes on the same strand; −: overlapping pre-existing genes on the opposite strand. (E) Venn diagrams showing the contribution of <i>Alu</i> sequences to exons and splicing junctions in <i>de novo</i> protein-coding genes.</p>
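Panel D assigns each gene to one of four transcriptional-context categories. The sketch below shows one way to make that assignment from interval overlap and strand. The 1 kb head-to-head distance used for "bi", the order in which categories are checked, and the `Gene` type are all hypothetical choices, not the authors' criteria; ASCII "-" stands in for the caption's "−" label.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    start: int   # 0-based, half-open genomic interval
    end: int
    strand: str  # "+" or "-"

    @property
    def tss(self):
        """Transcription start site: the 5' end on the gene's own strand."""
        return self.start if self.strand == "+" else self.end - 1


def overlaps(a, b):
    return a.start < b.end and b.start < a.end


def classify_context(new, preexisting, max_bidir_distance=1000):
    """Return '+', '-', 'bi', or 'none' for a new gene.

    Hypothetical rules: same-strand overlap wins, then opposite-strand
    overlap, then a head-to-head opposite-strand TSS within the distance."""
    if any(overlaps(new, g) and g.strand == new.strand for g in preexisting):
        return "+"
    if any(overlaps(new, g) and g.strand != new.strand for g in preexisting):
        return "-"
    for g in preexisting:
        if g.strand == new.strand:
            continue
        # Head-to-head arrangement: the two TSSs point away from each other
        if new.strand == "+" and 0 <= new.tss - g.tss <= max_bidir_distance:
            return "bi"
        if new.strand == "-" and 0 <= g.tss - new.tss <= max_bidir_distance:
            return "bi"
    return "none"
```

Checking the head-to-head orientation, not just proximity, keeps tail-to-tail neighbours (which share a 3′ end rather than a promoter region) out of the "bi" category.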

    RNA Editome in Rhesus Macaque Shaped by Purifying Selection

    <div><p>Next-generation sequencing technology has considerably broadened our understanding of the RNA editing process; however, several issues regarding this regulatory step remain unresolved: the strategies for accurately delineating the editome, the mechanism by which its profile is maintained, and its evolutionary and functional relevance. Here we report an accurate and quantitative profile of the RNA editome for rhesus macaque, a close relative of human. By combining genome and transcriptome sequencing of multiple tissues from the same animal, we identified 31,250 editing sites, of which 99.8% are A-to-G transitions. We verified 96.6% of editing sites in coding regions and 97.5% of randomly selected sites in non-coding regions, as well as the corresponding levels of editing, by multiple independent means, demonstrating the feasibility of our experimental paradigm. Several lines of evidence support the notion that adenosine deamination is associated with the macaque editome: A-to-G editing sites were flanked by sequences with the attributes of <i>ADAR</i> substrates, and both the sequence context and the expression profile of <i>ADARs</i> are relevant factors in determining the quantitative variance of RNA editing across different sites and tissue types. In support of the functional relevance of some of these editing sites, a substitution valley of decreased divergence was detected around the editing sites, suggesting evolutionary constraint in maintaining some of these editing substrates with their double-stranded structure. These findings thus complement the “continuous probing” model, which postulates tinkering-based origination of a small proportion of functional editing sites. In conclusion, the macaque editome reported here highlights RNA editing as a widespread mode of functional regulation in primate evolution and provides an informative framework for further understanding RNA editing in human.</p></div>
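The abstract reports the share of A-to-G sites and quantitative editing levels. Here is a minimal sketch under two common conventions, both assumptions here: the editing level at a site is the fraction of reads carrying the edited base (for A-to-G, G / (A + G)), and a site on a minus-strand transcript is complemented so that a genomic T-to-C mismatch counts as A-to-G in the RNA. The helper names are hypothetical.

```python
from collections import Counter

# Complement table for reporting minus-strand sites on the transcript strand
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def editing_level(unedited_reads, edited_reads):
    """Assumed convention: fraction of reads with the edited base, e.g. G / (A + G)."""
    total = unedited_reads + edited_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return edited_reads / total


def substitution_type(genomic_ref, genomic_alt, transcript_strand):
    """Express a genomic mismatch on the transcript strand, e.g. 'A-to-G'."""
    if transcript_strand == "-":
        genomic_ref = genomic_ref.translate(COMPLEMENT)
        genomic_alt = genomic_alt.translate(COMPLEMENT)
    return f"{genomic_ref}-to-{genomic_alt}"


def fraction_a_to_g(sites):
    """Share of sites that are A-to-G; `sites` yields (ref, alt, strand) tuples."""
    counts = Counter(substitution_type(ref, alt, strand) for ref, alt, strand in sites)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no sites given")
    return counts["A-to-G"] / total
```

Without the strand correction, ADAR-mediated edits in minus-strand transcripts would be tallied as T-to-C and the A-to-G share would be underestimated.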