15 research outputs found

    Striking Similarities in the Genomic Distribution of Tandemly Arrayed Genes in Arabidopsis and Rice

    In Arabidopsis, tandemly arrayed genes (TAGs) comprise >10% of the genes in the genome. These duplicated genes represent a rich template for genetic innovation, but little is known of the evolutionary forces governing their generation and maintenance. Here we compare the organization and evolution of TAGs between Arabidopsis and rice, two plant genomes that diverged ~150 million years ago. TAGs from the two genomes are similar in a number of respects, including the proportion of genes that are tandemly arrayed, the number of genes within an array, the number of tandem arrays, and the dearth of TAGs relative to single-copy genes in centromeric regions. Analysis of recombination rates along rice chromosomes confirms a positive correlation between the occurrence of TAGs and recombination rate, as found in Arabidopsis. TAGs are also biased functionally relative to duplicated, nontandemly arrayed genes. In both genomes, TAGs are enriched for genes that encode membrane proteins and function in "abiotic and biotic stress" but are underrepresented for genes involved in transcription and DNA or RNA binding functions. We speculate that these observations reflect an evolutionary trend in which successful tandem duplication involves genes either at the end of biochemical pathways or in flexible steps in a pathway, for which fluctuation in copy number is unlikely to affect downstream genes. Despite differences in the age distribution of tandem arrays, the striking similarities between rice and Arabidopsis indicate similar mechanisms of TAG generation and maintenance.

    MNHN-Tree-Tools: A toolbox for tree inference using multi-scale clustering of a set of sequences

    Genomic sequences are widely used to infer the evolutionary history of a given group of individuals. Many methods have been developed for sequence clustering and tree building. In the early days of genome sequencing, these were often limited to hundreds of sequences, but with the surge of high-throughput sequencing it is now common to have millions of sampled sequences at hand. We introduce MNHN-Tree-Tools, a high-performance set of algorithms that builds multi-scale, nested clusters of sequences found in a FASTA file. MNHN-Tree-Tools does not rely on sequence alignment and can thus be used on large datasets to infer a sequence tree. Herein we outline two applications: a classification of human alpha-satellite repeats and a derivation of a tree of life from 16S/18S rDNA sequences.

    Distribution of the Size of TAGs for the H/0 Dataset

    (A) A. thaliana. (B) O. sativa.

    TAG Gene Density Plotted against Recombination Rate in O. sativa for the L/0 Dataset and the H/0 Dataset


    Frequency of Genes in the GO MF Categories in Arabidopsis and Rice, Based on the H/0 Dataset

    Only the H/0 datasets are shown. The asterisks above the bars indicate the significance of the χ² tests, under the null hypothesis that TAGs and duplicated non-TAG genes have the same proportion. *, p < 0.05; **, p < 0.01; ***, p < 0.001. Bonferroni-corrected for 92 tests.
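
The caption describes χ² tests with a Bonferroni correction for 92 tests and a three-level asterisk code. A minimal sketch of how such a test could be computed for one category is given below. It assumes a 2x2 table (TAG vs. duplicated non-TAG genes, in vs. out of a category) and a Pearson χ² statistic without continuity correction. The listing does not specify either choice, and the counts in the usage lines are invented purely for illustration.

```python
import math

N_TESTS = 92  # number of tests stated in the caption


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 df) for the table [[a, b], [c, d]].

    Assumption: no continuity correction is applied.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


def bonferroni(p, n_tests=N_TESTS):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


def stars(p_adj):
    """Map an adjusted p-value to the asterisk code used in the caption."""
    if p_adj < 0.001:
        return "***"
    if p_adj < 0.01:
        return "**"
    if p_adj < 0.05:
        return "*"
    return ""


# Hypothetical counts: 30 TAGs and 10 non-TAG duplicates inside a category,
# 10 TAGs and 30 non-TAG duplicates outside it.
stat, p = chi2_2x2(30, 10, 10, 30)
label = stars(bonferroni(p))
```

Multiplying each p-value by 92 is equivalent to comparing the raw p-value against 0.05/92, 0.01/92 and 0.001/92.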

    Recombination Rate Estimates and Density of TAGs (Number of TAGs/Total Number of Genes) along O. sativa Chromosome

    Recombination estimates are represented by the black lines. Density estimates are based on the H/0 dataset (blue lines) and H/10 dataset (pink lines). Centromere positions are marked in orange.
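
The figure title defines TAG density as the number of TAGs divided by the total number of genes. A minimal sketch of that ratio computed in fixed windows along a chromosome follows. The window size, the `(position, is_tag)` input layout and the function name `tag_density` are assumptions made for illustration; the listing does not say how the authors binned genes.

```python
def tag_density(genes, window_bp):
    """Return {window_index: TAG count / total gene count} for one chromosome.

    genes: iterable of (position_bp, is_tag) pairs (hypothetical layout).
    window_bp: window width in base pairs (an assumed parameter).
    """
    totals = {}
    tags = {}
    for position, is_tag in genes:
        w = position // window_bp
        totals[w] = totals.get(w, 0) + 1
        tags[w] = tags.get(w, 0) + (1 if is_tag else 0)
    # Only windows that contain at least one gene appear in the result.
    return {w: tags[w] / totals[w] for w in sorted(totals)}


# Toy input: four genes, three of them flagged as TAGs.
example = [(100, True), (200, False), (1500, True), (1600, True)]
densities = tag_density(example, window_bp=1000)
```

Windows without genes are omitted rather than reported as zero, so that a gene-poor region is not mistaken for a TAG-poor one.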

    Distribution of Ks Values between TAG Pairs and between Duplicated Non-TAG Gene Pairs

    Histograms of the distribution of Ks values for TAG pairs (dots) and duplicated non-TAG gene pairs (bars) in Arabidopsis and in rice for the H/0 dataset are shown in panels (A) and (B), respectively. Results for the L/0 dataset are provided in panels (C) and (D).