9 research outputs found

    The Genomic Substrate for Adaptive Radiation: Copy Number Variation across 12 Tribes of African Cichlid Species

    © 2019 Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This work is written by US Government employees and is in the public domain in the US. The initial sequencing of five cichlid genomes revealed an accumulation of genetic variation, including extensive copy number variation in cichlid lineages, particularly those that have undergone dramatic evolutionary radiation. Gene duplication has the potential to generate substantial molecular substrate for the origin of evolutionary novelty. We use array-based comparative heterologous genomic hybridization to identify copy number variation events (CNVEs) for 168 samples representing 53 cichlid species, including the 5 species for which full genome sequence is available. We identify an average of 50-100 CNVEs per individual. For those species represented by multiple samples, we identify 150-200 total CNVEs, suggesting a substantial amount of intraspecific variation. For these species, only ∼10% of the detected CNVEs are fixed. Hierarchical clustering of species according to CNVE data recapitulates phylogenetic relationships fairly well at both the tribe and radiation level. Although CNVEs are detected on all linkage groups, they tend to cluster in "hotspots" and are likely to contain and be flanked by transposable elements. Furthermore, we show that CNVEs impact functional categories of genes with potential roles in adaptive phenotypes that could reasonably promote divergence and speciation in the cichlid clade. These data contribute to a more complete understanding of the molecular basis for adaptive natural selection, speciation, and evolutionary radiation.

    Data From: Pseudo-De Novo Assembly and Analysis of Unmapped Genome Sequence Reads in Wild Zebrafish Reveals Novel Gene Content

    Zebrafish represents the third vertebrate with an officially completed genome, yet the genome remains incomplete: additions and corrections continue with the current release, GRCz10, in which 13% of zebrafish cDNA sequences are unmapped. This disparity may result from population differences, given that the reference was generated from clonal individuals with limited genetic diversity. This is supported by the recent analysis of a single wild zebrafish, which identified over 5.2 million SNPs and 1.6 million indels in the previous genome build, zv9. Re-examination of this sequence dataset indicated that 13.8% of quality sequence reads failed to align to GRCz10. Using a novel bioinformatics de novo assembly pipeline on these unmappable reads, we identified 1,514,491 novel contigs covering ~224 Mb of genomic sequence. Among these, 1,083 contigs were found to contain potential gene coding sequence. RNA-seq data comparison confirmed that 362 contigs contained transcribed DNA sequence, suggesting that a large amount of functional genomic sequence remains unannotated in zebrafish. The bioinformatics pipeline developed in this study will bolster the zebrafish genome as a model for human disease research. Adaptation of the pipeline described here also offers a cost-efficient and effective method to identify and map novel genetic content across any genome and will ultimately aid in the completion of additional genomes for a broad range of species.

    Data From: Anchored Pseudo-De Novo Assembly of Human Genomes Identifies Extensive Sequence Variation from Unmapped Sequence Reads

    The completion of the Human Genome Reference (HGR) marked the beginning of the genomics era, yet despite its utility, its universal application is limited by the small number of individuals used in its development. This is highlighted by the presence of high-quality sequence reads that fail to map to the HGR. Sequences failing to map generally represent 2-5% of total reads, which may harbor regions that would enhance our understanding of population variation, evolution, and disease. Alternatively, complete de novo assemblies can be created, but these effectively ignore the groundwork of the HGR. In an effort to find a middle ground, we developed a bioinformatic pipeline that maps paired-end reads to the HGR as separate single reads, exports unmappable reads, de novo assembles these reads per individual, then combines assemblies into a secondary reference assembly used for comparative analysis. Using 45 diverse 1000 Genomes Project individuals, we identified 351,361 contigs covering 195.5 Mb of sequence unincorporated in GRCh38. Of these, 30,879 contigs are represented in multiple individuals, with ~40% showing high sequence complexity. Genomic coordinates were generated for 99.9%, with 52.5% exhibiting high-quality mapping scores. Comparative genomic analyses with archaic humans and primates revealed significant sequence alignments, and comparisons with model organism RefSeq gene datasets identified novel human genes. If incorporated, these sequences will expand the HGR, but more importantly, our data highlight that with this method, low-coverage (~10-20X) next-generation sequencing can still be used to identify novel unmapped sequences to explore biological functions contributing to human phenotypic variation, disease, and functionality for personal genomic medicine.

    Pseudo- De Novo


    Characterization of the OmyY1 Region on the Rainbow Trout Y Chromosome

    We characterized the male-specific region on the Y chromosome of rainbow trout, which contains both sdY (the sex-determining gene) and the male-specific genetic marker, OmyY1. Several clones containing the OmyY1 marker were screened from a BAC library from a YY clonal line and found to be part of an 800 kb BAC contig. Using fluorescence in situ hybridization (FISH), these clones were localized to the end of the short arm of the Y chromosome in rainbow trout, with an additional signal on the end of the X chromosome in many cells. We sequenced a minimum tiling path of these clones using Illumina and 454 pyrosequencing. The region is rich in transposons and rDNA, but also appears to contain several single-copy protein-coding genes. Most of these genes are also found on the X chromosome, and in several cases sex-specific SNPs in these genes were identified between the male (YY) and female (XX) homozygous clonal lines. Additional genes were identified by hybridization of the BACs to the cGRASP salmonid 4x44K oligo microarray. From BLASTn evaluations using hypothetical transcripts of OmyY1-linked candidate genes as queries against several EST databases, we conclude that at least 12 of these candidate genes are likely functional and expressed.

    Dynamic genomic architecture of mutualistic cooperation in a wild population of Mesorhizobium

    Research on mutualism seeks to explain how cooperation can be maintained when uncooperative mutants co-occur with cooperative kin. Gains and losses of the gene modules required for cooperation punctuate symbiont phylogenies and drive lifestyle transitions between cooperative symbionts and uncooperative free-living lineages over evolutionary time. Yet whether uncooperative symbionts commonly evolve from within cooperative symbiont populations or from within distantly related lineages with antagonistic or free-living lifestyles (i.e., third-party mutualism exploiters or parasites) remains controversial. We use genomic data to show that genotypes that differ in the presence or absence of large islands of symbiosis genes are common within a single wild recombining population of Mesorhizobium symbionts isolated from host tissues and are an important source of standing heritable variation in cooperation in this population. In a focal population of Mesorhizobium, uncooperative variants that lack a symbiosis island segregate at 16% frequency in nodules, and genome size and symbiosis gene number are positively correlated with cooperation. This finding contrasts with the genomic architecture of variation in cooperation in other symbiont populations isolated from host tissues, in which the islands of genes underlying cooperation are ubiquitous and variation in cooperation is primarily driven by allelic substitution and individual gene gain and loss events. Our study demonstrates that uncooperative mutants within mutualist populations can comprise a significant component of genetic variation in nature, providing biological rationale for models and experiments that seek to explain the maintenance of mutualism in the face of non-cooperators.

    Drosophila Muller F Elements Maintain a Distinct Set of Genomic Properties Over 40 Million Years of Evolution

    The Muller F element (4.2 Mb, ~80 protein-coding genes) is an unusual autosome of Drosophila melanogaster; it is mostly heterochromatic with a low recombination rate. To investigate how these properties impact the evolution of repeats and genes, we manually improved the sequence and annotated the genes on the D. erecta, D. mojavensis, and D. grimshawi F elements and euchromatic domains from the Muller D element. We find that F elements have greater transposon density (25–50%) than euchromatic reference regions (3–11%). Among the F elements, D. grimshawi has the lowest transposon density (particularly DINE-1: 2% vs. 11–27%). F element genes have larger coding spans, more coding exons, larger introns, and lower codon bias. Comparison of the Effective Number of Codons with the Codon Adaptation Index shows that, in contrast to the other species, codon bias in D. grimshawi F element genes can be attributed primarily to selection instead of mutational biases, suggesting that density and types of transposons affect the degree of local heterochromatin formation. F element genes have lower estimated DNA melting temperatures than D element genes, potentially facilitating transcription through heterochromatin. Most F element genes (~90%) have remained on that element, but the F element has smaller syntenic blocks than genome averages (3.4–3.6 vs. 8.4–8.8 genes per block), indicating greater rates of inversion despite lower rates of recombination. Overall, the F element has maintained characteristics that are distinct from other autosomes in the Drosophila lineage, illuminating the constraints imposed by a heterochromatic milieu.