169 research outputs found

    Microsatellite discovery in an insular amphibian (Grandisonia alternans) with comments on cross-species utility and the accuracy of locus identification from unassembled Illumina data

    The Seychelles archipelago is unique among isolated oceanic islands because it features an endemic radiation of caecilian amphibians (Gymnophiona). In order to develop population genetics resources for this system, we identified microsatellite loci using unassembled Illumina MiSeq data generated from a genomic library of Grandisonia alternans, a species that occurs on multiple islands in the archipelago. Applying a recently described method (PALFINDER) we identified 8001 microsatellite loci that were potentially informative for population genetics analyses. Of these markers, we screened 60 loci using five individuals, directly sequenced several amplicons to confirm their identity, and then used eight loci to score allele sizes in 64 G. alternans individuals originating from five islands. A number of these individuals were sampled using non-lethal methods, demonstrating the efficacy of non-destructive molecular sampling in amphibian research. Although two loci satisfied our criteria as diploid, neutrally evolving loci with the statistical power to detect population structure, our success in identifying reliable loci was very low. Additionally, we discovered some issues with primer redundancy and differences between Illumina and Sanger sequences that suggest some Illumina-inferred loci are invalid. We investigated cross-species utility for eight loci and found most could be successfully amplified, sequenced and aligned across other species and genera of caecilians from the Seychelles. Thus, our study in part supported the validity of using PALFINDER with unassembled reads for microsatellite discovery within and across species, but importantly identified major limitations to applying this approach to small datasets (ca. 1 million reads) and loci with small tandem repeat sizes
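The idea of scanning unassembled reads for candidate microsatellites can be illustrated with a minimal sketch. This is not PALFINDER's algorithm: the function names, the regular-expression approach, and the minimum repeat counts in `MIN_REPEATS` are assumptions made for illustration only. It reports perfect tandem repeats of 2 to 6 bp motifs in a single read.

```python
import re

# Hypothetical minimum repeat counts per motif length (bp); not PALFINDER defaults.
MIN_REPEATS = {2: 6, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ACAC' or 'AA')."""
    return motif not in (motif + motif)[1:-1]


def find_microsatellites(read, min_repeats=MIN_REPEATS):
    """Return (start, motif, repeat_count) for perfect tandem repeats in one read."""
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-mer followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(read):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'ACAC', already reported as 'AC'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return hits


read = "TTG" + "AC" * 8 + "GGT" + "GATA" * 4 + "CC"
print(find_microsatellites(read))
```

A real pipeline would also need flanking sequence for primer design and a check for primer redundancy across reads, which the abstract identifies as a source of invalid loci.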

    Repetitive Elements May Comprise Over Two-Thirds of the Human Genome

    Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo “clouds”). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect different sized fragments of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Although short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this fact. We further developed “element-specific” P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using it we identified ∼100 Mb of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed
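The notion of clustering high-abundance oligonucleotides that are close in sequence space can be sketched roughly as follows. This is a deliberately simplified illustration, not the published P-clouds method: the function names, the two cutoffs, and the restriction to single-substitution neighbours are all assumptions.

```python
from collections import Counter


def count_oligos(sequence, k):
    """Count every overlapping k-mer in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def hamming_neighbors(oligo, alphabet="ACGT"):
    """Yield every oligo that differs from the input at exactly one position."""
    for i, base in enumerate(oligo):
        for alt in alphabet:
            if alt != base:
                yield oligo[:i] + alt + oligo[i + 1:]


def build_cloud(counts, core_cutoff, outer_cutoff):
    """Seed with oligos at or above core_cutoff (hypothetical threshold), then
    add one-substitution neighbours whose count reaches outer_cutoff."""
    core = {oligo for oligo, c in counts.items() if c >= core_cutoff}
    cloud = set(core)
    for oligo in core:
        for neighbor in hamming_neighbors(oligo):
            if counts.get(neighbor, 0) >= outer_cutoff:
                cloud.add(neighbor)
    return cloud


counts = Counter({"ACGT": 10, "ACGA": 3, "TTTT": 3, "ACGC": 1})
print(sorted(build_cloud(counts, core_cutoff=5, outer_cutoff=2)))
```

In this toy input, `TTTT` is moderately abundant but is excluded because it is not a sequence neighbour of any core oligo, which is the intuition behind grouping related oligos rather than thresholding counts alone.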

    Genome-wide signatures of convergent evolution in echolocating mammals

    Evolution is typically thought to proceed through divergence of genes, proteins, and ultimately phenotypes(1-3). However, similar traits might also evolve convergently in unrelated taxa due to similar selection pressures(4,5). Adaptive phenotypic convergence is widespread in nature, and recent results from a handful of genes have suggested that this phenomenon is powerful enough to also drive recurrent evolution at the sequence level(6-9). Where homoplasious substitutions do occur these have long been considered the result of neutral processes. However, recent studies have demonstrated that adaptive convergent sequence evolution can be detected in vertebrates using statistical methods that model parallel evolution(9,10) although the extent to which sequence convergence between genera occurs across genomes is unknown. Here we analyse genomic sequence data in mammals that have independently evolved echolocation and show for the first time that convergence is not a rare process restricted to a handful of loci but is instead widespread, continuously distributed and commonly driven by natural selection acting on a small number of sites per locus. Systematic analyses of convergent sequence evolution in 805,053 amino acids within 2,326 orthologous coding gene sequences compared across 22 mammals (including four new bat genomes) revealed signatures consistent with convergence in nearly 200 loci. Strong and significant support for convergence among bats and the dolphin was seen in numerous genes linked to hearing or deafness, consistent with an involvement in echolocation. Surprisingly we also found convergence in many genes linked to vision: the convergent signal of many sensory genes was robustly correlated with the strength of natural selection. This first attempt to detect genome-wide convergent sequence evolution across divergent taxa reveals the phenomenon to be much more pervasive than previously recognised

    Pleistocene Climate, Phylogeny, and Climate Envelope Models: An Integrative Approach to Better Understand Species' Response to Climate Change

    The Intergovernmental Panel on Climate Change projects that mean annual temperature will increase by 1.1°C to 6.4°C over the next 90 years. In context, a change in climate of 6°C is approximately the difference between the mean annual temperature of the Last Glacial Maximum (LGM) and our current warm interglacial. Species have been responding to changing climate throughout Earth's history and their previous biological responses can inform our expectations for future climate change. Here we synthesize geological evidence in the form of stable oxygen isotopes, general circulation paleoclimate models, species' evolutionary relatedness, and species' geographic distributions. We use the stable oxygen isotope record to develop a series of temporally high-resolution paleoclimate reconstructions spanning the Middle Pleistocene to Recent, which we use to map ancestral climatic envelope reconstructions for North American rattlesnakes. A simple linear interpolation between current climate and a general circulation paleoclimate model of the LGM using stable oxygen isotope ratios provides good estimates of paleoclimate at other time periods. We use geologically informed rates of change derived from these reconstructions to predict magnitudes and rates of change in species' suitable habitat over the next century. Our approach to modeling the past suitable habitat of species is general and can be adopted by others. We use multiple lines of evidence of past climate (isotopes and climate models), phylogenetic topology (to correct the models for long-term changes in the suitable habitat of a species), and the fossil record, however sparse, to cross-check the models. Our models indicate the annual rate of displacement in a clade of rattlesnakes over the next century will be 2 to 3 orders of magnitude greater (430-2,420 m/yr) than it has been on average for the past 320 ky (2.3 m/yr)
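The linear interpolation between current climate and an LGM model, scaled by oxygen isotope ratios, might look like the sketch below. The function name, its parameters, and the example numbers are hypothetical; the abstract does not give the authors' exact formulation, so this shows only the general form of such a scaling.

```python
def interpolate_climate(delta_t, delta_now, delta_lgm, value_now, value_lgm):
    """Estimate a climate variable at a past time from its isotope ratio.

    The isotope value delta_t is placed on the line between the present-day
    ratio (delta_now) and the LGM ratio (delta_lgm), and the same weight is
    applied to the climate variable. All arguments are illustrative.
    """
    if delta_lgm == delta_now:
        raise ValueError("present-day and LGM isotope ratios must differ")
    weight = (delta_t - delta_now) / (delta_lgm - delta_now)  # 0 = present, 1 = LGM
    return value_now + weight * (value_lgm - value_now)


# Made-up values: an isotope ratio halfway between present and LGM.
print(interpolate_climate(4.0, 3.0, 5.0, value_now=15.0, value_lgm=9.0))
```

Applied cell by cell to gridded climate layers, a weight outside the 0 to 1 range would extrapolate beyond the two end-member climates, so such cases deserve separate scrutiny.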

    Structural and Functional Roles of Coevolved Sites in Proteins

    Understanding residue covariation between multiple positions in protein families is crucial and can help in designing protein engineering experiments. These simultaneous changes, or residue coevolution, allow a protein to maintain its overall structural-functional integrity while enabling it to acquire specific functional modifications. Despite significant efforts in the field, there is still controversy over the preferred locations of coevolved residues in different regions of protein molecules, the strength of the coevolutionary signal, and the role of coevolution in functional diversification. In this paper we study the scale and nature of residue coevolution in maintaining the overall functionality and structural integrity of proteins. We conducted a large-scale study to investigate the structural and functional aspects of coevolved residues. We found that the networks representing the coevolutionary residue connections within our dataset are in general of 'small-world' type, as they have clustering coefficient values higher than random networks and mean shortest path lengths similar to and/or lower than those of random and regular networks. We also found that altogether 11% of functionally important sites are coevolved with any other sites. Active sites are found more frequently to coevolve with any other sites (15%) compared to protein (11%) and ligand (9%) binding sites. Metal binding and active sites are also found to be more frequently coevolved with other metal binding and active sites, respectively. Analysis of the coupling between coevolutionary processes and the spatial distribution of coevolved sites reveals that a high fraction of coevolved sites are located close to each other. Moreover, approximately 80% of charge compensatory substitutions within coevolved sites are found at very close spatial proximity (≤ 5 Å), pointing to the possible preservation of salt bridges in evolution. Our findings show that a noticeable fraction of functionally important sites undergo coevolution and also point towards compensatory substitutions as a probable coevolutionary mechanism within spatially proximal coevolved functional sites
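The two quantities used to call a network 'small-world', the clustering coefficient and the mean shortest path length, can be computed as in this minimal sketch. The adjacency-dictionary representation, the function names, and the toy graph are assumptions for illustration and are not taken from the study.

```python
from collections import deque
from itertools import combinations


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    total = 0.0
    for neighbors in adj.values():
        k = len(neighbors)
        if k < 2:
            continue
        # Number of edges that exist among this node's neighbours.
        links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def mean_shortest_path(adj):
    """Mean shortest path length over all reachable ordered pairs (breadth-first search)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy undirected graph: a triangle A-B-C with D attached to C.
toy = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
print(average_clustering(toy), mean_shortest_path(toy))
```

To judge whether a coevolution network is small-world, these two values would be compared against those of random (and regular) networks with the same number of nodes and edges.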

    Representativeness of microsatellite distributions in genomes, as revealed by 454 GS-FLX Titanium pyrosequencing

    Background: Microsatellites are markers of choice in population genetics and genomics, as they provide useful insight into patterns and processes as diverse as genome evolutionary dynamics and demographic processes. The acquisition of microsatellites through multiplex-enriched libraries and 454 GS-FLX Titanium pyrosequencing is a promising new tool for the isolation of new markers in unknown genomes. This approach can also be used to evaluate the extent to which microsatellite-enriched libraries are representative of the genome from which they were isolated. In this study, we deciphered potential discrepancies in microsatellite content recovery for two reference genomes (Apis mellifera and Danio rerio), selected on the basis of their extreme heterogeneity in terms of the proportions and distributions of microsatellites on chromosomes. Results: The A. mellifera genome, in particular, was found to be highly heterogeneous, due to extremely high rates of recombination, with hotspots, but the only bias consistently introduced into pyrosequenced multiplex-enriched libraries concerned sequence length, with the overrepresentation of sequences 160 to 320 bp in length. Other deviations from expected proportions or distributions of motifs on chromosomes were observed, but the significance and intensity of these deviations was mostly limited. Furthermore, no consistent adverse competition between multiplexed probes was observed during the motif enrichment phase. Conclusions: This approach therefore appears to be a promising strategy for improving the development of microsatellites, as it introduces no major bias in terms of the proportions and distribution of microsatellites.

    Integrative genetic map of repetitive DNA in the sole Solea senegalensis genome shows a Rex transposon located in a proto-sex chromosome

    Repetitive sequences play an essential role in the structural and functional evolution of the genome, particularly in the sex chromosomes. The Senegalese sole (Solea senegalensis) is a valuable flatfish in aquaculture, although few studies have addressed the mapping and characterization of its repetitive DNA families. Here we analyzed the Simple Sequence Repeat (SSR) and Transposable Element (TE) content of fifty-seven BAC clones (spanning 7.9 Mb) of this species, localized on chromosomes by the multiple fluorescence in situ hybridization (m-BAC-FISH) technique. The SSR analysis revealed an average density of 675.1 loci per Mb, and dinucleotides accounted for a high proportion (59.69%) of coverage, with 'AC' the most abundant motif. An SSR-FISH analysis using eleven probes was also carried out and seven of the 11 probes yielded positive signals. 'AC' probes were present as large clusters in almost all chromosomes, supporting the bioinformatic analysis. Regarding TEs, DNA transposons (Class II) were the most abundant. In Class I, LINE elements were the most abundant and the hAT family was the most represented in Class II. The Rex/Babar subfamily, observed in two BAC clones mapping to chromosome pair 1, showed the longest match. This chromosome pair has recently been reported as a putative proto-sex chromosome in this species, highlighting the possible role of the Rex element in the evolution of this chromosome. In the Rex1 phylogenetic tree, the Senegalese sole Rex1 retrotransposon could be associated with one of the four major ancient lineages in fish genomes, the lineage that also includes O. latipes

    Sequencing three crocodilian genomes to illuminate the evolution of archosaurs and amniotes

    The International Crocodilian Genomes Working Group (ICGWG) will sequence and assemble the American alligator (Alligator mississippiensis), saltwater crocodile (Crocodylus porosus) and Indian gharial (Gavialis gangeticus) genomes. The status of these projects and our planned analyses are described