61 research outputs found

    Optimization of sequence alignment for simple sequence repeat regions

    Background: Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, consisting of tandem copies of specific sequences no longer than six bases, that are distributed throughout the genome. SSRs have been used as molecular markers because they are easy to detect, and they serve a range of applications, including genetic diversity studies, genome mapping, and marker-assisted selection. They are also highly mutable because of DNA polymerase slippage during DNA replication. This distinctive mutation mechanism raises the frequency of insertion/deletion (INDEL) mutations above that of other types of molecular markers, such as single nucleotide polymorphisms (SNPs). SNPs are nonetheless more frequent than INDELs. Existing sequence alignment algorithms are therefore designed to fit the vast majority of the genomic sequence and do not treat microsatellite regions as unique sequences that require special consideration. These algorithms are limited in their application to SSRs because the many overlaps between different repeat units result in false evolutionary relationships. Findings: To overcome this limitation of alignment algorithms when dealing with SSR loci, a new algorithm was developed as a Perl script with a Tk graphical interface. The program aligns sequences after first determining the repeated units and the positions of the first and last SSR nucleotides. This results in a shifting process according to the type of inserted repeat unit. When phylogenetic relationships were studied before and after applying the new algorithm, many differences in the trees were obtained as SSR length and complexity increased. However, smaller distances between different lineages were observed after applying the new algorithm. Conclusions: The new algorithm produces better alignments of SSR loci because it reflects more reliable evolutionary relationships between different lineages. It reduces overlapping during SSR alignment, which results in more realistic phylogenetic relationships.
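The first step the abstract describes, determining the repeat unit and the first and last SSR positions, can be illustrated with a minimal sketch. This is not the authors' Perl/Tk program: the function names, the use of a regular expression, and the minimum of three copies are all assumptions made for illustration.

```python
import re

# A unit of 1-6 bases followed by at least two more copies of itself.
# The three-copy minimum is an assumption, not a value from the abstract.
SSR_PATTERN = re.compile(r"([ACGT]{1,6}?)\1{2,}")


def locate_ssr(seq):
    """Return (unit, start, end) for the first SSR in seq, or None.

    start is the 0-based index of the first SSR nucleotide and
    end is the index one past the last SSR nucleotide.
    """
    match = SSR_PATTERN.search(seq.upper())
    if match is None:
        return None
    return match.group(1), match.start(), match.end()


def repeat_count(seq):
    """Number of tandem copies of the unit in the first SSR (0 if none)."""
    hit = locate_ssr(seq)
    if hit is None:
        return 0
    unit, start, end = hit
    return (end - start) // len(unit)
```

Comparing `repeat_count` for two sequences gives the number of inserted or deleted units, which is the kind of information a repeat-aware alignment could use to shift one sequence by whole units rather than by single bases.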

    Simple sequence repeat variation in the Daphnia pulex genome

    Background: Simple sequence repeats (SSRs) are highly variable features of all genomes. Their rapid evolution makes them useful for tracing the evolutionary history of populations and investigating patterns of selection and mutation across genomes. The recently sequenced Daphnia pulex genome provides us with a valuable data set to study the mode and tempo of SSR evolution, without the inherent biases that accompany marker selection. Results: Here we catalogue SSR loci in the Daphnia pulex genome with repeated motif sizes of 1-100 nucleotides with a minimum of 3 perfect repeats. We then used whole genome shotgun reads to determine the average heterozygosity of each SSR type and the relationship that it has to repeat number, motif size, motif sequence, and distribution of SSR loci. We find that SSR heterozygosity is motif specific, and positively correlated with repeat number as well as motif size. For non-repeat unit polymorphisms, we identify a motif-dependent end-nucleotide polymorphism bias that may contribute to the patterns of abundance for specific homopolymers, dimers, and trimers. Our observations confirm the high frequency of multiple unit variation (multistep) at large microsatellite loci, and further show that the occurrence of multiple unit variation is dependent on both repeat number and motif size. Using the Daphnia pulex genetic map, we show a positive correlation between dimer and trimer frequency and recombination. Conclusions: This genome-wide analysis of SSR variation in Daphnia pulex indicates that several aspects of SSR variation are motif dependent and suggests that a combination of unit length variation and end repeat biased base substitution contribute to the unique spectrum of SSR repeat loci.
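To make the heterozygosity measure concrete, the sketch below computes the standard expected heterozygosity, H = 1 - sum(p_i^2), from the repeat counts observed at one locus. This is an assumption for illustration only: the abstract does not say how heterozygosity was estimated from shotgun reads, and the function name and input format are hypothetical.

```python
from collections import Counter


def expected_heterozygosity(allele_repeat_counts):
    """Expected heterozygosity H = 1 - sum(p_i ** 2) for one locus.

    allele_repeat_counts holds one repeat count per sampled allele.
    This is the textbook estimator and an assumption here; the study's
    read-based procedure may differ.
    """
    counts = Counter(allele_repeat_counts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("at least one allele is required")
    return 1.0 - sum((c / total) ** 2 for c in counts.values())
```

Averaging this value over all loci that share a motif would give a per-motif figure comparable in spirit to the "average heterozygosity of each SSR type" mentioned above.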

    Ancestral Origin of the ATTCT Repeat Expansion in Spinocerebellar Ataxia Type 10 (SCA10)

    Spinocerebellar ataxia type 10 (SCA10) is an autosomal dominant neurodegenerative disease characterized by cerebellar ataxia and seizures. The disease is caused by a large ATTCT repeat expansion in the ATXN10 gene. The first families reported with SCA10 were of Mexican origin, but the disease was soon after described in Brazilian families of mixed Portuguese and Amerindian ancestry. The origin of the SCA10 expansion and a possible founder effect that would account for its geographical distribution have been the source of speculation over the last years. To unravel the mutational origin and spread of the SCA10 expansion, we performed an extensive haplotype study, using closely linked STR markers and intragenic SNPs, in families from Brazil and Mexico. Our results showed (1) a shared disease haplotype for all Brazilian and one of the Mexican families, and (2) closely-related haplotypes for the additional SCA10 Mexican families; (3) little or null genetic distance in small normal alleles of different repeat sizes, from the same SNP lineage, indicating that they are being originated by a single step mechanism; and (4) a shared haplotype for pure and interrupted expanded alleles, pointing to a gene conversion model for its generation. In conclusion, we show evidence for an ancestral common origin for SCA10 in Latin America, which might have arisen in an ancestral Amerindian population and later have been spread into the mixed populations of Mexico and Brazil

    Heterozygosity increases microsatellite mutation rate, linking it to demographic history

    Background: Biochemical experiments in yeast suggest a possible mechanism that would cause heterozygous sites to mutate faster than equivalent homozygous sites. If such a process operates, it could undermine a key assumption at the core of population genetic theory, namely that mutation rate and population size are independent, because population expansion would increase heterozygosity that in turn would increase mutation rate. Here we test this hypothesis using both direct counting of microsatellite mutations in human pedigrees and an analysis of the relationship between microsatellite length and patterns of demographically-induced variation in heterozygosity. Results: We find that microsatellite alleles of any given length are more likely to mutate when their homologue is unusually different in length. Furthermore, microsatellite lengths in human populations do not vary randomly, but instead exhibit highly predictable trends with both distance from Africa, a surrogate measure of genome-wide heterozygosity, and modern population size. This predictability remains even after statistically controlling for non-independence due to shared ancestry among populations. Conclusion: Our results reveal patterns that are unexpected under classical population genetic theory, where no mechanism exists capable of linking allele length to extrinsic variables such as geography or population size. However, the predictability of microsatellite length is consistent with heterozygote instability and suggests that this has an important impact on microsatellite evolution. Whether similar processes impact on single nucleotide polymorphisms remains unclear.

    Analysis of Microsatellite Variation in Drosophila melanogaster with Population-Scale Genome Sequencing

    Genome sequencing technologies promise to revolutionize our understanding of genetics, evolution, and disease by making it feasible to survey a broad spectrum of sequence variation on a population scale. However, this potential can only be realized to the extent that methods for extracting and interpreting distinct forms of variation can be established. The error profiles and read length limitations of early versions of next-generation sequencing technologies rendered them ineffective for some sequence variant types, particularly microsatellites and other tandem repeats, and fostered the general misconception that such variants are inherently inaccessible to these platforms. At the same time, tandem repeats have emerged as important sources of functional variation. Tandem repeats are often located in and around genes, and frequent mutations in their lengths exert quantitative effects on gene function and phenotype, rapidly degrading linkage disequilibrium between markers and traits. Sensitive identification of these variants in large-scale next-gen sequencing efforts will enable more comprehensive association studies capable of revealing previously invisible associations. We present a population-scale analysis of microsatellite repeats using whole-genome data from 158 inbred isolates from the Drosophila Genetics Reference Panel, a collection of over 200 extensively phenotypically characterized isolates from a single natural population, to uncover processes underlying repeat mutation and to enable associations with behavioral, morphological, and life-history traits. Analysis of repeat variation from next-generation sequence data will also enhance studies of genome stability and neurodegenerative diseases

    Direct estimation of the mutation rate at dinucleotide microsatellite loci in Arabidopsis thaliana (Brassicaceae)

    This is the author's accepted manuscript, made available with the permission of the publisher. This research was supported by NIH grant GM073990 and NSF grant DEB-0543052 to J. K. Kelly, NSF grants DEB-9629457 and DEB-9981891 to R. G. Shaw, and NSF DEB-0108242 to M. Orive. M. E. Mort acknowledges DEB-0344883.

    Design and Implementation of Degenerate Microsatellite Primers for the Mammalian Clade

    Microsatellites are popular genetic markers in molecular ecology, genetic mapping and forensics. Unfortunately, despite recent advances, the isolation of de novo polymorphic microsatellite loci often requires expensive and intensive groundwork. Primers developed for a focal species are commonly tested in a related, non-focal species of interest for the amplification of orthologous polymorphic loci; when successful, this approach significantly reduces cost and time of microsatellite development. However, transferability of polymorphic microsatellite loci decreases rapidly with increasing evolutionary distance, and this approach has shown its limits. Whole genome sequences represent an under-exploited resource to develop cross-species primers for microsatellites. Here we describe a three-step method that combines a novel in silico pipeline that we use to (1) identify conserved microsatellite loci from multiple genome alignments and (2) design degenerate primer pairs, with (3) a simple PCR protocol used to implement these primers across species. Using this approach we developed a set of primers for the mammalian clade. We found 126,306 human microsatellites conserved in mammalian aligned sequences, and isolated 5,596 loci using criteria based on wide conservation. From a random subset of ∼1000 dinucleotide repeats, we designed degenerate primer pairs for 19 loci, of which five produced polymorphic fragments in up to 18 mammalian species, including the distantly related marsupials and monotremes, groups that diverged from other mammals 120–160 million years ago. Using our method, many more cross-clade microsatellite loci can be harvested from the currently available genomic data, and this ability is set to improve exponentially as further genomes are sequenced.
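Step (2), designing a degenerate primer, can be sketched as collapsing each column of an alignment of the primer-binding site into a single IUPAC ambiguity code. The code below is a minimal illustration under that assumption; it is not the authors' pipeline, and the function names and the rejection of gapped columns are choices made here.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases covered.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def _columns(aligned_sites):
    """Yield the set of bases in each alignment column (hypothetical helper)."""
    seqs = [s.upper() for s in aligned_sites]
    if not seqs or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least one sequence, all of equal length")
    for column in zip(*seqs):
        bases = frozenset(column)
        if bases not in IUPAC:
            raise ValueError("only ungapped A/C/G/T columns are supported")
        yield bases


def degenerate_consensus(aligned_sites):
    """Collapse aligned primer-site sequences into one degenerate primer."""
    return "".join(IUPAC[bases] for bases in _columns(aligned_sites))


def degeneracy(aligned_sites):
    """Number of distinct oligonucleotides the degenerate primer encodes."""
    total = 1
    for bases in _columns(aligned_sites):
        total *= len(bases)
    return total
```

For example, the aligned sites `ACGT`, `ACAT` and `ACGC` collapse to `ACRY`, which encodes four distinct oligonucleotides. In practice a design would likely cap the degeneracy, since highly degenerate primers dilute the concentration of each individual sequence in the PCR.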

    Chance and necessity in the genome evolution of endosymbiotic bacteria of insects

    An open question in evolutionary biology is how the selection–drift balance determines the fates of biological interactions. We searched for signatures of selection and drift in genomes of five endosymbiotic bacterial groups known to evolve under strong genetic drift. Although most genes in endosymbiotic bacteria showed evidence of relaxed purifying selection, many genes in these bacteria exhibited stronger selective constraints than their orthologs in free-living bacterial relatives. Remarkably, most of these highly constrained genes had no role in the host–symbiont interactions but were involved in either buffering the deleterious consequences of drift or other host-unrelated functions, suggesting that they have either acquired new roles or their role became more central in endosymbiotic bacteria. Experimental evolution of Escherichia coli under strong genetic drift revealed remarkable similarities in the mutational spectrum, genome reduction patterns and gene losses to endosymbiotic bacteria of insects. Interestingly, the transcriptome of the experimentally evolved lines showed a generalized deregulation of the genome that affected genes encoding proteins involved in mutational buffering, regulation and amino acid biosynthesis, patterns identical to those found in endosymbiotic bacteria. Our results indicate that drift has shaped endosymbiotic associations through a change in the functional landscape of bacterial genes and that the host had only a small role in such a shift. This work was supported by Science Foundation Ireland (12/IP/1637) and grants from the Spanish Ministerio de Economia y Competitividad (MINECO-FEDER; BFU2012-36346 and BFU2015-66073-P) to MAF. DAP and CT were supported by Juan de la Cierva fellowships from MINECO (references: JCI-2011-11089 and JCA-2012-14056, respectively). DAP is supported by funds from the University of Nevada, Reno, NV, USA. Sabater-Muñoz, B.; Toft, C.; Alvarez-Ponce, D.; Fares Riaño, MA. (2017).
Chance and necessity in the genome evolution of endosymbiotic bacteria of insects. The ISME Journal. 11(6):1291-1304. https://doi.org/10.1038/ismej.2017.18

    A Novel Approach for Mining Polymorphic Microsatellite Markers In Silico

    An important emerging application of high-throughput 454 sequencing is the isolation of molecular markers such as microsatellites from genomic DNA. However, few studies have developed microsatellites from cDNA despite the added potential for targeting candidate genes. Moreover, developing microsatellites usually requires the evaluation of numerous primer pairs for polymorphism in the focal species. This can be time-consuming and wasteful, particularly for taxa with low genetic diversity where the majority of primers often yield monomorphic polymerase chain reaction (PCR) products. Transcriptome assemblies provide a convenient solution: functional annotation of transcripts allows markers to be targeted towards candidate genes, while high sequence coverage in principle permits the assessment of variability in silico. Consequently, we evaluated fifty primer pairs designed to amplify microsatellites, primarily residing within transcripts related to immunity and growth, identified from an Antarctic fur seal (Arctocephalus gazella) transcriptome assembly. In silico visualization was used to classify each microsatellite as being either polymorphic or monomorphic and to quantify the number of distinct length variants, each taken to represent a different allele. The majority of loci (n = 36, 72.0%) yielded interpretable PCR products, 23 of which were polymorphic in a sample of 24 fur seal individuals. Loci that appeared variable in silico were significantly more likely to yield polymorphic PCR products, even after controlling for microsatellite length measured in silico. We also found a significant positive relationship between inferred and observed allele number. This study not only demonstrates the feasibility of generating modest panels of microsatellites targeted towards specific classes of gene, but also suggests that in silico microsatellite variability may provide a useful proxy for PCR product polymorphism.
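The in silico classification described above, counting distinct length variants among the reads that cover a microsatellite, might be approximated as follows. The study used visual inspection of the assembly, so this automated version is an assumption, as are the function names, the `min_repeats` threshold, and the premise that each read spans the whole repeat and is free of sequencing errors.

```python
import re


def longest_run(read, motif):
    """Longest tandem run of motif in read, measured in repeat units."""
    runs = re.findall(f"(?:{re.escape(motif.upper())})+", read.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def length_variants(reads, motif, min_repeats=3):
    """Sorted distinct repeat counts seen across reads.

    Each distinct count is taken to represent a different allele.
    min_repeats is a hypothetical cut-off for ignoring reads that
    do not contain the microsatellite.
    """
    counts = (longest_run(read, motif) for read in reads)
    return sorted({n for n in counts if n >= min_repeats})


def classify(reads, motif, min_repeats=3):
    """Label a locus 'polymorphic' if more than one length variant is seen."""
    variants = length_variants(reads, motif, min_repeats)
    return "polymorphic" if len(variants) > 1 else "monomorphic"
```

The length of `length_variants(...)` corresponds to the inferred allele number that the abstract compares with the allele number observed by PCR.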