    SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects

    Background: High-throughput re-sequencing, new genotyping technologies and the availability of reference genomes allow the extensive characterization of Single Nucleotide Polymorphisms (SNPs) and insertion/deletion events (indels) in many plant species. The rapidly increasing amount of re-sequencing and genotyping data generated by large-scale genetic diversity projects requires the development of integrated bioinformatics tools able to efficiently manage, analyze, and combine these genetic data with genome structure and external data.
    Results: In this context, we developed SNiPlay, a flexible, user-friendly and integrative web-based tool dedicated to polymorphism discovery and analysis. It integrates:
    1) a pipeline, freely accessible through the internet, combining existing software with new tools to detect SNPs and to compute different types of statistical indices and graphical layouts for SNP data. From standard sequence alignments, genotyping data or Sanger sequencing traces given as input, SNiPlay detects SNPs and indel events and outputs submission files for the design of Illumina's SNP chips. Subsequently, it sends sequences and genotyping data into a series of modules in charge of various processes: physical mapping to a reference genome, annotation (genomic position, intron/exon location, synonymous/non-synonymous substitutions), SNP frequency determination in user-defined groups, haplotype reconstruction and network, linkage disequilibrium evaluation, and diversity analysis (Pi, Watterson's Theta, Tajima's D). Furthermore, the pipeline allows the use of external data (such as phenotype, geographic origin, taxa, stratification) to define groups and compare statistical indices.
    2) a database storing polymorphisms, genotyping data and grapevine sequences released by public and private projects. It allows the user to retrieve SNPs using various filters (such as genomic position, missing data, polymorphism type, allele frequency), to compare SNP patterns between populations, and to export genotyping data or sequences in various formats.
    Conclusions: Our experiments on grapevine genetic projects showed that SNiPlay allows geneticists to rapidly obtain advanced results in several key research areas of plant genetic diversity. Both the management and treatment of large amounts of SNP data are rendered considerably easier for end-users through automation and integration. Current developments are taking into account new advances in high-throughput technologies.
    SNiPlay is available at: http://sniplay.cirad.fr/.
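    The diversity indices named in the abstract (Pi, Watterson's Theta, Tajima's D) can be illustrated with a minimal sketch based on the standard textbook formulas. This is not SNiPlay's code: the function name `diversity_stats` and the toy sequences are illustrative assumptions, the input is assumed to be equal-length aligned sequences without gaps or missing data, and Pi is reported as the mean number of pairwise differences over the whole alignment rather than per site.

```python
from collections import Counter
from math import sqrt


def diversity_stats(seqs):
    """Return segregating sites, Pi, Watterson's Theta and Tajima's D.

    Assumes aligned, equal-length sequences with no gaps or missing data.
    """
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two aligned sequences of equal length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0
    pair_diffs = 0.0
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # Number of sequence pairs that differ at this site.
            pair_diffs += (n * n - sum(c * c for c in counts.values())) / 2

    pi = pair_diffs / n_pairs                    # mean pairwise differences
    a1 = sum(1 / i for i in range(1, n))
    theta_w = seg_sites / a1                     # Watterson's estimator

    tajima_d = None                              # undefined without variation
    if seg_sites > 0:
        a2 = sum(1 / i ** 2 for i in range(1, n))
        b1 = (n + 1) / (3 * (n - 1))
        b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
        c1 = b1 - 1 / a1
        c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
        e1 = c1 / a1
        e2 = c2 / (a1 ** 2 + a2)
        variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
        tajima_d = (pi - theta_w) / sqrt(variance)

    return {"S": seg_sites, "pi": pi, "theta_w": theta_w, "tajima_d": tajima_d}


# Toy alignment (hypothetical data): two variable sites among four sequences.
stats = diversity_stats(["ACGT", "ACGA", "ACTT", "ACGT"])
print(stats)
```

    In this toy case both variable sites carry a singleton allele, so Pi falls below Watterson's Theta and Tajima's D is negative.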

    AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences

    Background: Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses.
    Results: AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, the choice of which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly".
    Conclusions: AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers' time and ensure an objective selection of the best-possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer.
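    The idea of scoring alignment columns to locate divergent regions can be sketched as follows. The abstract does not give AlignMiner's formulas, so this example swaps in plain Shannon entropy per column as a stand-in; the function names, the `threshold` and `min_length` parameters, and the toy alignment are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def divergent_regions(alignment, threshold=0.5, min_length=2):
    """Return per-column scores and runs of columns scoring above threshold.

    Regions are half-open (start, end) index pairs over alignment columns.
    """
    scores = [column_entropy(col) for col in zip(*alignment)]
    regions = []
    start = None
    for i, score in enumerate(scores):
        if score > threshold:
            if start is None:
                start = i                # open a new divergent run
        elif start is not None:
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the last column.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return scores, regions


# Toy alignment (hypothetical): columns 2 and 3 vary, the rest are conserved.
alignment = ["ACGTAC", "ACGAAC", "ACCTAC", "ACTGAC"]
scores, regions = divergent_regions(alignment)
print(scores, regions)
```

    Raising `threshold` or `min_length` makes the selection more restrictive, which mirrors in spirit the different levels of selectivity the abstract attributes to its scoring methods.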

    Model SNP development for complex genomes based on hexaploid oat using high-throughput 454 sequencing technology

    Background: Genetic markers are pivotal to modern genomics research; however, discovery and genotyping of molecular markers in oat has been hindered by the size and complexity of the genome, and by a scarcity of sequence data. The purpose of this study was to generate oat expressed sequence tag (EST) information, develop a bioinformatics pipeline for SNP discovery, and establish a method for rapid, cost-effective, and straightforward genotyping of SNP markers in complex polyploid genomes such as oat.
    Results: Based on cDNA libraries of four cultivated oat genotypes, approximately 127,000 contigs were assembled from approximately one million Roche 454 sequence reads. Contigs were filtered through a novel bioinformatics pipeline to eliminate ambiguous polymorphism caused by subgenome homology, and 96 in silico SNPs were selected from 9,448 candidate loci for validation using high-resolution melting (HRM) analysis. Of these, 52 (54%) were polymorphic between parents of the Ogle1040 × TAM O-301 (OT) mapping population, with 48 segregating as single Mendelian loci, and 44 being placed on the existing OT linkage map. Ogle and TAM amplicons from 12 primers were sequenced for SNP validation, revealing complex polymorphism in seven amplicons but general sequence conservation within SNP loci. Whole-amplicon interrogation with HRM revealed insertions, deletions, and heterozygotes in secondary oat germplasm pools, generating multiple alleles at some primer targets. To validate marker utility, 36 SNP assays were used to evaluate the genetic diversity of 34 diverse oat genotypes. Dendrogram clusters corresponded generally to known genome composition and genetic ancestry.
    Conclusions: The high-throughput SNP discovery pipeline presented here is a rapid and effective method for identification of polymorphic SNP alleles in the oat genome. The current-generation HRM system is a simple and highly-informative platform for SNP genotyping. These techniques provide a model for SNP discovery and genotyping in other species with complex and poorly-characterized genomes.

    High-throughput SNP genotyping in the highly heterozygous genome of Eucalyptus: assay success, polymorphism and transferability across species

    Background: High-throughput SNP genotyping has become an essential requirement for molecular breeding and population genomics studies in plant species. Large scale SNP developments have been reported for several mainstream crops. A growing interest now exists to expand the speed and resolution of genetic analysis to outbred species with highly heterozygous genomes. When nucleotide diversity is high, a refined diagnosis of the target SNP sequence context is needed to convert queried SNPs into high-quality genotypes using the Golden Gate Genotyping Technology (GGGT). This issue becomes exacerbated when attempting to transfer SNPs across species, a scarcely explored topic in plants, and likely to become significant for population genomics and interspecific breeding applications in less domesticated and less funded plant genera.
    Results: We have successfully developed the first set of 768 SNPs assayed by the GGGT for the highly heterozygous genome of Eucalyptus from a mixed Sanger/454 database with 1,164,695 ESTs and the preliminary 4.5X draft genome sequence for E. grandis. A systematic assessment of in silico SNP filtering requirements showed that stringent constraints on the SNP surrounding sequences have a significant impact on SNP genotyping performance and polymorphism. SNP assay success was high for the 288 SNPs selected with more rigorous in silico constraints; 93% of them provided high quality genotype calls and 71% of them were polymorphic in a diverse panel of 96 individuals of five different species. SNP reliability was high across nine Eucalyptus species belonging to three sections within subgenus Symphomyrtus and still satisfactory across species of two additional subgenera, although polymorphism declined as phylogenetic distance increased.
    Conclusions: This study indicates that the GGGT performs well both within and across species of Eucalyptus notwithstanding its nucleotide diversity ≥2%. The development of a much larger array of informative SNPs across multiple Eucalyptus species is feasible, although strongly dependent on having a representative and sufficiently deep collection of sequences from many individuals of each target species. A higher density SNP platform will be instrumental to undertake genome-wide phylogenetic and population genomics studies and to implement molecular breeding by Genomic Selection in Eucalyptus.
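    One simple form that a constraint on SNP surrounding sequences could take is to keep only SNPs with no other known polymorphism inside a flanking window. The abstract does not state the actual filtering rules, so the sketch below is only an assumed example: the function name `isolated_snps`, the 60 bp default flank, and the positions are hypothetical.

```python
from bisect import bisect_left, bisect_right


def isolated_snps(positions, flank=60):
    """Keep SNP positions with no other SNP within `flank` bp on either side.

    `positions` are coordinates on a single contig; `flank` is an assumed
    window size, not a value taken from the study.
    """
    ordered = sorted(set(positions))
    kept = []
    for pos in ordered:
        lo = bisect_left(ordered, pos - flank)
        hi = bisect_right(ordered, pos + flank)
        # The window always contains the SNP itself; keep it only if alone.
        if hi - lo == 1:
            kept.append(pos)
    return kept


# Hypothetical SNP coordinates on one contig.
candidates = [100, 130, 500, 1000, 1050]
strict = isolated_snps(candidates, flank=60)
relaxed = isolated_snps(candidates, flank=40)
print(strict, relaxed)
```

    Widening the window discards more candidates, which illustrates the trade-off between stringency and the number of SNPs retained.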

    Patterns of population structure and adaptive genetic variation in alpine populations of Picea abies (L.) Karst.

    Forest trees dominate many alpine landscapes and are currently exposed to a changing climate. Picea abies (L.) Karst (Norway spruce) is one of the most important conifer species of the Italian Alps due to its ecological and economic relevance. Natural populations of this species are found across steep environmental gradients with large differences in temperature and moisture availability. Steep environmental gradients represent interesting models to study the interaction between natural selection and gene flow, especially when aiming to understand adaptation processes under global change. The present work aims to understand adaptive responses to changing climate by determining and quantifying patterns of genetic diversity in natural populations of P. abies. A wide array of potential candidate genes was tested, by means of Single Nucleotide Polymorphisms (SNPs), for correlation with climatic and environmental parameters at different spatial scales: i) a geographical scale corresponding to the natural distribution of P. abies across the Italian Alps and ii) a regional scale in the Eastern Italian Alps. Weak population structure was revealed at the geographical scale, with only one population clearly divergent from the single major genetic cluster identified. At the regional scale, hierarchical analyses of molecular variance revealed that most of the genetic variability was found within populations (ca. 99%), and small but significant variation was also found due to landscape features (ca. 0.38%). In order to detect potentially adaptive markers, classical FST outlier approaches were first applied and five outlier loci were revealed at the broad scale, while contrasting results were obtained at the regional scale according to the model used. Subsequently, environmental association analyses were performed: at the geographical scale, temperature and precipitation were found to influence allelic variation at seven polymorphic loci, while at the regional scale, Alpine topography emerged as a potential adaptive determinant at 19 polymorphic loci, which were thus considered of ecological relevance. The results obtained in this study may provide relevant information for forestry management and genetic conservation, to understand and quantify the effect of climate change on conifer species as well as their adaptive potential.
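    The FST statistic underlying the outlier approaches mentioned above can be illustrated with a small worked example. The sketch uses the textbook definition for one biallelic locus, FST = (HT - HS) / HT, with equal weighting of populations; it is not the outlier-detection method used in the study, and the function name `fst_biallelic` and the allele frequencies are hypothetical.

```python
def fst_biallelic(freqs):
    """Wright's FST for one biallelic locus from per-population allele
    frequencies, weighting all populations equally (an assumption).

    Returns None when the locus is monomorphic overall (FST undefined).
    """
    if not freqs or any(p < 0 or p > 1 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = sum(freqs) / len(freqs)
    h_total = 2 * p_bar * (1 - p_bar)                        # HT
    h_within = sum(2 * p * (1 - p) for p in freqs) / len(freqs)  # HS
    if h_total == 0:
        return None
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in two populations.
no_structure = fst_biallelic([0.5, 0.5])
moderate = fst_biallelic([0.2, 0.8])
fixed = fst_biallelic([0.0, 1.0])
print(no_structure, moderate, fixed)
```

    Outlier tests then ask whether a locus shows an FST far from what neutral expectations predict for the other loci, a step that needs a null model and is beyond this sketch.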

    Grapevine acidity: SVM tool development and NGS data analyses.

    Single Nucleotide Polymorphisms (SNPs) represent the most abundant type of genetic variation and are a valuable tool for several biological applications such as linkage mapping, integration of genetic and physical maps, population genetics, and evolutionary and protein structure-function studies. SNP genotyping by mapping DNA reads produced via next-generation sequencing (NGS) technologies onto a reference genome is now a very common and convenient approach, but it is still prone to a significant error rate. The need to identify true genetic variants in silico in genomic and transcriptomic sequences is prompted by the high cost of experimental validation through re-sequencing or SNP arrays, not only in terms of money but also of time and sample availability. Several open-source tools have recently been developed to identify small variants in whole-genome data, but the candidate variants, provided in the VCF output format, still show a high false positive calling rate. The goal of this thesis is the development of a bioinformatic method that classifies variant calling outputs in order to reduce the number of false positive calls. With the aim of dissecting the molecular bases of grape (Vitis vinifera L.) acidity, this tool was then used to select SNPs in two grapevine varieties that show very different contents of organic acids in the berry. The VCF parameters were used to train a Support Vector Machine (SVM) that classifies the VCF records into true and false positive variants, cleaning the output of the most likely false positive results. The SVM approach was implemented in a new software tool, called VerySNP, and applied to model and non-model organisms. In both cases, the machine learning method efficiently distinguished true positive from false positive variants in both genomic and transcriptomic sequences. In the second part of the thesis, VerySNP was applied to identify true SNPs in RNA-seq data of the grapevine variety Gora Chirine, characterized by low acidity, and Sultanine, a normal-acidity variety closely related to Gora. The comparative transcriptomic analysis, crossed with the SNP information, led to the discovery of non-synonymous polymorphisms inside coding regions and thus provided a list of candidate genes potentially affecting acidity in grapevine.
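    The step of turning VCF parameters into classifier input can be sketched as below. The abstract does not say which VCF fields VerySNP uses, so the choice of QUAL plus the INFO keys DP and MQ is an assumption, as are the function names and the example record; missing values are set to 0.0 purely to keep the sketch short. The resulting numeric vectors could then be passed to any SVM implementation for training.

```python
def parse_info(info_field):
    """Parse a VCF INFO column into a dict; flag entries map to True."""
    info = {}
    if info_field == ".":
        return info
    for entry in info_field.split(";"):
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def vcf_features(line, keys=("DP", "MQ")):
    """Build a numeric feature vector [QUAL, INFO[keys...]] from one
    tab-separated VCF data line. Missing or non-numeric values become 0.0
    (a simplification chosen for this sketch)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 columns")
    qual = float(fields[5]) if fields[5] != "." else 0.0
    info = parse_info(fields[7])
    features = [qual]
    for key in keys:
        try:
            features.append(float(info.get(key, 0.0)))
        except (TypeError, ValueError):
            features.append(0.0)     # e.g. multi-valued entries like "3,5"
    return features


# Hypothetical VCF records: CHROM POS ID REF ALT QUAL FILTER INFO
record = "chr1\t1234\t.\tA\tG\t50.5\tPASS\tDP=30;MQ=60;DB"
sparse = "chr1\t1\t.\tA\tG\t.\t.\t."
print(vcf_features(record), vcf_features(sparse))
```

    In practice, features on very different scales (such as QUAL and DP) are usually standardized before SVM training.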
