53 research outputs found

    SNPServer: a real-time SNP discovery tool

    SNPServer is a flexible, real-time tool for the discovery of SNPs (single nucleotide polymorphisms) within DNA sequence data. The program uses BLAST to identify related sequences and CAP3 to cluster and align them. The alignments are passed to the SNP discovery software autoSNP, a program that detects SNPs and insertion/deletion polymorphisms (indels). Alternatively, lists of related sequences or pre-assembled sequences may be entered for SNP discovery. SNPServer and autoSNP use redundancy to differentiate between candidate SNPs and sequence errors. For each candidate SNP, two measures of confidence are calculated: the redundancy of the polymorphism at a SNP locus and the co-segregation of the candidate SNP with other SNPs in the alignment. SNPServer is available at
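
The redundancy idea described above can be illustrated with a minimal sketch. This is not autoSNP's actual code: the function name `candidate_snps`, the `min_redundancy` threshold of 2, and the choice to ignore gaps and ambiguous bases are all assumptions made for illustration. A column is reported as a candidate SNP only when at least two different bases are each seen in several reads, so a base seen once is treated as a likely sequencing error.

```python
from collections import Counter


def candidate_snps(aligned_reads, min_redundancy=2):
    """Return {column: {base: count}} for alignment columns in which at
    least two different bases each occur `min_redundancy` or more times.

    `aligned_reads` is a list of equal-length strings (hypothetical input
    format); gaps ('-') and ambiguous calls are ignored in this sketch.
    """
    if not aligned_reads:
        return {}
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all aligned reads must have the same length")

    candidates = {}
    for col in range(length):
        counts = Counter(read[col].upper() for read in aligned_reads)
        # Keep only unambiguous bases that are supported by enough reads.
        supported = {base: n for base, n in counts.items()
                     if base in "ACGT" and n >= min_redundancy}
        if len(supported) >= 2:
            candidates[col] = supported
    return candidates


# Toy alignment: column 2 has G x3 and C x2 (candidate SNP);
# column 4 has a single T (treated as a probable sequencing error).
reads = ["ACGTAC", "ACGTAC", "ACCTAC", "ACCTAC", "ACGTTC"]
print(candidate_snps(reads))
```

The second confidence measure, co-segregation with other SNPs in the alignment, is not modelled here.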

    Bioinformatics tools for development of fast and cost effective simple sequence repeat (SSR), and single nucleotide polymorphisms (SNP) markers from expressed sequence tags (ESTs)

    The development of current molecular biology techniques has led to the generation of a huge amount of gene sequence information through expressed sequence tag (EST) sequencing projects on a large number of plant species. This has opened a new era in crop molecular breeding with the identification and/or development of a new class of useful DNA markers called genic molecular markers (GMMs). These markers represent the functional component of the genome, in contrast to all other random DNA markers (RMMs). Many recent studies have demonstrated that GMMs may be superior to RMMs for use in marker-assisted selection, comparative mapping and exploration of functional genetic diversity in germplasm adapted to different environments. Therefore, identification of DNA sequences that can be used as markers remains fundamental to the development of GMMs. Among other methods, bioinformatics approaches are very useful for developing molecular markers, making their development much faster and cheaper. A number of computer programs that aim to identify molecular markers from sequence data have already been implemented. A review of current bioinformatics tools for the development of genic molecular markers is, therefore, crucial at this stage. This mini-review provides an overview of the bioinformatics tools available and their use in marker development, with particular reference to SNP and SSR markers. Keywords: Genic molecular marker, simple sequence repeat (SSR), and single nucleotide polymorphisms (SNP) markers from expressed sequence tags (ESTs). African Journal of Biotechnology Vol. 12(30), pp. 4713-472

    Interspecific differences in single nucleotide polymorphisms (SNPs) and indels in expressed sequence tag libraries of oil palm _Elaeis guineensis_ and _E. oleifera_

    Oil palm is the second largest source of edible oil and meets one-fifth of the global demand for oils and fats. Expressed sequence tag (EST) sequencing programs have provided a wealth of information, identifying novel genes from a broad range of organisms and indicating gene expression levels in particular tissues. They also provide the richest source of biologically useful SNPs, owing to the relatively high redundancy of gene sequences and the diversity of genotypes represented within databases. EST-based SNPs are potential molecular markers and aid in genetic improvement. A total of 21062 SNPs and 2053 indels were detected in _E. guineensis_, and 4955 SNPs and 1172 indels in _E. oleifera_. SNP (17.5/kbp) and indel (4.1/kbp) frequencies were higher in _E. oleifera_ than in _E. guineensis_ (16.8/kbp and 1.6/kbp). _E. oleifera_ showed a higher transition to transversion ratio (1.40) than _E. guineensis_ (1.02). The Ts/Tv ratios suggest that genetic divergence is occurring differently in the two species and that _E. guineensis_ has diverged more than _E. oleifera_. We provide the results of the study as an online database (http://riju.byethost31.com/oilpalm/) for use by oil palm breeders.
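
For readers unfamiliar with the transition to transversion (Ts/Tv) ratio reported above, the following is a minimal sketch of how such a ratio can be computed from a list of substitutions. A transition exchanges two purines (A, G) or two pyrimidines (C, T); every other substitution is a transversion. The function names and the toy input are illustrative assumptions, not the procedure or data used in the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(base_a, base_b):
    """True for A<->G or C<->T, False for any other substitution."""
    pair = {base_a.upper(), base_b.upper()}
    if len(pair) != 2 or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid substitution: {base_a!r} -> {base_b!r}")
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(substitutions):
    """Ratio of transitions to transversions for (base, base) pairs."""
    transitions = sum(1 for a, b in substitutions if is_transition(a, b))
    transversions = len(substitutions) - transitions
    if transversions == 0:
        raise ValueError("no transversions: ratio is undefined")
    return transitions / transversions


# Toy data (hypothetical): three transitions and two transversions.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(example))  # 1.5
```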

    QualitySNP: a pipeline for detecting single nucleotide polymorphisms and insertions/deletions in EST data from diploid and polyploid species

    BACKGROUND: Single nucleotide polymorphisms (SNPs) are important tools in studying complex genetic traits and genome evolution. Computational strategies for SNP discovery make use of the large number of sequences present in public databases (in most cases as expressed sequence tags (ESTs)) and are considered to be faster and more cost-effective than experimental procedures. A major challenge in computational SNP discovery is distinguishing allelic variation from sequence variation between paralogous sequences, in addition to recognizing sequencing errors. For the majority of the public EST sequences, trace or quality files are lacking which makes detection of reliable SNPs even more difficult because it has to rely on sequence comparisons only. RESULTS: We have developed a new algorithm to detect reliable SNPs and insertions/deletions (indels) in EST data, both with and without quality files. Implemented in a pipeline called QualitySNP, it uses three filters for the identification of reliable SNPs. Filter 1 screens for all potential SNPs and identifies variation between or within genotypes. Filter 2 is the core filter that uses a haplotype-based strategy to detect reliable SNPs. Clusters with potential paralogs as well as false SNPs caused by sequencing errors are identified. Filter 3 screens SNPs by calculating a confidence score, based upon sequence redundancy and quality. Non-synonymous SNPs are subsequently identified by detecting open reading frames of consensus sequences (contigs) with SNPs. The pipeline includes a data storage and retrieval system for haplotypes, SNPs and alignments. QualitySNP's versatility is demonstrated by the identification of SNPs in EST datasets from potato, chicken and humans. CONCLUSION: QualitySNP is an efficient tool for SNP detection, storage and retrieval in diploid as well as polyploid species. It is available for running on Linux or UNIX systems. 
The program, test data, and user manual are available at and as Additional files

    AutoSNPdb: an annotated single nucleotide polymorphism database for crop plants

    Single nucleotide polymorphisms (SNPs) may be considered the ultimate genetic marker as they represent the finest resolution of a DNA sequence (a single nucleotide), are generally abundant in populations and have a low mutation rate. Analysis of assembled EST sequence data provides a cost-effective means to identify large numbers of SNPs associated with functional genes. We have developed an integrated SNP discovery pipeline, which identifies SNPs from assembled EST sequences. The results are maintained in a custom relational database along with EST source and annotation information. The current database hosts data for the important crops rice, barley and Brassica. Users may rapidly identify polymorphic sequences of interest through BLAST sequence comparison, keyword searches of annotations derived from UniRef90 and GenBank comparisons, GO annotations or in genes corresponding to syntenic regions of reference genomes. In addition, SNPs between specific varieties may be identified for targeted mapping and association studies. SNPs are viewed using a user-friendly graphical interface. The database is freely accessible at http://autosnpdb.qfab.org.au/

    COMPUTATIONAL TOOLS TO DETECT SINGLE NUCLEOTIDE POLYMORPHISM (SNP) IN NUCLEOTIDE SEQUENCES: A REVIEW

    Single nucleotide polymorphisms (SNPs) are single base pair alterations in genomic DNA. SNPs are among the most common genetic markers in plant, animal and human genomes and are used to study complex genetic traits and the evolutionary status of the genome. They are popular markers because they occur throughout the genome, are highly reproducible and are relatively easy to score. In addition, SNPs in coding sequences are used to examine directly the genetics of expressed genes and to study various polymorphic functional traits. Non-synonymous SNPs are especially attractive because they alter the amino acid, ultimately affecting protein function. SNPs are applied directly in pharmacogenomics and crop improvement. Various strategies, both observational and computational, have been used for SNP discovery. SNPs can be detected by laboratory-based experimental methods, but these are time consuming and expensive, with high development costs. Bioinformatics approaches reduce the development cost of SNPs because they use publicly available sequences from databases, such as expressed sequence tags (ESTs), which makes the development of SNP markers rapid and less expensive.

    Discovery and application of insertion-deletion (INDEL) polymorphisms for QTL mapping of early life-history traits in Atlantic salmon

    BACKGROUND: For decades, linkage mapping has been one of the most powerful and widely used approaches for elucidating the genetic architecture of phenotypic traits of medical, agricultural and evolutionary importance. However, successful mapping of Mendelian and quantitative phenotypic traits depends critically on the availability of fast and preferably high-throughput genotyping platforms. Several array-based single nucleotide polymorphism (SNP) genotyping platforms have been developed for genetic model organisms in recent years, but most of these methods become prohibitively expensive for screening large numbers of individuals. Therefore, inexpensive, simple and flexible genotyping solutions that enable rapid screening of intermediate numbers of loci (~75-300) in hundreds to thousands of individuals are still needed for QTL mapping applications in a broad range of organisms. RESULTS: Here we describe the discovery and application of insertion-deletion (INDEL) polymorphisms for cost-efficient medium-throughput genotyping that enables analysis of >75 loci in a single automated sequencer electrophoresis column with standard laboratory equipment. Genotyping of INDELs requires low start-up costs, includes few standard sample handling steps and is applicable to a broad range of species for which expressed sequence tag (EST) collections are available. As a proof of principle, we generated a partial INDEL linkage map in Atlantic salmon (_Salmo salar_) and rapidly identified a number of quantitative trait loci (QTLs) affecting early life-history traits that are expected to have important fitness consequences in the natural environment. CONCLUSIONS: The INDEL genotyping enabled fast coarse-mapping of chromosomal regions containing QTL, thus providing an efficient means for characterization of genetic architecture in multiple crosses and large pedigrees. This enables not only the discovery of a larger number of QTLs with relatively smaller phenotypic effect but also provides a cost-effective means for evaluating the frequency of segregating QTLs in outbred populations, which is important for further understanding how genetic variation underlying phenotypic traits is maintained in the wild.

    GoSh: a goat and sheep ESTs database.

    Made available in DSpace on 2018-06-07. Previous issue date: 2008-02-16.

    Quality assessment parameters for EST-derived SNPs from catfish

    BACKGROUND: SNPs are abundant, codominantly inherited, and sequence-tagged markers. They are highly adaptable to large-scale automated genotyping and are therefore well suited to association studies and comparative genome analysis. However, discovery of SNPs requires genome sequencing efforts through whole genome sequencing or deep sequencing of reduced representation libraries. Such genome resources are not yet available for many species, including catfish. A large resource of ESTs is to become available in catfish, allowing identification of a large number of SNPs, but the reliability of EST-derived SNPs is relatively low because of sequencing errors. This project was designed to answer some of the questions relevant to quality assessment of EST-derived SNPs. RESULTS: Two factors were found to be most significant for validation of EST-derived SNPs: the contig size (number of sequences in the contig) and the minor allele sequence frequency. The larger the contigs were, the greater the validation rate, although the validation rate was reasonably high when the contigs contained four or more EST sequences with the minor allele sequence represented at least twice. Sequence quality surrounding the SNP under test is also crucially important. PCR extension appeared to be limited to a very short distance, prohibiting successful genotyping when an intron was present, a surprising finding. CONCLUSION: Stringent quality assessment measures should be used when working with EST-derived SNPs. In particular, contigs containing four or more ESTs should be used and the minor allele sequence should be represented at least twice. Genotyping primers should be designed from a single exon, completely avoiding introns. Application of such quality assessment measures, along with large resources of ESTs, should provide effective means for SNP identification in species where genome sequence resources are lacking.
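
The two thresholds recommended above (contigs of four or more ESTs, minor allele sequence represented at least twice) can be expressed as a simple filter. The sketch below is an illustration, not the authors' software. It assumes the input is the list of bases observed at the SNP position, one per EST in the contig, and it takes the second most frequent base as the minor allele. The function name and parameter names are hypothetical.

```python
from collections import Counter


def passes_quality_filter(bases_at_snp, min_contig_size=4, min_minor_count=2):
    """Apply the contig-size and minor-allele thresholds to one putative SNP.

    `bases_at_snp` holds one base per EST in the contig (assumed input
    format), e.g. "AAGG" or ["A", "A", "G", "G"].
    """
    counts = Counter(base.upper() for base in bases_at_snp)
    if len(bases_at_snp) < min_contig_size:
        return False  # contig too small to trust
    if len(counts) < 2:
        return False  # no polymorphism at this position
    # most_common() sorts by descending count; index 1 is the minor allele.
    minor_count = counts.most_common()[1][1]
    return minor_count >= min_minor_count


print(passes_quality_filter("AAGG"))  # four ESTs, minor allele seen twice
print(passes_quality_filter("AAAG"))  # minor allele seen only once
print(passes_quality_filter("AGG"))   # only three ESTs in the contig
```

Sequence quality around the SNP and intron avoidance in primer design, which the abstract also stresses, are outside the scope of this filter.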

    SNP-PHAGE – High throughput SNP discovery pipeline

    BACKGROUND: Single nucleotide polymorphisms (SNPs) as defined here are single base sequence changes or short insertions/deletions between or within individuals of a given species. As a result of their abundance and the availability of high throughput analysis technologies, SNP markers have begun to replace other traditional markers such as restriction fragment length polymorphisms (RFLPs), amplified fragment length polymorphisms (AFLPs) and simple sequence repeat (SSR or microsatellite) markers for fine mapping and association studies in several species. For SNP discovery from chromatogram data, several bioinformatics programs have to be combined to generate an analysis pipeline. Results have to be stored in a relational database to facilitate interrogation through queries or to generate data for further analyses such as determination of linkage disequilibrium and identification of common haplotypes. Although these tasks are routinely performed by several groups, an integrated open source SNP discovery pipeline that can be easily adapted by new groups interested in SNP marker development is currently unavailable. RESULTS: We developed SNP-PHAGE (SNP discovery Pipeline with additional features for identification of common haplotypes within a sequence tagged site (Haplotype Analysis) and GenBank (-dbSNP) submissions). This tool was applied to sequence traces from diverse soybean genotypes to discover over 10,000 SNPs. The package was developed on a UNIX/Linux platform, is written in Perl and uses a MySQL database. Scripts to generate a user-friendly web interface are also provided, with common queries for preliminary data analysis. A machine learning tool developed by this group for increasing the efficiency of SNP discovery is integrated into the package as an optional feature. The SNP-PHAGE package is being made available open source at . 
CONCLUSION: SNP-PHAGE provides a bioinformatics solution for high throughput SNP discovery, identification of common haplotypes within an amplicon, and GenBank (dbSNP) submissions. SNP selection and visualization are aided through a user-friendly web interface. This tool is useful for analyzing sequence tagged sites (STSs) of genomic sequences, and the software can serve as a starting point for groups interested in developing SNP markers.