86 research outputs found

    ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence

    Background: The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application. Results: The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real. Conclusions: ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf.comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin.
    Blanca Postigo, JM.; Pascual Bañuls, L.; Ziarsolo Areitioaurtena, P.; Nuez Viñals, F.; Cañizares Sales, J. (2011). Ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence. BMC Genomics. 12:1-8. doi:10.1186/1471-2164-12-285

    Sequencing, de novo annotation and analysis of the first Anguilla anguilla transcriptome: EeelBase opens new perspectives for the study of the critically endangered European eel

    Background: Once highly abundant, the European eel (Anguilla anguilla L.; Anguillidae; Teleostei) is considered to be critically endangered and on the verge of extinction, as the stock has declined by 90-99% since the 1980s. Yet, the species is poorly characterized at molecular level with little sequence information available in public databases. Results: The first European eel transcriptome was obtained by 454 FLX Titanium sequencing of a normalized cDNA library, produced from a pool of 18 glass eels (juveniles) from the French Atlantic coast and two sites in the Mediterranean coast. Over 310,000 reads were assembled in a total of 19,631 transcribed contigs, with an average length of 531 nucleotides. Overall 36% of the contigs were annotated to known protein/nucleotide sequences and 35 putative miRNA identified. Conclusions: This study represents the first transcriptome analysis for a critically endangered species. EeelBase, a dedicated database of annotated transcriptome sequences of the European eel is freely available at http://compgen.bio.unipd.it/eeelbase. Considering the multiple factors potentially involved in the decline of the European eel, including anthropogenic factors such as pollution and human-introduced diseases, our results will provide a rich source of data to discover and identify new genes, characterize gene expression, as well as for identification of genetic markers scattered across the genome to be used in various applications.

    Comparing de novo assemblers for 454 transcriptome data

    Background: Roche 454 pyrosequencing has become a method of choice for generating transcriptome data from non-model organisms. Once the tens to hundreds of thousands of short (250-450 base) reads have been produced, it is important to correctly assemble these to estimate the sequence of all the transcripts. Most transcriptome assembly projects use only one program for assembling 454 pyrosequencing reads, but there is no evidence that the programs used to date are optimal. We have carried out a systematic comparison of five assemblers (CAP3, MIRA, Newbler, SeqMan and CLC) to establish best practices for transcriptome assemblies, using a new dataset from the parasitic nematode Litomosoides sigmodontis. Results: Although no single assembler performed best on all our criteria, Newbler 2.5 gave longer contigs, better alignments to some reference sequences, and was fast and easy to use. SeqMan assemblies performed best on the criterion of recapitulating known transcripts, and had more novel sequence than the other assemblers, but generated an excess of small, redundant contigs. The remaining assemblers all performed almost as well, with the exception of Newbler 2.3 (the version currently used by most assembly projects), which generated assemblies that had significantly lower total length. As different assemblers use different underlying algorithms to generate contigs, we also explored merging of assemblies and found that the merged datasets not only aligned better to reference sequences than individual assemblies, but were also more consistent in the number and size of contigs. Conclusions: Transcriptome assemblies are smaller than genome assemblies and thus should be more computationally tractable, but are often harder because individual contigs can have highly variable read coverage. Comparing single assemblers, Newbler 2.5 performed best on our trial data set, but other assemblers were closely comparable. Combining differently optimal assemblies from different programs, however, gave a more credible final product, and this strategy is recommended.

    Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)

    Background: Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush. Results: cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples. Conclusion: We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggests the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches.

    Organizational interventions employing principles of complexity science have improved outcomes for patients with Type II diabetes

    Background: Despite the development of several models of care delivery for patients with chronic illness, consistent improvements in outcomes have not been achieved. These inconsistent results may be related less to the content of the models themselves than to their underlying conceptualization of clinical settings as linear, predictable systems. The science of complex adaptive systems (CAS) suggests that clinical settings are non-linear, and increasingly has been used as a framework for describing and understanding clinical systems. The purpose of this study is to broaden the conceptualization by examining the relationship between interventions that leverage CAS characteristics in intervention design and implementation, and effectiveness of reported outcomes for patients with Type II diabetes. Methods: We conducted a systematic review of the literature on organizational interventions to improve care of Type II diabetes. For each study we recorded measured process and clinical outcomes of diabetic patients. Two independent reviewers gave each study a score that reflected whether organizational interventions reflected one or more characteristics of a complex adaptive system. The effectiveness of the intervention was assessed by standardizing the scoring of the results of each study as 0 (no effect), 0.5 (mixed effect), or 1.0 (effective). Results: Out of 157 potentially eligible studies, 32 met our eligibility criteria. Most studies were felt to utilize at least one CAS characteristic in their intervention designs, and ninety-one percent were scored as either "mixed effect" or "effective." The number of CAS characteristics present in each intervention was associated with effectiveness (p = 0.002). Two individual CAS characteristics were associated with effectiveness: interconnections between participants and co-evolution. Conclusion: The significant association between CAS characteristics and effectiveness of reported outcomes for patients with Type II diabetes suggests that complexity science may provide an effective framework for designing and implementing interventions that lead to improved patient outcomes.

    Exploring the Switchgrass Transcriptome Using Second-Generation Sequencing Technology

    Background: Switchgrass (Panicum virgatum L.) is a C4 perennial grass and widely popular as an important bioenergy crop. To accelerate the pace of developing high yielding switchgrass cultivars adapted to diverse environmental niches, the generation of genomic resources for this plant is necessary. The large genome size and polyploid nature of switchgrass make whole genome sequencing a daunting task even with current technologies. Exploring the transcriptional landscape using next generation sequencing technologies provides a viable alternative to whole genome sequencing in switchgrass. Principal Findings: Switchgrass cDNA libraries from germinating seedlings, emerging tillers, flowers, and dormant seeds were sequenced using Roche 454 GS-FLX Titanium technology, generating 980,000 reads with an average read length of 367 bp. De novo assembly generated 243,600 contigs with an average length of 535 bp. Using the foxtail millet genome as a reference greatly improved the assembly and annotation of switchgrass ESTs. Comparative analysis of the 454-derived switchgrass EST reads with other sequenced monocots including Brachypodium, sorghum, rice and maize indicated a 70–80% overlap. RPKM analysis demonstrated unique transcriptional signatures of the four tissues analyzed in this study. More than 24,000 ESTs were identified in the dormant seed library. In silico analysis indicated that there are more than 2000 EST-SSRs in this collection. Expression of several orphan ESTs was confirmed by RT-PCR. Significance: We estimate that about 90% of the switchgrass gene space has been covered in this analysis. This study nearl

    De novo sequencing and characterization of floral transcriptome in two species of buckwheat (Fagopyrum)

    Background: Transcriptome sequencing data has become an integral component of modern genetics, genomics and evolutionary biology. However, despite advances in the technologies of DNA sequencing, such data are lacking for many groups of living organisms, in particular, many plant taxa. We present here the results of transcriptome sequencing for two closely related plant species. These species, Fagopyrum esculentum and F. tataricum, belong to the order Caryophyllales, a large group of flowering plants with uncertain evolutionary relationships. F. esculentum (common buckwheat) is also an important food crop. Despite these practical and evolutionary considerations, Fagopyrum species have not been the subject of large-scale sequencing projects. Results: Normalized cDNA corresponding to genes expressed in flowers and inflorescences of F. esculentum and F. tataricum was sequenced using the 454 pyrosequencing technology. This resulted in 267 thousand (F. esculentum) and 229 thousand (F. tataricum) reads with an average length of 341-349 nucleotides. De novo assembly of the reads produced about 25 thousand contigs for each species, with 7.5-8.2× coverage. Comparative analysis of the two transcriptomes demonstrated their overall similarity but also revealed genes that are presumably differentially expressed. Among them are retrotransposon genes and genes involved in sugar biosynthesis and metabolism. Thirteen single-copy genes were used for phylogenetic analysis; the resulting trees are largely consistent with those inferred from multigenic plastid datasets. The sister relationship of the Caryophyllales and asterids has now gained high support from nuclear gene sequences. Conclusions: 454 transcriptome sequencing and de novo assembly were performed for two congeneric flowering plant species, F. esculentum and F. tataricum. As a result, a large set of cDNA sequences that represent orthologs of known plant genes as well as potential new genes was generated.

    Transcriptome characterization of the South African abalone Haliotis midae using sequencing-by-synthesis

    Background: Worldwide, the genus Haliotis is represented by 56 extant species and several of these are commercially cultured. Among the six abalone species found in South Africa, Haliotis midae is the only aquacultured species. Despite its economic importance, genomic sequence resources for H. midae, and for abalone in general, are still scarce. Next generation sequencing technologies provide a fast and efficient tool to generate large sequence collections that can be used to characterize the transcriptome and identify expressed genes associated with economically important traits like growth and disease resistance. Results: More than 25 million short reads generated by the Illumina Genome Analyzer were de novo assembled into 22,761 contigs with an average size of 260 bp. With a stringent E-value threshold of 10^-10, 3,841 contigs (16.8%) had a BLAST homologous match against the Genbank non-redundant (NR) protein database. Most of these sequences were annotated using the gene ontology (GO) and eukaryotic orthologous groups of proteins (KOG) databases and assigned to various functional categories. According to annotation results, many gene families involved in immune response were identified. Thousands of simple sequence repeats (SSR) and single nucleotide polymorphisms (SNP) were detected. Setting stringent parameters to ensure a high probability of amplification, 420 primer pairs in 181 contigs containing SSR loci were designed. Conclusion: This data represents the most comprehensive genomic resource for the South African abalone H. midae to date. The amount of assembled sequences demonstrated the utility of the Illumina sequencing technology in the transcriptome characterization of a non-model species. It allowed the development of several markers and the identification of promising candidate genes for future studies on population and functional genomics in H. midae and in other abalone species.