330 research outputs found

    Genetic Competence Drives Genome Diversity in Bacillus subtilis

    Get PDF
    This deposit is composed by the main article plus the supplementary materials of the publication.Prokaryote genomes are the result of a dynamic flux of genes, with increases achieved via horizontal gene transfer and reductions occurring through gene loss. The ecological and selective forces that drive this genomic flexibility vary across species. Bacillus subtilis is a naturally competent bacterium that occupies various environments, including plant-associated, soil, and marine niches, and the gut of both invertebrates and vertebrates. Here, we quantify the genomic diversity of B. subtilis and infer the genome dynamics that explain the high genetic and phenotypic diversity observed. Phylogenomic and comparative genomic analyses of 42 B. subtilis genomes uncover a remarkable genome diversity that translates into a core genome of 1,659 genes and an asymptotic pangenome growth rate of 57 new genes per new genome added. This diversity is due to a large proportion of low-frequency genes that are acquired from closely related species. We find no gene-loss bias among wild isolates, which explains why the cloud genome, 43% of the species pangenome, represents only a small proportion of each genome. We show that B. subtilis can acquire xenologous copies of core genes that propagate laterally among strains within a niche. While not excluding the contributions of other mechanisms, our results strongly suggest a process of gene acquisition that is largely driven by competence, where the long-term maintenance of acquired genes depends on local and global fitness effects. This competence-driven genomic diversity provides B. subtilis with its generalist character, enabling it to occupy a wide range of ecological niches and cycle through them.Fundação para a Ciência e a Tecnologia grants: (PTDC/EBB-BIO/119006/2010, PEst-OE/EQB/LA0004/2011, SFRH/BPD/89907/2012, SFRH/BD/29397/06); FEDER grant: (LISBOA-01-0145-FEDER-007660).info:eu-repo/semantics/publishedVersio

    4Pipe4-A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information

    Get PDF
    This work was fully supported by projects SOBREIRO/0036/2009 (under the framework of the Cork Oak ESTs Consortium), PTDC/BIA-BEC/098783/2008 and PTDC/AGR-GPL/119943/2010 from Fundação para a Ciência e Tecnologia (FCT) – Portugal. F. Pina-Martins was funded by FCT grant SFRH/BD/51411/2011, under the PhD program “Biology and Ecology of Global Changes”, Univ. Aveiro & Univ. Lisbon, Portugal. D. Batista was funded by FCT grant SFRH/BPD/104629/2014

    Next generation transcriptomes for next generation genomes using est2assembly

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>The decreasing costs of capillary-based Sanger sequencing and next generation technologies, such as 454 pyrosequencing, have prompted an explosion of transcriptome projects in non-model species, where even shallow sequencing of transcriptomes can now be used to examine a range of research questions. This rapid growth in data has outstripped the ability of researchers working on non-model species to analyze and mine transcriptome data efficiently.</p> <p>Results</p> <p>Here we present a semi-automated platform '<it>est2assembly</it>' that processes raw sequence data from Sanger or 454 sequencing into a hybrid <it>de-novo </it>assembly, annotates it and produces GMOD compatible output, including a SeqFeature database suitable for GBrowse. Users are able to parameterize assembler variables, judge assembly quality and determine the optimal assembly for their specific needs. We used <it>est2assembly </it>to process <it>Drosophila </it>and <it>Bicyclus </it>public Sanger EST data and then compared them to published 454 data as well as eight new insect transcriptome collections.</p> <p>Conclusions</p> <p>Analysis of such a wide variety of data allows us to understand how these new technologies can assist EST project design. We determine that assembler parameterization is as essential as standardized methods to judge the output of ESTs projects. Further, even shallow sequencing using 454 produces sufficient data to be of wide use to the community. <it>est2assembly </it>is an important tool to assist manual curation for gene models, an important resource in their own right but especially for species which are due to acquire a genome project using Next Generation Sequencing.</p

    ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence

    Get PDF
    Background: The possibilities offered by next generation sequencing (NGS) platforms are revolutionizing biotechnological laboratories. Moreover, the combination of NGS sequencing and affordable high-throughput genotyping technologies is facilitating the rapid discovery and use of SNPs in non-model species. However, this abundance of sequences and polymorphisms creates new software needs. To fulfill these needs, we have developed a powerful, yet easy-to-use application. Results: The ngs_backbone software is a parallel pipeline capable of analyzing Sanger, 454, Illumina and SOLiD (Sequencing by Oligonucleotide Ligation and Detection) sequence reads. Its main supported analyses are: read cleaning, transcriptome assembly and annotation, read mapping and single nucleotide polymorphism (SNP) calling and selection. In order to build a truly useful tool, the software development was paired with a laboratory experiment. All public tomato Sanger EST reads plus 14.2 million Illumina reads were employed to test the tool and predict polymorphism in tomato. The cleaned reads were mapped to the SGN tomato transcriptome obtaining a coverage of 4.2 for Sanger and 8.5 for Illumina. 23,360 single nucleotide variations (SNVs) were predicted. A total of 76 SNVs were experimentally validated, and 85% were found to be real. Conclusions: ngs_backbone is a new software package capable of analyzing sequences produced by NGS technologies and predicting SNVs with great accuracy. In our tomato example, we created a highly polymorphic collection of SNVs that will be a useful resource for tomato researchers and breeders. The software developed along with its documentation is freely available under the AGPL license and can be downloaded from http://bioinf. comav.upv.es/ngs_backbone/ or http://github.com/JoseBlanca/franklin.Blanca Postigo, JM.; Pascual Bañuls, L.; Ziarsolo Areitioaurtena, P.; Nuez Viñals, F.; Cañizares Sales, J. (2011). Ngs_backbone: a pipeline for read cleaning, mapping and SNP calling using Next Generation Sequence. BMC Genomics. 12:1-8. doi:10.1186/1471-2164-12-285S1812Metzker ML: Sequencing technologies - the next generation. Nature Reviews Genetics. 2010, 11 (1): 31-46. 10.1038/nrg2626.454 sequencing. [ http://www.454.com/ ]Illumina Inc. [ http://www.illumina.com/ ]Flicek P, Birney E: Sense from sequence reads: methods for alignment and assembly (vol 6, pg S6, 2009). Nature Methods. 2010, 7 (6): 479-479.Chevreux B, Pfisterer T, Drescher B, Driesel AJ, Muller WEG, Wetter T, Suhai S: Using the miraEST assembler for reliable and automated mRNA transcript assembly and SNP detection in sequenced ESTs. Genome Research. 2004, 14 (6): 1147-1159. 10.1101/gr.1917404.Li H, Durbin R: Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009, 25 (14): 1754-1760. 10.1093/bioinformatics/btp324.Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biology. 2009, 10 (3):Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data P: The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009, 25 (16): 2078-2079. 10.1093/bioinformatics/btp352.1000 Genomes. A deep Catalog of Human Genetic Variation. [ http://1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcf4.0 ]The seqanswers internet forum. [ http://seqanswers.com/ ]Blankenberg D, Taylor J, Schenck I, He JB, Zhang Y, Ghent M, Veeraraghavan N, Albert I, Miller W, Makova KD, Ross CH, Nekrutenko A: A framework for collaborative analysis of ENCODE data: Making large-scale analyses biologist-friendly. Genome Research. 2007, 17 (6): 960-964. 10.1101/gr.5578007.CloVR Automated Sequence Analysis from Your Desktop. [ http://clovr.org/ ]Papanicolaou A, Stierli R, Ffrench-Constant RH, Heckel DG: Next generation transcriptomes for next generation genomes using est2assembly. Bmc Bioinformatics. 2009, 10:Applied Biosystems by life technologies. [ http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing.html ]Wall PK, Leebens-Mack J, Chanderbali AS, Barakat A, Wolcott E, Liang HY, Landherr L, Tomsho LP, Hu Y, Carlson JE, Ma H, Schuster SC, Soltis DE, Soltis PS, Altman N, dePamphilis CW: Comparison of next generation sequencing technologies for transcriptome characterization. Bmc Genomics. 2009, 10:Murchison EP, Tovar C, Hsu A, Bender HS, Kheradpour P, Rebbeck CA, Obendorf D, Conlan C, Bahlo M, Blizzard CA, Pyecroft S, Kreiss A, Kellis M, Stark A, Harkins TT, Marshall Graves JA, Woods GM, Hanon GJ, Papenfuss AT: The Tasmanian Devil Transcriptome Reveals Schwann Cell Origins of a Clonally Transmissible Cancer. Science. 2010, 327 (5961): 84-87. 10.1126/science.1180616.Parchman TL, Geist KS, Grahnen JA, Benkman CW, Buerkle CA: Transcriptome sequencing in an ecologically important tree species: assembly, annotation, and marker discovery. Bmc Genomics. 2010, 11:Babik W, Stuglik M, Qi W, Kuenzli M, Kuduk K, Koteja P, Radwan J: Heart transcriptome of the bank vole (Myodes glareolus): towards understanding the evolutionary variation in metabolic rate. BMC Genomics. 2010, 11: 390-10.1186/1471-2164-11-390.Miller JC, Tanksley SD: RFLP analysis of phylogenetic-relationships and genetic-variation in the genus Lycopersicon. Theoretical and Applied Genetics. 1990, 80 (4): 437-448.Williams CE, Stclair DA: Phenetic relationships and levels of variability detected by restriction-fragment-length-polymorphism and random amplified polymorphic DNA analysis of cultivated and wild accessions of Lycopersicon-esculentum. Genome. 1993, 36 (3): 619-630. 10.1139/g93-083.Rick CM: Tomato, Lycopersicon esculentum (Solanaceae). Evolution of crop plants. Edited by: Simmonds NW. 1976, London: Longman Group, 268-273.Labate JA, Baldo AM: Tomato SNP discovery by EST mining and resequencing. Molecular Breeding. 2005, 16 (4): 343-349. 10.1007/s11032-005-1911-5.Yano K, Watanabe M, Yamamoto N, Maeda F, Tsugane T, Shibata D: Expressed sequence tags (EST) database of a miniature tomato cultivar, Micro-Tom. Plant and Cell Physiology. 2005, 46: S139-S139.Jimenez-Gomez JM, Maloof JN: Sequence diversity in three tomato species: SNPs, markers, and molecular evolution. Bmc Plant Biology. 2009, 9:Yang WC, Bai XD, Kabelka E, Eaton C, Kamoun S, van der Knaap E, Francis D: Discovery of single nucleotide polymorphisms in Lycopersicon esculentum by computer aided analysis of expressed sequence tags. Molecular Breeding. 2004, 14 (1): 21-34.Van Deynze A, Stoffel K, Buell CR, Kozik A, Liu J, van der Knaap E, Francis D: Diversity in conserved genes in tomato. Bmc Genomics. 2007, 8:Sim SC, Robbins MD, Chilcott C, Zhu T, Francis DM: Oligonucleotide array discovery of polymorphisms in cultivated tomato (Solanum lycopersicum L.) reveals patterns of SNP variation associated with breeding. Bmc Genomics. 2009, 10:Bioinformatics at COMAV. [ http://bioinf.comav.upv.es/ngs_backbone/index.html ]Broad institute. [ http://www.broadinstitute.org/igv ]Bioinformatics at COMAV. [ http://bioinf.comav.upv.es/ngs_backbone/install.html ]Github social coding. [ http://github.com/JoseBlanca/franklin ]Chou HH, Holmes MH: DNA sequence quality trimming and vector removal. Bioinformatics. 2001, 17 (12): 1093-1104. 10.1093/bioinformatics/17.12.1093.Picard. [ http://picard.sourceforge.net/index.shtml ]McKenna A, Hanna M, Banks E, Sivachenko A, Citulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA: The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Research. 2010, 20: 1297-1303. 10.1101/gr.107524.110.Sol Genomics Network. [ ftp://ftp.solgenomics.net/ ]NCBI Genbank. [ http://www.ncbi.nlm.nih.gov/genbank/ ]Gundry CN, Vandersteen JG, Reed GH, Pryor RJ, Chen J, Wittwer CT: Amplicon melting analysis with labeled primers: A closed-tube method for differentiating homozygotes and heterozygotes. Clinical Chemistry. 2003, 49 (3): 396-406. 10.1373/49.3.396

    Broad Phylogenomic Sampling and the Sister Lineage of Land Plants

    Get PDF
    The tremendous diversity of land plants all descended from a single charophyte green alga that colonized the land somewhere between 430 and 470 million years ago. Six orders of charophyte green algae, in addition to embryophytes, comprise the Streptophyta s.l. Previous studies have focused on reconstructing the phylogeny of organisms tied to this key colonization event, but wildly conflicting results have sparked a contentious debate over which lineage gave rise to land plants. The dominant view has been that ‘stoneworts,’ or Charales, are the sister lineage, but an alternative hypothesis supports the Zygnematales (often referred to as “pond scum”) as the sister lineage. In this paper, we provide a well-supported, 160-nuclear-gene phylogenomic analysis supporting the Zygnematales as the closest living relative to land plants. Our study makes two key contributions to the field: 1) the use of an unbiased method to collect a large set of orthologs from deeply diverging species and 2) the use of these data in determining the sister lineage to land plants. We anticipate this updated phylogeny not only will hugely impact lesson plans in introductory biology courses, but also will provide a solid phylogenetic tree for future green-lineage research, whether it be related to plants or green algae

    High-throughput sequencing and analysis of the gill tissue transcriptome from the deep-sea hydrothermal vent mussel Bathymodiolus azoricus

    Get PDF
    © The Authors, 2010. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in BMC Genomics 11 (2010): 559, doi:10.1186/1471-2164-11-559.Bathymodiolus azoricus is a deep-sea hydrothermal vent mussel found in association with large faunal communities living in chemosynthetic environments at the bottom of the sea floor near the Azores Islands. Investigation of the exceptional physiological reactions that vent mussels have adopted in their habitat, including responses to environmental microbes, remains a difficult challenge for deep-sea biologists. In an attempt to reveal genes potentially involved in the deep-sea mussel innate immunity we carried out a high-throughput sequence analysis of freshly collected B. azoricus transcriptome using gills tissues as the primary source of immune transcripts given its strategic role in filtering the surrounding waterborne potentially infectious microorganisms. Additionally, a substantial EST data set was produced and from which a comprehensive collection of genes coding for putative proteins was organized in a dedicated database, "DeepSeaVent" the first deep-sea vent animal transcriptome database based on the 454 pyrosequencing technology. A normalized cDNA library from gills tissue was sequenced in a full 454 GS-FLX run, producing 778,996 sequencing reads. Assembly of the high quality reads resulted in 75,407 contigs of which 3,071 were singletons. A total of 39,425 transcripts were conceptually translated into amino-sequences of which 22,023 matched known proteins in the NCBI non-redundant protein database, 15,839 revealed conserved protein domains through InterPro functional classification and 9,584 were assigned with Gene Ontology terms. Queries conducted within the database enabled the identification of genes putatively involved in immune and inflammatory reactions which had not been previously evidenced in the vent mussel. Their physical counterpart was confirmed by semi-quantitative quantitative Reverse-Transcription-Polymerase Chain Reactions (RT-PCR) and their RNA transcription level by quantitative PCR (qPCR) experiments. We have established the first tissue transcriptional analysis of a deep-sea hydrothermal vent animal and generated a searchable catalog of genes that provides a direct method of identifying and retrieving vast numbers of novel coding sequences which can be applied in gene expression profiling experiments from a non-conventional model organism. This provides the most comprehensive sequence resource for identifying novel genes currently available for a deep-sea vent organism, in particular, genes putatively involved in immune and inflammatory reactions in vent mussels. The characterization of the B. azoricus transcriptome will facilitate research into biological processes underlying physiological adaptations to hydrothermal vent environments and will provide a basis for expanding our understanding of genes putatively involved in adaptations processes during post-capture long term acclimatization experiments, at "sea-level" conditions, using B. azoricus as a model organism.We acknowledge the Portuguese Foundation for Science and Technology, FCT-Lisbon and the Regional Azorean Directorate for Science and Technology, DRCT-Azores, for pluri-annual and programmatic PIDDAC and FEDER funding to IMAR/DOP Research Unit #531 and the Associated Laboratory #9 (ISR-Lisboa); the Luso-American Foundation FLAD (Project L-V- 173/2006); the Biotechnology and Biomedicine Institute of the Azores (IBBA), project M.2.1.2/I/029/2008-BIODEEPSEA and the project n° FCOMP-01-0124- FEDER-007376 (ref: FCT PTDC/MAR/65991/2006-IMUNOVENT; coordinated by RB) under the auspices of the COMPETE program

    Sequencing, de novo annotation and analysis of the first Anguilla anguilla transcriptome: EeelBase opens new perspectives for the study of the critically endangered european eel

    Get PDF
    Background: Once highly abundant, the European eel (Anguilla anguilla L.; Anguillidae; Teleostei) is considered to be critically endangered and on the verge of extinction, as the stock has declined by 90-99% since the 1980s. Yet, the species is poorly characterized at molecular level with little sequence information available in public databases.\ud \ud Results: The first European eel transcriptome was obtained by 454 FLX Titanium sequencing of a normalized cDNA library, produced from a pool of 18 glass eels (juveniles) from the French Atlantic coast and two sites in the Mediterranean coast. Over 310,000 reads were assembled in a total of 19,631 transcribed contigs, with an average length of 531 nucleotides. Overall 36% of the contigs were annotated to known protein/nucleotide sequences and 35 putative miRNA identified.\ud \ud Conclusions: This study represents the first transcriptome analysis for a critically endangered species. EeelBase, a dedicated database of annotated transcriptome sequences of the European eel is freely available at http://compgen.bio.unipd.it/eeelbase. Considering the multiple factors potentially involved in the decline of the European eel, including anthropogenic factors such as pollution and human-introduced diseases, our results will provide a rich source of data to discover and identify new genes, characterize gene expression, as well as for identification of genetic markers scattered across the genome to be used in various applications

    Spinning Gland Transcriptomics from Two Main Clades of Spiders (Order: Araneae) - Insights on Their Molecular, Anatomical and Behavioral Evolution

    Get PDF
    Characterized by distinctive evolutionary adaptations, spiders provide a comprehensive system for evolutionary and developmental studies of anatomical organs, including silk and venom production. Here we performed cDNA sequencing using massively parallel sequencers (454 GS-FLX Titanium) to generate ∼80,000 reads from the spinning gland of Actinopus spp. (infraorder: Mygalomorphae) and Gasteracantha cancriformis (infraorder: Araneomorphae, Orbiculariae clade). Actinopus spp. retains primitive characteristics on web usage and presents a single undifferentiated spinning gland while the orbiculariae spiders have seven differentiated spinning glands and complex patterns of web usage. MIRA, Celera Assembler and CAP3 software were used to cluster NGS reads for each spider. CAP3 unigenes passed through a pipeline for automatic annotation, classification by biological function, and comparative transcriptomics. Genes related to spider silks were manually curated and analyzed. Although a single spidroin gene family was found in Actinopus spp., a vast repertoire of specialized spider silk proteins was encountered in orbiculariae. Astacin-like metalloproteases (meprin subfamily) were shown to be some of the most sampled unigenes and duplicated gene families in G. cancriformis since its evolutionary split from mygalomorphs. Our results confirm that the evolution of the molecular repertoire of silk proteins was accompanied by the (i) anatomical differentiation of spinning glands and (ii) behavioral complexification in the web usage. Finally, a phylogenetic tree was constructed to cluster most of the known spidroins in gene clades. This is the first large-scale, multi-organism transcriptome for spider spinning glands and a first step into a broad understanding of spider web systems biology and evolution
    corecore