256 research outputs found

    Sequence verification of synthetic DNA by assembly of sequencing reads

    Get PDF
    Gene synthesis attempts to assemble user-defined DNA sequences with base-level precision. Verifying the sequences of construction intermediates and the final product of a gene synthesis project is a critical part of the workflow, yet one that has received the least attention. Sequence validation is equally important for other kinds of curated clone collections. Ensuring that the physical sequence of a clone matches its published sequence is a common quality control step performed at least once over the course of a research project. GenoREAD is a web-based application that breaks the sequence verification process into two steps: the assembly of sequencing reads and the alignment of the resulting contig with a reference sequence. GenoREAD can determine if a clone matches its reference sequence. Its sophisticated reporting features help identify and troubleshoot problems that arise during the sequence verification process. GenoREAD has been experimentally validated on thousands of gene-sized constructs from an ORFeome project, and on longer sequences including whole plasmids and synthetic chromosomes. Comparing GenoREAD results with those from manual analysis of the sequencing data demonstrates that GenoREAD tends to be conservative in its diagnostic. GenoREAD is available at www.genoread.or

    Sequence verification of synthetic DNA by assembly of sequencing reads

    Get PDF
    This is the publisher’s final pdf. The published article is copyrighted by Oxford University Press and can be found at: http://www.oxfordjournals.org/Gene synthesis attempts to assemble user-defined DNA sequences with base-level precision. Verifying the sequences of construction intermediates and the final product of a gene synthesis project is a critical part of the workflow, yet one that has received the least attention. Sequence validation is equally important for other kinds of curated clone collections. Ensuring that the physical sequence of a clone matches its published sequence is a common quality control step performed at least once over the course of a research project. GenoREAD is a web-based application that breaks the sequence verification process into two steps: the assembly of sequencing reads and the alignment of the resulting contig with a reference sequence. GenoREAD can determine if a clone matches its reference sequence. Its sophisticated reporting features help identify and troubleshoot problems that arise during the sequence verification process. GenoREAD has been experimentally validated on thousands of gene-sized constructs from an ORFeome project, and on longer sequences including whole plasmids and synthetic chromosomes. Comparing GenoREAD results with those from manual analysis of the sequencing data demonstrates that GenoREAD tends to be conservative in its diagnostic. GenoREAD is available at www.genoread.org

    Estimating DNA coverage and abundance in metagenomes using a gamma approximation

    Get PDF
    Motivation: Shotgun sequencing generates large numbers of short DNA reads from either an isolated organism or, in the case of metagenomics projects, from the aggregate genome of a microbial community. These reads are then assembled based on overlapping sequences into larger, contiguous sequences (contigs). The feasibility of assembly and the coverage achieved (reads per nucleotide or distinct sequence of nucleotides) depend on several factors: the number of reads sequenced, the read length and the relative abundances of their source genomes in the microbial community. A low coverage suggests that most of the genomic DNA in the sample has not been sequenced, but it is often difficult to estimate either the extent of the uncaptured diversity or the amount of additional sequencing that would be most efficacious. In this work, we regard a metagenome as a population of DNA fragments (bins), each of which may be covered by one or more reads. We employ a gamma distribution to model this bin population due to its flexibility and ease of use. When a gamma approximation can be found that adequately fits the data, we may estimate the number of bins that were not sequenced and that could potentially be revealed by additional sequencing. We evaluated the performance of this model using simulated metagenomes and demonstrate its applicability on three recent metagenomic datasets

    4Pipe4-A 454 data analysis pipeline for SNP detection in datasets with no reference sequence or strain information

    Get PDF
    This work was fully supported by projects SOBREIRO/0036/2009 (under the framework of the Cork Oak ESTs Consortium), PTDC/BIA-BEC/098783/2008 and PTDC/AGR-GPL/119943/2010 from Fundação para a Ciência e Tecnologia (FCT) – Portugal. F. Pina-Martins was funded by FCT grant SFRH/BD/51411/2011, under the PhD program “Biology and Ecology of Global Changes”, Univ. Aveiro & Univ. Lisbon, Portugal. D. Batista was funded by FCT grant SFRH/BPD/104629/2014

    Gene expression atlas of fruit ripening and transcriptome assembly from RNA-seq data in octoploid strawberry (Fragaria × ananassa)

    Get PDF
    RNA-seq has been used to perform global expression analysis of the achene and the receptacle at four stages of fruit ripening, and of the roots and leaves of strawberry (Fragaria × ananassa). About 967 million reads and 191 Gb of sequence were produced, using Illumina sequencing. Mapping the reads in the related genome of the wild diploid Fragaria vesca revealed differences between the achene and receptacle development program, and reinforced the role played by ethylene in the ripening receptacle. For the strawberry transcriptome assembly, a de novo strategy was followed, generating separate assemblies for each of the ten tissues and stages sampled. The Trinity program was used for these assemblies, resulting in over 1.4 M isoforms. Filtering by a threshold of 0.3 FPKM, and doing Blastx (E-value < 1 e-30) against the UniProt database of plants reduced the number to 472,476 isoforms. Their assembly with the MIRA program (90% homology) resulted in 26,087 contigs. From these, 91.34 percent showed high homology to Fragaria vesca genes and 87.30 percent Fragaria iinumae (BlastN E-value < 1 e-100). Mapping back the reads on the MIRA contigs identified polymorphisms at nucleotide level, using FREEBAYES, as well as estimate their relative abundance in each sample

    Spinning Gland Transcriptomics from Two Main Clades of Spiders (Order: Araneae) - Insights on Their Molecular, Anatomical and Behavioral Evolution

    Get PDF
    Characterized by distinctive evolutionary adaptations, spiders provide a comprehensive system for evolutionary and developmental studies of anatomical organs, including silk and venom production. Here we performed cDNA sequencing using massively parallel sequencers (454 GS-FLX Titanium) to generate ∼80,000 reads from the spinning gland of Actinopus spp. (infraorder: Mygalomorphae) and Gasteracantha cancriformis (infraorder: Araneomorphae, Orbiculariae clade). Actinopus spp. retains primitive characteristics on web usage and presents a single undifferentiated spinning gland while the orbiculariae spiders have seven differentiated spinning glands and complex patterns of web usage. MIRA, Celera Assembler and CAP3 software were used to cluster NGS reads for each spider. CAP3 unigenes passed through a pipeline for automatic annotation, classification by biological function, and comparative transcriptomics. Genes related to spider silks were manually curated and analyzed. Although a single spidroin gene family was found in Actinopus spp., a vast repertoire of specialized spider silk proteins was encountered in orbiculariae. Astacin-like metalloproteases (meprin subfamily) were shown to be some of the most sampled unigenes and duplicated gene families in G. cancriformis since its evolutionary split from mygalomorphs. Our results confirm that the evolution of the molecular repertoire of silk proteins was accompanied by the (i) anatomical differentiation of spinning glands and (ii) behavioral complexification in the web usage. Finally, a phylogenetic tree was constructed to cluster most of the known spidroins in gene clades. This is the first large-scale, multi-organism transcriptome for spider spinning glands and a first step into a broad understanding of spider web systems biology and evolution

    Cryptic species in a well-known habitat: applying taxonomics to the amphipod genus Epimeria (Crustacea, Peracarida)

    Get PDF
    Taxonomy plays a central role in biological sciences. It provides a communication system for scientists as it aims to enable correct identification of the studied organisms. As a consequence, species descriptions should seek to include as much available information as possible at species level to follow an integrative concept of ‘taxonomics’. Here, we describe the cryptic species Epimeria frankei sp. nov. from the North Sea, and also redescribe its sister species, Epimeria cornigera. The morphological information obtained is substantiated by DNA barcodes and complete nuclear 18S rRNA gene sequences. In addition, we provide, for the first time, full mitochondrial genome data as part of a metazoan species description for a holotype, as well as the neotype. This study represents the first successful implementation of the recently proposed concept of taxonomics, using data from highthroughput technologies for integrative taxonomic studies, allowing the highest level of confidence for both biodiversity and ecological research

    Broad Phylogenomic Sampling and the Sister Lineage of Land Plants

    Get PDF
    The tremendous diversity of land plants all descended from a single charophyte green alga that colonized the land somewhere between 430 and 470 million years ago. Six orders of charophyte green algae, in addition to embryophytes, comprise the Streptophyta s.l. Previous studies have focused on reconstructing the phylogeny of organisms tied to this key colonization event, but wildly conflicting results have sparked a contentious debate over which lineage gave rise to land plants. The dominant view has been that ‘stoneworts,’ or Charales, are the sister lineage, but an alternative hypothesis supports the Zygnematales (often referred to as “pond scum”) as the sister lineage. In this paper, we provide a well-supported, 160-nuclear-gene phylogenomic analysis supporting the Zygnematales as the closest living relative to land plants. Our study makes two key contributions to the field: 1) the use of an unbiased method to collect a large set of orthologs from deeply diverging species and 2) the use of these data in determining the sister lineage to land plants. We anticipate this updated phylogeny not only will hugely impact lesson plans in introductory biology courses, but also will provide a solid phylogenetic tree for future green-lineage research, whether it be related to plants or green algae
    corecore