
    SOLiD sequencing of four Vibrio vulnificus genomes enables comparative genomic analysis and identification of candidate clade-specific virulence genes

    Background: Vibrio vulnificus is the leading cause of reported death from consumption of seafood in the United States. Despite several decades of research on molecular pathogenesis, much remains to be learned about the mechanisms of virulence of this opportunistic bacterial pathogen. The two complete and annotated genomic DNA sequences of V. vulnificus belong to strains of clade 2, which is the predominant clade among clinical strains. Clade 2 strains generally possess higher virulence potential in animal models of disease than clade 1 strains, which predominate among environmental isolates. SOLiD sequencing of four V. vulnificus strains representing different clades (1 and 2) and biotypes (1 and 2) was used for comparative genomic analysis.
    Results: More than 4,100,000 bases were sequenced for each strain, yielding approximately 100-fold coverage for each of the four genomes. Although SOLiD genomic sequencing produced reads of only 35 nt, we were able to draw significant conclusions about the unique and shared sequences among the genomes, including the identification of single nucleotide polymorphisms (SNPs). Comparing the newly sequenced genomes with the existing reference genomes identified 3,459 core V. vulnificus genes shared among all six strains and 80 clade 2-specific genes. We identified 523,161 SNPs among the six genomes.
    Conclusions: Next-generation sequencing yielded substantial information about the genomic content of each strain. Flp pili, GGDEF proteins, and genomic island XII were identified as possible virulence factors because they are present in the virulent sequenced strains. Genomic comparisons also point toward the involvement of sialic acid catabolism in pathogenesis.
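    The "core" and "clade 2-specific" gene counts above come down to set operations over per-strain gene inventories. The sketch below is a minimal illustration under stated assumptions: genes are already matched across strains by a shared identifier (the actual study relied on read mapping against reference genomes), "core" means present in every strain, and "clade-specific" means present in every strain of the target clade and absent from all others. The function name, strain names, and gene IDs are hypothetical.

```python
from typing import Dict, Set, Tuple


def core_and_clade_specific(
    gene_sets: Dict[str, Set[str]],
    clade_of: Dict[str, int],
    target_clade: int,
) -> Tuple[Set[str], Set[str]]:
    """Hypothetical helper: return (core genes, genes specific to target_clade).

    Assumed definitions:
      core           = genes present in every strain
      clade-specific = genes present in every target-clade strain and
                       absent from every strain outside that clade
    """
    # Core genome: intersection of all per-strain gene sets.
    core = set.intersection(*gene_sets.values())

    in_clade = [g for s, g in gene_sets.items() if clade_of[s] == target_clade]
    out_clade = [g for s, g in gene_sets.items() if clade_of[s] != target_clade]

    # Genes shared by all members of the target clade ...
    shared_in_clade = set.intersection(*in_clade)
    # ... minus anything seen in any strain outside it.
    seen_outside = set().union(*out_clade)
    return core, shared_in_clade - seen_outside


# Toy example with made-up strains and gene IDs.
genes = {
    "strainA": {"g1", "g2", "g3", "flp"},
    "strainB": {"g1", "g2", "g3", "flp"},
    "strainC": {"g1", "g2", "g4"},
}
clades = {"strainA": 2, "strainB": 2, "strainC": 1}
core, clade2 = core_and_clade_specific(genes, clades, target_clade=2)
print(sorted(core))    # ['g1', 'g2']
print(sorted(clade2))  # ['flp', 'g3']
```

    A stricter or looser definition (for example, allowing a gene to be missing from one clade member because of low coverage) would change only the set expressions, not the overall approach.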

    Partitioning Transcript Variation in Drosophila: Abundance, Isoforms, and Alleles

    Multilevel analysis of transcription is facilitated by a new array design that includes modules for assessment of differential expression, isoform usage, and allelic imbalance in Drosophila. The ∼2.5-million-feature chip incorporates a large number of controls, and it contains 18,769 3′ expression probe sets and 61,919 exon probe sets with probe sequences from Drosophila melanogaster and 60,118 SNP probe sets focused on Drosophila simulans. An experiment in D. simulans identified genes differentially expressed between males and females (34% in the 3′ expression module; 32% in the exon module). These proportions are consistent with previous reports, and there was good agreement (κ = 0.63) between the modules. Alternative isoform usage between the sexes was identified for 164 genes. The SNP module was verified with resequencing data: concordance between resequencing and the chip design was greater than 99%. The design also proved effective at separating alleles based on hybridization intensity, with concordance between the highest hybridization signals and the expected alleles in the genotype greater than 96%. Intriguingly, allelic imbalance was detected for 37% of the 6,579 examined probe sets that contained heterozygous SNP loci. The large number of probes and multiple probe sets per gene in the 3′ expression and exon modules allow the array to be used in D. melanogaster and in closely related species. The SNP module can be used for allele-specific expression analysis and genotyping of D. simulans.
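    The agreement figure κ = 0.63 is a kappa statistic, which corrects raw agreement for agreement expected by chance: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed fraction of matching calls and p_e is the fraction expected if the two modules called independently. The sketch below computes Cohen's kappa for two binary call lists. It assumes each module yields one "differentially expressed: yes/no" call per gene and that the two lists are aligned gene by gene; the abstract does not specify exactly how the authors computed κ, so treat this as an illustration of the statistic only. The function name and toy data are hypothetical.

```python
from typing import Sequence


def cohens_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two aligned lists of binary calls (hypothetical helper)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("need two non-empty call lists of equal length")
    n = len(a)
    # Observed agreement: fraction of genes where both modules agree.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Marginal "yes" rates for each module.
    pa = sum(a) / n
    pb = sum(b) / n
    # Chance agreement if the modules called independently.
    p_e = pa * pb + (1 - pa) * (1 - pb)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Toy calls for 10 genes from two modules (made-up data).
exon_calls = [True, True, True, True, False, False, False, False, False, False]
expr_calls = [True, True, True, False, True, False, False, False, False, False]
print(round(cohens_kappa(exon_calls, expr_calls), 3))  # 0.583
```

    Here the modules agree on 8 of 10 genes (p_o = 0.8), but because each calls 40% of genes positive, chance alone would give p_e = 0.52, so κ is noticeably lower than the raw agreement.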

    Ensembl Genomes 2016: more genomes, more complexity

    Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data, including reference sequence, gene models, transcriptional data, genetic variation and comparative analysis. This paper updates previous publications about the resource, with a focus on recent developments. These include new analyses and views to represent polyploid genomes (of which bread wheat is the primary exemplar), and the continued up-scaling of the resource, which now includes over 23 000 bacterial genomes, 400 fungal genomes and 100 protist genomes, in addition to 55 genomes from invertebrate metazoa and 39 genomes from plants. This dramatic increase in the number of included genomes is one part of a broader effort to automate the integration of archival data (genome sequence, but also associated RNA sequence data and variant calls) within the context of reference genomes and to make it available through the Ensembl user interfaces.

    Assembly and validation of the genome of the nonmodel basal angiosperm Amborella

    Genome sequencing with next-generation sequencing (NGS) technologies can now be applied to organisms that are pivotal to addressing fundamental biological questions but whose genomes were previously considered intractable or too expensive to sequence. However, for species with large and complex genomes, extensive genetic and physical map resources have, until now, been required to direct the sequencing effort and sequence assembly. As these resources are unavailable for most species, assembling high-quality genome sequences from NGS data remains challenging. We describe a strategy that uses NGS, fluorescence in situ hybridization, and whole-genome mapping to assemble a high-quality genome sequence for Amborella trichopoda, a nonmodel species crucial to understanding flowering plant evolution. These methods are applicable to many other organisms with limited genomic resources.

    TransPLANT resources for triticeae genomic data

    The genome sequences of many important Triticeae species, including bread wheat (Triticum aestivum L.) and barley (Hordeum vulgare L.), remained uncharacterized for a long time because of their high repeat content, large sizes, and polyploidy. As a result of improvements in sequencing technologies and novel analysis strategies, several of these genomes have recently been deciphered. These efforts have generated new insights into Triticeae biology and genome organization and have important implications for downstream use by breeders, experimental biologists, and comparative genomicists. transPLANT (http://www.transplantdb.eu) is an EU-funded project aimed at constructing hardware, software, and data infrastructure for genome-scale research in the life sciences. Since the Triticeae data are intrinsically complex, heterogeneous, and distributed, the transPLANT consortium has undertaken efforts to develop common data formats and tools that enable the exchange and integration of data from distributed resources. Here we present an overview of the individual Triticeae genome resources hosted by transPLANT partners, introduce the objectives of transPLANT, and outline common developments and interfaces supporting integrated data access.