65 research outputs found

    High quality draft sequences for prokaryotic genomes using a mix of new sequencing technologies

    Background: Massively parallel DNA sequencing instruments are enabling the decoding of whole genomes at significantly lower cost and higher throughput than classical Sanger technology. Each of these technologies has been estimated to yield assemblies with more problematic features than the standard method. These problems differ in nature depending on the technique used, so an appropriate mix of technologies may help resolve most difficulties and eventually provide high-quality assemblies without requiring any Sanger-based input. Results: We compared assemblies obtained using Sanger data with those obtained from different inputs from new sequencing technologies. The assemblies were systematically compared with a finished reference sequence. We found that the 454 GSFLX can efficiently produce high continuity when used at high coverage. The potential to enhance continuity by scaffolding was tested using 454 sequences from circularized genomic fragments. Finally, we explored the use of Solexa/Illumina short reads to polish the genome draft by implementing a technique to correct 454 consensus errors. Conclusion: High-quality drafts can be produced for small genomes without any Sanger data input. We found that 454 GSFLX and Solexa/Illumina show great complementarity in producing large contigs and supercontigs with a low error rate.

    Pilot Anopheles gambiae full-length cDNA study: sequencing and initial characterization of 35,575 clones

    We describe the preliminary analysis of over 35,000 clones from a full-length enriched cDNA library from the malaria mosquito vector Anopheles gambiae. The clones define nearly 3,700 genes, of which around 2,600 significantly improve current gene definitions. An additional 17% of the genes were not previously annotated, suggesting that an equal percentage may be missing from the current Anopheles genome annotation.

    The Eukaryote Genome Annotation Platform at Genoscope

    The Genoscope annotation workflow for eukaryote genomes relies on evidence from ab initio gene model predictions combined with homology searches, using collections of expressed sequences (full-length cDNAs, ESTs or massive-scale mRNA sequences from the same or closely related organisms), proteins or other genomic sequences. For draft or complete sequences, both approaches are then combined by integrating the gene prediction data with GAZE, which can identify a majority of the existing gene features. Although of very good quality, the gene models remain tentative at the end of the process. Even though computational predictors are useful for large-scale annotation in global genomic analyses, there is no complete genome for which all gene structures, in terms of exons, introns and coding regions, have been experimentally confirmed. Finished genomes can provide exciting insights into genome organization and evolution. Additional experimental data generated by genome sequencing projects assist genome annotation aimed at a better understanding of the biology of the organism. Gene models and annotation can therefore be improved by human curation, which finds errors and resolves incongruous evidence in the automatic annotation of the genome. In addition to our automated gene prediction pipeline, we now provide collaborators carrying out sequencing projects with a distributed annotation platform that allows expert evaluation of the annotation. To encourage the widest participation of the scientific community, a tool for revising annotations has been set up using components of the Generic Model Organism Database toolkit, which provides tools for managing organism databases. A CHADO database, linked to an Apollo graphical interface, permits users to correct gene structures and store them in a dedicated organism database, as we will show with a few examples. Such a tool would facilitate connecting and comparing predicted annotations with existing biological data, becoming the repository of the complete annotated finished genome sequence.

    A complete collection of single-gene deletion mutants of Acinetobacter baylyi ADP1

    We have constructed a collection of single-gene deletion mutants for all dispensable genes of the soil bacterium Acinetobacter baylyi ADP1. A total of 2594 deletion mutants were obtained, whereas 499 (16%) were not, and are therefore candidate essential genes for life on minimal medium. This essentiality data set is 88% consistent with the Escherichia coli data set inferred from the Keio mutant collection profiled for growth on minimal medium, while 80% of the orthologous genes described as essential in Pseudomonas aeruginosa are also essential in ADP1. Several strategies were undertaken to investigate ADP1 metabolism: (1) searching for discrepancies between our essentiality data and current metabolic knowledge, (2) comparing this essentiality data set to those from other organisms, and (3) systematic phenotyping of the mutant collection on a variety of carbon sources (quinate, 2,3-butanediol, glucose, etc.). This collection provides a new resource for the study of gene function by forward and reverse genetic approaches and constitutes a robust experimental data source for systems biology approaches.

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome, including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    Assessing the Drosophila melanogaster and Anopheles gambiae Genome Annotations Using Genome-Wide Sequence Comparisons

    We performed genome-wide sequence comparisons at the protein coding level between the genome sequences of Drosophila melanogaster and Anopheles gambiae. Such comparisons detect evolutionarily conserved regions (ecores) that can be used for a qualitative and quantitative evaluation of the available annotations of both genomes. They also provide novel candidate features for annotation. The percentage of ecores mapping outside annotations in the A. gambiae genome is about fourfold higher than in D. melanogaster. The A. gambiae genome assembly also contains a high proportion of duplicated ecores, possibly resulting from artefactual sequence duplications in the genome assembly. The occurrence of 4063 ecores in the D. melanogaster genome outside annotations suggests that some genes are not yet or only partially annotated. The present work illustrates the power of comparative genomics approaches towards an exhaustive and accurate establishment of gene models and gene catalogues in insect genomes.