5 research outputs found

    Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome

    Get PDF
    Background: The structure annotation of a genome is based either on ab initio methodologies or on similaritiy searches versus molecules that have been already annotated. Ab initio gene predictions in a genome are based on a priori knowledge of species-specific features of genes. The training of ab initio gene finders is based on the definition of a data-set of gene models. To accomplish this task the common approach is to align species-specific full length cDNA and EST sequences along the genomic sequences in order to define exon/intron structure of mRNA coding genes. Results: GeneModelEST is the software here proposed for defining a data-set of candidate gene models using exclusively evidence derived from cDNA/EST sequences. GeneModelEST requires the genome coordinates of the spliced-alignments of ESTs and of contigs (tentative consensus sequences) generated by an EST clustering/assembling procedure to be formatted in a General Feature Format (GFF) standard file. Moreover, the alignments of the contigs versus a protein database are required as an NCBI BLAST formatted report file. The GeneModelEST analysis aims to i) evaluate each exon as defined from contig spliced alignments onto the genome sequence; ii) classify the contigs according to quality levels in order to select candidate gene models; iii) assign to the candidate gene models preliminary functional annotations. We discuss the application of the proposed methodology to build a data-set of gene models of Solanum lycopersicum, whose genome sequencing is an ongoing effort by the International Tomato Genome Sequencing Consortium. Conclusion: The contig classification procedure used by GeneModelEST supports the detection of candidate gene models, the identification of potential alternative transcripts and it is useful to filter out ambiguous information. An automated procedure, such as the one proposed here, is fundamental to support large scale analysis in order to provide species-specific gene models, that could be useful as a training data-set for ab initio gene finders and/or as a reference gene list for a human curated annotation

    ISOL@: an Italian SOLAnaceae genomics resource

    Get PDF
    BACKGROUND: Present-day '-omics' technologies produce overwhelming amounts of data which include genome sequences, information on gene expression (transcripts and proteins) and on cell metabolic status. These data represent multiple aspects of a biological system and need to be investigated as a whole to shed light on the mechanisms which underpin the system functionality.The gathering and convergence of data generated by high-throughput technologies, the effective integration of different data-sources and the analysis of the information content based on comparative approaches are key methods for meaningful biological interpretations.In the frame of the International Solanaceae Genome Project, we propose here ISOLA, an Italian SOLAnaceae genomics resource. RESULTS: ISOLA (available at http://biosrv.cab.unina.it/isola) represents a trial platform and it is conceived as a multi-level computational environment.ISOLA currently consists of two main levels: the genome and the expression level. The cornerstone of the genome level is represented by the Solanum lycopersicum genome draft sequences generated by the International Tomato Genome Sequencing Consortium. Instead, the basic element of the expression level is the transcriptome information from different Solanaceae species, mainly in the form of species-specific comprehensive collections of Expressed Sequence Tags (ESTs).The cross-talk between the genome and the expression levels is based on data source sharing and on tools that enhance data quality, that extract information content from the levels' under parts and produce value-added biological knowledge. CONCLUSIONS: ISOLA is the result of a bioinformatics effort that addresses the challenges of the post-genomics era. It is designed to exploit '-omics' data based on effective integration to acquire biological knowledge and to approach a systems biology view. Beyond providing experimental biologists with a preliminary annotation of the tomato genome, this effort aims to produce a trial computational environment where different aspects and details are maintained as they are relevant for the analysis of the organization, the functionality and the evolution of the Solanaceae family

    Structural and Functional Genomics of Tomato

    Get PDF
    Tomato (Solanum lycopersicum L.) is the most intensively investigated Solanaceous species both in genetic and genomics studies. It is a diploid species with a haploid set of 12 chromosomes and a small genome (950 Mb). Based on the detailed knowledge on tomato structural genomics, the sequencing of the euchromatic regions started in the year 2005 as a common effort of different countries. The manuscript focuses on markers used for tomato, on mapping efforts mainly based on exploitation of natural biodiversity, and it gives an updated report on the international sequencing activities. The principal tools developed to explore the function of tomato genes are also summarized, including mutagenesis, genetic transformation, and transcriptome analysis. The current progress in bioinformatic strategies available to manage the overwhelming amount of data generated from different tomato “omics” approaches is reported, and emphasis is given to the effort of producing a computational workbench for the analysis of the organization, as well as the functionality and evolution of the Solanaceae family

    Gene Models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome.

    No full text
    Background: The structure annotation of a genome is based either on ab initio methodologies or on similaritiy searches versus molecules that have been already annotated. Ab initio gene predictions in a genome are based on a priori knowledge of species-specific features of genes. The training of ab initio gene finders is based on the definition of a data-set of gene models. To accomplish this task the common approach is to align species-specific full length cDNA and EST sequences along the genomic sequences in order to define exon/intron structure of mRNA coding genes.Results: GeneModelEST is the software here proposed for defining a data-set of candidate gene models using exclusively evidence derived from cDNA/EST sequences.GeneModelEST requires the genome coordinates of the spliced-alignments of ESTs and of contigs (tentative consensus sequences) generated by an EST clustering/assembling procedure to be formatted in a General Feature Format (GFF) standard file. Moreover, the alignments of the contigs versus a protein database are required as an NCBI BLAST formatted report file.The GeneModelEST analysis aims to i) evaluate each exon as defined from contig spliced alignments onto the genome sequence; ii) classify the contigs according to quality levels in order to select candidate gene models; iii) assign to the candidate gene models preliminary functional annotations.We discuss the application of the proposed methodology to build a data-set of gene models of Solanum lycopersicum, whose genome sequencing is an ongoing effort by the International Tomato Genome Sequencing Consortium.Conclusion: The contig classification procedure used by GeneModelEST supports the detection of candidate gene models, the identification of potential alternative transcripts and it is useful to filter out ambiguous information. An automated procedure, such as the one proposed here, is fundamental to support large scale analysis in order to provide species-specific gene models, that could be useful as a training data-set for ab initio gene finders and/or as a reference gene list for a human curated annotation
    corecore