213 research outputs found

    NexGenEx-Tom: A gene expression platform to investigate the functionalities of the tomato genome

    BACKGROUND: Next Generation Sequencing (NGS) technologies unexpectedly pushed forward the capability of solving genome organization and of widely depicting gene expression. However, despite the flourishing of tools to process NGS data, versatile and user-friendly computational environments for integrative and comparative analyses of the results from the increasing number of collections are still required. The gene expression of tomato tissues has been widely investigated over the years, thanks to both EST sequencing and different microarray platforms. However, the resulting collections are heterogeneous in terms of experimental approaches, genotypes and conditions, making the data far from representing a gene expression atlas for the species. Therefore, the recent release of NGS transcriptome collections from several tissues and stages from physiological conditions for specific tomato genotypes provides a relevant resource to be appropriately exploited to address key questions on gene expression patterns, such as those related to fruit ripening and development in tomato. The organization of the results from the processed collections in web-accessible environments, enriched with tools for their exploration, may represent a precious opportunity for scientific research in tomato and a reference example for similar efforts. DESCRIPTION: Here we present the architecture and the facilities of NexGenEx-, a web-based platform that offers processed NGS transcriptome collections and enables immediate analyses of the results. The platform allows gene expression investigations, profiling and comparisons, and exploits different resources. Specifically, we present here the platform partition NexGenEx-Tom, dedicated to the organization of results from tomato NGS-based transcriptomes. CONCLUSION: In the current version, NexGenEx-Tom includes processed and normalized NGS expression data from three collections covering several tissues/stages from different genotypes. Beyond providing a user-friendly interface, the platform was designed to be easily expanded to include other NGS-based transcriptome collections. It can also integrate different genome releases, possibly from different cultivars or genotypes, or even from different species. The platform is proposed as an example effort in tomato, and is described as a profitable approach for the exploitation of these challenging and precious datasets.

    Transcriptologs: A Transcriptome-Based Approach to Predict Orthology Relationships

    The detection of orthologs is a key approach in genomics, useful to understand gene evolution and phylogenetic relationships and essential for gene function prediction. However, a reliable annotation of the encoded protein regions is still a limiting aspect in genomics, mainly due to the lack of confirmatory experimental evidence at proteome level. Nevertheless, the current ortholog collections are generally based on protein sequence comparisons, in addition to the availability of large transcriptome sequence collections. We developed Transcriptologs, a method for the prediction of orthologs based on similarities of translated fragments from messenger RNAs of two species. We implemented a procedure to extend BLAST-based alignments and to define orthologs based on the Bidirectional Best Hit approach. Results from a test case on Arabidopsis thaliana and Sorghum bicolor transcript collections showed that Transcriptologs in some cases outperformed a classical protein-based analysis in terms of alignment quality, revealing similarities not otherwise detectable.
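
The Bidirectional Best Hit criterion named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the Transcriptologs implementation: the function names, the (query, subject, score) tuple layout and the toy scores are hypothetical, and the alignment-extension step the abstract mentions is not modelled.

```python
from typing import Dict, Iterable, List, Tuple

# Assumed input layout: one tuple per alignment hit,
# (query_id, subject_id, score), where a higher score is better.
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Map each query to its single highest-scoring subject."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Toy data (hypothetical identifiers and scores).
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 150.0),
          ("a2", "b2", 90.0), ("a3", "b1", 80.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a2", 95.0), ("b2", "a1", 90.0)]

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In the toy data, a3 has b1 as its best hit, but b1's best hit is a1, so a3 is not paired. Only reciprocal pairs are reported as candidate orthologs.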

    ParPEST: a pipeline for EST data analysis based on parallel computing

    Background: Expressed Sequence Tags (ESTs) are short and error-prone DNA sequences generated from the 5' and 3' ends of randomly selected cDNA clones. They provide an important resource for comparative and functional genomic studies and, moreover, represent reliable information for the annotation of genomic sequences. Because of the advances in biotechnologies, ESTs are generated daily in the form of large datasets. Therefore, suitable and efficient bioinformatic approaches are necessary to organize the information content of the data for further investigations. Results: We implemented ParPEST (Parallel Processing of ESTs), a pipeline based on parallel computing for EST analysis. The results are organized in a suitable data warehouse to provide a starting point to mine expressed sequence datasets. The collected information is useful for investigations on data quality and on data information content, enriched also by a preliminary functional annotation. Conclusion: The pipeline presented here has been developed to perform an exhaustive and reliable analysis on EST data and to provide a curated set of information based on a relational database. Moreover, it is designed to reduce the execution time of the specific steps required for a complete analysis using distributed processes and parallelized software. It is conceived to run on hardware with modest requirements, to meet the increasing demand typical of the data used, and to scale at affordable cost.

    Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome

    Background: The structural annotation of a genome is based either on ab initio methodologies or on similarity searches versus molecules that have already been annotated. Ab initio gene predictions in a genome are based on a priori knowledge of species-specific features of genes. The training of ab initio gene finders is based on the definition of a data-set of gene models. To accomplish this task the common approach is to align species-specific full length cDNA and EST sequences along the genomic sequences in order to define the exon/intron structure of mRNA coding genes. Results: GeneModelEST is the software proposed here for defining a data-set of candidate gene models using exclusively evidence derived from cDNA/EST sequences. GeneModelEST requires the genome coordinates of the spliced alignments of ESTs and of contigs (tentative consensus sequences) generated by an EST clustering/assembling procedure to be formatted in a General Feature Format (GFF) standard file. Moreover, the alignments of the contigs versus a protein database are required as an NCBI BLAST formatted report file. The GeneModelEST analysis aims to i) evaluate each exon as defined from contig spliced alignments onto the genome sequence; ii) classify the contigs according to quality levels in order to select candidate gene models; iii) assign to the candidate gene models preliminary functional annotations. We discuss the application of the proposed methodology to build a data-set of gene models of Solanum lycopersicum, whose genome sequencing is an ongoing effort by the International Tomato Genome Sequencing Consortium. Conclusion: The contig classification procedure used by GeneModelEST supports the detection of candidate gene models and the identification of potential alternative transcripts, and it is useful for filtering out ambiguous information. An automated procedure, such as the one proposed here, is fundamental to support large scale analysis in order to provide species-specific gene models that could be useful as a training data-set for ab initio gene finders and/or as a reference gene list for a human curated annotation.

    SolEST database: a "one-stop shop" approach to the study of Solanaceae transcriptomes

    BACKGROUND: Since no genome sequences of solanaceous plants have yet been completed, expressed sequence tag (EST) collections represent a reliable tool for broad sampling of Solanaceae transcriptomes, an attractive route for understanding Solanaceae genome functionality and a powerful reference for the structural annotation of emerging Solanaceae genome sequences. DESCRIPTION: We describe the SolEST database http://biosrv.cab.unina.it/solestdb which integrates different EST datasets from both cultivated and wild Solanaceae species and from two species of the genus Coffea. Background as well as processed data contained in the database, extensively linked to external related resources, represent an invaluable source of information for these plant families. Two novel features differentiate SolEST from other resources: i) the option of accessing and then visualizing Solanaceae EST/TC alignments along the emerging tomato and potato genome sequences; ii) the opportunity to compare different Solanaceae assemblies generated by diverse research groups in the attempt to address a common complaint in the SOL community. CONCLUSION: Different databases have been established worldwide for collecting Solanaceae ESTs and are related in concept, content and utility to the one presented herein. However, the SolEST database has several distinguishing features that make it appealing for the research community and facilitates a "one-stop shop" for the study of Solanaceae transcriptomes.

    An EST database from saffron stigmas

    BACKGROUND: Saffron (Crocus sativus L., Iridaceae) flowers have been used as a spice and medicinal plant ever since the Greek-Minoan civilization. The edible part - the stigmas - is commonly considered the most expensive spice in the world and is the site of a peculiar secondary metabolism, responsible for the characteristic color and flavor of saffron. RESULTS: We produced 6,603 high quality Expressed Sequence Tags (ESTs) from a saffron stigma cDNA library. This collection is accessible and searchable through the Saffron Genes database (http://www.saffrongenes.org). The ESTs have been grouped into 1,893 Clusters, each corresponding to a different expressed gene, and annotated. The complete set of raw EST sequences, as well as their electropherograms, is maintained in the database, allowing users to investigate sequence qualities and EST structural features (vector contamination, repeat regions). The saffron stigma transcriptome contains a series of interesting sequences (putative sex determination genes, lipid and carotenoid metabolism enzymes, transcription factors). CONCLUSION: The Saffron Genes database represents the first reference collection for the genomics of Iridaceae, for the molecular biology of stigma biogenesis, as well as for the metabolic pathways underlying saffron secondary metabolism.

    TomatEST database: in silico exploitation of EST data to explore expression patterns in tomato species

    TomatEST is a secondary database integrating expressed sequence tag (EST)/cDNA sequence information from different libraries of multiple tomato species. Redundant EST collections from each species are organized into clusters (gene indices). A cluster consists of one or multiple contigs. Multiple contigs in a cluster represent alternatively transcribed forms of a gene. The set of stand-alone EST sequences (singletons) and contigs, representing all the computationally defined ‘Transcript Indices’, is annotated according to similarity versus protein and RNA family databases. Sequence function description is integrated with the Gene Ontologies and the Enzyme Commission identifiers for a standard classification of gene products and for the mapping of the expressed sequences onto metabolic pathways. Information on the origin of the ESTs, on their structural features, on clusters and contigs, as well as on functional annotations is accessible via a user-friendly web interface. Specific facilities in the database allow Transcript Indices from a query to be automatically classified in Enzyme classes and in metabolic pathways. The ‘on the fly’ mapping onto the metabolic maps is integrated in the analytical tools. The TomatEST database website is freely available at