Archivio della ricerca - Università degli studi di Napoli Federico II

Transcriptologs: A Transcriptome-Based Approach to Predict Orthology Relationships

Author: Ambrosino Luca
Chiusano Maria Luisa
Publication venue: 'SAGE Publications'
Publication date: 01/01/2017

The detection of orthologs is a key approach in genomics, useful for understanding gene evolution and phylogenetic relationships and essential for gene function prediction. However, reliable annotation of the encoded protein regions is still a limiting aspect in genomics, mainly because confirmatory experimental evidence at the proteome level is lacking. Nevertheless, current ortholog collections are generally based on protein sequence comparisons, despite the availability of large transcriptome sequence collections. We developed Transcriptologs, a method for predicting orthologs based on similarities between translated fragments of messenger RNAs from 2 species. We implemented a procedure to extend BLAST-based alignments and to define orthologs using the Bidirectional Best Hit approach. Results from a test case on Arabidopsis thaliana and Sorghum bicolor transcript collections showed that Transcriptologs in some cases outperforms a classical protein-based analysis in terms of alignment quality, revealing similarities not otherwise detectable.
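The Bidirectional Best Hit criterion named in the abstract can be illustrated with a minimal sketch. It is not the Transcriptologs implementation: the hit tuples, the identifiers and the use of a single score per hit are assumptions made for illustration, and the alignment-extension step is not modelled. A pair is reported only when each sequence is the other's top-scoring hit.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical hit record: (query_id, subject_id, score).
Hit = Tuple[str, str, float]


def best_hits(hits: Iterable[Hit]) -> Dict[str, str]:
    """Return the highest-scoring subject for each query (first hit wins ties)."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]
) -> Set[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return {(a, b) for a, b in best_ab.items() if best_ba.get(b) == a}


# Invented example data, not results from the paper.
a_vs_b = [("At1", "Sb1", 250.0), ("At1", "Sb2", 90.0),
          ("At2", "Sb2", 120.0), ("At3", "Sb1", 80.0)]
b_vs_a = [("Sb1", "At1", 245.0), ("Sb2", "At1", 100.0),
          ("Sb2", "At2", 60.0)]
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

In this invented example only At1 and Sb1 select each other, so At2 and At3 are left without a predicted ortholog.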

Directory of Open Access Journals

Archivio della ricerca - Università degli studi di Napoli Federico II

ParPEST: a pipeline for EST data analysis based on parallel computing

Author: Aversano Mario
Chiusano Maria Luisa
D'Agostino Nunzio
Publication venue: BioMed Central
Publication date: 01/01/2005

Background: Expressed Sequence Tags (ESTs) are short and error-prone DNA sequences generated from the 5' and 3' ends of randomly selected cDNA clones. They provide an important resource for comparative and functional genomic studies and, moreover, represent reliable information for the annotation of genomic sequences. Because of advances in biotechnology, ESTs are produced daily as large datasets. Therefore, suitable and efficient bioinformatic approaches are necessary to organize the data and their related information content for further investigation. Results: We implemented ParPEST (Parallel Processing of ESTs), a pipeline based on parallel computing for EST analysis. The results are organized in a suitable data warehouse to provide a starting point for mining expressed sequence datasets. The collected information is useful for investigating data quality and data information content, and is enriched by a preliminary functional annotation. Conclusion: The pipeline presented here has been developed to perform an exhaustive and reliable analysis of EST data and to provide a curated set of information based on a relational database. Moreover, it is designed to reduce the execution time of the specific steps required for a complete analysis by using distributed processes and parallelized software. It is conceived to run on hardware with low requirements, to meet the increasing demand typical of this kind of data, and to scale at affordable cost.

Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome

Author: Chiusano Maria Luisa
D'Agostino Nunzio
Traini Alessandra
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

Background: The structural annotation of a genome is based either on ab initio methodologies or on similarity searches against molecules that have already been annotated. Ab initio gene predictions in a genome are based on a priori knowledge of species-specific features of genes. The training of ab initio gene finders is based on the definition of a data-set of gene models. To accomplish this task, the common approach is to align species-specific full-length cDNA and EST sequences along the genomic sequences in order to define the exon/intron structure of mRNA coding genes. Results: GeneModelEST is the software proposed here for defining a data-set of candidate gene models using exclusively evidence derived from cDNA/EST sequences. GeneModelEST requires the genome coordinates of the spliced alignments of ESTs and of contigs (tentative consensus sequences) generated by an EST clustering/assembling procedure, formatted as a General Feature Format (GFF) standard file. Moreover, the alignments of the contigs versus a protein database are required as an NCBI BLAST formatted report file. The GeneModelEST analysis aims to i) evaluate each exon as defined from contig spliced alignments onto the genome sequence; ii) classify the contigs according to quality levels in order to select candidate gene models; iii) assign preliminary functional annotations to the candidate gene models. We discuss the application of the proposed methodology to build a data-set of gene models of Solanum lycopersicum, whose genome sequencing is an ongoing effort by the International Tomato Genome Sequencing Consortium. Conclusion: The contig classification procedure used by GeneModelEST supports the detection of candidate gene models and the identification of potential alternative transcripts, and it is useful for filtering out ambiguous information. An automated procedure, such as the one proposed here, is fundamental to support large-scale analysis in order to provide species-specific gene models that could be useful as a training data-set for ab initio gene finders and/or as a reference gene list for a human-curated annotation.
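The GFF input described above can be illustrated with a short sketch that groups exon coordinates by contig and derives the implied introns. It is not GeneModelEST code. It assumes the standard nine tab-separated GFF columns and a GFF3-style `Parent=` attribute naming the contig; the actual attribute layout expected by GeneModelEST is not stated in the abstract, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # 1-based inclusive (start, end), as in GFF


def exons_by_contig(gff_text: str) -> Dict[str, List[Span]]:
    """Collect exon spans per contig, sorted by start coordinate."""
    exons: Dict[str, List[Span]] = defaultdict(list)
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue  # keep only well-formed exon features
        # Assumed attribute syntax: key=value pairs separated by ';'
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        parent = attrs.get("Parent")
        if parent is not None:
            exons[parent].append((int(cols[3]), int(cols[4])))
    return {contig: sorted(spans) for contig, spans in exons.items()}


def introns(spans: List[Span]) -> List[Span]:
    """Gaps between consecutive sorted exons."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:])
    ]


# Invented example records, not data from the paper.
example_gff = (
    "chr1\test2genome\texon\t300\t450\t.\t+\t.\tParent=contig1\n"
    "chr1\test2genome\texon\t100\t200\t.\t+\t.\tParent=contig1\n"
    "chr1\test2genome\texon\t900\t990\t.\t-\t.\tParent=contig2\n"
)
models = exons_by_contig(example_gff)
```

Here `introns(models["contig1"])` yields the single gap between the two exons of the invented contig, while the single-exon contig has no introns.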

Archivio della ricerca - Università degli studi di Napoli Federico II

ParPEST: a pipeline for EST data analysis based on parallel computing

Author: Aversano Mario
Chiusano Maria Luisa
D'Agostino Nunzio
Publication venue: BioMed Central
Publication date: 01/01/2005

Archivio della ricerca - Università degli studi di Napoli Federico II

SolEST database: a "one-stop shop" approach to the study of Solanaceae transcriptomes

Author: Chiusano Maria Luisa
D'Agostino Nunzio
Frusciante Luigi
Traini Alessandra
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Since no genome sequences of solanaceous plants have yet been completed, expressed sequence tag (EST) collections represent a reliable tool for broad sampling of Solanaceae transcriptomes, an attractive route for understanding Solanaceae genome functionality and a powerful reference for the structural annotation of emerging Solanaceae genome sequences. Description: We describe the SolEST database (http://biosrv.cab.unina.it/solestdb), which integrates different EST datasets from both cultivated and wild Solanaceae species and from two species of the genus Coffea. Background as well as processed data contained in the database, extensively linked to external related resources, represent an invaluable source of information for these plant families. Two novel features differentiate SolEST from other resources: i) the option of accessing and then visualizing Solanaceae EST/TC alignments along the emerging tomato and potato genome sequences; ii) the opportunity to compare different Solanaceae assemblies generated by diverse research groups, in an attempt to address a common complaint in the SOL community. Conclusion: Different databases have been established worldwide for collecting Solanaceae ESTs and are related in concept, content and utility to the one presented herein. However, the SolEST database has several distinguishing features that make it appealing to the research community and provide a "one-stop shop" for the study of Solanaceae transcriptomes.

Directory of Open Access Journals

Gene models from ESTs (GeneModelEST): an application on the Solanum lycopersicum genome

Author: Chiusano Maria Luisa
D'Agostino Nunzio
Frusciante Luigi
Traini Alessandra
Publication venue: BioMed Central
Publication date: 01/01/2007

An EST database from saffron stigmas

Author: Chiusano Maria Luisa
D'Agostino Nunzio
Giuliano Giovanni
Pizzichini Daniele
Publication venue: BioMed Central
Publication date: 01/01/2007

BACKGROUND: Saffron (Crocus sativus L., Iridaceae) flowers have been used as a spice and medicinal plant ever since the Greek-Minoan civilization. The edible part, the stigmas, is commonly considered the most expensive spice in the world and is the site of a peculiar secondary metabolism, responsible for the characteristic color and flavor of saffron. RESULTS: We produced 6,603 high quality Expressed Sequence Tags (ESTs) from a saffron stigma cDNA library. This collection is accessible and searchable through the Saffron Genes database (http://www.saffrongenes.org). The ESTs have been grouped into 1,893 Clusters, each corresponding to a different expressed gene, and annotated. The complete set of raw EST sequences, as well as their electropherograms, is maintained in the database, allowing users to investigate sequence quality and EST structural features (vector contamination, repeat regions). The saffron stigma transcriptome contains a series of interesting sequences (putative sex determination genes, lipid and carotenoid metabolism enzymes, transcription factors). CONCLUSION: The Saffron Genes database represents the first reference collection for the genomics of Iridaceae, for the molecular biology of stigma biogenesis, and for the metabolic pathways underlying saffron secondary metabolism.

Archivio della ricerca - Università degli studi di Napoli Federico II

TomatEST database: in silico exploitation of EST data to explore expression patterns in tomato species

Author: Aversano Mario
Chiusano Maria Luisa
D'Agostino Nunzio
Frusciante Luigi
Publication venue: Oxford University Press
Publication date: 16/11/2006

TomatEST is a secondary database integrating expressed sequence tag (EST)/cDNA sequence information from different libraries of multiple tomato species. Redundant EST collections from each species are organized into clusters (gene indices). A cluster consists of one or multiple contigs. Multiple contigs in a cluster represent alternatively transcribed forms of a gene. The set of stand-alone EST sequences (singletons) and contigs, representing all the computationally defined ‘Transcript Indices’, is annotated by similarity to protein and RNA family databases. Sequence function description is integrated with the Gene Ontologies and the Enzyme Commission identifiers for a standard classification of gene products and for the mapping of the expressed sequences onto metabolic pathways. Information on the origin of the ESTs, on their structural features, on clusters and contigs, and on functional annotations is accessible via a user-friendly web interface. Specific facilities in the database allow Transcript Indices from a query to be automatically classified into enzyme classes and metabolic pathways. The ‘on the fly’ mapping onto the metabolic maps is integrated in the analytical tools. The TomatEST database website is freely available at