713 research outputs found

    EST-PAC a web package for EST annotation and protein sequence prediction

    With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. To address these issues, we present EST-PAC, a web-oriented multi-platform software package for expressed sequence tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using BLAST services, 2) predicting protein coding sequence from EST data, and 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results, and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics.

    EST-PAC HPC - a web portal for high-throughput EST annotation and protein sequence prediction

    Expressed Sequence Tags (ESTs) are short DNA sequences generated by sequencing cDNAs transcribed from expressed genes. They can provide significant functional, structural and evolutionary information and thus are a primary resource for gene discovery. EST annotation refers to the analysis of unknown ESTs, which can be performed by database similarity searches for possible identities and database searches for functional prediction of translation products. This kind of annotation typically consists of a series of repetitive tasks that should be automated, customizable and amenable to using distributed computing resources. Furthermore, EST data should be processed efficiently on a high performance computing platform. In this paper, we describe an EST annotator, EST-PAC HPC, which has been developed to harness HPC resources, potentially from Grid and Cloud systems, for high-throughput EST annotation. The performance analysis of EST-PAC HPC has shown that it provides a substantial performance gain in EST annotation.

    MammoSapiens: eResearch of the lactation program.

    Delivering bioinformatics power to life science researchers inevitably runs into problems of limited computing resources in the context of exponentially increasing data sources, access time, costs, lack of skills, and rapidly evolving technology and software tools with poorly defined standards. In this context, the development of e-facilities to best enable collaborative research often needs to be customized to specific project applications in close cooperation with the experimentalist users, and to be concerned with the storage and management of results to allow more consistency and traceability of e-results on a broad-access data mining platform. Here we showcase an Internet-based eResearch platform using the PHP/MySQL paradigm for the collaborative, integrative and comparative analysis of lactation-related gene sequences and gene expression experiments to support lactation research. We also illustrate how these resources are used, how they enable research by allowing meta-analysis of data and results, and how the bottom-up development of customized eResearch components can lead to the production of more generic functional software tools and eResearch environments for deployment to a larger number of biological research users working on other bio-systems.

    MammoSapiens: eResearch of the lactation program. Building online facilities for collaborative molecular and evolutionary analysis of lactation and other biological systems from gene sequences and gene expression data.

    Delivering bioinformatics power to life science researchers inevitably runs into problems of limited computing resources in the context of exponentially increasing data sources, access time, costs, lack of skills, and rapidly evolving technology and software tools with poorly defined standards. In this context, the development of online facilities to best enable collaborative research often needs to be customized to specific project applications in close cooperation with the experimentalist users, and to be concerned with the storage and management of results to allow more consistency and traceability of results on a broad-access data mining platform. Here we showcase an Internet-based research platform using the PHP/MySQL paradigm for the collaborative, integrative and comparative analysis of lactation-related gene sequences and gene expression experiments to support lactation research. We also illustrate how these resources are used, how they enable research by allowing meta-analysis of data and results, and how the bottom-up development of customized eResearch components can lead to the production of more generic functional software tools and eResearch environments for deployment to a larger number of biological researchers working on other bio-systems.

    Improving the Caenorhabditis elegans Genome Annotation Using Machine Learning

    For modern biology, precise genome annotations are of prime importance, as they allow the accurate definition of genic regions. We employ state-of-the-art machine learning methods to assay and improve the accuracy of the genome annotation of the nematode Caenorhabditis elegans. The proposed machine learning system is trained to recognize exons and introns on the unspliced mRNA, utilizing recent advances in support vector machines and label sequence learning. In 87% (coding and untranslated regions) and 95% (coding regions only) of all genes tested in several out-of-sample evaluations, our method correctly identified all exons and introns. Notably, only 37% and 50%, respectively, of the presently unconfirmed genes in the C. elegans genome annotation agree with our predictions; we therefore hypothesize that a sizable fraction of those genes are not correctly annotated. A retrospective evaluation of the Wormbase WS120 annotation [1] of C. elegans reveals that splice form predictions on unconfirmed genes in WS120 are inaccurate in about 18% of the considered cases, while our predictions deviate from the truth in only 10%–13%. We experimentally analyzed 20 controversial genes on which our system and the annotation disagree, confirming the superiority of our predictions. While our method correctly predicted 75% of those cases, the standard annotation was never completely correct. The accuracy of our system is further corroborated by a comparison with two other recently proposed systems that can be used for splice form prediction: SNAP and ExonHunter. We conclude that the genome annotation of C. elegans and other organisms can be greatly enhanced using modern machine learning technology.

    Simplifying gene expression microarray comparative analysis.

    Gene Expression Comparative Analysis allows bioinformatics researchers to discover the conserved or specific functional regulation of genes. This is achieved through comparisons between quantitative gene expression measurements obtained in different species on different platforms to address a particular biological system. Comparisons are made more difficult by the need to map orthologous genes between species, to pre-process the data (normalization) and to perform post-analysis (statistical and correlation analysis). In this paper we introduce a web-based software package called EXP-PAC, which provides online interfaces for database construction and data queries, and uses a high performance computing platform of computer clusters to run gene sequence mapping and normalization methods in parallel. Thus, EXP-PAC facilitates the integration of gene expression data for comparative analysis and the online sharing, retrieval and visualization of complex multi-species and multi-platform gene expression results.

    Rice Annotation Database (RAD): a contig-oriented database for map-based rice genomics

    A contig-oriented database for annotation of the rice genome has been constructed to facilitate map-based rice genomics. The Rice Annotation Database has the following functional features: (i) extensive effort of manual annotations of P1-derived artificial chromosome/bacterial artificial chromosome clones can be merged at chromosome and contig-level; (ii) concise visualization of the annotation information such as the predicted genes, results of various prediction programs (RiceHMM, Genscan, Genscan+, Fgenesh, GeneMark, etc.), homology to expressed sequence tag, full-length cDNA and protein; (iii) user-friendly clone / gene query system; (iv) download functions for nucleotide, amino acid and coding sequences; (v) analysis of various features of the genome (GC-content, average value, etc.); and (vi) genome-wide homology search (BLAST) of contig- and chromosome-level genome sequence to allow comparative analysis with the genome sequence of other organisms. As of October 2004, the database contains a total of 215 Mb sequence with relevant annotation results including 30 000 manually curated genes. The database can provide the latest information on manual annotation as well as a comprehensive structural analysis of various features of the rice genome. The database can be accessed at http://rad.dna.affrc.go.jp/

    Sma3s: A three-step modular annotator for large sequence datasets

    This is an Open Access article distributed under the terms of the Creative Commons Attribution License. Automatic sequence annotation is an essential component of modern 'omics' studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ∼85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that is more accurate than other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. This work has been partially financed by the National Institute for Bioinformatics (www.inab.org), a platform of Genoma España and the EC project ‘Advancing Clinico-Genomic Trials on Cancer’ (contract no. 026996). Peer reviewed.

    Lactation transcriptomics in the Australian marsupial, Macropus eugenii: transcript sequencing and quantification

    Background: Lactation is an important aspect of mammalian biology and, amongst mammals, marsupials show one of the most complex lactation cycles. Marsupials, such as the tammar wallaby (Macropus eugenii), give birth to a relatively immature newborn, and progressive changes in milk composition and milk production regulate early stage development of the young. Results: In order to investigate gene expression in the marsupial mammary gland during lactation, a comprehensive set of cDNA libraries was derived from lactating tissues throughout the lactation cycle of the tammar wallaby. A total of 14,837 expressed sequence tags were produced by cDNA sequencing. Sequence analysis and sequence assembly were used to construct a comprehensive catalogue of mammary transcripts. Sequence data from cDNA libraries specific to pregnancy and to early or late lactation, and data from massively parallel sequencing of early or late lactation samples, were combined to analyse the variation of milk protein gene expression during the lactation cycle. Conclusion: Results show a steady increase in expression of genes coding for secreted protein during the lactation cycle that is associated with a high proportion of transcripts coding for milk proteins. In addition, genes involved in immune function, translation and energy or anabolic metabolism are expressed across the lactation cycle. A number of potential new milk proteins or mammary gland remodelling markers, including noncoding RNAs, have been identified.