13 research outputs found

    EST-PAC a web package for EST annotation and protein sequence prediction

    With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating large numbers of expressed sequence tags (ESTs) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. To address these issues, we present EST-PAC, a web-oriented multi-platform software package for expressed sequence tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using Blast services, 2) predicting protein coding sequences from EST data, and 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results, and management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics.

    EST Express: PHP/MySQL based automated annotation of ESTs from expression libraries

    Background: Several biological techniques result in the acquisition of functional sets of cDNAs that must be sequenced and analyzed. The emergence of redundant databases such as UniGene and centralized annotation engines such as Entrez Gene has allowed the development of software that can analyze a great number of sequences in a matter of seconds. Results: We have developed "EST Express", a suite of analytical tools that identify and annotate ESTs originating from specific mRNA populations. The software consists of a user-friendly GUI powered by PHP and MySQL that allows for online collaboration between researchers and continuity with UniGene, Entrez Gene and RefSeq. Two key features of the software include a novel, simplified Entrez Gene parser and tools to manage cDNA library sequencing projects. We have tested the software on a large data set (2,016 samples) produced by subtractive hybridization. Conclusion: EST Express is an open-source, cross-platform web server application that imports sequences from cDNA libraries, such as those generated through subtractive hybridization or yeast two-hybrid screens. It then provides several layers of annotation based on Entrez Gene and RefSeq to allow the user to highlight useful genes and manage cDNA library projects.

    MammoSapiens: eResearch of the lactation program. Building online facilities for collaborative molecular and evolutionary analysis of lactation and other biological systems from gene sequences and gene expression data.

    Delivering bioinformatics power to life science researchers inevitably runs into problems of limited computing resources in the context of exponentially increasing data sources, access time, costs, lack of skills, and rapidly evolving technology and software tools with poorly defined standards. In this context, the development of online facilities to best enable collaborative research often needs to be customized to specific project applications in close cooperation with the experimentalist users, and to address the storage and management of results to allow more consistency and traceability of results on a broad-access data mining platform. Here we showcase an Internet-based research platform using the PHP/MySQL paradigm for the collaborative, integrative and comparative analysis of lactation-related gene sequences and gene expression experiments to support lactation research. We also illustrate how these resources are used, how they enable research by allowing meta-analysis of data and results, and how the bottom-up development of customized eResearch components can lead to the production of more generic functional software tools and eResearch environments for deployment to a larger number of biological researchers working on other bio-systems.

    MammoSapiens: eResearch of the lactation program.

    Delivering bioinformatics power to life science researchers inevitably runs into problems of limited computing resources in the context of exponentially increasing data sources, access time, costs, lack of skills, and rapidly evolving technology and software tools with poorly defined standards. In this context, the development of e-facilities to best enable collaborative research often needs to be customized to specific project applications in close cooperation with the experimentalist users, and to address the storage and management of results to allow more consistency and traceability of e-results on a broad-access data mining platform. Here we showcase an internet-based eResearch platform using the PHP/MySQL paradigm for the collaborative, integrative and comparative analysis of lactation-related gene sequences and gene expression experiments to support lactation research. We also illustrate how these resources are used, how they enable research by allowing meta-analysis of data and results, and how the bottom-up development of customized eResearch components can lead to the production of more generic functional software tools and eResearch environments for deployment to a larger number of biological research users working on other bio-systems.

    Gene mining a marama bean expressed sequence tags (ESTs) database: Embryonic seed development genes and microsatellite marker identification

    Tylosema esculentum (marama bean) is one of the underutilized legumes that have the potential to provide protein and fatty acids to ensure food security in dry parts of Southern Africa. In order to establish rapid domestication programs for the plant, it is important to explore the plant's genome and identify functional genes and molecular markers such as microsatellites in order to develop molecular tools. With the advent of high-throughput sequencing technologies and associated bioinformatics methods, expressed sequence tags (ESTs) have been developed for many plant species. These are being developed as an economical means of obtaining large numbers of gene sequences. The aim of this study was to identify genes with important roles in valuable agronomic traits, as well as microsatellite sequences, for marama bean. The authors reported the identification of genes associated with embryonic development and of microsatellite sequences. Future work will entail characterization of these genes using gene over-expression and mutant assays. Key words: Namibia, simple sequence repeats (SSR), data mining, homology searches, bioinformatics, Tylosema esculentum.

    EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration

    This article is available from: http://www.biomedcentral.com/1471-2105/9/5. [Background] Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. In order to provide a suitable way of performing the different steps in the analysis of the ESTs, flexible computation pipelines adapted to the local needs of specific EST projects have to be developed. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces which allow efficient and complex data mining, thus offering maximum capabilities for their full exploitation. [Results] We have created EST2uni, an integrated, highly-configurable EST analysis pipeline and data mining software package that automates the pre-processing, clustering, annotation, database creation, and data mining of EST collections. The pipeline uses standard EST analysis tools and the software has a modular design to facilitate the addition of new analytical methods and their configuration. Currently implemented analyses include functional and structural annotation, SNP and microsatellite discovery, integration of previously known genetic marker data and gene expression results, and assistance in cDNA microarray design. It can be run in parallel in a PC cluster in order to reduce the time necessary for the analysis. It also creates a web site linked to the database, showing collection statistics, with complex query capabilities and tools for data mining and retrieval. [Conclusion] The software package presented here provides an efficient and complete bioinformatics tool for the management of EST collections which is very easy to adapt to the local needs of different EST projects.
The code is freely available under the GPL license and can be obtained at http://bioinf.comav.upv.es/est2uni. This site also provides detailed instructions for installation and configuration of the software package. The code is under active development to incorporate new analyses, methods, and algorithms as they are released by the bioinformatics community. Partially funded by "Conselleria de Agricultura, Pesca y Alimentacion de la Comunidad Valenciana" and Spanish "Ministerio de Ciencia y Tecnologia" (research grants GEN2001-4885-C05 and GEN2003-20237-C06). Peer reviewed.

    JUICE: a data management system that facilitates the analysis of large volumes of information in an EST project workflow

    BACKGROUND: Expressed sequence tag (EST) analyses provide a rapid and economical means to identify candidate genes that may be involved in a particular biological process. These ESTs are useful in many Functional Genomics studies. However, the large quantity and complexity of the data generated during an EST sequencing project can make the analysis of this information a daunting task. RESULTS: In an attempt to make this task friendlier, we have developed JUICE, an open source data management system (Apache + PHP + MySQL on Linux), which enables the user to easily upload, organize, visualize and search the different types of data generated in an EST project pipeline. In contrast to other systems, the JUICE data management system allows a branched pipeline to be established, modified and expanded, during the course of an EST project. The web interfaces and tools in JUICE enable the users to visualize the information in a graphical, user-friendly manner. The user may browse or search for sequences and/or sequence information within all the branches of the pipeline. The user can search using terms associated with the sequence name, annotation or other characteristics stored in JUICE and associated with sequences or sequence groups. Groups of sequences can be created by the user, stored in a clipboard and/or downloaded for further analyses. Different user profiles restrict the access of each user depending upon their role in the project. The user may have access exclusively to visualize sequence information, access to annotate sequences and sequence information, or administrative access. CONCLUSION: JUICE is an open source data management system that has been developed to aid users in organizing and analyzing the large amount of data generated in an EST Project workflow. JUICE has been used in one of the first functional genomics projects in Chile, entitled "Functional Genomics in nectarines: Platform to potentiate the competitiveness of Chile in fruit exportation". 
However, due to its ability to organize and visualize data from external pipelines, JUICE is a flexible data management system that should be useful for other EST/Genome projects. The JUICE data management system is released under the Open Source GNU Lesser General Public License (LGPL). JUICE may be downloaded from or

    Gene projects: a genome web tool for ongoing mining and annotation applied to CitEST

    Genome projects, both genomic DNA and ESTs (cDNA), generate a large amount of information, demanding time and a well-structured bioinformatics laboratory to manage these data. These genome projects use information available in heterogeneous formats from different sources. The amount and heterogeneity of this information, as well as the absence of a world consensus pattern, make the integration of these data a difficult task. At the same time, sub-tasks, such as microarray analyses of these projects, are very complex. This creates a demand for the development of creative solutions for ongoing annotation, thematic projects, microarray experiments, etc. This paper presents Gene Projects, a system developed to integrate all kinds of solutions. Pages 1030-1036. Funding: Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP); Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq); Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES).

    ESTPiper--a web-based analysis pipeline for expressed sequence tags

    BACKGROUND: EST sequencing projects are increasing in scale and scope as genome sequencing technologies migrate from core sequencing centers to individual research laboratories. Effectively, generating EST data is no longer a bottleneck for investigators. However, processing large amounts of EST data remains a non-trivial challenge for many. Web-based EST analysis tools are proving to be the most convenient option for biologists when performing their analysis, so these tools must continuously improve their utility to keep in step with the growing needs of research communities. We have developed a web-based EST analysis pipeline called ESTPiper, which streamlines typical large-scale EST analysis components. RESULTS: The intuitive web interface guides users through each step of base calling, data cleaning, assembly, genome alignment, annotation, analysis of gene ontology (GO), and microarray oligonucleotide probe design. Each step is modularized, so a user can execute the steps separately or together in batch mode. In addition, the user has control over the parameters used by the underlying programs. Extensive documentation of ESTPiper's functionality is embedded throughout the web site to facilitate understanding of the required input and interpretation of the computational results. The user can also download intermediate results and port files to separate programs for further analysis. In addition, our server provides a time-stamped description of the run history for reproducibility. The pipeline can also be installed locally, allowing researchers to modify ESTPiper to suit their own needs. CONCLUSION: ESTPiper streamlines the typical process of EST analysis. The pipeline was initially designed in part to support the Daphnia pulex cDNA sequencing project. A web server hosting ESTPiper is provided at to now support projects of all sizes. The software is also freely available from the authors for local installations.

    Moniliophthora perniciosa genome: assembly and annotation of mitochondrion and development of a semi-automatic system of genes annotation

    Advisor: Gonçalo Amarante Guimarães Pereira. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Biologia. Abstract: The mitochondrial genome (mtDNA) of the fungus Moniliophthora perniciosa was completely sequenced and contains 109,103 bp with 31% GC content, a lower percentage than that found in the nuclear genome sequences (47%). It is the largest fungal mitochondrial genome described to date, and its size results from large intergenic regions containing several ORFs that may be confirmed as new genes. Computational analyses indicate variation in the number of mtDNAs per cell among the different libraries, with a significant tendency toward fewer mtDNAs per cell in the group of libraries derived from cultures subjected to repeated subculturing. Most of the typical genes (atp6, atp9, nad1-6, nad4L, cox1-3 and cob, with atp8 as the exception), all of the rRNAs, the tRNAs (at least one was found for each amino acid) and the genes of the intronic ORFs are oriented clockwise. An rps3 gene and a group of ORFs with characteristics similar to those of the typical genes were also identified. Surprisingly, the mtDNA contains a region occupied by an invertron structure characteristic of kalilo-like plasmids, stably integrated into the genome in all varieties of biotype C and present in the other biotypes tested. This sequence is available in GenBank under accession number AY376688. The other line of work was carried out together with other bioinformaticians of the Laboratório de Genômica e Expressão.
Gene mining and annotation tools were developed for genome projects, most notably Gene Projects, which allows mining and annotation of genes during the sequencing process, and the new annotation interface, developed to optimize the quality and efficiency of gene annotation. Doctorate. Biochemistry. Doctor in Functional and Molecular Biology.