110,059 research outputs found

    SeqHound: biological sequence and structure database as a platform for bioinformatics research

    Get PDF
    BACKGROUND: SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. RESULTS: SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. CONCLUSIONS: The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit

    ProtVirDB: a database of protozoan virulent proteins

    Get PDF
    Abstract Summary: ProtVirDB is a comprehensive and user-friendly web-based knowledgebase of virulent proteins belonging to protozoan species. The database will facilitate research and provide an integrated platform for comparative studies of virulent proteins in different parasitic protozoans and organize them under a unifying classification schema with functional categories. Remarkably, one-third of the protein sequences in the database showed presence of either mono- or hetero-repeats, or both concomitantly—hence reiterating the importance of repeats in parasite virulence mechanisms. A number of useful bioinformatics tools including BLAST and tools for phylogenetic analysis are integrated with the database. With the rapidly burgeoning interest in the pathogenesis mechanisms of protozoans and ongoing genome sequencing projects, we anticipate that the database will be a useful tool for the research community. Availability: http://bioinfo.icgeb.res.in/protvirdb Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics onlin

    DATA-INTENSIVE COMPUTING FOR BIOINFORMATICS USING VIRTUALIZATION TECHNOLOGIES AND HPC INFRASTRUCTURES

    Get PDF
    The bioinformatics applications often involve many computational components and massive data sets, which are very difficult to be deployed on a single computing machine. In this thesis, we designed a data-intensive computing platform for bioinformatics applications using virtualization technologies and high performance computing (HPC) infrastructures with the concept of multi-tier architecture, which can seamlessly integrate the web user interface (presentation tier), scientific workflow (logic tier) and computing infrastructure (data/computing tier). We demonstrated our platform on two bioinformatics projects. First, we redesigned and deployed the cotton marker database (CMD) (http://www.cottonmarker.org), a centralized web portal in the cotton research community, using the Xen-based virtualization solution. To achieve high-performance and scalability for CMD web tools, we hosted the large amounts of protein databases and computational intensive applications of CMD on the Palmetto HPC of Clemson University. Biologists can easily utilize both bioinformatics applications and HPC resources through the CMD website without a background in computer science. Second, we developed a web tools - Glycan Array QSAR Tool (http://bci.clemson.edu/tools/glycan_array), to analyze glycan array data. The user interface of this tool was developed at the top of Drupal Content Management Systems (CMS) and the computational part was implemented using MATLAB Compiler Runtime (MCR) module. Our new bioinformatics computing platform enables the rapid deployment of data-intensive bioinformatics applications on HPC and virtualization environment with a user-friendly web interface and bridges the gap between biological scientists and cyberinfrastructure

    B mu G@Sbase-a microbial gene expression and comparative genomic database

    Get PDF
    The reducing cost of high-throughput functional genomic technologies is creating a deluge of high volume, complex data, placing the burden on bioinformatics resources and tool development. The Bacterial Microarray Group at St George's (BÎĽG@S) has been at the forefront of bacterial microarray design and analysis for over a decade and while serving as a hub of a global network of microbial research groups has developed BÎĽG@Sbase, a microbial gene expression and comparative genomic database. BÎĽG@Sbase (http://bugs.sgul.ac.uk/bugsbase/) is a web-browsable, expertly curated, MIAME-compliant database that stores comprehensive experimental annotation and multiple raw and analysed data formats. Consistent annotation is enabled through a structured set of web forms, which guide the user through the process following a set of best practices and controlled vocabulary. The database currently contains 86 expertly curated publicly available data sets (with a further 124 not yet published) and full annotation information for 59 bacterial microarray designs. The data can be browsed and queried using an explorer-like interface; integrating intuitive tree diagrams to present complex experimental details clearly and concisely. Furthermore the modular design of the database will provide a robust platform for integrating other data types beyond microarrays into a more Systems analysis based future

    Bioinformatics on the Cloud Computing Platform Azure

    Get PDF
    We discuss the applicability of the Microsoft cloud computing platform, Azure, for bioinformatics. We focus on the usability of the resource rather than its performance. We provide an example of how R can be used on Azure to analyse a large amount of microarray expression data deposited at the public database ArrayExpress. We provide a walk through to demonstrate explicitly how Azure can be used to perform these analyses in Appendix S1 and we offer a comparison with a local computation. We note that the use of the Platform as a Service (PaaS) offering of Azure can represent a steep learning curve for bioinformatics developers who will usually have a Linux and scripting language background. On the other hand, the presence of an additional set of libraries makes it easier to deploy software in a parallel (scalable) fashion and explicitly manage such a production run with only a few hundred lines of code, most of which can be incorporated from a template. We propose that this environment is best suited for running stable bioinformatics software by users not involved with its development. © 2014 Shanahan et al

    yStreX: yeast stress expression database

    Get PDF
    Over the past decade genome-wide expression analyses have been often used to study how expression of genes changes in response to various environmental stresses. Many of these studies (such as effects of oxygen concentration, temperature stress, low pH stress, osmotic stress, depletion or limitation of nutrients, addition of different chemical compounds, etc.) have been conducted in the unicellular Eukaryal model, yeast Saccharomyces cerevisiae. However, the lack of a unifying or integrated, bioinformatics platformthat would permit efficient and rapid use of all these existing data remain an important issue. To facilitate research by exploiting existing transcription data in the field of yeast physiology, we have developed the yStreX database. It is an online repository of analyzed gene expression data from curated data sets from different studies that capture genome-wide transcriptional changes in response to diverse environmental transitions. The first aim of this online database is to facilitate comparison of cross-platform and cross-laboratory gene expression data. Additionally, we performed different expression analyses, meta-analyses and gene set enrichment analyses; and the results are also deposited in this database. Lastly, we constructed a user-friendly Web interface with interactive visualization to provide intuitive access and to display the queried data for users with no background in bioinformatics. Database URL: http://www.ystrexdb.co

    Updates in metabolomics tools and resources: 2014-2015

    Get PDF
    Data processing and interpretation represent the most challenging and time-consuming steps in high-throughput metabolomic experiments, regardless of the analytical platforms (MS or NMR spectroscopy based) used for data acquisition. Improved machinery in metabolomics generates increasingly complex datasets that create the need for more and better processing and analysis software and in silico approaches to understand the resulting data. However, a comprehensive source of information describing the utility of the most recently developed and released metabolomics resources—in the form of tools, software, and databases—is currently lacking. Thus, here we provide an overview of freely-available, and open-source, tools, algorithms, and frameworks to make both upcoming and established metabolomics researchers aware of the recent developments in an attempt to advance and facilitate data processing workflows in their metabolomics research. The major topics include tools and researches for data processing, data annotation, and data visualization in MS and NMR-based metabolomics. Most in this review described tools are dedicated to untargeted metabolomics workflows; however, some more specialist tools are described as well. All tools and resources described including their analytical and computational platform dependencies are summarized in an overview Table

    EST-PAC a web package for EST annotation and protein sequence prediction

    Get PDF
    With the decreasing cost of DNA sequencing technology and the vast diversity of biological resources, researchers increasingly face the basic challenge of annotating a larger number of expressed sequences tags (EST) from a variety of species. This typically consists of a series of repetitive tasks, which should be automated and easy to use. The results of these annotation tasks need to be stored and organized in a consistent way. All these operations should be self-installing, platform independent, easy to customize and amenable to using distributed bioinformatics resources available on the Internet. In order to address these issues, we present EST-PAC a web oriented multi-platform software package for expressed sequences tag (EST) annotation. EST-PAC provides a solution for the administration of EST and protein sequence annotations accessible through a web interface. Three aspects of EST annotation are automated: 1) searching local or remote biological databases for sequence similarities using Blast services, 2) predicting protein coding sequence from EST data and, 3) annotating predicted protein sequences with functional domain predictions. In practice, EST-PAC integrates the BLASTALL suite, EST-Scan2 and HMMER in a relational database system accessible through a simple web interface. EST-PAC also takes advantage of the relational database to allow consistent storage, powerful queries of results and, management of the annotation process. The system allows users to customize annotation strategies and provides an open-source data-management environment for research and education in bioinformatics

    AGMIAL: implementing an annotation strategy for prokaryote genomes as a distributed system

    Get PDF
    We have implemented a genome annotation system for prokaryotes called AGMIAL. Our approach embodies a number of key principles. First, expert manual annotators are seen as a critical component of the overall system; user interfaces were cyclically refined to satisfy their needs. Second, the overall process should be orchestrated in terms of a global annotation strategy; this facilitates coordination between a team of annotators and automatic data analysis. Third, the annotation strategy should allow progressive and incremental annotation from a time when only a few draft contigs are available, to when a final finished assembly is produced. The overall architecture employed is modular and extensible, being based on the W3 standard Web services framework. Specialized modules interact with two independent core modules that are used to annotate, respectively, genomic and protein sequences. AGMIAL is currently being used by several INRA laboratories to analyze genomes of bacteria relevant to the food-processing industry, and is distributed under an open source license

    QueryOR: a comprehensive web platform for genetic variant analysis and prioritization

    Get PDF
    Background: Whole genome and exome sequencing are contributing to the extraordinary progress in the study of human genetic variants. In this fast developing field, appropriate and easily accessible tools are required to facilitate data analysis. Results: Here we describe QueryOR, a web platform suitable for searching among known candidate genes as well as for finding novel gene-disease associations. QueryOR combines several innovative features that make it comprehensive, flexible and easy to use. Instead of being designed on specific datasets, it works on a general XML schema specifying formats and criteria of each data source. Thanks to this flexibility, new criteria can be easily added for future expansion. Currently, up to 70 user-selectable criteria are available, including a wide range of gene and variant features. Moreover, rather than progressively discarding variants taking one criterion at a time, the prioritization is achieved by a global positive selection process that considers all transcript isoforms, thus producing reliable results. QueryOR is easy to use and its intuitive interface allows to handle different kinds of inheritance as well as features related to sharing variants in different patients. QueryOR is suitable for investigating single patients, families or cohorts. Conclusions: QueryOR is a comprehensive and flexible web platform eligible for an easy user-driven variant prioritization. It is freely available for academic institutions at http://queryor.cribi.unipd.it/
    • …
    corecore