63,178 research outputs found

    BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

    Get PDF
    Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, by using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high-performance, reducing up to 98% of the case studies execution time. We also show how the application of machine learning techniques can enrich the analysis process

    Applications of next-generation sequencing technologies and computational tools in molecular evolution and aquatic animals conservation studies : a short review

    Get PDF
    Aquatic ecosystems that form major biodiversity hotspots are critically threatened due to environmental and anthropogenic stressors. We believe that, in this genomic era, computational methods can be applied to promote aquatic biodiversity conservation by addressing questions related to the evolutionary history of aquatic organisms at the molecular level. However, huge amounts of genomics data generated can only be discerned through the use of bioinformatics. Here, we examine the applications of next-generation sequencing technologies and bioinformatics tools to study the molecular evolution of aquatic animals and discuss the current challenges and future perspectives of using bioinformatics toward aquatic animal conservation efforts

    Grid Added Value to Address Malaria

    Get PDF
    Through this paper, we call for a distributed, internet-based collaboration to address one of the worst plagues of our present world, malaria. The spirit is a non-proprietary peer-production of information-embedding goods. And we propose to use the grid technology to enable such a world wide "open source" like collaboration. The first step towards this vision has been achieved during the summer on the EGEE grid infrastructure where 46 million ligands were docked for a total amount of 80 CPU years in 6 weeks in the quest for new drugs.Comment: 7 pages, 1 figure, 6th IEEE International Symposium on Cluster Computing and the Grid, Singapore, 16-19 may 2006, to appear in the proceeding

    Removing batch effects for prediction problems with frozen surrogate variable analysis

    Full text link
    Batch effects are responsible for the failure of promising genomic prognos- tic signatures, major ambiguities in published genomic results, and retractions of widely-publicized findings. Batch effect corrections have been developed to re- move these artifacts, but they are designed to be used in population studies. But genomic technologies are beginning to be used in clinical applications where sam- ples are analyzed one at a time for diagnostic, prognostic, and predictive applica- tions. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose an new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction ac- curacy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package

    Educating the educators: Incorporating bioinformatics into biological science education in Malaysia

    Get PDF
    Bioinformatics can be defined as a fusion of computational and biological sciences. The urgency to process and analyse the deluge of data created by proteomics and genomics studies has caused bioinformatics to gain prominence and importance. However, its multidisciplinary nature has created a unique demand for specialist trained in both biology and computing. In this review, we described the components that constitute the bioinformatics field and distinctive education criteria that are required to produce individuals with bioinformatics training. This paper will also provide an introduction and overview of bioinformatics in Malaysia. The existing bioinformatics scenario in Malaysia was surveyed to gauge its advancement and to plan for future bioinformatics education strategies. For comparison, we surveyed methods and strategies used in education by other countries so that lessons can be learnt to further improve the implementation of bioinformatics in Malaysia. It is believed that accurate and sufficient steerage from the academia and industry will enable Malaysia to produce quality bioinformaticians in the future

    An Introduction to Programming for Bioscientists: A Python-based Primer

    Full text link
    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences.Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biolog
    corecore