BioWorkbench: A High-Performance Framework for Managing and Analyzing Bioinformatics Experiments

Author: Barbosa Helio J. C.
Foster Ian
Gadelha Jr Luiz M. R.
Katz Daniel S.
Loss Guilherme
Magalhães Thiago
Mattoso Marta
Mondelli Maria Luiza
Ocaña Kary
Vasconcelos Ana Tereza R.
Wilde Michael
Publication venue: 'PeerJ'
Publication date: 11/01/2018

Advances in sequencing techniques have led to exponential growth in biological data, demanding the development of large-scale bioinformatics experiments. Because these experiments are computation- and data-intensive, they require high-performance computing (HPC) techniques and can benefit from specialized technologies such as Scientific Workflow Management Systems (SWfMS) and databases. In this work, we present BioWorkbench, a framework for managing and analyzing bioinformatics experiments. This framework automatically collects provenance data, including both performance data from workflow execution and data from the scientific domain of the workflow application. Provenance data can be analyzed through a web application that abstracts a set of queries to the provenance database, simplifying access to provenance information. We evaluate BioWorkbench using three case studies: SwiftPhylo, a phylogenetic tree assembly workflow; SwiftGECKO, a comparative genomics workflow; and RASflow, a RASopathy analysis workflow. We analyze each workflow from both computational and scientific domain perspectives, using queries to a provenance and annotation database. Some of these queries are available as a pre-built feature of the BioWorkbench web application. Through the provenance data, we show that the framework is scalable and achieves high performance, reducing the execution time of the case studies by up to 98%. We also show how the application of machine learning techniques can enrich the analysis process.

arXiv.org e-Print Archive

Directory of Open Access Journals

Applications of next-generation sequencing technologies and computational tools in molecular evolution and aquatic animals conservation studies : a short review

Author: Afiqah-Aleng Nor
Danish-Daniel Muhd
Mohd Nor Siti Azizah
Razali Siti Aisyah
Sorgeloos Patrick
Sung Yeong Yik
Tan Min Pau
Van de Peer Yves
Wong Li Lian
Publication venue: 'SAGE Publications'
Publication date: 01/01/2019

Aquatic ecosystems that form major biodiversity hotspots are critically threatened by environmental and anthropogenic stressors. We believe that, in this genomic era, computational methods can be applied to promote aquatic biodiversity conservation by addressing questions related to the evolutionary history of aquatic organisms at the molecular level. However, the huge amounts of genomic data generated can only be interpreted through the use of bioinformatics. Here, we examine the applications of next-generation sequencing technologies and bioinformatics tools to study the molecular evolution of aquatic animals and discuss the current challenges and future perspectives of using bioinformatics toward aquatic animal conservation efforts.

Ghent University Academic Bibliography

Grid Added Value to Address Malaria

Author: Breton V.
Hofmann-Apitius M.
Jacq N.
Kasam V.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006

Through this paper, we call for a distributed, internet-based collaboration to address one of the worst plagues of our present world, malaria. The spirit is a non-proprietary peer production of information-embedding goods. We propose to use grid technology to enable such a worldwide, "open source"-like collaboration. The first step towards this vision was achieved during the summer on the EGEE grid infrastructure, where 46 million ligands were docked, for a total of 80 CPU years in 6 weeks, in the quest for new drugs.

Comment: 7 pages, 1 figure, 6th IEEE International Symposium on Cluster Computing and the Grid, Singapore, 16-19 May 2006, to appear in the proceedings

arXiv.org e-Print Archive

HAL Clermont Université

Removing batch effects for prediction problems with frozen surrogate variable analysis

Author: Bravo Héctor Corrada
Leek Jeffrey T.
Parker Hilary S.
Publication date: 16/01/2013

Batch effects are responsible for the failure of promising genomic prognostic signatures, major ambiguities in published genomic results, and retractions of widely publicized findings. Batch effect corrections have been developed to remove these artifacts, but they are designed to be used in population studies. Genomic technologies, however, are beginning to be used in clinical applications where samples are analyzed one at a time for diagnostic, prognostic, and predictive applications. There are currently no batch correction methods that have been developed specifically for prediction. In this paper, we propose a new method called frozen surrogate variable analysis (fSVA) that borrows strength from a training set for individual sample batch correction. We show that fSVA improves prediction accuracy in simulations and in public genomic studies. fSVA is available as part of the sva Bioconductor package.

arXiv.org e-Print Archive

CiteSeerX

Directory of Open Access Journals

PubMed Central

Educating the educators: Incorporating bioinformatics into biological science education in Malaysia

Author: Hussein Huszalina
Mohd. Hashim Siti Zaiton
Salim Naomie
Shamsir Mohd. Shahir
Publication date: 01/01/2012

Bioinformatics can be defined as a fusion of computational and biological sciences. The urgency to process and analyse the deluge of data created by proteomics and genomics studies has caused bioinformatics to gain prominence and importance. However, its multidisciplinary nature has created a unique demand for specialists trained in both biology and computing. In this review, we describe the components that constitute the bioinformatics field and the distinctive education criteria required to produce individuals with bioinformatics training. This paper also provides an introduction to and overview of bioinformatics in Malaysia. The existing bioinformatics scenario in Malaysia was surveyed to gauge its advancement and to plan future bioinformatics education strategies. For comparison, we surveyed the methods and strategies used in education by other countries so that lessons can be learnt to further improve the implementation of bioinformatics in Malaysia. It is believed that accurate and sufficient steerage from academia and industry will enable Malaysia to produce quality bioinformaticians in the future.

Universiti Teknologi Malaysia Institutional Repository

An Introduction to Programming for Bioscientists: A Python-based Primer

Author: Ekmekci Berk
McAnany Charles E.
Mura Cameron
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 17/05/2016

Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. The computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. 
Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences.

Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biology
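The Hamming distance mentioned in the abstract above is the number of positions at which two equal-length sequences differ. The following is a minimal sketch of that calculation in Python, not code taken from the primer; the function name `hamming_distance`, the case-insensitive comparison, and the example sequences are assumptions made here for illustration.

```python
def hamming_distance(seq_a: str, seq_b: str) -> int:
    """Count the positions at which two equal-length sequences differ.

    Assumption (not from the primer): comparison is case-insensitive,
    so 'acgt' and 'ACGT' are treated as identical.
    """
    if len(seq_a) != len(seq_b):
        # The Hamming distance is only defined for equal-length sequences.
        raise ValueError("sequences must have the same length")
    # Walk both sequences in step and count mismatching positions.
    return sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))


if __name__ == "__main__":
    # Hypothetical example sequences; they differ at positions 3 and 6.
    print(hamming_distance("GATTACA", "GACTATA"))  # prints 2
```

Raising an error for unequal lengths, rather than silently truncating to the shorter sequence, keeps the result consistent with the definition; comparing sequences of different lengths would require an alignment step first.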

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

FigShare