4,822 research outputs found

    An Open Framework for Extensible Multi-Stage Bioinformatics Software

    Get PDF
    In research labs, there is often a need to customise software at every step in a given bioinformatics workflow, but traditionally it has been difficult to obtain both a high degree of customisability and good performance. Performance-sensitive tools are often highly monolithic, which can make research difficult. We present a novel set of software development principles and a bioinformatics framework, Friedrich, which is currently in early development. Friedrich applications support both early stage experimentation and late stage batch processing, since they simultaneously allow for good performance and a high degree of flexibility and customisability. These benefits are obtained in large part by basing Friedrich on the multiparadigm programming language Scala. We present a case study in the form of a basic genome assembler and its extension with new functionality. Our architecture has the potential to greatly increase the overall productivity of software developers and researchers in bioinformatics.Comment: 12 pages, 1 figure, to appear in proceedings of PRIB 201

    Ontology-based knowledge representation of experiment metadata in biological data mining

    Get PDF
    According to the PubMed resource from the U.S. National Library of Medicine, over 750,000 scientific articles have been published in the ~5000 biomedical journals worldwide in the year 2007 alone. The vast majority of these publications include results from hypothesis-driven experimentation in overlapping biomedical research domains. Unfortunately, the sheer volume of information being generated by the biomedical research enterprise has made it virtually impossible for investigators to stay aware of the latest findings in their domain of interest, let alone to be able to assimilate and mine data from related investigations for purposes of meta-analysis. While computers have the potential for assisting investigators in the extraction, management and analysis of these data, information contained in the traditional journal publication is still largely unstructured, free-text descriptions of study design, experimental application and results interpretation, making it difficult for computers to gain access to the content of what is being conveyed without significant manual intervention. In order to circumvent these roadblocks and make the most of the output from the biomedical research enterprise, a variety of related standards in knowledge representation are being developed, proposed and adopted in the biomedical community. In this chapter, we will explore the current status of efforts to develop minimum information standards for the representation of a biomedical experiment, ontologies composed of shared vocabularies assembled into subsumption hierarchical structures, and extensible relational data models that link the information components together in a machine-readable and human-useable framework for data mining purposes

    Bioconductor: open software development for computational biology and bioinformatics.

    Get PDF
    The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples

    Object-Oriented Paradigms for Modelling Vascular\ud Tumour Growth: a Case Study

    Get PDF
    Motivated by a family of related hybrid multiscale models, we have built an object-oriented framework for developing and implementing multiscale models of vascular tumour growth. The models are implemented in our framework as a case study to highlight how object-oriented programming techniques and good object-oriented design may be used effectively to develop hybrid multiscale models of vascular tumour growth. The intention is that this paper will serve as a useful reference for researchers modelling complex biological systems and that these researchers will employ some of the techniques presented herein in their own projects

    An open and extensible framework for spatially explicit land use change modelling in R: the lulccR package (0.1.0)

    Get PDF
    Land use change has important consequences for biodiversity and the sustainability of ecosystem services, as well as for global environmental change. Spatially explicit land use change models improve our understanding of the processes driving change and make predictions about the quantity and location of future and past change. Here we present the lulccR package, an object-oriented framework for land use change modelling written in the R programming language. The contribution of the work is to resolve the following limitations associated with the current land use change modelling paradigm: (1) the source code for model implementations is frequently unavailable, severely compromising the reproducibility of scientific results and making it impossible for members of the community to improve or adapt models for their own purposes; (2) ensemble experiments to capture model structural uncertainty are difficult because of fundamental differences between implementations of different models; (3) different aspects of the modelling procedure must be performed in different environments because existing applications usually only perform the spatial allocation of change. The package includes a stochastic ordered allocation procedure as well as an implementation of the widely used CLUE-S algorithm. We demonstrate its functionality by simulating land use change at the Plum Island Ecosystems site, using a dataset included with the package. It is envisaged that lulccR will enable future model development and comparison within an open environment

    Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications

    Full text link
    Many scientific problems require multiple distinct computational tasks to be executed in order to achieve a desired solution. We introduce the Ensemble Toolkit (EnTK) to address the challenges of scale, diversity and reliability they pose. We describe the design and implementation of EnTK, characterize its performance and integrate it with two distinct exemplar use cases: seismic inversion and adaptive analog ensembles. We perform nine experiments, characterizing EnTK overheads, strong and weak scalability, and the performance of two use case implementations, at scale and on production infrastructures. We show how EnTK meets the following general requirements: (i) implementing dedicated abstractions to support the description and execution of ensemble applications; (ii) support for execution on heterogeneous computing infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv) fault tolerance. We discuss novel computational capabilities that EnTK enables and the scientific advantages arising thereof. We propose EnTK as an important addition to the suite of tools in support of production scientific computing

    On Designing Multicore-aware Simulators for Biological Systems

    Full text link
    The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It often is an enlightening technique, which may however result in being computational expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions involving both the single simulation and a bulk of independent simulations (either replicas of derived from parameter sweep). Proposed solutions are tested on the parallelisation of the CWC simulator (Calculus of Wrapped Compartments) that is carried out according to proposed solutions by way of the FastFlow programming framework making possible fast development and efficient execution on multi-cores.Comment: 19 pages + cover pag
    • …
    corecore