An Open Framework for Extensible Multi-Stage Bioinformatics Software
In research labs, there is often a need to customise software at every step
in a given bioinformatics workflow, but traditionally it has been difficult to
obtain both a high degree of customisability and good performance.
Performance-sensitive tools are often highly monolithic, which can make
research difficult. We present a novel set of software development principles
and a bioinformatics framework, Friedrich, which is currently in early
development. Friedrich applications support both early stage experimentation
and late stage batch processing, since they simultaneously allow for good
performance and a high degree of flexibility and customisability. These
benefits are obtained in large part by basing Friedrich on the multiparadigm
programming language Scala. We present a case study in the form of a basic
genome assembler and its extension with new functionality. Our architecture has
the potential to greatly increase the overall productivity of software
developers and researchers in bioinformatics.
Comment: 12 pages, 1 figure, to appear in proceedings of PRIB 201
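The abstract describes Friedrich's multi-stage design only at a high level. As a hedged illustration (the stage names and API below are invented for this sketch, not taken from Friedrich's actual Scala interfaces), a pipeline whose stages can be swapped, reordered, or extended independently might look like:

```python
from typing import Callable, List

# A stage is any function mapping one intermediate result to the next;
# a pipeline is an ordered list of stages that can be extended or
# customised without touching the other stages.
Stage = Callable[[object], object]

def run_pipeline(stages: List[Stage], data: object) -> object:
    for stage in stages:
        data = stage(data)
    return data

# Hypothetical assembly-like stages; all names are illustrative only.
def read_input(path):
    # Stand-in for loading sequencing reads from a file.
    return ["ACGT", "CGTA", "GTAC"]

def filter_short(reads, min_len=4):
    return [r for r in reads if len(r) >= min_len]

def count_reads(reads):
    return len(reads)

pipeline = [read_input, filter_short, count_reads]
result = run_pipeline(pipeline, "reads.fastq")
```

Replacing or inserting a stage (e.g. a different filter for early-stage experimentation) changes only the `pipeline` list, which is the kind of customisability the abstract claims.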
Ontology-based knowledge representation of experiment metadata in biological data mining
According to the PubMed resource from the U.S. National Library of Medicine,
over 750,000 scientific articles have been published in the ~5000 biomedical journals
worldwide in the year 2007 alone. The vast majority of these publications include results from hypothesis-driven experimentation in overlapping biomedical research domains. Unfortunately, the sheer volume of information being generated by the biomedical research enterprise has made it virtually impossible for investigators to stay aware of the latest findings in their domain of interest, let alone to be able to assimilate and mine data from related investigations for purposes of meta-analysis. While computers have the potential for assisting investigators in the extraction, management and analysis of these data, information contained in the traditional journal publication is still largely unstructured, free-text descriptions of study design, experimental application and results interpretation, making it difficult for computers to gain access to the content of what is being conveyed without significant manual intervention. In order to circumvent these roadblocks and make the most of the output from the biomedical research enterprise, a variety of related standards in knowledge representation are being developed, proposed and adopted in the biomedical community. In this chapter, we will explore the current status of efforts to develop minimum information standards for the representation of a biomedical experiment, ontologies composed of shared vocabularies assembled into subsumption hierarchical structures, and extensible relational data models that link the information components together in a machine-readable and human-useable framework for data mining purposes
Bioconductor: open software development for computational biology and bioinformatics.
The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. The goals of the project include: fostering collaborative development and widespread use of innovative software, reducing barriers to entry into interdisciplinary scientific research, and promoting the achievement of remote reproducibility of research results. We describe details of our aims and methods, identify current challenges, compare Bioconductor to other open bioinformatics projects, and provide working examples
Object-Oriented Paradigms for Modelling Vascular Tumour Growth: a Case Study
Motivated by a family of related hybrid multiscale models, we have built an object-oriented framework for developing and implementing multiscale models of vascular tumour growth. The models are implemented in our framework as a case study to highlight how object-oriented programming techniques and good object-oriented design may be used effectively to develop hybrid multiscale models of vascular tumour growth. The intention is that this paper will serve as a useful reference for researchers modelling complex biological systems and that these researchers will employ some of the techniques presented herein in their own projects
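The abstract does not show the framework's classes; as a hypothetical sketch of the object-oriented idea (all class names and update rules below are invented), submodels at different scales can share one interface so that a coordinator composes them without knowing their internals:

```python
from abc import ABC, abstractmethod

class ScaleModel(ABC):
    """Common interface shared by submodels at every scale."""
    @abstractmethod
    def update(self, dt: float) -> None: ...

class CellModel(ScaleModel):
    def __init__(self):
        self.cells = 100
    def update(self, dt):
        # Toy growth rule standing in for a cellular-scale model.
        self.cells = int(self.cells * (1 + 0.1 * dt))

class VesselModel(ScaleModel):
    def __init__(self):
        self.oxygen = 1.0
    def update(self, dt):
        # Toy depletion rule standing in for a vascular-scale model.
        self.oxygen *= (1 - 0.05 * dt)

class HybridSimulation:
    """Coordinator: advances each submodel in turn. A new scale plugs
    in by implementing ScaleModel, without changing this class."""
    def __init__(self, submodels):
        self.submodels = submodels
    def step(self, dt=1.0):
        for m in self.submodels:
            m.update(dt)

sim = HybridSimulation([CellModel(), VesselModel()])
for _ in range(3):
    sim.step()
```

The design choice this illustrates is the one the paper advocates: depending on an abstract interface rather than on concrete submodels keeps the hybrid multiscale composition open to extension.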
An open and extensible framework for spatially explicit land use change modelling in R: the lulccR package (0.1.0)
Land use change has important consequences for biodiversity and the
sustainability of ecosystem services, as well as for global
environmental change. Spatially explicit land use change models
improve our understanding of the processes driving change and make
predictions about the quantity and location of future and past
change. Here we present the lulccR package, an object-oriented
framework for land use change modelling written in the R programming
language. The contribution of the work is to resolve the following
limitations associated with the current land use change modelling
paradigm: (1) the source code for model implementations is
frequently unavailable, severely compromising the reproducibility of
scientific results and making it impossible for members of the
community to improve or adapt models for their own purposes; (2)
ensemble experiments to capture model structural uncertainty are
difficult because of fundamental differences between implementations
of different models; (3) different aspects of the modelling
procedure must be performed in different environments because
existing applications usually only perform the spatial allocation of
change. The package includes a stochastic ordered allocation
procedure as well as an implementation of the widely used CLUE-S
algorithm. We demonstrate its functionality by simulating land use
change at the Plum Island Ecosystems site, using a dataset included
with the package. It is envisaged that lulccR will enable future
model development and comparison within an open environment
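The package's actual procedures (including the stochastic ordered allocation and CLUE-S) are more involved than can be shown here; as a deterministic toy simplification, invented for this sketch and not lulccR's implementation, ordered allocation converts the most suitable cells first until a demanded quantity of change is met:

```python
# Toy ordered allocation: given per-cell suitability scores for a
# target land use and a demand (number of cells to convert), convert
# the most suitable cells first. Simplified illustration only.

def ordered_allocation(suitability, demand):
    """suitability: dict cell_id -> score; demand: cells to convert.
    Returns the set of converted cell ids."""
    ranked = sorted(suitability, key=suitability.get, reverse=True)
    return set(ranked[:demand])

scores = {"c1": 0.9, "c2": 0.2, "c3": 0.7, "c4": 0.5}
converted = ordered_allocation(scores, demand=2)
```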
Harnessing the Power of Many: Extensible Toolkit for Scalable Ensemble Applications
Many scientific problems require multiple distinct computational tasks to be
executed in order to achieve a desired solution. We introduce the Ensemble
Toolkit (EnTK) to address the challenges of scale, diversity and reliability
they pose. We describe the design and implementation of EnTK, characterize its
performance and integrate it with two distinct exemplar use cases: seismic
inversion and adaptive analog ensembles. We perform nine experiments,
characterizing EnTK overheads, strong and weak scalability, and the performance
of two use case implementations, at scale and on production infrastructures. We
show how EnTK meets the following general requirements: (i) implementing
dedicated abstractions to support the description and execution of ensemble
applications; (ii) support for execution on heterogeneous computing
infrastructures; (iii) efficient scalability up to O(10^4) tasks; and (iv)
fault tolerance. We discuss novel computational capabilities that EnTK enables
and the scientific advantages arising thereof. We propose EnTK as an important
addition to the suite of tools in support of production scientific computing
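EnTK's concrete API is not given in the abstract, so the sketch below is a generic, hypothetical ensemble abstraction rather than EnTK's interface. It illustrates two of the listed requirements: a dedicated abstraction for describing an ensemble of tasks (i), and a minimal form of fault tolerance via retries (iv):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ensemble abstraction (not EnTK's actual API): a task is
# a zero-argument callable; the ensemble executes every task and
# retries failed ones up to max_retries times.

def run_ensemble(tasks, max_retries=2, workers=4):
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for name, task in tasks.items():
            for attempt in range(max_retries + 1):
                try:
                    results[name] = pool.submit(task).result()
                    break
                except Exception:
                    if attempt == max_retries:
                        results[name] = None  # task permanently failed
    return results

# A task that fails once, then succeeds, to exercise the retry path.
flaky_state = {"calls": 0}
def flaky():
    flaky_state["calls"] += 1
    if flaky_state["calls"] < 2:
        raise RuntimeError("transient failure")
    return "ok"

out = run_ensemble({"t1": lambda: 1 + 1, "t2": flaky})
```

A production toolkit would additionally handle heterogeneous resources and scale to O(10^4) tasks, which this sketch does not attempt.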
On Designing Multicore-aware Simulators for Biological Systems
The stochastic simulation of biological systems is an increasingly popular
technique in bioinformatics. It is often an enlightening technique, but it can
be computationally expensive. We discuss the main
opportunities to speed it up on multi-core platforms, which pose new challenges
for parallelisation techniques. These opportunities are developed in two
general families of solutions, covering both a single simulation and a bulk
of independent simulations (either replicas or runs derived from a parameter sweep).
The proposed solutions are tested on the parallelisation of the CWC simulator
(Calculus of Wrapped Compartments), carried out by way of the FastFlow
programming framework, making fast development and efficient execution on
multi-cores possible.
Comment: 19 pages + cover page
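The "bulk of independent simulations" family is embarrassingly parallel: replicas share no state, so each can run on its own core. The paper implements this in C++ with FastFlow; the sketch below is a generic Python illustration of the same family (a seeded random walk stands in for one stochastic simulation run), not the CWC/FastFlow implementation:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def simulate_replica(seed, steps=1000):
    """Toy stochastic simulation: a seeded random walk standing in
    for one run of a stochastic biochemical simulator. Seeding makes
    each replica independently reproducible."""
    rng = random.Random(seed)
    x = 0
    for _ in range(steps):
        x += 1 if rng.random() < 0.5 else -1
    return x

def run_replicas(seeds):
    # One independent task per seed; no shared state between replicas.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(simulate_replica, seeds))

results = run_replicas(range(8))
```

For CPU-bound simulators a process pool (or, as in the paper, a native framework such as FastFlow) would be used instead of threads, which here only illustrate the task decomposition.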
The Alliance of Genome Resources: Building a Modern Data Ecosystem for Model Organism Databases.
Model organisms are essential experimental platforms for discovering gene functions, defining protein and genetic networks, uncovering functional consequences of human genome variation, and for modeling human disease. For decades, researchers who use model organisms have relied on Model Organism Databases (MODs) and the Gene Ontology Consortium (GOC) for expertly curated annotations, and for access to integrated genomic and biological information obtained from the scientific literature and public data archives. Through the development and enforcement of data and semantic standards, these genome resources provide rapid access to the collected knowledge of model organisms in human readable and computation-ready formats that would otherwise require countless hours for individual researchers to assemble on their own. Since their inception, the MODs for the predominant biomedical model organisms [Mus sp (laboratory mouse), Saccharomyces cerevisiae, Drosophila melanogaster, Caenorhabditis elegans, Danio rerio, and Rattus norvegicus] along with the GOC have operated as a network of independent, highly collaborative genome resources. In 2016, these six MODs and the GOC joined forces as the Alliance of Genome Resources (the Alliance). By implementing shared programmatic access methods and data-specific web pages with a unified "look and feel," the Alliance is tackling barriers that have limited the ability of researchers to easily compare common data types and annotations across model organisms. To adapt to the rapidly changing landscape for evaluating and funding core data resources, the Alliance is building a modern, extensible, and operationally efficient "knowledge commons" for model organisms using shared, modular infrastructure