Grid-enabled SIMAP utility: Motivation, integration technology and performance results
A biological system comprises large numbers of functionally diverse and frequently multifunctional sets of elements that interact selectively and nonlinearly to produce coherent behaviours. Such a system can be anything from an intracellular biological process (such as a biochemical reaction cycle, gene regulatory network or signal transduction pathway) to a cell, tissue, entire organism, or even an ecological web. Biochemical systems are
responsible for processing environmental signals, inducing the appropriate cellular responses and sequence of
internal events. However, such systems are not fully understood, and in many cases are only poorly understood. Systems biology is a scientific field that is concerned with the systematic study of biological and biochemical systems in terms of complex interactions rather than their individual molecular components. At the core of systems biology is computational
modelling (also called mathematical modelling), which is the process of constructing and simulating an abstract
model of a biological system for subsequent analysis. This methodology can be used to test hypotheses via in silico experiments, providing predictions that can be tested by in vitro and in vivo studies. For example, the ErbB1-4 receptor tyrosine kinases (RTKs) and the signalling pathways they activate govern most core cellular processes, such as cell division, motility and survival (Citri and Yarden, 2006), and are strongly linked to cancer when they malfunction, for example due to mutations. An ODE (ordinary differential equation)-based mass action ErbB model has been constructed and analysed by Chen et al. (2009) in order to depict what role each protein plays and to ascertain how sets of proteins coordinate with each other to perform distinct physiological functions. The
model comprises 499 species (molecules), 201 parameters and 828 reactions. These in silico experiments can often be computationally very expensive, e.g. when multiple biochemical factors are being considered or a variety of complex networks are being simulated simultaneously. Due to the size and complexity of the models
and the requirement to perform comprehensive experiments, it is often necessary to use high-performance computing (HPC) to keep the experimental time within tractable bounds. On this basis, as part of an EC-funded
cancer research project, we have developed the SIMAP Utility, which allows the SImulation modeling of the MAP kinase pathway (http://www.simap-project.org). In this paper we present our experiences with Grid-enabling SIMAP using Condor.
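As a rough illustration of what an ODE-based mass action model involves, the sketch below integrates a single reversible binding reaction A + B <-> C with forward Euler steps. It is a toy example only: the reaction, the rate constants `kf` and `kr`, the initial concentrations and the function names are assumptions made for illustration, and it is neither the Chen et al. (2009) ErbB model nor part of the SIMAP Utility.

```python
# Toy mass action model: A + B <-> C (hypothetical reaction, not the ErbB model).

def mass_action_rhs(state, kf, kr):
    """Return (dA/dt, dB/dt, dC/dt) under mass action kinetics."""
    a, b, c = state
    # Net flux = forward rate (kf * [A] * [B]) minus reverse rate (kr * [C]).
    flux = kf * a * b - kr * c
    return (-flux, -flux, flux)

def simulate(state, kf, kr, dt, steps):
    """Integrate the ODE system with fixed-step forward Euler."""
    for _ in range(steps):
        da, db, dc = mass_action_rhs(state, kf, kr)
        state = (state[0] + dt * da, state[1] + dt * db, state[2] + dt * dc)
    return state

# Assumed initial concentrations and rate constants (arbitrary units).
final = simulate((1.0, 0.5, 0.0), kf=2.0, kr=1.0, dt=0.001, steps=20000)
print(final)
```

A model with hundreds of species repeats this pattern for every reaction, and parameter sweeps repeat the whole simulation many times, which is where the computational cost described above comes from.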
The Family of MapReduce and Large Scale Data Processing Systems
In the last two decades, the continuous increase of computational power has
produced an overwhelming flow of data which has called for a paradigm shift in
the computing architecture and large scale data processing mechanisms.
MapReduce is a simple and powerful programming model that enables easy
development of scalable parallel applications to process vast amounts of data
on large clusters of commodity machines. It isolates the application from the
details of running a distributed program, such as data distribution,
scheduling and fault tolerance. However, the original implementation of the
MapReduce framework had some limitations that have been tackled by many
follow-up research efforts since its introduction. This article
provides a comprehensive survey of a family of approaches and mechanisms for
large-scale data processing that have been implemented based on the
original idea of the MapReduce framework and are currently gaining a lot of
momentum in both research and industrial communities. We also cover a set of
systems that have been implemented to provide declarative
programming interfaces on top of the MapReduce framework. In addition, we
review several large-scale data processing systems that share some of the
ideas of the MapReduce framework for different purposes and application
scenarios. Finally, we discuss some of the future research directions for
implementing the next generation of MapReduce-like solutions.
Comment: arXiv admin note: text overlap with arXiv:1105.4252 by other author
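For readers unfamiliar with the programming model, the following single-process sketch shows the map, shuffle and reduce steps on a word count. It is a minimal illustration under stated assumptions: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) are hypothetical and do not correspond to any real framework's API, and the data distribution, scheduling and fault tolerance that a real framework provides are left out entirely.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def run_job(documents):
    """Run the three phases sequentially in one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = run_job(["map reduce map", "reduce shuffle"])
print(counts)
```

In a distributed setting the map calls and the per-key reduce calls are independent of one another, which is what lets a framework spread them across a cluster.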
Real Time in Plan 9
We describe our experience with the implementation and use of a hard-real-time scheduler for Plan 9 as an embedded operating system.
Pregelix: Big(ger) Graph Analytics on A Dataflow Engine
There is a growing need for distributed graph processing systems that are
capable of gracefully scaling to very large graph datasets. Unfortunately, this
challenge has not been easily met due to the intense memory pressure imposed by
process-centric, message passing designs that many graph processing systems
follow. Pregelix is a new open source distributed graph processing system that
is based on an iterative dataflow design that is better tuned to handle both
in-memory and out-of-core workloads. As such, Pregelix offers improved
performance characteristics and scaling properties over current open source
systems (e.g., we have seen up to 15x speedup compared to Apache Giraph and up
to 35x speedup compared to distributed GraphLab), and makes more effective use
of available machine resources to support Big(ger) Graph Analytics.