10,875 research outputs found
Enhanced parallel Differential Evolution algorithm for problems in computational systems biology
[Abstract] Many key problems in computational systems biology and bioinformatics can be formulated and solved using a global optimization framework. The complexity of the underlying mathematical models requires the use of efficient solvers in order to obtain satisfactory results in reasonable computation times. Metaheuristics are gaining recognition in this context, with Differential Evolution (DE) as one of the most popular methods. However, for most realistic applications, like those considering parameter estimation in dynamic models, DE still requires excessive computation times.
Here we consider this latter class of problems and present several enhancements to DE based on the introduction of additional algorithmic steps and the exploitation of parallelism. In particular, we propose an asynchronous parallel implementation of DE which has been extended with improved heuristics to exploit the specific structure of parameter estimation problems in computational systems biology. The proposed method is evaluated with different types of benchmark problems: (i) black-box global optimization problems and (ii) calibration of non-linear dynamic models of biological systems, obtaining excellent results both in terms of quality of the solution and regarding speedup and scalability.
Ministerio de Economía y Competitividad; DPI2011-28112-C04-03
Consejo Superior de Investigaciones Científicas; PIE-201170E018
Ministerio de Ciencia e Innovación; TIN2013-42148-P
Galicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013/05
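As a point of reference for the abstract above, the following is a minimal serial sketch of the classic DE/rand/1/bin scheme. It is not the authors' enhanced asynchronous parallel implementation. The function name differential_evolution, the parameter defaults and the sphere test function are illustrative assumptions only.

```python
import random


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=300, seed=0):
    """Minimise f over box bounds with a basic DE/rand/1/bin loop (illustrative)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [f(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct individuals, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < CR or j == j_rand:
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)  # clip to the search box
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = f(trial)
            # Greedy selection: keep the trial only if it is not worse.
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


def sphere(x):
    """Hypothetical stand-in objective; a real use case would be a model-calibration cost."""
    return sum(v * v for v in x)


best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

In a parameter estimation setting, f would typically measure the mismatch between simulated and measured data. That cost evaluation dominates run time, which is why parallelising it is attractive.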
Towards cloud-based parallel metaheuristics: A case study in computational biology with Differential Evolution and Spark
[Abstract]
Many key problems in science and engineering can be formulated and solved using global optimization techniques. In the particular case of computational biology, the development of dynamic (kinetic) models is one of the current key issues. In this context, the problem of parameter estimation (model calibration) remains a very challenging task. The complexity of the underlying models requires the use of efficient solvers to achieve adequate results in reasonable computation times. Metaheuristics have been the focus of great consideration as an efficient way of solving hard global optimization problems. Even so, in most realistic applications, metaheuristics require a very large computation time to obtain an acceptable result. Therefore, several parallel schemes have been proposed, most of them focused on traditional parallel programming interfaces and infrastructures. However, with the emergence of cloud computing, new programming models have been proposed to deal with large-scale data processing on clouds. In this paper we explore the applicability of these new models for global optimization problems using as a case study a set of challenging parameter estimation problems in systems biology. We have developed, using Spark, an island-based parallel version of Differential Evolution. Differential Evolution is a simple population-based metaheuristic that, at the same time, is very popular for being very efficient in real function global optimization. Several experiments were conducted both on a cluster and on the Microsoft Azure public cloud to evaluate the speedup and efficiency of the proposal, concluding that the Spark implementation achieves not only competitive speedup against the serial implementation, but also good scalability when the number of nodes grows.
The results can be useful for those interested in using parallel metaheuristics for global optimization problems benefiting from the potential of new cloud programming models.
Ministerio de Economía y Competitividad and FEDER; through the Project SYNBIOFACTORY; DPI2014-55276-C5-2-R
Ministerio de Economía y Competitividad and FEDER; TIN2013-42148-P
Ministerio de Economía y Competitividad and FEDER; TIN2016-75845-P
Xunta de Galicia; R2014/04
Imaging of a fluid injection process using geophysical data - A didactic example
In many subsurface industrial applications, fluids are injected into or withdrawn from a geologic formation. It is of practical interest to quantify precisely where, when, and by how much the injected fluid alters the state of the subsurface. Routine geophysical monitoring of such processes attempts to image the way that geophysical properties, such as seismic velocities or electrical conductivity, change through time and space and to then make qualitative inferences as to where the injected fluid has migrated. The more rigorous formulation of the time-lapse geophysical inverse problem forecasts how the subsurface evolves during the course of a fluid-injection application. Using time-lapse geophysical signals as the data to be matched, the model unknowns to be estimated are the multiphysics forward-modeling parameters controlling the fluid-injection process. Once the geophysical signature of the flow process is properly reproduced, subsequent simulations can predict the fluid migration and alteration in the subsurface. The dynamic nature of fluid-injection processes renders imaging problems more complex than conventional geophysical imaging for static targets. This work intends to clarify the related hydrogeophysical parameter estimation concepts.
A Bayesian Consistent Dual Ensemble Kalman Filter for State-Parameter Estimation in Subsurface Hydrology
Ensemble Kalman filtering (EnKF) is an efficient approach to addressing
uncertainties in subsurface groundwater models. The EnKF sequentially
integrates field data into simulation models to obtain a better
characterization of the model's state and parameters. These are generally
estimated following joint and dual filtering strategies, in which, at each
assimilation cycle, a forecast step by the model is followed by an update step
with incoming observations. The Joint-EnKF directly updates the augmented
state-parameter vector while the Dual-EnKF employs two separate filters, first
estimating the parameters and then estimating the state based on the updated
parameters. In this paper, we reverse the order of the forecast-update steps
following the one-step-ahead (OSA) smoothing formulation of the Bayesian
filtering problem, based on which we propose a new dual EnKF scheme, the
Dual-EnKF_OSA. Compared to the Dual-EnKF, this introduces a new update
step to the state in a fully consistent Bayesian framework, which is shown to
enhance the performance of the dual filtering approach without any significant
increase in the computational cost. Numerical experiments are conducted with a
two-dimensional synthetic groundwater aquifer model to assess the performance
and robustness of the proposed Dual-EnKF_OSA, and to evaluate its
results against those of the Joint- and Dual-EnKFs. The proposed scheme is able
to successfully recover both the hydraulic head and the aquifer conductivity,
further providing reliable estimates of their uncertainties. Compared with the
standard Joint- and Dual-EnKFs, the proposed scheme is found more robust to
different assimilation settings, such as the spatial and temporal distribution
of the observations, and the level of noise in the data. Based on our
experimental setups, it yields up to 25% more accurate state and parameter
estimates.
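To illustrate the forecast-update cycle the abstract refers to, here is a minimal sketch of a standard Joint-EnKF step on an augmented (state, parameter) member, using perturbed observations. It is not the Dual-EnKF scheme proposed in the paper. The scalar toy model x_next = a * x + 1, the function names and all numerical settings are assumptions made purely for illustration.

```python
import random


def model(x, a):
    """Hypothetical scalar toy dynamics standing in for a groundwater simulator."""
    return a * x + 1.0


def joint_enkf_step(ensemble, y_obs, obs_var, rng):
    """One forecast-update cycle; each member is a (state x, parameter a) pair."""
    # Forecast: every member advances its state with its own parameter value.
    forecast = [(model(x, a), a) for x, a in ensemble]
    n = len(forecast)
    mean_x = sum(x for x, _ in forecast) / n
    mean_a = sum(a for _, a in forecast) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in forecast) / (n - 1)
    cov_ax = sum((a - mean_a) * (x - mean_x) for x, a in forecast) / (n - 1)
    # Kalman gains: only the state is observed; the parameter is corrected
    # through its sample cross-covariance with the state.
    k_x = var_x / (var_x + obs_var)
    k_a = cov_ax / (var_x + obs_var)
    updated = []
    for x, a in forecast:
        # Each member assimilates its own perturbed copy of the observation.
        innovation = y_obs + rng.gauss(0.0, obs_var ** 0.5) - x
        updated.append((x + k_x * innovation, a + k_a * innovation))
    return updated


# Synthetic twin experiment with assumed settings.
rng = random.Random(42)
a_true, x_true, obs_var = 0.9, 1.0, 0.01
ensemble = [(rng.gauss(1.0, 0.5), rng.uniform(0.5, 1.0)) for _ in range(100)]
for _ in range(30):
    x_true = model(x_true, a_true)
    y = x_true + rng.gauss(0.0, obs_var ** 0.5)
    ensemble = joint_enkf_step(ensemble, y, obs_var, rng)
a_mean = sum(a for _, a in ensemble) / len(ensemble)
```

A dual scheme would instead split the update into two filters, first correcting the parameter and then the state. This sketch performs both corrections in a single augmented update.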
Using the Cloud for Parameter Estimation Problems: Comparing Spark vs MPI with a Case-Study
Date of Conference: 14-17 May 2017.
Conference Location: Madrid
[Abstract]
Systems biology is an emerging approach focused on generating new knowledge about complex biological systems by combining experimental data with mathematical modeling and advanced computational techniques. Many problems in this field are extremely challenging and require substantial supercomputing resources to be solved. This is the case of parameter estimation in large-scale nonlinear dynamic systems biology models. Recently, Cloud Computing has emerged as a new paradigm for on-demand delivery of computing resources. However, the scientific computing community has been quite hesitant in using the Cloud, simply because traditional programming models do not fit well with the new paradigm, and the earliest cloud programming models do not allow most scientific computations to be run efficiently in the Cloud. In this paper we explore and compare two distributed computing models: the MPI (message-passing interface) model, which is high-performance oriented, and the Spark model, which is throughput oriented but outperforms other cloud programming solutions by adding improved support for iterative algorithms through in-memory computing. The performance of a very well known metaheuristic, the Differential Evolution algorithm, has been thoroughly assessed using a challenging parameter estimation problem from the domain of computational systems biology. The experiments have been carried out both in a local cluster and in the Microsoft Azure public cloud, allowing performance and cost evaluation for both infrastructures.
Gobierno de España; DPI2014-55276-C5-2-R
Fondos Feder; TIN2016-75845-P
Xunta de Galicia; R2016/045
Xunta de Galicia; GRC2013/05
Nanometer VLSI placement and optimization for multi-objective design closure
In a VLSI physical synthesis flow, placement directly defines the interconnection,
which affects many other design objectives, such as timing, power consumption,
congestion, and thermal issues. With the scaling of technology, the relative interconnect
delay increases dramatically. As a result, placement has become a bottleneck
in deep sub-micron physical synthesis. In this dissertation, I propose several
optimization algorithms from global placement, placement migration, timing driven
placements, to incremental power optimizations for multi-objective VLSI design
closure. The first work is DPlace, a new global placement algorithm that scales
well to the modern large-scale circuit placement problems. DPlace simulates the
natural diffusion process to spread cells smoothly over the placement region, and
uses both analytical and discrete techniques to improve the wire length. However,
global placement is never sufficient for multi-objective design closure; a variety of
design objectives have to be improved incrementally, such as timing, routing congestion,
signal integrity, and heat distribution. Placement migration is a critical step
to address the cell overlaps appearing during incremental optimizations. To achieve
high placement stability, I propose a computational geometry based placement migration
flow to cope with placement changes, and a new stability metric to measure
the “similarity” between two placements accurately. Our placement migration algorithm
has a clear advantage over conventional legalization algorithms in that the
neighborhood characteristics of the original placement are preserved. For timing
closure in high performance designs, I present a linear programming based incremental
timing driven placement to improve the timing on critical paths directly.
I further present an efficient timing driven placement algorithm (Pyramids). Two
formulations of Pyramids are proposed, which are suitable for different optimization
stages in a physical synthesis flow. Both approaches find the optimal location
for timing of a cell in constant time, through computational geometry based approaches.
For fast convergence of design closure, placement should be integrated
with other optimization techniques. I propose to combine placement, gate sizing
and Vt swapping techniques to reduce the total power consumption, especially the
leakage power, which is becoming increasingly critical for nanometer VLSI design
closure.
Electrical and Computer Engineering
A cloud-based enhanced differential evolution algorithm for parameter estimation problems in computational systems biology
This is a post-peer-review, pre-copyedit version of an article published in Cluster Computing. The final authenticated version is available online at: https://doi.org/10.1007/s10586-017-0860-1
[Abstract] Metaheuristics are gaining increasing recognition in many research areas, computational systems biology among them. Recent advances in metaheuristics can be helpful in locating the vicinity of the global solution in reasonable computation times, with Differential Evolution (DE) being one of the most popular methods. However, for most realistic applications, DE still requires excessive computation times. With the advent of Cloud Computing, effortless access to a large number of distributed resources has become more feasible, and new distributed frameworks, like Spark, have been developed to deal with large-scale computations on commodity clusters and cloud resources. In this paper we propose a parallel implementation of an enhanced DE using Spark. The proposal drastically reduces the execution time by including a selected local search and exploiting the available distributed resources. The performance of the proposal has been thoroughly assessed using challenging parameter estimation problems from the domain of computational systems biology. Two different platforms have been used for the evaluation: a local cluster and the Microsoft Azure public cloud. Additionally, it has also been compared with other parallel approaches: another cloud-based solution (a MapReduce implementation) and a traditional HPC solution (an MPI implementation).
Ministerio de Economía y Competitividad; DPI2014-55276-C5-2-R
Ministerio de Economía y Competitividad; TIN2013-42148-P
Ministerio de Economía y Competitividad; TIN2016-75845-P
Xunta de Galicia; R2016/045
Xunta de Galicia; GRC2013/05
Seismic Ray Impedance Inversion
This thesis investigates a prestack seismic inversion scheme implemented in the ray
parameter domain. Conventionally, most prestack seismic inversion methods are
performed in the incidence angle domain. However, inversion using the concept of
ray impedance, as it honours ray path variation following the elastic parameter
variation according to Snell’s law, shows the capacity to discriminate different
lithologies if compared to conventional elastic impedance inversion.
The procedure starts with data transformation into the ray-parameter domain and then
implements the ray impedance inversion along constant ray-parameter profiles. With
different constant-ray-parameter profiles, mixed-phase wavelets are initially estimated
based on the high-order statistics of the data and further refined after a proper well-to-seismic
tie. With the estimated wavelets ready, a Cauchy inversion method is used to
invert for seismic reflectivity sequences, aiming at recovering seismic reflectivity
sequences for blocky impedance inversion. The impedance inversion from reflectivity
sequences adopts a standard generalised linear inversion scheme, whose results are
utilised to identify rock properties and facilitate quantitative interpretation. It has also
been demonstrated that we can further invert elastic parameters from ray impedance
values, without eliminating an extra density term or introducing a Gardner’s relation
to absorb this term.
Ray impedance inversion is extended to P-S converted waves by introducing the
definition of converted-wave ray impedance. This quantity shows some advantages in
connecting prestack converted wave data with well logs, if compared with the shear-wave
elastic impedance derived from the Aki and Richards approximation to the
Zoeppritz equations. An analysis of P-P and P-S wave data under the framework of
ray impedance is conducted through a real multicomponent dataset, which can reduce
the uncertainty in lithology identification.
Inversion is the key method in generating those examples throughout the entire thesis
as we believe it can render robust solutions to geophysical problems. Apart from the
reflectivity sequence, ray impedance and elastic parameter inversion mentioned above,
inversion methods are also adopted in transforming the prestack data from the offset
domain to the ray-parameter domain, mixed-phase wavelet estimation, as well as the
registration of P-P and P-S waves for the joint analysis.
The ray impedance inversion methods are successfully applied to different types of
datasets. In each individual step to achieving the ray impedance inversion, advantages,
disadvantages as well as limitations of the algorithms adopted are detailed. As a
conclusion, the ray impedance related analyses demonstrated in this thesis are highly
competent compared with the classical elastic impedance methods and the author
would like to recommend it for a wider application.
Architectures and GPU-Based Parallelization for Online Bayesian Computational Statistics and Dynamic Modeling
Recent work demonstrates that coupling Bayesian computational statistics methods with dynamic models can facilitate the analysis of complex systems associated with diverse time series, including those involving social and behavioural dynamics. Particle Markov Chain Monte Carlo (PMCMC) methods constitute a particularly powerful class of Bayesian methods combining aspects of batch Markov Chain Monte Carlo (MCMC) and the sequential Monte Carlo method of Particle Filtering (PF). PMCMC can flexibly combine theory-capturing dynamic models with diverse empirical data. Online machine learning is a subcategory of machine learning algorithms characterized by sequential, incremental execution as new data arrives, which can give updated results and predictions with growing sequences of available incoming data. While many machine learning and statistical methods have been adapted to online algorithms, PMCMC is one example of the many methods whose compatibility with and adaptation to online learning remains unclear.
In this thesis, I proposed a data-streaming solution supporting PF and PMCMC methods with dynamic epidemiological models and demonstrated several successful applications.
By constructing an automated, easy-to-use streaming system, analytic applications and simulation models gain access to arriving real-time data to shorten the time gap between data and resulting model-supported insight. The well-defined architecture design emerging from the thesis would substantially expand traditional simulation models' potential by allowing such models to be offered as continually updated services.
Contingent on sufficiently fast execution time, simulation models within this framework can consume the incoming empirical data in real-time and generate informative predictions on an ongoing basis as new data points arrive.
In a second line of work, I investigated the platform's flexibility and capability by extending this system to support the use of a powerful class of PMCMC algorithms with dynamic models while ameliorating such algorithms' traditionally stiff performance limitations. Specifically, this work designed and implemented a GPU-enabled parallel version of a PMCMC method with dynamic simulation models. The resulting codebase has readily enabled researchers to adapt their models to state-of-the-art statistical inference methods, and ensures that the computation-heavy PMCMC method can perform significant sampling between the successive arrival of each new data point. Investigating this method's impact with several realistic PMCMC application examples showed that GPU-based acceleration allows for up to 160x speedup compared to a corresponding CPU-based version not exploiting parallelism. The GPU-accelerated PMCMC and the streaming processing system can complement each other, jointly providing researchers with a powerful toolset to greatly accelerate learning and secure additional insight from the high-velocity data increasingly prevalent within social and behavioural spheres.
The design philosophy applied supported a platform with broad generalizability and potential for ready future extensions.
The thesis discusses common barriers and difficulties in designing and implementing such systems and offers solutions to solve or mitigate them.
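The Particle Filtering building block mentioned in the abstract above can be sketched as a bootstrap filter step that consumes one new observation at a time, in the spirit of the online setting described. This is only a CPU-side illustration, not the thesis's GPU-based PMCMC code. The random-walk state model, the Gaussian likelihood and every name and setting below are assumptions.

```python
import math
import random


def bootstrap_filter_step(particles, y_obs, proc_std, obs_std, rng):
    """Propagate, weight and resample particles for one new observation."""
    # Propagate through an assumed random-walk state model.
    predicted = [p + rng.gauss(0.0, proc_std) for p in particles]
    # Weight by an assumed Gaussian observation likelihood (unnormalised).
    weights = [math.exp(-0.5 * ((y_obs - p) / obs_std) ** 2) for p in predicted]
    if sum(weights) == 0.0:
        # Every likelihood underflowed; fall back to uniform weights.
        weights = [1.0] * len(predicted)
    # Multinomial resampling proportional to the weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))


# Streaming-style usage: observations are processed one by one as they arrive.
rng = random.Random(7)
particles = [rng.uniform(-10.0, 10.0) for _ in range(500)]
for y in [2.8, 3.1, 3.0, 3.3]:  # hypothetical incoming data points
    particles = bootstrap_filter_step(particles, y, 0.2, 0.5, rng)
estimate = sum(particles) / len(particles)
```

Each particle is propagated and weighted independently, which is the part of such methods that maps naturally onto parallel hardware.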