10,875 research outputs found

    Enhanced parallel Differential Evolution algorithm for problems in computational systems biology

    Get PDF
    [Abstract] Many key problems in computational systems biology and bioinformatics can be formulated and solved using a global optimization framework. The complexity of the underlying mathematical models require the use of efficient solvers in order to obtain satisfactory results in reasonable computation times. Metaheuristics are gaining recognition in this context, with Differential Evolution (DE) as one of the most popular methods. However, for most realistic applications, like those considering parameter estimation in dynamic models, DE still requires excessive computation times. Here we consider this latter class of problems and present several enhancements to DE based on the introduction of additional algorithmic steps and the exploitation of parallelism. In particular, we propose an asynchronous parallel implementation of DE which has been extended with improved heuristics to exploit the specific structure of parameter estimation problems in computational systems biology. The proposed method is evaluated with different types of benchmarks problems: (i) black-box global optimization problems and (ii) calibration of non-linear dynamic models of biological systems, obtaining excellent results both in terms of quality of the solution and regarding speedup and scalability.Ministerio de Economía y Competitividad; DPI2011-28112-C04-03Consejo Superior de Investigaciones Científicas; PIE-201170E018Ministerio de Ciencia e Innovación; TIN2013-42148-PGalicia. Consellería de Cultura, Educación e Ordenación Universitaria; GRC2013/05

    Towards cloud-based parallel metaheuristics: A case study in computational biology with Differential Evolution and Spark

    Get PDF
    [Abstract] Many key problems in science and engineering can be formulated and solved using global optimization techniques. In the particular case of computational biology, the development of dynamic (kinetic) models is one of the current key issues. In this context, the problem of parameter estimation (model calibration) remains as a very challenging task. The complexity of the underlying models requires the use of efficient solvers to achieve adequate results in reasonable computation times. Metaheuristics have been the focus of great consideration as an efficient way of solving hard global optimization problems. Even so, in most realistic applications, metaheuristics require a very large computation time to obtain an acceptable result. Therefore, several parallel schemes have been proposed, most of them focused on traditional parallel programming interfaces and infrastructures. However, with the emergence of cloud computing, new programming models have been proposed to deal with large-scale data processing on clouds. In this paper we explore the applicability of these new models for global optimization problems using as a case study a set of challenging parameter estimation problems in systems biology. We have developed, using Spark, an island-based parallel version of Differential Evolution. Differential Evolution is a simple population-based metaheuristic that, at the same time, is very popular for being very efficient in real function global optimization. Several experiments were conducted both on a cluster and on the Microsoft Azure public cloud to evaluate the speedup and efficiency of the proposal, concluding that the Spark implementation achieves not only competitive speedup against the serial implementation, but also good scalability when the number of nodes grows. The results can be useful for those interested in using parallel metaheuristics for global optimization problems benefiting from the potential of new cloud programming models.Ministerio de Economía y Competitividad and FEDER; through the Project SYNBIOFACTORY; DPI2014-55276-C5-2-RMinisterio de Economía y Competitividad and FEDER; TIN2013-42148-PMinisterio de Economía y Competitividad and FEDER; TIN2016-75845-PXunta de Galicia; R2014/04

    A Bayesian Consistent Dual Ensemble Kalman Filter for State-Parameter Estimation in Subsurface Hydrology

    Full text link
    Ensemble Kalman filtering (EnKF) is an efficient approach to addressing uncertainties in subsurface groundwater models. The EnKF sequentially integrates field data into simulation models to obtain a better characterization of the model's state and parameters. These are generally estimated following joint and dual filtering strategies, in which, at each assimilation cycle, a forecast step by the model is followed by an update step with incoming observations. The Joint-EnKF directly updates the augmented state-parameter vector while the Dual-EnKF employs two separate filters, first estimating the parameters and then estimating the state based on the updated parameters. In this paper, we reverse the order of the forecast-update steps following the one-step-ahead (OSA) smoothing formulation of the Bayesian filtering problem, based on which we propose a new dual EnKF scheme, the Dual-EnKFOSA_{\rm OSA}. Compared to the Dual-EnKF, this introduces a new update step to the state in a fully consistent Bayesian framework, which is shown to enhance the performance of the dual filtering approach without any significant increase in the computational cost. Numerical experiments are conducted with a two-dimensional synthetic groundwater aquifer model to assess the performance and robustness of the proposed Dual-EnKFOSA_{\rm OSA}, and to evaluate its results against those of the Joint- and Dual-EnKFs. The proposed scheme is able to successfully recover both the hydraulic head and the aquifer conductivity, further providing reliable estimates of their uncertainties. Compared with the standard Joint- and Dual-EnKFs, the proposed scheme is found more robust to different assimilation settings, such as the spatial and temporal distribution of the observations, and the level of noise in the data. Based on our experimental setups, it yields up to 25% more accurate state and parameters estimates

    Using the Cloud for Parameter Estimation Problems: Comparing Spark vs MPI with a Case-Study

    Get PDF
    Date of Conference: 14-17 May 2017. Conference Location: Madrid[Abstract] Systems biology is an emerging approach focused in generating new knowledge about complex biological systems by combining experimental data with mathematical modeling and advanced computational techniques. Many problems in this field are extremely challenging and require substantial supercomputing resources to be solved. This is the case of parameter estimation in large-scale nonlinear dynamic systems biology models. Recently, Cloud Computing has emerged as a new paradigm for on-demand delivery of computing resources. However, scientific computing community has been quite hesitant in using the Cloud, simply because traditional programming models do not fit well with the new paradigm, and the earliest cloud programming models do not allow most scientific computations being efficiently run in the Cloud. In this paper we explore and compare two distributed computing models: the MPI (message-passing interface) model, that is high-performance oriented, and the Spark model, which is throughput oriented but outperforms other cloud programming solutions adding improved support for iterative algorithms through in-memory computing. The performance of a very well known metaheuristic, the Differential Evolution algorithm, has been thoroughly assessed using a challenging parameter estimation problem from the domain of computational systems biology. The experiments have been carried out both in a local cluster and in the Microsoft Azure public cloud, allowing performance and cost evaluation for both infrastructures.Gobierno de España; DPI2014-55276-C5-2-RFondos Feder; TIN2016-75845-PXunta de Galicia; R2016/045Xunta de Galicia; GRC2013/05

    A cloud-based enhanced differential evolution algorithm for parameter estimation problems in computational systems biology

    Get PDF
    This is a post-peer-review, pre-copyedit version of an article published in Cluster Computing. The final authenticated version is available online at: https://doi.org/10.1007/s10586-017-0860-1[Abstract] Metaheuristics are gaining increasing recognition in many research areas, computational systems biology among them. Recent advances in metaheuristics can be helpful in locating the vicinity of the global solution in reasonable computation times, with Differential Evolution (DE) being one of the most popular methods. However, for most realistic applications, DE still requires excessive computation times. With the advent of Cloud Computing effortless access to large number of distributed resources has become more feasible, and new distributed frameworks, like Spark, have been developed to deal with large scale computations on commodity clusters and cloud resources. In this paper we propose a parallel implementation of an enhanced DE using Spark. The proposal drastically reduces the execution time, by means of including a selected local search and exploiting the available distributed resources. The performance of the proposal has been thoroughly assessed using challenging parameter estimation problems from the domain of computational systems biology. Two different platforms have been used for the evaluation, a local cluster and the Microsoft Azure public cloud. Additionally, it has been also compared with other parallel approaches, another cloud-based solution (a MapReduce implementation) and a traditional HPC solution (a MPI implementation)Ministerio de Economía y Competitividad; DPI2014-55276-C5-2-RMinisterio de Economía y Competitividad; TIN2013-42148-PMinisterio de Economía y Competitividad; TIN2016-75845-PXunta de Galicia ; R2016/045Xunta de Galicia; GRC2013/05

    Seismic Ray Impedance Inversion

    Get PDF
    This thesis investigates a prestack seismic inversion scheme implemented in the ray parameter domain. Conventionally, most prestack seismic inversion methods are performed in the incidence angle domain. However, inversion using the concept of ray impedance, as it honours ray path variation following the elastic parameter variation according to Snell’s law, shows the capacity to discriminate different lithologies if compared to conventional elastic impedance inversion. The procedure starts with data transformation into the ray-parameter domain and then implements the ray impedance inversion along constant ray-parameter profiles. With different constant-ray-parameter profiles, mixed-phase wavelets are initially estimated based on the high-order statistics of the data and further refined after a proper well-to-seismic tie. With the estimated wavelets ready, a Cauchy inversion method is used to invert for seismic reflectivity sequences, aiming at recovering seismic reflectivity sequences for blocky impedance inversion. The impedance inversion from reflectivity sequences adopts a standard generalised linear inversion scheme, whose results are utilised to identify rock properties and facilitate quantitative interpretation. It has also been demonstrated that we can further invert elastic parameters from ray impedance values, without eliminating an extra density term or introducing a Gardner’s relation to absorb this term. Ray impedance inversion is extended to P-S converted waves by introducing the definition of converted-wave ray impedance. This quantity shows some advantages in connecting prestack converted wave data with well logs, if compared with the shearwave elastic impedance derived from the Aki and Richards approximation to the Zoeppritz equations. An analysis of P-P and P-S wave data under the framework of ray impedance is conducted through a real multicomponent dataset, which can reduce the uncertainty in lithology identification.Inversion is the key method in generating those examples throughout the entire thesis as we believe it can render robust solutions to geophysical problems. Apart from the reflectivity sequence, ray impedance and elastic parameter inversion mentioned above, inversion methods are also adopted in transforming the prestack data from the offset domain to the ray-parameter domain, mixed-phase wavelet estimation, as well as the registration of P-P and P-S waves for the joint analysis. The ray impedance inversion methods are successfully applied to different types of datasets. In each individual step to achieving the ray impedance inversion, advantages, disadvantages as well as limitations of the algorithms adopted are detailed. As a conclusion, the ray impedance related analyses demonstrated in this thesis are highly competent compared with the classical elastic impedance methods and the author would like to recommend it for a wider application

    Architectures and GPU-Based Parallelization for Online Bayesian Computational Statistics and Dynamic Modeling

    Get PDF
    Recent work demonstrates that coupling Bayesian computational statistics methods with dynamic models can facilitate the analysis of complex systems associated with diverse time series, including those involving social and behavioural dynamics. Particle Markov Chain Monte Carlo (PMCMC) methods constitute a particularly powerful class of Bayesian methods combining aspects of batch Markov Chain Monte Carlo (MCMC) and the sequential Monte Carlo method of Particle Filtering (PF). PMCMC can flexibly combine theory-capturing dynamic models with diverse empirical data. Online machine learning is a subcategory of machine learning algorithms characterized by sequential, incremental execution as new data arrives, which can give updated results and predictions with growing sequences of available incoming data. While many machine learning and statistical methods are adapted to online algorithms, PMCMC is one example of the many methods whose compatibility with and adaption to online learning remains unclear. In this thesis, I proposed a data-streaming solution supporting PF and PMCMC methods with dynamic epidemiological models and demonstrated several successful applications. By constructing an automated, easy-to-use streaming system, analytic applications and simulation models gain access to arriving real-time data to shorten the time gap between data and resulting model-supported insight. The well-defined architecture design emerging from the thesis would substantially expand traditional simulation models' potential by allowing such models to be offered as continually updated services. Contingent on sufficiently fast execution time, simulation models within this framework can consume the incoming empirical data in real-time and generate informative predictions on an ongoing basis as new data points arrive. In a second line of work, I investigated the platform's flexibility and capability by extending this system to support the use of a powerful class of PMCMC algorithms with dynamic models while ameliorating such algorithms' traditionally stiff performance limitations. Specifically, this work designed and implemented a GPU-enabled parallel version of a PMCMC method with dynamic simulation models. The resulting codebase readily has enabled researchers to adapt their models to the state-of-art statistical inference methods, and ensure that the computation-heavy PMCMC method can perform significant sampling between the successive arrival of each new data point. Investigating this method's impact with several realistic PMCMC application examples showed that GPU-based acceleration allows for up to 160x speedup compared to a corresponding CPU-based version not exploiting parallelism. The GPU accelerated PMCMC and the streaming processing system can complement each other, jointly providing researchers with a powerful toolset to greatly accelerate learning and securing additional insight from the high-velocity data increasingly prevalent within social and behavioural spheres. The design philosophy applied supported a platform with broad generalizability and potential for ready future extensions. The thesis discusses common barriers and difficulties in designing and implementing such systems and offers solutions to solve or mitigate them
    corecore