    Towards cloud-based parallel metaheuristics: A case study in computational biology with Differential Evolution and Spark

    [Abstract] Many key problems in science and engineering can be formulated and solved using global optimization techniques. In the particular case of computational biology, the development of dynamic (kinetic) models is one of the current key issues. In this context, parameter estimation (model calibration) remains a very challenging task. The complexity of the underlying models requires efficient solvers to achieve adequate results in reasonable computation times. Metaheuristics have received considerable attention as an efficient way of solving hard global optimization problems. Even so, in most realistic applications, metaheuristics require a very large computation time to obtain an acceptable result. Therefore, several parallel schemes have been proposed, most of them focused on traditional parallel programming interfaces and infrastructures. However, with the emergence of cloud computing, new programming models have been proposed to deal with large-scale data processing on clouds. In this paper we explore the applicability of these new models to global optimization problems, using as a case study a set of challenging parameter estimation problems in systems biology. We have developed, using Spark, an island-based parallel version of Differential Evolution, a simple population-based metaheuristic that is nonetheless very popular for its efficiency in global optimization of real-valued functions. Several experiments were conducted both on a cluster and on the Microsoft Azure public cloud to evaluate the speedup and efficiency of the proposal, concluding that the Spark implementation achieves not only competitive speedup against the serial implementation, but also good scalability when the number of nodes grows. The results can be useful for those interested in using parallel metaheuristics for global optimization problems and benefiting from the potential of new cloud programming models.

    Ministerio de Economía y Competitividad and FEDER, through the Project SYNBIOFACTORY; DPI2014-55276-C5-2-R. Ministerio de Economía y Competitividad and FEDER; TIN2013-42148-P. Ministerio de Economía y Competitividad and FEDER; TIN2016-75845-P. Xunta de Galicia; R2014/04
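
The island-based scheme mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' Spark implementation: it runs sequentially in plain Python, and the objective `sphere`, the ring migration policy, and every parameter value (`F`, `CR`, island sizes, epochs) are assumptions chosen for illustration. Each island evolves its own subpopulation with DE/rand/1/bin (updated in place, a common variant) and periodically sends its best individual to a neighbour.

```python
import random

def sphere(x):
    """Toy objective (an assumption; stands in for a model-calibration cost)."""
    return sum(v * v for v in x)

def de_generation(pop, fit, f_obj, rng, F=0.8, CR=0.9):
    """One DE/rand/1/bin generation, applied in place to a single island."""
    n, dim = len(pop), len(pop[0])
    for i in range(n):
        # Three distinct donors, all different from the target i.
        a, b, c = rng.sample([j for j in range(n) if j != i], 3)
        j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
        trial = [
            pop[a][k] + F * (pop[b][k] - pop[c][k])
            if (rng.random() < CR or k == j_rand) else pop[i][k]
            for k in range(dim)
        ]
        f_trial = f_obj(trial)
        if f_trial <= fit[i]:  # greedy selection: never accept a worse point
            pop[i], fit[i] = trial, f_trial

def migrate_ring(islands, fits):
    """Copy each island's best over the worst of the next island (ring)."""
    bests = []
    for pop, fit in zip(islands, fits):
        b = min(range(len(pop)), key=fit.__getitem__)
        bests.append((list(pop[b]), fit[b]))
    for i, (ind, f) in enumerate(bests):
        dst = (i + 1) % len(islands)
        w = max(range(len(islands[dst])), key=fits[dst].__getitem__)
        if f < fits[dst][w]:
            islands[dst][w], fits[dst][w] = list(ind), f

def island_de(f_obj, dim=5, n_islands=4, island_size=10,
              epochs=20, gens_per_epoch=10, seed=0):
    """Return (initial best, final best) over all islands."""
    rng = random.Random(seed)
    islands = [[[rng.uniform(-5, 5) for _ in range(dim)]
                for _ in range(island_size)] for _ in range(n_islands)]
    fits = [[f_obj(x) for x in pop] for pop in islands]
    initial_best = min(min(fit) for fit in fits)
    for _ in range(epochs):
        # In a distributed version each island would evolve in its own
        # partition; here the islands are processed one after another.
        for pop, fit in zip(islands, fits):
            for _ in range(gens_per_epoch):
                de_generation(pop, fit, f_obj, rng)
        migrate_ring(islands, fits)
    return initial_best, min(min(fit) for fit in fits)

if __name__ == "__main__":
    start, end = island_de(sphere)
    print(f"best cost: {start:.4g} -> {end:.4g}")
```

Because islands only interact at migration points, the inner loop over islands is the part that a distributed runtime could execute in parallel; the migration step is the only place where data must be exchanged.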

    Implementing cloud-based parallel metaheuristics: an overview

    [Abstract] Metaheuristics are among the most popular methods for solving hard global optimization problems in many areas of science and engineering. Their parallel implementation applying HPC techniques is a common approach for efficiently using available resources to reduce the time needed to get a good enough solution to hard-to-solve problems. Paradigms like MPI or OMP are the usual choice when executing them in clusters or supercomputers. Moreover, the pervasive presence of cloud computing and the emergence of programming models like MapReduce or Spark have given rise to an increasing interest in porting HPC workloads to the cloud, as is the case with parallel metaheuristics. In this paper we give an overview of our experience with different alternatives for porting parallel metaheuristics to the cloud, providing some useful insights to the interested reader that we have acquired through extensive experimentation.

    Gobierno de España; DPI2017-82896-C2-2-R. Gobierno de España; TIN2016-75845-P. Xunta de Galicia; R2016/045. Xunta de Galicia; ED431C 2017/0

    An overview of the implementation of parallel metaheuristics in the cloud (Una visión general sobre la implementación de metaheurísticas paralelas en la nube)

    Metaheuristics are among the most popular methods for solving hard global optimization problems in many areas of science and engineering. Their parallel implementation applying HPC techniques is a common approach for efficiently using available resources to reduce the time needed to get a good enough solution to hard-to-solve problems. Paradigms like MPI or OMP are the usual choice when executing them in clusters or supercomputers. Moreover, the pervasive presence of cloud computing and the emergence of programming models like MapReduce or Spark have given rise to an increasing interest in porting HPC workloads to the cloud, as is the case with parallel metaheuristics. In this paper we give an overview of our experience with different alternatives for porting parallel metaheuristics to the cloud, providing some useful insights to the interested reader that we have acquired through extensive experimentation.

    Facultad de Informátic

    Multi-Objective Big Data Optimization with jMetal and Spark

    Big Data Optimization is the term used to refer to optimization problems which have to manage very large amounts of data. In this paper, we focus on the parallelization of metaheuristics with the Apache Spark cluster computing system for solving multi-objective Big Data Optimization problems. Our purpose is to study the influence of accessing data stored in the Hadoop Distributed File System (HDFS) in each evaluation step of a metaheuristic and to provide a software tool to solve these kinds of problems. This tool combines the jMetal multi-objective optimization framework with Apache Spark. We have carried out experiments to measure the performance of the proposed parallel infrastructure in an environment based on virtual machines in a local cluster comprising up to 100 cores. We obtained interesting results for computational effort and propose guidelines for tackling multi-objective Big Data Optimization problems.

    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Implementing Parallel Differential Evolution on Spark

    [Abstract] Metaheuristics are gaining increased attention as an efficient way of solving hard global optimization problems. Differential Evolution (DE) is one of the most popular algorithms in that class. However, its application to realistic problems results in excessive computation times. Therefore, several parallel DE schemes have been proposed, most of them focused on traditional parallel programming interfaces and infrastructures. However, with the emergence of Cloud Computing, new programming models, like Spark, have appeared to suit large-scale data processing on clouds. In this paper we investigate the applicability of Spark to develop parallel DE schemes to be executed in a distributed environment. Both the master-slave and the island-based DE schemes usually found in the literature have been implemented using Spark. The speedup and efficiency of all the implementations were evaluated on the Amazon Web Services (AWS) public cloud, concluding that the island-based solution is the best suited to the distributed nature of Spark. It achieves a good speedup versus the serial implementation, and shows decent scalability when the number of nodes grows.

    Ministerio de Economía y Competitividad; DPI2014-55276-C5-2-R. Xunta de Galicia; GRC2013/055. Xunta de Galicia; R2014/04
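
The speedup and efficiency metrics used to evaluate these implementations follow the usual definitions: speedup is the serial time divided by the parallel time, and efficiency is speedup divided by the number of workers. The sketch below applies them to timings that are purely hypothetical; they are not measurements from the paper, and the function and variable names are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel

def efficiency(t_serial, t_parallel, n_workers):
    """Efficiency E = S / p, where p is the number of workers."""
    return speedup(t_serial, t_parallel) / n_workers

# Hypothetical wall-clock times in seconds (illustrative only).
T_SERIAL = 1200.0
parallel_times = {2: 640.0, 4: 350.0, 8: 200.0}

if __name__ == "__main__":
    for workers, t_par in sorted(parallel_times.items()):
        s = speedup(T_SERIAL, t_par)
        e = efficiency(T_SERIAL, t_par, workers)
        print(f"{workers} workers: speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency that falls as workers are added, as in these made-up numbers, is the typical signature of communication or scheduling overhead growing with the size of the deployment.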

    Parallel Differential Evolution approach for Cloud workflow placements under simultaneous optimization of multiple objectives

    The recent rapid expansion of Cloud computing facilities presents facility providers and users with the challenge of placing workflows optimally on distributed resources, under the often-contradictory goals of minimizing makespan, energy consumption, and other metrics. Evolutionary optimization techniques, which from theoretical principles are guaranteed to provide globally optimum solutions, are among the most powerful tools to achieve such optimal placements. Multi-objective evolutionary algorithms by design work upon contradictory objectives, gradually evolving across generations towards a converged Pareto front representing optimal decision variables, in this case the mapping of tasks to resources on clusters. However, the computation time such algorithms take to converge makes them prohibitive for real-time placements because of the adverse impact on makespan. This work describes the parallelization, on the same cluster, of a Multi-Objective Differential Evolution method (NSDE-2) for optimization of workflow placement, and the attendant speedups that bring the implicit accuracy of the method into the realm of practical utility. Experimental validation is performed on a real-life testbed using diverse Cloud traces. The solutions under different scheduling policies demonstrate a significant reduction in energy consumption with some improvement in makespan.
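
The notion of a Pareto front over contradictory objectives can be made concrete with a small sketch. This is not NSDE-2 itself, only the dominance test that multi-objective methods of this kind build on; the `(makespan, energy)` pairs are hypothetical values, and both objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical (makespan, energy) values for five candidate placements.
placements = [(10.0, 50.0), (12.0, 40.0), (11.0, 55.0),
              (15.0, 35.0), (12.0, 45.0)]

if __name__ == "__main__":
    print(pareto_front(placements))
```

Here `(11.0, 55.0)` is dominated by `(10.0, 50.0)` and `(12.0, 45.0)` by `(12.0, 40.0)`, so three placements remain, each representing a different trade-off between makespan and energy.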