328 research outputs found

    Multi-Objective Big Data Optimization with jMetal and Spark

    Big Data Optimization is the term used to refer to optimization problems that have to manage very large amounts of data. In this paper, we focus on the parallelization of metaheuristics with the Apache Spark cluster computing system for solving multi-objective Big Data Optimization problems. Our purpose is to study the influence of accessing data stored in the Hadoop Distributed File System (HDFS) in each evaluation step of a metaheuristic and to provide a software tool to solve these kinds of problems. This tool combines the jMetal multi-objective optimization framework with Apache Spark. We have carried out experiments to measure the performance of the proposed parallel infrastructure in an environment based on virtual machines in a local cluster comprising up to 100 cores. We obtained interesting results for computational effort and propose guidelines to face multi-objective Big Data Optimization problems.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    A cloud-based enhanced differential evolution algorithm for parameter estimation problems in computational systems biology

    This is a post-peer-review, pre-copyedit version of an article published in Cluster Computing. The final authenticated version is available online at: https://doi.org/10.1007/s10586-017-0860-1
    [Abstract] Metaheuristics are gaining increasing recognition in many research areas, computational systems biology among them. Recent advances in metaheuristics can be helpful in locating the vicinity of the global solution in reasonable computation times, with Differential Evolution (DE) being one of the most popular methods. However, for most realistic applications, DE still requires excessive computation times. With the advent of Cloud Computing, effortless access to a large number of distributed resources has become more feasible, and new distributed frameworks, like Spark, have been developed to deal with large-scale computations on commodity clusters and cloud resources. In this paper we propose a parallel implementation of an enhanced DE using Spark. The proposal drastically reduces the execution time by including a selected local search and exploiting the available distributed resources. The performance of the proposal has been thoroughly assessed using challenging parameter estimation problems from the domain of computational systems biology. Two different platforms have been used for the evaluation: a local cluster and the Microsoft Azure public cloud. Additionally, it has also been compared with other parallel approaches: another cloud-based solution (a MapReduce implementation) and a traditional HPC solution (an MPI implementation).
    Ministerio de Economía y Competitividad; DPI2014-55276-C5-2-R. Ministerio de Economía y Competitividad; TIN2013-42148-P. Ministerio de Economía y Competitividad; TIN2016-75845-P. Xunta de Galicia; R2016/045. Xunta de Galicia; GRC2013/05

    Using the Cloud for Parameter Estimation Problems: Comparing Spark vs MPI with a Case-Study

    Date of Conference: 14-17 May 2017. Conference Location: Madrid.
    [Abstract] Systems biology is an emerging approach focused on generating new knowledge about complex biological systems by combining experimental data with mathematical modeling and advanced computational techniques. Many problems in this field are extremely challenging and require substantial supercomputing resources to be solved. This is the case of parameter estimation in large-scale nonlinear dynamic systems biology models. Recently, Cloud Computing has emerged as a new paradigm for on-demand delivery of computing resources. However, the scientific computing community has been quite hesitant to use the Cloud, because traditional programming models do not fit well with the new paradigm, and the earliest cloud programming models do not allow most scientific computations to be run efficiently in the Cloud. In this paper we explore and compare two distributed computing models: the MPI (Message Passing Interface) model, which is high-performance oriented, and the Spark model, which is throughput oriented but outperforms other cloud programming solutions by adding improved support for iterative algorithms through in-memory computing. The performance of a very well-known metaheuristic, the Differential Evolution algorithm, has been thoroughly assessed using a challenging parameter estimation problem from the domain of computational systems biology. The experiments have been carried out both in a local cluster and in the Microsoft Azure public cloud, allowing performance and cost evaluation for both infrastructures.
    Gobierno de España; DPI2014-55276-C5-2-R. Fondos Feder; TIN2016-75845-P. Xunta de Galicia; R2016/045. Xunta de Galicia; GRC2013/05

    Una visión general sobre la implementación de metaheurísticas paralelas en la nube

    Metaheuristics are among the most popular methods for solving hard global optimization problems in many areas of science and engineering. Their parallel implementation applying HPC techniques is a common approach for efficiently using available resources to reduce the time needed to get a good enough solution to hard-to-solve problems. Paradigms like MPI or OMP are the usual choice when executing them in clusters or supercomputers. Moreover, the pervasive presence of cloud computing and the emergence of programming models like MapReduce or Spark have given rise to an increasing interest in porting HPC workloads to the cloud, as is the case with parallel metaheuristics. In this paper we give an overview of our experience with different alternatives for porting parallel metaheuristics to the cloud, providing some useful insights to the interested reader that we have acquired through extensive experimentation.
    Facultad de Informátic

    Implementing Parallel Differential Evolution on Spark

    [Abstract] Metaheuristics are gaining increased attention as an efficient way of solving hard global optimization problems. Differential Evolution (DE) is one of the most popular algorithms in that class. However, its application to realistic problems results in excessive computation times. Therefore, several parallel DE schemes have been proposed, most of them focused on traditional parallel programming interfaces and infrastructures. However, with the emergence of Cloud Computing, new programming models, like Spark, have appeared that suit large-scale data processing on clouds. In this paper we investigate the applicability of Spark to develop parallel DE schemes to be executed in a distributed environment. Both the master-slave and the island-based DE schemes usually found in the literature have been implemented using Spark. The speedup and efficiency of all the implementations were evaluated on the Amazon Web Services (AWS) public cloud, concluding that the island-based solution is the best suited to the distributed nature of Spark. It achieves a good speedup versus the serial implementation, and shows decent scalability when the number of nodes grows.
    Ministerio de Economía y Competitividad; DPI2014-55276-C5-2-R. Xunta de Galicia; GRC2013/055. Xunta de Galicia; R2014/04
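
The island-based scheme mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation and does not use Spark: it runs the islands sequentially in plain Python, and every name and setting in it (`sphere`, `de_generation`, `island_de`, the DE/rand/1/bin variant, the ring migration policy, and all parameter values) is an assumption chosen for illustration. The inner loop over islands is the part a distributed framework would run in parallel.

```python
import random


def sphere(x):
    """Toy objective (assumed for illustration): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def de_generation(pop, fitness, objective, rng, f=0.5, cr=0.9):
    """Advance one island by one DE/rand/1/bin generation, updating it in place."""
    n, dim = len(pop), len(pop[0])
    for i in range(n):
        # Pick three distinct individuals other than the target i.
        a, b, c = rng.sample([j for j in range(n) if j != i], 3)
        j_rand = rng.randrange(dim)  # forces at least one mutated component
        trial = [
            pop[a][k] + f * (pop[b][k] - pop[c][k])
            if (rng.random() < cr or k == j_rand) else pop[i][k]
            for k in range(dim)
        ]
        trial_fit = objective(trial)
        if trial_fit <= fitness[i]:  # greedy selection keeps the better vector
            pop[i], fitness[i] = trial, trial_fit


def island_de(objective, dim=5, islands=4, island_size=10,
              epochs=20, gens_per_epoch=10, seed=0):
    """Evolve several islands independently, migrating along a ring between epochs."""
    rng = random.Random(seed)
    pops = [[[rng.uniform(-5.0, 5.0) for _ in range(dim)]
             for _ in range(island_size)] for _ in range(islands)]
    fits = [[objective(x) for x in pop] for pop in pops]
    for _ in range(epochs):
        # Independent evolution: islands do not interact during this step.
        for pop, fit in zip(pops, fits):
            for _ in range(gens_per_epoch):
                de_generation(pop, fit, objective, rng)
        # Ring migration: each island's best replaces the worst of the next island.
        bests = []
        for pop, fit in zip(pops, fits):
            b = min(range(island_size), key=fit.__getitem__)
            bests.append((list(pop[b]), fit[b]))
        for i, (x, fx) in enumerate(bests):
            dst = (i + 1) % islands
            w = max(range(island_size), key=fits[dst].__getitem__)
            pops[dst][w], fits[dst][w] = list(x), fx
    # Return (best fitness, best vector) over all islands.
    return min((fx, x) for pop, fit in zip(pops, fits) for x, fx in zip(pop, fit))


if __name__ == "__main__":
    best_f, best_x = island_de(sphere)
    print(best_f, best_x)
```

Because islands only exchange individuals at epoch boundaries, the amount of communication is controlled by `gens_per_epoch`, which is one plausible reason an island scheme maps well onto a distributed setting.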

    Implementing cloud-based parallel metaheuristics: an overview

    [Abstract] Metaheuristics are among the most popular methods for solving hard global optimization problems in many areas of science and engineering. Their parallel implementation applying HPC techniques is a common approach for efficiently using available resources to reduce the time needed to get a good enough solution to hard-to-solve problems. Paradigms like MPI or OMP are the usual choice when executing them in clusters or supercomputers. Moreover, the pervasive presence of cloud computing and the emergence of programming models like MapReduce or Spark have given rise to an increasing interest in porting HPC workloads to the cloud, as is the case with parallel metaheuristics. In this paper we give an overview of our experience with different alternatives for porting parallel metaheuristics to the cloud, providing some useful insights to the interested reader that we have acquired through extensive experimentation.
    Gobierno de España; DPI2017-82896-C2-2-R. Gobierno de España; TIN2016-75845-P. Xunta de Galicia; R2016/045. Xunta de Galicia; ED431C 2017/0

    Towards cloud-based parallel metaheuristics: A case study in computational biology with Differential Evolution and Spark

    [Abstract] Many key problems in science and engineering can be formulated and solved using global optimization techniques. In the particular case of computational biology, the development of dynamic (kinetic) models is one of the current key issues. In this context, the problem of parameter estimation (model calibration) remains a very challenging task. The complexity of the underlying models requires the use of efficient solvers to achieve adequate results in reasonable computation times. Metaheuristics have received considerable attention as an efficient way of solving hard global optimization problems. Even so, in most realistic applications, metaheuristics require a very large computation time to obtain an acceptable result. Therefore, several parallel schemes have been proposed, most of them focused on traditional parallel programming interfaces and infrastructures. However, with the emergence of cloud computing, new programming models have been proposed to deal with large-scale data processing on clouds. In this paper we explore the applicability of these new models for global optimization problems using as a case study a set of challenging parameter estimation problems in systems biology. We have developed, using Spark, an island-based parallel version of Differential Evolution. Differential Evolution is a simple population-based metaheuristic that, at the same time, is very popular for being very efficient in real function global optimization. Several experiments were conducted both on a cluster and on the Microsoft Azure public cloud to evaluate the speedup and efficiency of the proposal, concluding that the Spark implementation achieves not only competitive speedup against the serial implementation, but also good scalability when the number of nodes grows. The results can be useful for those interested in using parallel metaheuristics for global optimization problems benefiting from the potential of new cloud programming models.
    Ministerio de Economía y Competitividad and FEDER; through the Project SYNBIOFACTORY; DPI2014-55276-C5-2-R. Ministerio de Economía y Competitividad and FEDER; TIN2013-42148-P. Ministerio de Economía y Competitividad and FEDER; TIN2016-75845-P. Xunta de Galicia; R2014/04