    Overcoming Challenges in Predictive Modeling of Laser-Plasma Interaction Scenarios. The Sinuous Route from Advanced Machine Learning to Deep Learning

    The interaction of ultrashort, intense laser pulses with solid targets and dense plasmas is a rapidly developing area of physics, largely owing to significant advances in laser technology. There is thus growing interest in diagnosing, as accurately as possible, the numerous phenomena related to the absorption and reflection of laser radiation. At the same time, envisaged experiments demand simulation software of increased accuracy. As laser-plasma interaction modeling transitions from a computationally intensive to a data-intensive problem, the traditional codes employed so far are starting to show their limitations. It is in this context that predictive modeling of laser-plasma interaction experiments is bound to reshape the definition of simulation software. This chapter focuses on an entire class of predictive systems incorporating big data, advanced machine learning algorithms and deep learning, with improved accuracy and speed. Making use of terabytes of already available information (literature as well as simulation and experimental data), these systems enable the discovery and understanding of various physical phenomena occurring during interaction, hence allowing researchers to set up controlled experiments at optimal parameters. A comparative discussion of the challenges, advantages, bottlenecks, performance and suitability of laser-plasma interaction predictive systems is ultimately provided.

    A cloudification methodology for multidimensional analysis: Implementation and application to a railway power simulator

    Many scientific areas make extensive use of computer simulations to study complex real-world processes. These computations are typically very resource-intensive and present scalability issues as experiments get larger, even in dedicated clusters, since these are limited by their own hardware resources. Cloud computing arises as an option to move towards the ideal of unlimited scalability by providing virtually infinite resources, yet applications must be adapted to this new paradigm. This process of converting and/or migrating an application and its data in order to make use of cloud computing is sometimes known as cloudifying the application. We propose a generalist cloudification method based on the MapReduce paradigm to migrate scientific simulations into the cloud to provide greater scalability. We analysed its viability by applying it to a real-world railway power consumption simulator and running the resulting implementation on Hadoop YARN over Amazon EC2. Our tests show that the cloudified application is highly scalable and that there is still a large margin to improve the theoretical model and its implementations, and also to extend it to a wider range of simulations. We also propose and evaluate a multidimensional analysis tool based on the cloudified application. It generates, executes and evaluates several experiments in parallel for the same simulation kernel. The results we obtained indicate that our methodology is suitable for resource-intensive simulations and multidimensional analysis, as it improves the infrastructure's utilization, efficiency and scalability when running many complex experiments. This work has been partially funded under the grant TIN2013-41350-P of the Spanish Ministry of Economics and Competitiveness, and the COST Action IC1305 "Network for Sustainable Ultrascale Computing Platforms" (NESUS).
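
    The MapReduce-based idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy kernel, the partition layout and every function name below are assumptions made for illustration, and a local thread pool stands in for the Hadoop workers. The map step runs the simulation kernel on independent partitions; the reduce step merges the partial results.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def simulate_partition(partition):
    """Map step: run a (hypothetical) simulation kernel on one independent partition."""
    partition_id, time_steps, load_kw = partition
    # Toy kernel (assumption): energy in kWh = constant load in kW over one-hour steps.
    energy_kwh = sum(load_kw for _ in range(time_steps))
    return partition_id, energy_kwh


def merge_results(acc, result):
    """Reduce step: fold one partial result into the running summary."""
    per_partition, total = acc
    partition_id, energy_kwh = result
    per_partition[partition_id] = energy_kwh
    return per_partition, total + energy_kwh


def run_cloudified(partitions, workers=4):
    """Run all partitions in parallel, then merge them into (per_partition, total)."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        mapped = list(executor.map(simulate_partition, partitions))
    return reduce(merge_results, mapped, ({}, 0.0))


if __name__ == "__main__":
    # Each tuple: (partition id, number of time steps, load in kW).
    partitions = [(0, 24, 1.5), (1, 24, 2.0), (2, 12, 4.0)]
    per_partition, total = run_cloudified(partitions)
    print(per_partition, total)
```

    The sketch only works because the partitions are independent of one another; a simulator whose partitions exchange state at every step would need a different decomposition.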

    Modeling performance of Hadoop applications: A journey from queueing networks to stochastic well formed nets

    Nowadays, many enterprises commit to the extraction of actionable knowledge from huge datasets as part of their core business activities. Applications belong to very different domains, such as fraud detection or one-to-one marketing, and encompass business analytics and decision support in both the private and public sectors. In these scenarios, a central place is held by the MapReduce framework and, in particular, its open source implementation, Apache Hadoop. In such environments, new challenges arise in the area of job performance prediction, with the need to provide Service Level Agreement guarantees to the end user and to avoid wasting computational resources. In this paper we provide performance analysis models to estimate MapReduce job execution times in Hadoop clusters governed by the YARN Capacity Scheduler. We propose models of increasing complexity and accuracy, ranging from queueing networks to stochastic well formed nets, able to estimate job performance under a number of scenarios of interest, including unreliable resources. The accuracy of our models is evaluated by considering the TPC-DS industry benchmark, running experiments on Amazon EC2 and at the CINECA Italian supercomputing center. The results have shown that the average accuracy we can achieve is in the range 9–14%.

    An optimization framework for the capacity allocation and admission control of MapReduce jobs in cloud systems

    Nowadays, we live in a Big Data world and many sectors of our economy are guided by data-driven decision processes. Big Data and Business Intelligence applications are facilitated by the MapReduce programming model, while, at the infrastructure layer, cloud computing offers flexible and cost-effective solutions for provisioning large clusters on demand. Capacity allocation in such systems, meant as the problem of providing computational power to support concurrent MapReduce applications in a cost-effective fashion, represents a challenge of paramount importance. In this paper we lay the foundation for a solution implementing admission control and capacity allocation for MapReduce jobs with a priori deadline guarantees. In particular, shared Hadoop 2.x clusters supporting batch and/or interactive jobs are targeted. We formulate a linear programming model able to minimize cloud resource costs and rejection penalties for the execution of jobs belonging to multiple classes with deadline guarantees. Scalability analyses demonstrated that the proposed method is able to determine the global optimal solution of the linear problem for systems including up to 10,000 classes in less than 1 s.
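
    To make the shape of such a problem concrete, one possible linear program of this kind is sketched below. It is an illustrative formulation, not the one given in the paper, and all symbols are assumptions: for each job class i, r_i is the capacity allocated, h_i the number of admitted jobs (bounded by H_i^low and H_i^up), c the unit cost of capacity, p_i the penalty per rejected job, a_i h_i + b_i an assumed linear estimate of the capacity needed to meet the class deadline, and R the total capacity available.

```latex
\begin{aligned}
\min_{r_i,\, h_i} \quad & \sum_{i} \Bigl( c\, r_i + p_i \bigl( H_i^{\mathrm{up}} - h_i \bigr) \Bigr) \\
\text{s.t.} \quad & r_i \ge a_i\, h_i + b_i, && \forall i \\
& H_i^{\mathrm{low}} \le h_i \le H_i^{\mathrm{up}}, && \forall i \\
& \sum_{i} r_i \le R, \qquad r_i \ge 0, && \forall i
\end{aligned}
```

    The first term charges for allocated capacity and the second for rejected jobs, so admitting more jobs lowers the penalty but raises the capacity required by the deadline constraint.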

    D-SPACE4Cloud: Towards Quality-Aware Data Intensive Applications in the Cloud

    Recent years have witnessed a steep rise in data generation worldwide and, consequently, the widespread adoption of software solutions claiming to support data-intensive applications. Competitiveness and innovation have strongly benefited from these new platforms and methodologies, and there is a great deal of interest in the new possibilities that Big Data analytics promises to make a reality. Many companies currently engage in data-intensive processes as part of their core businesses; however, fully embracing the data-driven paradigm is still cumbersome, and establishing a production-ready, fine-tuned deployment is time-consuming, expensive, and resource-intensive. This situation calls for novel models and techniques to streamline the process of deployment configuration for Big Data applications. In particular, the focus of this paper is on the rightsizing of Cloud-deployed clusters, which represent a cost-effective alternative to on-premises installation. We propose a novel tool, integrated in a wider DevOps-inspired approach, implementing a parallel and distributed simulation-optimization technique that efficiently and effectively explores the space of alternative resource configurations, seeking the minimum-cost deployment that satisfies predefined quality of service constraints. The validity and relevance of the proposed solution have been thoroughly assessed in a vast experimental campaign including different applications and Big Data platforms.
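
    The rightsizing idea can be illustrated with a deliberately simple sketch. It does not reproduce the tool's actual technique: it uses a plain exhaustive search evaluated in parallel, and the runtime model, cost model, parameter values and function names are all assumptions made for illustration. Each candidate cluster size is "simulated", infeasible sizes are discarded, and the cheapest feasible one is returned.

```python
from concurrent.futures import ThreadPoolExecutor


def simulated_runtime(num_vms, total_work_s=3600.0, overhead_s=10.0):
    """Hypothetical stand-in for a simulator: ideal speed-up plus per-VM overhead."""
    return total_work_s / num_vms + overhead_s * num_vms


def cheapest_feasible(candidates, deadline_s, hourly_price):
    """Return (cost, num_vms, runtime_s) of the cheapest size meeting the deadline, or None."""
    # Evaluate all candidate sizes in parallel; a real simulator call would be costly.
    with ThreadPoolExecutor() as executor:
        runtimes = list(executor.map(simulated_runtime, candidates))
    feasible = [
        (num_vms * hourly_price * runtime / 3600.0, num_vms, runtime)
        for num_vms, runtime in zip(candidates, runtimes)
        if runtime <= deadline_s  # quality of service constraint (assumed: a deadline)
    ]
    return min(feasible) if feasible else None


if __name__ == "__main__":
    best = cheapest_feasible([1, 2, 4, 8, 16, 32], deadline_s=600.0, hourly_price=1.0)
    print(best)
```

    Exhaustive evaluation is only practical for small configuration spaces; a larger space would call for a guided search, which this sketch does not attempt.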

    A cloudification methodology for high performance simulations

    International Mention in the doctoral degree. Many scientific areas make extensive use of computer simulations to study complex real-world processes. These computations are typically very resource-intensive and present scalability issues as experiments get larger, even in dedicated supercomputers, since they are limited by their own hardware resources. Cloud computing arises as an option to move towards the ideal of unlimited scalability by providing virtually infinite resources, yet applications must be adapted to this paradigm. The major goal of this thesis is to analyze the suitability of performing simulations in clouds by means of a paradigm shift, from classic parallel approaches to data-centric models, in those applications where that is possible. The aim is to maintain the scalability achieved in traditional HPC infrastructures while taking advantage of the features of the Cloud Computing paradigm. The thesis also explores the characteristics that make simulators suitable or unsuitable for deployment on HPC or Cloud infrastructures, defining a generic architecture and extracting common elements present in the majority of simulators. As a result, we propose a generalist cloudification methodology based on the MapReduce paradigm to migrate high performance simulations into the cloud to provide greater scalability. We analysed its viability by applying it to a real engineering simulator and running the resulting implementation on HPC and cloud environments. Our evaluations aim to show that the cloudified application is highly scalable and that there is still a large margin to improve the theoretical model and its implementations, and also to extend it to a wider range of simulations.