
    Reducing memory requirements for large size LBM simulations on GPUs

    The scientific community, on its never-ending road towards larger and more efficient computational resources, needs implementations that can adapt efficiently to current parallel platforms. Graphics processing units (GPUs) are an appropriate platform to cover some of these demands: the architecture offers high performance at reduced cost and with efficient power consumption. However, the memory capacity of these devices is limited, so expensive memory transfers become necessary when dealing with big problems. Today, the lattice-Boltzmann method (LBM) has established itself as an efficient approach for Computational Fluid Dynamics simulations. Although the method is particularly amenable to efficient parallelization, it requires considerable memory capacity, which causes a dramatic fall in performance when dealing with large simulations. In this work, we propose several strategies to minimize this memory demand, allowing bigger simulations to be executed on the same platform without additional memory transfers while keeping high performance. In particular, we present two new implementations, LBM-Ghost and LBM-Swap, which we analyze in depth, presenting the pros and cons of each. This project was funded by the Spanish Ministry of Economy and Competitiveness (MINECO): BCAM Severo Ochoa accreditation SEV-2013-0323, MTM2013-40824, Computación de Altas Prestaciones VII TIN2015-65316-P, by the Basque Excellence Research Center (BERC 2014-2017) program of the Basque Government, and by the Departament d'Innovació, Universitats i Empresa de la Generalitat de Catalunya, under project MPEXPAR: Models de Programació i Entorns d'Execució Paral·lels (2014-SGR-1051). We also thank the support of the computing facilities of the Extremadura Research Centre for Advanced Technologies (CETA-CIEMAT) and the NVIDIA GPU Research Center program for the provided resources, as well as the support of NVIDIA through the BSC/UPC NVIDIA GPU Center of Excellence.
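    The scale of the memory problem is easy to see with back-of-the-envelope arithmetic: a standard two-lattice (A-B) GPU implementation stores two copies of the particle distribution functions, whereas in-place swap-style streaming can get by with one. The sketch below only illustrates that accounting under an assumed lattice size, D2Q9 model, and double precision; it is not the paper's actual LBM-Ghost or LBM-Swap data layout.

```python
# Illustrative memory accounting for LBM distribution functions on a GPU.
# The lattice size, D2Q9 model, and double precision are assumptions for the
# example, not the paper's actual LBM-Ghost / LBM-Swap layouts.

def lbm_memory_gib(nx, ny, q=9, bytes_per_value=8, copies=2):
    """Storage (GiB) for the distribution functions of an nx x ny lattice."""
    return nx * ny * q * bytes_per_value * copies / 2**30

nx = ny = 8192
print(f"two-lattice (A-B) layout : {lbm_memory_gib(nx, ny, copies=2):.1f} GiB")
print(f"single-lattice (swap)    : {lbm_memory_gib(nx, ny, copies=1):.1f} GiB")
```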

    Performance Prediction of Cloud-Based Big Data Applications

    Big data analytics have become widespread as a means to extract knowledge from large datasets. Yet, the heterogeneity and irregularity usually associated with big data applications often overwhelm the existing software and hardware infrastructures. In such context, the flexibility and elasticity provided by the cloud computing paradigm offer a natural approach to cost-effectively adapting the allocated resources to the application's current needs. However, these same characteristics impose extra challenges to predicting the performance of cloud-based big data applications, a key step to proper management and planning. This paper explores three modeling approaches for performance prediction of cloud-based big data applications. We evaluate two queuing-based analytical models and a novel fast ad hoc simulator in various scenarios based on different applications and infrastructure setups. The three approaches are compared in terms of prediction accuracy, finding that our best approaches can predict average application execution times with 26% relative error in the very worst case and about 7% on average.
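    As a rough, simplified illustration of what a fast ad hoc simulator for such predictions can look like, the following Python sketch replays the tasks of one stage over a fixed pool of containers and reports the resulting makespan. The task counts, durations, and container pool are invented placeholders; the paper's queuing models and simulator capture considerably more detail.

```python
import heapq

def simulate_stage(task_durations, n_slots):
    """Toy discrete-event simulation: run tasks FIFO on n_slots identical
    containers and return the stage makespan. Invented placeholder for the
    idea of a fast ad hoc simulator; the real one models YARN containers,
    stage dependencies, and contention."""
    slots = [0.0] * n_slots              # time at which each slot becomes free
    heapq.heapify(slots)
    for duration in task_durations:
        free_at = heapq.heappop(slots)   # earliest available container
        heapq.heappush(slots, free_at + duration)
    return max(slots)

# Hypothetical map-like stage: 200 tasks of 10-14 s on 24 containers.
durations = [10 + (i % 5) for i in range(200)]
print(f"simulated stage makespan: {simulate_stage(durations, 24):.0f} s")
```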

    Impact of Shutdown Techniques for Energy-Efficient Cloud Data Centers

    Electricity consumption is a worrying concern in current large-scale systems like datacenters and supercomputers. These infrastructures are often dimensioned according to the workload peak. However, their consumption is not power-proportional: when the workload is low, the consumption is still high. Shutdown techniques have been developed to adapt the number of switched-on servers to the actual workload. However, datacenter operators are reluctant to adopt such approaches because of their potential impact on reactivity and hardware failures, and because their energy gain is often largely misjudged. In this article, we evaluate the potential gain of shutdown techniques by taking into account shutdown and boot-up costs in time and energy. This evaluation is made on recent server architectures and future hypothetical energy-aware architectures. We also determine whether knowledge of the future is required for saving energy with such techniques. We present simulation results exploiting real traces collected on different infrastructures under various machine configurations with several shutdown policies, with and without workload prediction.
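    The trade-off behind shutdown techniques can be made concrete with a simple energy balance: switching a server off only pays when the idle period is long enough to amortize the energy spent shutting down and booting up again. The sketch below illustrates that break-even computation with made-up power and energy figures; the values measured in the article differ per architecture.

```python
def net_energy_saved(idle_power_w, off_seconds, shutdown_energy_j, bootup_energy_j):
    """Net energy gain (J) of switching a server off for off_seconds instead
    of leaving it idle. Positive only when the idle period amortizes the
    shutdown and boot-up costs. All figures below are invented placeholders."""
    return idle_power_w * off_seconds - (shutdown_energy_j + bootup_energy_j)

def break_even_seconds(idle_power_w, shutdown_energy_j, bootup_energy_j):
    """Minimum off period for a shutdown to pay off."""
    return (shutdown_energy_j + bootup_energy_j) / idle_power_w

# Example: 100 W idle draw, 6 kJ to shut down, 15 kJ to boot up.
print(break_even_seconds(100.0, 6000.0, 15000.0))        # -> 210.0 s
print(net_energy_saved(100.0, 600.0, 6000.0, 15000.0))   # -> 39000.0 J over 10 min off
```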

    Event detection in location-based social networks

    With the advent of social networks and the rise of mobile technologies, users have become ubiquitous sensors capable of monitoring various real-world events in a crowd-sourced manner. Location-based social networks have proven to be faster than traditional media channels in reporting and geo-locating breaking news: Osama Bin Laden's death was first confirmed on Twitter even before the announcement from the communication department at the White House. However, the deluge of user-generated data on these networks requires intelligent systems capable of identifying and characterizing such events in a comprehensive manner. The data mining community coined the term event detection to refer to the task of uncovering emerging patterns in data streams. Nonetheless, most data mining techniques do not reproduce the underlying data generation process, hampering their ability to self-adapt in fast-changing scenarios. Because of this, we propose a probabilistic machine learning approach to event detection which explicitly models the data generation process and enables reasoning about the discovered events. To set forth the differences between both approaches, we present two techniques for the problem of event detection in Twitter: a data mining technique called Tweet-SCAN and a machine learning technique called Warble. We assess and compare both techniques on a dataset of tweets geo-located in the city of Barcelona during its annual festivities. Last but not least, we present the algorithmic changes and data processing frameworks needed to scale up the proposed techniques to big data workloads. This work is partially supported by Obra Social "la Caixa", by the Spanish Ministry of Science and Innovation under contract (TIN2015-65316), by the Severo Ochoa Program (SEV2015-0493), by SGR programs of the Catalan Government (2014-SGR-1051, 2014-SGR-118), Collectiveware (TIN2015-66863-C2-1-R) and the BSC/UPC NVIDIA GPU Center of Excellence. We would also like to thank the reviewers for their constructive feedback.
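    To make the data-mining side of the comparison concrete, the following sketch runs a plain density-based clustering (DBSCAN from scikit-learn) over synthetic geo-located points: a dense burst of coordinates stands out as a cluster against background noise. This is only a simplified, spatial-only illustration of the density-based event-detection idea; Tweet-SCAN and Warble also exploit time, text, and user information.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Synthetic geo-located points around Barcelona: a dense burst (a putative
# event) plus uniform background noise. Spatial-only and purely illustrative.
rng = np.random.default_rng(0)
burst = rng.normal(loc=(41.3874, 2.1686), scale=0.002, size=(80, 2))
noise = rng.uniform(low=(41.35, 2.10), high=(41.45, 2.25), size=(40, 2))
points = np.vstack([burst, noise])

labels = DBSCAN(eps=0.005, min_samples=10).fit_predict(points)
print("clusters found:", len(set(labels) - {-1}))
print("points labelled as noise:", int((labels == -1).sum()))
```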

    A dataflow IR for memory efficient RIPL compilation to FPGAs

    Field programmable gate arrays (FPGAs) are fundamentally different from fixed processor architectures because their memory hierarchies can be tailored to the needs of an algorithm. FPGA compilers for high-level languages are therefore not hindered by fixed memory hierarchies; the constraint when compiling to FPGAs is the availability of resources. In this paper we describe how the dataflow intermediate representation (IR) of our declarative FPGA image processing DSL, RIPL (Rathlin Image Processing Language), enables us to constrain memory use. We use five benchmarks to demonstrate that memory use with RIPL is comparable to the Vivado HLS OpenCV library without the need for language pragmas to guide hardware synthesis. The benchmarks also show that RIPL is more expressive than the Darkroom FPGA image processing language.
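    A typical example of the kind of memory bound a dataflow IR can make explicit is line buffering for stencil operations: a 3x3 filter only ever needs three image lines in on-chip memory rather than a whole frame. The Python sketch below mimics that streaming pattern in software; it is a generic illustration of the idea, not RIPL's actual lowering to hardware.

```python
from collections import deque

def line_buffered_blur(rows, width):
    """Stream an image row by row and emit 3x3 box-blurred rows as soon as
    three rows are buffered; only three lines are ever held in memory, which
    is the bounded buffering a dataflow IR can expose. Generic illustration,
    not RIPL's actual lowering."""
    window = deque(maxlen=3)                 # three-line buffer
    for row in rows:
        window.append(row)
        if len(window) == 3:
            top, mid, bot = window
            yield [(sum(top[x-1:x+2]) + sum(mid[x-1:x+2]) + sum(bot[x-1:x+2])) // 9
                   for x in range(1, width - 1)]

# Hypothetical 8x8 ramp image streamed one row at a time.
image = [[x + 8 * y for x in range(8)] for y in range(8)]
for blurred_row in line_buffered_blur(iter(image), width=8):
    print(blurred_row)
```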

    Kulla, a container-centric construction model for building infrastructure-agnostic distributed and parallel applications

    This paper presents the design, development, and implementation of Kulla, a virtual container-centric construction model that mixes loosely coupled structures with a parallel programming model for building infrastructure-agnostic distributed and parallel applications. In Kulla, applications, their dependencies, and environment settings are mapped to construction units called Kulla-Blocks. A parallel programming model enables developers to couple those interoperable structures to create constructive structures named Kulla-Bricks. In these structures, continuous dataflow and parallel patterns can be created without modifying the code of the applications. Methods such as Divide&Containerize (data parallelism), Pipe&Blocks (streaming), and Manager/Block (task parallelism) were developed to create Kulla-Bricks. Recursive combinations of Kulla instances can be grouped into deployment structures called Kulla-Boxes, which are encapsulated into VCs to create infrastructure-agnostic parallel and/or distributed applications. Deployment strategies were created for Kulla-Boxes to improve IT resource profitability. To show the feasibility and flexibility of this model, solutions combining real-world applications were implemented by using Kulla instances to compose parallel and/or distributed systems deployed on different IT infrastructures. An experimental evaluation based on use cases solving satellite and medical image processing problems revealed the efficiency of the Kulla model in comparison with some traditional state-of-the-art solutions. This work has been partially supported by the EU project "ASPIDE: Exascale Programing Models for Extreme Data Processing" under grant 801091 and the project "CABAHLA-CM: Convergencia Big data-Hpc: de los sensores a las Aplicaciones" S2018/TCS-4423 from the Madrid Regional Government.
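    As a loose illustration of the Pipe&Blocks (streaming) idea, the sketch below chains two containerized stages so that the output of one feeds the input of the next without touching the applications inside the containers. The image names are hypothetical placeholders and the snippet does not use Kulla's actual API; it only shows the composition pattern.

```python
import subprocess

# Two containerized stages coupled as a stream: the stdout of the first is
# the stdin of the second, with no changes to the code inside either image.
# 'acquire-img' and 'process-img' are hypothetical image names; this is not
# Kulla's API, only a Pipe&Blocks-style composition pattern.
acquire = subprocess.Popen(
    ["docker", "run", "--rm", "-i", "acquire-img"],
    stdout=subprocess.PIPE)
process = subprocess.Popen(
    ["docker", "run", "--rm", "-i", "process-img"],
    stdin=acquire.stdout, stdout=subprocess.PIPE)
acquire.stdout.close()                 # let the first stage see a closed pipe
output, _ = process.communicate()
print(output.decode())
```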

    An optimization framework for the capacity allocation and admission control of MapReduce jobs in cloud systems

    Nowadays, we live in a Big Data world and many sectors of our economy are guided by data-driven decision processes. Big Data and Business Intelligence applications are facilitated by the MapReduce programming model, while, at the infrastructural layer, cloud computing provides flexible and cost-effective solutions for provisioning large clusters on demand. Capacity allocation in such systems, i.e., the problem of providing computational power to support concurrent MapReduce applications in a cost-effective fashion, represents a challenge of paramount importance. In this paper we lay the foundation for a solution implementing admission control and capacity allocation for MapReduce jobs with a priori deadline guarantees. In particular, shared Hadoop 2.x clusters supporting batch and/or interactive jobs are targeted. We formulate a linear programming model able to minimize cloud resource costs and rejection penalties for the execution of jobs belonging to multiple classes with deadline guarantees. Scalability analyses demonstrate that the proposed method is able to determine the globally optimal solution of the linear problem for systems including up to 10,000 classes in less than 1 s.
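    A toy version of such a formulation can be written as a small linear program: choose how many containers to devote to each job class and which fraction of its jobs to admit, minimizing container cost plus rejection penalties subject to the cluster capacity and per-class capacity needs. The sketch below solves a two-class instance with scipy; all numbers and the exact constraints are illustrative assumptions, not the paper's model.

```python
from scipy.optimize import linprog

# Toy continuous relaxation: r_i containers per job class, x_i admitted
# fraction; minimize container cost plus rejection penalties subject to the
# cluster capacity and per-class capacity needs. Numbers and formulation are
# illustrative assumptions, not the paper's model.
cost_per_container = 1.0
penalty = [100.0, 60.0]      # penalty for rejecting all jobs of a class
demand = [60.0, 40.0]        # containers needed to admit a class in full
cluster_cap = 80.0

# Variables z = [r1, r2, x1, x2]; minimizing cost - penalty*x is equivalent
# to minimizing cost + penalty*(1 - x) up to a constant.
c = [cost_per_container, cost_per_container, -penalty[0], -penalty[1]]
A_ub = [[1, 1, 0, 0],              # r1 + r2 <= cluster_cap
        [-1, 0, demand[0], 0],     # demand1 * x1 <= r1
        [0, -1, 0, demand[1]]]     # demand2 * x2 <= r2
b_ub = [cluster_cap, 0.0, 0.0]
bounds = [(0, None), (0, None), (0, 1), (0, 1)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
r1, r2, x1, x2 = res.x
print(f"containers: {r1:.0f} / {r2:.0f}, admitted fractions: {x1:.2f} / {x2:.2f}")
```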

    Analytical composite performance models for Big Data applications

    In the era of Big Data, with the digital industry facing massive growth in data size and the development of data-intensive software, more and more companies are moving to new frameworks and paradigms capable of handling data at scale. The MapReduce (MR) paradigm and its implementation framework, Hadoop, are among the most frequently referred to, and form the basis for later, more advanced frameworks such as Tez and Spark. Accurate prediction of the execution time of a Big Data application helps improve design-time decisions, reduces over-allocation charges, and assists budget management. In this regard, we propose analytical models based on Stochastic Activity Networks (SANs) to accurately model the execution of MR, Tez and Spark applications in Hadoop environments governed by the YARN Capacity scheduler. We evaluate the accuracy of the proposed models on the TPC-DS industry benchmark across different configurations. Results obtained by numerically solving the analytical SAN models show an average error of 6% in estimating the execution time of an application compared to the data gathered from experiments; moreover, the model evaluation time is lower than the simulation time of state-of-the-art solutions.
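    To make the modelling target concrete, a much cruder, wave-based approximation of a MapReduce job's execution time can be written in a few lines: each phase runs in ceil(tasks/slots) waves of roughly average task duration. The figures below are invented; the SAN models in the paper capture scheduling and resource contention that this formula ignores.

```python
import math

def mr_time_estimate(n_map, n_reduce, map_slots, reduce_slots,
                     avg_map_s, avg_reduce_s):
    """Wave-based bound: each phase runs in ceil(tasks/slots) waves of roughly
    average task duration. Far cruder than the SAN models in the paper, shown
    only to make the modelling target concrete."""
    map_waves = math.ceil(n_map / map_slots)
    reduce_waves = math.ceil(n_reduce / reduce_slots)
    return map_waves * avg_map_s + reduce_waves * avg_reduce_s

# Hypothetical TPC-DS-like query profile on a YARN queue with 40 containers.
print(mr_time_estimate(n_map=320, n_reduce=64,
                       map_slots=40, reduce_slots=40,
                       avg_map_s=15.0, avg_reduce_s=45.0))   # -> 210.0 s
```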