361,932 research outputs found

    Vivaldi: a Python-like domain-specific language for volume rendering and processing on distributed multi CPU-GPU systems

    Get PDF
    Department of Computer EngineeringIn this thesis, a Python-like domain-specific language alled Vivaldi is proposed. Vivaldi is based on Python, but can also provide parallel volume rendering and processing on distributed heterogeneous systems. Nowadays, visualization requires high performance for processing large data and creating high quality images. Therefore, computing systems have also advanced to meet this requirement; one example is graphics processing unit (GPU) clusters. However, even though high-performance systems exist, effective utilization of these systems is not easy for non-experts such as domain scientists and researchers because they require expert programming skills and a significant amount of software development for parallel programming, distributed systems, and heterogeneous architecture. One aim of Vivaldi is to minimize these required programming skills using a domain-specific language for volume rendering and visualization that is Python-like and platform independent. In this language, a parallel visualization model, virtual shared memory model, and platform independent architecture for distributed heterogeneous systems are proposed. The parallel visualization model provides a simple means to implement visualization pipelines. The virtual shared memory model enables the use of cluster memory without Message Passing Interface (MPI) and CUDA. Finally, the platform independent design integrates central processing unit(CPU)s and GPUs into a common, domain-specific language. Vivaldi code was compared to C++ implementations to evaluate its performance according to number of lines, performance, and scalability. The results show that Vivaldi achieved comparable scalability and performance while requiring much less programming effort.ope

    Safe and scalable parallel programming with session types

    Get PDF
    Parallel programming is a technique that can coordinate and utilise multiple hardware resources simultaneously, to improve the overall computation performance. However, reasoning about the communication interactions between the resources is difficult. Moreover, scaling an application often leads to increased number and complexity of interactions, hence we need a systematic way to ensure the correctness of the communication aspects of parallel programs. In this thesis, we take an interaction-centric view of parallel programming, and investigate applying and adapting the theory of Session Types, a formal typing discipline for structured interaction-based communication, to guarantee the lack of communication mismatches and deadlocks in concurrent systems. We focus on scalable, distributed parallel systems that use message-passing for communication. We explore programming language primitives, tools and frameworks to simplify parallel programming. First, we present the design and implementation of Session C, a program ming toolchain for message-passing parallel programming. Session C can ensure deadlock freedom, communication safety and global progress through static type checking, and supports optimisations by refinements through session subtyping. Then we introduce Pabble, a protocol description language for designing parametric interaction protocols. The language can capture scalable interaction patterns found in parallel applications, and guarantees communication-safety and deadlock-freedom despite the undecidability of the underlying parameterised session type theory. Next, we demonstrate an application of Pabble in a workflow that combines Pabble protocols and computation kernel code describing the sequential computation behaviours, to generate a Message-Passing Interface (MPI) parallel application. The framework guarantees, by construction, that generated code are free from communication errors and deadlocks. Finally, we formalise an extension of binary session types and new language primitives for safe and efficient implementations of multiparty parallel applications in a binary server-client programming environment. Our exploration with session-based parallel programming shows that it is a feasible and practical approach to guaranteeing communication aspects of complex, interaction-based scalable parallel programming.Open Acces

    Parallel Programming Using Shared Objects and Broadcasting

    Get PDF
    The two major design approaches taken to build distributed and parallel computer systems, multiprocessing and multicomputing, are discussed. A model that combines the best properties of both multiprocessor and multicomputer systems, easy-to-build hardware, and a conceptually simple programming model is presented. Using this model, a programmer defines and invokes operations on shared objects, the runtime system handles reads and writes on these objects, and the reliable broadcast layer implements indivisible updates to objects using the sequencing protocol. The resulting system is easy to program, easy to build, and has acceptable performance on problems with a moderate grain size in which reads are much more common than writes. Orca, a procedural language whose sequential constructs are roughly similar to languages like C or Modula 2 but which also supports parallel processes and shared objects and has been used to develop applications for the prototype system, is described

    Using PVM to host CLIPS in distributed environments

    Get PDF
    It is relatively easy to enhance CLIPS (C Language Integrated Production System) to support multiple expert systems running in a distributed environment with heterogeneous machines. The task is minimized by using the PVM (Parallel Virtual Machine) code from Oak Ridge Labs to provide the distributed utility. PVM is a library of C and FORTRAN subprograms that supports distributive computing on many different UNIX platforms. A PVM deamon is easily installed on each CPU that enters the virtual machine environment. Any user with rsh or rexec access to a machine can use the one PVM deamon to obtain a generous set of distributed facilities. The ready availability of both CLIPS and PVM makes the combination of software particularly attractive for budget conscious experimentation of heterogeneous distributive computing with multiple CLIPS executables. This paper presents a design that is sufficient to provide essential message passing functions in CLIPS and enable the full range of PVM facilities

    Parallel Brownian dynamics simulations with the message-passing and PGAS programming models

    Get PDF
    This is a post-peer-review, pre-copyedit version of an article published in Computer Physics Communications. The final authenticated version is available online at: https://doi.org/10.1016/j.cpc.2012.12.015[Abstract] The simulation of particle dynamics is among the most important mechanisms to study the behavior of molecules in a medium under specific conditions of temperature and density. Several models can be used to compute efficiently the forces that act on each particle, and also the interactions between them. This work presents the design and implementation of a parallel simulation code for the Brownian motion of particles in a fluid. Two different parallelization approaches have been followed: (1) using traditional distributed memory message-passing programming with MPI, and (2) using the Partitioned Global Address Space (PGAS) programming model, oriented towards hybrid shared/distributed memory systems, with the Unified Parallel C (UPC) language. Different techniques for domain decomposition and work distribution are analyzed in terms of efficiency and programmability, in order to select the most suitable strategy. Performance results on a supercomputer using up to 2048 cores are also presented for both MPI and UPC codes.Ministerio de Ciencia e Innovación ; TIN2010-16735Xunta de Galicia; ref. 2010/

    Analysis and Optimization of Scientific Applications through Set and Relation Abstractions

    Get PDF
    Writing high performance code has steadily become more challenging since the design of computing systems has moved toward parallel processors in forms of multi and many-core architectures. This trend has resulted in exceedingly more heterogeneous architectures and programming models. Moreover, the prevalence of distributed systems, especially in fields relying on supercomputers, has caused the programming of such diverse environment more difficulties. To mitigate such challenges, an assortment of tools and programming models have been introduced in the past decade or so. Some efforts focused on the characteristics of the code, such as polyhedral compilers (e.g. Pluto, PPCG, etc.) while others took in consideration the aspects of the application domain and proposed domain specific languages (DSLs). DSLs are developed either in the form of a stand-alone language, like Halide for image processing, or as a part of a general purpose language (e.g., Firedrake- a DSL embedded in Python for solving PDEs using FEM.) called embedded. All these approaches attempt to provide the best input to the underlying common programming models like MPI and OpenMP for distributed and shared memory systems respectively. This dissertation introduces Kaashi, a high-level run-time system, embedded in C++ language, designed to manage memory and execution order of programs with large input data and complex dependencies. Kaashi provides a uniform front-end to multiple back-ends focusing on distributed systems. Kaashi abstractions allows the programmer to define the problem’s data domain as a collection of sets and relations between pairs of such sets. The aforesaid level of abstraction could enable series of optimizations which, otherwise, are very expensive to detect or not feasible at all. Furthermore, Kaashi’s API helps novice programmers to write their code more structurally without getting involved in details of data management and communication

    Scalable computational science

    Get PDF
    2015 - 2016Computational science also know as scientific computing is a rapidly growing novel field that uses advanced computing in order to solve complex problems. This new discipline combines technologies, modern computational methods and simulations to address problems too complex to be reliably predicted only by theory and too dangerous or expensive to be reproduced in laboratories. Successes in computational science over the past twenty years have caused demand of supercomputing, to improve the performance of the solutions and to allow the growth of the models, in terms of sizes and quality. From a computer scientist’s perspective, it is natural to think to distribute the computation required to study a complex systems among multiple machines: it is well known that the speed of singleprocessor computers is reaching some physical limits. For these reasons, parallel and distributed computing has become the dominant paradigm for computational scientists who need the latest development on computing resources in order to solve their problems and the “Scalability” has been recognized as the central challenge in this science. In this dissertation the design and implementation of Frameworks, Parallel Languages and Architectures, which enable to improve the state of the art on Scalable Computational Science, are discussed. Frameworks. The proposal of D-MASON, a distributed version of MASON, a wellknown and popular Java toolkit for writing and running Agent-Based Simulations (ABSs). D-MASON introduces a framework level parallelization so that scientists that use the framework (e.g., a domain expert with limited knowledge of distributed programming) could be only minimally aware of such distribution. D-MASON, was began to be developed since 2011, the main purpose of the project was overcoming the limits of the sequentially computation of MASON, using distributed computing. D-MASON enables to do more than MASONin terms of size of simulations (number of agents and complexity of agents behaviors), but allows also to reduce the simulation time of simulations written in MASON. For this reason, one of the most important feature of D-MASON is that it requires a limited number of changing on the MASON’s code in order to execute simulations on distributed systems. v D-MASON, based on Master-Worker paradigm, was initially designed for heterogeneous computing in order to exploit the unused computational resources in labs, but it also provides functionality to be executed in homogeneous systems (as HPC systems) as well as cloud infrastructures. The architecture of D-MASON is presented in the following three papers, which describes all D-MASON layers: • Cordasco G., Spagnuolo C. and Scarano V. Toward the new version of D-MASON: Efficiency, Effectiveness and Correctness in Parallel and Distributed Agent-based Simulations. 1st IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems. IEEE International Parallel & Distributed Processing Symposium 2016. • Cordasco G., De Chiara R., Mancuso A., Mazzeo D., Scarano V. and Spagnuolo C. Bringing together efficiency and effectiveness in distributed simulations: the experience with D-MASON. SIMULATION: Transactions of The Society for Modeling and Simulation International, June 11, 2013. • Cordasco G., De Chiara R., Mancuso A., Mazzeo D., Scarano V. and Spagnuolo C. A Framework for distributing Agent-based simulations. Ninth International Workshop Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms of Euro-Par 2011 conference. Much effort has been made, on the Communication Layer, to improve the communication efficiency in the case of homogeneous systems. D-MASON is based on Publish/Subscribe (PS) communication paradigm and uses a centralized message broker (based on the Java Message Service standard) to deal with heterogeneous systems. The communication for homogeneous system uses the Message Passing Interface (MPI) standard and is also based on PS. In order to use MPI within Java, D-MASON uses a Java binding of MPI. Unfortunately, this binding is relatively new and does not provides all MPI functionalities. Several communication strategies were designed, implemented and evaluated. These strategies were presented in two papers: • Cordasco G., Milone F., Spagnuolo C. and Vicidomini L. Exploiting D-MASON on Parallel Platforms: A Novel Communication Strategy 2st Workshop on Parallel and Distributed Agent-Based Simulations of EuroPar 2014 conference. • Cordasco G., Mancuso A., Milone F. and Spagnuolo C. Communication strategies in Distributed Agent-Based Simulations: the experience with D-MASON 1st Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2013 conference. vi D-MASON provides also mechanisms for the visualization and gathering of the data in distributed simulation (available on the Visualization Layer). These solutions are presented in the paper: • Cordasco G., De Chiara R., Raia F., Scarano V., Spagnuolo C. and Vicidomini L. Designing Computational Steering Facilities for Distributed Agent Based Simulations. Proceedings of the ACM SIGSIM Conference on Principles of Advanced Discrete Simulation 2013. In DABS one of the most complex problem is the partitioning and balancing of the computation. D-MASON provides, in the Distributed Simulation layer, mechanisms for partitioning and dynamically balancing the computation. D-MASON uses field partitioning mechanism to divide the computation among the distributed system. The field partitioning mechanism provides a nice trade-off between balancing and communication effort. Nevertheless a lot of ABS are not based on 2D- or 3D-fields and are based on a communication graph that models the relationship among the agents. Inthiscasethefieldpartitioningmechanismdoesnotensuregoodsimulation performance. Therefore D-MASON provides also a specific mechanisms to manage simulation that uses a graph to describe agent interactions. These solutions were presented in the following publication: • Antelmi A., Cordasco G., Spagnuolo C. and Vicidomini L.. On Evaluating Graph Partitioning Algorithms for Distributed Agent Based Models on Networks. 3rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2015 conference. The field partitioning mechanism, intuitively, enables the mono and bi-dimensional partitioning of an Euclidean space. This approach is also know as uniform partitioning. But in some cases, e.g. simulations that simulate urban areas using a Geographical Information System (GIS), the uniform partitioning degrades the simulation performance, due to the unbalanced distribution of the agents on the field and consequently on the computational resources. In such a case, D-MASON provides a non-uniform partitioning mechanism (inspired by Quad-Tree data structure), presented in the following paper: • Lettieri N., Spagnuolo C. and Vicidomini L.. Distributed Agent-based Simulation and GIS: An Experiment With the dynamics of Social Norms. 3rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2015 conference. • G. Cordasco and C. Spagnuolo and V. Scarano. Work Partitioning on Parallel and Distributed Agent-Based Simulation. IEEE Workshop on vii Parallel and Distributed Processing for Computational Social Systems of International Parallel & Distributed Processing Symposium, 2017. The latest version of D-MASON provides a web-based System Management, to better use D-MASON in Cloud infrastructures. D-MASON on the Amazon EC2 Cloud infrastructure and its performance in terms of speed and cost were compared against D-MASON on an HPC environment. The obtained results, and the new System Management Layer are presented in the following paper: • MCarillo,GCordasco,FSerrapica,CSpagnuolo,P.Szufel,andL.Vicidomini. D-Mason on the Cloud: an Experience with Amazon Web Services. 4rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2016 conference. ParallelLanguages. The proposal of an architecture, which enable to invoke code supported by a Java Virtual Machine (JVM) from code written in C language. Swft/T, is a parallel scripting language for programming highly concurrent applications in parallel and distributed environments. Swift/T is the reimplemented version of Swift language, with a new compiler and runtime. Swift/T improve Swift, allowing scalability over 500 tasks per second, load balancing feature, distributed data structures, and dataflow-driven concurrent task execution. Swif/T provides an interesting feature the one of calling easily and natively other languages (as Python, R, Julia, C) by using special language functions named leaf functions. Considering the actual trend of some supercomputing vendors (such as Cray Inc.) that support in its processors Java Virtual Machines (JVM), it is desirable to provide methods to call also Java code from Swift/T. In particular is really attractive to be able to call scripting languages for JVM as Clojure, Scala, Groovy, JavaScript etc. For this purpose a C binding to instanziate and call JVM was designed. This binding is used in Swif/T (since the version 1.0) to develop leaf functions that call Java code. The code are public available at GitHub project page. Frameworks. The proposal of two tools, which exploit the computing power of parallel systems to improve the effectiveness and the efficiency of Simulation Optimization strategies. Simulations Optimization (SO) is used to refer to the techniques studied for ascertaining the parameters of a complex model that minimize (or maximize) given criteria (one or many), which can only be computed by performing a simulation run. Due to the the high dimensionality of the search space, the heterogeneity of parameters, the irregular shape and the stochastic nature of the objective evaluation function, the tuning of such systems is extremely demanding from the computational point of view. The first frameworks is SOF: Zero Configuration Simulation Optimization Framework on the Cloud, it was designed to run SO process in viii the cloud. SOF is based on the Apache Hadoop infrastructure and is presented in the following paper: • Carillo M., Cordasco G., Scarano V., Serrapica F., Spagnuolo C. and Szufel P. SOF: Zero Configuration Simulation Optimization Framework on the Cloud. Parallel, Distributed, and Network-Based Processing 2016. The second framework is EMEWS: Extreme-scale Model Exploration with Swift/T, it has been designed at Argonne National Laboratory (USA). EMEWS as SOF allows to perform SO processes in distributed system. Both the frameworks are mainly designed for ABS. In particular EMEWS was tested using the ABS simulation toolkit Repast. Initially, EMEWS was not able to easily execute out of the box simulations written in MASON and NetLogo. This thesis presents new functionalities of EMEWS and solutions to easily execute MASON and NetLogo simulations on it. The EMEWS use cases are presented in the following paper: • J. Ozik, N. T. Collier, J. M. Wozniak and C. Spagnuolo From Desktop To Large-scale Model Exploration with Swift/T. Winter Simulation Conference 2016. Architectures. The proposal of an open-source, extensible, architecture for the visualization of data in HTML pages, exploiting a distributed web computing. Following the Edge-centric Computing paradigm, the data visualization is performed edge side ensuring data trustiness, privacy, scalability and dynamic data loading. The architecture has been exploited in the Social Platform for Open Data (SPOD). The proposed architecture, that has also appeared in the following papers: • G. Cordasco, D. Malandrino, P. Palmieri, A. Petta, D. Pirozzi, V. Scarano, L. Serra, C. Spagnuolo, L. Vicidomini A Scalable Data Web Visualization Architecture. Parallel, Distributed, and Network-Based Processing 2017. • G. Cordasco, D. Malandrino, P. Palmieri, A. Petta, D. Pirozzi, V. Scarano, L. Serra, C. Spagnuolo, L. Vicidomini An Architecture for Social Sharing and Collaboration around Open Data Visualisation. In Poster Proc. of the 19th ACM conference on "Computer-Supported Cooperative Work and Social Computing 2016. • G. Cordasco, D. Malandrino, P. Palmieri, A. Petta, D. Pirozzi, V. Scarano, L. Serra, C. Spagnuolo, L. Vicidomini An extensible architecture for an ecosystem of visualization web-components for Open Data Maximising interoperability Workshop— core vocabularies, location-aware data and more 2015. [edited by author]Computational science anche conosciuta come calcolo scientifico è un settore in rapida crescita che usa il calcolo avanzato per affrontare problemi complessi. Questa nuova disciplina, combina tecnologia, moderni metodi computazionali e simulazioni per affrontare problemi troppo difficili da poter essere studiati solo in teoria o troppo pericolosi e costosi per poter essere riprodotti sperimentalmente in laboratorio. I progressi dell’ultimo ventennio in computational science hanno sfruttato il supercalcolo per migliorare le performance delle soluzioni e permettere la crescita dei modelli, in termini di dimensioni e qualità dei risultati ottenuti. Le soluzioni adottate si avvalgono del calcolo distribuito: è ben noto che la velocità di un computer con un singolo processore sta raggiungendo dei limiti fisici. Per queste ragioni, la computazione parallela e distribuita è diventata il principale paradigma di calcolo per affrontare i problemi nell’ambito della computational science, in cui la scalabilità delle soluzioni costituisce la sfida da affrontare. In questa tesi vengono discusse la progettazione e l’implementazione di Framework, Linguaggi Paralleli e Architetture che consentono di migliorare lo stato dell’arte della Scalable Computational Science. In particolare, i maggiori contributi riguardano: Frameworks. La proposta di D-MASON, una versione distribuita di MASON, un toolkit Java per la scrittura e l’esecuzione di simulazioni basate su agenti (AgentBased Simulations, ABSs). D-MASON introduce la parallelizzazione a livello framework per far si che gli scienziati che lo utilizzano (ad esempio un esperto con limitata conoscenza della programmazione distribuita) possano rendersi conto solo minimamente di lavorare in ambiente distribuito (ad esempio esperti del dominio con limitata esperienza o nessuna esperienza nel calcolo distribuito). D-MASON è un progetto iniziato nel 2011, il cui principale obiettivo è quello di superare i limiti del calcolo sequenziale di MASON, sfruttando il calcolo distribuito. D-MASON permette di simulare modelli molto più complessi (in termini di numero di agenti e complessità dei comportamenti dei singoli agenti) rispetto a MASON e inoltre consente, a parità di calcolo, di ridurre il tempo necessario ad eseguire le simulazioni MASON. D-MASON è stato progettato in modo da permettere la migrazione di simulazioni scritte in MASON con un numero limitato di modifiche da apportare al codice, al fine di garantire il massimo della semplicità d’uso. v D-MASON è basato sul paradigma Master-Worker, inizialmente pensato per sistemi di calcolo eterogenei, nelle sue ultime versioni consente l’esecuzione anche in sistemi omogenei come sistemi HPC e infrastrutture di cloud computing. L’architettura di D-MASON è stata presentata nelle seguenti pubblicazioni: • Cordasco G., Spagnuolo C. and Scarano V. Toward the new version of D-MASON: Efficiency, Effectiveness and Correctness in Parallel and Distributed Agent-based Simulations. 1st IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems. IEEE International Parallel & Distributed Processing Symposium 2016. • Cordasco G., De Chiara R., Mancuso A., Mazzeo D., Scarano V. and Spagnuolo C. Bringing together efficiency and effectiveness in distributed simulations: the experience with D-MASON. SIMULATION: Transactions of The Society for Modeling and Simulation International, June 11, 2013. • Cordasco G., De Chiara R., Mancuso A., Mazzeo D., Scarano V. and Spagnuolo C. A Framework for distributing Agent-based simulations. Ninth International Workshop Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms of Euro-Par 2011 conference. Uno degli strati architetturali di D-MASON che ne determina le prestazioni, è il il Communication Layer, il quale offre le funzionalità di comunicazione tra tutte le entità coinvolte nel calcolo. La comunicazione in D-MASON è basata sul paradigma Publish/Subscribe (PS). Al fine di soddisfare la flessibilità e la scalabilità richiesta, vengono fornite due strategie di comunicazione, una centralizzata (utilizzando Java Message Service) e una decentralizzata (utilizzando Message Passing Interface). La comunicazione in sistemi omogenei è sempre basata su PS ma utilizza lo standard Message Passing Interface (MPI). Al fine di utilizzare MPI in Java, lo strato di comunicazione di D-MASON è implementato sfruttando un binding Java a MPI. Tale soluzione non permette però l’utilizzo di tutte le funzionalità di MPI. Al tal proposito molteplici soluzioni sono stare progettate e implementate, e sono presentate nelle seguenti pubblicazioni: • Cordasco G., Milone F., Spagnuolo C. and Vicidomini L. Exploiting D-MASON on Parallel Platforms: A Novel Communication Strategy 2st Workshop on Parallel and Distributed Agent-Based Simulations of EuroPar 2014 conference. • Cordasco G., Mancuso A., Milone F. and Spagnuolo C. Communication strategies in Distributed Agent-Based Simulations: the experience with D-MASON 1st Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2013 conference. vi D-MASON offre anche meccanismi per la visualizzazione centralizzata e la raccolta di informazioni in simulazioni distribuite (tramite il Visualization Layer). I risultati ottenuti sono stati presentati nella seguente pubblicazione: • Cordasco G., De Chiara R., Raia F., Scarano V., Spagnuolo C. and Vicidomini L. Designing Computational Steering Facilities for Distributed Agent Based Simulations. Proceedings of the ACM SIGSIM Conference on Principles of Advanced Discrete Simulation 2013. Quando si parla di simulazioni distribuite una delle principali problematiche è il bilanciamento del carico. D-MASON offre, nel Distributed Simulation Layer, meccanismi per il partizionamento dinamico e il bilanciamento del carico. DMASON utilizza la tecnica del field partitioning per suddividere il lavoro tra le entità del sistema distribuito. La tecnica di field partitioning consente di ottenere un buon equilibrio tra il bilanciamento del carico e l’overhead di comunicazione. Molti modelli di simulazione non sono basati su spazi 2/3-dimensionali ma bensì modellano le relazioni tra gli agenti utilizzando strutture dati grafo. In questi casi la tecnica di field partitioning non garantisce soluzioni che consentono di ottenere buone prestazioni. Per risolvere tale problema, D-MASON fornisce particolari soluzioni per simulazioni che utilizzano i grafi per modellare le relazioni tra gli agenti. I risultati conseguiti sono stati presentati nella seguente pubblicazione: • Antelmi A., Cordasco G., Spagnuolo C. and Vicidomini L.. On Evaluating Graph Partitioning Algorithms for Distributed Agent Based Models on Networks. 3rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2015 conference. Il metodo di field partitioning consente il partizionamento di campi Euclidei mono e bi-dimensionali; tale approccio è anche conosciuto con il nome di partizionamento uniforme. In alcuni casi, come ad esempio simulazioni che utilizzano Geographical Information System (GIS), il metodo di partizionamento uniforme non è in grado di garantire buone prestazioni, a causa del posizionamento non bilanciato degli agenti sul campo di simulazione. In questi casi, D-MASON offre un meccanismo di partizionamento non uniforme (inspirato alla struttura dati Quad-Tree), presentato nelle seguenti pubblicazioni: • Lettieri N., Spagnuolo C. and Vicidomini L.. Distributed Agent-based Simulation and GIS: An Experiment With the dynamics of Social Norms. 3rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2015 conference. • G. Cordasco and C. Spagnuolo and V. Scarano. Work Partitioning on Parallel and Distributed Agent-Based Simulation. IEEE Workshop on vii Parallel and Distributed Processing for Computational Social Systems of International Parallel & Distributed Processing Symposium, 2017. Inoltre, D-MASON èstatoestesoalloscopodifornireun’infrastrutturaSimulation-asa-Service(SIMaaS),chesemplificailprocessodiesecuzionedisimulazionidistribuite in un ambiente di Cloud Computing. D-MASON nella sua versione più recente offre uno strato software di management basato su web, che ne consente estrema facilità d’uso in ambienti Cloud. Utilizzando il System Management, D-MASON è stato sperimentato sull’infrastruttura Cloud Amazon EC2 confrontando le prestazioni in questo ambiente cloud con un sistema HPC. I risultati ottenuti sono stati presentati nella seguente pubblicazione: • MCarillo,GCordasco,FSerrapica,CSpagnuolo,P.Szufel,andL.Vicidomini. D-Mason on the Cloud: an Experience with Amazon Web Services. 4rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2016 conference. LinguaggiParalleli. La proposta di un’architettura, la quale consente di invocare il codice per Java Virtual Machine (JVM) da codice scritto in linguaggio C. Swift/T è un linguaggio di scripting parallelo per sviluppare applicazioni altamente scalabili in ambienti paralleli e distribuiti. Swift/T è l’implementazione del linguaggio Swift per ambienti HPC. Swift/T migliora il linguaggio Swift, consentendo la scalabilità fino a 500 task per secondo, il bilanciamento del carico, strutture dati distribuite, e dataflow task execution. Swift/T consente di invocare nativamente codice scritto in altri

    Scalable computational science

    Get PDF
    2015 - 2016Computational science also know as scientific computing is a rapidly growing novel field that uses advanced computing in order to solve complex problems. This new discipline combines technologies, modern computational methods and simulations to address problems too complex to be reliably predicted only by theory and too dangerous or expensive to be reproduced in laboratories. Successes in computational science over the past twenty years have caused demand of supercomputing, to improve the performance of the solutions and to allow the growth of the models, in terms of sizes and quality. From a computer scientist’s perspective, it is natural to think to distribute the computation required to study a complex systems among multiple machines: it is well known that the speed of singleprocessor computers is reaching some physical limits. For these reasons, parallel and distributed computing has become the dominant paradigm for computational scientists who need the latest development on computing resources in order to solve their problems and the “Scalability” has been recognized as the central challenge in this science. In this dissertation the design and implementation of Frameworks, Parallel Languages and Architectures, which enable to improve the state of the art on Scalable Computational Science, are discussed. Frameworks. The proposal of D-MASON, a distributed version of MASON, a wellknown and popular Java toolkit for writing and running Agent-Based Simulations (ABSs). D-MASON introduces a framework level parallelization so that scientists that use the framework (e.g., a domain expert with limited knowledge of distributed programming) could be only minimally aware of such distribution. D-MASON, was began to be developed since 2011, the main purpose of the project was overcoming the limits of the sequentially computation of MASON, using distributed computing. D-MASON enables to do more than MASONin terms of size of simulations (number of agents and complexity of agents behaviors), but allows also to reduce the simulation time of simulations written in MASON. For this reason, one of the most important feature of D-MASON is that it requires a limited number of changing on the MASON’s code in order to execute simulations on distributed systems. v D-MASON, based on Master-Worker paradigm, was initially designed for heterogeneous computing in order to exploit the unused computational resources in labs, but it also provides functionality to be executed in homogeneous systems (as HPC systems) as well as cloud infrastructures. The architecture of D-MASON is presented in the following three papers, which describes all D-MASON layers: • Cordasco G., Spagnuolo C. and Scarano V. Toward the new version of D-MASON: Efficiency, Effectiveness and Correctness in Parallel and Distributed Agent-based Simulations. 1st IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems. IEEE International Parallel & Distributed Processing Symposium 2016. • Cordasco G., De Chiara R., Mancuso A., Mazzeo D., Scarano V. and Spagnuolo C. Bringing together efficiency and effectiveness in distributed simulations: the experience with D-MASON. SIMULATION: Transactions of The Society for Modeling and Simulation International, June 11, 2013. • Cordasco G., De Chiara R., Mancuso A., Mazzeo D., Scarano V. and Spagnuolo C. A Framework for distributing Agent-based simulations. Ninth International Workshop Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms of Euro-Par 2011 conference. Much effort has been made, on the Communication Layer, to improve the communication efficiency in the case of homogeneous systems. D-MASON is based on Publish/Subscribe (PS) communication paradigm and uses a centralized message broker (based on the Java Message Service standard) to deal with heterogeneous systems. The communication for homogeneous system uses the Message Passing Interface (MPI) standard and is also based on PS. In order to use MPI within Java, D-MASON uses a Java binding of MPI. Unfortunately, this binding is relatively new and does not provides all MPI functionalities. Several communication strategies were designed, implemented and evaluated. These strategies were presented in two papers: • Cordasco G., Milone F., Spagnuolo C. and Vicidomini L. Exploiting D-MASON on Parallel Platforms: A Novel Communication Strategy 2st Workshop on Parallel and Distributed Agent-Based Simulations of EuroPar 2014 conference. • Cordasco G., Mancuso A., Milone F. and Spagnuolo C. Communication strategies in Distributed Agent-Based Simulations: the experience with D-MASON 1st Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2013 conference. vi D-MASON provides also mechanisms for the visualization and gathering of the data in distributed simulation (available on the Visualization Layer). These solutions are presented in the paper: • Cordasco G., De Chiara R., Raia F., Scarano V., Spagnuolo C. and Vicidomini L. Designing Computational Steering Facilities for Distributed Agent Based Simulations. Proceedings of the ACM SIGSIM Conference on Principles of Advanced Discrete Simulation 2013. In DABS one of the most complex problem is the partitioning and balancing of the computation. D-MASON provides, in the Distributed Simulation layer, mechanisms for partitioning and dynamically balancing the computation. D-MASON uses field partitioning mechanism to divide the computation among the distributed system. The field partitioning mechanism provides a nice trade-off between balancing and communication effort. Nevertheless a lot of ABS are not based on 2D- or 3D-fields and are based on a communication graph that models the relationship among the agents. Inthiscasethefieldpartitioningmechanismdoesnotensuregoodsimulation performance. Therefore D-MASON provides also a specific mechanisms to manage simulation that uses a graph to describe agent interactions. These solutions were presented in the following publication: • Antelmi A., Cordasco G., Spagnuolo C. and Vicidomini L.. On Evaluating Graph Partitioning Algorithms for Distributed Agent Based Models on Networks. 3rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2015 conference. The field partitioning mechanism, intuitively, enables the mono and bi-dimensional partitioning of an Euclidean space. This approach is also know as uniform partitioning. But in some cases, e.g. simulations that simulate urban areas using a Geographical Information System (GIS), the uniform partitioning degrades the simulation performance, due to the unbalanced distribution of the agents on the field and consequently on the computational resources. In such a case, D-MASON provides a non-uniform partitioning mechanism (inspired by Quad-Tree data structure), presented in the following paper: • Lettieri N., Spagnuolo C. and Vicidomini L.. Distributed Agent-based Simulation and GIS: An Experiment With the dynamics of Social Norms. 3rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2015 conference. • G. Cordasco and C. Spagnuolo and V. Scarano. Work Partitioning on Parallel and Distributed Agent-Based Simulation. IEEE Workshop on vii Parallel and Distributed Processing for Computational Social Systems of International Parallel & Distributed Processing Symposium, 2017. The latest version of D-MASON provides a web-based System Management, to better use D-MASON in Cloud infrastructures. D-MASON on the Amazon EC2 Cloud infrastructure and its performance in terms of speed and cost were compared against D-MASON on an HPC environment. The obtained results, and the new System Management Layer are presented in the following paper: • MCarillo,GCordasco,FSerrapica,CSpagnuolo,P.Szufel,andL.Vicidomini. D-Mason on the Cloud: an Experience with Amazon Web Services. 4rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2016 conference. ParallelLanguages. The proposal of an architecture, which enable to invoke code supported by a Java Virtual Machine (JVM) from code written in C language. Swft/T, is a parallel scripting language for programming highly concurrent applications in parallel and distributed environments. Swift/T is the reimplemented version of Swift language, with a new compiler and runtime. Swift/T improve Swift, allowing scalability over 500 tasks per second, load balancing feature, distributed data structures, and dataflow-driven concurrent task execution. Swif/T provides an interesting feature the one of calling easily and natively other languages (as Python, R, Julia, C) by using special language functions named leaf functions. Considering the actual trend of some supercomputing vendors (such as Cray Inc.) that support in its processors Java Virtual Machines (JVM), it is desirable to provide methods to call also Java code from Swift/T. In particular is really attractive to be able to call scripting languages for JVM as Clojure, Scala, Groovy, JavaScript etc. For this purpose a C binding to instanziate and call JVM was designed. This binding is used in Swif/T (since the version 1.0) to develop leaf functions that call Java code. The code are public available at GitHub project page. Frameworks. The proposal of two tools, which exploit the computing power of parallel systems to improve the effectiveness and the efficiency of Simulation Optimization strategies. Simulations Optimization (SO) is used to refer to the techniques studied for ascertaining the parameters of a complex model that minimize (or maximize) given criteria (one or many), which can only be computed by performing a simulation run. Due to the the high dimensionality of the search space, the heterogeneity of parameters, the irregular shape and the stochastic nature of the objective evaluation function, the tuning of such systems is extremely demanding from the computational point of view. The first frameworks is SOF: Zero Configuration Simulation Optimization Framework on the Cloud, it was designed to run SO process in viii the cloud. SOF is based on the Apache Hadoop infrastructure and is presented in the following paper: • Carillo M., Cordasco G., Scarano V., Serrapica F., Spagnuolo C. and Szufel P. SOF: Zero Configuration Simulation Optimization Framework on the Cloud. Parallel, Distributed, and Network-Based Processing 2016. The second framework is EMEWS: Extreme-scale Model Exploration with Swift/T, it has been designed at Argonne National Laboratory (USA). EMEWS as SOF allows to perform SO processes in distributed system. Both the frameworks are mainly designed for ABS. In particular EMEWS was tested using the ABS simulation toolkit Repast. Initially, EMEWS was not able to easily execute out of the box simulations written in MASON and NetLogo. This thesis presents new functionalities of EMEWS and solutions to easily execute MASON and NetLogo simulations on it. The EMEWS use cases are presented in the following paper: • J. Ozik, N. T. Collier, J. M. Wozniak and C. Spagnuolo From Desktop To Large-scale Model Exploration with Swift/T. Winter Simulation Conference 2016. Architectures. The proposal of an open-source, extensible, architecture for the visualization of data in HTML pages, exploiting a distributed web computing. Following the Edge-centric Computing paradigm, the data visualization is performed edge side ensuring data trustiness, privacy, scalability and dynamic data loading. The architecture has been exploited in the Social Platform for Open Data (SPOD). The proposed architecture, that has also appeared in the following papers: • G. Cordasco, D. Malandrino, P. Palmieri, A. Petta, D. Pirozzi, V. Scarano, L. Serra, C. Spagnuolo, L. Vicidomini A Scalable Data Web Visualization Architecture. Parallel, Distributed, and Network-Based Processing 2017. • G. Cordasco, D. Malandrino, P. Palmieri, A. Petta, D. Pirozzi, V. Scarano, L. Serra, C. Spagnuolo, L. Vicidomini An Architecture for Social Sharing and Collaboration around Open Data Visualisation. In Poster Proc. of the 19th ACM conference on "Computer-Supported Cooperative Work and Social Computing 2016. • G. Cordasco, D. Malandrino, P. Palmieri, A. Petta, D. Pirozzi, V. Scarano, L. Serra, C. Spagnuolo, L. Vicidomini An extensible architecture for an ecosystem of visualization web-components for Open Data Maximising interoperability Workshop— core vocabularies, location-aware data and more 2015. [edited by author]Computational science anche conosciuta come calcolo scientifico è un settore in rapida crescita che usa il calcolo avanzato per affrontare problemi complessi. Questa nuova disciplina, combina tecnologia, moderni metodi computazionali e simulazioni per affrontare problemi troppo difficili da poter essere studiati solo in teoria o troppo pericolosi e costosi per poter essere riprodotti sperimentalmente in laboratorio. I progressi dell’ultimo ventennio in computational science hanno sfruttato il supercalcolo per migliorare le performance delle soluzioni e permettere la crescita dei modelli, in termini di dimensioni e qualità dei risultati ottenuti. Le soluzioni adottate si avvalgono del calcolo distribuito: è ben noto che la velocità di un computer con un singolo processore sta raggiungendo dei limiti fisici. Per queste ragioni, la computazione parallela e distribuita è diventata il principale paradigma di calcolo per affrontare i problemi nell’ambito della computational science, in cui la scalabilità delle soluzioni costituisce la sfida da affrontare. In questa tesi vengono discusse la progettazione e l’implementazione di Framework, Linguaggi Paralleli e Architetture che consentono di migliorare lo stato dell’arte della Scalable Computational Science. In particolare, i maggiori contributi riguardano: Frameworks. La proposta di D-MASON, una versione distribuita di MASON, un toolkit Java per la scrittura e l’esecuzione di simulazioni basate su agenti (AgentBased Simulations, ABSs). D-MASON introduce la parallelizzazione a livello framework per far si che gli scienziati che lo utilizzano (ad esempio un esperto con limitata conoscenza della programmazione distribuita) possano rendersi conto solo minimamente di lavorare in ambiente distribuito (ad esempio esperti del dominio con limitata esperienza o nessuna esperienza nel calcolo distribuito). D-MASON è un progetto iniziato nel 2011, il cui principale obiettivo è quello di superare i limiti del calcolo sequenziale di MASON, sfruttando il calcolo distribuito. D-MASON permette di simulare modelli molto più complessi (in termini di numero di agenti e complessità dei comportamenti dei singoli agenti) rispetto a MASON e inoltre consente, a parità di calcolo, di ridurre il tempo necessario ad eseguire le simulazioni MASON. D-MASON è stato progettato in modo da permettere la migrazione di simulazioni scritte in MASON con un numero limitato di modifiche da apportare al codice, al fine di garantire il massimo della semplicità d’uso. v D-MASON è basato sul paradigma Master-Worker, inizialmente pensato per sistemi di calcolo eterogenei, nelle sue ultime versioni consente l’esecuzione anche in sistemi omogenei come sistemi HPC e infrastrutture di cloud computing. L’architettura di D-MASON è stata presentata nelle seguenti pubblicazioni: • Cordasco G., Spagnuolo C. and Scarano V. Toward the new version of D-MASON: Efficiency, Effectiveness and Correctness in Parallel and Distributed Agent-based Simulations. 1st IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems. IEEE International Parallel & Distributed Processing Symposium 2016. • Cordasco G., De Chiara R., Mancuso A., Mazzeo D., Scarano V. and Spagnuolo C. Bringing together efficiency and effectiveness in distributed simulations: the experience with D-MASON. SIMULATION: Transactions of The Society for Modeling and Simulation International, June 11, 2013. • Cordasco G., De Chiara R., Mancuso A., Mazzeo D., Scarano V. and Spagnuolo C. A Framework for distributing Agent-based simulations. Ninth International Workshop Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms of Euro-Par 2011 conference. Uno degli strati architetturali di D-MASON che ne determina le prestazioni, è il il Communication Layer, il quale offre le funzionalità di comunicazione tra tutte le entità coinvolte nel calcolo. La comunicazione in D-MASON è basata sul paradigma Publish/Subscribe (PS). Al fine di soddisfare la flessibilità e la scalabilità richiesta, vengono fornite due strategie di comunicazione, una centralizzata (utilizzando Java Message Service) e una decentralizzata (utilizzando Message Passing Interface). La comunicazione in sistemi omogenei è sempre basata su PS ma utilizza lo standard Message Passing Interface (MPI). Al fine di utilizzare MPI in Java, lo strato di comunicazione di D-MASON è implementato sfruttando un binding Java a MPI. Tale soluzione non permette però l’utilizzo di tutte le funzionalità di MPI. Al tal proposito molteplici soluzioni sono stare progettate e implementate, e sono presentate nelle seguenti pubblicazioni: • Cordasco G., Milone F., Spagnuolo C. and Vicidomini L. Exploiting D-MASON on Parallel Platforms: A Novel Communication Strategy 2st Workshop on Parallel and Distributed Agent-Based Simulations of EuroPar 2014 conference. • Cordasco G., Mancuso A., Milone F. and Spagnuolo C. Communication strategies in Distributed Agent-Based Simulations: the experience with D-MASON 1st Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2013 conference. vi D-MASON offre anche meccanismi per la visualizzazione centralizzata e la raccolta di informazioni in simulazioni distribuite (tramite il Visualization Layer). I risultati ottenuti sono stati presentati nella seguente pubblicazione: • Cordasco G., De Chiara R., Raia F., Scarano V., Spagnuolo C. and Vicidomini L. Designing Computational Steering Facilities for Distributed Agent Based Simulations. Proceedings of the ACM SIGSIM Conference on Principles of Advanced Discrete Simulation 2013. Quando si parla di simulazioni distribuite una delle principali problematiche è il bilanciamento del carico. D-MASON offre, nel Distributed Simulation Layer, meccanismi per il partizionamento dinamico e il bilanciamento del carico. DMASON utilizza la tecnica del field partitioning per suddividere il lavoro tra le entità del sistema distribuito. La tecnica di field partitioning consente di ottenere un buon equilibrio tra il bilanciamento del carico e l’overhead di comunicazione. Molti modelli di simulazione non sono basati su spazi 2/3-dimensionali ma bensì modellano le relazioni tra gli agenti utilizzando strutture dati grafo. In questi casi la tecnica di field partitioning non garantisce soluzioni che consentono di ottenere buone prestazioni. Per risolvere tale problema, D-MASON fornisce particolari soluzioni per simulazioni che utilizzano i grafi per modellare le relazioni tra gli agenti. I risultati conseguiti sono stati presentati nella seguente pubblicazione: • Antelmi A., Cordasco G., Spagnuolo C. and Vicidomini L.. On Evaluating Graph Partitioning Algorithms for Distributed Agent Based Models on Networks. 3rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2015 conference. Il metodo di field partitioning consente il partizionamento di campi Euclidei mono e bi-dimensionali; tale approccio è anche conosciuto con il nome di partizionamento uniforme. In alcuni casi, come ad esempio simulazioni che utilizzano Geographical Information System (GIS), il metodo di partizionamento uniforme non è in grado di garantire buone prestazioni, a causa del posizionamento non bilanciato degli agenti sul campo di simulazione. In questi casi, D-MASON offre un meccanismo di partizionamento non uniforme (inspirato alla struttura dati Quad-Tree), presentato nelle seguenti pubblicazioni: • Lettieri N., Spagnuolo C. and Vicidomini L.. Distributed Agent-based Simulation and GIS: An Experiment With the dynamics of Social Norms. 3rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2015 conference. • G. Cordasco and C. Spagnuolo and V. Scarano. Work Partitioning on Parallel and Distributed Agent-Based Simulation. IEEE Workshop on vii Parallel and Distributed Processing for Computational Social Systems of International Parallel & Distributed Processing Symposium, 2017. Inoltre, D-MASON èstatoestesoalloscopodifornireun’infrastrutturaSimulation-asa-Service(SIMaaS),chesemplificailprocessodiesecuzionedisimulazionidistribuite in un ambiente di Cloud Computing. D-MASON nella sua versione più recente offre uno strato software di management basato su web, che ne consente estrema facilità d’uso in ambienti Cloud. Utilizzando il System Management, D-MASON è stato sperimentato sull’infrastruttura Cloud Amazon EC2 confrontando le prestazioni in questo ambiente cloud con un sistema HPC. I risultati ottenuti sono stati presentati nella seguente pubblicazione: • MCarillo,GCordasco,FSerrapica,CSpagnuolo,P.Szufel,andL.Vicidomini. D-Mason on the Cloud: an Experience with Amazon Web Services. 4rd Workshop on Parallel and Distributed Agent-Based Simulations of Euro-Par 2016 conference. LinguaggiParalleli. La proposta di un’architettura, la quale consente di invocare il codice per Java Virtual Machine (JVM) da codice scritto in linguaggio C. Swift/T è un linguaggio di scripting parallelo per sviluppare applicazioni altamente scalabili in ambienti paralleli e distribuiti. Swift/T è l’implementazione del linguaggio Swift per ambienti HPC. Swift/T migliora il linguaggio Swift, consentendo la scalabilità fino a 500 task per secondo, il bilanciamento del carico, strutture dati distribuite, e dataflow task execution. Swift/T consente di invocare nativamente codice scritto in altri

    Development of a New Framework for Distributed Processing of Geospatial Big Data

    Get PDF
    Geospatial technology is still facing a lack of “out of the box” distributed processing solutions which are suitable for the amount and heterogeneity of geodata, and particularly for use cases requiring a rapid response. Moreover, most of the current distributed computing frameworks have important limitations hindering the transparent and flexible control of processing (and/or storage) nodes and control of distribution of data chunks. We investigated the design of distributed processing systems and existing solutions related to Geospatial Big Data. This research area is highly dynamic in terms of new developments and the re-use of existing solutions (that is, the re-use of certain modules to implement further specific developments), with new implementations continuously emerging in areas such as disaster management, environmental monitoring and earth observation. The distributed processing of raster data sets is the focus of this paper, as we believe that the problem of raster data partitioning is far from trivial: a number of tiling and stitching requirements need to be addressed to be able to fulfil the needs of efficient image processing beyond pixel level. We attempt to compare the terms Big Data, Geospatial Big Data and the traditional Geospatial Data in order to clarify the typical differences, to compare them in terms of storage and processing backgrounds for different data representations and to categorize the common processing systems from the aspect of distributed raster processing. This clarification is necessary due to the fact that they behave differently on the processing side, and particular processing solutions need to be developed according to their characteristics. Furthermore, we compare parallel and distributed computing, taking into account the fact that these are used improperly in several cases. We also briefly assess the widely-known MapReduce paradigm in the context of geospatial applications. The second half of the article reports on a new processing framework initiative, currently at the concept and early development stages, which aims to be capable of processing raster, vector and point cloud data in a distributed IT ecosystem. The developed system is modular, has no limitations on programming language environment, and can execute scripts written in any development language (e.g. Python, R or C#)
    corecore