13 research outputs found

    Model Exploration Using OpenMOLE - a workflow engine for large scale distributed design of experiments and parameter tuning

    Get PDF
    OpenMOLE is a scientific workflow engine with a strong emphasis on workload distribution. Workflows are designed using a high level Domain Specific Language (DSL) built on top of Scala. It exposes natural parallelism constructs to easily delegate the workload resulting from a workflow to a wide range of distributed computing environments. In this work, we briefly expose the strong assets of OpenMOLE and demonstrate its efficiency at exploring the parameter set of an agent simulation model. We perform a multi-objective optimisation on this model using computationally expensive Genetic Algorithms (GA). OpenMOLE hides the complexity of designing such an experiment thanks to its DSL, and transparently distributes the optimisation process. The example shows how an initialisation of the GA with a population of 200,000 individuals can be evaluated in one hour on the European Grid Infrastructure.Comment: IEEE High Performance Computing and Simulation conference 2015, Jun 2015, Amsterdam, Netherland

    Half a billion simulations: evolutionary algorithms and distributed computing for calibrating the SimpopLocal geographical model

    Get PDF
    Multi-agent geographical models integrate very large numbers of spatial interactions. In order to validate those models large amount of computing is necessary for their simulation and calibration. Here a new data processing chain including an automated calibration procedure is experimented on a computational grid using evolutionary algorithms. This is applied for the first time to a geographical model designed to simulate the evolution of an early urban settlement system. The method enables us to reduce the computing time and provides robust results. Using this method, we identify several parameter settings that minimise three objective functions that quantify how closely the model results match a reference pattern. As the values of each parameter in different settings are very close, this estimation considerably reduces the initial possible domain of variation of the parameters. The model is thus a useful tool for further multiple applications on empirical historical situations

    PaPaS: A Portable, Lightweight, and Generic Framework for Parallel Parameter Studies

    Full text link
    The current landscape of scientific research is widely based on modeling and simulation, typically with complexity in the simulation's flow of execution and parameterization properties. Execution flows are not necessarily straightforward since they may need multiple processing tasks and iterations. Furthermore, parameter and performance studies are common approaches used to characterize a simulation, often requiring traversal of a large parameter space. High-performance computers offer practical resources at the expense of users handling the setup, submission, and management of jobs. This work presents the design of PaPaS, a portable, lightweight, and generic workflow framework for conducting parallel parameter and performance studies. Workflows are defined using parameter files based on keyword-value pairs syntax, thus removing from the user the overhead of creating complex scripts to manage the workflow. A parameter set consists of any combination of environment variables, files, partial file contents, and command line arguments. PaPaS is being developed in Python 3 with support for distributed parallelization using SSH, batch systems, and C++ MPI. The PaPaS framework will run as user processes, and can be used in single/multi-node and multi-tenant computing systems. An example simulation using the BehaviorSpace tool from NetLogo and a matrix multiply using OpenMP are presented as parameter and performance studies, respectively. The results demonstrate that the PaPaS framework offers a simple method for defining and managing parameter studies, while increasing resource utilization.Comment: 8 pages, 6 figures, PEARC '18: Practice and Experience in Advanced Research Computing, July 22--26, 2018, Pittsburgh, PA, US

    Assessing the impact of forest structure disturbances on the arboreal movement and energetics of orangutans—An agent-based modeling approach

    Get PDF
    This is the final version. Available from Frontiers Media via the DOI in this record. Data availability statement: The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary material.Agent-based models have been developed and widely employed to assess the impact of disturbances or conservation management on animal habitat use, population development, and viability. However, the direct impacts of canopy disturbance on the arboreal movement of individual primates have been less studied. Such impacts could shed light on the cascading effects of disturbances on animal health and fitness. Orangutans are an arboreal primate that commonly encounters habitat quality deterioration due to land-use changes and related disturbances such as forest fires. Forest disturbance may, therefore, create a complex stress scenario threatening orangutan populations. Due to forest disturbances, orangutans may adapt to employ more terrestrial, as opposed to arboreal, movements potentially prolonging the search for fruiting and nesting trees. In turn, this may lead to changes in daily activity patterns (i.e., time spent traveling, feeding, and resting) and available energy budget, potentially decreasing the orangutan's fitness. We developed the agent-based simulation model BORNEO (arBOReal aNimal movEment mOdel), which explicitly describes both orangutans' arboreal and terrestrial movement in a forest habitat, depending on distances between trees and canopy structures. Orangutans in the model perform activities with a motivation to balance energy intake and expenditure through locomotion. We tested the model using forest inventory data obtained in Sebangau National Park, Central Kalimantan, Indonesia. This allowed us to construct virtual forests with real characteristics including tree connectivity, thus creating the potential to expand the environmental settings for simulation experiments. In order to parameterize the energy related processes of the orangutans described in the model, we applied a computationally intensive evolutionary algorithm and evaluated the simulation results against observed behavioral patterns of orangutans. Both the simulated variability and proportion of activity budgets including feeding, resting, and traveling time for female and male orangutans confirmed the suitability of the model for its purpose. We used the calibrated model to compare the activity patterns and energy budgets of orangutans in both natural and disturbed forests. The results confirm field observations that orangutans in the disturbed forest are more likely to experience deficit energy balance due to traveling to the detriment of feeding time. Such imbalance is more pronounced in males than in females. The finding of a threshold of forest disturbances that affects a significant change in activity and energy budgets suggests potential threats to the orangutan population. Our study introduces the first agent-based model describing the arboreal movement of primates that can serve as a tool to investigate the direct impact of forest changes and disturbances on the behavior of species such as orangutans. Moreover, it demonstrates the suitability of high-performance computing to optimize the calibration of complex agent-based models describing animal behavior at a fine spatio-temporal scale (1-m and 1-s granularity).UKR

    Improving the performance of dataflow systems for deep neural network training

    No full text
    Deep neural networks (DNNs) have led to significant advancements in machine learning. With deep structure and flexible model parameterisation, they exhibit state-of-the-art accuracies for many complex tasks e.g. image recognition. To achieve this, models are trained iteratively over large datasets. This process involves expensive matrix operations, making it time-consuming to obtain converged models. To accelerate training, dataflow systems parallelise computation. A scalable approach is to use parameter server framework: it has workers that train model replicas in parallel and parameter servers that synchronise the replicas to ensure the convergence. With distributed DNN systems, there are three challenges that determine the training completion time. In this thesis, we propose practical and effective techniques to address each of these challenges. Since frequent model synchronisation results in high network utilisation, the parameter server approach can suffer from network bottlenecks, thus requiring decisions on resource allocation. Our idea is to use all available network bandwidth and synchronise subject to the available bandwidth. We present Ako, a DNN system that uses partial gradient exchange for synchronising replicas in a peer-to-peer fashion. We show that our technique exhibits a 25% lower convergence time than a hand-tuned parameter-server deployments. For a long training, the compute efficiency of worker nodes is important. We argue that processing hardware should be fully utilised for the best speed-up. The key observation is it is possible to overlap the execution of several matrix operations with other workloads. We describe Crossbow, a GPU-based system that maximises hardware utilisation. By using a multi-streaming scheduler, multiple models are trained in parallel on GPU and achieve a 2.3x speed-up compared to a state-of-the-art system. The choice of model configuration for replicas also directly determines convergence quality. Dataflow systems are used for exploring the promising configurations but provide little support for efficient exploratory workflows. We present Meta-dataflow (MDF), a dataflow model that expresses complex workflows. By taking into account all configurations as a unified workflow, MDFs efficiently reduce time spent on configuration exploration.Open Acces

    Characterising and modeling the co-evolution of transportation networks and territories

    Full text link
    The identification of structuring effects of transportation infrastructure on territorial dynamics remains an open research problem. This issue is one of the aspects of approaches on complexity of territorial dynamics, within which territories and networks would be co-evolving. The aim of this thesis is to challenge this view on interactions between networks and territories, both at the conceptual and empirical level, by integrating them in simulation models of territorial systems.Comment: Doctoral dissertation (2017), Universit\'e Paris 7 Denis Diderot. Translated from French. Several papers compose this PhD thesis; overlap with: arXiv:{1605.08888, 1608.00840, 1608.05266, 1612.08504, 1706.07467, 1706.09244, 1708.06743, 1709.08684, 1712.00805, 1803.11457, 1804.09416, 1804.09430, 1805.05195, 1808.07282, 1809.00861, 1811.04270, 1812.01473, 1812.06008, 1908.02034, 2012.13367, 2102.13501, 2106.11996

    To Develop a Database Management Tool for Multi-Agent Simulation Platform

    Get PDF
    Depuis peu, la Modélisation et Simulation par Agents (ABMs) est passée d'une approche dirigée par les modèles à une approche dirigée par les données (Data Driven Approach, DDA). Cette tendance vers l’utilisation des données dans la simulation vise à appliquer les données collectées par les systèmes d’observation à la simulation (Edmonds and Moss, 2005; Hassan, 2009). Dans la DDA, les données empiriques collectées sur les systèmes cibles sont utilisées non seulement pour la simulation des modèles mais aussi pour l’initialisation, la calibration et l’évaluation des résultats issus des modèles de simulation, par exemple, le système d’estimation et de gestion des ressources hydrauliques du bassin Adour-Garonne Français (Gaudou et al., 2013) et l’invasion des rizières du delta du Mékong au Vietnam par les cicadelles brunes (Nguyen et al., 2012d). Cette évolution pose la question du « comment gérer les données empiriques et celles simulées dans de tels systèmes ». Le constat que l’on peut faire est que, si la conception et la simulation actuelles des modèles ont bénéficié des avancées informatiques à travers l’utilisation des plateformes populaires telles que Netlogo (Wilensky, 1999) ou GAMA (Taillandier et al., 2012), ce n'est pas encore le cas de la gestion des données, qui sont encore très souvent gérées de manière ad-hoc. Cette gestion des données dans des Modèles Basés Agents (ABM) est une des limitations actuelles des plateformes de simulation multiagents (SMA). Autrement dit, un tel outil de gestion des données est actuellement requis dans la construction des systèmes de simulation par agents et la gestion des bases de données correspondantes est aussi un problème important de ces systèmes. Dans cette thèse, je propose tout d’abord une structure logique pour la gestion des données dans des plateformes de SMA. La structure proposée qui intègre des solutions de l’Informatique Décisionnelle et des plateformes multi-agents s’appelle CFBM (Combination Framework of Business intelligence and Multi-agent based platform), elle a plusieurs objectifs : (1) modéliser et exécuter des SMAs, (2) gérer les données en entrée et en sortie des simulations, (3) intégrer les données de différentes sources, et (4) analyser les données à grande échelle. Ensuite, le besoin de la gestion des données dans les simulations agents est satisfait par une implémentation de CFBM dans la plateforme GAMA. Cette implémentation présente aussi une architecture logicielle pour combiner entrepôts deIv données et technologies du traitement analytique en ligne (OLAP) dans les systèmes SMAs. Enfin, CFBM est évaluée pour la gestion de données dans la plateforme GAMA à travers le développement de modèles de surveillance des cicadelles brunes (BSMs), où CFBM est utilisé non seulement pour gérer et intégrer les données empiriques collectées depuis le système cible et les résultats de simulation du modèle simulé, mais aussi calibrer et valider ce modèle. L'intérêt de CFBM réside non seulement dans l'amélioration des faiblesses des plateformes de simulation et de modélisation par agents concernant la gestion des données mais permet également de développer des systèmes de simulation complexes portant sur de nombreuses données en entrée et en sortie en utilisant l’approche dirigée par les données.Recently, there has been a shift from modeling driven approach to data driven approach inAgent Based Modeling and Simulation (ABMS). This trend towards the use of data-driven approaches in simulation aims at using more and more data available from the observation systems into simulation models (Edmonds and Moss, 2005; Hassan, 2009). In a data driven approach, the empirical data collected from the target system are used not only for the design of the simulation models but also in initialization, calibration and evaluation of the output of the simulation platform such as e.g., the water resource management and assessment system of the French Adour-Garonne Basin (Gaudou et al., 2013) and the invasion of Brown Plant Hopper on the rice fields of Mekong River Delta region in Vietnam (Nguyen et al., 2012d). That raises the question how to manage empirical data and simulation data in such agentbased simulation platform. The basic observation we can make is that currently, if the design and simulation of models have benefited from advances in computer science through the popularized use of simulation platforms like Netlogo (Wilensky, 1999) or GAMA (Taillandier et al., 2012), this is not yet the case for the management of data, which are still often managed in an ad hoc manner. Data management in ABM is one of limitations of agent-based simulation platforms. Put it other words, such a database management is also an important issue in agent-based simulation systems. In this thesis, I first propose a logical framework for data management in multi-agent based simulation platforms. The proposed framework is based on the combination of Business Intelligence solution and a multi-agent based platform called CFBM (Combination Framework of Business intelligence and Multi-agent based platform), and it serves several purposes: (1) model and execute multi-agent simulations, (2) manage input and output data of simulations, (3) integrate data from different sources; and (4) analyze high volume of data. Secondly, I fulfill the need for data management in ABM by the implementation of CFBM in the GAMA platform. This implementation of CFBM in GAMA also demonstrates a software architecture to combine Data Warehouse (DWH) and Online Analytical Processing (OLAP) technologies into a multi-agent based simulation system. Finally, I evaluate the CFBM for data management in the GAMA platform via the development of a Brown Plant Hopper Surveillance Models (BSMs), where CFBM is used ii not only to manage and integrate the whole empirical data collected from the target system and the data produced by the simulation model, but also to calibrate and validate the models.The successful development of the CFBM consists not only in remedying the limitation of agent-based modeling and simulation with regard to data management but also in dealing with the development of complex simulation systems with large amount of input and output data supporting a data driven approach
    corecore