Search CORE

6 research outputs found

The Missing Link! A New Skeleton for Evolutionary Multi-agent Systems in Erlang

Author: A Byrski
Adam D. Barwell
Aleksander Byrski
C Brown
Christopher Brown
D Krzywicki
DB Fogel
H González-Vélez
HJ Lee
J Armstrong
J Digalakis
Jan Stypka
Kevin Hammond
M Aldinucci
M Cole
M North
Marek Kisiel-Dorohinicki
PW Trinder
S Cahon
S Luke
Vladimir Janjic
Wojciech Turek
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 26/04/2017
Field of study

Evolutionary multi-agent systems (EMAS) play a critical role in many artificial intelligence applications that are in use today. In this paper, we present a new generic skeleton in Erlang for parallel EMAS computations. The skeleton enables us to capture a wide variety of concrete evolutionary computations that can exploit the same underlying parallel implementation. We demonstrate the use of our skeleton on two different evolutionary computing applications: (1) computing the minimum of the Rastrigin function; and (2) solving an urban traffic optimisation problem. We show that we can obtain very good speedups (up to 142.44 ×× the sequential performance) on a variety of different parallel hardware, while requiring very little parallelisation effort.Publisher PDFPeer reviewe

Crossref

University of Dundee Online Publications

University of St. Andrews - Pure

St Andrews Research Repository

FigShare

A computing origami: Optimized code generation for emerging parallel platforms

Author: HAGIESCU MIRISTE ANDREI MIHAI
Publication venue
Publication date: 08/08/2011
Field of study

Ph.DDOCTOR OF PHILOSOPH

ScholarBank@NUS

JIT-based cost models for adaptive parallelism

Author: Morton John Magnus
Publication venue
Publication date: 01/01/2018
Field of study

Parallel programming is extremely challenging. Worse yet, parallel architectures evolve quickly, and parallel programs must often be refactored for each new architecture. It is highly desirable to provide performance portability, so programs developed on one architecture can deliver good performance on other architectures. This thesis is part of the AJITPar project that investigates a novel approach for achieving performance portability by the development of suitable cost models to inform scheduling decisions with dynamic information about computational and communication costs on the target architecture. The main artifact of the AJITPar project is the Adaptive Skeleton Library (ASL) that pro- vides a distributed-memory master-worker implementation of a set of Algorithmic Skeletons i.e. programming patterns that abstract away the low-level intricacies of parallelism. After JIT warm-up, ASL uses a computational cost model applied to JIT trace information from the Pycket compiler, a tracing JIT implementation of the Racket language, to transform the skeletons. The execution time of an ASL task is primarily determined by computation and communication costs. The Pycket compiler is extended to enable runtime access to JIT traces, both the sequences of instructions and frequency of execution. Crucially for dynamic, adaption these are obtained with minimal overhead. A low cost, dynamic computation cost model for estimating the runtime of JIT compiled Pycket programs, Γ, is developed and validated. This is believed to be the first such model. The design explores the challenges of estimating execution time from JIT trace instructions and presents three increasingly sophisticated cost models. The cost model predicts execution time based on the PyPy JIT instructions present in compiled JIT traces. The final abstract cost model applies weightings for 5 different classes of trace instructions and also proposes a method for aggregating the cost models for single traces into a cost model for an entire program. Execution time is measured, and traces generated are recorded, from a suite of 41 benchmarks. Linear regression is used to determine the weightings for the abstract cost model from this data. The final cost model reveals that allocation operations count most for execution time, followed by guards and numeric operations. The suitability of Γ for predicting the effect of ASL program transformations is investigated. The real utility of Γ is not in absolute predictions of execution times for different programs, but in predicting the effects of applying program transformations on parallel programs. A linear relationship between the actual computational cost for a task, and that predicted by Γ for five benchmarks on two architectures is demonstrated. A series of increasingly accurate low cost, dynamic cost models for estimating the communi- cation costs of ASL programs, K, are developed and validated. Predicting the optimum task size in ASL not only relies on computational cost predictions, but also predictions of the over- head of communicating tasks to worker nodes and results back to the master. The design and iterative development of a cost model which predicts the serialisation, deserialisation, and network send times of spawning a task in ASL is presented. Linear regression of communication timings are used to determine the appropriate weighting parameters for each. K is shown to be valid for predicting other, arbitrary data structures by demonstrating an additive property of the model. The model K is validated by showing a linear relationship between the combined predicted costs of the simple types in instances of aggregated data structures, and measured communication time. This validation is performed on five benchmarks on two platforms. Finally, a low cost dynamic cost model, T , that predicts a good ASL task size by combining information from the computation and communication cost models (Γand K) is developed and validated. The key insight in the design in this model is to balance the communications cost on the master node with the computational and communications cost on the worker nodes. The predictive power of T is tested model using six benchmarks, and it is shown to more accurately predict the optimal task size, reducing total program runtimes when compared with the default ASL prototype

Glasgow Theses Service

Using program behaviour to exploit heterogeneous multi-core processors

Author: McIlroy Ross
Publication venue
Publication date: 01/01/2010
Field of study

Multi-core CPU architectures have become prevalent in recent years. A number of multi-core CPUs consist of not only multiple processing cores, but multiple different types of processing cores, each with different capabilities and specialisations. These heterogeneous multi-core architectures (HMAs) can deliver exceptional performance; however, they are notoriously difficult to program effectively. This dissertation investigates the feasibility of ameliorating many of the difficulties encountered in application development on HMA processors, by employing a behaviour aware runtime system. This runtime system provides applications with the illusion of executing on a homogeneous architecture, by presenting a homogeneous virtual machine interface. The runtime system uses knowledge of a program's execution behaviour, gained through explicit code annotations, static analysis or runtime monitoring, to inform its resource allocation and scheduling decisions, such that the application makes best use of the HMA's heterogeneous processing cores. The goal of this runtime system is to enable non-specialist application developers to write applications that can exploit an HMA, without the developer requiring in-depth knowledge of the HMA's design. This dissertation describes the development of a Java runtime system, called Hera-JVM, aimed at investigating this premise. Hera-JVM supports the execution of unmodified Java applications on both processing core types of the heterogeneous IBM Cell processor. An application's threads of execution can be transparently migrated between the Cell's different core types by Hera-JVM, without requiring the application's involvement. A number of real-world Java benchmarks are executed across both of the Cell's core types, to evaluate the efficacy of abstracting a heterogeneous architecture behind a homogeneous virtual machine. By characterising the performance of each of the Cell processor's core types under different program behaviours, a set of influential program behaviour characteristics is uncovered. A set of code annotations are presented, which enable program code to be tagged with these behaviour characteristics, enabling a runtime system to track a program's behaviour throughout its execution. This information is fed into a cost function, which Hera-JVM uses to automatically estimate whether the executing program's threads of execution would benefit from being migrated to a different core type, given their current behaviour characteristics. The use of history, hysteresis and trend tracking, by this cost function, is explored as a means of increasing its stability and limiting detrimental thread migrations. The effectiveness of a number of different migration strategies is also investigated under real-world Java benchmarks, with the most effective found to be a strategy that can target code, such that a thread is migrated whenever it executes this code. This dissertation also investigates the use of runtime monitoring to enable a runtime system to automatically infer a program's behaviour characteristics, without the need for explicit code annotations. A lightweight runtime behaviour monitoring system is developed, and its effectiveness at choosing the most appropriate core type on which to execute a set of real-world Java benchmarks is examined. Combining explicit behaviour characteristic annotations with those characteristics which are monitored at runtime is also explored. Finally, an initial investigation is performed into the use of behaviour characteristics to improve application performance under a different type of heterogeneous architecture, specifically, a non-uniform memory access (NUMA) architecture. Thread teams are proposed as a method of automatically clustering communicating threads onto the same NUMA node, thereby reducing data access overheads. Evaluation of this approach shows that it is effective at improving application performance, if the application's threads can be partitioned across the available NUMA nodes of a system. The findings of this work demonstrate that a runtime system with a homogeneous virtual machine interface can reduce the challenge of application development for HMA processors, whilst still being able to exploit such a processor by taking program behaviour into account

Glasgow Theses Service

CiteSeerX

OpenGrey Repository

Design of a reference architecture for an IoT sensor network

Author: Gómez Escobar Jairo Alejandro
Publication venue: International Center for Tropical Agriculture
Publication date: 15/04/2020
Field of study

CGSpace

NASA Tech Briefs, June 1992

Author
Publication venue
Publication date
Field of study

Topics covered include: New Product Ideas; Electronic Components and Circuits; Electronic Systems; Physical Sciences; Materials; Computer Programs; Mechanics; Machinery; Fabrication Technology; Mathematics and Information Sciences; Life Sciences

NASA Technical Reports Server