72 research outputs found
A Simulation Suite for Lattice-Boltzmann based Real-Time CFD Applications Exploiting Multi-Level Parallelism on Modern Multi- and Many-Core Architectures
We present a software approach to hardware-oriented numerics which builds upon an augmented, previously published open-source set of libraries facilitating portable code development and optimisation on a wide range of modern computer architectures. In order to maximise efficiency, we exploit all levels of parallelism, including vectorisation within CPU cores, the Cell BE and GPUs, shared memory thread-level parallelism between cores, and parallelism between heterogeneous distributed memory resources in clusters. To evaluate and validate our approach, we implement a collection of modular building blocks for the easy and fast assembly and development of CFD applications based on the shallow water equations: we combine the Lattice-Boltzmann method with fluid-structure interaction techniques in order to achieve real-time simulations targeting interactive virtual environments. Our results demonstrate that recent multi-core CPUs outperform the Cell BE, while GPUs are significantly faster than conventional multi-threaded SSE code. In addition, we verify good scalability properties of our application on small clusters
An agent-based visualisation system.
This thesis explores the concepts of visual supercomputing, where complex distributed systems are used toward interactive visualisation of large datasets. Such complex systems inherently trigger management and optimisation problems; in recent years the concepts of autonomic computing have arisen to address those issues. Distributed visualisation systems are a very challenging area to apply autonomic computing ideas as such systems are both latency and compute sensitive, while most autonomic computing implementations usually concentrate on one or the other but not both concurrently. A major contribution of this thesis is to provide a case study demonstrating the application of autonomic computing concepts to a computation intensive, real-time distributed visualisation system. The first part of the thesis proposes the realisation of a layered multi-agent system to enable autonomic visualisation. The implementation of a generic multi-agent system providing reflective features is described. This architecture is then used to create a flexible distributed graphic pipeline, oriented toward real-time visualisation of volume datasets. Performance evaluation of the pipeline is presented. The second part of the thesis explores the reflective nature of the system and presents high level architectures based on software agents, or visualisation strategies, that take advantage of the flexibility of the system to provide generic features. Autonomic capabilities are presented, with fault recovery and automatic resource configuration. Performance evaluation, simulation and prediction of the system are presented, exploring different use cases and optimisation scenarios. A performance exploration tool, Delphe, is described, which uses real-time data of the system to let users explore its performance
Mesoscale fluid simulation with the Lattice Boltzmann method
PhD
This thesis describes investigations of several complex fluid effects, including
hydrodynamic spinodal decomposition, viscous instability, and self-assembly of a
cubic surfactant phase, by simulating them with a lattice Boltzmann computational
model.
The introduction describes what is meant by the term "complex fluid", and why
such fluids are both important and difficult to understand. A key feature of complex
fluids is that their behaviour spans length and time scales. The lattice Boltzmann
method is presented as a modelling technique which sits at a "mesoscale" level
intermediate between coarse-grained and fine-grained detail, and which is therefore
ideal for modelling certain classes of complex fluids.
The following chapters describe simulations which have been performed using
this technique, in two and three dimensions. Chapter 2 presents an investigation
into the separation of a mixture of two fluids. This process is found to involve several
physical mechanisms at different stages. The simulated behaviour is found to be in
good agreement with existing theory, and a curious effect, due to multiple competing
mechanisms, is observed, in agreement with experiments and other simulations.
Chapter 3 describes an improvement to lattice Boltzmann models of Hele-Shaw
flow, along with simulations which quantitatively demonstrate improvements in both
accuracy and numerical stability. The Saffman-Taylor hydrodynamic instability is
demonstrated using this model.
Chapter 4 contains the details and results of the TeraGyroid experiment, which
involved extremely large-scale simulations to investigate the dynamical behaviour
of a self-assembling structure. The first finite-size-effect-free dynamical simulations
of such a system are presented. It is found that several different mechanisms are
responsible for the assembly; the existence of chiral domains is demonstrated, along
with an examination of domain growth during self-assembly.
Appendix A describes some aspects of the implementation of the lattice Boltzmann
codes used in this thesis; appendix B describes some of the Grid computing
techniques which were necessary for the simulations of chapter 4.
Chapter 5 summarises the work, and makes suggestions for further research and
improvement.
Huntsman Corporation Queen Mary University Schlumberger Cambridge Researc
An elastic, parallel and distributed computing architecture for machine learning
Machine learning is a powerful tool that allows us to make better and faster decisions in a data-driven fashion based on training data. Neural networks are especially popular in the context of supervised learning due to their ability to approximate arbitrary functions. However, building these models is typically computationally intensive and can take significant time to complete on a conventional CPU-based computer. Such a long turnaround time makes using these models infeasible for business and research. This research seeks to accelerate the training process through parallel and distributed computing using High-Performance Computing (HPC) resources.
To understand machine learning on HPC platforms, the theoretical performance analysis in this thesis identifies four key factors for data-parallel machine learning: convergence, batch size, computational efficiency and communication efficiency. It finds that, for a fixed experimental setup, there is a maximum computational speed-up achievable through parallel and distributed computing.
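The existence of a maximum speed-up can be illustrated with a toy cost model. This is only a sketch under stated assumptions and not the analysis used in the thesis: the function names, the linear communication term and the constants `compute` and `comm` are all hypothetical.

```python
# Toy model (hypothetical, not taken from the thesis): time per training step
# with `workers` data-parallel workers is the compute time divided across the
# workers plus a communication cost that grows with the number of workers.

def step_time(workers, compute=100.0, comm=0.01):
    """Assumed cost model: compute / workers + comm * workers."""
    return compute / workers + comm * workers


def speedup(workers, compute=100.0, comm=0.01):
    """Speed-up relative to a single worker under the assumed model."""
    return step_time(1, compute, comm) / step_time(workers, compute, comm)


# Scan a range of worker counts and pick the one with the largest speed-up.
best_workers = max(range(1, 1025), key=speedup)
print(best_workers, round(speedup(best_workers), 3))
```

Under this assumed model the speed-up peaks near `sqrt(compute / comm)` workers and then declines, because the communication term starts to dominate.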
The primary focus of this thesis is convolutional neural network applications on the Apache Spark platform. The work presented in this thesis directly addresses the computational and communication inefficiencies associated with the Spark platform with improvements to the Resilient Distributed Dataset (RDD) and the introduction of an elastic non-blocking all-reduce. In addition to implementation optimisations, the computational performance has been further improved by overlapping computation and communication, and the use of large batch sizes through fine-grained control. The impacts of these improvements are more prominent with the rise of massively parallel processors and high-speed networks.
With all the techniques combined, it is predicted that training the ResNet50 model on the ImageNet dataset for 100 epochs at an effective batch size of 16K will take under 20 minutes on an NVIDIA Tesla P100 cluster, in contrast to 26 months on a single Intel Xeon E5-2660 v3 2.6 GHz processor.
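As a rough check on the scale of this prediction, the two quoted figures can be turned into an implied speed-up. The sketch below assumes a 30-day month, which the abstract does not specify, and treats "under 20 minutes" as exactly 20 minutes, so the result is a lower bound; the function and constant names are illustrative only.

```python
# Back-of-the-envelope arithmetic on the figures quoted in the abstract.
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: one month = 30 days


def implied_speedup(single_cpu_months, cluster_minutes):
    """Ratio of the single-processor time to the cluster time."""
    return single_cpu_months * MINUTES_PER_MONTH / cluster_minutes


# 26 months on one Xeon processor versus 20 minutes on the P100 cluster.
ratio = implied_speedup(26, 20)
print(f"implied speed-up of at least {ratio:,.0f}x")
```

With these assumptions the predicted cluster run is more than four orders of magnitude faster than the single-processor baseline.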
Due to the similarities to scientific computing, the resulting computing model of this thesis serves as an exemplar of the integration of high-performance computing and elastic computing with dynamic workloads, which lays the foundation for future research in emerging computational steering applications, such as interactive physics simulations and data assimilation in weather forecast and research
Extempore: The design, implementation and application of a cyber-physical programming language
There is a long history of experimental and exploratory programming supported by systems that expose interaction through a programming language interface. These live programming systems enable software developers to create, extend, and modify the behaviour of executing software by changing source code without perceptual breaks for recompilation. These live programming systems have taken many forms, but have generally been limited in their ability to express low-level programming concepts and the generation of efficient native machine code. These shortcomings have limited the effectiveness of live programming in domains that require highly efficient numerical processing and explicit memory management.
The most general questions addressed by this thesis are what a systems language designed for live programming might look like and how such a language might influence the development of live programming in performance sensitive domains requiring real-time support, direct hardware control, or high performance computing. This thesis answers these questions by exploring the design, implementation and application of Extempore, a new systems programming language, designed specifically for live interactive programming
Heterogeneity, High Performance Computing, Self-Organization and the Cloud
application; blueprints; self-management; self-organisation; resource management; supply chain; big data; PaaS; SaaS; HPCaa