1,780 research outputs found
Proceedings of Abstracts Engineering and Computer Science Research Conference 2019
© 2019 The Author(s). This is an open-access work distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. For further details please see https://creativecommons.org/licenses/by/4.0/. Note: Keynote: Fluorescence visualisation to evaluate effectiveness of personal protective equipment for infection control is © 2019 Crown copyright and so is licensed under the Open Government Licence v3.0. Under this licence users are permitted to copy, publish, distribute and transmit the Information; adapt the Information; exploit the Information commercially and non-commercially for example, by combining it with other Information, or by including it in your own product or application. Where you do any of the above you must acknowledge the source of the Information in your product or application by including or linking to any attribution statement specified by the Information Provider(s) and, where possible, provide a link to this licence: http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/This book is the record of abstracts submitted and accepted for presentation at the Inaugural Engineering and Computer Science Research Conference held 17th April 2019 at the University of Hertfordshire, Hatfield, UK. This conference is a local event aiming at bringing together the research students, staff and eminent external guests to celebrate Engineering and Computer Science Research at the University of Hertfordshire. The ECS Research Conference aims to showcase the broad landscape of research taking place in the School of Engineering and Computer Science. The 2019 conference was articulated around three topical cross-disciplinary themes: Make and Preserve the Future; Connect the People and Cities; and Protect and Care
Sensitivity analysis of distributed photovoltaic system capacity estimation based on artificial neural network
Residential solar photovoltaic (PV) system installations are expected to continue increasing due to their growing cost competitiveness and supportive government policies. However, excessive installations of unknown behind-the-meter solar panels present a challenge for accurate load prediction and reliable operations of power networks. To address such growing concerns of distribution network operators (DNOs), this research proposes a novel model for distributed PV system capacity estimations. Innovative extracted features from 24-hour substation net load curves were fed into a deep neural network to estimate the PV capacity linked to the substation feeder. A comprehensive study into the sensitivity of the model’s accuracy to specific temporal scales of data collection, number of households served by a substation, and proportion of PV-equipped properties was conducted. This study revealed that a model developed to be used exclusively in summer achieved a 18.1% decrease in estimation root mean squared error (RMSE) compared to an all-year model, whilst using only a third of the training data amount. Similarly, compared to an all-year model, RMSE decreased by 26.9% when only data from Mondays to Thursdays were used to train and test the model. Also, for the all-year model, the most accurate estimations occur when 20% to 80% of households have PV systems installed and estimation percentage error tend to remain constant at around 10% when more than 20% of households have PV systems installed. A machine learning-ready dataset of substations with known PV capacity and experiment results are both useful to inform DNOs on the potential of the proposed method in reducing grid operation costs
Techniques for power system simulation using multiple processors
The thesis describes development work which was undertaken to improve the speed of a real-time power system simulator used for the development and testing of control schemes. The solution of large, highly sparse matrices was targeted because this is the most time-consuming part of the current simulator. Major improvements in the speed of the matrix ordering phase of the solution were achieved through the development of a new ordering strategy. This was thoroughly investigated, and is shown to provide important additional improvements compared to standard ordering methods, in reducing path length and minimising potential pipeline stalls. Alterations were made to the remainder of the solution process which provided more flexibility in scheduling calculations. This was used to dramatically ease the run-time generation of efficient code, dedicated to the solution of one matrix structure, and also to reduce memory requirements. A survey of the available microprocessors was performed, which concluded that a special-purpose design could best implement the code generated at run-time, and a design was produced using a microprogrammable floating-point processor, which matched the code produced by the earlier work. A method of splitting the matrix solution onto parallel processors was investigated, and two methods of producing network splits were developed and their results compared. The best results from each method were found to agree well, with a predicted three-fold speed-up for the matrix solution of the C.E.G.B. transmission system from the use of six processors. This gain will increase for the whole simulator. A parallel processing topology of the partitioned network and produce the necessary structures for the remainder of the solution process
EDEN: A high-performance, general-purpose, NeuroML-based neural simulator
Modern neuroscience employs in silico experimentation on ever-increasing and
more detailed neural networks. The high modelling detail goes hand in hand with
the need for high model reproducibility, reusability and transparency. Besides,
the size of the models and the long timescales under study mandate the use of a
simulation system with high computational performance, so as to provide an
acceptable time to result. In this work, we present EDEN (Extensible Dynamics
Engine for Networks), a new general-purpose, NeuroML-based neural simulator
that achieves both high model flexibility and high computational performance,
through an innovative model-analysis and code-generation technique. The
simulator runs NeuroML v2 models directly, eliminating the need for users to
learn yet another simulator-specific, model-specification language. EDEN's
functional correctness and computational performance were assessed through
NeuroML models available on the NeuroML-DB and Open Source Brain model
repositories. In qualitative experiments, the results produced by EDEN were
verified against the established NEURON simulator, for a wide range of models.
At the same time, computational-performance benchmarks reveal that EDEN runs up
to 2 orders-of-magnitude faster than NEURON on a typical desktop computer, and
does so without additional effort from the user. Finally, and without added
user effort, EDEN has been built from scratch to scale seamlessly over multiple
CPUs and across computer clusters, when available.Comment: 29 pages, 9 figure
Automated cache optimisations of stencil computations for partial differential equations
This thesis focuses on numerical methods that solve partial differential equations.
Our focal point is the finite difference method, which solves partial
differential equations by approximating derivatives with explicit finite differences.
These partial differential equation solvers consist of stencil computations on structured grids.
Stencils for computing real-world practical applications are patterns often
characterised by many memory accesses and non-trivial arithmetic expressions
that lead to high computational costs compared to simple stencils used in much prior
proof-of-concept work.
In addition, the loop nests to express stencils on structured grids may often be complicated.
This work is highly motivated by a specific domain of stencil computations where one of the challenges is non-aligned to the structured grid ("off-the-grid") operations.
These operations update neighbouring grid points through scatter and gather operations via non-affine memory accesses, such as {A[B[i]]}.
In addition to this challenge, these practical stencils often include many computation fields (need to store multiple grid copies), complex data dependencies and imperfect loop nests.
In this work, we aim to increase the performance of stencil kernel execution.
We study automated cache-memory-dependent optimisations for stencil computations.
This work consists of two core parts with their respective contributions.The first part of our work tries to reduce the data movement in stencil computations of practical interest.
Data movement is a dominant factor affecting the performance of high-performance computing applications.
It has long been a target of optimisations due to its impact on execution time and energy consumption.
This thesis tries to relieve this cost by applying temporal blocking optimisations, also known as time-tiling, to stencil computations.
Temporal blocking is a well-known technique to enhance data reuse in stencil computations.
However, it is rarely used in practical applications but rather in theoretical examples to prove its efficacy.
Applying temporal blocking to scientific simulations is more complex.
More specifically, in this work, we focus on the application context of seismic and medical imaging.
In this area, we often encounter scatter and gather operations due to signal sources and receivers at arbitrary locations in the computational domain.
These operations make the application of temporal blocking challenging.
We present an approach to overcome this challenge and successfully apply temporal blocking.In the second part of our work, we extend the first part as an automated approach targeting a wide range of simulations modelled with partial differential equations.
Since temporal blocking is error-prone, tedious to apply by hand and highly complex to assimilate theoretically and practically, we are motivated to automate its application and automatically generate code that benefits from it.
We discuss algorithmic approaches and present a generalised compiler pipeline to automate the application of temporal blocking.
These passes are written in the Devito compiler. They are used to accelerate the computation of stencil kernels in areas such as seismic and medical imaging, computational fluid dynamics and machine learning.
\href{www.devitoproject.org}{Devito} is a Python package to implement optimised stencil computation (e.g., finite differences, image processing, machine learning) from high-level symbolic problem definitions.
Devito builds on \href{www.sympy.org}{SymPy} and employs automated code generation and just-in-time compilation to execute optimised computational kernels on several computer platforms, including CPUs, GPUs, and clusters thereof.
We show how we automate temporal blocking code generation without user intervention and often achieve better time-to-solution.
We enable domain-specific optimisation through compiler passes and offer temporal blocking gains from a high-level symbolic abstraction.
These automated optimisations benefit various computational kernels for solving real-world application problems.Open Acces
Pathway to Future Symbiotic Creativity
This report presents a comprehensive view of our vision on the development
path of the human-machine symbiotic art creation. We propose a classification
of the creative system with a hierarchy of 5 classes, showing the pathway of
creativity evolving from a mimic-human artist (Turing Artists) to a Machine
artist in its own right. We begin with an overview of the limitations of the
Turing Artists then focus on the top two-level systems, Machine Artists,
emphasizing machine-human communication in art creation. In art creation, it is
necessary for machines to understand humans' mental states, including desires,
appreciation, and emotions, humans also need to understand machines' creative
capabilities and limitations. The rapid development of immersive environment
and further evolution into the new concept of metaverse enable symbiotic art
creation through unprecedented flexibility of bi-directional communication
between artists and art manifestation environments. By examining the latest
sensor and XR technologies, we illustrate the novel way for art data collection
to constitute the base of a new form of human-machine bidirectional
communication and understanding in art creation. Based on such communication
and understanding mechanisms, we propose a novel framework for building future
Machine artists, which comes with the philosophy that a human-compatible AI
system should be based on the "human-in-the-loop" principle rather than the
traditional "end-to-end" dogma. By proposing a new form of inverse
reinforcement learning model, we outline the platform design of machine
artists, demonstrate its functions and showcase some examples of technologies
we have developed. We also provide a systematic exposition of the ecosystem for
AI-based symbiotic art form and community with an economic model built on NFT
technology. Ethical issues for the development of machine artists are also
discussed
Hypergraph-based parallel computation of passage time densities in large semi-Markov models
AbstractPassage time densities and quantiles are important performance and quality of service metrics, but their numerical derivation is, in general, computationally expensive. We present an iterative algorithm for the calculation of passage time densities in semi-Markov models, along with a theoretical analysis and empirical measurement of its convergence behaviour. In order to implement the algorithm efficiently in parallel, we use hypergraph partitioning to minimise communication between processors and to balance workloads. This enables the analysis of models with very large state spaces which could not be held within the memory of a single machine. We produce passage time densities and quantiles for very large semi-Markov models with over 15 million states and validate the results against simulation
Recommended from our members
Model-Architecture Co-design of Deep Neural Networks for Embedded Systems
In deep learning, a convolutional neural network (ConvNet or CNN) is a powerful tool for building interesting embedded applications that use data to make predictions. An application running on an embedded system typically has limited access to memory resources, processing power, and storage. Implementing deep convolutional neural network-based inference on resource-constrained devices can be very challenging, as these environments cannot usually make use of the massive computing power and storage that are present in cloud server environments. Furthermore, the constantly evolving nature of modern deep network architecture aggravates the problem by making it necessary to balance flexibility against specialisation to avoid the inability to adapt. However, much of the baseline architecture of a deep convolutional neural network stayed the same. With careful optimisation of the most common and widely occurring layer architectures, it is typically possible to accelerate these emerging workloads for resource-constrained embedded systems.
This thesis makes four contributions. I first developed a lossy three-stage low-rank approximation scheme that can reduce the computational complexity of a pre-trained model by 3-5x and up to 8-9x for individual convolutional layers. This scheme requires restructuring of the convolutional layers and generally suits the scenario where both the training data and trained model are available.
In many scenarios, the training data is not available for fine-tuning any loss in prediction accuracy if structural changes are made to a model as a post-processing step. Besides the lack of availability of training data, there are other situations where the architecture of a model cannot be changed after training. My second contribution handles this scenario by using a low-level optimisation scheme that requires no changes to the model architecture, unlike the low-rank approximation scheme. This novel scheme uses a modified version of the Cook-Toom algorithm to reduce the computational intensity of commonly occurring dense and spatial convolutional layers and speedup inference time by 2-4x.
My third contribution is an efficient implementation of the Cook-Toom class of algorithms on ubiquitous Arm's low-power Cortex processor. Unlike the direct convolution, computing convolutions using the modified Cook-Toom algorithm requires a different data processing pipeline as it involves pre- and post-transformations of the intermediate activations. I introduced a multi-channel multi-region (MCMR) scheme to enable an efficient implementation of the fast Cook-Toom algorithm. I demonstrate that by effectively using SIMD instructions and the MCMR scheme an average 2-3x and a peak 4x per layer speedup is easily achievable.
My final contribution is the Cook-Toom accelerator, a custom hardware architecture for modern convolutional neural networks. This accelerator architecture is designed from the ground up to address some of the limitations of a resource-constrained SIMD processor. I also illustrate how new emerging layer types can be mapped efficiently to the same flexible architecture without any modification
Deep Learning at Scale with Nearest Neighbours Communications
As deep learning techniques become more and more popular, there is the need to move these applications from the data scientist’s Jupyter notebook to efficient and reliable enterprise solutions. Moreover, distributed training of deep learning models will happen more and more outside the well-known borders of cloud and HPC infrastructure and will move to edge and mobile platforms. Current techniques for distributed deep learning have drawbacks in both these scenarios, limiting their long-term applicability.
After a critical review of the established techniques for Data Parallel training from both a distributed computing and deep learning perspective, a novel approach based on nearest-neighbour communications is presented in order to overcome some of the issues related to mainstream approaches, such as global communication patterns. Moreover, in order to validate the proposed strategy, the Flexible Asynchronous Scalable Training (FAST) framework is introduced, which allows to apply the nearest-neighbours communications approach to a deep learning framework of choice.
Finally, a relevant use-case is deployed on a medium-scale infrastructure to demonstrate both the framework and the methodology presented. Training convergence and scalability results are presented and discussed in comparison to a baseline defined by using state-of-the-art distributed training tools provided by a well-known deep learning framework
- …