31 research outputs found
Customisable arithmetic hardware designs
Cellular Automata
Modelling and simulation are disciplines of major importance for science and engineering. There is no science without models, and simulation has become a very useful, sometimes indispensable, tool for the development of both science and engineering. The main attraction of cellular automata is that, despite a conceptual simplicity that makes them easy to implement in computer simulations and, in principle, amenable to detailed and complete mathematical analysis, they can exhibit a wide variety of remarkably complex behaviour. This feature has attracted researchers from a wide range of disciplines in science and engineering, but also from the social sciences and sometimes beyond. The collective complex behaviour of numerous systems, emerging from the interaction of a multitude of simple individuals, is conveniently modelled and simulated with cellular automata for very different purposes. In this book, a number of innovative applications of cellular automata models in the fields of Quantum Computing, Materials Science, Cryptography and Coding, and Robotics and Image Processing are presented.
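The interplay described above, a trivially simple local rule producing complex global behaviour, can be sketched with a one-dimensional cellular automaton. The example below uses the well-known Rule 110; the function names are illustrative only.

```python
# Minimal sketch of a one-dimensional cellular automaton (Rule 110) on a
# periodic lattice: each cell updates from its own state and its two
# neighbours' states, yet the global pattern can be highly complex.

def step(cells, rule=110):
    """Apply one synchronous update to all cells (wrap-around boundaries)."""
    n = len(cells)
    return [
        # Pack the 3-cell neighbourhood into a 3-bit index, then look up
        # the corresponding bit of the rule number.
        (rule >> (cells[(i - 1) % n] << 2 | cells[i] << 1 | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

# Start from a single live cell and print a few generations.
cells = [0] * 31
cells[15] = 1
for _ in range(5):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```

Despite the one-line update rule, Rule 110 is known to be computationally universal, which is exactly the kind of simplicity-to-complexity gap the abstract describes.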
High-throughput machine learning algorithms
The field of machine learning has become strongly compute driven, such that emerging research and applications require larger amounts of specialised hardware or smarter algorithms to advance beyond the state-of-the-art. This thesis develops specialised techniques and algorithms for a subset of computationally difficult machine learning problems. The applications under investigation are quantile approximation in the limited-memory data streaming setting, interpretability of decision tree ensembles, efficient sampling methods in the space of permutations, and the generation of large numbers of pseudorandom permutations. These specific applications are investigated as they represent significant bottlenecks in real-world machine learning pipelines, where improvements to throughput have significant impact on the outcomes of machine learning projects in both industry and research. To address these bottlenecks, we discuss both theoretical improvements, such as improved convergence rates, and hardware/software related improvements, such as optimised algorithm design for high throughput hardware accelerators.
Some contributions include: the evaluation of bin-packing methods for efficiently scheduling small batches of dependent computations to GPU hardware execution units, numerically stable reduction operators for higher-order statistical moments, and memory bandwidth optimisation for GPU shuffling. Additionally, we apply theory of the symmetric group of permutations in reproducing kernel Hilbert spaces, resulting in improved analysis of Monte Carlo methods for Shapley value estimation and new, computationally more efficient algorithms based on kernel herding and Bayesian quadrature. We also utilise reproducing kernels over permutations to develop a novel statistical test for the hypothesis that a sample of permutations is drawn from a uniform distribution.
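One of the contributions above, numerically stable reduction operators for statistical moments, builds on associative merge operators that can be mapped onto parallel hardware. The sketch below stops at the second moment (variance) for brevity, using the standard pairwise update often attributed to Chan et al.; it is illustrative, not the thesis's higher-order operator.

```python
# A hedged sketch of a numerically stable, associative merge for running
# statistics (count, mean, M2). Because merge() is associative, partial
# summaries can be combined in any order, e.g. in a parallel tree reduction.
from functools import reduce

def merge(a, b):
    """Combine two partial summaries (n, mean, M2) into one."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    # Weighted mean update avoids forming large intermediate sums.
    mean = mean_a + delta * n_b / n
    # Cross term accounts for the shift between the two partial means.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)

def summarize(x):
    """Summary of a single observation."""
    return (1, float(x), 0.0)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n, mean, m2 = reduce(merge, map(summarize, data))
print(n, mean, m2 / n)   # count, mean, population variance
```

The same merge shape extends to third and fourth moments with additional cross terms, which is what makes such operators suitable for GPU-style segmented reductions.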
The techniques discussed lie at the intersection of machine learning, high-performance computing, and applied mathematics. Much of the above work resulted in open source software used in real applications, including the GPUTreeShap library [38], shuffling primitives for the Thrust parallel computing library [2], extensions to the Shap package [31], and extensions to the XGBoost library [6].
Publications of the Jet Propulsion Laboratory, July 1969 - June 1970
JPL bibliography of technical reports released from July 1969 through June 1970
Performance-efficient cryptographic primitives in constrained devices
PhD Thesis
Resource-constrained devices are small, low-cost, usually fixed-function devices with very limited resources. They are constrained in terms of memory, computational capability, communication bandwidth, and power. In the last decade, we have seen widespread use of these devices in health care, smart homes and cities, sensor networks, wearables, automotive systems, and other fields. Consequently, there has been an increase in research activity on the security of these devices, especially in how to design and implement cryptography that meets the devices' extreme resource constraints.
Cryptographic primitives are low-level cryptographic algorithms used to construct security protocols that provide confidentiality, authenticity, and integrity of messages. Their building blocks, which rely heavily on mathematical theory, are computationally complex and demand considerable computing resources. As a result, most of these primitives are either too large to fit on resource-constrained devices or highly inefficient when implemented on them.
There have been many attempts to address this problem in the literature, where cryptography engineers either modify conventional primitives into lightweight versions or build new lightweight primitives from scratch. Unfortunately, both solutions suffer from reduced security, low performance, or high implementation cost.
This thesis investigates the performance of conventional cryptographic primitives and explores the effect of their different building blocks and design choices on their performance. It also studies the impact of various implementation approaches and optimisation techniques. Moreover, it investigates the limitations that the tight processing and storage capabilities of constrained devices impose on implementing cryptography. Furthermore, it evaluates the performance of many newly designed lightweight cryptographic primitives and investigates the resources required to run them with acceptable performance. The thesis aims to provide insight into the performance of cryptographic primitives and the resources they need, helping to provide solutions that balance performance, security, and resource requirements for these devices.
The Institute of Public Administration in Riyadh, and the Saudi Arabian Cultural Bureau in London
Standardized development of computer software. Part 2: Standards
This monograph contains standards for software development and engineering. The book sets forth rules for design, specification, coding, testing, documentation, and quality assurance audits of software; it also contains detailed outlines for the documentation to be produced.
Doctor of Philosophy
dissertation
This dissertation explores three key facets of software algorithms for custom hardware ray tracing: primitive intersection, shading, and acceleration structure construction. For the first, primitive intersection, we show how nearly all of the existing direct three-dimensional (3D) ray-triangle intersection tests are mathematically equivalent. Based on this, a genetic algorithm can automatically tune a ray-triangle intersection test for maximum speed on a particular architecture. We also analyze the components of the intersection test to determine how much floating point precision is required and design a numerically robust intersection algorithm. Next, for shading, we deconstruct Perlin noise into its basic parts and show how these can be modified to produce a gradient noise algorithm with improved visual appearance. This improved algorithm serves as the basis for a hardware noise unit. Lastly, we show how an existing bounding volume hierarchy can be postprocessed using tree rotations to further reduce the expected cost of traversing a ray through it. This postprocessing also serves as the basis for an efficient update algorithm for animated geometry. Together, these contributions should improve the efficiency of both software- and hardware-based ray tracers.
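As one representative of the family of direct ray-triangle tests the dissertation shows to be mathematically equivalent, the sketch below implements the widely known Möller-Trumbore algorithm. It is an illustration of the general form, not the architecture-tuned variant the genetic algorithm would produce.

```python
# Möller-Trumbore ray-triangle intersection: solve for (t, u, v) in
# orig + t*dir = (1-u-v)*v0 + u*v1 + v*v2 using two edge vectors and
# cross/dot products, rejecting as soon as a barycentric bound fails.

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def cross(a, b): return (a[1]*b[2] - a[2]*b[1],
                         a[2]*b[0] - a[0]*b[2],
                         a[0]*b[1] - a[1]*b[0])
def dot(a, b): return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def intersect(orig, dir, v0, v1, v2, eps=1e-9):
    """Return the ray parameter t of the hit, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(dir, e2)
    det = dot(e1, p)
    if abs(det) < eps:              # ray (nearly) parallel to triangle plane
        return None
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv             # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(dir, q) * inv           # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv            # distance along the ray
    return t if t > eps else None

# A ray cast straight down the z-axis hits the unit triangle in the z=0 plane.
print(intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0),
                (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # 1.0
```

The single division by `det` and the early barycentric rejections are exactly the kind of components whose ordering and precision a per-architecture tuner can vary without changing the underlying mathematics.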
Parallel simulation methods for large-scale agent-based predator-prey systems : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand
The Animat is an agent-based artificial-life model that is suitable for gaining insight into the interactions of autonomous individuals in complex predator-prey systems and the emergent phenomena they may exhibit. Certain dynamics of the model may only be present in large systems, and a large number of agents may be required for comparison with macroscopic models. Large systems can be infeasible to simulate on single-core machines due to the processing time required. The model can be parallelised to improve performance; however, reproducing the original model behaviour while retaining the performance gain is not straightforward.
Parallel update strategies and data structures for multi-core CPUs and graphics processing units (GPUs) are developed to simulate a typical predator-prey Animat model with improved performance while reproducing the behaviour of the original model. An analysis of the model is presented to identify the dependencies and conditions the parallel update strategy must satisfy to retain the original model behaviour.
The parallel update strategy for multi-core CPUs is constructed using a spatial domain decomposition approach and a supporting data structure. The GPU implementation is developed with a new update strategy consisting of an iterative conflict-resolution method and a priority-number system to update many agents simultaneously across thousands of GPU cores. This update method is supported by a compressed sparse data structure developed to allow efficient memory transactions.
The performance of the Animat simulation is improved with parallelism and without a change in model behaviour. Simulation usability is also considered, and an internal agent-definition system using the CUDA device lambda feature is developed to make configuring agents easier without significant changes to the program or loss of performance.
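The iterative conflict-resolution scheme with priority numbers can be sketched serially: every unresolved agent proposes a neighbouring cell, each contested cell is won by the agent with the lowest priority number, and losers retry in the next iteration. This is an illustrative reconstruction of the general idea, not the thesis's GPU implementation; all names and the data layout are assumptions.

```python
# Serial sketch of priority-based conflict resolution for synchronous agent
# moves on a toroidal grid. On a GPU, each "agent" below would be a thread.
import random

def step(positions, width, height, rng, max_iters=8):
    """One synchronous move of all agents.
    positions: list of distinct (x, y) cells, one per agent."""
    new_pos = list(positions)
    taken = set(positions)
    pending = set(range(len(positions)))
    priority = list(range(len(positions)))
    rng.shuffle(priority)                    # random priority number per agent
    for _ in range(max_iters):
        if not pending:
            break
        # Phase 1: each unresolved agent proposes a neighbouring free cell;
        # per contested cell, only the lowest priority number survives.
        proposals = {}
        for a in sorted(pending):
            x, y = new_pos[a]
            dx, dy = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
            cell = ((x + dx) % width, (y + dy) % height)
            if cell in taken:
                continue                     # occupied: retry next iteration
            if cell not in proposals or priority[a] < priority[proposals[cell]]:
                proposals[cell] = a
        # Phase 2: winners commit their moves; losers remain pending.
        for cell, a in proposals.items():
            taken.discard(new_pos[a])
            taken.add(cell)
            new_pos[a] = cell
            pending.discard(a)
    return new_pos                           # unresolved agents keep their cell

rng = random.Random(0)
agents = [(x, y) for x in range(3) for y in range(3)]
print(step(agents, 8, 8, rng))
```

Because winners are chosen per cell rather than per agent order, the outcome is independent of thread scheduling, which is the property a parallel update must have to reproduce a well-defined sequential model.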