31 research outputs found
Customisable arithmetic hardware designs
Cellular Automata
Modelling and simulation are disciplines of major importance for science and engineering. There is no science without models, and simulation has become a very useful, sometimes indispensable, tool for the development of both science and engineering. The main attraction of cellular automata is that, despite a conceptual simplicity that makes them easy to implement in computer simulations and, in principle, amenable to detailed and complete mathematical analysis, they can exhibit a wide variety of remarkably complex behaviour. This feature has attracted researchers from a wide range of disciplines in science and engineering, but also from the social sciences and sometimes beyond. The collective complex behaviour of numerous systems, emerging from the interaction of a multitude of simple individuals, is conveniently modelled and simulated with cellular automata for very different purposes. In this book, a number of innovative applications of cellular automata models in the fields of Quantum Computing, Materials Science, Cryptography and Coding, and Robotics and Image Processing are presented.
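The interplay described above, a trivially simple local rule producing complex global behaviour, can be sketched with a one-dimensional cellular automaton. The example below uses the well-known Rule 110; the function names are illustrative only.

```python
# Minimal sketch of a one-dimensional cellular automaton (Rule 110) on a
# periodic lattice: each cell updates from its own state and its two
# neighbours' states, yet the global pattern can be highly complex.

def step(cells, rule=110):
    """Apply one synchronous update to all cells (wrap-around boundaries)."""
    n = len(cells)
    return [
        # Pack the 3-cell neighbourhood into a 3-bit index, then look up
        # the corresponding bit of the rule number.
        (rule >> (cells[(i - 1) % n] << 2 | cells[i] << 1 | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

# Start from a single live cell and print a few generations.
cells = [0] * 31
cells[15] = 1
for _ in range(5):
    print("".join("#" if c else "." for c in cells))
    cells = step(cells)
```

Despite the one-line update rule, Rule 110 is known to be computationally universal, which is exactly the kind of simplicity-to-complexity gap the abstract describes.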
High-throughput machine learning algorithms
The field of machine learning has become strongly compute driven, such that emerging research and applications require larger amounts of specialised hardware or smarter algorithms to advance beyond the state-of-the-art. This thesis develops specialised techniques and algorithms for a subset of computationally difficult machine learning problems. The applications under investigation are quantile approximation in the limited-memory data streaming setting, interpretability of decision tree ensembles, efficient sampling methods in the space of permutations, and the generation of large numbers of pseudorandom permutations. These specific applications are investigated as they represent significant bottlenecks in real-world machine learning pipelines, where improvements to throughput have significant impact on the outcomes of machine learning projects in both industry and research. To address these bottlenecks, we discuss both theoretical improvements, such as improved convergence rates, and hardware/software related improvements, such as optimised algorithm design for high throughput hardware accelerators.
Some contributions include: the evaluation of bin-packing methods for efficiently scheduling small batches of dependent computations to GPU hardware execution units, numerically stable reduction operators for higher-order statistical moments, and memory bandwidth optimisation for GPU shuffling. Additionally, we apply theory of the symmetric group of permutations in reproducing kernel Hilbert spaces, resulting in improved analysis of Monte Carlo methods for Shapley value estimation and new, computationally more efficient algorithms based on kernel herding and Bayesian quadrature. We also utilise reproducing kernels over permutations to develop a novel statistical test for the hypothesis that a sample of permutations is drawn from a uniform distribution.
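One of the contributions above, numerically stable reduction operators for statistical moments, builds on associative merge operators that can be mapped onto parallel hardware. The sketch below stops at the second moment (variance) for brevity, using the standard pairwise update often attributed to Chan et al.; it is illustrative, not the thesis's higher-order operator.

```python
# A hedged sketch of a numerically stable, associative merge for running
# statistics (count, mean, M2). Because merge() is associative, partial
# summaries can be combined in any order, e.g. in a parallel tree reduction.
from functools import reduce

def merge(a, b):
    """Combine two partial summaries (n, mean, M2) into one."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    # Weighted mean update avoids forming large intermediate sums.
    mean = mean_a + delta * n_b / n
    # Cross term accounts for the shift between the two partial means.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return (n, mean, m2)

def summarize(x):
    """Summary of a single observation."""
    return (1, float(x), 0.0)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n, mean, m2 = reduce(merge, map(summarize, data))
print(n, mean, m2 / n)   # count, mean, population variance
```

The same merge shape extends to third and fourth moments with additional cross terms, which is what makes such operators suitable for GPU-style segmented reductions.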
The techniques discussed lie at the intersection of machine learning, high-performance computing, and applied mathematics. Much of the above work resulted in open source software used in real applications, including the GPUTreeShap library [38], shuffling primitives for the Thrust parallel computing library [2], extensions to the Shap package [31], and extensions to the XGBoost library [6].
Publications of the Jet Propulsion Laboratory, July 1969 - June 1970
JPL bibliography of technical reports released from July 1969 through June 1970
Performance-efficient cryptographic primitives in constrained devices
PhD Thesis
Resource-constrained devices are small, low-cost, usually fixed-function devices with very limited resources. They are constrained in terms of memory, computational capability, communication bandwidth, and power. In the last decade, we have seen widespread use of these devices in health care, smart homes and cities, sensor networks, wearables, automotive systems, and other fields. Consequently, there has been an increase in research activity on the security of these devices, especially in how to design and implement cryptography that meets the devices' extreme resource constraints.
Cryptographic primitives are low-level cryptographic algorithms used to construct security protocols that provide confidentiality, authenticity, and integrity of messages. Their building blocks, which rely heavily on mathematical theory, are computationally complex and demand considerable computing resources. As a result, most of these primitives are either too large to fit on resource-constrained devices or highly inefficient when implemented on them.
There have been many attempts to address this problem in the literature, where cryptography engineers either modify conventional primitives into lightweight versions or build new lightweight primitives from scratch. Unfortunately, both solutions suffer from reduced security, low performance, or high implementation cost.
This thesis investigates the performance of conventional cryptographic primitives and explores the effect of their different building blocks and design choices on their performance. It also studies the impact of various implementation approaches and optimisation techniques. Moreover, it investigates the limitations that the tight processing and storage capabilities of constrained devices impose on implementing cryptography. Furthermore, it evaluates the performance of many newly designed lightweight cryptographic primitives and investigates the resources required to run them with acceptable performance. The thesis aims to provide insight into the performance of cryptographic primitives and the resources they need, helping to provide solutions that balance performance, security, and resource requirements for these devices.
The Institute of Public Administration in Riyadh, and the Saudi Arabian Cultural Bureau in London
Standardized development of computer software. Part 2: Standards
This monograph contains standards for software development and engineering. The book sets forth rules for design, specification, coding, testing, documentation, and quality assurance audits of software; it also contains detailed outlines for the documentation to be produced.
Doctor of Philosophy
dissertation
This dissertation explores three key facets of software algorithms for custom hardware ray tracing: primitive intersection, shading, and acceleration structure construction. For the first, primitive intersection, we show how nearly all of the existing direct three-dimensional (3D) ray-triangle intersection tests are mathematically equivalent. Based on this, a genetic algorithm can automatically tune a ray-triangle intersection test for maximum speed on a particular architecture. We also analyze the components of the intersection test to determine how much floating point precision is required and design a numerically robust intersection algorithm. Next, for shading, we deconstruct Perlin noise into its basic parts and show how these can be modified to produce a gradient noise algorithm with improved visual appearance. This improved algorithm serves as the basis for a hardware noise unit. Lastly, we show how an existing bounding volume hierarchy can be postprocessed using tree rotations to further reduce the expected cost of traversing a ray through it. This postprocessing also serves as the basis for an efficient update algorithm for animated geometry. Together, these contributions should improve the efficiency of both software- and hardware-based ray tracers.
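As one representative of the family of direct ray-triangle tests the dissertation shows to be mathematically equivalent, the sketch below implements the widely known Möller-Trumbore algorithm. It is an illustration of the general form, not the architecture-tuned variant the genetic algorithm would produce.

```python
# Möller-Trumbore ray-triangle intersection: solve for (t, u, v) in
# orig + t*dir = (1-u-v)*v0 + u*v1 + v*v2 using two edge vectors and
# cross/dot products, rejecting as soon as a barycentric bound fails.

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def cross(a, b): return (a[1]*b[2] - a[2]*b[1],
                         a[2]*b[0] - a[0]*b[2],
                         a[0]*b[1] - a[1]*b[0])
def dot(a, b): return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def intersect(orig, dir, v0, v1, v2, eps=1e-9):
    """Return the ray parameter t of the hit, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(dir, e2)
    det = dot(e1, p)
    if abs(det) < eps:              # ray (nearly) parallel to triangle plane
        return None
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv             # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(dir, q) * inv           # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv            # distance along the ray
    return t if t > eps else None

# A ray cast straight down the z-axis hits the unit triangle in the z=0 plane.
print(intersect((0.25, 0.25, 1.0), (0.0, 0.0, -1.0),
                (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # 1.0
```

The single division by `det` and the early barycentric rejections are exactly the kind of components whose ordering and precision a per-architecture tuner can vary without changing the underlying mathematics.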
Parallel simulation methods for large-scale agent-based predator-prey systems : a thesis presented in partial fulfilment of the requirements for the degree of Doctor of Philosophy in Computer Science at Massey University, Albany, New Zealand
The Animat is an agent-based artificial-life model that is suitable for gaining insight into the interactions of autonomous individuals in complex predator-prey systems and the emergent phenomena they may exhibit. Certain dynamics of the model may only be present in large systems, and a large number of agents may be required for comparison with macroscopic models. Large systems can be infeasible to simulate on single-core machines due to the processing time required. The model can be parallelised to improve performance; however, reproducing the original model behaviour while retaining the performance gain is not straightforward.
Parallel update strategies and data structures for multi-core CPUs and graphics processing units (GPUs) are developed to simulate a typical predator-prey Animat model with improved performance while reproducing the behaviour of the original model. An analysis of the model is presented to identify the dependencies and conditions the parallel update strategy must satisfy to retain the original model behaviour.
The parallel update strategy for multi-core CPUs is constructed using a spatial domain decomposition approach and a supporting data structure. The GPU implementation is developed with a new update strategy consisting of an iterative conflict-resolution method and a priority-number system to update many agents simultaneously across thousands of GPU cores. This update method is supported by a compressed sparse data structure developed to allow efficient memory transactions.
The performance of the Animat simulation is improved with parallelism and without a change in model behaviour. Simulation usability is also considered, and an internal agent-definition system using the CUDA device lambda feature is developed to make configuring agents easier without significant changes to the program or loss of performance.
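The iterative conflict-resolution scheme with priority numbers can be sketched serially: every unresolved agent proposes a neighbouring cell, each contested cell is won by the agent with the lowest priority number, and losers retry in the next iteration. This is an illustrative reconstruction of the general idea, not the thesis's GPU implementation; all names and the data layout are assumptions.

```python
# Serial sketch of priority-based conflict resolution for synchronous agent
# moves on a toroidal grid. On a GPU, each "agent" below would be a thread.
import random

def step(positions, width, height, rng, max_iters=8):
    """One synchronous move of all agents.
    positions: list of distinct (x, y) cells, one per agent."""
    new_pos = list(positions)
    taken = set(positions)
    pending = set(range(len(positions)))
    priority = list(range(len(positions)))
    rng.shuffle(priority)                    # random priority number per agent
    for _ in range(max_iters):
        if not pending:
            break
        # Phase 1: each unresolved agent proposes a neighbouring free cell;
        # per contested cell, only the lowest priority number survives.
        proposals = {}
        for a in sorted(pending):
            x, y = new_pos[a]
            dx, dy = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
            cell = ((x + dx) % width, (y + dy) % height)
            if cell in taken:
                continue                     # occupied: retry next iteration
            if cell not in proposals or priority[a] < priority[proposals[cell]]:
                proposals[cell] = a
        # Phase 2: winners commit their moves; losers remain pending.
        for cell, a in proposals.items():
            taken.discard(new_pos[a])
            taken.add(cell)
            new_pos[a] = cell
            pending.discard(a)
    return new_pos                           # unresolved agents keep their cell

rng = random.Random(0)
agents = [(x, y) for x in range(3) for y in range(3)]
print(step(agents, 8, 8, rng))
```

Because winners are chosen per cell rather than per agent order, the outcome is independent of thread scheduling, which is the property a parallel update must have to reproduce a well-defined sequential model.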