A SIMD architecture for hard real-time systems

Author: Spliet Roy
Publication venue: University of Cambridge
Publication date: 31/03/2020

Emerging safety-critical systems require high-performance data-parallel architectures and, problematically, ones that can guarantee tight and safe worst-case execution times. Given the complexity of existing architectures like GPUs, it is unlikely that sufficiently accurate models and algorithms for timing analysis will emerge in the foreseeable future. This motivates a clean-slate approach to designing a real-time data-parallel architecture. In this work I present Sim-D: a wide-SIMD architecture for hard real-time systems. Similar to GPUs, Sim-D performs hardware strip-mining to schedule the work for a compute kernel in entities called work-groups. Sim-D schedules the work for each work-group as a sequence of uninterruptible access- and execute program phases, interleaving the phases of two work-groups. By providing performance isolation between the memory- and compute resources, the execution time of each phase can be tightly bound through static analysis. I present a predictable closed-page DRAM controller that processes requests for large 1D- and 2D blocks of data, as well as indirect indexed transfers. These large transfers coalesce the data requests of a whole work-group. For a linear 4KiB transfer over a 64-bit data bus, the utilisation provably exceeds 78% for DDR4-3200AA DRAM. For 2D blocks, a well-chosen tiling configuration can achieve near-similar efficiency. I show that bounds on the execution time of indexed transfers are pessimistic by nature, but propose a novel snoopy indexed transfer mechanism that permits more reasonable bounds when the buffer size is limited. Finally, I present a worst-case execution time calculation algorithm for Sim-D. This algorithm is paired with two hardware work-group scheduling policies that deterministically reduce run-time variance. The worst-case execution time analysis algorithm combines static control flow analysis with a simulation-based cost model for execution and DRAM transfers. 
Its key novelty is the addition of a stage that considers work-group scheduling effects. I show that the work-group scheduling policies degrade performance on average by 8.9%, but permit the calculation of worst-case execution time bounds that are tight within 14.3% on average for benchmarks that avoid inefficient indexed transfers.
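The utilisation figure quoted for the linear 4KiB transfer can be turned into a time bound with simple arithmetic. The sketch below is not taken from the thesis: it assumes that "utilisation" means the fraction of the total transfer time during which the data bus carries data, and that DDR4-3200 moves 3200 million transfers per second on each data line. All constant names are hypothetical.

```python
# Assumption: utilisation = (time the bus carries data) / (total transfer time).
TRANSFER_BYTES = 4 * 1024          # linear 4KiB transfer, from the abstract
BUS_BYTES = 64 // 8                # 64-bit data bus moves 8 bytes per beat
TRANSFERS_PER_SECOND = 3200e6      # DDR4-3200: 3200 mega-transfers per second
MIN_UTILISATION = 0.78             # lower bound quoted in the abstract

# Number of bus beats needed to move the whole block.
beats = TRANSFER_BYTES // BUS_BYTES

# Time the bus spends carrying data, in nanoseconds.
data_time_ns = beats / TRANSFERS_PER_SECOND * 1e9

# If utilisation exceeds 78%, the whole transfer takes less than this.
max_total_time_ns = data_time_ns / MIN_UTILISATION

print(f"beats: {beats}")
print(f"pure data time: {data_time_ns:.1f} ns")
print(f"implied upper bound on transfer time: {max_total_time_ns:.1f} ns")
```

Under these assumptions, the block needs 512 beats and about 160 ns of pure data time, so a utilisation above 78% implies the whole transfer completes in roughly 205 ns or less.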

Apollo (Cambridge)

Utilising path-vertex data to improve Monte Carlo global illumination.

Author: Ian Christopher Doidge
Publication date: 01/01/2014

Efficient techniques for photo-realistic rendering are in high demand across a wide array of industries. Notable applications include visual effects for film, entertainment and virtual reality. Less direct applications such as visualisation for architecture, lighting design and product development also rely on the synthesis of realistic and physically based illumination. Such applications place ever-increasing demands on light transport algorithms, requiring the computation of photo-realistic effects while handling complex geometry, light scattering models and illumination. Techniques based on Monte Carlo integration handle such scenarios elegantly and robustly, but despite seeing decades of focused research and wide commercial support, these methods and their derivatives still exhibit undesirable side effects that are yet to be resolved. In this thesis, Monte Carlo path tracing techniques are improved upon by utilising path vertex data and intermediate radiance contributions readily available during rendering. This permits the development of novel progressive algorithms that render low-noise global illumination while striving to maintain the desirable accuracy and convergence properties of unbiased methods. The thesis starts by presenting a discussion of optical phenomena, physically based rendering and achieving photo-realistic image synthesis. This is followed by an in-depth discussion of the published theoretical and practical research in this field, with a focus on stochastic methods and modern rendering methodologies. This provides insight into the issues surrounding Monte Carlo integration both in the general and rendering-specific contexts, along with an appreciation for the complexities of solving global light transport. Alternative methods that aim to address these issues are discussed, providing an insight into modern rendering paradigms and their characteristics.
This provides the understanding of the key aspects that is necessary to build up and discuss the novel research and contributions to the field developed throughout this thesis. First, a path space filtering strategy is proposed that allows the path-based space of light transport to be classified into distinct subsets. This permits the novel combination of robust path tracing and recent progressive photon mapping algorithms to handle each subset based on the characteristics of the light transport in that space. This produces a hybrid progressive rendering technique that utilises the strengths of existing state-of-the-art Monte Carlo and photon mapping methods to provide efficient and consistent rendering of complex scenes with vanishing bias. The second original contribution is a probabilistic image-based filtering and sample clustering framework that provides high-quality previews of global illumination whilst remaining aware of high-frequency detail and features in geometry, materials and the incident illumination. As will be seen, the challenges of edge-aware noise reduction are numerous and long-standing, particularly when identifying high-frequency features in noisy illumination signals. Discontinuities such as hard shadows and glossy reflections are commonly overlooked by progressive filtering techniques; however, by dividing path space into multiple layers, once again based on utilising path vertex data, the overlapping illumination of varying intensities, colours and frequencies is more effectively handled. Thus noise is removed from each layer independently of features present in the remaining path space, effectively preserving such features.
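For readers unfamiliar with why Monte Carlo rendering is noisy, a minimal sketch of Monte Carlo integration follows. It is not the method of the thesis: it simply estimates the integral of cos(theta) over the hemisphere (exact value pi) with uniform hemisphere sampling. The function name and the seed parameter are hypothetical.

```python
import math
import random


def estimate_cosine_integral(num_samples, seed=0):
    """Monte Carlo estimate of the hemisphere integral of cos(theta); exact value is pi."""
    rng = random.Random(seed)
    pdf = 1.0 / (2.0 * math.pi)  # uniform density over the hemisphere's solid angle
    total = 0.0
    for _ in range(num_samples):
        # For directions uniform in solid angle, cos(theta) is uniform on [0, 1).
        cos_theta = rng.random()
        total += cos_theta / pdf  # integrand divided by the sampling density
    return total / num_samples


for n in (100, 10_000, 1_000_000):
    estimate = estimate_cosine_integral(n)
    print(n, estimate, abs(estimate - math.pi))
```

The estimator is unbiased, but its standard error only shrinks in proportion to 1/sqrt(N), which is the source of the residual noise that filtering and clustering approaches aim to suppress.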

Cronfa at Swansea University

Advances in quantitative microscopy

Author: Cooper Samuel
Publication venue: Department of Surgery & Cancer, Imperial College London
Publication date: 01/09/2019

Microscopy allows us to peer into the complex, deeply shrouded world in which the cells of our body grow and thrive. With the emergence of automated digital microscopes and software for analysing and processing the large numbers of images that they produce, quantitative microscopy approaches are now allowing us to answer ever larger and more complex biological questions. In this thesis I explore two trends. The first is the use of quantitative microscopy to perform unbiased screens; the advances made here include developing strategies to handle imaging data captured from physiological models, and unsupervised analysis of screening data to derive unbiased biological insights. Secondly, I develop software for analysing live cell imaging data, which can now be captured at greater rates than ever before, and use it to help answer key questions about how cells decide to arrest or proliferate in response to DNA damage. Together, this thesis represents a view of the current state of the art in high-throughput quantitative microscopy and details where the field is heading as machine learning approaches become ever more sophisticated.

Spiral - Imperial College Digital Repository

Parallel implementation of fractal image compression

Author: Uys Ryan F.
Publication date: 01/01/2000

Thesis (M.Sc.Eng.)-University of Natal, Durban, 2000. Fractal image compression exploits the piecewise self-similarity present in real images as a form of information redundancy that can be eliminated to achieve compression. The underlying theory, based on Partitioned Iterated Function Systems, is presented. As an alternative to the established JPEG, it provides a similar compression-ratio to fidelity trade-off. Fractal techniques promise faster decoding and potentially higher fidelity, but the computationally intensive compression process has prevented commercial acceptance. This thesis presents an algorithm mapping the problem onto a parallel processor architecture, with the goal of reducing the encoding time. The experimental work involved implementation of this approach on the Texas Instruments TMS320C80 parallel processor system. Results indicate that the fractal compression process is unusually well suited to parallelism, with speed gains approximately linearly related to the number of processors used. Parallel processing issues such as coherency, management and interfacing are discussed. The code designed incorporates pipelining and parallelism on all conceptual and practical levels, ensuring that all resources are fully utilised and achieving close to optimal efficiency. The computational intensity was reduced by several means, including conventional classification of image sub-blocks by content, with comparisons across class boundaries prohibited. A faster approach adopted was to perform estimate comparisons between blocks based on pixel value variance, identifying candidates for more time-consuming, accurate RMS inter-block comparisons. These techniques, combined with the parallelism, allow compression of 512x512-pixel, 8-bit images in under 20 seconds, while maintaining a 30 dB PSNR. This is up to an order of magnitude faster than reported for conventional sequential processor implementations.
Fractal-based compression of colour images and video sequences is also considered. The work confirms the potential of fractal compression techniques, and demonstrates that a parallel implementation is appropriate for addressing the compression time problem. The processor system used in these investigations is faster than currently available PC platforms, but the relevance lies in the anticipation that future generations of affordable processors will exceed its performance. The advantages of fractal image compression may then be accessible to the average computer user, leading to commercial acceptance.
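The variance-based pre-screening described in the abstract can be sketched as follows. This is an illustrative reading, not the thesis's code: blocks are flat lists of pixel values, the tolerance parameter and all function names are hypothetical, and the contrast and brightness fit that a full fractal coder applies before the RMS comparison is deliberately left out.

```python
import math
from statistics import pvariance


def rms_error(block_a, block_b):
    """Root-mean-square difference between two equally sized blocks."""
    squared = sum((a - b) ** 2 for a, b in zip(block_a, block_b))
    return math.sqrt(squared / len(block_a))


def best_match(range_block, domain_blocks, var_tolerance):
    """Index of the domain block closest to range_block.

    Cheap step: keep only domain blocks whose pixel variance is within
    var_tolerance of the range block's variance.
    Expensive step: run the RMS comparison on those candidates only.
    """
    target = pvariance(range_block)
    candidates = [
        i for i, block in enumerate(domain_blocks)
        if abs(pvariance(block) - target) <= var_tolerance
    ]
    if not candidates:
        # Assumption: fall back to an exhaustive search if nothing passes.
        candidates = range(len(domain_blocks))
    return min(candidates, key=lambda i: rms_error(range_block, domain_blocks[i]))


def psnr_8bit(mse):
    """Peak signal-to-noise ratio in dB for 8-bit pixels (peak value 255)."""
    return 10.0 * math.log10(255.0 ** 2 / mse)


range_block = [10, 20, 30, 40]
domain_blocks = [[0, 0, 0, 0], [12, 19, 31, 41], [100, 200, 50, 0]]
print(best_match(range_block, domain_blocks, var_tolerance=10.0))
print(psnr_8bit(65.025))
```

In this toy data only the second domain block passes the variance screen, so a single RMS comparison is needed. The PSNR helper shows that the 30 dB figure corresponds to a mean squared error of about 65 on 8-bit pixels.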

ResearchSpace@UKZN

Improving the performance of dataflow systems for deep neural network training

Author: Watcharapichat Pijika
Publication venue: Computing, Imperial College London
Publication date: 01/02/2018

Deep neural networks (DNNs) have led to significant advancements in machine learning. With deep structure and flexible model parameterisation, they exhibit state-of-the-art accuracies for many complex tasks, e.g. image recognition. To achieve this, models are trained iteratively over large datasets. This process involves expensive matrix operations, making it time-consuming to obtain converged models. To accelerate training, dataflow systems parallelise computation. A scalable approach is to use a parameter server framework: it has workers that train model replicas in parallel and parameter servers that synchronise the replicas to ensure convergence. With distributed DNN systems, there are three challenges that determine the training completion time. In this thesis, we propose practical and effective techniques to address each of these challenges. Since frequent model synchronisation results in high network utilisation, the parameter server approach can suffer from network bottlenecks, thus requiring decisions on resource allocation. Our idea is to use all available network bandwidth and synchronise subject to the available bandwidth. We present Ako, a DNN system that uses partial gradient exchange for synchronising replicas in a peer-to-peer fashion. We show that our technique exhibits a 25% lower convergence time than hand-tuned parameter-server deployments. For long training runs, the compute efficiency of worker nodes is important. We argue that processing hardware should be fully utilised for the best speed-up. The key observation is that it is possible to overlap the execution of several matrix operations with other workloads. We describe Crossbow, a GPU-based system that maximises hardware utilisation. By using a multi-streaming scheduler, multiple models are trained in parallel on a GPU and achieve a 2.3x speed-up compared to a state-of-the-art system. The choice of model configuration for replicas also directly determines convergence quality.
Dataflow systems are used for exploring promising configurations but provide little support for efficient exploratory workflows. We present Meta-dataflow (MDF), a dataflow model that expresses complex workflows. By taking into account all configurations as a unified workflow, MDFs efficiently reduce time spent on configuration exploration.

Spiral - Imperial College Digital Repository

Enhancing remanufacturing automation using deep learning approach

Author: Nwankpa Chigozie Enyinna
Publication date: 01/01/2022

In recent years, remanufacturing has attracted significant interest from researchers and practitioners as a way to improve efficiency through maximum value recovery of products at end-of-life (EoL). It is a process of returning used products, known as EoL products, to as-new condition with a warranty matching or exceeding that of new products. However, these remanufacturing processes are complex and time-consuming to implement manually, causing reduced productivity and posing dangers to personnel. These challenges require automating the various remanufacturing process stages to achieve higher throughput and reduced lead time, cost and environmental impact while maximising economic gains. Besides, as highlighted by various research groups, there is currently a shortage of adequate remanufacturing-specific technologies to achieve full automation. This research explores automating remanufacturing processes to improve competitiveness by analysing and developing deep learning-based models for automating different stages of the remanufacturing processes. Analysing deep learning algorithms represents a viable option to investigate and develop technologies with capabilities to overcome the outlined challenges. Deep learning involves using artificial neural networks to learn high-level abstractions in data. Deep learning (DL) models are inspired by human brains and have produced state-of-the-art results in pattern recognition, object detection and other applications. The research further investigates the empirical data of torque converter components recorded from a remanufacturing facility in Glasgow, UK, using in-case and cross-case analysis to evaluate the remanufacturing inspection, sorting, and process control applications. The developed algorithm helped capture, pre-process, train, deploy and evaluate the performance of the respective processes.
The experimental evaluation of the in-case and cross-case analysis using model prediction accuracy, misclassification rate, and model loss highlights that the developed models achieved a high prediction accuracy of above 99.9% across the sorting, inspection and process control applications. Furthermore, a low model loss between 3x10^-3 and 1.3x10^-5 was obtained alongside a misclassification rate between 0.01% and 0.08% across the three applications investigated, thereby highlighting the capability of the developed deep learning algorithms to perform the sorting, process control and inspection in remanufacturing. The results demonstrate the viability of adopting deep learning-based algorithms in automating remanufacturing processes, achieving safer and more efficient remanufacturing. Finally, this research is unique because it is the first to investigate using deep learning and qualitative torque-converter image data for modelling remanufacturing sorting, inspection and process control applications. It also delivers a custom computational model that has the potential to enhance remanufacturing automation when utilised. The findings and publications also benefit both academics and industrial practitioners. Furthermore, the model is easily adaptable to other remanufacturing applications with minor modifications to enhance process efficiency in today's workplaces.

STAX (Strathclyde Repository)

Parallel implementation of fractal image compression

Author: Uys Ryan F.
Publication date: 01/01/2000

Vytautas Magnus University Institutional Repository (VMU ePub)

ResearchSpace@UKZN

Advances in manufacturing technology – XXII

Author: Cheng K, Harrison DJ, Makatsoris H
Publication venue: Brunel University
Publication date: 01/01/2008

Brunel University Research Archive

A multi-level functional IR with rewrites for higher-level synthesis of accelerators

Author: Schlaak Christof
Publication venue: The University of Edinburgh
Publication date: 04/12/2023

Specialised accelerators deliver orders of magnitude higher energy-efficiency than general-purpose processors. Field Programmable Gate Arrays (FPGAs) have become the substrate of choice, because the ever-changing nature of modern workloads, such as machine learning, demands reconfigurability. However, they are notoriously hard to program directly using Hardware Description Languages (HDLs). Traditional High-Level Synthesis (HLS) tools improve productivity, but come with their own problems. They often produce sub-optimal designs and programmers are still required to write hardware-specific code, thus development cycles remain long. This thesis proposes Shir, a higher-level synthesis approach for high-performance accelerator design with a hardware-agnostic programming entry point, a multi-level Intermediate Representation (IR), a compiler and rewrite rules for optimisation. First, a novel, multi-level functional IR structure for accelerator design is described. The IRs operate on different levels of abstraction, cleanly separating different hardware concerns. They enable the expression of different forms of parallelism and standard memory features, such as asynchronous off-chip memories or synchronous on-chip buffers, as well as arbitration of such shared resources. Exposing these features at the IR level is essential for achieving high performance. Next, mechanical lowering procedures are introduced to automatically compile a program specification through Shir’s functional IRs until low-level HDL code for FPGA synthesis is emitted. Each lowering step gradually adds implementation details. Finally, this thesis presents rewrite rules for automatic optimisations around parallelisation, buffering and data reshaping. Reshaping operations pose a challenge to functional approaches in particular. They introduce overheads that compromise performance or even prevent the generation of synthesisable hardware designs altogether. 
This fundamental issue is solved by the application of rewrite rules. The viability of this approach is demonstrated by running matrix multiplication and 2D convolution on an Intel Arria 10 FPGA. A limited design space exploration is conducted, confirming the ability of the IR to exploit various hardware features. Using rewrite rules for optimisation, it is possible to generate high-performance designs that are competitive with highly tuned OpenCL implementations and that outperform hardware-agnostic OpenCL code. The performance impact of the optimisations is further evaluated, showing that they are essential to achieving high performance, and in many cases also necessary to produce hardware that fits the resource constraints.

Edinburgh Research Archive

A kinematic numerical camera model for the SPOT-1 sensor

Author: O'Neill Mark Anthony
Publication venue: UCL (University College London)
Publication date: 01/01/1992

A novel method for modelling linear push-broom sensors has been developed. A numerical model which incorporates the satellite attitude and position data is used to compute the absolute orientation. This method makes a break with traditional photogrammetric practice, in that instead of using an approach based on collinearity equations, the absolute orientation is computed iteratively using a numerical multivariable minimisation scheme. All current implementations of the model use the Powell direction-set method, but in principle, any multivariable minimisation scheme could be substituted. The numerical method has significant advantages over the collinearity approach. The number of ground control points needed to form an accurate model is reduced and the numerical approach offers a superior basis for the development of general-purpose multi-sensor modelling software. In order to test these assertions, a numerical model of the SPOT-1 sensor was coded and tested against a pre-existing collinearity-based model. Exhaustive tests showed the numerical model, using 3 or fewer ground control points, consistently equalled or bettered the performance of the earlier model, using between 6 and 15 ground control points, on the same test data. A general-purpose sensor modelling system was developed using the code developed for the initial SPOT-1 model. Currently this system supports many rigid linear sensor systems including SPOT-1, SPOT-2, FTIR, MISR, MEOSS and ASAS. Further extensions to the system to enable it to model non-rigid linear sensors such as AVHRR and ATM are planned. Work to enable the system to perform relative orientations for a variety of sensor types is also ongoing.
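To illustrate the kind of derivative-free minimisation the abstract refers to, here is a minimal sketch of the basic Powell direction-set procedure with a golden-section line search over a fixed bracket. It is not the thesis's camera model: the objective is a toy two-parameter offset fit, and the three ground control points, the search bracket and all names are hypothetical.

```python
import math


def line_minimise(f, x, d, lo=-10.0, hi=10.0, tol=1e-8):
    """Golden-section search for the step t in [lo, hi] that minimises f(x + t*d)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0

    def along(t):
        return f([xi + t * di for xi, di in zip(x, d)])

    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        e = a + inv_phi * (b - a)
        if along(c) < along(e):
            b = e  # minimum lies in [a, e]
        else:
            a = c  # minimum lies in [c, b]
    t = 0.5 * (a + b)
    return [xi + t * di for xi, di in zip(x, d)]


def powell(f, x0, max_iters=20, tol=1e-7):
    """Basic direction-set minimisation: no derivatives, only line searches."""
    n = len(x0)
    directions = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    x = list(x0)
    for _ in range(max_iters):
        start = x[:]
        for d in directions:
            x = line_minimise(f, x, d)
        # Net displacement of this sweep becomes the newest search direction.
        net = [xi - si for xi, si in zip(x, start)]
        if math.sqrt(sum(v * v for v in net)) < tol:
            break
        directions = directions[1:] + [net]
        x = line_minimise(f, x, net)
    return x


# Hypothetical data: three ground control points and their observed positions.
GROUND = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
OBSERVED = [(2.0, -1.0), (12.0, -1.0), (2.0, 9.0)]


def residual(params):
    """Sum of squared differences between modelled and observed positions."""
    dx, dy = params
    return sum(
        (gx + dx - ox) ** 2 + (gy + dy - oy) ** 2
        for (gx, gy), (ox, oy) in zip(GROUND, OBSERVED)
    )


estimate = powell(residual, [0.0, 0.0])
print(estimate)
```

The objective here is deliberately simple; a real sensor model would replace `residual` with the misfit between projected and measured image coordinates. The fixed bracket assumes the minimum along each direction lies within it, which a production implementation would establish with a bracketing step.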

UCL Discovery