27 research outputs found

    Modelling and Simulation for Power Distribution Grids of 3D Tiled Computing Arrays

    Get PDF
    This thesis presents modelling and simulation developments for power distribution grids of 3D tiled computing arrays (TCAs), a novel type of paradigm for HPC systems, and tests the feasibility of such systems for HPC systems domains. The exploration of a complex power-grid such as those found in the TCA concept requires detailed simulations of systems with hundreds and possibly thousands of modular nodes, each contributing to the collective behaviour of the system. In particular power, voltage, and current behaviours are critically important observations. To facilitate this investigation, and test the hypothesis, which seeks to understand if scalability is feasible for such systems, a bespoke simulation platform has been developed, and (importantly) validated against hardware prototypes of small systems. A number of systems are simulated, including systems consisting of arrays of ā€™ballsā€™. Balls are collections of modular tiles that form a ball-like modular unit, and can then themselves be tiled into large scale systems. Evaluations typically involved simulation of cubic arrays of sizes ranging from 2x2x2 balls up to 10x10x10. Larger systems require extended simulation times. Therefore models are developed to extrapolate system behaviours for higher-orders of systems and to gauge the ultimate scalability of such TCA systems. It is found that systems of 40x40x40 are quite feasible with appropriate configurations. Data connectivity is explored to a lesser degree, but comparisons were made between TCA systems and well known comparable HPC systems, and it is concluded that TCA systems can be built with comparable data-flow and scalability, and that the electrical and engineering challenges associated with the novelty of 3D tiled systems can be met with practical solutions

    Visual inspection : image sampling, algorithms and architectures

    Get PDF
    The thesis concerns the hexagonal sampling of images, the processing of industrially derived images, and the design of a novel processor element that can be assembled into pipelines to effect fast, economic and reliable processing. A hexagonally sampled two dimensional image can require 13.4% fewer sampling points than a square sampled equivalent. The grid symmetry results in simpler processing operators that compute more efficiently than square grid operators. Computation savings approaching 44% arc demonstrated. New hexagonal operators arc reported including a Gaussian smoothing filter, a binary thinner, and an edge detector with comparable accuracy to that of the Sobel detector. The design of hexagonal arrays of sensors is considered. Operators requiring small local areas of support are shown to be sufficient for processing controlled lighting and industrial images. Case studies show that small features in hexagonally processed images maintain their shape better, and that processes can tolerate a lower signal to noise ratio, than that for equivalent square processed images. The modelling of small defects in surfaces has been studied in depth. The flexible programmable processor element can perform the low level local operators required for industrial image processing on both square and hexagonal grids. The element has been specified and simulated by a high level computer program. A fast communication channel allows for dynamic reprogramming by a control computer, and the video rate element can be assembled into various pipeline architectures, that may eventually be adaptively controlled

    SpiNNaker - A Spiking Neural Network Architecture

    Get PDF
    20 years in conception and 15 in construction, the SpiNNaker project has delivered the worldā€™s largest neuromorphic computing platform incorporating over a million ARM mobile phone processors and capable of modelling spiking neural networks of the scale of a mouse brain in biological real time. This machine, hosted at the University of Manchester in the UK, is freely available under the auspices of the EU Flagship Human Brain Project. This book tells the story of the origins of the machine, its development and its deployment, and the immense software development effort that has gone into making it openly available and accessible to researchers and students the world over. It also presents exemplar applications from ā€˜Talkā€™, a SpiNNaker-controlled robotic exhibit at the Manchester Art Gallery as part of ā€˜The Imitation Gameā€™, a set of works commissioned in 2016 in honour of Alan Turing, through to a way to solve hard computing problems using stochastic neural networks. The book concludes with a look to the future, and the SpiNNaker-2 machine which is yet to come

    SpiNNaker - A Spiking Neural Network Architecture

    Get PDF
    20 years in conception and 15 in construction, the SpiNNaker project has delivered the worldā€™s largest neuromorphic computing platform incorporating over a million ARM mobile phone processors and capable of modelling spiking neural networks of the scale of a mouse brain in biological real time. This machine, hosted at the University of Manchester in the UK, is freely available under the auspices of the EU Flagship Human Brain Project. This book tells the story of the origins of the machine, its development and its deployment, and the immense software development effort that has gone into making it openly available and accessible to researchers and students the world over. It also presents exemplar applications from ā€˜Talkā€™, a SpiNNaker-controlled robotic exhibit at the Manchester Art Gallery as part of ā€˜The Imitation Gameā€™, a set of works commissioned in 2016 in honour of Alan Turing, through to a way to solve hard computing problems using stochastic neural networks. The book concludes with a look to the future, and the SpiNNaker-2 machine which is yet to come

    The Fifth NASA Symposium on VLSI Design

    Get PDF
    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    Improving Model-Based Software Synthesis: A Focus on Mathematical Structures

    Get PDF
    Computer hardware keeps increasing in complexity. Software design needs to keep up with this. The right models and abstractions empower developers to leverage the novelties of modern hardware. This thesis deals primarily with Models of Computation, as a basis for software design, in a family of methods called software synthesis. We focus on Kahn Process Networks and dataļ¬‚ow applications as abstractions, both for programming and for deriving an eļ¬ƒcient execution on heterogeneous multicores. The latter we accomplish by exploring the design space of possible mappings of computation and data to hardware resources. Mapping algorithms are not at the center of this thesis, however. Instead, we examine the mathematical structure of the mapping space, leveraging its inherent symmetries or geometric properties to improve mapping methods in general. This thesis thoroughly explores the process of model-based design, aiming to go beyond the more established software synthesis on dataļ¬‚ow applications. We starting with the problem of assessing these methods through benchmarking, and go on to formally examine the general goals of benchmarks. In this context, we also consider the role modern machine learning methods play in benchmarking. We explore different established semantics, stretching the limits of Kahn Process Networks. We also discuss novel models, like Reactors, which are designed to be a deterministic, adaptive model with time as a ļ¬rst-class citizen. By investigating abstractions and transformations in the Ohua language for implicit dataļ¬‚ow programming, we also focus on programmability. The focus of the thesis is in the models and methods, but we evaluate them in diverse use-cases, generally centered around Cyber-Physical Systems. These include the 5G telecommunication standard, automotive and signal processing domains. We even go beyond embedded systems and discuss use-cases in GPU programming and microservice-based architectures

    Enhancing productivity and performance portability of opencl applications on heterogeneous systems using runtime optimizations

    Get PDF
    Initially driven by a strong need for increased computational performance in science and engineering, heterogeneous systems have become ubiquitous and they are getting increasingly complex. The single processor era has been replaced with multi-core processors, which have quickly been surrounded by satellite devices aiming to increase the throughput of the entire system. These auxiliary devices, such as Graphics Processing Units, Field Programmable Gate Arrays or other specialized processors have very different architectures. This puts an enormous strain on programming models and software developers to take full advantage of the computing power at hand. Because of this diversity and the unachievable flexibility and portability necessary to optimize for each target individually, heterogeneous systems remain typically vastly under-utilized. In this thesis, we explore two distinct ways to tackle this problem. Providing automated, non intrusive methods in the form of compiler tools and implementing efficient abstractions to automatically tune parameters for a restricted domain are two complementary approaches investigated to better utilize compute resources in heterogeneous systems. First, we explore a fully automated compiler based approach, where a runtime system analyzes the computation flow of an OpenCL application and optimizes it across multiple compute kernels. This method can be deployed on any existing application transparently and replaces significant software engineering effort spent to tune application for a particular system. We show that this technique achieves speedups of up to 3x over unoptimized code and an average of 1.4x over manually optimized code for highly dynamic applications. Second, a library based approach is designed to provide a high level abstraction for complex problems in a specific domain, stencil computation. Using domain specific techniques, the underlying framework optimizes the code aggressively. We show that even in a restricted domain, automatic tuning mechanisms and robust architectural abstraction are necessary to improve performance. Using the abstraction layer, we demonstrate strong scaling of various applications to multiple GPUs with a speedup of up to 1.9x on two GPUs and 3.6x on four

    A Hexagonal Shaped Processor and Interconnect Topology for Tightly-tiled Many-Core Architecture

    No full text
    Abstract ā€” 2-Dimensional meshes are the most commonly used Network-on-Chip (NoC) topology for on-chip communication in many-core processor arrays due to their low complexity and excellent match to rectangular processor tiles. However, 2D meshes may incur local traffic congestion for applications with significant levels of traffic with non-neighboring cores, resulting in long latencies and high power consumption. In this paper, we propose an 8-neighbor mesh topology and a 6-neighbor topology with hexagonal-shaped processor tiles. A 16-bit DSP processor and the corresponding processor arrays are implemented in all three topologies. The hexagonal processor tile and arrays of tiles are laid out using industry-standard CAD tools and automatic place and route flow without full-custom design, and result in DRC-clean and LVS-clean layout. A 1080p H.264/AVC residual video encoder and a 54 Mbps 802.11a/11g OFDM wireless LAN baseband receiver are mapped onto all topologies. The 6-neighbor hexagonal grid topology incurs a 2.9 % area increase per tile compared to the 4-neighbor 2D mesh, but its much more effective inter-processor interconnect yields an average total application area reduction of 21%, an average power reduction of 17%, and a total application inter-processor communication distance reduction of 19%. I

    DIVE on the internet

    Get PDF
    This dissertation reports research and development of a platform for Collaborative Virtual Environments (CVEs). It has particularly focused on two major challenges: supporting the rapid development of scalable applications and easing their deployment on the Internet. This work employs a research method based on prototyping and refinement and promotes the use of this method for application development. A number of the solutions herein are in line with other CVE systems. One of the strengths of this work consists in a global approach to the issues raised by CVEs and the recognition that such complex problems are best tackled using a multi-disciplinary approach that understands both user and system requirements. CVE application deployment is aided by an overlay network that is able to complement any IP multicast infrastructure in place. Apart from complementing a weakly deployed worldwide multicast, this infrastructure provides for a certain degree of introspection, remote controlling and visualisation. As such, it forms an important aid in assessing the scalability of running applications. This scalability is further facilitated by specialised object distribution algorithms and an open framework for the implementation of novel partitioning techniques. CVE application development is eased by a scripting language, which enables rapid development and favours experimentation. This scripting language interfaces many aspects of the system and enables the prototyping of distribution-related components as well as user interfaces. It is the key construct of a distributed environment to which components, written in different languages, connect and onto which they operate in a network abstracted manner. The solutions proposed are exemplified and strengthened by three collaborative applications. The Dive room system is a virtual environment modelled after the room metaphor and supporting asynchronous and synchronous cooperative work. WebPath is a companion application to a Web browser that seeks to make the current history of page visits more visible and usable. Finally, the London travel demonstrator supports travellers by providing an environment where they can explore the city, utilise group collaboration facilities, rehearse particular journeys and access tourist information data
    corecore