116 research outputs found

    Efficient implementation of the Shack-Hartmann centroid extraction for edge computing

    Adaptive optics (AO) is an established technique to measure and compensate for optical aberrations. One of its key components is the wavefront sensor (WFS), which is typically a Shack-Hartmann (SH) sensor capturing an image related to the aberrated wavefront. We propose an efficient implementation of the SH-WFS centroid extraction algorithm, tailored for edge computing. In the edge-computing paradigm, the data are processed close to the source (i.e., at the edge) on low-power embedded architectures, in which CPU computing elements are combined with heterogeneous accelerators (e.g., GPUs, field-programmable gate arrays). Since the control loop latency must be minimized to compensate for the temporal dynamics of the wavefront aberrations, we propose an optimized algorithm that takes advantage of the unified CPU/GPU memory of recent low-power embedded architectures. Experimental results show that the centroid extraction latency obtained over spot images up to 700 x 700 pixels wide is below 2 ms. Therefore, our approach meets the temporal requirements of small- to medium-sized AO systems, which are equipped with deformable mirrors having tens of actuators. (C) 2020 Optical Society of America
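    The centroid extraction described above is, at its core, a per-subaperture centre-of-gravity computation over the spot image. The sketch below is a plain NumPy reference of that computation, under illustrative assumptions (square subaperture grid, simple intensity threshold, invented function names); the paper's contribution, the low-latency GPU implementation exploiting unified CPU/GPU memory, is not reproduced here.

```python
import numpy as np

def extract_centroids(spot_image, n_sub, threshold=0.0):
    """Centre-of-gravity centroid per Shack-Hartmann subaperture.

    spot_image : 2-D array of pixel intensities (e.g., up to 700 x 700).
    n_sub      : number of subapertures per side (assumes a square grid
                 that divides the image evenly).
    threshold  : intensities below this value are ignored.
    """
    h, w = spot_image.shape
    sh, sw = h // n_sub, w // n_sub          # subaperture size in pixels
    ys, xs = np.mgrid[0:sh, 0:sw]            # local pixel coordinates
    centroids = np.zeros((n_sub, n_sub, 2))

    for i in range(n_sub):
        for j in range(n_sub):
            sub = spot_image[i * sh:(i + 1) * sh, j * sw:(j + 1) * sw]
            sub = np.where(sub > threshold, sub - threshold, 0.0)
            total = sub.sum()
            if total > 0:
                # Weighted mean of pixel coordinates = spot centroid
                centroids[i, j, 0] = (sub * xs).sum() / total
                centroids[i, j, 1] = (sub * ys).sum() / total
    return centroids

# Example: random spot image, 10 x 10 subaperture grid
img = np.random.rand(700, 700).astype(np.float32)
print(extract_centroids(img, n_sub=10).shape)   # (10, 10, 2)
```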

    Applying Artificial Intelligence Planning to Optimise Heterogeneous Signal Processing for Surface and Dimensional Measurement Systems

    The need for in-process measurement has surpassed the processing capability of traditional computer hardware. As Industry 4.0 changes the way modern manufacturing occurs, researchers and industry are turning to hardware acceleration to increase the performance of their signal processing and allow real-time process and quality control. This thesis reviewed Industry 4.0 and the challenges that have arisen from transitioning towards a connected smart factory. It investigated the different hardware acceleration techniques available and the bespoke nature of the software that industry and researchers are being forced towards in the pursuit of greater performance. In addition, the application of hardware acceleration within surface and dimensional instrument signal processing was researched, along with the extent to which it benefits researchers. The collection of algorithms the field uses was examined, finding significant commonality across multiple instrument types, with work being repeated many times over by different people. The first use of PDDL to optimise heterogeneous signal processing within surface and dimensional measurements is proposed. Optical Signal Processing Workspace (OSPW) is presented as a self-optimising software package using GPGPU acceleration via the Compute Unified Device Architecture (CUDA) for Nvidia GPUs. OSPW was designed from scratch to be easy to use with little to no programming experience, unlike other popular systems such as LabVIEW and MATLAB. It provides an intuitive and easy-to-navigate User Interface (UI) that allows a user to select the signal processing algorithms required, display system outputs, control actuation devices, and modify capture device properties. OSPW automatically profiles the execution time of the signal processing algorithms selected by the user and creates and executes a fully optimised version using an AI planning language, the Planning Domain Definition Language (PDDL), by selecting the optimum architecture for each signal processing function. OSPW was then evaluated against two case studies, Dispersed Reference Interferometry (DRI) and Line-Scanning Dispersed Interferometry (LSDI). These case studies demonstrated that OSPW can achieve at least 21x greater performance than an identical MATLAB implementation, with a further 13% improvement found using PDDL's heterogeneous solution. This novel approach to providing a configurable, self-optimising signal processing library driven by AI planning will provide considerable performance gains to researchers and industrial engineers. With some additional development work it will save both academia and industry time and money, which can be reinvested to further advance surface and dimensional instrumentation research.
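    To make the profile-then-plan idea concrete, the following Python sketch mimics the loop described above: each signal processing stage is timed on every available backend and the fastest one is selected. The stage names, backends, and greedy selection are illustrative assumptions; OSPW itself generates a PDDL domain and problem and lets a planner produce the heterogeneous schedule, which can also account for costs this toy version ignores (e.g., host/device transfers).

```python
import time

def profile(stage_fn, data, repeats=5):
    """Return the best observed wall-clock time for one stage."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        stage_fn(data)
        best = min(best, time.perf_counter() - t0)
    return best

def plan(stages, backends, data):
    """Pick the fastest backend for every stage: a greedy stand-in for
    the PDDL plan described in the thesis, ignoring transfer costs."""
    schedule = {}
    for name, impls in stages.items():
        timings = {b: profile(impls[b], data) for b in backends if b in impls}
        schedule[name] = min(timings, key=timings.get)
    return schedule

# Toy stages; CPU-only lambdas stand in for real CPU/GPU implementations
stages = {
    "fft":      {"cpu": lambda d: sum(d), "gpu": lambda d: sum(d)},
    "envelope": {"cpu": lambda d: max(d)},
}
print(plan(stages, ["cpu", "gpu"], list(range(10000))))
```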

    Doctor of Philosophy

    Deep Neural Networks (DNNs) are the state-of-the-art solution in a growing number of tasks, including computer vision, speech recognition, and genomics. However, DNNs are computationally expensive, as they are carefully trained to extract and abstract features from raw data using multiple layers of neurons with millions of parameters. In this dissertation, we primarily focus on inference, e.g., using a DNN to classify an input image. This is an operation that will be repeatedly performed on billions of devices in the datacenter, in self-driving cars, in drones, etc. We observe that DNNs spend the vast majority of their runtime performing matrix-by-vector multiplications (MVMs). MVMs have two major bottlenecks: fetching the matrix and performing sum-of-product operations. To address these bottlenecks, we use in-situ computing, where the matrix is stored in programmable resistor arrays, called crossbars, and sum-of-product operations are performed using analog computing. In this dissertation, we propose two hardware units, ISAAC and Newton. In ISAAC, we show that in-situ computing designs can outperform digital DNN accelerators if they leverage pipelining, smart encodings, and can distribute a computation in time and space, within crossbars and across crossbars. In the ISAAC design, roughly half the chip area/power can be attributed to the analog-to-digital conversion (ADC), i.e., it remains the key design challenge in mixed-signal accelerators for deep networks. In spite of the ADC bottleneck, ISAAC is able to outperform the computational efficiency of the state-of-the-art design (DaDianNao) by 8x. In Newton, we take advantage of a number of techniques to address ADC inefficiency. These techniques exploit matrix transformations, heterogeneity, and smart mapping of computation to the analog substrate. We show that Newton can increase the efficiency of in-situ computing by an additional 2x. Finally, we show that in-situ computing, unfortunately, cannot be easily adapted to handle training of deep networks, i.e., it is only suitable for inference of already-trained networks. By improving the efficiency of DNN inference with ISAAC and Newton, we move closer to low-cost deep learning that in turn will have societal impact through self-driving cars, assistive systems for the disabled, and precision medicine.
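    The sum-of-products bottleneck and the ADC overhead discussed above can be illustrated with a small functional model of a single crossbar. This is a sketch under idealised assumptions (no device noise, uniform ADC, arbitrary array size and resolution); it does not follow the ISAAC or Newton parameters, but it shows where the analog MVM and the quantisation step sit.

```python
import numpy as np

def crossbar_mvm(weights, x, adc_bits=8):
    """Idealised in-situ matrix-vector multiply: weights are stored as
    conductances G, the input vector is applied as voltages, and each
    bit-line current I_j = sum_i G_ij * x_i realises the sum-of-products
    in the analog domain; a coarse ADC then digitises each column."""
    g_max = np.abs(weights).max() or 1.0
    G = weights / g_max                      # map weights to conductances
    currents = G.T @ x                       # Kirchhoff current summation
    # ADC: uniform quantisation of the analog column currents
    lo, hi = currents.min(), currents.max()
    levels = 2 ** adc_bits - 1
    step = (hi - lo) / levels or 1.0
    quantised = np.round((currents - lo) / step) * step + lo
    return quantised * g_max                 # undo the conductance scaling

W = np.random.randn(128, 64)                 # one crossbar's worth of weights
x = np.random.randn(128)
approx = crossbar_mvm(W, x)
exact = W.T @ x
print("max ADC error:", np.abs(approx - exact).max())
```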

    Fast algorithm for real-time rings reconstruction

    The GAP project is dedicated to studying the application of GPUs in several contexts in which real-time response is important for decision making. The definition of real-time depends on the application under study, ranging from response times of a few µs up to several hours in the case of very computing-intensive tasks. During this conference we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high-energy physics experiments, and specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm developed for trigger applications, to accelerate the ring reconstruction in RICH detectors when it is not possible to have seeds for the reconstruction from external trackers.
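    For readers unfamiliar with seedless ring finding, the sketch below shows one generic way to recover a ring's centre and radius directly from hit coordinates with an algebraic (Kasa) least-squares fit. This is not the GAP trigger algorithm described above; it is an illustrative stand-in whose arithmetic (one small linear solve per ring candidate) is the kind of workload that maps naturally onto GPUs.

```python
import numpy as np

def fit_ring(x, y):
    """Algebraic (Kasa) circle fit from hit coordinates.

    Solves  a*x + b*y + c = -(x^2 + y^2)  in the least-squares sense,
    then centre = (-a/2, -b/2) and radius = sqrt(a^2/4 + b^2/4 - c).
    """
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x ** 2 + y ** 2)
    (a, bb, c), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -a / 2.0, -bb / 2.0
    r = np.sqrt(cx ** 2 + cy ** 2 - c)
    return cx, cy, r

# Synthetic ring: 20 noisy hits on a circle of radius 11 centred at (3, -2)
theta = np.random.uniform(0, 2 * np.pi, 20)
x = 3.0 + 11.0 * np.cos(theta) + np.random.normal(0, 0.1, 20)
y = -2.0 + 11.0 * np.sin(theta) + np.random.normal(0, 0.1, 20)
print(fit_ring(x, y))   # approximately (3.0, -2.0, 11.0)
```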

    Implementation, modeling, and exploration of precision visual servo systems


    Applications and Techniques for Fast Machine Learning in Science

    In this community review report, we discuss applications and techniques for fast machine learning (ML) in science: the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.