215 research outputs found

    Are We There Yet? Product Quantization and its Hardware Acceleration

    Full text link
    Conventional multiply-accumulate (MAC) operations have long dominated computation time for deep neural networks (DNNs). Recently, product quantization (PQ) has been successfully applied to these workloads, replacing MACs with memory lookups into tables of pre-computed dot products. While this property makes PQ an attractive route to model acceleration, little is understood about the associated trade-offs in compute and memory footprint, and about the impact on accuracy. Our empirical study investigates the impact of different PQ settings and training methods on layerwise reconstruction error and end-to-end model accuracy. When studying the efficiency of deploying PQ DNNs, we find that metrics such as FLOPs, parameter counts, and even CPU/GPU performance can be misleading. To address this issue, and to assess PQ more fairly in terms of hardware efficiency, we design the first custom hardware accelerator to evaluate the speed and efficiency of running PQ models. We identify PQ configurations that improve performance-per-area for ResNet20 by 40%-104%, even when compared to a highly optimized conventional DNN accelerator, and our hardware outperforms recent PQ solutions by 4x with only a 0.6% accuracy degradation. This work demonstrates the practical and hardware-aware design of PQ models, paving the way for wider adoption of this emerging DNN approximation methodology.
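
    The abstract above does not spell out the lookup formulation, so the following sketch is only a minimal illustration of the general PQ idea (not the authors' accelerator design): each input sub-vector is encoded as the index of its nearest codeword, and the dot product is approximated by summing pre-computed partial products read from a table. All sizes, names, and the random codebooks are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

D, M, K = 32, 8, 16          # vector length, number of sub-vectors, codewords per sub-space
d = D // M                   # dimension of each sub-vector

w = rng.standard_normal(D)                    # a fixed weight vector (e.g. one neuron)
codebooks = rng.standard_normal((M, K, d))    # toy per-subspace codebooks

# Pre-compute the lookup table: partial dot products of every codeword with w.
tables = np.einsum('mkd,md->mk', codebooks, w.reshape(M, d))

def pq_dot(x):
    """Approximate w.x with M table lookups instead of D multiply-accumulates."""
    acc = 0.0
    for m in range(M):
        sub = x[m * d:(m + 1) * d]
        # Encode: index of the nearest codeword in this sub-space.
        idx = np.argmin(np.linalg.norm(codebooks[m] - sub, axis=1))
        acc += tables[m, idx]                 # one lookup replaces a length-d MAC
    return acc

x = rng.standard_normal(D)
print("exact :", w @ x)
print("approx:", pq_dot(x))
```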

    Fast algorithm for real-time rings reconstruction

    Get PDF
    The GAP project is dedicated to studying the application of GPUs in several contexts in which a real-time response is needed to take decisions. The definition of real time depends on the application under study, ranging from response times of a few μs up to several hours for very compute-intensive tasks. During this conference we presented our work on low-level triggers [1] [2] and high-level triggers [3] in high-energy physics experiments, and on specific applications for nuclear magnetic resonance (NMR) [4] [5] and cone-beam CT [6]. Apart from the study of dedicated solutions to decrease the latency due to data transport and preparation, the computing algorithms play an essential role in any GPU application. In this contribution, we show an original algorithm, developed for trigger applications, that accelerates ring reconstruction in RICH detectors when no seeds for the reconstruction are available from external trackers.
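
    The original ring-reconstruction algorithm is not described in this summary. As a generic point of reference for the problem (fitting a ring to detector hits without external seeds), the sketch below uses a standard algebraic least-squares circle fit (the Kasa method); it is not the GAP algorithm, and all data are synthetic.

```python
import numpy as np

def kasa_circle_fit(x, y):
    """Algebraic least-squares circle fit (Kasa method).

    Solves x^2 + y^2 + D*x + E*y + F = 0 for (D, E, F) and returns
    the circle centre and radius."""
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x**2 + y**2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    r = np.sqrt(cx**2 + cy**2 - F)
    return cx, cy, r

# Synthetic ring: hits scattered around a circle of radius 11 centred at (3, -2).
rng = np.random.default_rng(1)
phi = rng.uniform(0, 2 * np.pi, 64)
x = 3.0 + 11.0 * np.cos(phi) + rng.normal(0, 0.1, phi.size)
y = -2.0 + 11.0 * np.sin(phi) + rng.normal(0, 0.1, phi.size)
print(kasa_circle_fit(x, y))   # expected: roughly (3, -2, 11)
```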

    An efficient software tool to segment slice and view electron tomograms

    Get PDF
    MSc dissertation in Computer Science. Segmentation is a key method to extract useful information in electron tomography. Manual segmentation is the most commonly used method, but it is slow and subject to user bias. The lack of adequate automated processes, due to the high complexity and the low signal-to-noise ratio of these tomograms, provided the main challenges for this dissertation: to develop a software tool to efficiently handle electron tomograms, including a novel 3D segmentation algorithm. Tomograms can be seen as a stack of 2D images, and operations on tomograms usually lead to computationally intense tasks. This is due to the large amount of data involved and to strided and random memory access patterns. These characteristics represent serious problems on modern computing systems, which rely on complex memory hierarchies to hide memory access latency. A software tool with a user-friendly interface, TomSeg, was designed, implemented and tested with experimental datasets built from sequences of Scanning Electron Microscopy images obtained with a Slice and View technique. This tool lets users align, crop, segment and export electron tomograms using computationally efficient processes. TomSeg takes advantage of the most common architectures of modern compute servers, namely those based on multicore and many-core CPU devices, exploiting vector and parallel programming techniques; it also uses available GPU devices to speed up critical code functions. Validation and performance results on a compute server are presented, together with the performance improvements obtained during the implementation and test phases. TomSeg is an open-source tool for Unix and Windows that can be easily extended with new algorithms to efficiently handle generic tomograms.
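
    The abstract does not detail TomSeg's 3D segmentation algorithm. Purely as an illustration of treating a tomogram as a stack of 2D slices, the hedged sketch below thresholds each slice and keeps its largest connected component using scipy.ndimage; the function name and parameters are invented for this example and do not reflect TomSeg's implementation.

```python
import numpy as np
from scipy import ndimage

def segment_stack(volume, threshold):
    """Toy slice-wise segmentation of a tomogram given as a (slices, H, W) stack:
    threshold each 2D slice, then keep only its largest connected component."""
    mask = np.zeros(volume.shape, dtype=bool)
    for z, img in enumerate(volume):
        binary = img > threshold
        labels, n = ndimage.label(binary)
        if n == 0:
            continue
        sizes = np.bincount(labels.ravel())[1:]       # ignore background label 0
        mask[z] = labels == (np.argmax(sizes) + 1)
    return mask

# Synthetic example: a bright sphere embedded in noise.
z, y, x = np.mgrid[:32, :64, :64]
vol = ((x - 32)**2 + (y - 32)**2 + (z - 16)**2 < 12**2).astype(float)
vol += np.random.default_rng(2).normal(0, 0.2, vol.shape)
seg = segment_stack(vol, threshold=0.5)
print(seg.sum(), "voxels segmented")
```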

    Reducing adaptive optics latency using many-core processors

    Get PDF
    Atmospheric turbulence reduces the achievable resolution of ground-based optical telescopes. Adaptive optics systems attempt to mitigate the impact of this turbulence and are required to update their corrections quickly and deterministically (i.e. in real time). The technological challenges faced by the future extremely large telescopes (ELTs) and their associated instruments are considerable, and a simple extrapolation of current systems to the ELT scale is not sufficient. My thesis work consisted of identifying and examining new many-core technologies for accelerating the adaptive optics real-time control loop. I investigated the Mellanox TILE-Gx36 and the Intel Xeon Phi (5110P). The TILE-Gx36, with 4x10 GbE ports and 36 processing cores, is a good candidate for fast computation of the wavefront sensor images. The Intel Xeon Phi, with 60 processing cores and high memory bandwidth, is particularly well suited to accelerating the wavefront reconstruction. Through extensive testing I have shown that the TILE-Gx can provide the performance required for the wavefront processing units of the ELT first-light instruments. The Intel Xeon Phi (Knights Corner), while providing good overall performance, does not have the required determinism. We believe that the next generation of Xeon Phi (Knights Landing) will provide the necessary determinism and increased performance. In this thesis, we show that by using currently available novel many-core processors it is possible to reach the performance required for ELT instruments.
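
    A common formulation of the reconstruction step in an adaptive optics real-time control loop is a matrix-vector multiply of the measured wavefront-sensor slopes with a control matrix; whether this matches the exact pipeline studied in the thesis is an assumption on our part. The sketch below only illustrates why this step is latency-critical, using invented problem sizes.

```python
import time
import numpy as np

# Illustrative sizes only: n_slopes wavefront-sensor slope measurements,
# n_act deformable-mirror actuators.
n_slopes, n_act = 9232, 4656
rng = np.random.default_rng(3)

control_matrix = rng.standard_normal((n_act, n_slopes)).astype(np.float32)
slopes = rng.standard_normal(n_slopes).astype(np.float32)

# One iteration of the reconstruction step: commands = R @ slopes.
# The whole loop (pixel processing + this MVM + command output) must finish
# within the frame time, typically well under a millisecond.
t0 = time.perf_counter()
commands = control_matrix @ slopes
dt = time.perf_counter() - t0
print(f"reconstruction MVM: {dt * 1e3:.2f} ms for {n_act} actuators")
```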

    Heterogeneous Multi-core Architectures for High Performance Computing

    Get PDF
    This thesis deals with heterogeneous architectures in standard workstations. Heterogeneous architectures represent an appealing alternative to traditional supercomputers because they are based on commodity components fabricated in large quantities; hence their price-performance ratio is unparalleled in the world of high performance computing (HPC). In particular, different aspects related to the performance and power consumption of heterogeneous architectures have been explored. The thesis initially focuses on an efficient implementation of a parallel application whose execution time is dominated by a high number of floating point instructions. It then addresses the central problem of efficient management of power peaks in heterogeneous computing systems. Finally, it discusses a memory-bound problem, where the execution time is dominated by memory latency. Specifically, the following main contributions have been carried out. First, a novel framework for the design and analysis of solar fields for Central Receiver Systems (CRS) has been developed. The implementation, based on a desktop workstation equipped with multiple Graphics Processing Units (GPUs), is motivated by the need for an accurate and fast simulation environment for studying mirror imperfections and non-planar geometries. Second, a power-aware scheduling algorithm for heterogeneous CPU-GPU architectures, based on an efficient distribution of the computing workload to the resources, has been realized. The scheduler manages the resources of several computing nodes with a view to reducing the peak power. This contribution has two main benefits: the approach reduces the supply cost due to high peak power while having negligible impact on the parallelism of the computational nodes, and the developed model allows designers to increase the number of cores without increasing the capacity of the power supply unit. Finally, an implementation for efficient graph exploration on reconfigurable architectures is presented. The purpose is to accelerate graph exploration by reducing the number of random memory accesses.
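
    The power-aware scheduler itself is only summarized above. As a hedged sketch of the underlying idea (placing work so that concurrently running tasks never exceed a peak-power budget), the code below implements a simple first-fit-decreasing assignment of tasks to time slots under a power cap; the task names and power figures are invented and the scheme is not claimed to be the one developed in the thesis.

```python
def schedule_under_power_cap(tasks, power_cap):
    """Greedy list scheduling: place each task in the earliest time slot whose
    total estimated power draw stays under the cap.

    tasks: list of (name, estimated_power_watts); returns a list of slots."""
    slots = []                              # each slot: {"tasks": [...], "power": float}
    for name, power in sorted(tasks, key=lambda t: -t[1]):
        for slot in slots:
            if slot["power"] + power <= power_cap:
                slot["tasks"].append(name)
                slot["power"] += power
                break
        else:                               # no existing slot fits: open a new one
            slots.append({"tasks": [name], "power": power})
    return slots

# Illustrative workload mixing CPU and GPU kernels with estimated power draws.
work = [("gpu_kernel_a", 180.0), ("gpu_kernel_b", 150.0),
        ("cpu_solver", 90.0), ("io_stage", 40.0), ("postproc", 60.0)]
for i, slot in enumerate(schedule_under_power_cap(work, power_cap=250.0)):
    print(f"slot {i}: {slot['tasks']} ({slot['power']:.0f} W)")
```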

    Adaptive Optics Progress

    Get PDF
    For over four decades there has been continuous progress in adaptive optics technology, theory, and systems development. Recently there has also been an explosion of applications of adaptive optics throughout the fields of communications and medicine, in addition to its original uses in astronomy and beam propagation. This volume is a compilation of research and tutorials from a variety of international authors with expertise in theory, engineering, and technology. Its eight chapters discuss retinal imaging, solar astronomy, wavefront-sensorless adaptive optics systems, liquid crystal wavefront correctors, membrane deformable mirrors, digital adaptive optics, optical vortices, and coupled anisoplanatism.

    Hardware Considerations for Signal Processing Systems: A Step Toward the Unconventional.

    Full text link
    As we progress into the future, signal processing algorithms are becoming more computationally intensive and power hungry, while the desire for mobile products and low power devices is also increasing. An integrated ASIC solution is one of the primary ways chip developers can improve performance and add functionality while keeping the power budget low. This work discusses ASIC hardware for both conventional and unconventional signal processing systems, and how integration, error resilience, emerging devices, and new algorithms can be leveraged by signal processing systems to further improve performance and enable new applications. Specifically, this work presents three case studies: 1) a conventional and highly parallel mixed-signal cross-correlator ASIC for a weather satellite performing real-time synthetic aperture imaging, 2) an unconventional native stochastic computing architecture enabled by memristors, and 3) two unconventional sparse neural network ASICs for feature extraction and object classification. As improvements from technology scaling alone slow down, and the demand for energy efficient mobile electronics increases, such optimization techniques at the device, circuit, and system level will become more critical to advancing signal processing capabilities in the future.
    PhD thesis, Electrical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies.
    http://deepblue.lib.umich.edu/bitstream/2027.42/116685/1/knagphil_1.pd
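
    The memristor-based architecture is not detailed in the abstract, but it builds on stochastic computing, where a value is encoded as the ones-density of a random bitstream and a single AND gate multiplies two unipolar values. The sketch below shows that canonical construction in software; the stream length and input values are illustrative.

```python
import numpy as np

def to_bitstream(p, length, rng):
    """Unipolar stochastic encoding: each bit is 1 with probability p."""
    return rng.random(length) < p

def from_bitstream(bits):
    """Decode a unipolar bitstream back to a value in [0, 1]."""
    return bits.mean()

rng = np.random.default_rng(4)
N = 1 << 16                     # longer streams -> lower variance, more time/energy

a, b = 0.8, 0.3
sa, sb = to_bitstream(a, N, rng), to_bitstream(b, N, rng)

# In unipolar stochastic computing a single AND gate multiplies the two values.
product = from_bitstream(sa & sb)
print(f"stochastic: {product:.4f}   exact: {a * b:.4f}")
```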

    Computational methods and software for the design of inertial microfluidic flow sculpting devices

    Get PDF
    The ability to sculpt inertially flowing fluid via bluff-body obstacles has enormous promise for applications in bioengineering, chemistry, and manufacturing within microfluidic devices. However, the computational difficulty inherent to full-scale 3-dimensional fluid flow simulations makes designing and optimizing such systems tedious, costly, and generally tasked to computational experts with access to high performance resources. The goal of this work is to construct efficient models for the design of inertial microfluidic flow sculpting devices, and to implement these models in freely available, user-friendly software for the broader microfluidics community. Two software packages were developed to accomplish this: uFlow and FlowSculpt. uFlow solves the forward problem in flow sculpting, that of predicting the net deformation from an arbitrary sequence of obstacles (pillars), and includes estimates of transverse mass diffusion and of particles formed by optical lithography. FlowSculpt solves the more difficult inverse problem in flow sculpting, which is to design a flow sculpting device that produces a target flow shape. Each piece of software uses efficient, experimentally validated forward models developed within this work, which are also combined with deep learning techniques to explore other routes to solving the inverse problem. The models are highly modular, capable of incorporating new microfluidic components and flow physics into the design process. It is anticipated that the microfluidics community will integrate the tools developed here into their own research, bringing new designs, components, and applications to the inertial flow sculpting platform.
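
    uFlow's forward model is described above only at a high level. As a loose illustration of the idea that the net deformation from a pillar sequence can be computed by composing per-pillar transformations of the cross-stream coordinates, the sketch below composes toy 1D displacement maps; the displacement function is invented for this example and is not the validated advection map used in uFlow.

```python
import numpy as np

def pillar_map(y, offset, strength=0.15, width=0.2):
    """Toy cross-stream displacement caused by one pillar at lateral position
    `offset` (channel coordinates in [-1, 1]); purely illustrative physics."""
    return np.clip(y + strength * np.exp(-((y - offset) / width) ** 2), -1.0, 1.0)

def sculpt(y0, pillar_offsets):
    """Forward model as a composition: apply the per-pillar maps in sequence."""
    y = y0.copy()
    for offset in pillar_offsets:
        y = pillar_map(y, offset)
    return y

markers = np.linspace(-1.0, 1.0, 11)          # fluid-stream markers across the channel
sequence = [-0.4, 0.1, 0.1, 0.5]              # an arbitrary pillar sequence
print(np.round(sculpt(markers, sequence), 3))
```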

    Research and Technology Report. Goddard Space Flight Center

    Get PDF
    This issue of Goddard Space Flight Center's annual report highlights the importance of mission operations and data systems covering mission planning and operations; TDRSS, positioning systems, and orbit determination; ground system and networks, hardware and software; data processing and analysis; and World Wide Web use. The report also includes flight projects, space sciences, Earth system science, and engineering and materials