
    HPC Accelerators with 3D Memory

    Invited paper, published in the conference proceedings by IEEE Society Press, pages 320 to 328. ISBN: 978-1-5090-3593-9. DOI: 10.1109/CSE-EUC-DCABES-2016.203. After a decade evolving in the High Performance Computing arena, GPU-equipped supercomputers have conquered the top500 and green500 lists, providing us with unprecedented levels of computational power and memory bandwidth. This year, major vendors have introduced new accelerators based on 3D memory, like Xeon Phi Knights Landing by Intel and Pascal architecture by Nvidia. This paper reviews hardware features of those new HPC accelerators and unveils potential performance for scientific applications, with an emphasis on Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM) used by commercial products according to roadmaps already announced. Universidad de Málaga. Campus de Excelencia Internacional Andalucia Tec

    MaskPlace: Fast Chip Placement via Reinforced Visual Representation Learning

    Placement is an essential task in modern chip design, aiming at placing millions of circuit modules on a 2D chip canvas. Unlike the human-centric solution, which requires months of intense effort by hardware engineers to produce a layout that minimizes delay and energy consumption, deep reinforcement learning has become an emerging autonomous tool. However, the learning-centric method is still in its early stage, impeded by a massive design space of size ten to the order of a few thousand. This work presents MaskPlace to automatically generate a valid chip layout design within a few hours, whose performance can be superior or comparable to that of recent advanced approaches. It has several appealing benefits that prior art lacks. Firstly, MaskPlace recasts placement as a problem of learning pixel-level visual representation to comprehensively describe millions of modules on a chip, enabling placement in a high-resolution canvas and a large action space. It outperforms recent methods that represent a chip as a hypergraph. Secondly, it enables training the policy network with an intuitive reward function with dense reward, rather than the complicated reward functions with sparse reward used by previous methods. Thirdly, extensive experiments on many public benchmarks show that MaskPlace outperforms existing RL approaches in all key performance metrics, including wirelength, congestion, and density. For example, it achieves 60%-90% wirelength reduction and guarantees zero overlaps. We believe MaskPlace can improve AI-assisted chip layout design. The deliverables are released at https://laiyao1.github.io/maskplace

    High Lundquist Number Simulations of Parker's Model of Coronal Heating: Scaling and Current Sheet Statistics Using Heterogeneous Computing Architectures

    Parker's model [Parker, Astrophys. J., 174, 499 (1972)] is one of the most discussed mechanisms for coronal heating and has generated much debate. We have recently obtained new scaling results for a 2D version of this problem suggesting that the heating rate becomes independent of resistivity in a statistical steady state [Ng and Bhattacharjee, Astrophys. J., 675, 899 (2008)]. Our numerical work has now been extended to 3D using high resolution MHD numerical simulations. Random photospheric footpoint motion is applied for a time much longer than the correlation time of the motion to obtain converged average coronal heating rates. Simulations are done for different values of the Lundquist number to determine scaling. In the high-Lundquist number limit (S > 1000), the coronal heating rate obtained is consistent with a trend that is independent of the Lundquist number, as predicted by previous analysis and 2D simulations. We will present scaling analysis showing that when the dissipation time is comparable to or larger than the correlation time of the random footpoint motion, the heating rate tends to become independent of Lundquist number, and that the magnetic energy production is also reduced significantly. We also present a comprehensive reprogramming of our simulation code to run on NVidia graphics processing units using the Compute Unified Device Architecture (CUDA) and report code performance on several large scale heterogeneous machines.

    Rigid continuation paths I. Quasilinear average complexity for solving polynomial systems

    How many operations do we need on the average to compute an approximate root of a random Gaussian polynomial system? Beyond Smale's 17th problem that asked whether a polynomial bound is possible, we prove a quasi-optimal bound $\text{(input size)}^{1+o(1)}$. This improves upon the previously known $\text{(input size)}^{\frac32 +o(1)}$ bound. The new algorithm relies on numerical continuation along rigid continuation paths. The central idea is to consider rigid motions of the equations rather than line segments in the linear space of all polynomial systems. This leads to a better average condition number and allows for bigger steps. We show that on the average, we can compute one approximate root of a random Gaussian polynomial system of $n$ equations of degree at most $D$ in $n+1$ homogeneous variables with $O(n^5 D^2)$ continuation steps. This is a decisive improvement over previous bounds that prove no better than $\sqrt{2}^{\min(n, D)}$ continuation steps on the average.

    Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning

    Visual language grounding is widely studied in modern neural image captioning systems, which typically adopt an encoder-decoder framework consisting of two principal components: a convolutional neural network (CNN) for image feature extraction and a recurrent neural network (RNN) for language caption generation. To study the robustness of language grounding to adversarial perturbations in machine vision and perception, we propose Show-and-Fool, a novel algorithm for crafting adversarial examples in neural image captioning. The proposed algorithm provides two evaluation approaches, which check whether neural image captioning systems can be misled to output some randomly chosen captions or keywords. Our extensive experiments show that our algorithm can successfully craft visually-similar adversarial examples with randomly targeted captions or keywords, and the adversarial examples can be made highly transferable to other image captioning systems. Consequently, our approach leads to new robustness implications of neural image captioning and novel insights in visual language grounding. Comment: Accepted by 56th Annual Meeting of the Association for Computational Linguistics (ACL 2018). Hongge Chen and Huan Zhang contribute equally to this work.

    Efficient reconfigurable architectures for 3D medical image compression

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Recently, the more widespread use of three-dimensional (3-D) imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), positron emission tomography (PET), and ultrasound (US), has generated a massive amount of volumetric data. These have provided an impetus to the development of other applications, in particular telemedicine and teleradiology. In these fields, medical image compression is important since both efficient storage and transmission of data through high-bandwidth digital communication lines are of crucial importance. Despite their advantages, most 3-D medical imaging algorithms are computationally intensive, with matrix transformation as the most fundamental operation involved in the transform-based methods. Therefore, there is a real need for high-performance systems, whilst keeping architectures flexible to allow for quick upgradeability with real-time applications. Moreover, in order to obtain efficient solutions for large medical volume data, an efficient implementation of these operations is of significant importance. Reconfigurable hardware, in the form of field programmable gate arrays (FPGAs), has been proposed as a viable system building block in the construction of high-performance systems at an economical price. Consequently, FPGAs seem an ideal candidate to harness and exploit their inherent advantages such as massive parallelism capabilities, multimillion gate counts, and special low-power packages. The key achievements of the work presented in this thesis are summarised as follows. Two architectures for 3-D Haar wavelet transform (HWT) have been proposed based on transpose-based computation and partial reconfiguration suitable for 3-D medical imaging applications. These applications require continuous hardware servicing, and as a result dynamic partial reconfiguration (DPR) has been introduced.
    A comparative study of both non-partial and partial reconfiguration implementations has shown that DPR offers many advantages and leads to a compelling solution for implementing computationally intensive applications such as 3-D medical image compression. Using DPR, several large systems are mapped to small hardware resources, and the area, power consumption as well as maximum frequency are optimised and improved. Moreover, an FPGA-based architecture of the finite Radon transform (FRAT) with three design strategies has been proposed: direct implementation of pseudo-code with a sequential or pipelined description, and a block random access memory (BRAM)-based method. An analysis with various medical imaging modalities has been carried out. The results obtained for the image de-noising implementation using FRAT are promising in reducing Gaussian white noise in medical images. In terms of hardware implementation, promising trade-offs on maximum frequency, throughput and area are also achieved. Furthermore, a novel hardware implementation of a 3-D medical image compression system with context-based adaptive variable length coding (CAVLC) has been proposed. An evaluation of the 3-D integer transform (IT) and the discrete wavelet transform (DWT) with lifting scheme (LS) for transform blocks reveals that 3-D IT demonstrates better computational complexity than the 3-D DWT, whilst the 3-D DWT with LS exhibits a lossless compression that is significantly useful for medical image compression. Additionally, an architecture of CAVLC that is capable of compressing high-definition (HD) images in real-time without any buffer between the quantiser and the entropy coder is proposed. Through a judicious parallelisation, promising results have been obtained with limited resources. In summary, this research tackles the issues of massive 3-D medical volume data that requires compression as well as hardware implementation to accelerate the slowest operations in the system.
    Results obtained also reveal a significant achievement in terms of the architecture efficiency and applications performance. Ministry of Higher Education Malaysia (MOHE), Universiti Tun Hussein Onn Malaysia (UTHM) and the British Council

    Genetic improvement of GPU software

    We survey genetic improvement (GI) of general purpose computing on graphics cards. We summarise several experiments which demonstrate four themes. Experiments with the gzip program show that genetic programming can automatically port sequential C code to parallel code. Experiments with the StereoCamera program show that GI can upgrade legacy parallel code for new hardware and software. Experiments with NiftyReg and BarraCUDA show that GI can make substantial improvements to current parallel CUDA applications. Finally, experiments with the pknotsRG program show that with semi-automated approaches, enormous speedups can sometimes be had by growing and grafting new code with genetic programming in combination with human input.