
22,159 research outputs found

MP-STREAM: A Memory Performance Benchmark for Design Space Exploration on Heterogeneous HPC Devices

Authors: Nabi Syed Waqar; Vanderbauwhede Wim
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 06/08/2018

Sustained memory throughput is a key determinant of performance in HPC devices. Having an accurate estimate of this parameter is essential for manual or automated design space exploration for any HPC device. While there are benchmarks for measuring the sustained memory bandwidth for CPUs and GPUs, such a benchmark for FPGAs has been missing. We present MP-STREAM, an OpenCL-based synthetic micro-benchmark for measuring sustained memory bandwidth, optimized for FPGAs, but which can be used on multiple platforms. Our main contribution is the introduction of various generic as well as device-specific parameters that can be tuned to measure their effect on memory bandwidth. We present results of running our benchmark on a CPU, a GPU and two FPGA targets, and discuss our observations. The experiments underline the utility of our benchmark for optimizing HPC applications for FPGAs, and provide valuable optimization hints for FPGA programmers.
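As a rough illustration of what a sustained-bandwidth measurement involves, the sketch below times a STREAM-style COPY kernel on the host and converts bytes moved into GB/s. It is not MP-STREAM and does not use OpenCL; the function names `bandwidth_gbs` and `copy_bandwidth_gbs` are hypothetical, and the array size is the only tunable parameter shown.

```python
import time


def bandwidth_gbs(bytes_moved: float, seconds: float) -> float:
    """Convert a byte count and an elapsed time into GB/s (1 GB = 1e9 bytes)."""
    return bytes_moved / seconds / 1e9


def copy_bandwidth_gbs(n_bytes: int, repeats: int = 5) -> float:
    """Best-of-`repeats` bandwidth of a COPY kernel (dst[:] = src).

    Illustrative host-side analogue only; a real benchmark would run the
    kernel on the target device and control many more parameters.
    """
    src = bytearray(n_bytes)
    dst = bytearray(n_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst[:] = src  # one read stream plus one write stream
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    # COPY moves 2 * n_bytes in total: n_bytes read and n_bytes written.
    return bandwidth_gbs(2 * n_bytes, best)


if __name__ == "__main__":
    # Sweep the array size, the one tunable parameter in this sketch.
    for size_mib in (1, 8, 32):
        n = size_mib * 1024 * 1024
        print(f"{size_mib:3d} MiB: {copy_bandwidth_gbs(n):.2f} GB/s")
```

Taking the best of several repetitions is the usual convention for this kind of measurement, since it filters out one-off interference from the rest of the system.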

Crossref

Enlighten

Neural Architecture Search using Deep Neural Networks and Monte Carlo Tree Search

Authors: Fonseca Rodrigo; Jinnai Yuu; Tian Yuandong; Wang Linnan; Zhao Yiyang
Publication date: 21/11/2019

Neural Architecture Search (NAS) has shown great success in automating the design of neural networks, but the prohibitive amount of computations behind current NAS methods requires further investigations in improving the sample efficiency and the network evaluation cost to get better results in a shorter time. In this paper, we present a novel scalable Monte Carlo Tree Search (MCTS) based NAS agent, named AlphaX, to tackle these two aspects. AlphaX improves the search efficiency by adaptively balancing the exploration and exploitation at the state level, and by a Meta-Deep Neural Network (DNN) to predict network accuracies for biasing the search toward a promising region. To amortize the network evaluation cost, AlphaX accelerates MCTS rollouts with a distributed design and reduces the number of epochs in evaluating a network by transfer learning, which is guided with the tree structure in MCTS. In 12 GPU days and 1000 samples, AlphaX found an architecture that reaches 97.84% top-1 accuracy on CIFAR-10, and 75.5% top-1 accuracy on ImageNet, exceeding SOTA NAS methods in both the accuracy and sampling efficiency. Particularly, we also evaluate AlphaX on NASBench-101, a large scale NAS dataset; AlphaX is 3x and 2.8x more sample efficient than Random Search and Regularized Evolution in finding the global optimum. Finally, we show the searched architecture improves a variety of vision applications from Neural Style Transfer, to Image Captioning and Object Detection. Comment: To appear in the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-2020).
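The balance between exploration and exploitation mentioned in this abstract is commonly expressed in MCTS with the standard UCB1 (UCT) rule. The sketch below shows that textbook rule only; it is an assumption that it resembles AlphaX's selection step, and it omits the Meta-DNN predictor and transfer learning entirely. The dictionary layout and the names `ucb1` and `select_child` are hypothetical.

```python
import math


def ucb1(mean_reward: float, visits: int, parent_visits: int,
         c: float = math.sqrt(2)) -> float:
    """Standard UCB1 score: exploitation term plus exploration bonus."""
    if visits == 0:
        return math.inf  # always try an unvisited child first
    return mean_reward + c * math.sqrt(math.log(parent_visits) / visits)


def select_child(children: list, c: float = math.sqrt(2)) -> dict:
    """Pick the child with the highest UCB1 score.

    Each child is a dict with "visits" (int) and "total" (summed reward),
    a hypothetical layout chosen for this sketch.
    """
    parent_visits = sum(ch["visits"] for ch in children)

    def score(ch: dict) -> float:
        mean = ch["total"] / ch["visits"] if ch["visits"] else 0.0
        return ucb1(mean, ch["visits"], parent_visits, c)

    return max(children, key=score)


if __name__ == "__main__":
    kids = [
        {"name": "a", "visits": 10, "total": 9.0},  # well explored, mean 0.9
        {"name": "b", "visits": 2, "total": 1.0},   # barely explored, mean 0.5
    ]
    print(select_child(kids, c=0.0)["name"])  # pure exploitation
    print(select_child(kids, c=2.0)["name"])  # strong exploration bonus
```

With `c=0.0` the rule reduces to picking the best observed mean; raising `c` shifts the choice toward rarely visited children.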

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Fast and adaptive fractal tree-based path planning for programmable bevel tip steerable needles

Authors: Baena FRY; Garriga-Casanovas A; Liu F; Secoli R
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 17/01/2016

© 2016 IEEE. Steerable needles are a promising technology for minimally invasive surgery, as they can provide access to difficult-to-reach locations while avoiding delicate anatomical regions. However, due to the unpredictable tissue deformation associated with needle insertion and the complexity of many surgical scenarios, a real-time path planning algorithm with high update frequency would be advantageous. Real-time path planning for nonholonomic systems is commonly used in a broad variety of fields, ranging from aerospace to submarine navigation. In this letter, we propose to take advantage of the architecture of graphics processing units (GPUs) to apply fractal theory and thus parallelize real-time path planning computation. This novel approach, termed adaptive fractal trees (AFT), allows for the creation of a database of paths covering the entire domain, which are dense, invariant, procedurally produced, adaptable in size, and present a recursive structure. The generated cache of paths can in turn be analyzed in parallel to determine the most suitable path in a fraction of a second. The ability to cope with nonholonomic constraints, as well as constraints in the space of states of any complexity or number, is intrinsic to the AFT approach, rendering it highly versatile. Three-dimensional (3-D) simulations applied to needle steering in neurosurgery show that our approach can successfully compute paths in real-time, enabling complex brain navigation.
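The idea of a recursively generated cache of paths that is then scored to find the most suitable one can be sketched in a few lines. The version below is a sequential 2-D toy, not the AFT algorithm: the constant-curvature arc model, the halving of segment length at each level, the node-only obstacle check, and the names `grow_paths` and `select_path` are all assumptions made for illustration. A GPU implementation would score the paths in parallel.

```python
import math


def grow_paths(pose, depth, curvatures, seg_len, scale=0.5):
    """Recursively build every root-to-leaf path as a list of (x, y) nodes.

    pose is (x, y, heading). Each level branches once per curvature and
    shrinks the segment length by `scale` (an assumed self-similarity rule).
    """
    x, y, heading = pose
    if depth == 0:
        return [[(x, y)]]
    paths = []
    for k in curvatures:
        if abs(k) < 1e-12:  # straight segment
            nh = heading
            nx = x + seg_len * math.cos(heading)
            ny = y + seg_len * math.sin(heading)
        else:  # constant-curvature arc of length seg_len
            nh = heading + k * seg_len
            nx = x + (math.sin(nh) - math.sin(heading)) / k
            ny = y - (math.cos(nh) - math.cos(heading)) / k
        for tail in grow_paths((nx, ny, nh), depth - 1, curvatures,
                               seg_len * scale, scale):
            paths.append([(x, y)] + tail)
    return paths


def select_path(paths, target, obstacles=()):
    """Return the collision-free path whose endpoint is closest to target.

    obstacles are (cx, cy, radius) circles; only path nodes are checked,
    which is a simplification. Returns None if every path collides.
    """
    def collides(path):
        return any(math.hypot(px - cx, py - cy) <= r
                   for px, py in path for cx, cy, r in obstacles)

    free = [p for p in paths if not collides(p)]
    if not free:
        return None
    return min(free, key=lambda p: math.hypot(p[-1][0] - target[0],
                                              p[-1][1] - target[1]))


if __name__ == "__main__":
    cache = grow_paths((0.0, 0.0, 0.0), 4, (-1.0, 0.0, 1.0), 1.0)
    best = select_path(cache, target=(1.875, 0.0), obstacles=[(1.0, 0.0, 0.1)])
    print(len(cache), best[-1])
```

Because the path cache depends only on the start pose and the branching parameters, it can be generated once and re-scored whenever the target or the obstacles change.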

Spiral - Imperial College Digital Repository

Towards a Software Transactional Memory for heterogeneous CPU-GPU processors

Authors: Asenjo-Plaza Rafael; Navarro Angeles; Plata-Gonzalez Oscar Guillermo; Villegas Alejandro
Publication date: 15/09/2017

Heterogeneous Accelerated Processing Units (APUs) integrate a multi-core CPU and a GPU within the same chip. Modern APUs provide the programmer with platform atomics, used for communication between the CPU cores and the GPU through simple atomic datatypes. However, ensuring consistency for complex data types is a task delegated to programmers, who have to implement a mutual exclusion mechanism. Transactional Memory (TM) is an optimistic approach to implementing mutual exclusion. With TM, shared data can be accessed by multiple computing threads speculatively, but changes become visible only if a transaction ends without its memory accesses conflicting with those of other transactions. TM has been studied and implemented in software and hardware for both CPU and GPU platforms, but an integrated solution has not been provided for APU processors. In this paper we present APUTM, a software TM designed to work on heterogeneous APU processors. The design of APUTM focuses on minimizing the access to shared metadata in order to reduce the communication overhead via expensive platform atomics. The main objective of APUTM is to help us understand the tradeoffs of implementing a software TM on a heterogeneous CPU-GPU platform and to identify the key aspects to be considered in each device. In our experiments, we compare the adaptability of APUTM to execute in one of the devices (CPU or GPU) or in both of them simultaneously. These experiments show that APUTM is able to outperform sequential execution of the applications. This work has been supported by projects TIN2013-42253-P and TIN2016-80920-R, from the Spanish Government, P11-TIC8144 and P12-TIC1470, from Junta de Andalucía, and Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech.
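The optimistic scheme described in this abstract (speculative access, with changes made visible only if no conflict occurred) can be illustrated with a minimal CPU-only software TM. This is a generic version-validation sketch, not APUTM's design: the names `TVar`, `Transaction`, and `atomically`, and the single global commit lock, are assumptions chosen for brevity.

```python
import threading


class TVar:
    """A shared variable with a version number bumped on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0


# Hypothetical single global lock that serializes commits only.
_commit_lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.read_versions = {}  # TVar -> version seen at first read
        self.writes = {}         # TVar -> buffered (not yet visible) value

    def read(self, tvar):
        if tvar in self.writes:  # read-your-own-writes
            return self.writes[tvar]
        # Record the version before reading the value, so a concurrent
        # commit in between is caught by validation at commit time.
        self.read_versions.setdefault(tvar, tvar.version)
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value  # speculative: buffered until commit

    def commit(self):
        """Validate the read set, then publish writes. False on conflict."""
        with _commit_lock:
            for tvar, seen in self.read_versions.items():
                if tvar.version != seen:
                    return False  # another transaction committed first
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True


def atomically(fn):
    """Run fn(tx) repeatedly until its transaction commits without conflict."""
    while True:
        tx = Transaction()
        result = fn(tx)
        if tx.commit():
            return result


if __name__ == "__main__":
    counter = TVar(0)

    def increment(tx):
        tx.write(counter, tx.read(counter) + 1)

    def worker():
        for _ in range(200):
            atomically(increment)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counter.value)
```

The sketch validates only at commit time, so a running transaction may briefly observe an inconsistent snapshot before it is aborted and retried; production designs usually add stronger guarantees.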

Repositorio Institucional Universidad de Málaga