    Case Studies on Optimizing Algorithms for GPU Architectures

    Modern GPUs are complex, massively multi-threaded, and high-performance. Programmers naturally gravitate toward this high performance to obtain faster results. To do so successfully, however, programmers must first understand and then master a new set of skills – writing parallel code, using different types of parallelism, adapting to GPU architectural features, and understanding the issues that limit performance. To ease this learning process and help GPU programmers become productive more quickly, this dissertation introduces three data access skeletons (DASks) – Block, Column, and Row – and two block access skeletons (BASks) – Block-by-Block and Warp-by-Warp. Each “skeleton” provides a high-performance implementation framework that partitions data arrays into data blocks and then iterates over those blocks. The programmer must still write “body” methods that operate on individual data blocks to solve their specific problem. These skeletons provide efficient, machine-dependent data access patterns for use on GPUs. DASks group n data elements into m fixed-size data blocks, which are then partitioned across p thread blocks using a 1D or 2D layout pattern. The fixed-size data blocks are parameterized by three C++ template parameters – nWork, WarpSize, and nWarps. Generic programming techniques use these three parameters to enable performance experiments on three different types of parallelism – instruction-level parallelism (ILP), data-level parallelism (DLP), and thread-level parallelism (TLP). The DASks and BASks are introduced using a simple memory I/O (Copy) case study. A nearest neighbor search case study motivated the development of the DASks and BASks but does not itself use them. Three additional case studies – Reduce/Scan, Histogram, and Radix Sort – demonstrate DASks and BASks in action on parallel primitives and provide further performance lessons.
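
    As a rough illustration, the sketch below shows what a Block-style DASk kernel might look like. Only the template parameter names nWork, WarpSize, and nWarps come from the abstract; the kernel structure, the index arithmetic, and the blockDASk and Body names are assumptions made for illustration, not the dissertation's actual implementation.

```cuda
// Hypothetical sketch of a Block-style data access skeleton (DASk).
// Only the template parameters nWork, WarpSize, and nWarps are named in
// the abstract; everything else here is an illustrative assumption.
// Body must provide a __device__ operator()(float).
template <int nWork, int WarpSize, int nWarps, typename Body>
__global__ void blockDASk(const float* in, float* out, int n, Body body)
{
    // Elements per fixed-size data block: each of the WarpSize * nWarps
    // threads handles nWork elements (ILP within a thread).
    constexpr int BlockSize = nWork * WarpSize * nWarps;

    // 1D layout: each thread block iterates over the data blocks
    // assigned to it in a grid-stride loop.
    for (int base = blockIdx.x * BlockSize; base < n;
         base += gridDim.x * BlockSize)
    {
        #pragma unroll
        for (int k = 0; k < nWork; ++k) {
            int idx = base + k * WarpSize * nWarps + threadIdx.x;
            if (idx < n)
                out[idx] = body(in[idx]); // user-supplied "body" method
        }
    }
}
```

    Instantiated with an identity body, such a kernel reproduces the Copy case study; varying nWork and nWarps then trades instruction-level against thread-level parallelism, as in the performance experiments the abstract describes.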

    Accelerating transitive closure of large-scale sparse graphs

    Finding the transitive closure of a graph is a fundamental graph problem in which a new graph is derived that has an edge between two nodes if and only if there is a path in the original graph from one node to the other. The reachability matrix of a graph is its transitive closure. This thesis describes a novel approach that uses anti-sections to obtain the transitive closure of a graph, and examines its advantages when implemented in parallel on a GPU using the Hornet graph data structure. Graph representations of real-world systems are typically sparse because of the limited connectivity between nodes, and the anti-section approach is designed specifically to improve performance for large-scale sparse graphs. An NVIDIA Titan V GPU is used to execute the anti-section parallel implementations. The Dual-Round and Hash-Based implementations of the anti-section transitive closure approach provide a significant speedup over several parallel and sequential implementations.
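
    For reference, the sketch below shows what transitive closure computes, as a Floyd–Warshall-style boolean pass over a dense adjacency matrix. This is a generic O(V³) sequential baseline for illustration only; it is not the thesis's anti-section or Hornet-based method, and it would be impractical for the large sparse graphs targeted there.

```cuda
// Plain C++ reference (not the anti-section approach): dense
// Floyd-Warshall-style transitive closure. reach starts as the
// adjacency matrix and ends as the reachability matrix.
#include <vector>

std::vector<std::vector<bool>> transitiveClosure(
    std::vector<std::vector<bool>> reach)
{
    const size_t n = reach.size();
    for (size_t k = 0; k < n; ++k)
        for (size_t i = 0; i < n; ++i)
            if (reach[i][k])                 // a path i -> k exists...
                for (size_t j = 0; j < n; ++j)
                    if (reach[k][j])         // ...and k -> j, hence i -> j
                        reach[i][j] = true;
    return reach;
}
```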

    The observation of extended sources with the Hartebeesthoek radio telescope

    The Hartebeesthoek Radio Telescope is well suited to mapping large areas of sky at 2.3 GHz because of the stability and sensitivity of the noise-adding radiometer (Nicolson, 1970) and cryogenic amplifier used at this frequency, the relatively large 20' beam of the 26 m dish antenna, and its high-speed drive capability. Telescope control programs were written for the Observatory's online computer for automated mapping. Effort centred on removing the curved baseline, or 'background', from each Declination (Dec) scan, which arises from atmospheric and ground radiation contributions that vary as the antenna is scanned. Initially these backgrounds were measured over a wide range of Hour Angle (HA) for the Dec range of a map, and an interpolated curve was subtracted from each on-source scan according to its HA. A common base level was established by comparison with drift scans (observed with the antenna stationary). These different observations (on- and off-source Dec scans and drift scans) were combined into one in the Skymap system by performing Dec scans at a fixed starting HA for a period long enough to permit 'cold sky' and the source to drift through. A background formed by fitting a smooth curve through the lowest sample at each Dec provides a consistent relative base level for all the scans in an observation. A high scanning speed is used so that observations may fruitfully be repeated three times and interleaved to build a reliable, fully sampled map. Because each observation has its own background removed, it may be made at any HA. For comparison, maps of Upper Scorpio produced by the earlier method (Baart et al., 1980) and of the Magellanic Cloud region produced by Skymap (Mountfort et al., 1987) are shown. Skymap provides a simple and flexible mapping method that relies on the stability of the noise-adding radiometer and on high-speed repeated scans to produce good maps of large or small extent with little computation. Correction for drift is more difficult than with systems that use intersecting scans, such as the 'nodding' scans used by Haslam et al. (1981) or the Azimuth scans of Reich (1982).
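
    As a minimal sketch of the background step described above, the code below takes the lowest sample at each Dec across the repeated scans, smooths those minima, and subtracts the result from every scan. The function and variable names are hypothetical, and a simple moving average stands in for whatever smooth-curve fit Skymap actually uses.

```cuda
// Hypothetical sketch of Skymap-style background removal (plain C++).
// scans[s][d] is the sample of scan s at Declination index d; a moving
// average stands in for the smooth-curve fit described in the abstract.
#include <vector>
#include <algorithm>

void removeBackground(std::vector<std::vector<double>>& scans,
                      int halfWin = 5)
{
    if (scans.empty()) return;
    const size_t nDec = scans[0].size();

    // Lowest sample at each Dec across all scans ('cold sky' level).
    std::vector<double> lowest(nDec);
    for (size_t d = 0; d < nDec; ++d) {
        double m = scans[0][d];
        for (const auto& s : scans) m = std::min(m, s[d]);
        lowest[d] = m;
    }

    // Smooth the minima to form the background curve.
    std::vector<double> background(nDec);
    for (size_t d = 0; d < nDec; ++d) {
        const size_t lo = d > size_t(halfWin) ? d - halfWin : 0;
        const size_t hi = std::min(nDec - 1, d + size_t(halfWin));
        double sum = 0.0;
        for (size_t k = lo; k <= hi; ++k) sum += lowest[k];
        background[d] = sum / double(hi - lo + 1);
    }

    // Subtract the common background, giving all scans one base level.
    for (auto& s : scans)
        for (size_t d = 0; d < nDec; ++d)
            s[d] -= background[d];
}
```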

    LIPIcs, Volume 274, ESA 2023, Complete Volume

    Eighth Biennial Report: April 2005 – March 2007

    Preface
