184 research outputs found

Exploration of Optimization Options for Increasing Performance of a GPU Implementation of a Three-dimensional Bilateral Filter

Author: Bethel E. Wes
Publication venue: Office of Scientific and Technical Information (OSTI)
Publication date: 06/01/2012

This report explores using GPUs as a platform for high-performance medical image data processing, specifically smoothing using a 3D bilateral filter, which performs anisotropic, edge-preserving smoothing. The algorithm consists of running a specialized 3D convolution kernel over a source volume to produce an output volume. Overall, our objective is to understand what algorithmic design choices and configuration options lead to optimal performance of this algorithm on the GPU. We explore the performance impact of using different memory access patterns, of using different types of device/on-chip memories, of using strictly aligned and unaligned memory, and of varying the size/shape of thread blocks. Our results reveal optimal configuration parameters for our algorithm when executed on a sample 3D medical data set, and show performance gains ranging from 30x to over 200x as compared to a single-threaded CPU implementation.

Crossref

UNT Digital Library
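
The 3D bilateral filter mentioned in the abstract above can be illustrated with a small CPU reference sketch. This is not the report's GPU implementation; it is a brute-force NumPy version under assumed parameters (`radius`, `sigma_spatial`, `sigma_range` are illustrative names and defaults, not taken from the report). Each output voxel is a weighted average of its neighbors, where the weight combines spatial distance with intensity difference, which is what preserves edges.

```python
import numpy as np

def bilateral_filter_3d(volume, radius=1, sigma_spatial=1.0, sigma_range=0.1):
    """Brute-force 3D bilateral filter (illustrative sketch, not optimized)."""
    vol = np.asarray(volume, dtype=np.float64)
    # Replicate border voxels so every voxel has a full neighborhood.
    padded = np.pad(vol, radius, mode="edge")
    nz, ny, nx = vol.shape
    weighted_sum = np.zeros_like(vol)
    weight_total = np.zeros_like(vol)
    for dz in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                # View of the volume shifted by the offset (dz, dy, dx).
                shifted = padded[
                    radius + dz : radius + dz + nz,
                    radius + dy : radius + dy + ny,
                    radius + dx : radius + dx + nx,
                ]
                # Spatial weight: falls off with distance from the center voxel.
                spatial = np.exp(-(dz * dz + dy * dy + dx * dx) / (2.0 * sigma_spatial**2))
                # Range weight: falls off with intensity difference (edge preservation).
                range_w = np.exp(-((shifted - vol) ** 2) / (2.0 * sigma_range**2))
                w = spatial * range_w
                weighted_sum += w * shifted
                weight_total += w
    # The center voxel always contributes weight 1, so the total is never zero.
    return weighted_sum / weight_total
```

With a small `sigma_range`, neighbors across a sharp intensity step receive almost no weight, so the step survives smoothing while nearly uniform regions are averaged.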

DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives

Author: Bethel E. Wes
Camp David
Childs Hank
Heinemann Colleen
Lessley Brenton
Perciano Talita
Publication date: 13/09/2018

We present a new parallel algorithm for probabilistic graphical model optimization. The algorithm relies on data-parallel primitives (DPPs), which provide portable performance over hardware architecture. We evaluate results on CPUs and GPUs for an image segmentation problem. Compared to a serial baseline, we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare our performance to a reference, OpenMP-based algorithm, and find speedups of up to 7X (CPU). Comment: LDAV 2018, October 201

arXiv.org e-Print Archive

Crossref

eScholarship - University of California

Hybrid Parallelism for Volume Rendering on Large, Multi- and Many-core Systems

Author: Bethel E. Wes
Childs Hank
Howison Mark
Publication venue: Lawrence Berkeley National Laboratory
Publication date: 01/01/2011

With the computing industry trending towards multi- and many-core processors, we study how a standard visualization algorithm, ray-casting volume rendering, can benefit from a hybrid parallelism approach. Hybrid parallelism provides the best of both worlds: using distributed-memory parallelism across a large number of nodes increases available FLOPs and memory, while exploiting shared-memory parallelism among the cores within each node ensures that each node performs its portion of the larger calculation as efficiently as possible. We demonstrate results from weak and strong scaling studies, at levels of concurrency ranging up to 216,000, and with datasets as large as 12.2 trillion cells. The greatest benefit from hybrid parallelism lies in the communication portion of the algorithm, the dominant cost at higher levels of concurrency. We show that reducing the number of participants with a hybrid approach significantly improves performance.

UNT Digital Library

Interactive, Internet Delivery of Visualization via Structured, Prerendered Multiresolution Imagery

Author: Bethel E. Wes
Chen Jerry
Yoon Ilmi
Publication venue: Office of Scientific and Technical Information (OSTI)
Publication date: 25/10/2007

One of the fundamental problems in remote visualization (where I/O- and data-intensive visualization activities take place at a centrally located supercomputer center and the resulting imagery is delivered to a remotely located user) is reduced interactivity resulting from the combination of high network latency and relatively low network bandwidth. This research project has produced a novel approach for latency-tolerant delivery of visualization and rendering results where client-side frame rate display performance is independent of source dataset size, image size, visualization technique or rendering complexity. As such, it is a suitable solution for remote visualization image delivery for any visualization or rendering application that can generate image frames in an ordered fashion. This new capability is suitable for use in addressing many of ASCR's remote visualization needs, particularly deployment at open computing facilities to provide remote visualization capabilities to teams of scientific researchers.

UNT Digital Library

Improving Performance of M-to-N Processing and Data Redistribution in In Transit Analysis and Visualization

Author: Bethel E. Wes
Ferrier Nicola
Gu Junmin
Kress James
Logan Jeremey
Loring Burlen
Rizzi Silvio
Shudler Sergei
Wolf Matthew
Publication venue: eScholarship, University of California
Publication date: 25/05/2020

In an in transit setting, a parallel data producer, such as a numerical simulation, runs on one set of ranks M, while a data consumer, such as a parallel visualization application, runs on a different set of ranks N. One of the central challenges in this in transit setting is to determine the mapping of data from the set of M producer ranks to the set of N consumer ranks. This is a challenging problem for several reasons, such as the producer and consumer codes potentially having different scaling characteristics and different data models. The resulting mapping from M to N ranks can have a significant impact on aggregate application performance. In this work, we present an approach for performing this M-to-N mapping in a way that has broad applicability across a diversity of data producer and consumer applications. We evaluate its design and performance with a study that runs at high concurrency on a modern HPC platform. By leveraging design characteristics that facilitate an “intelligent” mapping from M to N, we observe that significant performance gains are possible in terms of several different metrics, including time-to-solution and amount of data moved.

eScholarship - University of California
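
To make the M-to-N mapping problem described above concrete, here is a minimal sketch of the simplest possible baseline: a contiguous block mapping that assigns each producer rank to exactly one consumer rank. It is not the approach presented in the paper, which the abstract does not detail; the function name `block_map` is hypothetical and chosen only for illustration.

```python
def block_map(m, n):
    """Map producer ranks 0..m-1 onto consumer ranks 0..n-1 in contiguous blocks.

    Returns a list where entry p is the consumer rank that receives
    the data of producer rank p. Naive baseline for illustration only.
    """
    if m <= 0 or n <= 0:
        raise ValueError("m and n must be positive")
    # Integer scaling keeps neighboring producers on the same consumer,
    # and block sizes differ by at most one when m >= n.
    return [p * n // m for p in range(m)]


mapping = block_map(8, 3)
print(mapping)  # [0, 0, 0, 1, 1, 1, 2, 2]
```

A mapping like this ignores data layout and data size per rank, which is one reason a more informed mapping can reduce the amount of data moved.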

Performance Analysis of Traditional and Data-Parallel Primitive Implementations of Visualization and Analysis Kernels

Author: Bethel E. Wes
Camp David
Heinemann Colleen
Perciano Talita
Publication date: 05/10/2020

Measurements of absolute runtime are useful as a summary of performance when studying parallel visualization and analysis methods on computational platforms of increasing concurrency and complexity. We can obtain even more insights by measuring and examining more detailed measures from hardware performance counters, such as the number of instructions executed by an algorithm implemented in a particular way, the amount of data moved to/from memory, memory hierarchy utilization levels via cache hit/miss ratios, and so forth. This work focuses on performance analysis on modern multi-core platforms of three different visualization and analysis kernels that are implemented in different ways: one is "traditional", using combinations of C++ and VTK, and the other uses a data-parallel approach using VTK-m. Our performance study consists of measurement and reporting of several different hardware performance counters on two different multi-core CPU platforms. The results reveal interesting performance differences between these two different approaches for implementing these kernels, results that would not be apparent using runtime as the only metric.

arXiv.org e-Print Archive

eScholarship - University of California

Federal Market Information Technology in the Post Flash Crash Era: Roles for Supercomputing

Author: Bethel E. Wes
Leinweber David
Ruebel Oliver
Wu Kesheng
Publication venue: Lawrence Berkeley National Laboratory
Publication date: 16/09/2011

This paper describes collaborative work between active traders, regulators, economists, and supercomputing researchers to replicate and extend investigations of the Flash Crash and other market anomalies in a National Laboratory HPC environment. Our work suggests that supercomputing tools and methods will be valuable to market regulators in achieving the goal of market safety, stability, and security. Research results using high frequency data and analytics are described, and directions for future development are discussed. Currently the key mechanism for preventing catastrophic market action is “circuit breakers.” We believe a more graduated approach, similar to the “yellow light” approach in motorsports to slow down traffic, might be a better way to achieve the same goal. To enable this objective, we study a number of indicators that could foresee hazards in market conditions and explore options to confirm such predictions. Our tests confirm that Volume Synchronized Probability of Informed Trading (VPIN) and a version of the volume Herfindahl-Hirschman Index (HHI) for measuring market fragmentation can indeed give strong signals ahead of the Flash Crash event on May 6, 2010. This is a preliminary step toward a full-fledged early-warning system for unusual market conditions.

UNT Digital Library
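
As background for the volume HHI indicator named in the abstract above, the standard Herfindahl-Hirschman Index is the sum of squared shares. The sketch below applies that textbook definition to trading volume per venue; the paper uses "a version of" this index whose exact form the abstract does not give, so the function `volume_hhi` and its input layout are assumptions for illustration.

```python
def volume_hhi(volumes):
    """Herfindahl-Hirschman Index over volume shares.

    `volumes` holds the traded volume per venue (hypothetical input layout).
    The result ranges from 1/len(volumes) (evenly fragmented) to 1.0
    (all volume concentrated in a single venue).
    """
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    return sum((v / total) ** 2 for v in volumes)


print(volume_hhi([100, 100, 100, 100]))  # 0.25: volume spread evenly over four venues
print(volume_hhi([400, 0, 0, 0]))        # 1.0: all volume in one venue
```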

Accelerating Network Traffic Analytics Using Query-Driven Visualization

Author: Bethel E. Wes
Campbell Scott
Dart Eli
Stockinger Kurt
Wu Kesheng
Publication venue: Lawrence Berkeley National Laboratory
Publication date: 01/01/2006

Realizing operational analytics solutions where large and complex data must be analyzed in a time-critical fashion entails integrating many different types of technology. This paper focuses on an interdisciplinary combination of scientific data management and visualization/analysis technologies targeted at reducing the time required for data filtering, querying, hypothesis testing and knowledge discovery in the domain of network connection data analysis. We show that use of compressed bitmap indexing can quickly answer queries in an interactive visual data analysis application, and compare its performance with two alternatives for serial and parallel filtering/querying on 2.5 billion records' worth of network connection data collected over a period of 42 weeks. Our approach to visual network connection data exploration centers on two primary factors: interactive ad-hoc and multiresolution query formulation and execution over n dimensions, and visual display of the n-dimensional histogram results. This combination is applied in a case study to detect a distributed network scan and to then identify the set of remote hosts participating in the attack. Our approach is sufficiently general to be applied to a diverse set of data understanding problems as well as used in conjunction with a diverse set of analysis and visualization tools.

CiteSeerX

Crossref

eScholarship - University of California

UNT Digital Library
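
The bitmap indexing idea referred to in the abstract above can be sketched in a few lines. This is an uncompressed toy version using Python integers as bit vectors, not the compressed bitmap index the paper evaluates; the column names and values (`ports`, `states`) are hypothetical sample data. One bitmap is kept per distinct value, and a multi-column query becomes a bitwise AND of bitmaps.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Return {value: bitmap}, where bit i of the bitmap is set when values[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)

def rows_of(bitmap):
    """Decode a bitmap into the sorted list of row numbers whose bit is set."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical connection records: destination port and connection state.
ports = [22, 80, 22, 443]
states = ["S0", "SF", "S0", "S0"]
port_index = build_bitmap_index(ports)
state_index = build_bitmap_index(states)

# Query "port == 22 AND state == 'S0'" is a single bitwise AND.
hits = port_index[22] & state_index["S0"]
print(rows_of(hits))  # [0, 2]
```

The appeal of this layout is that query cost depends on bitmap operations rather than on scanning every record, and compression keeps the bitmaps small for large data sets.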

Query-Driven Visualization of Time-Varying Adaptive Mesh Refinement Data

Author: Anderson John C.
Bethel E. Wes
Gosink Luke J.
Joy Kenneth I.
Publication venue: Lawrence Berkeley National Laboratory
Publication date: 01/08/2008

The visualization and analysis of AMR-based simulations is integral to the process of obtaining new insight in scientific research. We present a new method for performing query-driven visualization and analysis on AMR data, with specific emphasis on time-varying AMR data. Our work introduces a new method that directly addresses the dynamic spatial and temporal properties of AMR grids, which challenge many existing visualization techniques. Further, we present the first implementation of query-driven visualization on the GPU that uses a GPU-based indexing structure to both answer queries and efficiently utilize GPU memory. We apply our method to two different science domains to demonstrate its broad applicability.

UNT Digital Library

Visualization and Analysis of 3D Gene Expression Data

Author: Bethel E. Wes
Hagen Hans
Hamann Bernd
Rubel Oliver
Weber Gunther H.
Publication venue: Office of Scientific and Technical Information (OSTI)
Publication date: 25/10/2007

Recent methods for extracting precise measurements of spatial gene expression patterns from three-dimensional (3D) image data open the way for new analysis of the complex gene regulatory networks controlling animal development. To support analysis of this novel and highly complex data, we developed PointCloudXplore (PCX), an integrated visualization framework that supports dedicated multi-modal, physical and information visualization views along with algorithms to aid in analyzing the relationships between gene expression levels. Using PCX, we helped our science stakeholders to address many questions in 3D gene expression research, e.g., to objectively define spatial pattern boundaries and temporal profiles of genes and to analyze how mRNA patterns are controlled by their regulatory transcription factors.

eScholarship - University of California

UNT Digital Library