3,106 research outputs found
Scheduling and Tuning Kernels for High-performance on Heterogeneous Processor Systems
Accelerated parallel computing techniques using devices such as GPUs and Xeon Phis (along with CPUs) have proposed promising solutions of extending the cutting edge of high-performance computer systems. A significant performance improvement can be achieved when suitable workloads are handled by the accelerator. Traditional CPUs can handle those workloads not well suited for accelerators. Combination of multiple types of processors in a single computer system is referred to as a heterogeneous system. This dissertation addresses tuning and scheduling issues in heterogeneous systems. The first section presents work on tuning scientific workloads on three different types of processors: multi-core CPU, Xeon Phi massively parallel processor, and NVIDIA GPU; common tuning methods and platform-specific tuning techniques are presented. Then, analysis is done to demonstrate the performance characteristics of the heterogeneous system on different input data. This section of the dissertation is part of the GeauxDock project, which prototyped a few state-of-art bioinformatics algorithms, and delivered a fast molecular docking program. The second section of this work studies the performance model of the GeauxDock computing kernel. Specifically, the work presents an extraction of features from the input data set and the target systems, and then uses various regression models to calculate the perspective computation time. This helps understand why a certain processor is faster for certain sets of tasks. It also provides the essential information for scheduling on heterogeneous systems. In addition, this dissertation investigates a high-level task scheduling framework for heterogeneous processor systems in which, the pros and cons of using different heterogeneous processors can complement each other. Thus a higher performance can be achieve on heterogeneous computing systems. A new scheduling algorithm with four innovations is presented: Ranked Opportunistic Balancing (ROB), Multi-subject Ranking (MR), Multi-subject Relative Ranking (MRR), and Automatic Small Tasks Rearranging (ASTR). The new algorithm consistently outperforms previously proposed algorithms with better scheduling results, lower computational complexity, and more consistent results over a range of performance prediction errors. Finally, this work extends the heterogeneous task scheduling algorithm to handle power capping feature. It demonstrates that a power-aware scheduler significantly improves the power efficiencies and saves the energy consumption. This suggests that, in addition to performance benefits, heterogeneous systems may have certain advantages on overall power efficiency
[Activity of Institute for Computer Applications in Science and Engineering]
This report summarizes research conducted at the Institute for Computer Applications in Science and Engineering in applied mathematics, fluid mechanics, and computer science
PerfVis: Pervasive Visualization in Immersive AugmentedReality for Performance Awareness
Developers are usually unaware of the impact of code changes to the
performance of software systems. Although developers can analyze the
performance of a system by executing, for instance, a performance test to
compare the performance of two consecutive versions of the system, changing
from a programming task to a testing task would disrupt the development flow.
In this paper, we propose the use of a city visualization that dynamically
provides developers with a pervasive view of the continuous performance of a
system. We use an immersive augmented reality device (Microsoft HoloLens) to
display our visualization and extend the integrated development environment on
a computer screen to use the physical space. We report on technical details of
the design and implementation of our visualization tool, and discuss early
feedback that we collected of its usability. Our investigation explores a new
visual metaphor to support the exploration and analysis of possibly very large
and multidimensional performance data. Our initial result indicates that the
city metaphor can be adequate to analyze dynamic performance data on a large
and non-trivial software system.Comment: ICPE'19 vision, 4 pages, 2 figure, conferenc
MaterialVis: Material visualization tool using direct volume and surface rendering techniques
Cataloged from PDF version of article.Visualization of the materials is an indispensable part of their structural analysis. We developed a visualization tool for amorphous as well as crystalline structures, called Material Vis. Unlike the existing tools, Material Vis represents material structures as a volume and a surface manifold, in addition to plain atomic coordinates. Both amorphous and crystalline structures exhibit topological features as well as various defects. Material Vis provides a wide range of functionality to visualize such topological structures and crystal defects interactively. Direct volume rendering techniques are used to visualize the volumetric features of materials, such as crystal defects, which are responsible for the distinct fingerprints of a specific sample. In addition, the tool provides surface visualization to extract hidden topological features within the material. Together with the rich set of parameters and options to control the visualization, Material Vis allows users to visualize various aspects of materials very efficiently as generated by modern analytical techniques such as the Atom Probe Tomography. (C) 2014 Elsevier Inc. All rights reserved
Shared-Memory Parallel Maximal Clique Enumeration
We present shared-memory parallel methods for Maximal Clique Enumeration
(MCE) from a graph. MCE is a fundamental and well-studied graph analytics task,
and is a widely used primitive for identifying dense structures in a graph. Due
to its computationally intensive nature, parallel methods are imperative for
dealing with large graphs. However, surprisingly, there do not yet exist
scalable and parallel methods for MCE on a shared-memory parallel machine. In
this work, we present efficient shared-memory parallel algorithms for MCE, with
the following properties: (1) the parallel algorithms are provably
work-efficient relative to a state-of-the-art sequential algorithm (2) the
algorithms have a provably small parallel depth, showing that they can scale to
a large number of processors, and (3) our implementations on a multicore
machine shows a good speedup and scaling behavior with increasing number of
cores, and are substantially faster than prior shared-memory parallel
algorithms for MCE.Comment: 10 pages, 3 figures, proceedings of the 25th IEEE International
Conference on. High Performance Computing, Data, and Analytics (HiPC), 201
Hofstadter butterflies of carbon nanotubes: Pseudofractality of the magnetoelectronic spectrum
The electronic spectrum of a two-dimensional square lattice in a
perpendicular magnetic field has become known as the Hofstadter butterfly
[Hofstadter, Phys. Rev. B 14, 2239 (1976).]. We have calculated
quasi-one-dimensional analogs of the Hofstadter butterfly for carbon nanotubes
(CNTs). For the case of single-wall CNTs, it is straightforward to implement
magnetic fields parallel to the tube axis by means of zone folding in the
graphene reciprocal lattice. We have also studied perpendicular magnetic fields
which, in contrast to the parallel case, lead to a much richer, pseudofractal
spectrum. Moreover, we have investigated magnetic fields piercing double-wall
CNTs and found strong signatures of interwall interaction in the resulting
Hofstadter butterfly spectrum, which can be understood with the help of a
minimal model. Ubiquitous to all perpendicular magnetic field spectra is the
presence of cusp catastrophes at specific values of energy and magnetic field.
Resolving the density of states along the tube circumference allows recognition
of the snake states already predicted for nonuniform magnetic fields in the
two-dimensional electron gas. An analytic model of the magnetic spectrum of
electrons on a cylindrical surface is used to explain some of the results.Comment: 14 pages, 12 figures update to published versio
- …