Inter-loop optimizations in RAJA using loop chains
Typical parallelization approaches such as OpenMP and CUDA provide constructs for parallelizing individual loops and blocking them for data locality. By focusing on each loop separately, these approaches fail to exploit the data locality made possible by inter-loop data reuse. The loop chain abstraction provides a framework for reasoning about and applying inter-loop optimizations. In this work, we incorporate the loop chain abstraction into RAJA, a performance portability layer for high-performance computing applications. Using the loop-chain-extended RAJA, or RAJALC, developers can have the RAJA library apply loop transformations such as loop fusion and overlapped tiling while maintaining the original structure of their programs. By introducing targeted symbolic evaluation capabilities, we can collect and cache the data access information required to verify loop transformations. We evaluate the performance improvement and refactoring costs of our extension. Overall, our results demonstrate 85-98% of the performance improvement of hand-optimized kernels with dramatically fewer code changes. © 2021 Association for Computing Machinery.
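The inter-loop data reuse that RAJALC targets can be illustrated with loop fusion, one of the transformations named in the abstract. The sketch below is a minimal conceptual example in Python, not the RAJA API: two loops that each traverse the same array are fused so every element is consumed while still hot in cache.

```python
def unfused(a, b):
    # Two separate loops: `a` is streamed through the cache twice,
    # once per loop, even though both loops read the same elements.
    for i in range(len(a)):
        b[i] = a[i] * 2.0
    c = [0.0] * len(a)
    for i in range(len(a)):
        c[i] = b[i] + a[i]
    return c

def fused(a):
    # Fused loop: each a[i] is read once and both computations run
    # on it immediately, exploiting inter-loop data reuse.
    c = [0.0] * len(a)
    for i in range(len(a)):
        bi = a[i] * 2.0
        c[i] = bi + a[i]
    return c
```

Both versions compute c[i] = 3·a[i]; the point of a system like RAJALC is to verify such equivalences (here, that the second loop only reads values the first has already produced) and apply the fusion automatically.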
A Fast Parallel Graph Partitioner for Shared-Memory Inspector/Executor Strategies
Abstract. Graph partitioners play an important role in many parallel work distribution and locality optimization approaches. Surprisingly, however, to our knowledge there is no freely available parallel graph partitioner designed for execution on a shared-memory multicore system. This paper presents a shared-memory parallel graph partitioner, ParCubed, for use in the context of sparse tiling run-time data and computation reordering. Sparse tiling is a run-time scheduling technique that schedules groups of iterations across loops together when they access the same data and one or more of the loops contains indirect array accesses. For sparse tiling, which is implemented with an inspector/executor strategy, the inspector needs to find an initial seed partitioning of adequate quality very quickly. We compare our presented hierarchical clustering partitioner, ParCubed, with GPart and METIS in terms of partitioning speed, partitioning quality, and the effect the generated seed partitions have on executor speed. We find that the presented partitioner is 25 to 100 times faster than METIS on a 16-core machine. The total edge cut of the partitioning generated by ParCubed was found not to exceed 1.27x that of the partitioning found by METIS.
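The trade-off the abstract describes, a fast, lower-quality seed partitioning measured by edge cut, can be sketched with a greedy BFS-grown partitioner. This is a hypothetical illustration of the problem setting, not ParCubed's hierarchical clustering algorithm:

```python
from collections import deque

def seed_partition(adj, k):
    # Grow k roughly equal-size parts by breadth-first traversal.
    # Fast and simple, in the spirit of a quick "seed" partitioner
    # for an inspector; quality is lower than METIS-style methods.
    n = len(adj)
    target = (n + k - 1) // k          # max vertices per part
    part = [-1] * n
    p, size = 0, 0
    for start in range(n):
        if part[start] != -1:
            continue
        q = deque([start])
        while q and p < k:
            v = q.popleft()
            if part[v] != -1:
                continue
            part[v] = p
            size += 1
            if size == target:         # part full: start the next one
                p, size = p + 1, 0
            q.extend(w for w in adj[v] if part[w] == -1)
    for v in range(n):                 # assign any stragglers
        if part[v] == -1:
            part[v] = min(p, k - 1)
    return part

def edge_cut(adj, part):
    # Number of edges whose endpoints fall in different parts --
    # the quality metric used to compare ParCubed against METIS.
    return sum(1 for v in range(len(adj)) for w in adj[v]
               if v < w and part[v] != part[w])
```

For example, on the 4-vertex path graph `[[1], [0, 2], [1, 3], [2]]` with k = 2, BFS growth yields parts {0, 1} and {2, 3} with an edge cut of 1.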
Set and Relation Manipulation for the Sparse Polyhedral Framework
Abstract. The Sparse Polyhedral Framework (SPF) extends the Polyhedral Model by using the uninterpreted function call abstraction for the compile-time specification of run-time reordering transformations such as loop and data reordering and sparse tiling approaches that schedule irregular sets of iterations across loops. The Polyhedral Model represents sets of iteration points in imperfectly nested loops with unions of polyhedra and represents loop transformations with affine functions applied to such polyhedral sets. Existing tools such as ISL, CLooG, and Omega manipulate polyhedral sets and affine functions; however, their ability to represent sets and functions whose constraints include uninterpreted function calls, such as those needed in the SPF, is non-existent or severely restricted. This paper presents algorithms for manipulating sets and relations with uninterpreted function symbols to enable the Sparse Polyhedral Framework. The algorithms have been implemented in an open source C++ library called IEGenLib (the Inspector/Executor Generator Library).
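The core idea, sets of iteration points whose constraints may contain opaque uninterpreted function calls such as `col(i)` for an indirect array access, can be sketched as follows. This is a toy Python illustration of the representation, not IEGenLib's C++ API; the class name and constraint encoding are invented for the example:

```python
class SparseSet:
    """An integer tuple set {[i] : constraints}, where constraints may
    mention uninterpreted function calls like col(i).  Constraints are
    kept as normalized strings; an uninterpreted call is an opaque term
    that matches only syntactically, mirroring the key SPF idea that
    col(i) is unknown at compile time but equal to itself."""

    def __init__(self, tuple_vars, constraints):
        self.tuple_vars = tuple(tuple_vars)
        self.constraints = frozenset(constraints)

    def intersect(self, other):
        # Intersection of two sets over the same tuple space is the
        # conjunction (set union) of their constraints; no reasoning
        # about the meaning of col(i) is needed or attempted.
        assert self.tuple_vars == other.tuple_vars
        return SparseSet(self.tuple_vars,
                         self.constraints | other.constraints)

# {[i] : 0 <= i < n} intersected with {[i] : col(i) = j}
A = SparseSet(("i",), {"0 <= i", "i < n"})
B = SparseSet(("i",), {"col(i) = j"})
C = A.intersect(B)
```

The difficulty the paper addresses is that classical polyhedral tools assume purely affine constraints, so operations like intersection, composition, and application must be redefined to carry such uninterpreted terms through soundly.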
Vorapaxar in the secondary prevention of atherothrombotic events
BACKGROUND: Thrombin potently activates platelets through the protease-activated receptor PAR-1. Vorapaxar is a novel antiplatelet agent that selectively inhibits the cellular actions of thrombin through antagonism of PAR-1. METHODS: We randomly assigned 26,449 patients who had a history of myocardial infarction, ischemic stroke, or peripheral arterial disease to receive vorapaxar (2.5 mg daily) or matching placebo and followed them for a median of 30 months. The primary efficacy end point was the composite of death from cardiovascular causes, myocardial infarction, or stroke. After 2 years, the data and safety monitoring board recommended discontinuation of the study treatment in patients with a history of stroke owing to the risk of intracranial hemorrhage. RESULTS: At 3 years, the primary end point had occurred in 1028 patients (9.3%) in the vorapaxar group and in 1176 patients (10.5%) in the placebo group (hazard ratio for the vorapaxar group, 0.87; 95% confidence interval [CI], 0.80 to 0.94; P<0.001). Cardiovascular death, myocardial infarction, stroke, or recurrent ischemia leading to revascularization occurred in 1259 patients (11.2%) in the vorapaxar group and 1417 patients (12.4%) in the placebo group (hazard ratio, 0.88; 95% CI, 0.82 to 0.95; P=0.001). Moderate or severe bleeding occurred in 4.2% of patients who received vorapaxar and 2.5% of those who received placebo (hazard ratio, 1.66; 95% CI, 1.43 to 1.93; P<0.001). There was an increase in the rate of intracranial hemorrhage in the vorapaxar group (1.0%, vs. 0.5% in the placebo group; P<0.001). CONCLUSIONS: Inhibition of PAR-1 with vorapaxar reduced the risk of cardiovascular death or ischemic events in patients with stable atherosclerosis who were receiving standard therapy. However, it increased the risk of moderate or severe bleeding, including intracranial hemorrhage. (Funded by Merck; TRA 2P-TIMI 50 ClinicalTrials.gov number, NCT00526474.)