32,498 research outputs found
A Fast and Accurate Cost Model for FPGA Design Space Exploration in HPC Applications
Heterogeneous High-Performance Computing
(HPC) platforms present a significant programming challenge,
especially because the key users of HPC resources are scientists,
not parallel programmers. We contend that compiler technology
has to evolve to automatically create the best program variant
by transforming a given original program. We have developed a
novel methodology based on type transformations for generating
correct-by-construction design variants, and an associated
light-weight cost model for evaluating these variants for
implementation on FPGAs. In this paper we present a key
enabler of our approach, the cost model. We discuss how we
are able to quickly derive accurate estimates of performance
and resource-utilization from the design’s representation in our
intermediate language. We show results confirming the accuracy
of our cost model by testing it on three different scientific
kernels. We conclude with a case-study that compares a solution
generated by our framework with one from a conventional
high-level synthesis tool, showing better performance and
power-efficiency using our cost model based approach
From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation
Starting from a high-level problem description in terms of partial
differential equations using abstract tensor notation, the Chemora framework
discretizes, optimizes, and generates complete high performance codes for a
wide range of compute architectures. Chemora extends the capabilities of
Cactus, facilitating the usage of large-scale CPU/GPU systems in an efficient
manner for complex applications, without low-level code tuning. Chemora
achieves parallelism through MPI and multi-threading, combining OpenMP and
CUDA. Optimizations include high-level code transformations, efficient loop
traversal strategies, dynamically selected data and instruction cache usage
strategies, and JIT compilation of GPU code tailored to the problem
characteristics. The discretization is based on higher-order finite differences
on multi-block domains. Chemora's capabilities are demonstrated by simulations
of black hole collisions. This problem provides an acid test of the framework,
as the Einstein equations contain hundreds of variables and thousands of terms.Comment: 18 pages, 4 figures, accepted for publication in Scientific
Programmin
- …