2 research outputs found
Tools for improving performance portability in heterogeneous environments
Programa Oficial de Doutoramento en Investigación en Tecnoloxías da Información. 524V01
[Abstract]
Parallel computing is currently dominated in part by the availability of heterogeneous devices. These devices differ from each other in aspects such as the instruction set they execute, the number and type of computing units they offer, and the structure of their memory systems. In recent years, languages, libraries and extensions have appeared that allow parallel code to be written once and run on a wide variety of devices, OpenCL being the most widespread solution of this kind. However, functional portability does not imply performance portability. Thus, one of the problems that remains open in this field is achieving automatic performance portability, that is, the ability to automatically tune a given code for any device on which it will be executed so that it obtains good performance. This thesis develops three different solutions to tackle this problem. All three are based on typical source-to-source optimizations for heterogeneous devices. Both the set of optimizations to apply and the way they are applied depend on different optimization parameters, whose values have to be tuned for each specific device.
The first solution is OCLoptimizer, a source-to-source optimizer that optimizes annotated OpenCL kernels with the help of configuration files that guide the optimization process. The tool optimizes kernels for a specific device, and it can also automate the generation of functional host code when only a single kernel is optimized.
The two remaining solutions are built on top of the Heterogeneous Programming Library (HPL), a C++ framework that provides an easy and portable way to exploit heterogeneous computing systems. The first of these solutions uses the run-time code generation capabilities of HPL to generate a self-optimizing version of a matrix multiplication that can optimize itself at run-time for a specific device. The last solution is the development of a built-in just-in-time optimizer for HPL that can optimize, at run-time, an HPL code for a specific device. While the first two solutions use search processes to find the best values for the optimization parameters, this last alternative relies on heuristics based on general optimization strategies.
[Resumen]
Parallel computing is currently dominated in part by the many heterogeneous devices available. These devices differ from one another in characteristics such as the instruction set they execute, the number and type of computing units they include, and the structure of their memory systems. In recent years, languages, libraries and extensions have appeared that make it possible to write the parallel version of a code only once and run it on a wide range of devices, with OpenCL being the most widespread solution among them. However, functional portability does not imply performance portability. Thus, one of the major problems that remains open in this field is the automation of performance portability, that is, the ability to automatically adapt a given code for execution on any device and obtain good performance. This thesis addresses this problem by proposing three different solutions. All three are based on applying source-to-source optimizations commonly used on heterogeneous devices. Both the set of optimizations to apply and the way they are applied depend on several optimization parameters, whose values must be tuned for each specific device.
The first proposed solution is OCLoptimizer, a source-to-source optimizer that, starting from annotated OpenCL kernels and supporting configuration files, produces optimized versions of those kernels for a specific device. In addition, when there is only a single kernel to optimize, it automates the generation of functional host code for that kernel.
The other two solutions have been implemented using the Heterogeneous Programming Library (HPL), a C++ library that allows heterogeneous systems to be programmed in an easy and portable way. The first of these solutions exploits the run-time code generation capabilities of HPL to generate versions of a matrix product that adapt automatically, at run-time, to the characteristics of a specific device. The last solution consists of the development and incorporation into HPL of an on-the-fly optimizer, so that optimized versions of an HPL code can be obtained at run-time for a given device. While the first two solutions use search processes to find the best values for the optimization parameters, this last alternative relies on heuristics defined from general optimization recommendations.
[Resumo]
Parallel computing is currently dominated in part by the many heterogeneous devices available. These devices differ from one another in characteristics such as the instruction set they execute, the number and type of computing units they include, and the structure of their memory systems. In recent years, languages, libraries and extensions have appeared that make it possible to write the parallel version of a code only once and run it on a wide range of devices, with OpenCL being the most widespread solution among them. However, functional portability does not imply performance portability. Thus, one of the major problems that remains open in this field is the automation of performance portability, that is, the ability to automatically adapt a given code for execution on any device and obtain good performance. This thesis addresses this problem by proposing three different solutions. All three are based on applying source-to-source optimizations commonly used on heterogeneous devices. Both the set of optimizations to apply and the way they are applied depend on several optimization parameters, for which particular values must be set depending on the specific device.
The first proposed solution is OCLoptimizer, a source-to-source optimizer that, starting from annotated OpenCL kernels and supporting configuration files, produces optimized versions of those kernels for a specific device. In addition, when there is only a single kernel to optimize, it also automates the generation of functional host code for that kernel.
The other two solutions were implemented using the Heterogeneous Programming Library (HPL), a C++ library that allows heterogeneous systems to be programmed in an easy and portable way. The first of these solutions exploits the run-time code generation capabilities of HPL to generate versions of a matrix product that adapt automatically to the characteristics of a specific device. The last solution consists of the development and incorporation into HPL of an optimizer capable of obtaining, at run-time, optimized versions of an HPL code for a given device. While the first two solutions use search processes to find the best values for the optimization parameters, this last alternative relies on heuristics defined from general optimization recommendations.
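As a rough illustration of the search-based tuning that the abstract attributes to the first two solutions, the sketch below runs an exhaustive search over a small space of optimization parameters and keeps the cheapest configuration. It is not taken from OCLoptimizer or HPL: the parameter names (`tile`, `unroll`), the function `exhaustive_search`, and the synthetic cost model are all hypothetical, and in a real tuner the cost would presumably be the measured execution time of the generated kernel on the target device.

```python
from itertools import product


def exhaustive_search(param_space, cost):
    """Try every combination in param_space and return (best_config, best_cost)."""
    names = list(param_space)
    best_config, best_cost = None, float("inf")
    for values in product(*(param_space[name] for name in names)):
        config = dict(zip(names, values))
        current = cost(config)
        if current < best_cost:
            best_config, best_cost = config, current
    return best_config, best_cost


def synthetic_cost(config):
    # Assumption: stands in for timing a kernel on a device.
    # This made-up "device" prefers a tile size of 16 and an unroll factor of 4.
    return abs(config["tile"] - 16) + abs(config["unroll"] - 4)


# Hypothetical optimization parameters and candidate values.
space = {"tile": [4, 8, 16, 32], "unroll": [1, 2, 4, 8]}
best, cost_value = exhaustive_search(space, synthetic_cost)
print(best, cost_value)
```

Exhaustive search is only practical for small parameter spaces; the heuristic approach described for the third solution avoids the search altogether by choosing parameter values from general optimization rules.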
Vision 2040: A Roadmap for Integrated, Multiscale Modeling and Simulation of Materials and Systems
Over the last few decades, advances in high-performance computing, new materials characterization methods, and, more recently, an emphasis on integrated computational materials engineering (ICME) and additive manufacturing have been a catalyst for multiscale modeling and simulation-based design of materials and structures in the aerospace industry. While these advances have driven significant progress in the development of aerospace components and systems, that progress has been limited by persistent technology and infrastructure challenges that must be overcome to realize the full potential of integrated materials and systems design and simulation modeling throughout the supply chain. As a result, NASA's Transformational Tools and Technology (TTT) Project sponsored a study (performed by a diverse team led by Pratt & Whitney) to define the potential 25-year future state required for integrated multiscale modeling of materials and systems (e.g., load-bearing structures) to accelerate the pace and reduce the expense of innovation in future aerospace and aeronautical systems. This report describes the findings of this 2040 Vision study (e.g., the 2040 vision state; the required interdependent core technical work areas, or Key Elements (KEs); identified gaps and actions to close those gaps; and major recommendations). It constitutes a community consensus document, as it is the result of input from over 450 professionals obtained via: 1) four society workshops (AIAA, NAFEMS, and two TMS), 2) a community-wide survey, and 3) the establishment of 9 expert panels (one per KE), each consisting on average of 10 non-team members from academia, government and industry, to review and update content and to prioritize gaps and actions.
The study envisions the development of a cyber-physical-social ecosystem composed of experimentally verified and validated computational models, tools, and techniques, along with the associated digital tapestry, that impacts the entire supply chain to enable cost-effective, rapid, and revolutionary design of fit-for-purpose materials, components, and systems. Although the vision focused on aeronautics and space applications, it is believed that other engineering communities (e.g., automotive and biomedical) can also benefit from the proposed framework with only minor modifications. Finally, it is TTT's hope and desire that this vision provides strategic guidance to both public and private research and development decision makers to make the proposed 2040 vision state a reality and thereby provide a significant advancement in the global competitiveness of the United States.