
    Lichttransportsimulation auf Spezialhardware

    It cannot be denied that developments in computer hardware and in computer algorithms strongly influence each other, with new instructions added to help with video processing, encryption, and many other areas. At the same time, the current cap on single-threaded performance and the wide availability of multi-threaded processors have increased the focus on parallel algorithms. Both influences are extremely prominent in computer graphics, where the gaming and movie industries always strive for the best possible performance on current as well as future hardware. In this thesis we examine hardware-algorithm synergies in the context of ray tracing and Monte Carlo algorithms. First, we focus on the most basic element of all such algorithms, the casting of rays through a scene, and propose a dedicated hardware unit to accelerate this common operation. Then, we examine existing and novel implementations of many Monte Carlo rendering algorithms on massively parallel hardware, as full hardware utilization is essential for peak performance. Lastly, we present an algorithm for tackling complex interreflections of glossy materials, designed to utilize both of the powerful processing units present in almost all current computers: the Central Processing Unit (CPU) and the Graphics Processing Unit (GPU). Combined, these three parts show that it is always important to look at the hardware-algorithm mapping on all levels of abstraction: instruction, processor, and machine.
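    To make the ray-casting operation concrete, the sketch below shows a minimal ray-sphere intersection test in C, the kind of primitive such a dedicated hardware unit would accelerate. The types and function names are illustrative assumptions, not taken from the thesis.

        #include <math.h>
        #include <stdbool.h>

        /* Illustrative vector type; the thesis's actual hardware interface
         * is not reproduced here. */
        typedef struct { float x, y, z; } vec3;

        static float dot(vec3 a, vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
        static vec3  sub(vec3 a, vec3 b) { return (vec3){ a.x-b.x, a.y-b.y, a.z-b.z }; }

        /* Returns true and the nearest positive hit distance *t if the ray
         * o + t*d (d normalized) intersects the sphere (center, radius). */
        bool ray_sphere(vec3 o, vec3 d, vec3 center, float radius, float *t)
        {
            vec3  oc   = sub(o, center);
            float b    = dot(oc, d);
            float c    = dot(oc, oc) - radius * radius;
            float disc = b * b - c;            /* quarter discriminant */
            if (disc < 0.0f) return false;     /* ray misses the sphere */
            float t0 = -b - sqrtf(disc);       /* nearer root */
            if (t0 < 0.0f) t0 = -b + sqrtf(disc);
            if (t0 < 0.0f) return false;       /* sphere behind the origin */
            *t = t0;
            return true;
        }

    A renderer casts millions of such rays per frame, which is why moving this inner loop into dedicated hardware is attractive.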

    Optimising a fluid plasma turbulence simulation on modern high performance computers

    Nuclear fusion offers the potential of almost limitless energy from sea water and lithium, without the dangers of carbon emissions or long-term radioactive waste. At the forefront of fusion technology are tokamaks, toroidal magnetic confinement devices that contain miniature stars on Earth. Nuclei can only fuse by overcoming the strong electrostatic forces between them, which requires high temperatures and pressures. The temperatures in a tokamak are so great that the deuterium-tritium fusion fuel forms a plasma, which must be kept hot and under pressure to maintain the fusion reaction. Turbulence in the plasma causes disruption by transporting mass and energy away from the hot core, reducing the efficiency of the reaction. Understanding and controlling the mechanisms of plasma turbulence is key to building a fusion reactor capable of producing sustained output.

    The extreme temperatures make detailed empirical observations difficult to acquire, so numerical simulations are used as an additional method of investigation. One numerical model used to study turbulence and diffusion is CENTORI, a direct two-fluid magneto-hydrodynamic simulation of a tokamak plasma developed by the Culham Centre for Fusion Energy (CCFE, formerly UKAEA:Fusion). It simulates the entire tokamak plasma with realistic geometry, evolving bulk plasma quantities such as pressure, density and temperature through millions of timesteps. This requires CENTORI to run in parallel on a Massively Parallel Processing (MPP) supercomputer to produce results in an acceptable time. Any improvement in CENTORI's performance increases the rate and/or total number of results that can be obtained from a given allocation of supercomputer resources.

    This thesis presents a substantial effort to optimise CENTORI on the current generation of academic supercomputers. It investigates and reviews the properties of contemporary computer architectures, then proposes, implements and executes a benchmark suite of CENTORI's fundamental kernels. The suite is used to compare the performance of three competing memory layouts of the primary vector data structure, using a selection of compilers on a variety of computer architectures. The results show that no memory layout is optimal on all platforms, so a flexible strategy was adopted to pursue "portable" optimisation, i.e. optimisations that can easily be added, adapted or removed on future platforms depending on their performance. This required designing an interface of functions and datatypes that separates CENTORI's fundamental algorithms from repetitive, low-level implementation details. This approach offered multiple benefits: the clearer representation of CENTORI's core equations as mathematical expressions in Fortran source code allows rapid prototyping and development of new features; the reduction of the total data volume by a factor of three cuts the amount of data transferred over the memory bus to almost a third; and the smaller number of intense floating-point kernels reduces the effort of optimising the application on new platforms.

    The project then rewrites CENTORI using the new Application Programming Interface (API) and evaluates two optimised implementations. The first is a traditional library implementation that uses hand-optimised subroutines to implement the library functions. The second uses a dynamic optimisation engine to perform automatic stripmining, improving the use of the memory hierarchy. The automatic stripmining implementation uses lazy evaluation to delay calculations until absolutely necessary, allowing it to identify temporary data structures and minimise them for optimal cache use. This novel technique, combined with highly optimised implementations of the kernel operations and optimised parallel communication routines, produces a significant improvement in CENTORI's performance. The maximum measured speed-ups of the optimised versions over the original code were 3.4 times on 128 processors of HPCx, 2.8 times on 1024 processors of HECToR and 2.3 times on 256 processors of HPC-FF.
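    As a rough illustration of the stripmining idea (a sketch only; CENTORI's kernels are written in Fortran and its actual API is not shown in the abstract), the C function below evaluates the vector expression d = a*b + c in cache-sized strips, so the temporary produced by the multiplication never grows beyond one block and stays cache resident:

        #include <stddef.h>

        #define STRIP 1024   /* strip length; tuned per platform in practice */

        /* Hypothetical example: compute d = a*b + c over n elements in strips.
         * A naive whole-array evaluation would stream a full-length temporary
         * through main memory; here the temporary t is reused per strip and
         * stays in cache. */
        void fused_mul_add(const double *a, const double *b, const double *c,
                           double *d, size_t n)
        {
            double t[STRIP];                        /* cache-resident temporary */
            for (size_t base = 0; base < n; base += STRIP) {
                size_t len = (n - base < STRIP) ? (n - base) : STRIP;
                for (size_t i = 0; i < len; i++)    /* strip of a*b */
                    t[i] = a[base + i] * b[base + i];
                for (size_t i = 0; i < len; i++)    /* add c, write result */
                    d[base + i] = t[i] + c[base + i];
            }
        }

    Lazy evaluation, as described above, lets the optimisation engine discover such temporaries automatically instead of requiring them to be hand-coded.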

    Proceedings, MSVSCC 2015

    The Virginia Modeling, Analysis and Simulation Center (VMASC) of Old Dominion University hosted the 2015 Modeling, Simulation & Visualization Student Capstone Conference on April 16th. The Capstone Conference features students in Modeling and Simulation undergraduate and graduate degree programs and related fields from many colleges and universities. Students present their research to an audience of fellow students, faculty, judges, and other distinguished guests. For the students, these presentations afford the opportunity to impart their innovative research to members of the M&S community from academic, industry, and government backgrounds. Also participating in the conference are faculty and judges who have volunteered their time to directly support their students' research, facilitate the various conference tracks, serve as judges for each of the tracks, and provide overall assistance to the conference. 2015 marks the ninth year of the VMASC Capstone Conference for Modeling, Simulation and Visualization. This year the conference attracted a number of fine student-written papers and presentations, resulting in a total of 51 research works that were presented. This year's conference had record attendance thanks to the support from the various departments at Old Dominion University, other local universities, and the United States Military Academy at West Point. We greatly appreciate all of the work and energy that went into this year's conference; it truly was a highly collaborative effort that resulted in a very successful symposium for the M&S community and all of those involved. Below you will find a brief summary of the best papers and best presentations, with some simple statistics of the overall conference contributions, followed by a table of contents broken down by conference track category, with a copy of each included body of work. Thank you again for your time and your contribution, as this conference is designed to continuously evolve and adapt to better suit the authors and M&S supporters. Dr. Yuzhong Shen, Graduate Program Director, MSVE, Capstone Conference Chair. John Shull, Graduate Student, MSVE, Capstone Conference Student Chair.

    High-level compiler analysis for OpenMP

    Nowadays, applications from dissimilar domains, such as high-performance computing and high-integrity systems, require levels of performance that can only be achieved by means of sophisticated heterogeneous architectures. However, the complex nature of such architectures hinders the production of efficient code at acceptable levels of time and cost. Moreover, the need for exploiting parallelism adds complications of its own (e.g., deadlocks and race conditions). In this context, compiler analysis is fundamental for optimizing parallel programs. There is, however, a trade-off between complexity and profit: low-complexity analyses (e.g., reaching definitions) provide information that may be insufficient for many relevant transformations, while complex analyses based on mathematical representations (e.g., the polyhedral model) give accurate results at a high computational cost.

    A range of parallel programming models providing different levels of programmability, performance and portability enable the exploitation of current architectures. However, OpenMP has proved to have many advantages over its competitors: 1) it delivers levels of performance comparable to highly tunable models such as CUDA and MPI, and better robustness than low-level libraries such as Pthreads; 2) the extensions included in the latest specification meet the characteristics of current heterogeneous architectures (i.e., the coupling of a host processor to one or more accelerators, and the capability of expressing fine-grained, both structured and unstructured, and highly dynamic task parallelism); 3) OpenMP is widely implemented by several chip (e.g., Kalray MPPA, Intel) and compiler (e.g., GNU, Intel) vendors; and 4) although the model currently lacks resiliency and reliability mechanisms, many works, including this thesis, pursue their introduction into the specification.

    This thesis addresses the study of compiler analysis techniques for OpenMP with two main purposes: 1) to enhance the programmability and reliability of OpenMP, and 2) to prove OpenMP a suitable model for exploiting parallelism in safety-critical domains. In particular, the thesis focuses on the tasking model because it offers the flexibility to tackle the parallelization of algorithms with load imbalance, recursiveness and uncountable loop-based kernels. Additionally, recent works have proved the time-predictability of this model, shortening the distance towards its introduction in safety-critical domains.

    To enable the analysis of applications using the OpenMP tasking model, the first contribution of this thesis is the extension of a set of classic compiler techniques with support for OpenMP. As a basis for including reliability mechanisms, the second contribution consists of the development of a series of algorithms to statically detect situations involving OpenMP tasks which may lead to a loss of performance, non-deterministic results or run-time failures. A well-known compiler-related problem of parallel processing is the static scheduling of a program represented by a directed graph. Although the literature on static scheduling techniques is extensive, work related to the generation of the task graph at compile time is very scant; compilers are limited by the knowledge they can extract, which depends on the application and the programming model. The third contribution of this thesis is the generation of a predicated task dependency graph for OpenMP that can be interpreted by the runtime in such a way that the cost of solving dependences is reduced to a minimum. With the previous contributions as a basis for determining the functional safety of OpenMP, the final contribution of this thesis is the adaptation of OpenMP to the safety-critical domain in two directions: 1) indicating how OpenMP can be safely used in such a domain, and 2) integrating OpenMP into Ada, a language widely used in safety-critical domains.
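    For readers unfamiliar with the tasking model the thesis builds on, the hypothetical C example below (not taken from the thesis) shows OpenMP tasks ordered through depend clauses; these dependences are exactly the information a compile-time task dependency graph must capture:

        #include <stdio.h>

        /* Two producer tasks and one consumer task, ordered by the OpenMP
         * runtime through the dependences on x and y. Build with an
         * OpenMP-capable compiler, e.g. gcc -fopenmp. */
        int main(void)
        {
            int x = 0, y = 0;   /* shared in the parallel region below */

            #pragma omp parallel
            #pragma omp single
            {
                #pragma omp task depend(out: x)     /* producer of x */
                x = 1;

                #pragma omp task depend(out: y)     /* producer of y */
                y = 2;

                #pragma omp task depend(in: x, y)   /* runs after both producers */
                printf("x + y = %d\n", x + y);
            }   /* implicit barrier: all tasks complete here */
            return 0;
        }

    Statically detecting problems in such code, e.g. a missing depend clause that makes the consumer race with a producer, is the kind of analysis the second contribution targets.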

    Understanding Quantum Technologies 2022

    Understanding Quantum Technologies 2022 is a creative-commons ebook that provides a unique 360-degree overview of quantum technologies, from science and technology to geopolitical and societal issues. It covers quantum physics history, quantum physics 101, gate-based quantum computing, quantum computing engineering (including quantum error correction and quantum computing energetics), quantum computing hardware (all qubit types, including the quantum annealing and quantum simulation paradigms, with their history, science, research, implementation and vendors), quantum enabling technologies (cryogenics, control electronics, photonics, component fabs, raw materials), quantum computing algorithms, software development tools and use cases, unconventional computing (potential alternatives to quantum and classical computing), quantum telecommunications and cryptography, quantum sensing, quantum technologies around the world, the societal impact of quantum technologies, and even quantum fake sciences. The main audience is computer science engineers, developers and IT specialists, as well as quantum scientists and students who want to acquire a global view of how quantum technologies work, particularly quantum computing. This version is an extensive update to the 2021 edition published in October 2021. Comment: 1132 pages, 920 figures, Letter format.

    Proceedings of the 19th Sound and Music Computing Conference

    Proceedings of the 19th Sound and Music Computing Conference - June 5-12, 2022 - Saint-Étienne (France). https://smc22.grame.f