8 research outputs found

    High performance cluster computing with 3-D nonlinear diffusion filters

    This paper deals with the parallelisation and implementation of PDE-based image processing models for large cluster environments with distributed memory. As an example we focus on nonlinear diffusion filtering, which we discretise by means of an additive operator splitting (AOS). We start by decomposing the algorithm into small modules that are to be parallelised separately. For this purpose, image partitioning strategies are discussed and their impact on the communication pattern and volume is analysed. Based on the results, we develop an algorithmic implementation with excellent scaling properties on massively connected low-latency networks. Test runs on a high-end Myrinet cluster yield almost linear speedup factors of up to 209 on 256 processors. This results in typical denoising times of 0.5 seconds for five iterations on a 256 × 256 × 128 data cube.
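The AOS scheme mentioned above can be sketched as follows. This is a minimal serial illustration in 2-D, not the authors' cluster code: it assumes unit grid spacing, reflecting borders and the Perona-Malik diffusivity g(s²) = 1/(1 + s²/λ²), and all function names are hypothetical. Each step computes u_new = (1/m) Σ_l (I − m·τ·A_l(u))⁻¹ u with m = 2 axes (the paper's 3-D setting would use m = 3); every matrix (I − m·τ·A_l) is tridiagonal and is solved line by line with the Thomas algorithm. Because each row and each column is an independent solve, this is the kind of module that an image partitioning strategy distributes across processors.

```python
# Minimal 2-D AOS sketch for Perona-Malik-type nonlinear diffusion.
# Assumptions (not from the paper): unit grid spacing, reflecting borders,
# Perona-Malik diffusivity. The paper works in 3-D, where m = 3.

def perona_malik_g(s2, lam):
    """Diffusivity g(|grad u|^2) = 1 / (1 + |grad u|^2 / lam^2)."""
    return 1.0 / (1.0 + s2 / (lam * lam))


def thomas(lower, diag, upper, rhs):
    """Solve a tridiagonal system; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def diffuse_line(u, g, step):
    """Solve (I - step * A_l) x = u along one grid line (step = m * tau)."""
    n = len(u)
    lower, diag, upper = [0.0] * n, [1.0] * n, [0.0] * n
    for i in range(n - 1):
        w = step * 0.5 * (g[i] + g[i + 1])  # diffusivity between i and i+1
        diag[i] += w
        diag[i + 1] += w
        upper[i] = -w
        lower[i + 1] = -w
    return thomas(lower, diag, upper, u)


def diffusivities(u, lam):
    """Central-difference gradients with clamped borders, mapped through g."""
    rows, cols = len(u), len(u[0])
    g = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dy = (u[min(r + 1, rows - 1)][c] - u[max(r - 1, 0)][c]) / 2.0
            dx = (u[r][min(c + 1, cols - 1)] - u[r][max(c - 1, 0)]) / 2.0
            g[r][c] = perona_malik_g(dx * dx + dy * dy, lam)
    return g


def aos_step(u, tau, lam):
    """One AOS step: average of m independent 1-D semi-implicit solves."""
    rows, cols = len(u), len(u[0])
    m = 2  # number of axes in this 2-D sketch
    g = diffusivities(u, lam)
    # Every row, and then every column, is an independent tridiagonal solve,
    # which is what makes the scheme easy to distribute.
    ux = [diffuse_line(u[r], g[r], m * tau) for r in range(rows)]
    uy = [diffuse_line([u[r][c] for r in range(rows)],
                       [g[r][c] for r in range(rows)], m * tau)
          for c in range(cols)]
    return [[(ux[r][c] + uy[c][r]) / m for c in range(cols)]
            for r in range(rows)]
```

Because each 1-D system in this discretisation is symmetric with unit row sums, a step preserves the average grey value and keeps all values within the input range, even for large time steps.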

    DCT Implementation on GPU

    There has been great progress in the field of graphics processors. Since the clock speed of conventional CPUs is no longer rising, designers are turning to multi-core, parallel processors. Because of their popularity for parallel processing, GPUs are becoming increasingly attractive for many applications. With the growing demand for GPU computing, there is a great need to develop operating systems that can use the GPU to its full capacity. GPUs offer a very efficient environment for many image processing applications. This thesis explores the processing power of GPUs for digital image compression using the discrete cosine transform (DCT).
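A direct (non-fast) orthonormal 2-D DCT-II can be sketched as below. This is a CPU reference assuming JPEG-style 8×8 blocks, not the thesis's GPU code, and the helper names are hypothetical. Every output coefficient X[u][v] is computed independently of the others, and the 2-D transform is separable into row and column passes, which is why it maps naturally onto many parallel GPU threads.

```python
import math


def _scale(k, n):
    # Orthonormal DCT-II normalisation factor
    return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)


def dct_1d(x):
    n = len(x)
    return [_scale(k, n) * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i in range(n))
            for k in range(n)]


def idct_1d(X):
    # Inverse of the orthonormal DCT-II (i.e. the DCT-III)
    n = len(X)
    return [sum(_scale(k, n) * X[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]


def _transpose(m):
    return [list(col) for col in zip(*m)]


def _separable(block, f):
    # Apply a 1-D transform to every row, then to every column
    rows = [f(r) for r in block]
    return _transpose([f(c) for c in _transpose(rows)])


def dct_2d(block):
    return _separable(block, dct_1d)


def idct_2d(coeffs):
    return _separable(coeffs, idct_1d)
```

For example, a constant 8×8 block of value 10 transforms to a single DC coefficient of 80 with all other coefficients (numerically) zero; compression then comes from quantising and discarding small high-frequency coefficients.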

    Partial Differential Equations Parallel Solutions

    This thesis deals with partial differential equations, for whose solution a specialised numerical integrator processing operands in floating-point format is proposed. The designs are based on the principles of the Euler method and on processing several terms of the Taylor series. The thesis compares parallel and serial approaches to processing significands and exponents during numerical integration. It also compares the specialised numerical integrator with available parallel systems.
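The two numerical schemes named above, the Euler method and a truncated Taylor series, can be sketched for a linear system y' = Ay, which is what a PDE becomes after spatial discretisation (the method of lines). This is a software illustration in ordinary double precision under stated assumptions: it does not model the thesis's integrator or its parallel handling of significands and exponents, and the 1-D heat equation test problem and all function names are hypothetical. With order = 1 the Taylor step reduces to explicit Euler.

```python
def matvec(A, y):
    return [sum(a * b for a, b in zip(row, y)) for row in A]


def taylor_step(A, y, h, order):
    """y_next = sum_{j=0}^{order} (h A)^j y / j!; order = 1 is explicit Euler."""
    result = list(y)
    term = list(y)
    for j in range(1, order + 1):
        # Next Taylor term (hA)^j y / j! from the previous one
        term = [h * t / j for t in matvec(A, term)]
        result = [r + t for r, t in zip(result, term)]
    return result


def heat_matrix(n, dx):
    """Second-difference matrix for u_t = u_xx with zero Dirichlet ends."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] = -2.0 / dx ** 2
        if i > 0:
            A[i][i - 1] = 1.0 / dx ** 2
        if i < n - 1:
            A[i][i + 1] = 1.0 / dx ** 2
    return A
```

For this heat matrix, explicit Euler stays stable only for h ≤ Δx²/2; adding Taylor terms raises the order of accuracy per step at the cost of extra matrix-vector products.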

    Improvements to physically based cloth simulation

    Physically based cloth simulation in computer graphics has come a long way since the 1980s. Although extensive methods have been developed, physically based cloth animation remains challenging in a number of aspects, including the efficient simulation of complex internal dynamics, better performance, and the generation of a wider range of frictional effects in collisions, to name but a few. These opportunities motivate the work presented in this thesis, which improves on the current state of the art in cloth simulation by proposing methods for cloth bending deformation simulation, collision detection and friction in collision response. The structure of the thesis is as follows. Chapter 2 reviews work related to physically based cloth simulation, including internal dynamics, collision handling and GPU computing for cloth simulation. To provide a basis for understanding the subsequent chapters, Chapter 3 describes and discusses the main components of our physically based cloth simulation framework, on which the methods presented in the following chapters build. Chapter 4 presents an approach that effectively models non-linear features of cloth bending behaviour, such as energy dissipation, plasticity and fatigue weakening. This is achieved by a simple mathematical approximation to an ideal hysteresis loop at a high level, whereas textile research computes bending non-linearity using complex internal friction models at the level of the geometric structure. Because of cloth's flexibility and the large number of triangles, collision detection is the most time-consuming task in a robust cloth system. The approach proposed in Chapter 5 improves the performance of collision detection using a GPU-based approach employing spatial subdivision. It addresses a common issue, uneven triangle sizes, which can easily impair the efficiency of spatial subdivision.
To achieve this, a virtual subdivision scheme with a uniform grid is used to virtually subdivide large triangles, resulting in a more appropriate cell size and thus a more efficient subdivision. The other common issue that limits subdivision efficiency is an uneven spatial distribution of triangles, which is difficult to tackle with uniform grids because areas with different triangle densities may require different cell sizes. To address this problem, Chapter 6 shows how to build an octree grid on a GPU that adaptively partitions space according to the spatial distribution of triangles, which further improves the performance of collision detection. Friction is an important component of collision response. Frictional effects include velocity-dependent phenomena, such as stiction, Stribeck friction, viscous friction and the stick-slip phenomenon, which are not modelled by the classic Coulomb friction model adopted by existing cloth systems. Chapter 7 reports a more comprehensive friction model that captures these additional effects. Chapter 8 concludes the thesis and briefly discusses potential avenues for future work.
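The uniform-grid spatial subdivision that Chapter 5 starts from can be sketched as a broad phase that bins triangle bounding boxes into grid cells and reports pairs that share a cell and whose boxes overlap. This is a minimal serial sketch under stated assumptions: it uses axis-aligned bounding boxes, one fixed cell size and hypothetical function names, and it implements neither the thesis's virtual subdivision of large triangles nor its GPU octree. A triangle much larger than the cell size is registered in many cells, which illustrates why uneven triangle sizes hurt efficiency.

```python
import math
from collections import defaultdict
from itertools import combinations


def aabb(tri):
    """Axis-aligned bounding box of a triangle given as three (x, y, z) points."""
    xs, ys, zs = zip(*tri)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def boxes_overlap(a, b):
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[k] <= bhi[k] and blo[k] <= ahi[k] for k in range(3))


def broad_phase(triangles, cell_size):
    """Return candidate pairs (i, j), i < j, whose bounding boxes overlap."""
    boxes = [aabb(t) for t in triangles]
    grid = defaultdict(list)  # cell index (i, j, k) -> triangle ids
    for tid, (lo, hi) in enumerate(boxes):
        # Register the triangle in every cell its box touches; a triangle much
        # larger than cell_size lands in many cells (the uneven-size issue).
        ranges = [range(math.floor(lo[k] / cell_size),
                        math.floor(hi[k] / cell_size) + 1) for k in range(3)]
        for i in ranges[0]:
            for j in ranges[1]:
                for k in ranges[2]:
                    grid[(i, j, k)].append(tid)
    pairs = set()  # a set removes pairs found in several shared cells
    for members in grid.values():
        # ids were appended in increasing order, so a < b in every pair
        for a, b in combinations(members, 2):
            if boxes_overlap(boxes[a], boxes[b]):
                pairs.add((a, b))
    return pairs
```

In a real cloth mesh, adjacent triangles share vertices and would always be reported, so a practical system would also filter such pairs before the exact triangle-triangle tests.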

    Fast and accurate finite-element multigrid solvers for PDE simulations on GPU clusters

    The main contribution of this thesis is to demonstrate that graphics processors (GPUs), as representatives of emerging many-core architectures, are very well suited to the fast and accurate solution of large sparse linear systems of equations, using parallel multigrid methods on heterogeneous compute clusters. Such systems arise, for instance, in the discretisation of (elliptic) partial differential equations with finite elements. We report speedups of at least one order of magnitude over highly tuned conventional CPU implementations, without sacrificing either accuracy or functionality.
In more detail, this thesis includes the following contributions: Single precision floating point computations may be insufficient for the class of problems considered in this thesis. We revisit mixed precision iterative refinement techniques to not only increase the accuracy of computed results, but also to increase the efficiency of the solution process. Both on CPUs and on GPUs, we demonstrate a significant performance improvement without loss of accuracy compared to computing in high precision only. We present efficient parallelisation techniques for multigrid solvers on graphics hardware, in particular for numerically strong smoothers and preconditioners that are suitable for highly anisotropic grids and operators. For instance, an efficient formulation of the cyclic reduction algorithm to solve tridiagonal systems is developed. In view of hardware-oriented numerics, we carefully analyse the trade-off between numerical and runtime performance for inexact parallelisation techniques that decouple some of the inherently sequential characteristics of strong smoothing operators. For large-scale established software frameworks, the re-implementation tailored to novel hardware platforms is often prohibitively expensive. We develop a 'minimally invasive' approach to integrate support for co-processor hardware like GPUs into FEAST, a finite element discretisation and solver toolbox. Our technique has the major advantage that applications built on top of the toolbox do not have to be changed at all to benefit from co-processor acceleration. The approach is evaluated for benchmark problems in linearised elasticity and stationary laminar flow computed on large-scale GPU-enhanced clusters. Good speedup factors and near-ideal weak scalability are observed. The achievable speedup is analysed and a theoretical speedup model is presented. 
Finally, we provide a historical overview of scientific computing on graphics hardware from its early beginnings in 2001/2002, when GPGPU was an obscure research topic pursued by a few, to its widespread adoption today. We discuss the evolution of the hardware and the programming model, and provide a comprehensive bibliography of publications related to PDE simulations on GPUs.
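Mixed-precision iterative refinement, as described above, can be sketched as: solve in low precision, compute the residual in high precision, solve for a correction in low precision, and update the solution in high precision. In this illustration single precision is only emulated by rounding every operation through Python's struct module, the inner solver is plain Gaussian elimination rather than a GPU multigrid solver, and the function names are hypothetical.

```python
import struct


def f32(x):
    # Round a Python float (binary64) to the nearest binary32 value
    return struct.unpack('f', struct.pack('f', x))[0]


def solve_f32(A, b):
    """Gaussian elimination with partial pivoting, every result rounded to f32."""
    n = len(b)
    M = [[f32(v) for v in row] + [f32(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = f32(M[r][col] / M[col][col])
            for c in range(col, n + 1):
                M[r][c] = f32(M[r][c] - f32(factor * M[col][c]))
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = M[r][n]
        for c in range(r + 1, n):
            s = f32(s - f32(M[r][c] * x[c]))
        x[r] = f32(s / M[r][r])
    return x


def refine(A, b, iterations=5):
    x = solve_f32(A, b)  # low-precision initial solution
    for _ in range(iterations):
        # Residual in double precision: the essential high-precision step
        r = [bi - sum(a * xi for a, xi in zip(row, x)) for row, bi in zip(A, b)]
        d = solve_f32(A, r)  # low-precision correction
        x = [xi + di for xi, di in zip(x, d)]  # update in double precision
    return x
```

For brevity, solve_f32 refactorises the matrix on every call; a practical solver would reuse the low-precision factorisation or run a low-precision multigrid cycle as the inner solver. For a well-conditioned system, a few refinement steps bring the single-precision solution to double-precision accuracy.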

    Nonlinear Diffusion in Graphics Hardware

    Multiscale methods have proved to be successful tools in image denoising, edge enhancement and shape recovery. They are based on the numerical solution of a nonlinear diffusion problem in which a noisy or damaged image that has to be smoothed or restored is taken as the initial data. Here a novel approach is presented which will soon be able to deliver real-time performance for these methods. It is based on an implementation of a corresponding finite element scheme in the texture hardware of modern graphics engines. The method treats vectors as textures and represents linear algebra operations as texture processing operations. Thus, the resulting performance can profit from the superior bandwidth and the built-in parallelism of the graphics hardware. Here the concept of this approach is introduced and perspectives are outlined using the basic Perona-Malik model on 2D images.
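The Perona-Malik model mentioned above can be illustrated with an explicit step in which each pixel update depends only on its four neighbours, the kind of per-pixel operation that maps onto texture processing. Note the substitution: the paper uses a finite element scheme on graphics hardware, whereas this CPU sketch uses finite differences with reflecting borders and the diffusivity g(s²) = 1/(1 + s²/λ²); the function name and the parameters tau and lam are hypothetical.

```python
def perona_malik_step(u, tau=0.2, lam=10.0):
    """One explicit finite-difference Perona-Malik step on a 2-D image."""
    rows, cols = len(u), len(u[0])
    out = [row[:] for row in u]
    for i in range(rows):
        for j in range(cols):
            flux = 0.0
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:  # no flux across borders
                    diff = u[ni][nj] - u[i][j]
                    # g(diff^2) * diff: small differences diffuse, large edges barely move
                    flux += diff / (1.0 + (diff / lam) ** 2)
            out[i][j] = u[i][j] + tau * flux
    return out
```

With four neighbours and g ≤ 1, choosing tau ≤ 0.25 keeps each update a convex combination of neighbouring values, so the explicit step cannot create new extrema, while noise with contrast well below lam is smoothed much faster than edges with contrast well above it.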