
    Transformations of High-Level Synthesis Codes for High-Performance Computing

    Specialized hardware architectures promise a major step in performance and energy efficiency over the traditional load/store devices currently employed in large-scale computing systems. The adoption of high-level synthesis (HLS) from languages such as C/C++ and OpenCL has greatly increased programmer productivity when designing for such platforms. While this has enabled a wider audience to target specialized hardware, the optimization principles known from traditional software design are no longer sufficient to implement high-performance codes. Fast and efficient codes for reconfigurable platforms are thus still challenging to design. To alleviate this, we present a set of optimizing transformations for HLS, targeting scalable and efficient architectures for high-performance computing (HPC) applications. Our work provides a toolbox for developers, where we systematically identify classes of transformations, the characteristics of their effect on the HLS code and the resulting hardware (e.g., increased data reuse or resource consumption), and the objectives that each transformation can target (e.g., resolving interface contention or increasing parallelism). We show how these can be used to efficiently exploit pipelining, on-chip distributed fast memory, and on-chip streaming dataflow, allowing for massively parallel architectures. To quantify the effect of our transformations, we use them to optimize a set of throughput-oriented FPGA kernels, demonstrating that our enhancements are sufficient to scale up parallelism within the hardware constraints. With the transformations covered, we hope to establish a common framework for performance engineers, compiler developers, and hardware developers to tap into the performance potential offered by specialized hardware architectures using HLS.
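    As a hedged illustration of the kind of transformation described here, the sketch below shows a 1-D sliding-window kernel in HLS-style C++ with Vitis-HLS-like pragmas; the kernel name, interface, and coefficients are assumptions, not the paper's code. Buffering the window in registers provides on-chip data reuse, and the PIPELINE pragma requests an initiation interval of one so a new input is consumed every cycle.

```cpp
// Hypothetical kernel (name, interface, and coefficients are assumptions).
// ARRAY_PARTITION keeps the window in registers (on-chip reuse);
// PIPELINE II=1 asks the tool to accept one new input element per cycle.
extern "C" void stencil3(const float* in, float* out, int n) {
    float window[3];
#pragma HLS ARRAY_PARTITION variable=window complete
    for (int i = 0; i < n; ++i) {
#pragma HLS PIPELINE II=1
        // Shift register: each input is read from external memory exactly once.
        window[0] = window[1];
        window[1] = window[2];
        window[2] = in[i];
        if (i >= 2)
            out[i - 1] = 0.25f * window[0] + 0.5f * window[1] + 0.25f * window[2];
    }
}
```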

    Scaling non-regular shared-memory codes by reusing custom loop schedules

    In this paper we explore the idea of customizing and reusing loop schedules to improve the scalability of non-regular numerical codes in shared-memory architectures with non-uniform memory access latency. The main objective is to implicitly set up affinity links between threads and data, by devising loop schedules that achieve balanced work distribution within irregular data spaces and reusing them as much as possible along the execution of the program for better memory access locality. This transformation provides a great deal of flexibility in optimizing locality, without compromising the simplicity of the shared-memory programming paradigm. In particular, the programmer does not need to explicitly distribute data between processors. The paper presents practical examples from real applications and experiments showing the efficiency of the approach.
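    The C++/OpenMP sketch below is an assumption, not the paper's implementation: it illustrates the idea of building a balanced schedule once from per-iteration cost estimates and then reusing it across time steps, so each thread keeps touching the same data and NUMA locality is preserved. The names Schedule, build_schedule, and run_with_schedule are hypothetical.

```cpp
#include <omp.h>
#include <vector>

// A reusable schedule: for each thread, the list of loop indices it owns.
using Schedule = std::vector<std::vector<int>>;

// Build a balanced schedule once, from per-iteration work estimates
// (e.g., nonzeros per row of a sparse matrix).
Schedule build_schedule(const std::vector<int>& cost, int nthreads) {
    Schedule sched(nthreads);
    long long total = 0;
    for (int c : cost) total += c;
    long long target = (total + nthreads - 1) / nthreads, acc = 0;
    int t = 0;
    for (int i = 0; i < (int)cost.size(); ++i) {
        sched[t].push_back(i);
        acc += cost[i];
        if (acc >= target && t + 1 < nthreads) { acc = 0; ++t; }
    }
    return sched;
}

// Reuse the same schedule every time step: each thread keeps processing
// the same indices, so the data it first touched stays local to its node.
template <class Body>
void run_with_schedule(const Schedule& sched, int nsteps, Body body) {
    for (int step = 0; step < nsteps; ++step) {
        #pragma omp parallel num_threads((int)sched.size())
        {
            int tid = omp_get_thread_num();
            for (int idx : sched[tid]) body(idx);
        }
    }
}
```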

    Evaluating Component Assembly Specialization for 3D FFT

    The Fast Fourier Transform (FFT) is a widely-used building block for many high-performance scientific applications. Efficient computation of the FFT is paramount for the performance of these applications. This has led to many efforts to implement machine- and computation-specific optimizations. However, no existing FFT library is capable of easily integrating and automating the selection of new and/or unique optimizations. To ease FFT specialization, this paper evaluates the use of component-based software engineering, a programming paradigm which consists in building applications by assembling small software units. Component models are known to have many software engineering benefits but usually have insufficient performance for high-performance scientific applications. This paper uses the L2C model, a general purpose high-performance component model, and studies its performance and adaptation capabilities on 3D FFTs. Experiments show that L2C, and components in general, enable easy handling of 3D FFT specializations while obtaining performance comparable to that of well-known libraries. However, a higher-level component model is needed to automatically generate an adequate L2C assembly.
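    The sketch below is a plain-C++ illustration of component assembly, not the L2C model or its API: a 1-D transform component behind a small interface is assembled into a 3-D transform by applying it along each axis, and specialization amounts to swapping the component (here a naive O(n^2) DFT stands in for a tuned FFT).

```cpp
#include <cmath>
#include <complex>
#include <memory>
#include <vector>

using cd = std::complex<double>;

// Component interface: a 1-D transform over n points spaced `stride` apart.
struct Transform1D {
    virtual void forward(cd* data, int n, int stride) const = 0;
    virtual ~Transform1D() = default;
};

// One possible component: a naive O(n^2) DFT, easy to swap for a tuned one.
struct NaiveDft : Transform1D {
    void forward(cd* data, int n, int stride) const override {
        const double PI = std::acos(-1.0);
        std::vector<cd> out(n);
        for (int k = 0; k < n; ++k) {
            cd acc{0.0, 0.0};
            for (int j = 0; j < n; ++j) {
                double ang = -2.0 * PI * k * j / n;
                acc += data[j * stride] * cd{std::cos(ang), std::sin(ang)};
            }
            out[k] = acc;
        }
        for (int k = 0; k < n; ++k) data[k * stride] = out[k];
    }
};

// Assembly: a 3-D transform built by applying the 1-D component along each
// axis of an n*n*n cube stored in row-major (x, y, z) order.
struct Fft3D {
    explicit Fft3D(std::unique_ptr<Transform1D> t) : t_(std::move(t)) {}

    void forward(std::vector<cd>& cube, int n) const {
        for (int x = 0; x < n; ++x)              // z lines: stride 1
            for (int y = 0; y < n; ++y)
                t_->forward(&cube[(x * n + y) * n], n, 1);
        for (int x = 0; x < n; ++x)              // y lines: stride n
            for (int z = 0; z < n; ++z)
                t_->forward(&cube[x * n * n + z], n, n);
        for (int y = 0; y < n; ++y)              // x lines: stride n*n
            for (int z = 0; z < n; ++z)
                t_->forward(&cube[y * n + z], n, n * n);
    }

    std::unique_ptr<Transform1D> t_;
};
```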

    Multi-partitioning for ADI-schemes on message passing architectures

    A kind of discrete-operator splitting called Alternating Direction Implicit (ADI) has been found to be useful in simulating fluid flow problems. In particular, it is being used to study the effects of hot exhaust jets from high-performance aircraft on landing surfaces. Decomposition techniques that minimize load imbalance and message-passing frequency are described. Three strategies that are investigated for implementing the NAS Scalar Penta-diagonal Parallel Benchmark (SP) are transposition, pipelined Gaussian elimination, and multipartitioning. The multipartitioning strategy, which was used on Ethernet, was found to be the most efficient, although it was considered only a moderate success because of Ethernet's limited communication properties. The efficiency derived largely from the coarse granularity of the strategy, which reduced latencies and allowed overlap of communication and computation.
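    For context, the per-line work that an ADI sweep distributes is a batch of independent banded solves along one grid direction; the decomposition strategies above differ only in how these lines are mapped to processors. The NAS SP benchmark uses scalar penta-diagonal systems, but the hedged sketch below shows the simpler tridiagonal Thomas solve as a stand-in for one such line solve.

```cpp
#include <vector>

// Thomas algorithm: solve a_i x_{i-1} + b_i x_i + c_i x_{i+1} = d_i for a
// single grid line. An ADI sweep performs many such independent solves
// along one direction before switching to the next direction.
std::vector<double> thomas(std::vector<double> a, std::vector<double> b,
                           std::vector<double> c, std::vector<double> d) {
    const int n = (int)d.size();
    for (int i = 1; i < n; ++i) {          // forward elimination
        double w = a[i] / b[i - 1];
        b[i] -= w * c[i - 1];
        d[i] -= w * d[i - 1];
    }
    std::vector<double> x(n);
    x[n - 1] = d[n - 1] / b[n - 1];
    for (int i = n - 2; i >= 0; --i)       // back substitution
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i];
    return x;
}
```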

    An Efficient Parallel Computing Method for Processing Large Volumes of Sensor-Collected Data

    In recent years we have witnessed the advent of the Internet of Things and the wide deployment of sensors in many applications for collecting and aggregating data. Efficient techniques are required to analyze these massive data to support intelligent decision making. Partial differential problems involving large amounts of data are common in engineering and scientific research. For simulations of large-scale three-dimensional partial differential equations, the intensive computation and large memory requirements of the models are the main research challenges. To address these two challenges, this paper provides an effective parallel method for partial differential equations. The proposed approach combines the overlapping domain decomposition strategy with multi-core cluster technology to achieve parallel simulations of partial differential equations, uses the finite difference method to discretize the equations, and adopts the hybrid MPI/OpenMP programming model to exploit two-level parallelism on a multi-core cluster. A three-dimensional groundwater flow model with the parallel finite-difference overlapping domain decomposition strategy was successfully set up and run with the parallel MPI/OpenMP implementation on a multi-core cluster with two nodes. The experimental results show that the proposed parallel approach can efficiently simulate partial differential problems with large amounts of data.
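    A minimal hybrid MPI/OpenMP sketch of this two-level scheme is given below, assuming a 1-D heat equation with a 1-D block decomposition and one ghost cell per side; the problem size, coefficient, and step count are placeholders, not values from the paper.

```cpp
#include <mpi.h>
#include <omp.h>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int nlocal = 1 << 20;                  // interior points per rank (placeholder)
    const double alpha = 0.1;                    // diffusion coefficient (placeholder)
    std::vector<double> u(nlocal + 2, 1.0), un(nlocal + 2, 1.0);
    int left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
    int right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

    for (int step = 0; step < 100; ++step) {
        // Level 1 (MPI): exchange the overlap (ghost) cells between subdomains.
        MPI_Sendrecv(&u[1], 1, MPI_DOUBLE, left, 0,
                     &u[nlocal + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(&u[nlocal], 1, MPI_DOUBLE, right, 1,
                     &u[0], 1, MPI_DOUBLE, left, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // Level 2 (OpenMP): threads update the finite-difference stencil.
        #pragma omp parallel for
        for (int i = 1; i <= nlocal; ++i)
            un[i] = u[i] + alpha * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
        u.swap(un);
    }
    MPI_Finalize();
}
```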

    Type Oriented Parallel Programming

    Context: Parallel computing is an important field within the sciences. With the emergence of multi-core, and soon many-core, CPUs this is moving more and more into the domain of general computing. HPC programmers want performance, but at the moment this comes at a cost; parallel languages are either efficient or conceptually simple, but not both. Aim: To develop and evaluate a novel programming paradigm which will address the problem of parallel programming and allow for languages which are both conceptually simple and efficient. Method: A type-based approach, which allows the programmer to control all aspects of parallelism by the use and combination of types, has been developed. As a vehicle to present and analyze this new paradigm, a parallel language, Mesham, and associated compilation tools have also been created. By using types to express parallelism the programmer can exercise efficient, flexible control in a high-level abstract model, yet with a sufficiently rich amount of information in the source code upon which the compiler can perform static analysis and optimization. Results: A number of case studies have been implemented in Mesham. Official benchmarks have been performed which demonstrate that the paradigm allows one to write code which is comparable, in terms of performance, with existing high-performance solutions. Sections of the parallel simulation package Gadget-2 have been ported into Mesham, where substantial code simplifications have been made. Conclusions: The results obtained indicate that the type-based approach does satisfy the aim of the research described in this thesis. By using this new paradigm the programmer is able to write parallel code which is both simple and efficient.
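    The fragment below is only an illustration of the type-oriented idea in C++ templates, not Mesham syntax: the distribution policy is part of the variable's type, so changing the type changes how storage (and, in a full system, communication) is generated without touching the algorithmic code. The names Replicated, BlockDistributed, and Array are hypothetical.

```cpp
#include <cstddef>
#include <vector>

struct Replicated {                    // every process holds the full array
    static std::size_t local_size(std::size_t n, int, int) { return n; }
};
struct BlockDistributed {              // one contiguous block per process
    static std::size_t local_size(std::size_t n, int rank, int nprocs) {
        std::size_t base = n / nprocs, rem = n % nprocs;
        return base + (static_cast<std::size_t>(rank) < rem ? 1 : 0);
    }
};

// The distribution is encoded in the type: swapping the policy changes the
// allocation without changing the code that uses the array.
template <class T, class Distribution>
class Array {
public:
    Array(std::size_t n, int rank, int nprocs)
        : data_(Distribution::local_size(n, rank, nprocs)) {}
    std::size_t local_size() const { return data_.size(); }
private:
    std::vector<T> data_;
};

// Usage: Array<double, BlockDistributed> x(1000000, rank, nprocs);
//        Array<double, Replicated>       y(1000000, rank, nprocs);
```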

    Reconstructing the galactic magnetic field

    This thesis deals with the reconstruction of the magnetic field of the Milky Way (GMF, for Galactic Magnetic Field). A detailed description of the magnetic field is relevant for several problems in astrophysics. First, it plays an important role in how the structure of the Milky Way develops, as the currents of interstellar gas and cosmic rays are deflected by the GMF. Second, it interferes with the measurement and analysis of radiation from extra-galactic sources. Third, it deflects ultra-high-energy cosmic rays (UHECRs) to such an extent that the assignment of measured UHECRs to potential sources is not possible without correcting calculations. Fourth, the GMF can be used to study a cosmic dynamo process, including its internal structures; in contrast to the GMF, normally only the outer magnetic field of stars and planets is accessible and measurable. As much of an impact as the GMF has on a variety of effects, it is just as difficult to determine. The reason for this is that the magnetic field cannot be measured directly, but only through its influence on various physical observables. Measurements of these observables yield their total accumulated value along a given line of sight. Due to the fixed position of the solar system in the Milky Way, it is therefore a challenge to assign the measured effect of the magnetic field to a spatial depth.
Measurements of the intensity and polarization of radio waves and microwaves, both for the entire sky and for individual stars whose position in space is known, serve as the main source of information. Based on physical processes such as synchrotron emission and Faraday rotation, the GMF can be deduced. However, this requires three-dimensional density maps of other constituents of the Milky Way, such as thermal electrons or interstellar dust. Physical processes like dispersion and dust absorption are crucial for the creation of these auxiliary maps. To reconstruct the GMF on the basis of existing measurement data, there are basically two approaches. On the one hand, the phenomenological approach of parametric magnetic field models can be used. This involves defining the structure of the magnetic field using analytical formulas with a limited number of parameters. These models include the general morphology of the magnetic field, such as galaxy arms and field reversals, but also local characteristics like nebulae in the solar system's neighbourhood. Given a set of measurement data, one tries to find those model parameter values that match the observables as closely as possible. For this purpose, Imagine, the Interstellar MAGnetic field INference Engine, was developed within the course of this doctoral thesis. Due to the relatively small number of parameters of parametric models, a fit is possible even with robust all-sky maps that contain no depth information. However, the parametric approach suffers from the problem of arbitrariness: there is a large number of models of different complexity, which on top of that often contradict each other. In the past, the uncertainty of the reconstructed parameters was moreover often underestimated. In contrast, a rigorous Bayesian analysis, as developed in this doctoral thesis with Imagine, provides a reliable determination of the model parameters. On the other hand, in addition to parametric models, the GMF can also be reconstructed following a non-parametric approach. In this case, each spatial voxel has two independent degrees of freedom for the magnetic field. Hence, this type of reconstruction places much higher demands on the amount and quality of data, the algorithms, and the computing capacity. Due to the high number of degrees of freedom, measurement data are required which contain direct (parallax measurements) or indirect (via the Hertzsprung-Russell diagram) depth information. In addition, strong priors are necessary for those regions of space that are only weakly covered by the data. Simple Bayesian methods are no longer sufficient for this; rather, information field theory (IFT) is needed to combine the various sources of information correctly and to obtain reliable uncertainties. The Python framework NIFTy (Numerical Information Field Theory) is predestined for this task. In its first release version, however, NIFTy was not yet capable of reconstructing a magnetic field or of dealing with data of the required size. To be able to process these amounts of data, d2o was developed as an independent tool for data parallelization. With d2o, parallel code can be written without hindering the actual development work. Basically all numerical disciplines with large datasets that cannot be broken down into subsets can benefit from this, which is why d2o has been released as an independent package.
In addition, NIFTy has been comprehensively revised in its functional scope and structure, so that, among other things, high-resolution magnetic field reconstructions can now be carried out. With NIFTy it is now also possible to create maps of the thermal electron density and of interstellar dust on the basis of new and at the same time very large datasets. This paved the way for a non-parametric reconstruction of the GMF.

    New Sequential and Parallel Division Free Methods for Determinant of Matrices

    A determinant plays an important role in many applications of linear algebra. Finding determinants using non-division-free methods runs into problems if the entries of matrices are rational or polynomial expressions, and also when floating-point errors arise. To overcome this problem, division-free methods are used instead. The two commonly used division-free methods for finding determinants are cross multiplication and cofactor expansion. However, cross multiplication, which uses the Sarrus Rule, only works for matrices of order less than or equal to three, whereas cofactor expansion requires lengthy and tedious computation when dealing with large matrices. This research therefore attempts to develop new sequential and parallel methods for finding determinants of matrices. The research also aims to generalise the Sarrus Rule to square matrices of any order, based on permutations which are derived using starter sets. Two strategies were introduced to generate distinct starter sets, namely the circular operation and the exchange of two elements. Some theoretical work and mathematical properties for generating permutations and determining determinants were also developed to support the research. Numerical results indicated that the new proposed methods performed better than the existing methods in terms of computation time. The computation time of the newly developed sequential methods was dominated by generating starter sets. Therefore, two parallel strategies were developed to parallelise this algorithm so as to reduce the computation time. Numerical results showed that the parallel methods were able to compute determinants faster than their sequential counterparts, particularly when the tasks were equally allocated. In conclusion, the newly developed methods can be used as viable alternatives for finding determinants of matrices.
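    As a baseline illustration (not the thesis' starter-set method), the sketch below computes a determinant by the division-free Leibniz expansion det(A) = sum over permutations p of sign(p) * prod_i A[i][p(i)]. Because no division occurs, the entries could equally be rational or polynomial values; the full n! enumeration shown here is exactly the cost that the starter-set and parallel strategies are designed to organize and distribute.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

// Sign of a permutation via its inversion count: (-1)^{#inversions}.
static int permutation_sign(const std::vector<int>& p) {
    int inv = 0;
    const int n = (int)p.size();
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            if (p[i] > p[j]) ++inv;
    return (inv % 2 == 0) ? 1 : -1;
}

// Division-free determinant by the Leibniz expansion over all permutations.
double leibniz_det(const std::vector<std::vector<double>>& A) {
    const int n = (int)A.size();
    std::vector<int> p(n);
    std::iota(p.begin(), p.end(), 0);      // start from the identity permutation
    double det = 0.0;
    do {
        double term = permutation_sign(p);
        for (int i = 0; i < n; ++i) term *= A[i][p[i]];
        det += term;
    } while (std::next_permutation(p.begin(), p.end()));
    return det;
}
```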