197 research outputs found

    Compiler analysis for trace-level speculative multithreaded architectures

    Get PDF
    Trace-level speculative multithreaded processors exploit trace-level speculation by means of two threads working cooperatively. One thread, called the speculative thread, executes instructions ahead of the other by speculating on the result of several traces. The other thread executes speculated traces and verifies the speculation made by the first thread. In this paper, we propose a static program analysis for identifying candidate traces to be speculated. This approach identifies large regions of code whose live-output values may be successfully predicted. We present several heuristics to determine the best opportunities for dynamic speculation, based on compiler analysis and program profiling information. Simulation results show that the proposed trace recognition techniques achieve on average a speed-up close to 38% for a collection of SPEC2000 benchmarks.Peer ReviewedPostprint (published version

    PERFORMANCE OPTIMIZATION OF A STRUCTURED CFD CODE - GHOST ON COMMODITY CLUSTER ARCHITECTURES

    Get PDF
    This thesis focuses on optimizing the performance of an in-house, structured, 2D CFD code – GHOST, on commodity cluster architectures. The basic philosophy of the work is to optimize the cache usage of the code by implementing efficient coding techniques without changing the underlying numerical algorithm. Various optimization techniques that were implemented and the resulting changes in performance have been presented. Two techniques, external and internal blocking that were implemented earlier to tune the performance of this code have been reviewed. What follows is further tuning effort in order to circumvent the problems associated with using the blocking techniques. Later, to establish the universality of the optimization techniques, testing has been done on more complicated test case. All the techniques presented in this thesis have been tested on steady, laminar test cases. It has been proved that optimized versions of the code achieve better performances on variety of commodity cluster architectures chosen in this study

    Resource-aware Programming in a High-level Language - Improved performance with manageable effort on clustered MPSoCs

    Get PDF
    Bis 2001 bedeutete Moores und Dennards Gesetz eine Verdoppelung der Ausführungszeit alle 18 Monate durch verbesserte CPUs. Heute ist Nebenläufigkeit das dominante Mittel zur Beschleunigung von Supercomputern bis zu mobilen Geräten. Allerdings behindern neuere Phänomene wie "Dark Silicon" zunehmend eine weitere Beschleunigung durch Hardware. Um weitere Beschleunigung zu erreichen muss sich auch die Soft­ware mehr ihrer Hardware Resourcen gewahr werden. Verbunden mit diesem Phänomen ist eine immer heterogenere Hardware. Supercomputer integrieren Beschleuniger wie GPUs. Mobile SoCs (bspw. Smartphones) integrieren immer mehr Fähigkeiten. Spezialhardware auszunutzen ist eine bekannte Methode, um den Energieverbrauch zu senken, was ein weiterer wichtiger Aspekt ist, welcher mit der reinen Geschwindigkeit abgewogen werde muss. Zum Beispiel werden Supercomputer auch nach "Performance pro Watt" bewertet. Zur Zeit sind systemnahe low-level Programmierer es gewohnt über Hardware nachzudenken, während der gemeine high-level Programmierer es vorzieht von der Plattform möglichst zu abstrahieren (bspw. Cloud). "High-level" bedeutet nicht, dass Hardware irrelevant ist, sondern dass sie abstrahiert werden kann. Falls Sie eine Java-Anwendung für Android entwickeln, kann der Akku ein wichtiger Aspekt sein. Irgendwann müssen aber auch Hochsprachen resourcengewahr werden, um Geschwindigkeit oder Energieverbrauch zu verbessern. Innerhalb des Transregio "Invasive Computing" habe ich an diesen Problemen gearbeitet. In meiner Dissertation stelle ich ein Framework vor, mit dem man Hochsprachenanwendungen resourcengewahr machen kann, um so die Leistung zu verbessern. Das könnte beispielsweise erhöhte Effizienz oder schnellerer Ausführung für das System als Ganzes bringen. Ein Kerngedanke dabei ist, dass Anwendungen sich nicht selbst optimieren. Stattdessen geben sie alle Informationen an das Betriebssystem. Das Betriebssystem hat eine globale Sicht und trifft Entscheidungen über die Resourcen. Diesen Prozess nennen wir "Invasion". Die Aufgabe der Anwendung ist es, sich an diese Entscheidungen anzupassen, aber nicht selbst welche zu fällen. Die Herausforderung besteht darin eine Sprache zu definieren, mit der Anwendungen Resourcenbedingungen und Leistungsinformationen kommunizieren. So eine Sprache muss ausdrucksstark genug für komplexe Informationen, erweiterbar für neue Resourcentypen, und angenehm für den Programmierer sein. Die zentralen Beiträge dieser Dissertation sind: Ein theoretisches Modell der Resourcen-Verwaltung, um die Essenz des resourcengewahren Frameworks zu beschreiben, die Korrektheit der Entscheidungen des Betriebssystems bezüglich der Bedingungen einer Anwendung zu begründen und zum Beweis meiner Thesen von Effizienz und Beschleunigung in der Theorie. Ein Framework und eine Übersetzungspfad resourcengewahrer Programmierung für die Hochsprache X10. Zur Bewertung des Ansatzes haben wir Anwendungen aus dem High Performance Computing implementiert. Eine Beschleunigung von 5x konnte gemessen werden. Ein Speicherkonsistenzmodell für die X10 Programmiersprache, da dies ein notwendiger Schritt zu einer formalen Semantik ist, die das theoretische Modell und die konkrete Implementierung verknüpft. Zusammengefasst zeige ich, dass resourcengewahre Programmierung in Hoch\-sprachen auf zukünftigen Architekturen mit vielen Kernen mit vertretbarem Aufwand machbar ist und die Leistung verbessert

    Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation

    Get PDF
    Despite the importance of sparse matrices in numerous fields of science, software implementations remain difficult to use for non-expert users, generally requiring the understanding of underlying details of the chosen sparse matrix storage format. In addition, to achieve good performance, several formats may need to be used in one program, requiring explicit selection and conversion between the formats. This can be both tedious and error-prone, especially for non-expert users. Motivated by these issues, we present a user-friendly and open-source sparse matrix class for the C++ language, with a high-level application programming interface deliberately similar to the widely used MATLAB language. This facilitates prototyping directly in C++ and aids the conversion of research code into production environments. The class internally uses two main approaches to achieve efficient execution: (i) a hybrid storage framework, which automatically and seamlessly switches between three underlying storage formats (compressed sparse column, Red-Black tree, coordinate list) depending on which format is best suited and/or available for specific operations, and (ii) a template-based meta-programming framework to automatically detect and optimise execution of common expression patterns. Empirical evaluations on large sparse matrices with various densities of non-zero elements demonstrate the advantages of the hybrid storage framework and the expression optimisation mechanism.Comment: extended and revised version of an earlier conference paper arXiv:1805.0338

    Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation

    Get PDF
    Despite the importance of sparse matrices in numerous fields of science, software implementations remain difficult to use for non-expert users, generally requiring the understanding of underlying details of the chosen sparse matrix storage format. In addition, to achieve good performance, several formats may need to be used in one program, requiring explicit selection and conversion between the formats. This can be both tedious and error-prone, especially for non-expert users. Motivated by these issues, we present a user-friendly and open-source sparse matrix class for the C++ language, with a high-level application programming interface deliberately similar to the widely used MATLAB language. This facilitates prototyping directly in C++ and aids the conversion of research code into production environments. The class internally uses two main approaches to achieve efficient execution: (i) a hybrid storage framework, which automatically and seamlessly switches between three underlying storage formats (compressed sparse column, Red-Black tree, coordinate list) depending on which format is best suited and/or available for specific operations, and (ii) a template-based meta-programming framework to automatically detect and optimise execution of common expression patterns. Empirical evaluations on large sparse matrices with various densities of non-zero elements demonstrate the advantages of the hybrid storage framework and the expression optimisation mechanism.Comment: extended and revised version of an earlier conference paper arXiv:1805.0338

    Exploring novel designs of NLP solvers: Architecture and Implementation of WORHP

    Get PDF
    Mathematical Optimization in general and Nonlinear Programming in particular, are applied by many scientific disciplines, such as the automotive sector, the aerospace industry, or the space agencies. With some established NLP solvers having been available for decades, and with the mathematical community being rather conservative in this respect, many of their programming standards are severely outdated. It is safe to assume that such usability shortcomings impede the wider use of NLP methods; a representative example is the use of static workspaces by legacy FORTRAN codes. This dissertation gives an account of the construction of the European NLP solver WORHP by using and combining software standards and techniques that have not previously been applied to mathematical software to this extent. Examples include automatic code generation, a consistent reverse communication architecture and the elimination of static workspaces. The result is a novel, industrial-grade NLP solver that overcomes many technical weaknesses of established NLP solvers and other mathematical software

    Scaling Robot Motion Planning to Multi-core Processors and the Cloud

    Get PDF
    Imagine a world in which robots safely interoperate with humans, gracefully and efficiently accomplishing everyday tasks. The robot's motions for these tasks, constrained by the design of the robot and task at hand, must avoid collisions with obstacles. Unfortunately, planning a constrained obstacle-free motion for a robot is computationally complex---often resulting in slow computation of inefficient motions. The methods in this dissertation speed up this motion plan computation with new algorithms and data structures that leverage readily available parallel processing, whether that processing power is on the robot or in the cloud, enabling robots to operate safer, more gracefully, and with improved efficiency. The contributions of this dissertation that enable faster motion planning are novel parallel lock-free algorithms, fast and concurrent nearest neighbor searching data structures, cache-aware operation, and split robot-cloud computation. Parallel lock-free algorithms avoid contention over shared data structures, resulting in empirical speedup proportional to the number of CPU cores working on the problem. Fast nearest neighbor data structures speed up searching in SO(3) and SE(3) metric spaces, which are needed for rigid body motion planning. Concurrent nearest neighbor data structures improve searching performance on metric spaces common to robot motion planning problems, while providing asymptotic wait-free concurrent operation. Cache-aware operation avoids long memory access times, allowing the algorithm to exhibit superlinear speedup. Split robot-cloud computation enables robots with low-power CPUs to react to changing environments by having the robot compute reactive paths in real-time from a set of motion plan options generated in a computationally intensive cloud-based algorithm. We demonstrate the scalability and effectiveness of our contributions in solving motion planning problems both in simulation and on physical robots of varying design and complexity. Problems include finding a solution to a complex motion planning problem, pre-computing motion plans that converge towards the optimal, and reactive interaction with dynamic environments. Robots include 2D holonomic robots, 3D rigid-body robots, a self-driving 1/10 scale car, articulated robot arms with and without mobile bases, and a small humanoid robot.Doctor of Philosoph

    Proceedings of the Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016) Sofia, Bulgaria

    Get PDF
    Proceedings of: Third International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2016). Sofia (Bulgaria), October, 6-7, 2016
    • …
    corecore