16 research outputs found

    Programming models for parallel computing

    Get PDF
    Mit dem Auftauchen von Multicore Prozessoren beginnt parallele Programmierung den Massenmarkt zu erobern. Derzeit ist der Parallelismus noch relativ eingeschränkt, da aktuelle Prozessoren nur über eine geringe Anzahl an Kernen verfügen, doch schon bald wird der Schritt zu Prozessoren mit Hunderten an Kernen vollzogen sein. Während sich die Hardware unaufhaltsam in Richtung Parallelismus weiterentwickelt, ist es für Softwareentwickler schwierig, mit diesen Entwicklungen Schritt zu halten. Parallele Programmierung erfordert neue Ansätze gegenüber den bisher verwendeten sequentiellen Programmiermodellen. In der Vergangenheit war es ausreichend, die nächste Prozessorgeneration abzuwarten, um Computerprogramme zu beschleunigen. Heute jedoch kann ein sequentielles Programm mit einem neuen Prozessor sogar langsamer werden, da die Geschwindigkeit eines einzelnen Prozessorkerns nun oft zugunsten einer größeren Gesamtzahl an Kernen in einem Prozessor reduziert wird. Angesichts dieser Tatsache wird es in der Softwareentwicklung in Zukunft notwendig sein, Parallelismus explizit auszunutzen, um weiterhin performante Programme zu entwickeln, die auch auf zukünftigen Prozessorgenerationen skalieren. Die Problematik liegt dabei darin, dass aktuelle Programmiermodelle weiterhin auf dem sogenannten "Assembler der parallelen Programmierung", d.h. auf Multithreading für Shared-Memory- sowie auf Message Passing für Distributed-Memory Architekturen basieren, was zu einer geringen Produktivität und einer hohen Fehleranfälligkeit führt. Um dies zu ändern, wird an neuen Programmiermodellen, -sprachen und -werkzeugen, die Parallelismus auf einer höheren Abstraktionsebene als bisherige Programmiermodelle zu behandeln versprechen, geforscht. Auch wenn bereits einige Teilerfolge erzielt wurden und es gute, performante Lösungen für bestimmte Bereiche gibt, konnte bis jetzt noch kein allgemeingültiges paralleles Programmiermodell entwickelt werden - viele bezweifeln, dass das überhaupt möglich ist. Das Ziel dieser Arbeit ist es, einen Überblick über aktuelle Entwicklungen bei parallelen Programmiermodellen zu geben. Da homogenen Multi- und Manycore Prozessoren in nächster Zukunft die meiste Bedeutung zukommen wird, wird das Hauptaugenmerk darauf gelegt, inwieweit die behandelten Programmiermodelle für diese Plattformen nützlich sind. Durch den Vergleich unterschiedlicher, auch experimenteller Ansätze soll erkennbar werden, wohin die Entwicklung geht und welche Werkzeuge aktuell verwendet werden können.With the emergence of multi-core processors in the consumer market, parallel computing is moving to the mainstream. Currently parallelism is still very restricted as modern consumer computers only contain a small number of cores. Nonetheless, the number is constantly increasing, and the time will come when we move to hundreds of cores. For software developers it is becoming more difficult to keep up with these new developments. Parallel programming requires a new way of thinking. No longer will a new processor generation accelerate every existing program. On the contrary, some programs might even get slower because good single-thread performance of a processor is traded in for a higher level of parallelism. For that reason, it becomes necessary to exploit parallelism explicitly and to make sure that the program scales well. Unfortunately, parallelism in current programming models is mostly based on the "assembler of parallel programming", namely low level threading for shared multiprocessors and message passing for distributed multiprocessors. This leads to low programmer productivity and erroneous programs. Because of this, a lot of effort is put into developing new high level programming models, languages and tools that should help parallel programming to keep up with hardware development. Although there have been successes in different areas, no good all-round solution has emerged until now, and there are doubts that there ever will be one. The aim of this work is to give an overview of current developments in the area of parallel programming models. The focus is put onto programming models for multi- and many-core architectures as this is the area most relevant for the near future. Through the comparison of different approaches, including experimental ones, the reader will be able to see which existing programming models can be used for which tasks and to anticipate future developments

    Android Application Development for the Intel Platform

    Get PDF
    Computer scienc

    Generating and auto-tuning parallel stencil codes

    Get PDF
    In this thesis, we present a software framework, Patus, which generates high performance stencil codes for different types of hardware platforms, including current multicore CPU and graphics processing unit architectures. The ultimate goals of the framework are productivity, portability (of both the code and performance), and achieving a high performance on the target platform. A stencil computation updates every grid point in a structured grid based on the values of its neighboring points. This class of computations occurs frequently in scientific and general purpose computing (e.g., in partial differential equation solvers or in image processing), justifying the focus on this kind of computation. The proposed key ingredients to achieve the goals of productivity, portability, and performance are domain specific languages (DSLs) and the auto-tuning methodology. The Patus stencil specification DSL allows the programmer to express a stencil computation in a concise way independently of hardware architecture-specific details. Thus, it increases the programmer productivity by disburdening her or him of low level programming model issues and of manually applying hardware platform-specific code optimization techniques. The use of domain specific languages also implies code reusability: once implemented, the same stencil specification can be reused on different hardware platforms, i.e., the specification code is portable across hardware architectures. Constructing the language to be geared towards a special purpose makes it amenable to more aggressive optimizations and therefore to potentially higher performance. Auto-tuning provides performance and performance portability by automated adaptation of implementation-specific parameters to the characteristics of the hardware on which the code will run. By automating the process of parameter tuning — which essentially amounts to solving an integer programming problem in which the objective function is the number representing the code's performance as a function of the parameter configuration, — the system can also be used more productively than if the programmer had to fine-tune the code manually. We show performance results for a variety of stencils, for which Patus was used to generate the corresponding implementations. The selection includes stencils taken from two real-world applications: a simulation of the temperature within the human body during hyperthermia cancer treatment and a seismic application. These examples demonstrate the framework's flexibility and ability to produce high performance code

    High level compilation for gate reconfigurable architectures

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2001.Includes bibliographical references (p. 205-215).A continuing exponential increase in the number of programmable elements is turning management of gate-reconfigurable architectures as "glue logic" into an intractable problem; it is past time to raise this abstraction level. The physical hardware in gate-reconfigurable architectures is all low level - individual wires, bit-level functions, and single bit registers - hence one should look to the fetch-decode-execute machinery of traditional computers for higher level abstractions. Ordinary computers have machine-level architectural mechanisms that interpret instructions - instructions that are generated by a high-level compiler. Efficiently moving up to the next abstraction level requires leveraging these mechanisms without introducing the overhead of machine-level interpretation. In this dissertation, I solve this fundamental problem by specializing architectural mechanisms with respect to input programs. This solution is the key to efficient compilation of high-level programs to gate reconfigurable architectures. My approach to specialization includes several novel techniques. I develop, with others, extensive bitwidth analyses that apply to registers, pointers, and arrays. I use pointer analysis and memory disambiguation to target devices with blocks of embedded memory. My approach to memory parallelization generates a spatial hierarchy that enables easier-to-synthesize logic state machines with smaller circuits and no long wires.(cont.) My space-time scheduling approach integrates the techniques of high-level synthesis with the static routing concepts developed for single-chip multiprocessors. Using DeepC, a prototype compiler demonstrating my thesis, I compile a new benchmark suite to Xilinx Virtex FPGAs. Resulting performance is comparable to a custom MIPS processor, with smaller area (40 percent on average), higher evaluation speeds (2.4x), and lower energy (18x) and energy-delay (45x). Specialization of advanced mechanisms results in additional speedup, scaling with hardware area, at the expense of power. For comparison, I also target IBM's standard cell SA-27E process and the RAW microprocessor. Results include sensitivity analysis to the different mechanisms specialized and a grand comparison between alternate targets.by Jonathan William Babb.Ph.D

    Increasing the Performance and Predictability of the Code Execution on an Embedded Java Platform

    Get PDF
    This thesis explores the execution of object-oriented code on an embedded Java platform. It presents established and derives new approaches for the implementation of high-level object-oriented functionality and commonly expected system services. The goal of the developed techniques is the provision of the architectural base for an efficient and predictable code execution. The research vehicle of this thesis is the Java-programmed SHAP platform. It consists of its platform tool chain and the highly-customizable SHAP bytecode processor. SHAP offers a fully operational embedded CLDC environment, in which the proposed techniques have been implemented, verified, and evaluated. Two strands are followed to achieve the goal of this thesis. First of all, the sequential execution of bytecode is optimized through a joint effort of an optimizing offline linker and an on-chip application loader. Additionally, SHAP pioneers a reference coloring mechanism, which enables a constant-time interface method dispatch that need not be backed a large sparse dispatch table. Secondly, this thesis explores the implementation of essential system services within designated concurrent hardware modules. This effort is necessary to decouple the computational progress of the user application from the interference induced by time-sharing software implementations of these services. The concrete contributions comprise a spill-free, on-chip stack; a predictable method cache; and a concurrent garbage collection. Each approached means is described and evaluated after the relevant state of the art has been reviewed. This review is not limited to preceding small embedded approaches but also includes techniques that have proven successful on larger-scale platforms. The other way around, the chances that these platforms may benefit from the techniques developed for SHAP are discussed

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    Get PDF
    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability

    Prototyping parallel functional intermediate languages

    Get PDF
    Non-strict higher-order functional programming languages are elegant, concise, mathematically sound and contain few environment-specific features, making them obvious candidates for harnessing high-performance architectures. The validity of this approach has been established by a number of experimental compilers. However, while there have been a number of important theoretical developments in the field of parallel functional programming, implementations have been slow to materialise. The myriad design choices and demands of specific architectures lead to protracted development times. Furthermore, the resulting systems tend to be monolithic entities, and are difficult to extend and test, ultimatly discouraging experimentation. The traditional solution to this problem is the use of a rapid prototyping framework. However, as each existing systems tends to prefer one specific platform and a particular way of expressing parallelism (including implicit specification) it is difficult to envisage a general purpose framework. Fortunately, most of these systems have at least one point of commonality: the use of an intermediate form. Typically, these abstract representations explicitly identify all parallel components but without the background noise of syntactic and (potentially arbitrary) implementation details. To this end, this thesis outlines a framework for rapidly prototyping such intermediate languages. Based on the traditional three-phase compiler model, the design process is driven by the development of various semantic descriptions of the language. Executable versions of the specifications help to both debug and informally validate these models. A number of case studies, covering the spectrum of modern implementations, demonstrate the utility of the framework

    Supercomputing futures : the next sharing paradigm for HPC resources : economic model, market analysis and consequences for the Grid

    Get PDF
    À la croisée des chemins du génie informatique, de la finance et de l'économétrie, cette thèse se veut fondamentalement un exercice en ingénierie économique dont l' objectif est de contribuer un système novateur, durable et adaptatif pour le partage de resources de calcul haute-performance. Empruntant à la finance fondamentale et à l'analyse technique, le modèle proposé construit des ratios et des indices de marché à partir de statistiques transactionnelles. Cette approche, encourageant les comportements stratégiques, pave la voie à une métaphore de partage plus efficace pour la Grid, où l'échange de ressources se voit maintenant pondéré. Le concept de monnaie de Grid, un instrument beaucoup plus liquide et utilisable que le troc de resources comme telles est proposé: les Grid Credits. Bien que les indices proposés ne doivent pas être considérés comme des indicateurs absolus et contraignants, ils permettent néanmoins aux négociants de se faire une idée de la valeur au marché des différentes resources avant de se positionner. Semblable sur de multiples facettes aux bourses de commodités, le Grid Exchange, tel que présenté, permet l'échange de resources via un mécanisme de double-encan. Néanmoins, comme les resources de super-calculateurs n'ont rien de standardisé, la plate-forme permet l'échange d'ensemble de commodités, appelés requirement sets, pour les clients, et component sets, pour les fournisseurs. Formellement, ce modèle économique n'est qu'une autre instance de la théorie des jeux non-coopératifs, qui atteint éventuellement ses points d'équilibre. Suivant les règles du "libre-marché", les utilisateurs sont encouragés à spéculer, achetant, ou vendant, à leur bon vouloir, l'utilisation des différentes composantes de superordinateurs. En fin de compte, ce nouveau paradigme de partage de resources pour la Grid dresse la table à une nouvelle économie et une foule de possibilités. Investissement et positionnement stratégique, courtiers, spéculateurs et même la couverture de risque technologique sont autant d'avenues qui s'ouvrent à l'horizon de la recherche dans le domaine
    corecore