Search CORE

2 research outputs found

Resource Constrained and Speculative Scheduling of an Algorithm Class with Run-Time Dependent Conditionals

Author: Frank Hannig
Jürgen Teich
Publication venue
Publication date
Field of study

In this paper we present a significant extension of the quantified equation based algorithm class of piecewise regular algorithms. The main contributions of the following paper are: (1) the class of piecewise regular algorithms is extended by allowing run-time dependent conditionals, (2) a mixed integer linear program is given to derive optimal schedules of the novel class we call dynamic piecewise regular algorithms, and (3) in order to achieve highest performance, we present a speculative scheduling approach. The results are applied to an illustrative example

CiteSeerX

Parallelisierung von Algorithmen zur Nutzung auf Architekturen mit Teilwortparallelität

Author: Schaffer Rainer
Publication venue
Publication date: 09/03/2010
Field of study

Der technologische Fortschritt gestattet die Implementierung zunehmend komplexerer Prozessorarchitekturen auf einem Schaltkreis. Ein Trend der letzten Jahre ist die Implementierung von mehr und mehr Verarbeitungseinheiten auf einem Chip. Daraus ergeben sich neue Herausforderungen für die Abbildung von Algorithmen auf solche Architekturen, denn alle Verarbeitungseinheiten sollen effizient bei der Ausführung des Algorithmus genutzt werden. Der Schwerpunkt der eingereichten Dissertation ist die Ausnutzung der Parallelität von Rechenfeldern mit Teilwortparallelität. Solche Architekturen erlauben Parallelverarbeitung auf mehreren Ebenen. Daher wurde eine Abbildungsstrategie, mit besonderem Schwerpunkt auf Teilwortparallelität entwickelt. Diese Abbildungsstrategie basiert auf den Methoden des Rechenfeldentwurfs. Rechenfelder sind regelmäßig angeordnete Prozessorelemente, die nur mit ihren Nachbarelementen kommunizieren. Die Datenein- und -ausgabe wird durch die Prozessorelemente am Rand des Rechenfeldes realisiert. Jedes Prozessorelement kann mehrere Funktionseinheiten besitzen, welche die Rechenoperationen des Algorithmus ausführen. Die Teilwortparallelität bezeichnet die Fähigkeit zur Teilung des Datenpfads der Funktionseinheit in mehrere schmale Datenpfade für die parallele Ausführung von Daten mit geringer Wortbreite. Die entwickelte Abbildungsstrategie unterteilt sich in zwei Schritte, die \"Vorverarbeitung\" und die \"Mehrstufige Modifizierte Copartitionierung\" (kurz: MMC). Die \"Vorverarbeitung\" verändert den Algorithmus in einer solchen Art, dass der veränderte Algorithmus schnell und effizient auf die Zielarchitektur abgebildet werden kann. Hierfür wurde ein Optimierungsproblem entwickelt, welches schrittweise die Parameter für die Transformation des Algorithmus bestimmt. Die \"Mehrstufige Modifizierte Copartitionierung\" wird für die schrittweise Anpassung des Algorithmus an die Zielarchitektur eingesetzt. Darüber hinaus ermöglicht die Abbildungsmethode die Ausnutzung der lokalen Register in den Prozessorelementen und die Anpassung des Algorithmus an die Speicherarchitektur, an die das Rechenfeld angebunden ist. Die erste Stufe der MMC dient der Transformation eines Algorithmus mit Einzeldatenoperationen in einen Algorithmus mit teilwortparallelen Operationen. Mit der zweiten Copartitionierungsstufe wird der Algorithmus an die lokalen Register und an das Rechenfeld angepasst. Weitere Copartitionierungsstufen können zur Anpassung des Algorithmus an die Speicherarchitektur verwendet werden.The technological progress allows the implementation of complex processor architectures on a chip. One trend of the last years is the implemenation of more and more execution units on one chip. That implies new challenges for the mapping of algorithms on such architectures, because the execution units should be used efficiently during the execution of the algorithm. The focus of the submitted dissertation thesis is the utilization of the parallelism of processor arrays with subword parallelism. Such architectures allow parallel executions on different levels. Therefore an algorithm mapping strategy was developed, where the exploitation of the subword parallelism was in the focus. This algorithm mapping strategy is based on the methods of the processor array design. Processor arrays are regular arranged processor elements, which communicate with their neighbors elements only. The data in- and output will be realized by the processor elements on the border of the array. Each processor element can have several functional units, which execute the computational operations. Subword parallelism means the capability for splitting the data path of the functional units in several smaller chunks for the parallel execution of data with lower word width. The developed mapping strategy is subdivided in two steps, the \"Preprocessing\" and the \"Multi-Level Modified Copartitioning\" (kurz: MMC), whereat the MMC means the method of the step simultaneously. The \"Preprocessing\" alter the algorithm in such a kind, that the altered algorithm can be fast and efficient mapped on the target architecture. Therefore an optimization problem was developed, which determines gradual the parameter for the transformation of the algorithm. The \"Multi-Level Modified Copartitioning\" is used for mapping the algorithm gradual on the target architecture. Furthermore the mapping methodology allows the exploitation of the local registers in the processing elements and the adaptation of the algorithm on the memory architecture, where the processing array is connected on. The first level of the MMC is used for the transformation of an algorithm with operation based on single data to an algorithm with subword parallel operations. With the second level, the algorithm will be adapted to the local registers in the processing elements and to the processor array. Further copartition levels can be used for matching the algorithm to the memory architecture

Technische Universität Dresden: Qucosa