53 research outputs found

Optimizing for a Many-Core Architecture without Compromising Ease-of-Programming

Author: Caragea George Constantin
Publication date: 01/01/2011

Faced with nearly stagnant clock speed advances, chip manufacturers have turned to parallelism as the source for continuing performance improvements. But even though numerous parallel architectures have already been brought to market, a universally accepted methodology for programming them for general purpose applications has yet to emerge. Existing solutions tend to be hardware-specific, rendering them difficult to use for the majority of application programmers and domain experts, and not providing scalability guarantees for future generations of the hardware. This dissertation advances the validation of the following thesis: it is possible to develop efficient general-purpose programs for a many-core platform using a model recognized for its simplicity. To prove this thesis, we refer to the eXplicit Multi-Threading (XMT) architecture designed and built at the University of Maryland. XMT is an attempt at re-inventing parallel computing with a solid theoretical foundation and an aggressive scalable design. Algorithmically, XMT is inspired by the PRAM (Parallel Random Access Machine) model and the architecture design is focused on reducing inter-task communication and synchronization overheads and providing an easy-to-program parallel model. This thesis builds upon the existing XMT infrastructure to improve support for efficient execution with a focus on ease-of-programming. Our contributions aim at reducing the programmer's effort in developing XMT applications and improving the overall performance. 
More concretely, we: (1) present a work-flow guiding programmers to produce efficient parallel solutions starting from a high-level problem; (2) introduce an analytical performance model for XMT programs and provide a methodology to project running time from an implementation; (3) propose and evaluate RAP, an improved resource-aware compiler loop prefetching algorithm targeted at fine-grained many-core architectures; we demonstrate performance improvements of up to 34.79% on average over the GCC loop prefetching implementation and up to 24.61% on average over a simple hardware prefetching scheme; and (4) implement a number of parallel benchmarks and evaluate the overall performance of XMT relative to existing serial and parallel solutions, showing speedups of up to 13.89x vs. a serial processor and 8.10x vs. parallel code optimized for an existing many-core (GPU). We also discuss the implementation and optimization of the Max-Flow algorithm on XMT, a problem which is among the more advanced in terms of complexity, benchmarking and research interest in the parallel algorithms community. We demonstrate better speedups compared to a best serial solution than previous attempts on other parallel platforms.

CiteSeerX

Digital Repository at the University of Maryland

Combinatorial Optimization

Publication venue: Zürich : EMS Publ. House
Publication date: 01/01/2005

This report summarizes the meeting on Combinatorial Optimization where new and promising developments in the field were discussed. Th

Repositorium für Naturwissenschaften und Technik

Game theory for dynamic spectrum sharing cognitive radio

Author: Raoof Omar
Publication venue: Brunel University School of Engineering and Design PhD Theses
Publication date: 01/01/2013

This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University on 21 June 2010. 'Game Theory' is the formal study of conflict and cooperation. The theory is based on a set of tools developed to assist with the modelling and analysis of individual, independent decision makers, whose actions potentially affect the decisions made by other competitors. It is therefore well suited to addressing the various issues linked to wireless communications. This work presents a Green Game-Based Hybrid Vertical Handover Model. The model is used for heterogeneous wireless networks and combines both dynamic factors (Received Signal Strength and Node Mobility) and static factors (Cost, Power Consumption and Bandwidth). These factors control the handover decision process: the mechanism eliminates unnecessary handovers, reduces delay and the overall number of handovers by 50%, drops 70% fewer packets, and saves 50% more energy in comparison to other mechanisms. A novel Game-Based Multi-Interface Fast-Handover MIPv6 protocol is introduced in this thesis as an extension to the Multi-Interface Fast-handover MIPv6 protocol. The protocol works when the mobile node has more than one wireless interface. It controls the handover decision process by deciding whether a handover is necessary and helps the node to choose the right access point at the right time. In addition, the protocol switches the mobile node's interfaces 'ON' and 'OFF' when needed to control the mobile node's energy consumption and eliminate the power lost by adding another interface. The protocol reduces the number of handovers to 70%, with 90% fewer dropped packets, 40% more received packets and acknowledgments, and 85% less end-to-end delay in comparison to other protocols. 
Furthermore, the thesis adapts a novel combination of game and auction theory for dynamic resource allocation and price-power-based routing in wireless Ad-Hoc networks. Under the auction schemes, destination nodes bid for access to the data stored in the server node. The server allocates the data to the winner who values it most. Once the data has been allocated to the winner, another mechanism for dynamic routing is adopted. The routing mechanism is based on source-destination cooperation, power consumption and source compensation to the intermediate nodes. The mechanism increases the seller's revenue by 50% compared to a random allocation scheme, and the thesis briefly evaluates the reliability of a predefined route with respect to data prices and source and destination cooperation for different network settings. Last but not least, this thesis adjusts an adaptive competitive second-price pay-to-bid sealed auction game and a reputation-based game. This solves the fairness problems associated with spectrum sharing amongst one primary user and a large number of secondary users in a cognitive radio environment. The proposed games create competition between the bidders and offer the players better revenue in terms of fairness, by more than 60% in certain scenarios. The proposed game could reach the maximum total profit for both primary and secondary users with better fairness; this is illustrated through numerical results.
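
As a rough illustration of the second-price sealed-bid idea named in the abstract above, the sketch below picks the highest bidder and charges the second-highest bid. It is a plain textbook second-price auction, not the adaptive pay-to-bid or reputation-based games of the thesis; the function name, the bidder labels and the `reserve` parameter are assumptions made for this example.

```python
from typing import Dict, Optional, Tuple


def second_price_auction(
    bids: Dict[str, float], reserve: float = 0.0
) -> Optional[Tuple[str, float]]:
    """Return (winner, price) for a sealed-bid second-price auction.

    The highest eligible bidder wins and pays the second-highest eligible
    bid, or the reserve price if there is no other eligible bidder.
    Returns None when no bid meets the reserve.
    """
    # Keep only bids that meet the (hypothetical) reserve price.
    eligible = {bidder: value for bidder, value in bids.items() if value >= reserve}
    if not eligible:
        return None
    # Sort by bid descending; break ties by bidder name for determinism.
    ranked = sorted(eligible.items(), key=lambda kv: (-kv[1], kv[0]))
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else reserve
    return winner, price


if __name__ == "__main__":
    # Hypothetical secondary users bidding for a spectrum slot.
    print(second_price_auction({"su1": 5.0, "su2": 8.0, "su3": 6.5}))
```

Because the winner pays the runner-up's bid and not its own, bidding one's true valuation is a dominant strategy in this basic format, which is one reason second-price rules are a common starting point for auction-based allocation.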

Brunel University Research Archive

Algorithmic Graph Theory

Publication venue: Zürich : EMS Publ. House
Publication date: 01/01/2006

The main focus of this workshop was on mathematical techniques needed for the development of efficient solutions and algorithms for computationally difficult graph problems. The techniques studied at the workshop included: the probabilistic method and randomized algorithms, approximation and optimization, structured families of graphs and approximation algorithms for large problems. The workshop Algorithmic Graph Theory was attended by 46 participants, many of them young researchers. In 15 survey talks an overview of recent developments in Algorithmic Graph Theory was given. These talks were supplemented by 10 shorter talks and by two special sessions.

Repositorium für Naturwissenschaften und Technik

SAT-based Analysis, (Re-)Configuration & Optimization in the Context of Automotive Product documentation

Author: Walter Rouven
Publication venue: Universität Tübingen
Publication date: 01/01/2017

There is a growing trend toward mass customization, particularly in automotive configuration. Mass customization leads to an enormous increase in complexity. There are hundreds of equipment options from which a customer can choose to put together a personal car. For a single vehicle model of a German premium manufacturer, the number of distinct configurable cars is as high as 10^80. SAT-based methods have become established for verifying the bill of materials of automotive configurations. In the mid-1990s, Carsten Sinz did pioneering work on SAT-based verification methods for Daimler AG. Building on this, a production software system was installed at Daimler AG after 2005. Other German car manufacturers later followed and likewise installed SAT-based systems to verify their bills of materials. This thesis consists of two main parts. The first part deals with the development of further SAT-based methods for automotive configuration. We show that SAT-based methods are suitable for interactive automotive configuration. We address different aspects of interactive configuration, including consistency checking, generation of examples, explanations, and the avoidance of misconfigurations. We also develop SAT-based methods for verifying dynamic assemblies. A dynamic assembly represents the chronological assembly order of complex parts. The second part deals with the optimization of automotive configurations. We explain and compare different optimization problems of propositional logic and their algorithmic solution approaches. We describe use cases from automotive configuration and show how they can be formalized as propositional optimization problems. 
For example, given a set of requested equipment options, one wants to compute a test vehicle that adds as few further options as possible in order to save costs. Furthermore, we address the problem of computing a smallest set of vehicles that covers a test set. As part of this work, we developed a prototype of a (re-)configurator called AutoConfig. Our (re-)configurator uses SAT-based methods at its core and has a graphical user interface that allows interactive configuration. AutoConfig can handle instances from three major German car manufacturers but is not limited to them. With this prototype we aim to demonstrate the applicability of our methods.

Publikationsserver der Universität Tübingen

Decomposition methods for large scale stochastic and robust optimization problems

Author: Becker Adrian Bernard Druke
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011

Thesis (Ph. D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 107-112). We propose new decomposition methods for use on broad families of stochastic and robust optimization problems in order to yield tractable approaches for large-scale real world application. We introduce a new type of Markov decision problem, named the Generalized Restless Bandits Problem, that encompasses a broad generalization of the restless bandit problem. For this class of stochastic optimization problems, we develop a nested policy heuristic which iteratively solves a series of sub-problems operating on smaller bandit systems. We also develop linear-optimization based bounds for the Generalized Restless Bandit problem and demonstrate promising computational performance of the nested policy heuristic on a large-scale real world application of search term selection for sponsored search advertising. We further study the distributionally robust optimization problem with known mean, covariance and support. These optimization models are attractive in their real world applications as they require the model consumer to rely only on those statistics of uncertainty that are known with relative confidence rather than making arbitrary assumptions about the exact dynamics of the underlying distribution of uncertainty. Because the problem is known to be NP-hard, current approaches invoke tractable but often weak relaxations for real-world applications. We develop a decomposition method for this family of problems which recursively derives sub-policies along projected dimensions of uncertainty and provides a sequence of bounds on the value of the derived policy. In the development of this method, we prove that non-convex quadratic optimization in n-dimensions over a box in two-dimensions is efficiently solvable. 
We also show that this same decomposition method yields a promising heuristic for the MAXCUT problem. We then provide promising computational results in the context of a real world fixed income portfolio optimization problem. The decomposition methods developed in this thesis recursively derive sub-policies on projected dimensions of the master problem. These sub-policies are optimal on relaxations which admit "tight" projections of the master problem; that is, the projection of the feasible region for the relaxation is equivalent to the projection of that of the master problem along the dimensions of the sub-policy. Additionally, these decomposition strategies provide a hierarchical solution structure that aids in solving large-scale problems. By Adrian Bernard Druke Becker. Ph.D.

DSpace@MIT

Geometric intersection problems

Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

Eighth Biennial Report : April 2005 – March 2007

Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2007

MPG.PuRe

On Flows, Paths, Roots, and Zeros

Author: Becker Ruben
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2017

This thesis has two parts. In the first, we give new results for various network flow problems. (1) We present a novel dual ascent algorithm for min-cost flow and show that an implementation of it is very efficient on certain instance classes. (2) We approach the problem of numerical stability of interior point network flow algorithms by giving a path following method that works solely with integer arithmetic and is thus guaranteed to be free of any numerical instabilities. (3) We present a gradient descent approach for the undirected transshipment problem and its special case, the single source shortest path problem (SSSP). For distributed computation models this yields the first SSSP algorithm with a near-optimal number of communication rounds. The second part deals with fundamental topics from algebraic computation. (1) We give an algorithm for computing the complex roots of a complex polynomial. While achieving a bit complexity comparable to the previous best results, our algorithm is simple and promises to be of practical impact. It uses a test for counting the roots of a polynomial in a region that is based on Pellet's theorem. (2) We extend this test to polynomial systems, i.e., we develop an algorithm that can certify the existence of a k-fold zero of a zero-dimensional polynomial system within a given region. For bivariate systems, we show experimentally that this approach yields significant improvements when used as an inclusion predicate in an elimination method.
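
To make the root-counting test mentioned in the abstract above more concrete, here is a minimal sketch of the basic criterion from Pellet's theorem for a disk centered at the origin: if one term |a_k| r^k strictly dominates the sum of all other terms |a_i| r^i, the polynomial has exactly k roots (counted with multiplicity) in the open disk of radius r. This is only the textbook criterion evaluated in floating point; the thesis's actual test presumably involves further steps (for example shifting and scaling the polynomial, and exact or certified arithmetic), and the function name is an assumption made for this example.

```python
from typing import Optional, Sequence


def pellet_root_count(coeffs: Sequence[complex], radius: float) -> Optional[int]:
    """Try to certify the number of roots of p in the open disk |z| < radius.

    coeffs[i] is the coefficient of z**i. Returns k if the Pellet criterion
    |a_k| r^k > sum_{i != k} |a_i| r^i holds for some k, otherwise None
    (the test is inconclusive, e.g. when a root lies near the circle).
    """
    if radius <= 0:
        raise ValueError("radius must be positive")
    # Magnitude of each term on the circle |z| = radius.
    terms = [abs(a) * radius**i for i, a in enumerate(coeffs)]
    total = sum(terms)
    for k, term in enumerate(terms):
        # A dominating term exceeds half the total, so at most one k can match.
        if term > total - term:
            return k
    return None


if __name__ == "__main__":
    # p(z) = (z - 0.1)(z - 10) = z^2 - 10.1 z + 1
    p = [1.0, -10.1, 1.0]
    for r in (0.01, 1.0, 100.0):
        print(r, pellet_root_count(p, r))
```

A return value of `None` does not mean the disk contains no roots; it only means this sufficient condition failed at the given radius.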

Universaar

Acronym

MPG.PuRe