45 research outputs found
Some Optimally Adaptive Parallel Graph Algorithms on EREW PRAM Model
The study of graph algorithms is an important area of research in computer science, since graphs offer useful tools to model many real-world situations. The commercial availability of parallel computers have led to the development of efficient parallel graph algorithms.
Using an exclusive-read and exclusive-write (EREW) parallel random access machine (PRAM) as the computation model with a fixed number of processors, we design and analyze parallel algorithms for seven undirected graph problems, such as, connected components, spanning forest, fundamental cycle set, bridges, bipartiteness, assignment problems, and approximate vertex coloring. For all but the last two problems, the input data structure is an unordered list of edges, and divide-and-conquer is the paradigm for designing algorithms. One of the algorithms to solve the assignment problem makes use of an appropriate variant of dynamic programming strategy. An elegant data structure, called the adjacency list matrix, used in a vertex-coloring algorithm avoids the sequential nature of linked adjacency lists.
Each of the proposed algorithms achieves optimal speedup, choosing an optimal granularity (thus exploiting maximum parallelism) which depends on the density or the number of vertices of the given graph. The processor-(time)2 product has been identified as a useful parameter to measure the cost-effectiveness of a parallel algorithm. We derive a lower bound on this measure for each of our algorithms
Aspects of practical implementations of PRAM algorithms
The PRAM is a shared memory model of parallel computation which abstracts away from inessential engineering details. It provides a very simple architecture independent model and provides a good programming environment. Theoreticians of the computer science community have proved that it is possible to emulate the theoretical PRAM model using current technology. Solutions have been found for effectively interconnecting processing elements, for routing data on these networks and for distributing the data among memory modules without hotspots. This thesis reviews this emulation and the possibilities it provides for large scale general purpose parallel computation. The emulation employs a bridging model which acts as an interface between the actual hardware and the PRAM model. We review the evidence that such a scheme crn achieve scalable parallel performance and portable parallel software and that PRAM algorithms can be optimally implemented on such practical models. In the course of this review we presented the following new results:
1. Concerning parallel approximation algorithms, we describe an NC algorithm for finding an approximation to a minimum weight perfect matching in a complete weighted graph. The algorithm is conceptually very simple and it is also the first NC-approximation algorithm for the task with a sub-linear performance ratio.
2. Concerning graph embedding, we describe dense edge-disjoint embeddings of the complete binary tree with n leaves in the following n-node communication networks: the hypercube, the de Bruijn and shuffle-exchange networks and the 2-dimcnsional mesh. In the embeddings the maximum distance from a leaf to the root of the tree is asymptotically optimally short. The embeddings facilitate efficient implementation of many PRAM algorithms on networks employing these graphs as interconnection networks.
3. Concerning bulk synchronous algorithmics, we describe scalable transportable algorithms for the following three commonly required types of computation; balanced tree computations. Fast Fourier Transforms and matrix multiplications
Query evaluation revised: parallel, distributed, via rewritings
This is a thesis on query evaluation in parallel and distributed settings, and structurally simple rewritings.
It consists of three parts.
In the first part, we investigate the efficiency of constant-time parallel evaluation algorithms. That is, the number of required processors or, asymptotically equivalent, the work required to evaluate queries in constant time. It is known that relational algebra queries can be evaluated in constant time. However, work-efficiency has not been a focus, and indeed known evaluation algorithms yield huge (polynomial) work bounds. We establish work-efficient constant-time algorithms for several query classes: (free-connex) acyclic, semi-join algebra, and natural join queries; the latter in the worst-case framework.
The second part is about deciding parallel-correctness of distributed evaluation strategies: Given a query and policies specifying how data is distributed and communicated among multiple servers, does the distributed evaluation yield the same result as the classical evaluation, for every database? Ketsman et al. proved that parallel-correctness for Datalog is undecidable; by reduction from the undecidable containment problem for Datalog. We show that parallel-correctness is already undecidable for monadic and frontier-guarded Datalog queries, for which containment is decidable. However, deciding parallel-correctness for frontier-guarded Datalog and constraint-based communication policies satisfying a certain property is 2ExpTime-complete. Furthermore, we obtain the same bounds for the parallel-boundedness problem, which asks whether the number of required communication rounds is bounded, over all databases.
The third part is about structurally simple rewritings. The (classical) rewriting problem asks whether, for a given query and a set of views, there is a query, called rewriting, over the views that is equivalent to the given query. We study the variant of this problem for (subclasses of) conjunctive queries and views that asks for a structurally simple rewriting. We prove that, if the given query is acyclic, an acyclic rewriting exists if there is any rewriting at all. Analogous statements hold for free-connex acyclic, hierarchical, and q-hierarchical queries. Furthermore, we prove that the problem is NP-hard, even if the given query and the views are acyclic or hierarchical. It becomes tractable if the views are free-connex acyclic or q-hierarchical (and the arity of the database schema is bounded)
Computing Shortest Paths in Series-Parallel Graphs in Logarithmic Space
Series-parallel graphs, which are built by repeatedly applying
series or parallel composition operations to paths, play an
important role in computer science as they model the flow of
information in many types of programs. For directed series-parallel
graphs, we study the problem of finding a shortest path between two
given vertices. Our main result is that we can find such a path in
logarithmic space, which shows that the distance problem for
series-parallel graphs is L-complete. Previously, it was known
that one can compute some path in logarithmic space; but for
other graph types, like undirected graphs or tournament graphs,
constructing some path between given vertices is possible in
logarithmic space while constructing a shortest path is
NL-complete
Flexible Smart Display with Integrated Graphics Rasterizor using Single Grain TFTs
Flexible electronics is a fast emerging market and includes electronics fabricated on flexible substrates, large area displays, low cost and disposable electronics. Both research and commercial institutions around the world have been trying to develop low temperature processes which will enable fabrication of electronic devices on arbitrary substrates including glass and plastic. While most of these technologies are still in the research phase, many approaches have shown promising results. One such technology is being developed in DIMES, TU Delft which uses single grain silicon crystals to fabricate Single Grain Thin Film Transistors (SG-TFTs) at plastic compatible temperatures. SG-TFTs and other similar technologies can potentially enable fabricating electronics directly on arbitrary substrates. This would further enable integration of embedded intelligence in devices that would enhance the current functionalists of displays. This paper is an effort in this direction as it undertakes a study to design a flexible display with an integrated graphics rasterizor unit. The paper introduces the novel idea to move parts of the graphics pipeline from the CPU/GPU to the display. This will add intelligence to the display so as to realize a smart-display. The paper proposes several architectures for implementing a rasterizor unit on smart-display, conceptually fabricated on a flexible substrate using SG-TFT technology. While the transistors fabricated with SG-TFT and similar technologies are relatively slower than the standard CMOS, this paper proposes and concludes that a tile based system design can potentially result into enhanced system performance
On expressing different concurrency paradigms on virtual execution environment
Virtual execution environments (VEE) such as the Java Virtual Machine (JVM) and the Microsoft Common Language Runtime (CLR) have been designed when the dominant computer architecture featured a Von-Neumann interface to programs: a single processor hiding all the complexity of parallel computations inside its design. Programs are expressed in an intermediate form that is executed by the VEE that defines an abstract computational model in which the concurrency model has been influenced by these design choices and it basically exposes the multi-threading model of the underlying operating system. Recently computer systems have introduced computational units in which concurrency is explicit and under program control. Relevant examples are the Graphical Processing Units (GPU such as Nvidia or AMD) and the Cell BE architecture which allow for explicit control of single processing unit, local memories and communication channels. Unfortunately programs designed for Virtual Machines cannot access to these resources since are not available through the abstractions provided by the VEE. A major redesign of VEEs seems to be necessary in order to bridge this gap. In this thesis we study the problem of exposing non-von Neumann computing resources within the Virtual Machine without need for a redesign of the whole execution infrastructure. In this work we express parallel computations relying on extensible meta-data and reflection to encode information. Meta-programming techniques are then used to rewrite the program into an equivalent one using the special purpose underlying architecture. We provide a case study in which this approach is applied to compiling Common Intermediate Language (CIL) methods to multi-core GPUs; we show that it is possible to access these non-standard computing resources without any change to the virtual machine design
Models for Parallel Computation in Multi-Core, Heterogeneous, and Ultra Wide-Word Architectures
Multi-core processors have become the dominant processor architecture with 2, 4, and 8 cores on a chip being widely available and an increasing number of cores predicted for the future. In addition, the decreasing costs and increasing programmability of Graphic Processing Units (GPUs) have made these an accessible source of parallel processing power in general purpose computing. Among the many research challenges that this scenario has raised are the fundamental problems related to theoretical modeling of computation in these architectures. In this thesis we study several aspects of computation in modern parallel architectures, from modeling of computation in multi-cores and heterogeneous platforms, to multi-core cache management strategies, through the proposal of an architecture that exploits bit-parallelism on thousands of bits.
Observing that in practice multi-cores have a small number of cores, we propose a model for low-degree parallelism for these architectures. We argue that assuming a small number of processors (logarithmic in a problem's input size) simplifies the design of parallel algorithms. We show that in this model a large class of divide-and-conquer and dynamic programming algorithms can be parallelized with simple modifications to sequential programs, while achieving optimal parallel speedups. We further explore low-degree-parallelism in computation, providing evidence of fundamental differences in practice and theory between systems with a sublinear and linear number of processors, and suggesting a sharp theoretical gap between the classes of problems that are efficiently parallelizable in each case.
Efficient strategies to manage shared caches play a crucial role in multi-core performance. We propose a model for paging in multi-core shared caches, which extends classical paging to a setting in which several threads share the cache. We show that in this setting traditional cache management policies perform poorly, and that any effective strategy must partition the cache among threads, with a partition that adapts dynamically to the demands of each thread. Inspired by the shared cache setting,
we introduce the minimum cache usage problem, an extension to classical sequential paging in which algorithms must account for the amount of cache they use.
This cache-aware model seeks algorithms with good performance in terms of faults and the amount of cache used, and has applications in energy efficient caching and in shared cache scenarios.
The wide availability of GPUs has added to the parallel power of multi-cores, however, most applications underutilize the available resources. We propose a model for hybrid computation in heterogeneous systems with multi-cores and GPU, and describe strategies for generic parallelization and efficient scheduling of a large class of divide-and-conquer algorithms.
Lastly, we introduce the Ultra-Wide Word architecture and model, an extension of the word-RAM model, that allows for constant time operations on thousands of bits in parallel. We show that a large class of existing algorithms can be
implemented in the Ultra-Wide Word model, achieving speedups comparable to those of multi-threaded computations, while avoiding the more difficult aspects of parallel programming
Recommended from our members
An investigation to study the feasibility of on-line bibliographic information retrieval system using an APP
This thesis was submitted for the degree of Doctor of Philosophy and was awarded by Brunel University.This thesis reports an investigation on the feasibility study of a
searching mechanism using an APP suitable for an on-line bibliographic
retrieval, operation, especially for retrospective searches.
From the study of the searching methods used in the conventional
systems it is seen that elaborate file- and data- structures are
introduced to improve the response time of the system. These
consequently lead to software and hardware redundancies. To mask
these complexities of the system an expensive computer with higher
capabilities and more powerful instruction set is commonly used.
Thus the service of the systen becomes cost-ineffective.
On the other hand the primitive operations of a searching mechanism,
such as, association, domain selection, intersection and unions, are
the intrinsic features of an associative parallel processor. Therefore
it is important to establish the feasibility of an APP as a cost-effective
searching mechanise.
In this thesis a searching mechanism using an 'ON-THE-FLY' searching
technique has been proposed. The parallel search unit uses a Byte-oriented
VRL-APP for efficient character string processing.
At the time of undertaking this work the specification for neither the
retrieval systems nor the BO-VRL APP's were well established; hence a
two-phase investigation was originated. In the Phase I of the work a
bottom up approach was adopted to derive a formal and precise
specification for the BO-VRL-APP. During the Phase II of the work
a top-down approach was opted for the implementation of the searching
mechanism.
An experimental research vehicle has been developed to establish
the feasibility of an APP as a cost-effective searching mechanism.
Although rigorous proof of the feasibility has not been obtained,
the thesis establishes that the APP is well suited for on-line
bibligraphic information retrieval operations where substring searches
including boolean selection and threshold weights are efficiently
supported