372,692 research outputs found
Parallel and Distributed Algorithms for a Class of Graph-Related Computational Problems.
There exist at least two models of parallel computing, namely, shared-memory and message-passing. This research addresses problems in both these types of systems, and proposes efficient parallel (Shared-Memory Model) and distributed (message-passing) algorithms for a variety of graph related computational problems. In part I, we design algorithms for three generic problems in distributed systems: set manipulation, network structure recognition and facility placement. We present optimal distributed algorithms for recognizing rectangular-mesh networks. The time and message complexity of our algorithm is linear in the number of nodes in the network. We also lay the foundation for the recognition of 2-reducible, outer-planar and cactus graphs. These algorithms have a message complexity of O(kn), where, k is the number of isolated two degree nodes in the network. We introduce the problem of reliable r-domination and design unified optimal distributed algorithms for the total, reliable and independent r-domination on trees. The time and message complexity of our algorithm is O(n), where n is the number of nodes in the tree. In the domain of set manipulation we design optimal algorithms for determining the intersection of sets in a distributed environment, where each processor is assumed to have its own set. The time and message complexity of our set intersection algorithm is O(mn), where m is the cardinality of the smallest set. In part II of our research we design optimal algorithms for r-domination and efficient parallel algorithms for the p-center problems on trees. We also present an optimal algorithm for computing the maximum independent set on intervals i the EREW-PRAM model. The r-domination problem on trees can now be solved in O(logn)time with O(n/logn) processors using the EREW-PRAM model. A parallel algorithm for range searching is developed using the concept of distributed data structures. We show that O(logn) search time can be effected for a range search on n 3-dimensional points using (2.log\sp2n-14.logn + 8) processors. Our algorithm can easily be generalized for the case of d-dimensional range search. (Abstract shortened with permission of author.)
Multidimensional Range Queries on Modern Hardware
Range queries over multidimensional data are an important part of database
workloads in many applications. Their execution may be accelerated by using
multidimensional index structures (MDIS), such as kd-trees or R-trees. As for
most index structures, the usefulness of this approach depends on the
selectivity of the queries, and common wisdom told that a simple scan beats
MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom
is largely based on evaluations that are almost two decades old, performed on
data being held on disks, applying IO-optimized data structures, and using
single-core systems. The question is whether this rule of thumb still holds
when multidimensional range queries (MDRQ) are performed on modern
architectures with large main memories holding all data, multi-core CPUs and
data-parallel instruction sets. In this paper, we study the question whether
and how much modern hardware influences the performance ratio between index
structures and scans for MDRQ. To this end, we conservatively adapted three
popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit
features of modern servers and compared their performance to different flavors
of parallel scans using multiple (synthetic and real-world) analytical
workloads over multiple (synthetic and real-world) datasets of varying size,
dimensionality, and skew. We find that all approaches benefit considerably from
using main memory and parallelization, yet to varying degrees. Our evaluation
indicates that, on current machines, scanning should be favored over parallel
versions of classical MDIS even for very selective queries
Dynamic load balancing for the distributed mining of molecular structures
In molecular biology, it is often desirable to find common properties in large numbers of drug candidates. One family of
methods stems from the data mining community, where algorithms to find frequent graphs have received increasing attention over the
past years. However, the computational complexity of the underlying problem and the large amount of data to be explored essentially
render sequential algorithms useless. In this paper, we present a distributed approach to the frequent subgraph mining problem to
discover interesting patterns in molecular compounds. This problem is characterized by a highly irregular search tree, whereby no
reliable workload prediction is available. We describe the three main aspects of the proposed distributed algorithm, namely, a dynamic
partitioning of the search space, a distribution process based on a peer-to-peer communication framework, and a novel receiverinitiated
load balancing algorithm. The effectiveness of the distributed method has been evaluated on the well-known National Cancer
Institute’s HIV-screening data set, where we were able to show close-to linear speedup in a network of workstations. The proposed
approach also allows for dynamic resource aggregation in a non dedicated computational environment. These features make it suitable
for large-scale, multi-domain, heterogeneous environments, such as computational grids
Recursive Algorithms for Distributed Forests of Octrees
The forest-of-octrees approach to parallel adaptive mesh refinement and
coarsening (AMR) has recently been demonstrated in the context of a number of
large-scale PDE-based applications. Although linear octrees, which store only
leaf octants, have an underlying tree structure by definition, it is not often
exploited in previously published mesh-related algorithms. This is because the
branches are not explicitly stored, and because the topological relationships
in meshes, such as the adjacency between cells, introduce dependencies that do
not respect the octree hierarchy. In this work we combine hierarchical and
topological relationships between octree branches to design efficient recursive
algorithms.
We present three important algorithms with recursive implementations. The
first is a parallel search for leaves matching any of a set of multiple search
criteria. The second is a ghost layer construction algorithm that handles
arbitrarily refined octrees that are not covered by previous algorithms, which
require a 2:1 condition between neighboring leaves. The third is a universal
mesh topology iterator. This iterator visits every cell in a domain partition,
as well as every interface (face, edge and corner) between these cells. The
iterator calculates the local topological information for every interface that
it visits, taking into account the nonconforming interfaces that increase the
complexity of describing the local topology. To demonstrate the utility of the
topology iterator, we use it to compute the numbering and encoding of
higher-order nodal basis functions.
We analyze the complexity of the new recursive algorithms theoretically, and
assess their performance, both in terms of single-processor efficiency and in
terms of parallel scalability, demonstrating good weak and strong scaling up to
458k cores of the JUQUEEN supercomputer.Comment: 35 pages, 15 figures, 3 table
Parallel Weighted Random Sampling
Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory machines. We give efficient, fast, and practicable algorithms for sampling single items, k items with/without replacement, permutations, subsets, and reservoirs. We also give improved sequential algorithms for alias table construction and for sampling with replacement. Experiments on shared-memory parallel machines with up to 158 threads show near linear speedups both for construction and queries
Graphs, Matrices, and the GraphBLAS: Seven Good Reasons
The analysis of graphs has become increasingly important to a wide range of
applications. Graph analysis presents a number of unique challenges in the
areas of (1) software complexity, (2) data complexity, (3) security, (4)
mathematical complexity, (5) theoretical analysis, (6) serial performance, and
(7) parallel performance. Implementing graph algorithms using matrix-based
approaches provides a number of promising solutions to these challenges. The
GraphBLAS standard (istc- bigdata.org/GraphBlas) is being developed to bring
the potential of matrix based graph algorithms to the broadest possible
audience. The GraphBLAS mathematically defines a core set of matrix-based graph
operations that can be used to implement a wide class of graph algorithms in a
wide range of programming environments. This paper provides an introduction to
the GraphBLAS and describes how the GraphBLAS can be used to address many of
the challenges associated with analysis of graphs.Comment: 10 pages; International Conference on Computational Science workshop
on the Applications of Matrix Computational Methods in the Analysis of Modern
Dat
Non-blocking Priority Queue based on Skiplists with Relaxed Semantics
Priority queues are data structures that store information in an orderly fashion. They are of tremendous importance because they are an integral part of many applications, like Dijkstra’s shortest path algorithm, MST algorithms, priority schedulers, and so on.
Since priority queues by nature have high contention on the delete_min operation, the design of an efficient priority queue should involve an intelligent choice of the data structure as well as relaxation bounds on the data structure. Lock-free data structures provide higher scalability as well as progress guarantee than a lock-based data structure. That is another factor to be considered in the priority queue design.
We present a relaxed non-blocking priority queue based on skiplists. We address all the design issues mentioned above in our priority queue. Use of skiplists allows multiple threads to concurrently access different parts of the skiplist quickly, whereas relaxing the priority queue delete_min operation distributes contention over the skiplist instead of just at the front. Furthermore, a non-blocking implementation guarantees that the system will make progress even when some process fails.
Our priority queue is internally composed of several priority queues, one for each thread and one shared priority queue common to all threads. Each thread selects the best value from its local priority queue and the shared priority queue and returns the value. In case a thread is unable to delete an item, it tries to spy items from other threads\u27 local priority queues.
We experimentally and theoretically show the correctness of our data structure. We also compare the performance of our data structure with other variations like priority queues based on coarse-grained skiplists for both relaxed and non-relaxed semantics
- …