Performance of a parallel code for the Euler equations on hypercube computers
The performance of hypercubes was evaluated on a computational fluid dynamics problem, and the parallel-environment issues that must be addressed were considered, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two-dimensional steady Euler equations describing flow around an airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2 and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but yet physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.
CCL: a portable and tunable collective communication library for scalable parallel computers
A collective communication library for parallel computers includes frequently used operations such as broadcast, reduce, scatter, gather, concatenate, synchronize, and shift. Such a library provides users with a convenient programming interface, efficient communication operations, and the advantage of portability. A library of this nature, the Collective Communication Library (CCL), intended for the line of scalable parallel computer products by IBM, has been designed. CCL is part of the parallel application programming interface of the recently announced IBM 9076 Scalable POWERparallel System 1 (SP1). In this paper, we examine several issues related to the functionality, correctness, and performance of a portable collective communication library while focusing on three novel aspects in the design and implementation of CCL: 1) the introduction of process groups, 2) the definition of semantics that ensures correctness, and 3) the design of new and tunable algorithms based on a realistic point-to-point communication model.
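As an illustration of the broadcast operation named in this abstract, the following is a minimal sketch of a textbook binomial-tree broadcast schedule built from point-to-point sends. It is not CCL's algorithm or API; the function name `binomial_broadcast_schedule` and the choice of rank 0 as root are assumptions made for the example.

```python
def binomial_broadcast_schedule(p):
    """Return the list of rounds for a binomial-tree broadcast from rank 0.

    Each round is a list of (sender, receiver) pairs that can proceed in
    parallel. Hypothetical helper for illustration; not part of CCL.
    """
    rounds = []
    step = 1
    while step < p:
        # Every rank that already holds the message (ranks below `step`)
        # forwards it to the rank `step` positions away, if that rank exists.
        rounds.append([(src, src + step) for src in range(step) if src + step < p])
        step *= 2
    return rounds


schedule = binomial_broadcast_schedule(6)
for number, sends in enumerate(schedule):
    print(number, sends)
```

With this schedule the number of rounds grows logarithmically in the number of processes, and every non-root rank receives the message exactly once.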
A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks
Exact Bayesian structure discovery in Bayesian networks requires exponential
time and space. Using dynamic programming (DP), the fastest known sequential
algorithm computes the exact posterior probabilities of structural features in
time and space, if the number of nodes (variables) in the
Bayesian network is and the in-degree (the number of parents) per node is
bounded by a constant . Here we present a parallel algorithm capable of
computing the exact posterior probabilities for all edges with optimal
parallel space efficiency and nearly optimal parallel time efficiency. That is,
if processors are used, the run-time reduces to
and the space usage becomes per
processor. Our algorithm is based on the observation that the subproblems in the
sequential DP algorithm constitute a - hypercube. We carefully coordinate
the computation of correlated DP procedures so that a large
amount of data exchange is avoided. Further, we develop parallel techniques
for two variants of the well-known \emph{zeta transform}, which have
applications outside the context of Bayesian networks. We demonstrate the
capability of our algorithm on datasets with up to 33 variables and its
scalability on up to 2048 processors. We apply our algorithm to a biological
data set for discovering the yeast pheromone response pathways.
Comment: 32 pages, 12 figures
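For readers unfamiliar with the zeta transform mentioned in this abstract, here is a minimal sequential sketch of the standard subset-sum form, in which each output entry is the sum of the input over all subsets of its index set. It is not the paper's parallel variant; the function name `zeta_transform` and the bitmask encoding of subsets are assumptions made for the example.

```python
def zeta_transform(f, n):
    """Return g where g[S] = sum of f[T] over all subsets T of S.

    Subsets of an n-element ground set are encoded as bitmasks 0 .. 2**n - 1.
    Hypothetical helper illustrating the standard sequential transform.
    """
    g = list(f)
    for i in range(n):              # process one element (one bit) at a time
        bit = 1 << i
        for s in range(1 << n):
            if s & bit:
                # Add the partial sum of the neighbour that lacks element i.
                g[s] += g[s ^ bit]
    return g


print(zeta_transform([1, 2, 3, 4], 2))
```

Each pass over a bit combines entries whose masks differ in exactly that bit, which is why the subset lattice can be viewed as a hypercube with one dimension per element. The loop performs on the order of n * 2**n additions.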
Efficient Bayesian Learning in Social Networks with Gaussian Estimators
We consider a group of Bayesian agents who try to estimate a state of the
world through interaction on a social network. Each agent
initially receives a private measurement of : a number picked
from a Gaussian distribution with mean and standard deviation one.
Then, in each discrete time iteration, each reveals its estimate of to
its neighbors, and, observing its neighbors' actions, updates its belief using
Bayes' Law.
This process aggregates information efficiently, in the sense that all the
agents converge to the belief that they would have, had they access to all the
private measurements. We show that this process is computationally efficient,
so that each agent's calculation can be easily carried out. We also show that
on any graph the process converges after at most steps, where
is the number of agents and is the diameter of the network. Finally, we
show that on trees and on distance-transitive graphs the process converges
after steps, and that it preserves privacy, so that agents learn very
little about the private signal of most other agents, despite the efficient
aggregation of information. Our results extend those in an unpublished
manuscript of the first and last authors.
Comment: Added coauthor. Added proofs for fast convergence on trees and
distance-transitive graphs. Also, now analyzing a notion of privacy
A partitioning strategy for nonuniform problems on multiprocessors
The partitioning of a problem on a domain with unequal work estimates in different subdomains is considered in a way that balances the work load across multiple processors. Such a problem arises, for example, in solving partial differential equations using an adaptive method that places extra grid points in certain subregions of the domain. A binary decomposition of the domain is used to partition it into rectangles requiring equal computational effort. The communication costs of mapping this partitioning onto different multiprocessors (a mesh-connected array, a tree machine, and a hypercube) are then studied. The communication cost expressions can be used to determine the optimal depth of the above partitioning.
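The binary decomposition described in this abstract can be sketched as a recursive bisection of a grid of work estimates. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `binary_decompose`, the rule of cutting along the longer side, and the half-open `(row_start, row_end, col_start, col_end)` rectangle format are all hypothetical choices.

```python
def binary_decompose(work, depth):
    """Split a 2D grid of work estimates into up to 2**depth rectangles
    of roughly equal total work. Rectangles are half-open index ranges
    (r0, r1, c0, c1). Hypothetical sketch for illustration only.
    """
    def total(r0, r1, c0, c1):
        return sum(sum(row[c0:c1]) for row in work[r0:r1])

    def best_cut(lo, hi, load):
        # load(k) is the work on the low side when cutting at index k;
        # pick the cut whose low side is closest to half of the whole.
        whole = load(hi)
        return min(range(lo + 1, hi), key=lambda k: abs(2 * load(k) - whole))

    def split(r0, r1, c0, c1, level):
        rows, cols = r1 - r0, c1 - c0
        if level == 0 or (rows < 2 and cols < 2):
            return [(r0, r1, c0, c1)]
        if rows >= cols:
            # Cut along the longer side: here, between two rows.
            k = best_cut(r0, r1, lambda k: total(r0, k, c0, c1))
            return split(r0, k, c0, c1, level - 1) + split(k, r1, c0, c1, level - 1)
        k = best_cut(c0, c1, lambda k: total(r0, r1, c0, k))
        return split(r0, r1, c0, k, level - 1) + split(r0, r1, k, c1, level - 1)

    return split(0, len(work), 0, len(work[0]), depth)


# A grid with one expensive cell, as an adaptive method might produce.
grid = [[8, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]]
print(binary_decompose(grid, 2))
```

Because cuts fall on grid lines, the work per rectangle is only approximately balanced. Deeper decompositions yield more rectangles, which is where the trade-off with communication cost mentioned in the abstract arises.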
I/O embedding and broadcasting in star interconnection networks
The issues of communication between a host or central controller and processors in large interconnection networks are very important and have been studied in the past by several researchers. There is a plethora of problems that arise when processors are asked to exchange information on parallel computers on which processors are interconnected according to a specific topology. In robust networks, it is desirable at times to send (receive) data/control information to (from) all the processors in minimal time. This type of communication is commonly referred to as broadcasting. To speed up broadcasting in a given network without modifying its topology, certain processors called stations can be specified to act as relay agents. In this thesis, broadcasting issues in a star-based interconnection network are studied. The model adopted assumes all-port communication and a wormhole switching mechanism. Initially, the problem treated is one of finding the minimum number of stations required to cover all the nodes in the star graph with i-adjacency. We consider 1-, 2-, and 3-adjacencies and determine the upper bound on the number of stations required to cover the nodes for each case. After deriving the number of stations, two algorithms are designed to broadcast the messages first from the host to stations, and then from stations to the remaining nodes. In addition, a Binary-based Algorithm is designed to allow routing in the network by working directly on the binary labels assigned to the star graph. No look-up table is consulted during routing, and a minimum number of bits is used to represent a node label. At the end, the thesis sheds light on another algorithm for routing using parallel paths in the star network.
Analysis and conception of tuple spaces in the eye of scalability
Applications in the emerging fields of eCommerce and
Ubiquitous Computing are composed of heterogeneous systems that
have been designed separately.
Hence, these systems are loosely coupled and require a coordination
mechanism that is able to bridge spatial and temporal remoteness.
The use of tuple spaces for data-driven coordination of these
systems has been proposed in the past. In addition, applications
of eCommerce and Ubiquitous Computing are not bound to a
predefined size, so that the underlying coordination
mechanism has to be highly scalable. However, it seems to be
difficult to conceive a scalable tuple space.
This report is an English version of the author's diploma
thesis. It comprises chapters two, three, four, and five.
Consequently, the design and the implementation of the proposed
tuple space are not part of this report.