Supporting shared data structures on distributed memory architectures
Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and iPSC/2 hypercubes is reported
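The global-name-space idea can be sketched as a minimal simulation. This is not the paper's actual system; the class and method names (`DistributedArray`, `owner`, `read`) are hypothetical, and the remote branch stands in for the compiler-generated message passing the abstract describes.

```python
class DistributedArray:
    """Illustrative sketch: a 1-D array block-distributed over simulated
    processors, accessed through global indices."""

    def __init__(self, data, nprocs):
        # each simulated processor owns one contiguous block
        self.block = (len(data) + nprocs - 1) // nprocs
        self.memories = [data[p * self.block:(p + 1) * self.block]
                         for p in range(nprocs)]

    def owner(self, i):
        # ownership is implied by the block distribution
        return i // self.block

    def read(self, i, my_rank):
        p = self.owner(i)
        if p == my_rank:
            return self.memories[p][i % self.block]  # local memory access
        # on a real machine this branch would be a compiler-generated
        # send/receive pair; here it is simulated by a direct lookup
        return self.memories[p][i - p * self.block]

a = DistributedArray(list(range(16)), nprocs=4)
remote = a.read(13, my_rank=0)  # element 13 is owned by processor 3
```

The point of the environment described above is that the user writes only the global-index access; the owner computation and the message traffic in the remote branch are generated automatically.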
A message passing kernel for the hypercluster parallel processing test bed
A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP
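The MPK's routing decision can be sketched as follows. The function names and the two-node processor layout are invented for illustration; only the policy (shared memory within a node, messages between nodes, hypercube-style hops between node addresses) comes from the description above.

```python
def choose_path(src, dst, node_of):
    # processors sharing a Hypercluster node use shared memory;
    # everything else goes over the message-passing network
    return "shared-memory" if node_of[src] == node_of[dst] else "message-passing"

def hops(node_a, node_b):
    # hypercube routing between nodes: one hop per differing address bit
    return bin(node_a ^ node_b).count("1")

# hypothetical layout: two nodes with two processors each
node_of = {"p0": 0, "p1": 0, "p2": 1, "p3": 1}
```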
Hypercube technology
The JPL-designed Mark III hypercube supercomputer has been in application service since June 1988 and has been applied successfully to a broad problem set including electromagnetic scattering, discrete event simulation, plasma transport, matrix algorithms, neural network simulation, image processing, and graphics. Currently, problems that are not homogeneous are being attempted, and through this involvement with real-world applications the software is evolving to handle this heterogeneous class of problems efficiently
Recent Developments in Parallelization of the Multidimensional Integration Package DICE
DICE is a general-purpose multidimensional numerical integration package. DICE can be parallelized in two ways, "distributing random numbers to workers" and "distributing hypercubes to workers", and the two ways can also be combined. So far, we have developed parallelization code using the former approach and reported it at ACAT2002 in Moscow. Here, we present recent developments of parallelized DICE using the latter approach, as the second stage of our parallelization activities.
Comment: 5 pages, 2 figures. Talk given at the X International Workshop on Advanced Computing and Analysis Techniques in Physics Research, ACAT 2005, DESY-Zeuthen, Germany, 22-27 May 2005
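The "distributing hypercubes to workers" strategy can be sketched with plain Monte Carlo integration: the domain is split into sub-hypercubes, each worker integrates one of them, and the partial results are summed. This is an illustrative stand-in, not DICE's actual sampling scheme.

```python
import random

def mc_integrate(f, lo, hi, n, seed):
    """Plain Monte Carlo estimate of the integral of f over the box
    [lo[0], hi[0]] x ... x [lo[d-1], hi[d-1]]."""
    rng = random.Random(seed)
    vol = 1.0
    for k in range(len(lo)):
        vol *= hi[k] - lo[k]
    total = sum(f([rng.uniform(lo[k], hi[k]) for k in range(len(lo))])
                for _ in range(n))
    return vol * total / n

def integrate_by_subcubes(f, subcubes, n_per_worker):
    # each worker integrates one sub-hypercube; partial sums are combined
    return sum(mc_integrate(f, lo, hi, n_per_worker, seed=w)
               for w, (lo, hi) in enumerate(subcubes))

# split the unit square into four quadrant "hypercubes"
quadrants = [([0.0, 0.0], [0.5, 0.5]), ([0.5, 0.0], [1.0, 0.5]),
             ([0.0, 0.5], [0.5, 1.0]), ([0.5, 0.5], [1.0, 1.0])]
est = integrate_by_subcubes(lambda x: x[0] + x[1], quadrants, 20000)
# the exact integral of x + y over the unit square is 1
```

Unlike "distributing random numbers", this decomposition gives each worker an independent subregion, so workers need no coordination beyond the final sum.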
Initial operating capability for the hypercluster parallel-processing test bed
The NASA Lewis Research Center is investigating the benefits of parallel processing to applications in computational fluid and structural mechanics. To aid this investigation, NASA Lewis is developing the Hypercluster, a multi-architecture, parallel-processing test bed. The initial operating capability (IOC) being developed for the Hypercluster is described. The IOC will provide a user with a programming/operating environment that is interactive, responsive, and easy to use. The IOC effort includes the development of the Hypercluster Operating System (HYCLOPS). HYCLOPS runs in conjunction with a vendor-supplied disk operating system on a Front-End Processor (FEP) to provide interactive, run-time operations such as program loading, execution, memory editing, and data retrieval. Run-time libraries that augment the FEP FORTRAN libraries are being developed to support parallel and vector processing on the Hypercluster. Special utilities are being provided to enable passage of information about application programs and their mapping to the operating system. Communications between the FEP and the Hypercluster are handled by dedicated processors, each running a Message-Passing Kernel (MPK). A shared-memory interface allows rapid data exchange between HYCLOPS and the communications processors. Input/output handlers are built into the HYCLOPS-MPK interface, eliminating the need for the user to supply separate I/O support programs on the FEP
Performance of a parallel code for the Euler equations on hypercube computers
The performance of hypercubes was evaluated on a computational fluid dynamics problem, and the parallel-environment issues that must be addressed were considered, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two-dimensional steady Euler equations describing flow around an airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and making the code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2 and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made
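An execution-time model of this general shape (compute cost proportional to the local grid partition, communication cost proportional to its boundary) can be sketched as follows. The parameter names and constants here are hypothetical, not those fitted in the study above.

```python
def model_time(n_cells, p, t_calc=1.0, t_startup=50.0, t_word=0.1):
    """Illustrative execution-time model for a 2-D grid solver on p
    processors; all parameter values are invented for the sketch."""
    local = n_cells / p                  # grid cells per processor
    compute = t_calc * local             # work scales with local cells
    boundary = 4.0 * local ** 0.5        # perimeter of a square sub-block
    comm = t_startup + t_word * boundary # message startup plus transfer
    return compute + comm

t1 = model_time(256 * 256, 1)
t16 = model_time(256 * 256, 16)
speedup = t1 / t16  # sublinear: communication grows relative to compute
```

Evaluating such a model across machine parameters (message startup time, per-word cost, per-cell compute time) is what lets a study extrapolate to hypercubes that have not been built yet.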
A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks
Exact Bayesian structure discovery in Bayesian networks requires exponential time and space. Using dynamic programming (DP), the fastest known sequential algorithm computes the exact posterior probabilities of structural features in O(2(d+1)n2^n) time and space, if the number of nodes (variables) in the Bayesian network is n and the in-degree (the number of parents) per node is bounded by a constant d. Here we present a parallel algorithm capable of computing the exact posterior probabilities for all edges with optimal parallel space efficiency and nearly optimal parallel time efficiency. That is, if p = 2^k processors are used, the run-time reduces to O(5(d+1)n2^(n-k) + k(n-k)^d) and the space usage becomes O(n2^(n-k)) per processor. Our algorithm is based on the observation that the subproblems in the sequential DP algorithm constitute an n-dimensional hypercube. We carefully coordinate the computation of correlated DP procedures so that large amounts of data exchange are suppressed. Further, we develop parallel techniques for two variants of the well-known zeta transform, which have applications outside the context of Bayesian networks. We demonstrate the capability of our algorithm on datasets with up to 33 variables and its scalability on up to 2048 processors. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways.
Comment: 32 pages, 12 figures
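For reference, the standard sequential zeta transform that the abstract builds on can be written as follows; the paper's parallel variants are not reproduced here.

```python
def zeta_transform(f, n):
    """Standard fast zeta transform over the subset lattice of an
    n-element ground set: returns g with g[S] = sum of f[T] over all
    subsets T of S, using O(n * 2^n) additions instead of O(3^n)."""
    g = list(f)
    for j in range(n):                 # one pass per ground-set element
        for s in range(1 << n):
            if s & (1 << j):           # fold in the value without element j
                g[s] += g[s ^ (1 << j)]
    return g

# with f(T) = 1 for every T, g[S] counts the subsets of S
g = zeta_transform([1] * 8, 3)
```

Each of the n passes touches all 2^n subsets independently, which is exactly the hypercube structure of subproblems that makes the transform a natural target for parallelization.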
Budget-constrained Edge Service Provisioning with Demand Estimation via Bandit Learning
Shared edge computing platforms, which enable Application Service Providers (ASPs) to deploy applications in close proximity to mobile users, provide ultra-low latency and location awareness to a rich portfolio of services. Though ubiquitous edge service provisioning, i.e., deploying the application at all possible edge sites, is always preferable, it is impractical due to the often limited operational budget of an ASP. In this case, an ASP has to decide carefully where to deploy the edge service and how much budget it is willing to use. A central issue is that the service demand received by each edge site, the key factor in the benefit of a deployment, is unknown to the ASP a priori. What complicates matters further is that this demand pattern varies temporally and spatially across geographically distributed edge sites. In this paper, we investigate an edge resource rental problem in which the ASP learns the service demand patterns of individual edge sites while renting computation resources at these sites to host its applications for edge service provisioning. An online algorithm, called Context-aware Online Edge Resource Rental (COERR), is proposed based on the framework of Contextual Combinatorial Multi-armed Bandits (CC-MAB). COERR observes side information (context) to learn the demand patterns of edge sites and makes rental decisions (where to rent computation resources and how much to rent) to maximize the ASP's utility under a limited budget. COERR achieves provable performance: sublinear regret compared to an Oracle algorithm that knows exactly the expected service demand of the edge sites. Experiments are carried out on a real-world dataset, and the results show that COERR significantly outperforms other benchmarks
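One decision round of a budget-constrained combinatorial bandit in the spirit of COERR can be sketched as follows. This omits the contextual component entirely, and every name and constant is hypothetical; it is not the published algorithm.

```python
import math

def rent_sites(mean_reward, n_pulls, t, cost, budget):
    """Rank edge sites by a UCB index (estimated demand plus an
    exploration bonus), then rent greedily while the budget allows."""
    ucb = [mean_reward[i] + math.sqrt(2.0 * math.log(t) / n_pulls[i])
           for i in range(len(mean_reward))]
    order = sorted(range(len(ucb)), key=lambda i: -ucb[i])
    rented, spent = [], 0.0
    for i in order:
        if spent + cost[i] <= budget:   # skip sites that exceed the budget
            rented.append(i)
            spent += cost[i]
    return rented

# three sites, equal exploration bonuses, budget for two rentals
chosen = rent_sites([0.9, 0.1, 0.5], [10, 10, 10], t=100,
                    cost=[1, 1, 1], budget=2)
```

The exploration bonus shrinks as a site is sampled more often, which is the mechanism behind the sublinear-regret guarantees this line of work proves.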
Spatio-temporal Edge Service Placement: A Bandit Learning Approach
Shared edge computing platforms deployed at the radio access network are expected to significantly improve the quality of service delivered by Application Service Providers (ASPs) in a flexible and economic way. However, placing edge service at every possible edge site is practically infeasible for an ASP due to the prohibitive budget requirement. In this paper, we investigate the edge service placement problem of an ASP under a limited budget, where the ASP dynamically rents computing/storage resources in edge sites to host its applications in close proximity to end users. Since the benefit of placing edge service at a specific site is usually unknown to the ASP a priori, optimal placement decisions must be made while learning this benefit. We pose this problem as a novel combinatorial contextual bandit learning problem. It is "combinatorial" because only a limited number of edge sites can be rented to provide the edge service given the ASP's budget. It is "contextual" because we utilize user context information to enable finer-grained learning and decision making. To solve this problem and optimize edge computing performance, we propose SEEN, a Spatial-temporal Edge sErvice placemeNt algorithm. Furthermore, SEEN is extended to scenarios with overlapping service coverage by incorporating a disjunctively constrained knapsack problem. In both cases, we prove that our algorithm achieves a sublinear regret bound when compared to an oracle algorithm that knows the exact benefit information. Simulations are carried out on a real-world dataset; the results show that SEEN significantly outperforms benchmark solutions
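The disjunctively constrained knapsack mentioned above can be illustrated with a brute-force sketch: at most one site may be chosen from each group of sites with overlapping coverage, subject to a budget. All inputs here are invented, and SEEN itself learns the benefits online rather than enumerating placements.

```python
from itertools import combinations

def best_placement(benefit, cost, conflict_groups, budget):
    """Exhaustive search over site subsets: feasible subsets stay within
    budget and contain at most one site per conflict group; return the
    feasible subset with maximum total benefit."""
    n = len(benefit)
    best_val, best_set = 0.0, ()
    for r in range(n + 1):
        for sel in combinations(range(n), r):
            if sum(cost[i] for i in sel) > budget:
                continue  # over budget
            if any(len(set(sel) & g) > 1 for g in conflict_groups):
                continue  # two sites with overlapping coverage conflict
            val = sum(benefit[i] for i in sel)
            if val > best_val:
                best_val, best_set = val, sel
    return best_set

# sites 0 and 1 overlap, so at most one of them may be chosen
placement = best_placement([5.0, 4.0, 3.0], [2, 2, 1], [{0, 1}], budget=3)
```

The exhaustive search is exponential in the number of sites; it serves only to make the constraint structure concrete.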
Semi-automatic process partitioning for parallel computation
On current multiprocessor architectures one must carefully distribute data in memory in order to achieve high performance. Process partitioning is the operation of rewriting an algorithm as a collection of tasks, each operating primarily on its own portion of the data, to carry out the computation in parallel. A semi-automatic approach to process partitioning is considered in which the compiler, guided by advice from the user, automatically transforms programs into such an interacting task system. This approach is illustrated with a picture processing example written in BLAZE, which is transformed into a task system maximizing locality of memory reference