Supporting shared data structures on distributed memory architectures
Programming nonshared memory systems is more difficult than programming shared memory systems, since there is no support for shared data structures. Current programming languages for distributed memory architectures force the user to decompose all data structures into separate pieces, with each piece owned by one of the processors in the machine, and with all communication explicitly specified by low-level message-passing primitives. A new programming environment is presented for distributed memory architectures, providing a global name space and allowing direct access to remote parts of data values. The analysis and program transformations required to implement this environment are described, and the efficiency of the resulting code on the NCUBE/7 and iPSC/2 hypercubes is reported
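The global-name-space idea can be sketched as a minimal simulation. This is not the paper's actual system; the class and method names (`DistributedArray`, `owner`, `read`) are hypothetical, and the remote branch stands in for the compiler-generated message passing the abstract describes.

```python
class DistributedArray:
    """Illustrative sketch: a 1-D array block-distributed over simulated
    processors, accessed through global indices."""

    def __init__(self, data, nprocs):
        # each simulated processor owns one contiguous block
        self.block = (len(data) + nprocs - 1) // nprocs
        self.memories = [data[p * self.block:(p + 1) * self.block]
                         for p in range(nprocs)]

    def owner(self, i):
        # ownership is implied by the block distribution
        return i // self.block

    def read(self, i, my_rank):
        p = self.owner(i)
        if p == my_rank:
            return self.memories[p][i % self.block]  # local memory access
        # on a real machine this branch would be a compiler-generated
        # send/receive pair; here it is simulated by a direct lookup
        return self.memories[p][i - p * self.block]

a = DistributedArray(list(range(16)), nprocs=4)
remote = a.read(13, my_rank=0)  # element 13 is owned by processor 3
```

The point of the environment described above is that the user writes only the global-index access; the owner computation and the message traffic in the remote branch are generated automatically.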
A message passing kernel for the hypercluster parallel processing test bed
A Message-Passing Kernel (MPK) for the Hypercluster parallel-processing test bed is described. The Hypercluster is being developed at the NASA Lewis Research Center to support investigations of parallel algorithms and architectures for computational fluid and structural mechanics applications. The Hypercluster resembles the hypercube architecture except that each node consists of multiple processors communicating through shared memory. The MPK efficiently routes information through the Hypercluster, using a message-passing protocol when necessary and faster shared-memory communication whenever possible. The MPK also interfaces all of the processors with the Hypercluster operating system (HYCLOPS), which runs on a Front-End Processor (FEP). This approach distributes many of the I/O tasks to the Hypercluster processors and eliminates the need for a separate I/O support program on the FEP
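The MPK's routing decision can be sketched as follows. The function names and the two-node processor layout are invented for illustration; only the policy (shared memory within a node, messages between nodes, hypercube-style hops between node addresses) comes from the description above.

```python
def choose_path(src, dst, node_of):
    # processors sharing a Hypercluster node use shared memory;
    # everything else goes over the message-passing network
    return "shared-memory" if node_of[src] == node_of[dst] else "message-passing"

def hops(node_a, node_b):
    # hypercube routing between nodes: one hop per differing address bit
    return bin(node_a ^ node_b).count("1")

# hypothetical layout: two nodes with two processors each
node_of = {"p0": 0, "p1": 0, "p2": 1, "p3": 1}
```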
Hypercube technology
The JPL-designed Mark III hypercube supercomputer has been in application service since June 1988 and has been applied successfully to a broad problem set including electromagnetic scattering, discrete event simulation, plasma transport, matrix algorithms, neural network simulation, image processing, and graphics. Currently, problems that are not homogeneous are being attempted, and through this involvement with real-world applications the software is evolving to handle this heterogeneous class of problems efficiently
Recent Developments in Parallelization of the Multidimensional Integration Package DICE
DICE is a general-purpose multidimensional numerical integration package. DICE can be parallelized in two ways, "distributing random numbers to workers" and "distributing hypercubes to workers", and the two ways can also be combined. So far, we have developed parallelization code using the former approach and reported it at ACAT2002 in Moscow. Here, we present recent developments of parallelized DICE using the latter approach, as the second stage of our parallelization activities.
Comment: 5 pages, 2 figures. Talk given at the X International Workshop on Advanced Computing and Analysis Techniques in Physics Research, ACAT 2005, DESY-Zeuthen, Germany, 22-27 May 2005
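The "distributing hypercubes to workers" strategy can be sketched with plain Monte Carlo integration: the domain is split into sub-hypercubes, each worker integrates one of them, and the partial results are summed. This is an illustrative stand-in, not DICE's actual sampling scheme.

```python
import random

def mc_integrate(f, lo, hi, n, seed):
    """Plain Monte Carlo estimate of the integral of f over the box
    [lo[0], hi[0]] x ... x [lo[d-1], hi[d-1]]."""
    rng = random.Random(seed)
    vol = 1.0
    for k in range(len(lo)):
        vol *= hi[k] - lo[k]
    total = sum(f([rng.uniform(lo[k], hi[k]) for k in range(len(lo))])
                for _ in range(n))
    return vol * total / n

def integrate_by_subcubes(f, subcubes, n_per_worker):
    # each worker integrates one sub-hypercube; partial sums are combined
    return sum(mc_integrate(f, lo, hi, n_per_worker, seed=w)
               for w, (lo, hi) in enumerate(subcubes))

# split the unit square into four quadrant "hypercubes"
quadrants = [([0.0, 0.0], [0.5, 0.5]), ([0.5, 0.0], [1.0, 0.5]),
             ([0.0, 0.5], [0.5, 1.0]), ([0.5, 0.5], [1.0, 1.0])]
est = integrate_by_subcubes(lambda x: x[0] + x[1], quadrants, 20000)
# the exact integral of x + y over the unit square is 1
```

Unlike "distributing random numbers", this decomposition gives each worker an independent subregion, so workers need no coordination beyond the final sum.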
Initial operating capability for the hypercluster parallel-processing test bed
The NASA Lewis Research Center is investigating the benefits of parallel processing to applications in computational fluid and structural mechanics. To aid this investigation, NASA Lewis is developing the Hypercluster, a multi-architecture, parallel-processing test bed. The initial operating capability (IOC) being developed for the Hypercluster is described. The IOC will provide a user with a programming/operating environment that is interactive, responsive, and easy to use. The IOC effort includes the development of the Hypercluster Operating System (HYCLOPS). HYCLOPS runs in conjunction with a vendor-supplied disk operating system on a Front-End Processor (FEP) to provide interactive, run-time operations such as program loading, execution, memory editing, and data retrieval. Run-time libraries that augment the FEP FORTRAN libraries are being developed to support parallel and vector processing on the Hypercluster. Special utilities are being provided to enable passage of information about application programs and their mapping to the operating system. Communications between the FEP and the Hypercluster are handled by dedicated processors, each running a Message-Passing Kernel (MPK). A shared-memory interface allows rapid data exchange between HYCLOPS and the communications processors. Input/output handlers are built into the HYCLOPS-MPK interface, eliminating the need for the user to supply separate I/O support programs on the FEP
Performance of a parallel code for the Euler equations on hypercube computers
The performance of hypercubes was evaluated on a computational fluid dynamics problem, and the parallel-environment issues that must be addressed were considered, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two-dimensional steady Euler equations describing flow around an airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and making the code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2 and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made
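An execution-time model of this general shape (compute cost proportional to the local grid partition, communication cost proportional to its boundary) can be sketched as follows. The parameter names and constants here are hypothetical, not those fitted in the study above.

```python
def model_time(n_cells, p, t_calc=1.0, t_startup=50.0, t_word=0.1):
    """Illustrative execution-time model for a 2-D grid solver on p
    processors; all parameter values are invented for the sketch."""
    local = n_cells / p                  # grid cells per processor
    compute = t_calc * local             # work scales with local cells
    boundary = 4.0 * local ** 0.5        # perimeter of a square sub-block
    comm = t_startup + t_word * boundary # message startup plus transfer
    return compute + comm

t1 = model_time(256 * 256, 1)
t16 = model_time(256 * 256, 16)
speedup = t1 / t16  # sublinear: communication grows relative to compute
```

Evaluating such a model across machine parameters (message startup time, per-word cost, per-cell compute time) is what lets a study extrapolate to hypercubes that have not been built yet.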
A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks
Exact Bayesian structure discovery in Bayesian networks requires exponential time and space. Using dynamic programming (DP), the fastest known sequential algorithm computes the exact posterior probabilities of structural features in O(2(d+1)n2^n) time and space, if the number of nodes (variables) in the Bayesian network is n and the in-degree (the number of parents) per node is bounded by a constant d. Here we present a parallel algorithm capable of computing the exact posterior probabilities for all edges with optimal parallel space efficiency and nearly optimal parallel time efficiency. That is, if p = 2^k processors are used, the run-time reduces to O(5(d+1)n2^(n-k) + k(n-k)^d) and the space usage becomes O(n2^(n-k)) per processor. Our algorithm is based on the observation that the subproblems in the sequential DP algorithm constitute an n-dimensional hypercube. We carefully coordinate the computation of correlated DP procedures so that large amounts of data exchange are suppressed. Further, we develop parallel techniques for two variants of the well-known zeta transform, which have applications outside the context of Bayesian networks. We demonstrate the capability of our algorithm on datasets with up to 33 variables and its scalability on up to 2048 processors. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways.
Comment: 32 pages, 12 figures
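For reference, the standard sequential zeta transform that the abstract builds on can be written as follows; the paper's parallel variants are not reproduced here.

```python
def zeta_transform(f, n):
    """Standard fast zeta transform over the subset lattice of an
    n-element ground set: returns g with g[S] = sum of f[T] over all
    subsets T of S, using O(n * 2^n) additions instead of O(3^n)."""
    g = list(f)
    for j in range(n):                 # one pass per ground-set element
        for s in range(1 << n):
            if s & (1 << j):           # fold in the value without element j
                g[s] += g[s ^ (1 << j)]
    return g

# with f(T) = 1 for every T, g[S] counts the subsets of S
g = zeta_transform([1] * 8, 3)
```

Each of the n passes touches all 2^n subsets independently, which is exactly the hypercube structure of subproblems that makes the transform a natural target for parallelization.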
Budget-constrained Edge Service Provisioning with Demand Estimation via Bandit Learning
Shared edge computing platforms, which enable Application Service Providers (ASPs) to deploy applications in close proximity to mobile users, provide ultra-low latency and location awareness to a rich portfolio of services. Though ubiquitous edge service provisioning, i.e., deploying the application at all possible edge sites, is always preferable, it is impractical due to the often limited operational budget of an ASP. In this case, an ASP has to decide carefully where to deploy the edge service and how much budget it is willing to use. A central issue is that the service demand received by each edge site, the key factor in the benefit of a deployment, is unknown to the ASP a priori. What complicates matters further is that this demand pattern varies temporally and spatially across geographically distributed edge sites. In this paper, we investigate an edge resource rental problem in which the ASP learns the service demand patterns of individual edge sites while renting computation resources at these sites to host its applications for edge service provisioning. An online algorithm, called Context-aware Online Edge Resource Rental (COERR), is proposed based on the framework of Contextual Combinatorial Multi-armed Bandits (CC-MAB). COERR observes side information (context) to learn the demand patterns of edge sites and makes rental decisions (where to rent computation resources and how much to rent) to maximize the ASP's utility under a limited budget. COERR achieves provable performance: sublinear regret compared to an Oracle algorithm that knows exactly the expected service demand of the edge sites. Experiments are carried out on a real-world dataset, and the results show that COERR significantly outperforms other benchmarks
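One decision round of a budget-constrained combinatorial bandit in the spirit of COERR can be sketched as follows. This omits the contextual component entirely, and every name and constant is hypothetical; it is not the published algorithm.

```python
import math

def rent_sites(mean_reward, n_pulls, t, cost, budget):
    """Rank edge sites by a UCB index (estimated demand plus an
    exploration bonus), then rent greedily while the budget allows."""
    ucb = [mean_reward[i] + math.sqrt(2.0 * math.log(t) / n_pulls[i])
           for i in range(len(mean_reward))]
    order = sorted(range(len(ucb)), key=lambda i: -ucb[i])
    rented, spent = [], 0.0
    for i in order:
        if spent + cost[i] <= budget:   # skip sites that exceed the budget
            rented.append(i)
            spent += cost[i]
    return rented

# three sites, equal exploration bonuses, budget for two rentals
chosen = rent_sites([0.9, 0.1, 0.5], [10, 10, 10], t=100,
                    cost=[1, 1, 1], budget=2)
```

The exploration bonus shrinks as a site is sampled more often, which is the mechanism behind the sublinear-regret guarantees this line of work proves.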
Spatio-temporal Edge Service Placement: A Bandit Learning Approach
Shared edge computing platforms deployed at the radio access network are expected to significantly improve the quality of service delivered by Application Service Providers (ASPs) in a flexible and economic way. However, placing edge service at every possible edge site is practically infeasible for an ASP due to the prohibitive budget requirement. In this paper, we investigate the edge service placement problem of an ASP under a limited budget, where the ASP dynamically rents computing/storage resources in edge sites to host its applications in close proximity to end users. Since the benefit of placing edge service at a specific site is usually unknown to the ASP a priori, optimal placement decisions must be made while learning this benefit. We pose this problem as a novel combinatorial contextual bandit learning problem. It is "combinatorial" because only a limited number of edge sites can be rented to provide the edge service given the ASP's budget. It is "contextual" because we utilize user context information to enable finer-grained learning and decision making. To solve this problem and optimize edge computing performance, we propose SEEN, a Spatial-temporal Edge sErvice placemeNt algorithm. Furthermore, SEEN is extended to scenarios with overlapping service coverage by incorporating a disjunctively constrained knapsack problem. In both cases, we prove that our algorithm achieves a sublinear regret bound when compared to an oracle algorithm that knows the exact benefit information. Simulations are carried out on a real-world dataset; the results show that SEEN significantly outperforms benchmark solutions
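The disjunctively constrained knapsack mentioned above can be illustrated with a brute-force sketch: at most one site may be chosen from each group of sites with overlapping coverage, subject to a budget. All inputs here are invented, and SEEN itself learns the benefits online rather than enumerating placements.

```python
from itertools import combinations

def best_placement(benefit, cost, conflict_groups, budget):
    """Exhaustive search over site subsets: feasible subsets stay within
    budget and contain at most one site per conflict group; return the
    feasible subset with maximum total benefit."""
    n = len(benefit)
    best_val, best_set = 0.0, ()
    for r in range(n + 1):
        for sel in combinations(range(n), r):
            if sum(cost[i] for i in sel) > budget:
                continue  # over budget
            if any(len(set(sel) & g) > 1 for g in conflict_groups):
                continue  # two sites with overlapping coverage conflict
            val = sum(benefit[i] for i in sel)
            if val > best_val:
                best_val, best_set = val, sel
    return best_set

# sites 0 and 1 overlap, so at most one of them may be chosen
placement = best_placement([5.0, 4.0, 3.0], [2, 2, 1], [{0, 1}], budget=3)
```

The exhaustive search is exponential in the number of sites; it serves only to make the constraint structure concrete.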
Semi-automatic process partitioning for parallel computation
On current multiprocessor architectures one must carefully distribute data in memory in order to achieve high performance. Process partitioning is the operation of rewriting an algorithm as a collection of tasks, each operating primarily on its own portion of the data, to carry out the computation in parallel. A semi-automatic approach to process partitioning is considered in which the compiler, guided by advice from the user, automatically transforms programs into such an interacting task system. This approach is illustrated with a picture processing example written in BLAZE, which is transformed into a task system maximizing locality of memory reference