Support for flexible and transparent distributed computing
Modern distributed computing developed from the traditional supercomputing community, rooted firmly
in the culture of batch management. The field has therefore been dominated by queuing-based resource
managers and workflow-based job submission environments, where static resource demands had to be
determined and reserved prior to launching executions. This has made it difficult to support resource
environments (e.g. Grid, Cloud) where both the available resources and the resource requirements
of applications may be dynamic and unpredictable. This thesis introduces a flexible execution
model in which compute capacity can be adapted to fit the needs of applications as they change during
execution. Resource provision in this model is based on a fine-grained, self-service approach instead
of the traditional one-time, system-level model. The thesis introduces a middleware-based Application
Agent (AA) that provides a platform for applications to dynamically interact and negotiate resources
with the underlying resource infrastructure.
We also consider the issue of transparency, i.e., hiding the provision and management of the distributed
environment. This is key to attracting the public to use the technology. The AA not only replaces
the user-controlled process of preparing and executing an application with a transparent, software-controlled
process, it also hides the complexity of selecting the right resources to ensure execution QoS. This service
is provided by an On-line Feedback-based Automatic Resource Configuration (OAC) mechanism cooperating
with the flexible execution model. The AA constantly monitors utility-based feedback from the
application during execution and is thus able to learn its behaviour and resource characteristics. This
allows it to automatically compose the most efficient execution environment on the fly and satisfy any
execution requirements defined by users. Two policies supervise the information learning
and resource tuning in the OAC. The Utility Classification policy classifies hosts according to their
historical performance contributions to the application; based on this classification, the AA chooses
high-utility hosts and withdraws low-utility hosts to configure an optimal environment. The Desired
Processing Power Estimation (DPPE) policy dynamically configures the execution environment according
to the estimated total processing power needed to satisfy users' execution requirements.
Through this introduction of flexibility and transparency, a user is able to run a dynamic or conventional
distributed application anywhere with optimised execution performance, without managing distributed
resources. Building on the standalone model, the thesis further introduces a federated resource negotiation
framework as a step towards an autonomous, multi-user distributed computing world.
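The Utility Classification idea can be sketched as follows. This is a hedged, minimal illustration: the function name, the per-interval feedback representation, and the fixed keep-fraction rule are all assumptions for the sake of the example, not the thesis's actual API or policy details.

```python
# Illustrative sketch: rank hosts by their average historical utility
# contribution and keep the top performers, withdrawing the rest.
# Names and the threshold rule are hypothetical.

def classify_hosts(history, keep_fraction=0.5):
    """history: dict mapping host -> list of per-interval utility feedback."""
    avg = {h: sum(v) / len(v) for h, v in history.items() if v}
    ranked = sorted(avg, key=avg.get, reverse=True)
    cutoff = max(1, int(len(ranked) * keep_fraction))
    return ranked[:cutoff], ranked[cutoff:]  # (high-utility, low-utility)

high, low = classify_hosts({
    "node-a": [0.9, 0.8],
    "node-b": [0.2, 0.1],
    "node-c": [0.6, 0.7],
})
# With three hosts and keep_fraction=0.5, only the single best host is kept.
```

In a running system the AA would re-run such a classification periodically as new feedback arrives, so the configured environment tracks the application's changing behaviour.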
A statistical approach to the inverse problem in magnetoencephalography
Magnetoencephalography (MEG) is an imaging technique used to measure the
magnetic field outside the human head produced by the electrical activity
inside the brain. The MEG inverse problem, identifying the location of the
electrical sources from the magnetic signal measurements, is ill-posed, that
is, there are an infinite number of mathematically correct solutions. Common
source localization methods assume the source does not vary with time and do
not provide estimates of the variability of the fitted model. Here, we
reformulate the MEG inverse problem by considering time-varying locations for
the sources and their electrical moments and we model their time evolution
using a state space model. Based on our predictive model, we investigate the
inverse problem by finding the posterior source distribution given the multiple
channels of observations at each time rather than fitting fixed source
parameters. Our new model is more realistic than common models and allows us to
estimate the variation of the sources' strength, orientation and position. We propose
two new Monte Carlo methods based on sequential importance sampling. Unlike the
usual MCMC sampling scheme, our new methods work in this setting without
needing to tune a high-dimensional transition kernel, which would be very
costly. The dimensionality of the unknown parameters is extremely large and the
size of the data is even larger. We use Parallel Virtual Machine (PVM) to speed
up the computation.
Comment: Published at http://dx.doi.org/10.1214/14-AOAS716 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org/)
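The state-space approach with sequential importance sampling can be illustrated with a toy one-dimensional filter. This is only a sketch under simplifying assumptions: the paper's actual model tracks high-dimensional, time-varying source locations and moments across many MEG channels, whereas this example filters a single scalar state with Gaussian transition and observation noise.

```python
import math
import random

# Toy 1-D sequential importance sampling with resampling (bootstrap-style
# particle filter). All parameter values here are illustrative.

def sis_filter(observations, n_particles=500, sigma_x=1.0, sigma_y=0.5, seed=0):
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    estimates = []
    for y in observations:
        # Propagate through the transition model: x_t = x_{t-1} + noise.
        particles = [x + rng.gauss(0.0, sigma_x) for x in particles]
        # Weight by the observation likelihood p(y | x).
        w = [math.exp(-0.5 * ((y - x) / sigma_y) ** 2) for x in particles]
        total = sum(w) or 1.0
        w = [wi / total for wi in w]
        # Posterior mean estimate at this time step.
        estimates.append(sum(wi * xi for wi, xi in zip(w, particles)))
        # Multinomial resampling to avoid weight degeneracy.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return estimates

est = sis_filter([0.0, 1.0, 2.0, 3.0])
```

Because each step only propagates particles and reweights them against the new observation, no high-dimensional MCMC transition kernel has to be tuned, which is the property the abstract highlights.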
Parallel software tools at Langley Research Center
This document gives a brief overview of parallel software tools available on the Intel iPSC/860 parallel computer at Langley Research Center. It is intended to provide a source of information that is somewhat more concise than vendor-supplied material on the purpose and use of various tools. Each of the chapters on tools is organized in a similar manner, covering an overview of the functionality, access information, how to use the tool effectively, observations about the tool and how it compares to similar software, known problems or shortfalls with the software, and reference documentation. It is primarily intended for users of the iPSC/860 at Langley Research Center and is appropriate for both the experienced and the novice user.
ATCOM: Automatically tuned collective communication system for SMP clusters.
Conventional implementations of collective communications are based on point-to-point communications, and their optimizations have focused on the efficiency of those communication algorithms. However, point-to-point communications are not the optimal choice for modern computing clusters of SMPs due to their two-level communication structure. In recent years, a few research efforts have investigated efficient collective communications for SMP clusters. This dissertation focuses on platform-independent algorithms and implementations in this area.

There are two main approaches to implementing efficient collective communications for clusters of SMPs: using shared memory operations for intra-node communications, and overlapping inter-node/intra-node communications. The former fully utilizes the hardware-based shared memory of an SMP, and the latter takes advantage of the inherent hierarchy of the communications within a cluster of SMPs. Previous studies focused on clusters of SMPs from particular vendors, and the methods they proposed are not portable to other systems. Because the performance optimization problem is very complicated and the development process very time consuming, self-tuning, platform-independent implementations are highly desirable. As this dissertation demonstrates, such an implementation can significantly outperform other point-to-point based portable implementations and some platform-specific implementations.

The dissertation describes in detail the architecture of the platform-independent implementation. There are four system components: shared memory-based collective communications, overlapping mechanisms for inter-node and intra-node communications, a prediction-based tuning module, and a micro-benchmark based tuning module. Each component is carefully designed with the goal of automatic tuning in mind.
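The micro-benchmark based tuning component can be sketched in miniature: time each candidate implementation on the target platform and select the fastest. This is a hedged illustration only; the candidate functions below are stand-ins for real collective algorithms, and the function names are hypothetical, not ATCOM's actual interfaces.

```python
import time

# Hypothetical candidate "broadcast" implementations to choose between.
def flat_copy(data, n_ranks):
    # Copy the whole buffer to every rank in one pass.
    return [list(data) for _ in range(n_ranks)]

def chunked_copy(data, n_ranks, chunk=1024):
    # Copy the buffer in fixed-size chunks (mimicking a segmented algorithm).
    out = []
    for _ in range(n_ranks):
        buf = []
        for i in range(0, len(data), chunk):
            buf.extend(data[i:i + chunk])
        out.append(buf)
    return out

def pick_best(candidates, data, n_ranks, repeats=3):
    """Time each candidate and return (winner_name, timings)."""
    timings = {}
    for name, fn in candidates.items():
        start = time.perf_counter()
        for _ in range(repeats):
            fn(data, n_ranks)
        timings[name] = time.perf_counter() - start
    return min(timings, key=timings.get), timings

best, timings = pick_best({"flat": flat_copy, "chunked": chunked_copy},
                          list(range(10000)), n_ranks=4)
```

A real system would run such a selection once per platform (and per message-size range) and cache the winner, which is what makes the implementation self-tuning rather than hand-optimized per vendor.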
Characterization of message-passing overhead on the AP3000 multicomputer
This is a post-peer-review, pre-copyedit version. The final authenticated version is available online at: http://dx.doi.org/10.1109/ICPP.2001.952077

[Abstract] The performance of the communication primitives of parallel computers is critical for overall system performance. Characterizing the communication overhead is very important for estimating the global performance of parallel applications and for detecting possible bottlenecks. In this paper, we evaluate, model and compare the performance of the message-passing libraries provided by the Fujitsu AP3000 multicomputer: MPI/AP, PVM/AP and APlib. Our aim is to fairly characterize the communication primitives using general models and performance metrics.

Ministerio de Ciencia y Tecnología; 1FD97-0118-C02
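A common general model for characterizing point-to-point primitives of the kind evaluated here is the linear "startup latency plus size over bandwidth" (Hockney) model. The sketch below fits that model from two hypothetical ping-pong measurements; the numbers are illustrative, and the paper's own models may be richer than this two-parameter form.

```python
# Fit t(n) = t0 + n / B from two (message size in bytes, time in seconds)
# ping-pong measurements. Measurement values below are made up for the demo.

def fit_hockney(n1, t1, n2, t2):
    inv_bw = (t2 - t1) / (n2 - n1)   # seconds per byte
    t0 = t1 - n1 * inv_bw            # startup latency (seconds)
    return t0, 1.0 / inv_bw          # (latency, bandwidth in bytes/s)

def predict(t0, bw, n):
    """Predicted transfer time for an n-byte message."""
    return t0 + n / bw

# Hypothetical measurements: 1 KiB in 20 us, 1 MiB in 1068 us.
t0, bw = fit_hockney(1024, 20e-6, 1048576, 1068e-6)
```

Once fitted per library (MPI/AP, PVM/AP, APlib) and per primitive, such models let one compare latency-bound and bandwidth-bound regimes directly and spot where one library's primitive becomes the bottleneck.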
Comparison and tuning of MPI implementations in a grid context
Today, clusters are often interconnected by long-distance networks within grids to offer a huge number of available resources to a range of users. MPI, the standard communication library used to write parallel applications, was implemented for clusters. Two main features of grids, long-distance networks and technological heterogeneity, raise the question of MPI efficiency in grids. This report presents an evaluation of four recent MPI implementations (MPICH2, MPICH-Madeleine, OpenMPI and GridMPI) on the French research grid Grid'5000. The comparison is based on the execution of a ping-pong benchmark, the NAS Parallel Benchmarks and a real geophysics application. We show that these implementations exhibit performance differences, and that executing MPI applications on the grid can be beneficial if the parameters are well tuned. The report details the tuning required for each implementation to obtain the best performance.
Optimisation of patch distribution strategies for AMR applications
As core counts increase in the world's most powerful supercomputers, applications are becoming limited not only by computational power, but also by data availability. In the race to exascale, efficient and effective communication policies are key to achieving optimal application performance. Applications using adaptive mesh refinement (AMR) trade communication for computational load balancing, enabling focused computation on specific areas of interest. This class of application is particularly susceptible to the communication performance of the underlying architecture and is inherently difficult to scale efficiently. In this paper we present a study of the effect of patch distribution strategies on the scalability of an AMR code. We demonstrate the significance of patch placement on communication overheads and, by balancing the computation and communication costs of patches, develop a scheme to optimise the performance of a specific, industry-strength benchmark application.
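The load-balancing half of the patch-distribution trade-off can be sketched with a simple greedy scheme: assign each patch, heaviest first, to the currently least-loaded rank. This is an illustrative baseline only; real AMR schedulers, including the one studied here, must also weigh the communication cost between neighbouring patches, which this sketch ignores, and all names are hypothetical.

```python
import heapq

def distribute(patch_weights, n_ranks):
    """Greedy longest-processing-time placement of patches onto ranks.

    patch_weights: list of per-patch compute costs.
    Returns a dict mapping patch index -> rank.
    """
    # Min-heap of (current load, rank) so we always pick the lightest rank.
    heap = [(0.0, r) for r in range(n_ranks)]
    heapq.heapify(heap)
    placement = {}
    for pid, w in sorted(enumerate(patch_weights),
                         key=lambda kv: kv[1], reverse=True):
        load, rank = heapq.heappop(heap)
        placement[pid] = rank
        heapq.heappush(heap, (load + w, rank))
    return placement

placement = distribute([5, 3, 8, 2, 7, 1], n_ranks=2)
# Total weight 26 splits into two ranks of 13 each here.
```

Extending the cost function with an inter-patch communication term (penalizing placements that cut many patch neighbour relationships across ranks) is the direction the paper's optimised scheme takes.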