346 research outputs found
Parallelizing RRT on distributed-memory architectures
This paper addresses the problem of improving the performance of the Rapidly-exploring Random Tree (RRT) algorithm by parallelizing it. For scalability reasons we do so on a distributed-memory architecture, using the message-passing paradigm. We present three parallel versions of RRT along with the technicalities involved in their implementation. We also evaluate the algorithms and study how they behave on different motion planning problems
Transparent Orchestration of Task-based Parallel Applications in Containers Platforms
This paper presents a framework to easily build and execute parallel applications in container-based distributed computing platforms in a user-transparent way. The proposed framework is a combination of the COMP Superscalar (COMPSs) programming model and runtime, which provides a straightforward way to develop task-based parallel applications from sequential codes, and containers management platforms that ease the deployment of applications in computing environments (as Docker, Mesos or Singularity). This framework provides scientists and developers with an easy way to implement parallel distributed applications and deploy them in a one-click fashion. We have built a prototype which integrates COMPSs with different containers engines in different scenarios: i) a Docker cluster, ii) a Mesos cluster, and iii) Singularity in an HPC cluster. We have evaluated the overhead in the building phase, deployment and execution of two benchmark applications compared to a Cloud testbed based on KVM and OpenStack and to the usage of bare metal nodes. We have observed an important gain in comparison to cloud environments during the building and deployment phases. This enables better adaptation of resources with respect to the computational load. In contrast, we detected an extra overhead during the execution, which is mainly due to the multi-host Docker networking.This work is partly supported by the Spanish Government through Programa Severo Ochoa (SEV-2015-0493), by the Spanish Ministry of Science and Technology through TIN2015-65316 project, by the Generalitat de Catalunya under contracts 2014-SGR-1051 and 2014-SGR-1272, and by the European Union through the Horizon 2020 research and innovation program under grant 690116 (EUBra-BIGSEA Project). Results presented in this paper were obtained using the Chameleon testbed supported by the National Science Foundation.Peer ReviewedPostprint (author's final draft
Parallelizing RRT on large-scale distributed-memory architectures
This paper addresses the problem of parallelizing the Rapidly-exploring Random Tree (RRT) algorithm on large-scale distributed-memory architectures, using the Message Passing Interface. We compare three parallel versions of RRT based on classical parallelization schemes. We evaluate them on different motion planning problems and analyze the various factors influencing their performance
Grid and P2P middleware for scientific computing systems
Grid and P2P systems have achieved a notable success in the domain of scientific and engineering applications, which commonly demand considerable amounts of computational resources. However, Grid and P2P systems remain still difficult to be used by the domain scientists and engineers due to the inherent complexity of the corresponding middleware and the lack of adequate documentation. In this paper we survey recent developments of Grid and P2P middleware in the context of scientific computing systems. The differences on the approaches taken for Grid and P2P middleware as well as the common points of both paradigms are highlighted. In addition, we discuss the corresponding programming models, languages, and applications.Peer ReviewedPostprint (published version
A method of evaluation of high-performance computing batch schedulers
According to Sterling et al., a batch scheduler, also called workload management, is an application or set of services that provide a method to monitor and manage the flow of work through the system [Sterling01]. The purpose of this research was to develop a method to assess the execution speed of workloads that are submitted to a batch scheduler. While previous research exists, this research is different in that more complex jobs were devised that fully exercised the scheduler with established benchmarks. This research is important because the reduction of latency even if it is miniscule can lead to massive savings of electricity, time, and money over the long term. This is especially important in the era of green computing [Reuther18]. The methodology used to assess these schedulers involved the execution of custom automation scripts. These custom scripts were developed as part of this research to automatically submit custom jobs to the schedulers, take measurements, and record the results. There were multiple experiments conducted throughout the course of the research. These experiments were designed to apply the methodology and assess the execution speed of a small selection of batch schedulers. Due to time constraints, the research was limited to four schedulers. x The measurements that were taken during the experiments were wall time, RAM usage, and CPU usage. These measurements captured the utilization of system resources of each of the schedulers. The custom scripts were executed using, 1, 2, and 4 servers to determine how well a scheduler scales with network growth. The experiments were conducted on local school resources. All hardware was similar and was co-located within the same data-center. While the schedulers that were investigated as part of the experiments are agnostic to whether the system is grid, cluster, or super-computer; the investigation was limited to a cluster architecture
Parallel, distributed and GPU computing technologies in single-particle electron microscopy
An introduction to the current paradigm shift towards concurrency in software
Adaptive structured parallelism
Algorithmic skeletons abstract commonly-used patterns of parallel computation, communication, and interaction. Parallel programs are expressed by interweaving parameterised skeletons analogously to the way in which structured sequential programs are developed, using well-defined constructs. Skeletons provide top-down design composition and control inheritance throughout the program structure. Based on the algorithmic skeleton concept, structured parallelism provides a high-level parallel programming technique which
allows the conceptual description of parallel programs whilst fostering platform independence and algorithm abstraction. By decoupling the algorithm
specification from machine-dependent structural considerations, structured parallelism allows programmers to code programs regardless of how the computation and communications will be executed in the system platform.Meanwhile, large non-dedicated multiprocessing systems have long posed
a challenge to known distributed systems programming techniques as a result
of the inherent heterogeneity and dynamism of their resources. Scant research
has been devoted to the use of structural information provided by skeletons
in adaptively improving program performance, based on resource utilisation.
This thesis presents a methodology to improve skeletal parallel programming
in heterogeneous distributed systems by introducing adaptivity through resource awareness. As we hypothesise that a skeletal program should be able
to adapt to the dynamic resource conditions over time using its structural forecasting information, we have developed ASPara: Adaptive Structured Parallelism. ASPara is a generic methodology to incorporate structural information at compilation into a parallel program, which will help it to adapt at
execution
Support for flexible and transparent distributed computing
Modern distributed computing developed from the traditional supercomputing community rooted firmly
in the culture of batch management. Therefore, the field has been dominated by queuing-based resource
managers and work flow based job submission environments where static resource demands needed be
determined and reserved prior to launching executions. This has made it difficult to support resource
environments (e.g. Grid, Cloud) where the available resources as well as the resource requirements
of applications may be both dynamic and unpredictable. This thesis introduces a flexible execution
model where the compute capacity can be adapted to fit the needs of applications as they change during
execution. Resource provision in this model is based on a fine-grained, self-service approach instead
of the traditional one-time, system-level model. The thesis introduces a middleware based Application
Agent (AA) that provides a platform for the applications to dynamically interact and negotiate resources
with the underlying resource infrastructure.
We also consider the issue of transparency, i.e., hiding the provision and management of the distributed
environment. This is the key to attracting public to use the technology. The AA not only replaces
user-controlled process of preparing and executing an application with a transparent software-controlled
process, it also hides the complexity of selecting right resources to ensure execution QoS. This service
is provided by an On-line Feedback-based Automatic Resource Configuration (OAC) mechanism cooperating
with the flexible execution model. The AA constantly monitors utility-based feedbacks from the
application during execution and thus is able to learn its behaviour and resource characteristics. This
allows it to automatically compose the most efficient execution environment on the fly and satisfy any
execution requirements defined by users. Two policies are introduced to supervise the information learning
and resource tuning in the OAC. The Utility Classification policy classifies hosts according to their
historical performance contributions to the application. According to this classification, the AA chooses
high utility hosts and withdraws low utility hosts to configure an optimum environment. The Desired
Processing Power Estimation (DPPE) policy dynamically configures the execution environment according
to the estimated desired total processing power needed to satisfy users’ execution requirements.
Through the introducing of flexibility and transparency, a user is able to run a dynamic/normal
distributed application anywhere with optimised execution performance, without managing distributed
resources. Based on the standalone model, the thesis further introduces a federated resource negotiation
framework as a step forward towards an autonomous multi-user distributed computing world
- …