82 research outputs found
Vcluster: A Portable Virtual Computing Library For Cluster Computing
Message passing has been the dominant parallel programming model in cluster computing, and libraries like Message Passing Interface (MPI) and Portable Virtual Machine (PVM) have proven their novelty and efficiency through numerous applications in diverse areas. However, as clusters of Symmetric Multi-Processor (SMP) and heterogeneous machines become popular, conventional message passing models must be adapted accordingly to support this new kind of clusters efficiently. In addition, Java programming language, with its features like object oriented architecture, platform independent bytecode, and native support for multithreading, makes it an alternative language for cluster computing. This research presents a new parallel programming model and a library called VCluster that implements this model on top of a Java Virtual Machine (JVM). The programming model is based on virtual migrating threads to support clusters of heterogeneous SMP machines efficiently. VCluster is implemented in 100% Java, utilizing the portability of Java to address the problems of heterogeneous machines. VCluster virtualizes computational and communication resources such as threads, computation states, and communication channels across multiple separate JVMs, which makes a mobile thread possible. Equipped with virtual migrating thread, it is feasible to balance the load of computing resources dynamically. Several large scale parallel applications have been developed using VCluster to compare the performance and usage of VCluster with other libraries. The results of the experiments show that VCluster makes it easier to develop multithreading parallel applications compared to conventional libraries like MPI. At the same time, the performance of VCluster is comparable to MPICH, a widely used MPI library, combined with popular threading libraries like POSIX Thread and OpenMP. In the next phase of our work, we implemented thread group and thread migration to demonstrate the feasibility of dynamic load balancing in VCluster. We carried out experiments to show that the load can be dynamically balanced in VCluster, resulting in a better performance. Thread group also makes it possible to implement collective communication functions between threads, which have been proved to be useful in process based libraries
Silkroad : A system supporting DSM and multiple paradigms in cluster computing
Ph.DDOCTOR OF PHILOSOPH
Flexible multi-layer virtual machine design for virtual laboratory in distributed systems and grids.
We propose a flexible Multi-layer Virtual Machine (MVM) design intended to improve efficiencies in distributed and grid computing and to overcome the known current problems that exist within traditional virtual machine architectures and those used in distributed and grid systems. This thesis presents a novel approach to building a virtual laboratory to support e-science by adapting MVMs within the distributed systems and grids, thereby providing enhanced flexibility and reconfigurability by raising the level of abstraction. The MVM consists of three layers. They are OS-level VM, queue VMs, and components VMs. The group of MVMs provides the virtualized resources, virtualized networks, and reconfigurable components layer for virtual laboratories. We demonstrate how our reconfigurable virtual machine can allow software designers and developers to reuse parallel communication patterns. In our framework, the virtual machines can be created on-demand and their applications can be distributed at the source-code level, compiled and instantiated in runtime. (Abstract shortened by UMI.) Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2005 .K56. Source: Masters Abstracts International, Volume: 44-03, page: 1405. Thesis (M.Sc.)--University of Windsor (Canada), 2005
Алгоритмы динамической балансировки вычислительной нагрузки и их реализации
В работе рассматриваются алгоритмы балансировки вычислительной нагрузки в задачах связанных с адаптивными перестроениями сетки, локальными повышениями порядка аппроксимирующих функций выполняемых на многопроцессорных вычислительных системах, в том числе неоднородных/гибридных. Рассматривается балансировка на уровне: операционной системы, промежуточного программного обеспечения и пользовательского приложения.20-6
Recommended from our members
A strategy for mapping unstructured mesh computational mechanics programs onto distributed memory parallel architectures
The motivation of this thesis was to develop strategies that would enable unstructured mesh based computational mechanics codes to exploit the computational advantages offered by distributed memory parallel processors. Strategies that successfully map structured mesh codes onto parallel machines have been developed over the previous decade and used to build a toolkit for automation of the parallelisation process. Extension of the capabilities of this toolkit to include unstructured mesh codes requires new strategies to be developed.
This thesis examines the method of parallelisation by geometric domain decomposition using the single program multi data programming paradigm with explicit message passing. This technique involves splitting (decomposing) the problem definition into P parts that may be distributed over P processors in a parallel machine. Each processor runs the same program and operates only on its part of the problem. Messages passed between the processors allow data exchange to maintain consistency with the original algorithm.
The strategies developed to parallelise unstructured mesh codes should meet a number of requirements:
The algorithms are faithfully reproduced in parallel.
The code is largely unaltered in the parallel version.
The parallel efficiency is maximised.
The techniques should scale to highly parallel systems.
The parallelisation process should become automated.
Techniques and strategies that meet these requirements are developed and tested in this dissertation using a state of the art integrated computational fluid dynamics and solid mechanics code. The results presented demonstrate the importance of the problem partition in the definition of inter-processor communication and hence parallel performance.
The classical measure of partition quality based on the number of cut edges in the mesh partition can be inadequate for real parallel machines. Consideration of the topology of the parallel machine in the mesh partition is demonstrated to be a more significant factor than the number of cut edges in the achieved parallel efficiency. It is shown to be advantageous to allow an increase in the volume of communication in order to achieve an efficient mapping dominated by localised communications. The limitation to parallel performance resulting from communication startup latency is clearly revealed together with strategies to minimise the effect.
The generic application of the techniques to other unstructured mesh codes is discussed in the context of automation of the parallelisation process. Automation of parallelisation based on the developed strategies is presented as possible through the use of run time inspector loops to accurately determine the dependencies that define the necessary inter-processor communication
Architecture aware parallel programming in Glasgow parallel Haskell (GPH)
General purpose computing architectures are evolving quickly to become manycore
and hierarchical: i.e. a core can communicate more quickly locally than
globally. To be effective on such architectures, programming models must be
aware of the communications hierarchy. This thesis investigates a programming
model that aims to share the responsibility of task placement, load balance, thread
creation, and synchronisation between the application developer and the runtime
system.
The main contribution of this thesis is the development of four new architectureaware
constructs for Glasgow parallel Haskell that exploit information about task
size and aim to reduce communication for small tasks, preserve data locality, or to
distribute large units of work. We define a semantics for the constructs that specifies the sets of PEs that each construct identifies, and we check four properties
of the semantics using QuickCheck.
We report a preliminary investigation of architecture aware programming
models that abstract over the new constructs. In particular, we propose architecture
aware evaluation strategies and skeletons. We investigate three common
paradigms, such as data parallelism, divide-and-conquer and nested parallelism,
on hierarchical architectures with up to 224 cores. The results show that the
architecture-aware programming model consistently delivers better speedup and
scalability than existing constructs, together with a dramatic reduction in the
execution time variability.
We present a comparison of functional multicore technologies and it reports
some of the first ever multicore results for the Feedback Directed Implicit Parallelism
(FDIP) and the semi-explicit parallelism (GpH and Eden) languages. The
comparison reflects the growing maturity of the field by systematically evaluating
four parallel Haskell implementations on a common multicore architecture.
The comparison contrasts the programming effort each language requires with
the parallel performance delivered.
We investigate the minimum thread granularity required to achieve satisfactory
performance for three implementations parallel functional language on a
multicore platform. The results show that GHC-GUM requires a larger thread
granularity than Eden and GHC-SMP. The thread granularity rises as the number
of cores rises
Granularity in Large-Scale Parallel Functional Programming
This thesis demonstrates how to reduce the runtime of large non-strict functional programs using parallel evaluation. The parallelisation of several programs shows the importance of granularity, i.e. the computation costs of program expressions. The aspect of granularity is studied both on a practical level, by presenting and measuring runtime granularity improvement mechanisms, and at a more formal level, by devising a static granularity analysis. By parallelising several large functional programs this thesis demonstrates for the first time the advantages of combining lazy and parallel evaluation on a large scale: laziness aids modularity, while parallelism reduces runtime. One of the parallel programs is the Lolita system which, with more than 47,000 lines of code, is the largest existing parallel non-strict functional program. A new mechanism for parallel programming, evaluation strategies, to which this thesis contributes, is shown to be useful in this parallelisation. Evaluation strategies simplify parallel programming by separating algorithmic code from code specifying dynamic behaviour. For large programs the abstraction provided by functions is maintained by using a data-oriented style of parallelism, which defines parallelism over intermediate data structures rather than inside the functions. A highly parameterised simulator, GRANSIM, has been constructed collaboratively and is discussed in detail in this thesis. GRANSIM is a tool for architecture-independent parallelisation and a testbed for implementing runtime-system features of the parallel graph reduction model. By providing an idealised as well as an accurate model of the underlying parallel machine, GRANSIM has proven to be an essential part of an integrated parallel software engineering environment. Several parallel runtime- system features, such as granularity improvement mechanisms, have been tested via GRANSIM. It is publicly available and in active use at several universities worldwide. In order to provide granularity information this thesis presents an inference-based static granularity analysis. This analysis combines two existing analyses, one for cost and one for size information. It determines an upper bound for the computation costs of evaluating an expression in a simple strict higher-order language. By exposing recurrences during cost reconstruction and using a library of recurrences and their closed forms, it is possible to infer the costs for some recursive functions. The possible performance improvements are assessed by measuring the parallel performance of a hand-analysed and annotated program
FT-PAS-A framework for pattern specific fault-tolerance in parallel programming
Fault-tolerance is an important requirement for long running parallel applications. Many approaches are discussed in various literatures about providing fault-tolerance for parallel systems. Most of them exhibit one or more of these shortcomings in delivering fault-tolerance: non-specific solution (i.e., the fault-tolerance solution is general), no separation-of-concern (i.e., the application developer's involvement in implementing the fault tolerance is significant) and limited to inbuilt fault-tolerance solution. In this thesis, we propose a different approach to deliver fault-tolerance to the parallel programs using a-priori knowledge about their patterns. Our approach is based on the observation that different patterns require different fault-tolerance techniques (specificity). Consequently, we have contributed by classifying patterns into sub-patterns based on fault-tolerance strategies. Moreover, the core functionalities of these fault-tolerance strategies can be abstracted and pre-implemented generically, independent of a specific application. Thus, the pre-packaged solution separates their implementation details from the application developer (separation-of-concern). One such fault-tolerance model is designed and implemented here to demonstrate our idea. The Fault-Tolerant Parallel Architectural Skeleton (FT-PAS) model implements various fault-tolerance protocols targeted for a collection of (frequently used) patterns in parallel-programming. Fault-tolerance protocol extension is another important contribution of this research. The FT-PAS model provides a set of basic building blocks as part of protocol extension in order to build new fault- tolerance protocols as needed for available patterns. Finally, the usages of the model from the perspective of two user categories (i.e., an application developer and a protocol designer) are illustrated through examples
- …