11 research outputs found

    The PVM 3.4 tracing facility and XPVM 1.1

    One problem in developing a parallel program is monitoring its behavior for debugging and performance tuning. This paper discusses an enhanced tracing facility and tool for PVM (Parallel Virtual Machine), a message-passing library for parallel processing in a heterogeneous environment. PVM supports mixed collections of workstation clusters, shared-memory multiprocessors, and massively parallel processors. The upcoming release of PVM, Version 3.4, contains a new, improved tracing facility that provides flexible, efficient access to run-time program information. The new tracing system supports a buffering mechanism to reduce the perturbation of user applications caused by tracing, and a more flexible trace event definition scheme based on a self-defining data format. The new scheme expedites collection of program execution histories and allows for the integration of user-defined custom trace events. Tracing instrumentation is built into the PVM library and allows on-the-fly adjustments to each task's trace event mask, for control over the level of tracing detail. The graphical console and monitor XPVM has also been updated for better access to the new tracing functionality. Several new views have been implemented to exploit the additional tracing information now available, including user-defined events. XPVM has also been optimized for better real-time monitoring.
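
    As a rough illustration of the per-task trace-mask control described above, the sketch below uses the pvm_settmask() call and the TEV_MASK_* macros documented for PVM 3.3/3.4. The exact event symbols (TEV_SEND0 and TEV_RECV0 here) are assumptions recalled from pvmtev.h and should be checked against the installed headers.

        /* Minimal sketch: enable tracing for a subset of events at run time. */
        #include <stdio.h>
        #include <pvm3.h>
        #include <pvmtev.h>    /* trace event symbols and mask macros */

        int main(void)
        {
            Pvmtmask tmask;
            int mytid = pvm_mytid();        /* enrol this task in PVM */

            TEV_MASK_INIT(tmask);           /* start with all events off */
            TEV_MASK_SET(tmask, TEV_SEND0); /* assumed symbol: pvm_send() entry */
            TEV_MASK_SET(tmask, TEV_RECV0); /* assumed symbol: pvm_recv() entry */

            /* Apply the mask to this task only; PvmTaskChild would instead
             * apply it to subsequently spawned children. */
            pvm_settmask(PvmTaskSelf, tmask);

            printf("task t%x tracing sends and receives\n", mytid);
            /* ... message-passing work to be traced goes here ... */

            pvm_exit();
            return 0;
        }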

    Using portable monitoring for heterogeneous clusters on Windows and Linux operating systems

    This paper describes the advances obtained with XPVM-W95 2.0, a novel monitoring tool for parallel applications that employ PVM-W95 (PVM for Windows) as well as PVM for Linux. The tool provides, at runtime, information about the parallel virtual machine configuration, the parallel applications, and the workload of each node. The three most important aspects of XPVM-W95 are its friendly graphical interface, its portability, and its ability to deal with heterogeneity. All three were improved in version 2.0, mainly through a modular rearrangement of the code. Experiments demonstrate that XPVM-W95 behaves stably and achieves its stated objectives. XPVM-W95 offers highly portable source code and supports monitoring with different metrics. Empirical studies, carried out with a single application, measured a monitoring intrusion of 17.0% on Windows and 0.13% on Linux.

    Millipede: A graphical tool for debugging distributed systems with a multilevel approach

    Much research and development effort has been applied to the problem of debugging computer programs. Unfortunately, most of this effort has been directed at traditional sequential programs, with little attention paid to the parallel and distributed domains. Tracking down and fixing bugs in a parallel or distributed environment presents unique challenges for which traditional sequential tools are simply not adequate. This thesis describes the development and usage of the Millipede debugging system, a graphical tool that applies the novel technique of multilevel debugging to the distributed debugging problem. By providing a user interface that offers the abstractions, flexibility, and granularity needed to handle the unique challenges that arise in this field, Millipede presents the user with an effective and compelling environment for debugging parallel and distributed programs, while avoiding many of the pitfalls encountered by its predecessors.

    Development of a parallel SAR processor on a Beowulf cluster

    The purpose of this dissertation is to present the development and testing of a parallelised Range-Doppler SAR processor. The inherent data parallelism of SAR data led to the choice of a master-slave parallel processing model, in which copies of a slave task perform the same operations on different sets of data. However, the parallelised SAR processor needed to implement a corner turn without saving data to disk, keeping the data set being processed distributed in memory over the nodes of the cluster. This was successfully achieved using an in-place method, thus saving valuable memory resources. Once the parallel processor was implemented, timing tests were performed, yielding a maximum speedup factor of 6.2 for a system with 8 slave processors.
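
    As a hedged sketch of the master-slave pattern described above, the fragment below shows the master side using standard PVM calls; the slave executable name ("sarslave"), message tags, and block bookkeeping are assumptions, and the range-Doppler processing and in-place corner turn are elided.

        /* Master side of a PVM master-slave decomposition (sketch).
         * Each slave receives a block index, processes its share of the
         * data, and reports back; the slave performs the mirror-image
         * recv/process/send sequence. */
        #include <pvm3.h>

        #define NSLAVES    8
        #define MSG_WORK   1
        #define MSG_RESULT 2

        int main(void)
        {
            int tids[NSLAVES];
            int i, block, status;

            pvm_mytid();                      /* enrol in the virtual machine */
            pvm_spawn("sarslave", (char **)0, PvmTaskDefault, "",
                      NSLAVES, tids);

            for (i = 0; i < NSLAVES; i++) {   /* hand out one block per slave */
                block = i;
                pvm_initsend(PvmDataDefault);
                pvm_pkint(&block, 1, 1);
                pvm_send(tids[i], MSG_WORK);
            }
            for (i = 0; i < NSLAVES; i++) {   /* gather completion notices */
                pvm_recv(-1, MSG_RESULT);
                pvm_upkint(&status, 1, 1);
            }
            pvm_exit();
            return 0;
        }

    For the configuration reported above, a speedup of 6.2 on 8 slaves corresponds to a parallel efficiency of roughly 78%.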

    Profiling a parallel domain specific language using off-the-shelf tools

    Profiling tools are essential for understanding and tuning the performance of both parallel programs and parallel language implementations. Assessing the performance of a program in a language with high-level parallel coordination is often complicated by the layers of abstraction present in the language and its implementation. This thesis investigates whether it is possible to profile parallel Domain Specific Languages (DSLs) using existing host language profiling tools. The key challenge is that the host language tools report the performance of the DSL runtime system (RTS) executing the application rather than the performance of the DSL application. The key questions are whether a correct, effective and efficient profiler can be constructed using host language profiling tools; whether it is possible to effectively profile the DSL implementation; and what capabilities are required of the host language profiling tools.

    The main contribution of this thesis is the development of an execution profiler for the parallel DSL Haskell Distributed Parallel Haskell (HdpH) using the host language profiling tools. We show that it is possible to construct a profiler (HdpHProf) that supports performance analysis of both DSL applications and the DSL implementation. The implementation uses several new GHC features, including the GHC-Events Library and ThreadScope, and develops two new performance analysis tools for HdpH internals: Spark Pool Contention Analysis and Registry Contention Analysis.

    We present a critical comparative evaluation of the host language profiling tools that we used (GHC-PPS and ThreadScope) against another recent functional profiler, EdenTV, alongside four important imperative profilers. This is the first report on the performance of functional profilers in comparison with well-established, industry-standard imperative profiling technologies. We systematically compare the profilers for usability and data presentation. We found that GHC-PPS performs well in terms of overheads and usability, so using it to profile the DSL is feasible and does not have a significant impact on DSL performance.

    We validate HdpHProf for functional correctness and measure its performance using six benchmarks. HdpHProf works correctly and can scale to profile HdpH programs running on up to 192 cores of a 32-node Beowulf cluster. We characterise the performance of HdpHProf in terms of profiling data size and profiling execution runtime overhead, showing that HdpHProf does not alter the behaviour of GHC-PPS and retains low tracing overheads, close to those of the studied functional profilers: 18% on average. It also shows a low ratio of HdpH trace events in the GHC-PPS eventlog, less than 3% on average.

    We show that HdpHProf is effective and efficient for performance analysis and tuning of DSL applications. We use HdpHProf to identify performance issues and to tune the thread granularity of six HdpH benchmarks with different parallel paradigms, e.g. divide and conquer, flat data parallel, and nested data parallel. This includes identifying problems such as thread granularity that is too small or too large, a problem size too small for the parallel architecture, and synchronisation bottlenecks. We also show that HdpHProf is effective and efficient for tuning the parallel DSL implementation. We use the Spark Pool Contention Analysis tool to examine how the spark pool implementation performs when accessed concurrently, and found that appropriate thread granularity can significantly reduce both conflict ratios and conflict durations, by more than 90%. We use the Registry Contention Analysis tool to evaluate three alternative registry implementations, and found that the tools give a better understanding of how different implementations of the HdpH RTS perform.
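
    As a purely illustrative aside, a contention analysis of the kind described above boils down to deriving conflict statistics from timestamped lock events. The sketch below assumes a hypothetical event layout; it is not the HdpHProf or GHC eventlog format.

        /* Hypothetical sketch: conflict ratio over a stream of lock events.
         * An access counts as a conflict when the lock was not granted
         * immediately on request. */
        #include <stddef.h>

        typedef struct {
            double t_request;   /* when the lock was asked for */
            double t_acquire;   /* when it was actually granted */
        } lock_event;

        static double conflict_ratio(const lock_event *ev, size_t n)
        {
            size_t conflicts = 0;
            for (size_t i = 0; i < n; i++)
                if (ev[i].t_acquire > ev[i].t_request)
                    conflicts++;
            return n ? (double)conflicts / (double)n : 0.0;
        }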

    Engineering Physics and Mathematics Division progress report for period ending December 31, 1994


    Support for flexible and transparent distributed computing

    Modern distributed computing developed from the traditional supercomputing community, rooted firmly in the culture of batch management. The field has therefore been dominated by queuing-based resource managers and workflow-based job submission environments, where static resource demands had to be determined and reserved prior to launching executions. This has made it difficult to support resource environments (e.g. Grid, Cloud) where both the available resources and the resource requirements of applications may be dynamic and unpredictable. This thesis introduces a flexible execution model in which compute capacity can be adapted to fit the needs of applications as they change during execution. Resource provision in this model is based on a fine-grained, self-service approach instead of the traditional one-time, system-level model. The thesis introduces a middleware-based Application Agent (AA) that provides a platform for applications to dynamically interact and negotiate resources with the underlying resource infrastructure. We also consider the issue of transparency, i.e., hiding the provision and management of the distributed environment; this is key to attracting the public to the technology. The AA not only replaces the user-controlled process of preparing and executing an application with a transparent, software-controlled process, it also hides the complexity of selecting the right resources to ensure execution QoS. This service is provided by an On-line Feedback-based Automatic Resource Configuration (OAC) mechanism cooperating with the flexible execution model. The AA constantly monitors utility-based feedback from the application during execution and is thus able to learn its behaviour and resource characteristics. This allows it to automatically compose the most efficient execution environment on the fly and to satisfy any execution requirements defined by users. Two policies are introduced to supervise the information learning and resource tuning in the OAC, as sketched below. The Utility Classification policy classifies hosts according to their historical performance contributions to the application; based on this classification, the AA chooses high-utility hosts and withdraws low-utility hosts to configure an optimal environment. The Desired Processing Power Estimation (DPPE) policy dynamically configures the execution environment according to the estimated total processing power needed to satisfy users' execution requirements. Through the introduction of flexibility and transparency, a user is able to run a dynamic or conventional distributed application anywhere with optimised execution performance, without managing distributed resources. Building on the standalone model, the thesis further introduces a federated resource negotiation framework as a step towards an autonomous, multi-user distributed computing world.
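
    A purely illustrative sketch of the Utility Classification step follows; the type names, fields, and threshold values are hypothetical, not taken from the thesis.

        /* Classify hosts by historical utility: retain high-utility hosts,
         * withdraw low-utility ones, leave mid-utility hosts unchanged. */
        #include <stddef.h>

        typedef struct {
            int    id;
            double utility;   /* historical performance contribution */
            int    active;    /* part of the current execution environment */
        } host_t;

        #define LOW_UTILITY  0.2   /* assumed cut-offs */
        #define HIGH_UTILITY 0.8

        static void reconfigure(host_t *hosts, size_t n)
        {
            for (size_t i = 0; i < n; i++) {
                if (hosts[i].utility >= HIGH_UTILITY)
                    hosts[i].active = 1;   /* choose / keep */
                else if (hosts[i].utility <= LOW_UTILITY)
                    hosts[i].active = 0;   /* withdraw */
            }
        }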

    An architecture for monitoring parallel and distributed computations

    Monitoring the behaviour of programs during execution is necessary in several application contexts: for example, to verify the use of computational resources during execution, to compute metrics that better define the application's profile, to identify at which points in the execution the causes of deviations from a program's desired behaviour lie, and, in other cases, to control the configuration of the application or of the system that supports its execution. This technique has been applied both to sequential and to distributed programs. In particular, for parallel computations, given the complexity arising from their non-determinism, these techniques have been the best source of information for understanding the execution of an application, both in terms of its correctness and in evaluating its performance and use of computational resources. The main difficulties in developing and adopting monitoring tools stem from the complexity of parallel and distributed computing systems and from the need to develop specific solutions for each platform, each architecture, and each objective. There are, however, generic functionalities which, if present in all cases, can help in the development of new tools and in their adaptation to different computational environments. This dissertation proposes a model to support the observation and control of parallel and distributed applications (DAMS - Distributed Applications Monitoring System). The model defines an abstract monitoring architecture based on a minimal core on top of which sit sets of services that realise the functionality required in each usage scenario, as illustrated by the sketch below. Its organisation into abstraction layers and its capacity for modular extension make it possible to develop sets of functionalities that can be shared by distinct tools. Moreover, the proposed model eases the development of observation and control tools on top of different execution platforms. This dissertation presents examples of the use of the model, and of the infrastructure that supports it, in several observation and control scenarios. It also describes the experimentation carried out with prototypes developed on two distinct computational platforms.
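
    As a loose illustration of the minimal-core-plus-services organisation described above, the fragment below sketches a pluggable service interface; all names are hypothetical and do not come from the DAMS implementation.

        /* A minimal monitoring core could dispatch observation and control
         * requests to service modules registered through an interface
         * like this one (illustrative only). */
        typedef struct dams_service {
            const char *name;                           /* service identifier */
            int (*observe)(int task_id, void *out);     /* collect run-time data */
            int (*control)(int task_id, int command);   /* act on the application */
        } dams_service;

        /* The core keeps a table of registered services and routes tool
         * requests to them, so distinct tools can share functionality. */
        int dams_register(dams_service *svc);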