4,618 research outputs found

    GMA Instrumentation of the Athena Framework using NetLogger

    Grid applications are, by their nature, wide-area distributed applications. This WAN aspect of Grid applications makes conventional monitoring and instrumentation tools (such as top, gprof, LSF Monitor, etc.) impractical for verifying that the application is running correctly and efficiently. To be effective, monitoring data must be "end-to-end", meaning that all components between the Grid application endpoints must be monitored. Instrumented applications can generate a large amount of monitoring data, so typically the instrumentation is off by default. For jobs running on a Grid, there needs to be a general mechanism to remotely activate the instrumentation in running jobs. The NetLogger Toolkit Activation Service provides this mechanism. To demonstrate this, we have instrumented the ATLAS Athena Framework with NetLogger to generate monitoring events. We then use a GMA-based activation service to control NetLogger's trigger mechanism. The NetLogger trigger mechanism allows one to easily start, stop, or change the logging level of a running program by modifying a trigger file. We present here details of the design of the NetLogger implementation of the GMA-based activation service and the instrumentation service for Athena. We also describe how this activation service allows us to non-intrusively collect and visualize the ATLAS Athena Framework monitoring data.
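    The trigger-file idea described above lends itself to a short illustration. The sketch below is not NetLogger's API; it only shows, under assumed names (a hypothetical logger and trigger-file path), how a background thread can poll a trigger file and start, stop, or change the logging level of a running program.

```java
import java.nio.file.*;
import java.util.logging.*;

// Illustrative sketch only, not the NetLogger API: a daemon thread polls a
// trigger file and raises or lowers the logging level of a running program.
public class TriggerFileActivation {
    // Hypothetical logger name; handler configuration is omitted for brevity.
    private static final Logger LOG = Logger.getLogger("athena.instrumentation");

    public static void watch(Path triggerFile, long pollMillis) {
        Thread watcher = new Thread(() -> {
            while (true) {
                try {
                    // Absent trigger file means instrumentation stays off.
                    Level level = Files.exists(triggerFile)
                            ? Level.parse(Files.readString(triggerFile).trim()) // e.g. "FINE", "INFO", "OFF"
                            : Level.OFF;
                    LOG.setLevel(level);
                    Thread.sleep(pollMillis);
                } catch (Exception e) {
                    LOG.setLevel(Level.OFF); // fall back to silence on any error
                }
            }
        });
        watcher.setDaemon(true);
        watcher.start();
    }

    public static void main(String[] args) {
        watch(Path.of("/tmp/netlogger.trigger"), 5_000); // hypothetical trigger path
        // ... application work; LOG.fine(...) calls become active only when triggered.
    }
}
```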

    Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming

    Loosely coupled programming is a powerful paradigm for rapidly creating higher-level applications from scientific programs on petascale systems, typically using scripting languages. This paradigm is a form of many-task computing (MTC) which focuses on the passing of data between programs as ordinary files rather than messages. While it has the significant benefits of decoupling producer and consumer and allowing existing application programs to be executed in parallel with no recoding, its typical implementation using shared file systems places a high performance burden on the overall system and on the user who will analyze and consume the downstream data. Previous efforts have achieved great speedups with loosely coupled programs, but have done so with careful manual tuning of all shared file system access. In this work, we evaluate a prototype collective IO model for file-based MTC. The model enables efficient and easy distribution of input data files to computing nodes and gathering of output results from them. It eliminates the need for such manual tuning and makes the programming of large-scale clusters using a loosely coupled model easier. Our approach, inspired by in-memory approaches to collective operations for parallel programming, builds on fast local file systems to provide high-speed local file caches for parallel scripts, uses a broadcast approach to handle distribution of common input data, and uses efficient scatter/gather and caching techniques for input and output. We describe the design of the prototype model, its implementation on the Blue Gene/P supercomputer, and present preliminary measurements of its performance on synthetic benchmarks and on a large-scale molecular dynamics application. (Comment: IEEE Many-Task Computing on Grids and Supercomputers (MTAGS08), 2008.)
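    As a rough illustration of the stage-in side of such a model (not the paper's Blue Gene/P implementation; paths and file names are hypothetical), the sketch below copies common input files from a shared file system into a node-local cache once per node, so the many tasks that follow read from the fast local file system instead of the shared one.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;

// Sketch of a broadcast-style stage-in for file-based many-task computing:
// common inputs are cached on the node-local file system before tasks run.
public class CollectiveStageIn {
    public static void broadcastToLocalCache(List<Path> sharedInputs, Path localCacheDir) throws IOException {
        Files.createDirectories(localCacheDir);
        for (Path input : sharedInputs) {
            Path cached = localCacheDir.resolve(input.getFileName());
            if (!Files.exists(cached)) {      // copy each common input only once per node
                Files.copy(input, cached);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: /gpfs is the shared file system, /local/cache the node-local one.
        broadcastToLocalCache(
                List.of(Path.of("/gpfs/project/common/params.dat"),
                        Path.of("/gpfs/project/common/topology.pdb")),
                Path.of("/local/cache"));
    }
}
```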

    A study of publish/subscribe systems for real-time grid monitoring

    Monitoring and controlling a large number of geographically distributed scientific instruments is a challenging task. Some operations on these instruments require real-time (or quasi real-time) responses, which makes the task even more difficult. In this paper, we describe the requirements of distributed monitoring for a possible future electrical power grid based on real-time extensions to grid computing. We examine several standards and publish/subscribe middleware candidates, some of which were specially designed and developed for grid monitoring. We analyze their architecture and functionality, and discuss their advantages and disadvantages. We report on a series of tests to measure their real-time performance and scalability.

    Proximity coherence for chip-multiprocessors

    Many-core architectures provide an efficient way of harnessing the growing numbers of transistors available in modern fabrication processes; however, the parallel programs run on these platforms are increasingly limited by the energy and latency costs of communication. Existing designs provide a functional communication layer but do not necessarily implement the most efficient solution for chip-multiprocessors, placing limits on the performance of these complex systems. In an era of increasingly power-limited silicon design, efficiency is now a primary concern that motivates designers to look again at the challenge of cache coherence. The first step in the design process is to analyse the communication behaviour of parallel benchmark suites such as PARSEC and SPLASH-2. This thesis presents work detailing the sharing patterns observed when running the full benchmarks on a simulated 32-core x86 machine. The results reveal considerable locality of shared data accesses between threads with consecutive operating-system-assigned thread IDs. This pattern, although of little consequence in a multi-node system, corresponds to strong physical locality of shared data between adjacent cores on a chip-multiprocessor platform. Traditional cache coherence protocols, although often used in chip-multiprocessor designs, have been developed in the context of older multi-node systems. By redesigning coherence protocols to exploit new patterns such as the physical locality of shared data, it is possible to improve the efficiency of communication, specifically in chip-multiprocessors. This thesis explores such a design, Proximity Coherence, a novel scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links rather than always being indirected via a directory structure. (EPSRC DTA research scholarship.)

    The home-forwarding mechanism to reduce the cache coherence overhead in next-generation CMPs

    On the road to computer systems able to support the requirements of exascale applications, Chip Multi-Processors (CMPs) are equipped with an ever-increasing number of cores interconnected through fast on-chip networks. To exploit such new architectures, parallel software must scale almost linearly with the number of available cores. To this end, the overhead introduced by the run-time system of parallel programming frameworks and by the architecture itself must be small enough to enable high scalability even for very fine-grained parallel programs. One approach to reducing this overhead is to use non-conventional architectural mechanisms that prove useful when certain concurrency patterns in the running application are statically or dynamically recognized. Following this idea, this paper proposes run-time support that reduces the effective latency of inter-thread cooperation primitives by lowering the contention on individual caches. To achieve this goal, a new home-forwarding hardware mechanism is proposed and used by our runtime to reduce the number of cache-to-cache interactions generated by the cache coherence protocol. Our ideas have been emulated on the Tilera TILEPro64 CMP, showing a significant speedup improvement in an initial set of benchmarks.

    A Hierarchical Filtering-Based Monitoring Architecture for Large-scale Distributed Systems

    On-line monitoring is essential for observing and improving the reliability and performance of large-scale distributed (LSD) systems. In an LSD environment, large numbers of events are generated by system components during their execution and interaction with external objects (e.g. users or processes). These events must be monitored to accurately determine the run-time behavior of an LSD system and to obtain status information that is required for debugging and steering applications. However, the manner in which events are generated in an LSD system is complex and represents a number of challenges for an on-line monitoring system. Correlated events are generated concurrently and can occur at multiple locations distributed throughout the environment. This makes monitoring an intricate task and complicates the management decision process. Furthermore, the large number of entities and the geographical distribution inherent in LSD systems increase the difficulty of addressing traditional issues, such as performance bottlenecks, scalability, and application perturbation. This dissertation proposes a scalable, high-performance, dynamic, flexible and non-intrusive monitoring architecture for LSD systems. The resulting architecture detects and classifies interesting primitive and composite events and performs either a corrective or steering action. When appropriate, information is disseminated to management applications, such as reactive control and debugging tools. The monitoring architecture employs a novel hierarchical event filtering approach that distributes the monitoring load and limits event propagation. This significantly improves scalability and performance while minimizing the monitoring intrusiveness. The architecture provides dynamic monitoring capabilities through: subscription policies that enable application developers to add, delete and modify monitoring demands on-the-fly, an adaptable configuration that accommodates environmental changes, and a programmable environment that facilitates development of self-directed monitoring tasks. Increased flexibility is achieved through a declarative and comprehensive monitoring language, a simple code instrumentation process, and automated monitoring administration. These elements substantially relieve the burden imposed by using on-line distributed monitoring systems. In addition, the monitoring system provides techniques to manage the trade-offs between various monitoring objectives. The proposed solution offers improvements over related work by presenting a comprehensive architecture that considers the requirements and implied objectives for monitoring large-scale distributed systems. This architecture is referred to as the HiFi monitoring system. To demonstrate effectiveness at debugging and steering LSD systems, the HiFi monitoring system has been implemented at Old Dominion University for monitoring the Interactive Remote Instruction (IRI) system. The results from this case study validate that the HiFi system achieves the objectives outlined in this thesis.
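    A minimal sketch of the hierarchical filtering idea (not the HiFi implementation; event fields and thresholds are hypothetical) is shown below: each agent evaluates its subscriptions locally and forwards only matching events to its parent, so uninteresting events are dropped close to their source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Sketch of hierarchical event filtering: agents form a tree, apply their
// subscriptions locally, and forward only matching events toward the root.
public class FilteringAgent {
    record Event(String source, String type, double value) {}

    private final FilteringAgent parent;                       // null at the root
    private final List<Predicate<Event>> subscriptions = new ArrayList<>();

    FilteringAgent(FilteringAgent parent) { this.parent = parent; }

    // Subscriptions can be added (or later removed) on the fly.
    void subscribe(Predicate<Event> filter) { subscriptions.add(filter); }

    void onEvent(Event e) {
        boolean interesting = subscriptions.stream().anyMatch(p -> p.test(e));
        if (interesting) {
            if (parent != null) parent.onEvent(e);             // propagate upward
            else System.out.println("deliver to management app: " + e);
        }                                                      // otherwise drop locally
    }

    public static void main(String[] args) {
        FilteringAgent root = new FilteringAgent(null);
        root.subscribe(e -> true);                             // root delivers whatever reaches it
        FilteringAgent leaf = new FilteringAgent(root);
        leaf.subscribe(e -> e.type().equals("cpu") && e.value() > 0.9); // hypothetical threshold filter
        leaf.onEvent(new Event("node17", "cpu", 0.95));        // forwarded to the root
        leaf.onEvent(new Event("node17", "cpu", 0.10));        // dropped at the leaf
    }
}
```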

    Program Transformations for Asynchronous and Batched Query Submission

    The performance of database/Web-service backed applications can be significantly improved by asynchronous submission of queries/requests well ahead of the point where the results are needed, so that results are likely to have been fetched already when they are actually needed. However, manually writing applications to exploit asynchronous query submission is tedious and error-prone. In this paper we address the issue of automatically transforming a program written assuming synchronous query submission to one that exploits asynchronous query submission. Our program transformation method is based on data flow analysis and is framed as a set of transformation rules. Our rules can handle query executions within loops, unlike some of the earlier work in this area. We also present a novel approach that, at runtime, can combine multiple asynchronous requests into batches, thereby achieving the benefits of batching in addition to those of asynchronous submission. We have built a tool that implements our transformation techniques on Java programs that use JDBC calls; our tool can be extended to handle Web service calls. We have carried out a detailed experimental study on several real-life applications, which shows the effectiveness of the proposed rewrite techniques, both in terms of their applicability and the performance gains achieved. (Comment: 14 pages.)
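    The kind of rewrite the paper automates can be written by hand as a small sketch (hypothetical table, query, and connection URL; this is not the authors' tool): the query is submitted on a worker thread as soon as its parameter is known, and the program blocks on the result only at the point of use.

```java
import java.sql.*;
import java.util.concurrent.*;

// Hand-written illustration of asynchronous query submission with JDBC:
// executeQuery runs on a pool thread right after the parameter is known,
// and Future.get() is called only where the result is actually consumed.
public class AsyncQuerySubmission {
    static final ExecutorService POOL = Executors.newFixedThreadPool(4);

    static Future<Double> submitBalanceQuery(Connection conn, int accountId) {
        return POOL.submit(() -> {
            try (PreparedStatement ps =
                         conn.prepareStatement("SELECT balance FROM account WHERE id = ?")) {
                ps.setInt(1, accountId);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getDouble(1) : 0.0;
                }
            }
        });
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical URL; a production rewrite would use a connection per worker or a pool.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:demo")) {
            Future<Double> pending = submitBalanceQuery(conn, 42);   // submitted early
            // ... unrelated work runs here while the query executes ...
            System.out.println("balance = " + pending.get());        // consumed later
        } finally {
            POOL.shutdown();
        }
    }
}
```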

    Execução remota e offloading de computação em NDN (Remote execution and computation offloading in NDN)

    The way the Internet is currently used is completely different from how it was intended to be used, which results in a number of shortcomings for current demands. To address IP's deficiencies, new clean-slate architectures have started to emerge. Information-Centric Networking (ICN) is one of them. It provides architectural advantages such as resource naming, built-in caching and security, effective forwarding, and others. One of the Internet usage segments that has expanded the most is edge computing, and applying new architectures to edge computing has many benefits. The advantages of using Named Data Networking (NDN), an ICN implementation, in the context of edge computing were examined in this dissertation. The emphasis is on remote execution and offloading of functions to the network. A framework is proposed for enabling remote execution and its application to edge offloading in NDN. New methods for delivering input arguments to functions and obtaining the execution result were studied. The framework is validated and its performance is evaluated. (MSc in Computer and Telematics Engineering.)
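    As a purely conceptual sketch of expressing a remote invocation by name (no NDN library is used and this is not the dissertation's framework; the name layout is an assumption), a function call can be encoded as a hierarchical, NDN-style name, so that the request can be forwarded by name and an identical call can be answered from a cached result.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Conceptual sketch only: a remote function invocation is encoded as a
// hierarchical, NDN-style name; a consumer would express an Interest for this
// name and an edge node able to run the function would answer with a Data
// packet carrying the result, which the network may cache and reuse.
public class NdnExecName {
    // e.g. buildName("/edge/exec", "resize", "width=640", "height=480")
    //   -> /edge/exec/resize/width%3D640/height%3D480
    static String buildName(String prefix, String function, String... args) {
        StringBuilder name = new StringBuilder(prefix).append('/').append(function);
        for (String arg : args) {
            name.append('/').append(URLEncoder.encode(arg, StandardCharsets.UTF_8));
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildName("/edge/exec", "resize", "width=640", "height=480"));
    }
}
```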
    • 

    corecore