GMA Instrumentation of the Athena Framework using NetLogger
Grid applications are, by their nature, wide-area distributed applications.
This WAN aspect of Grid applications makes the use of conventional monitoring
and instrumentation tools (such as top, gprof, and LSF Monitor) impractical
for verification that the application is running correctly and efficiently. To
be effective, monitoring data must be "end-to-end", meaning that all components
between the Grid application endpoints must be monitored. Instrumented
applications can generate a large amount of monitoring data, so typically the
instrumentation is off by default. For jobs running on a Grid, there needs to
be a general mechanism to remotely activate the instrumentation in running
jobs. The NetLogger Toolkit Activation Service provides this mechanism.
To demonstrate this, we have instrumented the ATLAS Athena Framework with
NetLogger to generate monitoring events. We then use a GMA-based activation
service to control NetLogger's trigger mechanism. The NetLogger trigger
mechanism allows one to easily start, stop, or change the logging level of a
running program by modifying a trigger file. We present here details of the
design of the NetLogger implementation of the GMA-based activation service and
the instrumentation service for Athena. We also describe how this activation
service allows us to non-intrusively collect and visualize the ATLAS Athena
Framework monitoring data.
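The trigger-file idea in this abstract can be illustrated with a minimal Python sketch. It does not use the NetLogger API. The `TriggerFile` class, the one-word file format, the polling approach, and the logger name `athena.demo` are all assumptions made for illustration.

```python
import logging
import os
import tempfile


class TriggerFile:
    """Hypothetical trigger-file watcher (not the NetLogger API).

    Assumed format: the file holds one level name such as "DEBUG" or "INFO".
    A missing, empty, or unrecognized file means instrumentation is off.
    """

    def __init__(self, path, logger):
        self.path = path
        self.logger = logger
        self._last_seen = None  # last content read from the trigger file

    def poll(self):
        """Re-read the trigger file; return True if the setting changed."""
        try:
            with open(self.path) as fh:
                content = fh.read().strip().upper()
        except FileNotFoundError:
            content = ""
        if content == self._last_seen:
            return False  # nothing changed since the last poll
        self._last_seen = content
        # getLevelName maps a known level name to its integer value.
        level = logging.getLevelName(content) if content else None
        if isinstance(level, int):
            self.logger.disabled = False
            self.logger.setLevel(level)
        else:
            # Empty or unrecognized content: switch instrumentation off.
            self.logger.disabled = True
        return True


if __name__ == "__main__":
    log = logging.getLogger("athena.demo")  # hypothetical logger name
    path = os.path.join(tempfile.mkdtemp(), "trigger")
    trigger = TriggerFile(path, log)
    trigger.poll()  # no file yet, so logging stays off
    with open(path, "w") as fh:
        fh.write("DEBUG\n")
    trigger.poll()  # the file now requests DEBUG-level events
```

In this sketch a running job would call `poll()` periodically, so an external service only has to rewrite the file to start, stop, or change the logging level.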
Design and Evaluation of a Collective IO Model for Loosely Coupled Petascale Programming
Loosely coupled programming is a powerful paradigm for rapidly creating
higher-level applications from scientific programs on petascale systems,
typically using scripting languages. This paradigm is a form of many-task
computing (MTC) which focuses on the passing of data between programs as
ordinary files rather than messages. While it has the significant benefits of
decoupling producer and consumer and allowing existing application programs to
be executed in parallel with no recoding, its typical implementation using
shared file systems places a high performance burden on the overall system and
on the user who will analyze and consume the downstream data. Previous efforts
have achieved great speedups with loosely coupled programs, but have done so
with careful manual tuning of all shared file system access. In this work, we
evaluate a prototype collective IO model for file-based MTC. The model enables
efficient and easy distribution of input data files to computing nodes and
gathering of output results from them. It eliminates the need for such manual
tuning and makes the programming of large-scale clusters using a loosely
coupled model easier. Our approach, inspired by in-memory approaches to
collective operations for parallel programming, builds on fast local file
systems to provide high-speed local file caches for parallel scripts, uses a
broadcast approach to handle distribution of common input data, and uses
efficient scatter/gather and caching techniques for input and output. We
describe the design of the prototype model, its implementation on the Blue
Gene/P supercomputer, and present preliminary measurements of its performance
on synthetic benchmarks and on a large-scale molecular dynamics application.
Comment: IEEE Many-Task Computing on Grids and Supercomputers (MTAGS08) 200
A study of publish/subscribe systems for real-time grid monitoring
Monitoring and controlling a large number of geographically distributed scientific instruments is a challenging task. Some operations on these instruments require real-time (or quasi real-time) response, which makes the task even more difficult. In this paper, we describe the requirements of distributed monitoring for a possible future electrical power grid based on real-time extensions to grid computing. We examine several standards and publish/subscribe middleware candidates, some of which were specially designed and developed for grid monitoring. We analyze their architecture and functionality, and discuss their advantages and disadvantages. We report on a series of tests to measure their real-time performance and scalability.
Proximity coherence for chip-multiprocessors
Many-core architectures provide an efficient way of harnessing the growing numbers of transistors available in modern fabrication processes; however, the parallel programs run on these platforms are increasingly limited by the energy and latency costs of communication. Existing designs provide a functional communication layer but do not necessarily implement the most efficient solution for chip-multiprocessors, placing limits on the performance of these complex systems. In an era of increasingly power-limited silicon design, efficiency is now a primary concern that motivates designers to look again at the challenge of cache coherence.
The first step in the design process is to analyse the communication behaviour of parallel benchmark suites such as Parsec and SPLASH-2. This thesis presents work detailing the sharing patterns observed when running the full benchmarks on a simulated 32-core x86 machine. The results reveal considerable locality of shared data accesses between threads with consecutive operating system assigned thread IDs. This pattern, although of little consequence in a multi-node system, corresponds to strong physical locality of shared data between adjacent cores on a chip-multiprocessor platform.
Traditional cache coherence protocols, although often used in chip-multiprocessor designs, were developed in the context of older multi-node systems. Redesigning coherence protocols to exploit new patterns, such as the physical locality of shared data, makes it possible to improve the efficiency of communication specifically in chip-multiprocessors. This thesis explores such a design, Proximity Coherence: a novel scheme in which L1 load misses are optimistically forwarded to nearby caches via new dedicated links rather than always being indirected via a directory structure.
EPSRC DTA research scholarship
The home-forwarding mechanism to reduce the cache coherence overhead in next-generation CMPs
On the road to computer systems able to support the requirements of exascale applications, Chip Multi-Processors (CMPs) are equipped with an ever-increasing number of cores interconnected through fast on-chip networks. To exploit such new architectures, parallel software must be able to scale almost linearly with the number of cores available. To this end, the overhead introduced by the run-time system of parallel programming frameworks and by the architecture itself must be small enough to enable high scalability even for very fine-grained parallel programs. One approach to reducing this overhead is to use non-conventional architectural mechanisms that prove useful when certain concurrency patterns in the running application are statically or dynamically recognized. Following this idea, this paper proposes a run-time support able to reduce the effective latency of inter-thread cooperation primitives by lowering the contention on individual caches. To achieve this goal, the new home-forwarding hardware mechanism is proposed and used by our runtime to reduce the number of cache-to-cache interactions generated by the cache coherence protocol. Our ideas have been emulated on the Tilera TILEPro64 CMP, showing a significant speedup improvement in some first benchmarks.
A Hierarchical Filtering-Based Monitoring Architecture for Large-scale Distributed Systems
On-line monitoring is essential for observing and improving the reliability and performance of large-scale distributed (LSD) systems. In an LSD environment, large numbers of events are generated by system components during their execution and interaction with external objects (e.g. users or processes). These events must be monitored to accurately determine the run-time behavior of an LSD system and to obtain status information that is required for debugging and steering applications. However, the manner in which events are generated in an LSD system is complex and presents a number of challenges for an on-line monitoring system. Correlated events are generated concurrently and can occur at multiple locations distributed throughout the environment. This makes monitoring an intricate task and complicates the management decision process. Furthermore, the large number of entities and the geographical distribution inherent in LSD systems increase the difficulty of addressing traditional issues, such as performance bottlenecks, scalability, and application perturbation.
This dissertation proposes a scalable, high-performance, dynamic, flexible and non-intrusive monitoring architecture for LSD systems. The resulting architecture detects and classifies interesting primitive and composite events and performs either a corrective or steering action. When appropriate, information is disseminated to management applications, such as reactive control and debugging tools.
The monitoring architecture employs a novel hierarchical event filtering approach that distributes the monitoring load and limits event propagation. This significantly improves scalability and performance while minimizing the monitoring intrusiveness. The architecture provides dynamic monitoring capabilities through: subscription policies that enable applications developers to add, delete and modify monitoring demands on-the-fly, an adaptable configuration that accommodates environmental changes, and a programmable environment that facilitates development of self-directed monitoring tasks. Increased flexibility is achieved through a declarative and comprehensive monitoring language, a simple code instrumentation process, and automated monitoring administration. These elements substantially relieve the burden imposed by using on-line distributed monitoring systems. In addition, the monitoring system provides techniques to manage the trade-offs between various monitoring objectives.
The proposed solution offers improvements over related works by presenting a comprehensive architecture that considers the requirements and implied objectives for monitoring large-scale distributed systems. This architecture is referred to as the HiFi monitoring system.
To demonstrate its effectiveness at debugging and steering LSD systems, the HiFi monitoring system has been implemented at Old Dominion University for monitoring the Interactive Remote Instruction (IRI) system. The results from this case study validate that the HiFi system achieves the objectives outlined in this thesis.
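The hierarchical filtering idea in this abstract, where events are filtered close to their source so that only events matching some subscription travel up the hierarchy, can be sketched as follows. This is a simplified illustration and not the HiFi design or API: the `FilterNode` class, its method names, and the dictionary event format are hypothetical.

```python
class FilterNode:
    """Hypothetical monitoring agent in a filtering hierarchy (not HiFi's API)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.subscriptions = {}  # subscription id -> predicate over events
        self.delivered = []      # events matched by this node's subscriptions

    def subscribe(self, sub_id, predicate):
        """Add or replace a monitoring demand while the system is running."""
        self.subscriptions[sub_id] = predicate

    def unsubscribe(self, sub_id):
        """Remove a monitoring demand; unknown ids are ignored."""
        self.subscriptions.pop(sub_id, None)

    def _matches(self, event):
        return any(pred(event) for pred in self.subscriptions.values())

    def wants(self, event):
        """True if this node or any ancestor has a matching subscription."""
        if self._matches(event):
            return True
        return self.parent is not None and self.parent.wants(event)

    def receive(self, event):
        """Record the event locally if it matches, and forward it upward
        only when some ancestor is still interested in it."""
        if self._matches(event):
            self.delivered.append(event)
        if self.parent is not None and self.parent.wants(event):
            self.parent.receive(event)


if __name__ == "__main__":
    root = FilterNode("domain")
    leaf = FilterNode("local", parent=root)
    root.subscribe("errors", lambda e: e["severity"] >= 3)
    leaf.receive({"severity": 1})  # dropped at the leaf, never propagated
    leaf.receive({"severity": 4})  # forwarded because the root subscribed
    print(root.delivered)
```

Dropping unmatched events at the lowest level is what limits event propagation in this sketch. Subscriptions can be added or removed at any time, which loosely mirrors the on-the-fly monitoring demands the abstract mentions.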
Program Transformations for Asynchronous and Batched Query Submission
The performance of database/Web-service backed applications can be
significantly improved by asynchronous submission of queries/requests well
ahead of the point where the results are needed, so that results are likely to
have been fetched already when they are actually needed. However, manually
writing applications to exploit asynchronous query submission is tedious and
error-prone. In this paper we address the issue of automatically transforming a
program written assuming synchronous query submission, to one that exploits
asynchronous query submission. Our program transformation method is based on
data flow analysis and is framed as a set of transformation rules. Our rules
can handle query executions within loops, unlike some of the earlier work in
this area. We also present a novel approach that, at runtime, can combine
multiple asynchronous requests into batches, thereby achieving the benefits of
batching in addition to that of asynchronous submission. We have built a tool
that implements our transformation techniques on Java programs that use JDBC
calls; our tool can be extended to handle Web service calls. We have carried
out a detailed experimental study on several real-life applications, which
shows the effectiveness of the proposed rewrite techniques, both in terms of
their applicability and the performance gains achieved.
Comment: 14 pages
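The kind of rewrite this abstract describes, turning a loop of blocking queries into one that submits all queries first and consumes the results later, can be sketched in Python. This is not the authors' tool, which targets Java programs using JDBC. The `run_query` function is a hypothetical stand-in for a blocking call, and the sketch assumes the queries in different iterations are independent of one another.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def run_query(customer_id):
    """Stand-in for a blocking database or Web-service call (hypothetical)."""
    time.sleep(0.05)  # simulated round-trip latency
    return customer_id * 10


def total_sync(ids):
    """Original shape: each iteration blocks until its query returns."""
    total = 0
    for cid in ids:
        total += run_query(cid)
    return total


def total_async(ids, workers=8):
    """Transformed shape: submit every query first, consume results later."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Loop 1: issue all requests without waiting for any result.
        handles = [pool.submit(run_query, cid) for cid in ids]
        # Loop 2: block only at the point where each result is needed.
        total = 0
        for handle in handles:
            total += handle.result()
    return total


if __name__ == "__main__":
    ids = list(range(1, 9))
    print(total_sync(ids), total_async(ids))
```

Splitting the loop in two is only safe when no query depends on the result of an earlier iteration, which is the kind of condition a data-flow analysis would need to establish before applying such a rule.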
Remote execution and computation offloading in NDN
The way the Internet is currently used is completely different from how it
was intended to be used, which results in a number of shortcomings for the
current demands. To address IP's deficiencies, new clean-slate architectures
started to emerge. Information-Centric Networking (ICN) is one of them.
It provides architectural advantages like resource naming, built-in caching
and security, effective forwarding and others. One of the Internet usage
segments that has expanded the most is edge computing. Applying new
architectures to edge computing has many benefits. The advantages
of using Named Data Networking (NDN), an ICN implementation, in the
context of edge computing were examined in this dissertation. The emphasis
is on remote execution and offloading of functions to the network. A
framework is proposed for enabling remote execution and its application in
edge offloading in NDN. New methods for delivering input arguments to
functions and obtaining the execution result were studied. The framework
is validated and its performance is evaluated.
Master's in Computer and Telematics Engineering