
    MonALISA: A Distributed Monitoring Service Architecture

    The MonALISA (Monitoring Agents in A Large Integrated Services Architecture) system provides a distributed monitoring service. MonALISA is based on a scalable Dynamic Distributed Services Architecture, designed to meet the needs of physics collaborations for monitoring global Grid systems, and is implemented using JINI/JAVA and WSDL/SOAP technologies. The scalability of the system derives from the use of multithreaded Station Servers to host a variety of loosely coupled, self-describing dynamic services; the ability of each service to register itself and then to be discovered and used by any other service or client that requires such information; and the ability of all services and clients subscribing to a set of events (state changes) in the system to be notified automatically. The framework integrates several existing monitoring tools and procedures to collect parameters describing computational nodes, applications, and network performance. It has built-in SNMP support and network-performance monitoring algorithms that enable it to monitor end-to-end network performance as well as the performance and state of site facilities in a Grid. MonALISA is currently running around the clock on the US CMS test Grid as well as at an increasing number of other sites. It is also being used to monitor the performance of, and optimize the interconnections among, the reflectors in the VRVS system. Comment: Talk from the 2003 Computing in High Energy and Nuclear Physics conference (CHEP03), La Jolla, CA, USA, March 2003, 8 pages, PDF. PSN MOET00
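    The register/discover/subscribe pattern described above can be sketched in a few lines. The sketch below is purely illustrative: the class and method names are invented for this example, and the real MonALISA services are built on JINI/Java rather than a Python API like this.

```python
# Minimal, illustrative sketch of the register/discover/subscribe pattern the
# abstract describes. All class and method names here are hypothetical; the
# real MonALISA services are implemented on JINI/Java, not this Python API.

from collections import defaultdict
from typing import Callable, Dict, List


class ServiceRegistry:
    """Lets dynamic services register themselves and be discovered by clients."""

    def __init__(self) -> None:
        self._services: Dict[str, dict] = {}
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def register(self, name: str, description: dict) -> None:
        # A self-describing service announces itself to the registry.
        self._services[name] = description

    def discover(self, predicate: Callable[[dict], bool]) -> List[str]:
        # Clients look up any service whose description matches their needs.
        return [n for n, d in self._services.items() if predicate(d)]

    def subscribe(self, event_type: str, callback: Callable[[dict], None]) -> None:
        # Services and clients subscribe to a set of events (state changes).
        self._subscribers[event_type].append(callback)

    def publish(self, event_type: str, payload: dict) -> None:
        # Every subscriber is notified automatically when the state changes.
        for callback in self._subscribers[event_type]:
            callback(payload)


if __name__ == "__main__":
    registry = ServiceRegistry()
    registry.register("site-monitor", {"kind": "monitoring", "site": "uscms-test"})
    print(registry.discover(lambda d: d["kind"] == "monitoring"))
    registry.subscribe("node.load", lambda e: print("notified:", e))
    registry.publish("node.load", {"node": "wn-042", "load1": 3.7})
```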

    A Hierarchical Filtering-Based Monitoring Architecture for Large-scale Distributed Systems

    On-line monitoring is essential for observing and improving the reliability and performance of large-scale distributed (LSD) systems. In an LSD environment, large numbers of events are generated by system components during their execution and interaction with external objects (e.g. users or processes). These events must be monitored to accurately determine the run-time behavior of an LSD system and to obtain status information that is required for debugging and steering applications. However, the manner in which events are generated in an LSD system is complex and presents a number of challenges for an on-line monitoring system. Correlated events are generated concurrently and can occur at multiple locations distributed throughout the environment. This makes monitoring an intricate task and complicates the management decision process. Furthermore, the large number of entities and the geographical distribution inherent in LSD systems increase the difficulty of addressing traditional issues such as performance bottlenecks, scalability, and application perturbation. This dissertation proposes a scalable, high-performance, dynamic, flexible, and non-intrusive monitoring architecture for LSD systems. The resulting architecture detects and classifies interesting primitive and composite events and performs either a corrective or steering action. When appropriate, information is disseminated to management applications, such as reactive control and debugging tools. The monitoring architecture employs a novel hierarchical event filtering approach that distributes the monitoring load and limits event propagation. This significantly improves scalability and performance while minimizing monitoring intrusiveness. The architecture provides dynamic monitoring capabilities through subscription policies that enable application developers to add, delete, and modify monitoring demands on the fly; an adaptable configuration that accommodates environmental changes; and a programmable environment that facilitates the development of self-directed monitoring tasks. Increased flexibility is achieved through a declarative and comprehensive monitoring language, a simple code instrumentation process, and automated monitoring administration. These elements substantially relieve the burden imposed by using on-line distributed monitoring systems. In addition, the monitoring system provides techniques to manage the trade-offs between various monitoring objectives. The proposed solution offers improvements over related work by presenting a comprehensive architecture that considers the requirements and implied objectives of monitoring large-scale distributed systems. This architecture is referred to as the HiFi monitoring system. To demonstrate its effectiveness at debugging and steering LSD systems, the HiFi monitoring system has been implemented at Old Dominion University for monitoring the Interactive Remote Instruction (IRI) system. The results from this case study validate that the HiFi system achieves the objectives outlined in this thesis.
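    As a rough illustration of the hierarchical event filtering idea, the sketch below keeps filters close to the event sources and forwards only matching events upward, so most events never leave their subtree. The class names and the composite-event rule are hypothetical and are not HiFi's actual interfaces.

```python
# Illustrative sketch of hierarchical event filtering: leaf filters drop
# uninteresting events locally and forward only matches one level up, limiting
# event propagation. Names and rules here are hypothetical, not HiFi's API.

from typing import Callable, List, Optional


class FilterNode:
    def __init__(self, predicate: Callable[[dict], bool],
                 parent: Optional["FilterNode"] = None) -> None:
        self.predicate = predicate      # interest expressed by a subscription
        self.parent = parent
        self.matched: List[dict] = []   # events retained at this level

    def submit(self, event: dict) -> None:
        # Drop uninteresting events here; propagate the rest one level up only.
        if self.predicate(event):
            self.matched.append(event)
            if self.parent is not None:
                self.parent.submit(event)


# Leaf filters discard routine events; the root detects a composite condition
# from the primitive events its children forward.
root = FilterNode(lambda e: e.get("severity") == "error")
leaf_a = FilterNode(lambda e: e.get("severity") in ("warning", "error"), parent=root)
leaf_b = FilterNode(lambda e: e.get("severity") in ("warning", "error"), parent=root)

leaf_a.submit({"host": "hostA", "severity": "info"})    # dropped at the leaf
leaf_a.submit({"host": "hostA", "severity": "error"})   # forwarded to the root
leaf_b.submit({"host": "hostB", "severity": "error"})   # forwarded to the root

if len(root.matched) >= 2:
    print("composite event: errors observed on multiple hosts")
```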

    An Architectural Framework for Performance Analysis: Supporting the Design, Configuration, and Control of DIS/HLA Simulations

    Technology advances are providing greater capabilities for most distributed computing environments. However, the advances in capabilities are paralleled by progressively increasing amounts of system complexity. In many instances, this complexity can lead to a lack of understanding regarding bottlenecks in the run-time performance of distributed applications. This is especially true in the domain of distributed simulations, where a myriad of enabling technologies are used as building blocks to provide large-scale, geographically dispersed, dynamic virtual worlds. Persons responsible for the design, configuration, and control of distributed simulations need to understand the impact of decisions made regarding the allocation and use of the logical and physical resources that comprise a distributed simulation environment and how they affect run-time performance. Distributed Interactive Simulation (DIS) and High Level Architecture (HLA) simulation applications have historically provided some of the most demanding distributed computing environments in terms of performance, and as such have a justified need for performance information sufficient to support decision-makers trying to improve system behavior. This research addresses two fundamental questions: (1) Is there an analysis framework suitable for characterizing DIS and HLA simulation performance? (2) What kind of mechanism can be used to adequately monitor, measure, and collect performance data to support different performance analysis objectives for DIS and HLA simulations? This thesis presents a unified, architectural framework for DIS and HLA simulations, provides details of a performance monitoring system, and shows its effectiveness through a series of use cases that include practical applications of the framework to support real-world U.S. Department of Defense (DoD) programs. The thesis also discusses the robustness of the constructed framework and its applicability to the performance analysis of more general distributed computing applications.

    Optimization of Optical Frequency-domain Reflectometry for Dynamic Structural Health Monitoring using Distributed Optical Fiber Sensors

    The goal of this project is to optimize the existing optical frequency-domain reflectometry (OFDR) method to facilitate dynamic structural health monitoring using Distributed Optical Fiber Sensors (DOFS) under field conditions. DOFS are gaining interest in Structural Health Monitoring (SHM) applications, especially for large and irregular structures. These sensors offer a cost-effective solution that reveals temperature, strain, and vibration information from any point along the entire length of an optical fiber. However, one of the biggest challenges hindering the wide implementation of DOFS is their dynamic monitoring capability under field conditions. Although several efforts have been made to improve the dynamic monitoring capability of DOFS using polarization-optical time-domain reflectometry (OTDR), OTDR is limited to a spatial resolution of ~1 m. The cost of improving the spatial resolution of OTDR is very high, which limits its suitability for a large range of structural monitoring applications. The OFDR technique, on the other hand, offers high spatial resolution and easy setup for stationary measurements. If similar performance can be achieved under dynamic monitoring conditions, OFDR can be implemented in virtually any SHM application. To date, only preliminary studies have been performed under laboratory conditions to evaluate dynamic measurements using OFDR. Thus, this study aims to develop an optimized OFDR method for dynamic monitoring using DOFS under field conditions. Advanced algorithms have been developed for spectral analysis, along with new de-noising methods. A laboratory experimental program and a field monitoring program were carried out to validate static and dynamic measurements, respectively, against conventional sensors. The research related to OFDR-based dynamic monitoring is still in the early stages of development. Successful execution of this project gives ERAU a great advantage in our signature SHM field. Based on findings from this project, future research proposals will be submitted to the FDOT Structural Research Center, the NCHRP Highway IDEA program, and the EPMD program of NSF.
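    The abstract does not spell out the de-noising algorithms, so the sketch below only illustrates the general idea of spectral de-noising on a simulated distributed strain trace; the signal, cutoff frequency, and noise level are invented for the example and are not the project's methods.

```python
# Generic illustration of frequency-domain de-noising for a distributed strain
# trace along an optical fiber. The simulated signal and the low-pass cutoff
# are made up for demonstration; they do not reproduce the study's algorithms.

import numpy as np

fs = 1000.0                                   # samples per metre along the fiber
x = np.arange(0, 5, 1 / fs)                   # 5 m of fiber
strain = 50e-6 * np.sin(2 * np.pi * 2 * x)    # slowly varying "true" strain
noisy = strain + 5e-6 * np.random.randn(x.size)

# Low-pass filter in the spatial-frequency domain: keep components < 10 cycles/m.
spectrum = np.fft.rfft(noisy)
freqs = np.fft.rfftfreq(noisy.size, d=1 / fs)
spectrum[freqs > 10] = 0.0
denoised = np.fft.irfft(spectrum, n=noisy.size)

print("noise std before:", np.std(noisy - strain))
print("noise std after: ", np.std(denoised - strain))
```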

    Dynamic re-optimization techniques for stream processing engines and object stores

    Large-scale data storage and processing systems are strongly motivated by the need to store and analyze massive datasets. The complexity of a large class of these systems is rooted in their distributed nature, extreme scale, need for real-time response, and streaming nature. The use of these systems in multi-tenant cloud environments with potential resource interference necessitates fine-grained monitoring and control. In this dissertation, we present efficient, dynamic techniques for re-optimizing stream-processing systems and transactional object-storage systems. In the context of stream-processing systems, we present VAYU, a per-topology controller. VAYU uses novel methods and protocols for dynamic, network-aware tuple routing in the dataflow. We show that the feedback-driven controller in VAYU helps achieve high pipeline throughput over long execution periods, as it dynamically detects and diagnoses any pipeline bottlenecks. We also present novel heuristics for optimizing overlays for group communication operations in the streaming model. In the context of object-storage systems, we present M-Lock, a novel lock-localization service for distributed transaction protocols on scale-out object stores to increase transaction throughput. Lock localization refers to the dynamic migration and partitioning of locks across nodes in the scale-out store to reduce cross-partition acquisition of locks. The service leverages the observed object-access patterns to achieve lock clustering and deliver high performance. We also present TransMR, a framework that uses distributed, transactional object stores to orchestrate and execute asynchronous components in amorphous data-parallel applications on scale-out architectures.
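    The lock-localization idea, migrating each lock toward the node that acquires it most often so that fewer acquisitions cross partitions, can be illustrated with a toy placement rule. Everything below, including the access trace and the placement heuristic, is a hypothetical simplification and not M-Lock's actual protocol.

```python
# Toy sketch of lock localization: place each lock on the node that acquires it
# most frequently, so fewer acquisitions are remote. Data and the placement
# rule are invented for illustration; this is not M-Lock's protocol.

from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Observed access pattern: (lock_id, acquiring_node) samples from the workload.
accesses: List[Tuple[str, str]] = [
    ("lock:acct:17", "node-1"), ("lock:acct:17", "node-1"),
    ("lock:acct:17", "node-2"), ("lock:order:42", "node-3"),
    ("lock:order:42", "node-3"), ("lock:order:42", "node-3"),
]


def localize(samples: List[Tuple[str, str]]) -> Dict[str, str]:
    """Assign each lock to the node that acquired it most frequently."""
    per_lock: Dict[str, Counter] = defaultdict(Counter)
    for lock_id, node in samples:
        per_lock[lock_id][node] += 1
    return {lock_id: counts.most_common(1)[0][0] for lock_id, counts in per_lock.items()}


placement = localize(accesses)
remote = sum(1 for lock_id, node in accesses if placement[lock_id] != node)
print("placement:", placement)
print("cross-partition acquisitions after migration:", remote, "of", len(accesses))
```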

    Cost-efficient Low Latency Communication Infrastructure for Synchrophasor Applications in Smart Grids

    With the introduction of distributed renewable energy resources and new loads, such as electric vehicles, the power grid is evolving into a highly dynamic system that necessitates continuous and fine-grained observability of its operating conditions. In the context of the medium voltage (MV) grid, this has motivated the deployment of Phasor Measurement Units (PMUs), which offer high-precision synchronized grid monitoring and enable mission-critical applications such as fault detection/location. However, PMU-based applications impose stringent delay requirements, raising a significant challenge for the communication infrastructure. In contrast to the high voltage domain, there is no clear vision for the communication and network topologies of the MV grid; a full-fledged optical fiber-based communication infrastructure is a costly approach due to the density of PMUs required. In this work, we focus on the support of low-latency PMU-based applications in the MV domain, identifying and addressing the trade-off between communication infrastructure deployment costs and the corresponding performance. We study a large set of real MV grid topologies to gain an in-depth understanding of the key latency factors. Building on the gained insights, we propose three algorithms for the careful placement of high-capacity links, targeting a balance between deployment costs and achieved latencies. Extensive simulations demonstrate that the proposed algorithms result in low-latency network topologies while reducing deployment costs by up to 80% compared to a ubiquitous deployment of costly high-capacity links.
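    The cost/latency trade-off can be illustrated with a toy greedy upgrade rule on a radial feeder: repeatedly upgrade the link that saves the most latency per unit cost until the budget is exhausted. The topology, delays, costs, and greedy rule below are invented for illustration and are not the paper's three placement algorithms.

```python
# Toy greedy link-upgrade sketch for the cost vs. latency trade-off described
# above. All numbers and the greedy rule are hypothetical, not the paper's.

# Radial MV feeder: link i connects node i to node i+1; PMUs sit at every node
# and the controller sits at node 0.
slow_delay = [2.0, 3.0, 1.5, 4.0, 2.5]   # ms per link with the cheap technology
fast_delay = 0.2                          # ms per link after a fiber upgrade
upgrade_cost = [5, 8, 4, 9, 6]            # cost units per link
budget = 15

delays = slow_delay[:]                    # current per-link delays
upgraded = set()


def worst_pmu_latency(link_delays):
    """Worst case over PMUs: latency of the farthest node from the controller."""
    return sum(link_delays)


while budget > 0:
    best = None
    for i, cost in enumerate(upgrade_cost):
        if i in upgraded or cost > budget:
            continue
        gain = (delays[i] - fast_delay) / cost   # latency saved per cost unit
        if best is None or gain > best[0]:
            best = (gain, i, cost)
    if best is None:
        break
    _, i, cost = best
    delays[i] = fast_delay
    upgraded.add(i)
    budget -= cost

print("upgraded links:", sorted(upgraded))
print("worst-case latency: %.1f ms" % worst_pmu_latency(delays))
```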

    Advanced Wide-Area Monitoring System Design, Implementation, and Application

    Wide-area monitoring systems (WAMSs) provide an unprecedented way to collect, store, and analyze ultra-high-resolution synchrophasor measurements to improve dynamic observability in power grids. This dissertation focuses on designing and implementing a wide-area monitoring system and a series of applications to assist grid operators with various functionalities. The contributions of this dissertation are as follows. First, a synchrophasor data collection system is developed to collect, store, and forward GPS-synchronized, high-resolution, rich-type, and massive-volume synchrophasor data. A distributed data storage system is developed to store the synchrophasor data, and a memory-based cache system is discussed to improve the efficiency of real-time situational awareness. In addition, a synchronization system is developed to synchronize configurations among the cloud nodes, and the reliability and fault tolerance of the developed system are discussed. Second, a novel lossy synchrophasor data compression approach is proposed. This section first introduces the synchrophasor data compression problem, then proposes a methodology for lossy data compression, and finally presents the evaluation results and discusses the feasibility of the proposed approach. Third, a novel intelligent system, SynchroService, is developed to provide critical functionalities for a synchrophasor system. Functionalities including data query, event query, device management, and system authentication are discussed, and the resiliency and security of the developed system are evaluated. Fourth, a series of synchrophasor-based applications are developed that utilize the high-resolution synchrophasor data to assist power system engineers in monitoring the performance of the grid and investigating the root cause of large power system disturbances. Lastly, a deep learning-based event detection and verification system is developed to provide accurate event detection functionality. This section introduces the data preprocessing, model design, and performance evaluation, and discusses the implementation of the developed system.
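    As a hedged illustration of what a lossy synchrophasor compression pass can look like, the sketch below applies a simple dead-band rule to a magnitude stream: a sample is kept only when it drifts from the last kept value by more than a tolerance, and reconstruction holds the last kept value. This is a generic scheme chosen for illustration, not the dissertation's proposed approach.

```python
# Generic dead-band lossy compression of a synchrophasor magnitude stream.
# The signal, tolerance, and scheme are illustrative only; they are not the
# dissertation's compression methodology.

import numpy as np

rng = np.random.default_rng(0)
t = np.arange(0, 2, 1 / 30)                          # 30 frames/s for 2 seconds
magnitude = 1.0 + 0.01 * np.sin(2 * np.pi * 0.5 * t) + 0.001 * rng.standard_normal(t.size)

tolerance = 0.002                                    # per-unit dead band


def compress(samples, tol):
    """Keep a sample only when it deviates from the last kept value by > tol."""
    kept = [(0, samples[0])]
    for i, v in enumerate(samples[1:], start=1):
        if abs(v - kept[-1][1]) > tol:
            kept.append((i, v))
    return kept


def reconstruct(kept, n):
    """Rebuild the stream by holding the last kept value between kept samples."""
    out = np.empty(n)
    for (i, v), (j, _) in zip(kept, kept[1:] + [(n, None)]):
        out[i:j] = v
    return out


kept = compress(magnitude, tolerance)
restored = reconstruct(kept, magnitude.size)
print("kept %d of %d samples, max error %.4f" %
      (len(kept), magnitude.size, np.max(np.abs(restored - magnitude))))
```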