3,763 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    Addressing the Challenges in Federating Edge Resources

    Full text link
    This book chapter considers how Edge deployments can be brought to bear in a global context by federating them across multiple geographic regions to create a global Edge-based fabric that decentralizes data center computation. This is currently impractical, not only because of technical challenges, but is also shrouded by social, legal and geopolitical issues. In this chapter, we discuss two key challenges - networking and management in federating Edge deployments. Additionally, we consider resource and modeling challenges that will need to be addressed for a federated Edge.Comment: Book Chapter accepted to the Fog and Edge Computing: Principles and Paradigms; Editors Buyya, Sriram

    Data acquisition software for the CMS strip tracker

    Get PDF
    The CMS silicon strip tracker, providing a sensitive area of approximately 200 m2 and comprising 10 million readout channels, has recently been completed at the tracker integration facility at CERN. The strip tracker community is currently working to develop and integrate the online and offline software frameworks, known as XDAQ and CMSSW respectively, for the purposes of data acquisition and detector commissioning and monitoring. Recent developments have seen the integration of many new services and tools within the online data acquisition system, such as event building, online distributed analysis, an online monitoring framework, and data storage management. We review the various software components that comprise the strip tracker data acquisition system, the software architectures used for stand-alone and global data-taking modes. Our experiences in commissioning and operating one of the largest ever silicon micro-strip tracking systems are also reviewed

    Monitoring the CMS strip tracker readout system

    Get PDF
    The CMS Silicon Strip Tracker at the LHC comprises a sensitive area of approximately 200 m2 and 10 million readout channels. Its data acquisition system is based around a custom analogue front-end chip. Both the control and the readout of the front-end electronics are performed by off-detector VME boards in the counting room, which digitise the raw event data and perform zero-suppression and formatting. The data acquisition system uses the CMS online software framework to configure, control and monitor the hardware components and steer the data acquisition. The first data analysis is performed online within the official CMS reconstruction framework, which provides many services, such as distributed analysis, access to geometry and conditions data, and a Data Quality Monitoring tool based on the online physics reconstruction. The data acquisition monitoring of the Strip Tracker uses both the data acquisition and the reconstruction software frameworks in order to provide real-time feedback to shifters on the operational state of the detector, archiving for later analysis and possibly trigger automatic recovery actions in case of errors. Here we review the proposed architecture of the monitoring system and we describe its software components, which are already in place, the various monitoring streams available, and our experiences of operating and monitoring a large-scale system

    Many-Task Computing and Blue Waters

    Full text link
    This report discusses many-task computing (MTC) generically and in the context of the proposed Blue Waters systems, which is planned to be the largest NSF-funded supercomputer when it begins production use in 2012. The aim of this report is to inform the BW project about MTC, including understanding aspects of MTC applications that can be used to characterize the domain and understanding the implications of these aspects to middleware and policies. Many MTC applications do not neatly fit the stereotypes of high-performance computing (HPC) or high-throughput computing (HTC) applications. Like HTC applications, by definition MTC applications are structured as graphs of discrete tasks, with explicit input and output dependencies forming the graph edges. However, MTC applications have significant features that distinguish them from typical HTC applications. In particular, different engineering constraints for hardware and software must be met in order to support these applications. HTC applications have traditionally run on platforms such as grids and clusters, through either workflow systems or parallel programming systems. MTC applications, in contrast, will often demand a short time to solution, may be communication intensive or data intensive, and may comprise very short tasks. Therefore, hardware and software for MTC must be engineered to support the additional communication and I/O and must minimize task dispatch overheads. The hardware of large-scale HPC systems, with its high degree of parallelism and support for intensive communication, is well suited for MTC applications. However, HPC systems often lack a dynamic resource-provisioning feature, are not ideal for task communication via the file system, and have an I/O system that is not optimized for MTC-style applications. Hence, additional software support is likely to be required to gain full benefit from the HPC hardware

    Metadata for ATLAS

    Get PDF
    This document provides an overview of the metadata, which are needed to characterize ATLAS event data at different levels (a complete run, data streams within a run, luminosity blocks within a run, individual events)

    Software Defined Networks based Smart Grid Communication: A Comprehensive Survey

    Get PDF
    The current power grid is no longer a feasible solution due to ever-increasing user demand of electricity, old infrastructure, and reliability issues and thus require transformation to a better grid a.k.a., smart grid (SG). The key features that distinguish SG from the conventional electrical power grid are its capability to perform two-way communication, demand side management, and real time pricing. Despite all these advantages that SG will bring, there are certain issues which are specific to SG communication system. For instance, network management of current SG systems is complex, time consuming, and done manually. Moreover, SG communication (SGC) system is built on different vendor specific devices and protocols. Therefore, the current SG systems are not protocol independent, thus leading to interoperability issue. Software defined network (SDN) has been proposed to monitor and manage the communication networks globally. This article serves as a comprehensive survey on SDN-based SGC. In this article, we first discuss taxonomy of advantages of SDNbased SGC.We then discuss SDN-based SGC architectures, along with case studies. Our article provides an in-depth discussion on routing schemes for SDN-based SGC. We also provide detailed survey of security and privacy schemes applied to SDN-based SGC. We furthermore present challenges, open issues, and future research directions related to SDN-based SGC.Comment: Accepte

    Online Fault Classification in HPC Systems through Machine Learning

    Full text link
    As High-Performance Computing (HPC) systems strive towards the exascale goal, studies suggest that they will experience excessive failure rates. For this reason, detecting and classifying faults in HPC systems as they occur and initiating corrective actions before they can transform into failures will be essential for continued operation. In this paper, we propose a fault classification method for HPC systems based on machine learning that has been designed specifically to operate with live streamed data. We cast the problem and its solution within realistic operating constraints of online use. Our results show that almost perfect classification accuracy can be reached for different fault types with low computational overhead and minimal delay. We have based our study on a local dataset, which we make publicly available, that was acquired by injecting faults to an in-house experimental HPC system.Comment: Accepted for publication at the Euro-Par 2019 conferenc
    corecore