    Extending DIRAC File Management with Erasure-Coding for efficient storage

    The state of the art in Grid style data management is to achieve increased resilience of data via multiple complete replicas of data files across multiple storage endpoints. While this is effective, it is not the most space-efficient approach to resilience, especially when the reliability of individual storage endpoints is sufficiently high that only a few will be inactive at any point in time. We report on work performed as part of GridPP\cite{GridPP}, extending the DIRAC File Catalogue and file management interface to allow the placement of erasure-coded files: each file is distributed as N identically-sized chunks of data striped across a vector of storage endpoints, encoded such that any M chunks can be lost and the original file can still be reconstructed. The tools developed are transparent to the user and, as well as allowing uploading and downloading of data to and from Grid storage, also provide the possibility of parallelising access across all of the distributed chunks at once, improving data transfer and IO performance. We expect this approach to be of most interest to smaller VOs, who have tighter bounds on the storage available to them, but larger (WLCG) VOs may be interested as their total data increases during Run 2. We provide an analysis of the costs and benefits of the approach, along with future development and implementation plans in this area. In general, the overheads of multiple file transfers are currently the largest obstacle to the competitiveness of this approach. Comment: 21st International Conference on Computing for High Energy and Nuclear Physics (CHEP2015)
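The space argument in this abstract can be illustrated with a minimal sketch. It assumes an ideal erasure code in which a file split into N chunks that tolerates the loss of any M of them stores N/(N-M) bytes per byte of original data, and it compares that with full replication, which needs M+1 copies to survive the same number of lost endpoints. The function names and the example values N=10, M=2 are hypothetical and are not taken from the paper.

```python
def storage_overhead_erasure(n_chunks: int, m_lost: int) -> float:
    """Bytes stored per byte of original data for an ideal erasure code.

    Assumption: the file is split into n_chunks equal chunks and any
    m_lost of them may be lost, so n_chunks - m_lost chunks carry one
    file's worth of data.
    """
    if not 0 <= m_lost < n_chunks:
        raise ValueError("need 0 <= m_lost < n_chunks")
    data_chunks = n_chunks - m_lost
    return n_chunks / data_chunks


def storage_overhead_replication(m_lost: int) -> float:
    """Bytes stored per byte of data when surviving m_lost lost endpoints
    requires m_lost + 1 complete replicas."""
    if m_lost < 0:
        raise ValueError("m_lost must be non-negative")
    return float(m_lost + 1)


if __name__ == "__main__":
    # Hypothetical example: 10 chunks, any 2 may be lost.
    print(storage_overhead_erasure(10, 2))
    print(storage_overhead_replication(2))
```

Under these assumptions the erasure-coded layout stores 1.25 times the file size, against 3 times for replication with the same loss tolerance. The sketch ignores the per-transfer overheads that the abstract identifies as the main practical cost.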

    Monitoring in a grid cluster

    The monitoring of a grid cluster (or of any piece of reasonably scaled IT infrastructure) is a key element in the robust and consistent running of that site. There are several factors which are important to the selection of a useful monitoring framework, which include ease of use, reliability, data input and output. It is critical that data can be drawn from different instrumentation packages and collected in the framework to allow for a uniform view of the running of a site. It is also very useful to allow different views and transformations of this data to allow its manipulation for different purposes, perhaps unknown at the initial time of installation. In this context, we present the findings of an investigation of the Graphite monitoring framework and its use at the ScotGrid Glasgow site. In particular, we examine the messaging system used by the framework and means to extract data from different tools, including the existing framework Ganglia which is in use at many sites, in addition to adapting and parsing data streams from external monitoring frameworks and websites

    Enabling object storage via shims for grid middleware

    The Object Store model has quickly become the basis of most commercially successful mass storage infrastructure, backing so-called "Cloud" storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in Object Store design are similar, but not identical, to concepts in the design of Grid Storage Elements, although the requirement for "POSIX-like" filesystem structures on top of SEs makes the disjunction seem larger. As modern Object Stores provide many features that most Grid SEs do not (block level striping, parallel access, automatic file repair, etc.), it is of interest to see how easily we can provide interfaces to typical Object Stores via plugins and shims for Grid tools, and how well experiments can adapt their data models to them. We present an evaluation of, and first-deployment experiences with, (for example) Xrootd-Ceph interfaces for direct object-store access, as part of an initiative within GridPP[1] hosted at RAL. Additionally, we discuss the tradeoffs and experience of developing plugins for the currently-popular Ceph parallel distributed filesystem for the GFAL2 access layer, at Glasgow

    A voyage to Arcturus: a model for automated management of a WLCG Tier-2 facility

    With the current trend towards "On Demand Computing" in big data environments it is crucial that the deployment of services and resources becomes increasingly automated. Deployment based on cloud platforms is available for large scale data centre environments but these solutions can be too complex and heavyweight for smaller, resource constrained WLCG Tier-2 sites. Along with a greater desire for bespoke monitoring and collection of Grid related metrics, a more lightweight and modular approach is desired. In this paper we present a model for a lightweight automated framework which can be used to build WLCG grid sites, based on "off the shelf" software components. As part of the research into an automation framework the use of both IPMI and SNMP for physical device management will be included, as well as the use of SNMP as a monitoring/data sampling layer such that more comprehensive decision making can take place and potentially be automated. This could lead to reduced downtime and better performance as services are recognised to be in a non-functional state by autonomous systems

    In-flight simulation of high agility through active control: Taming complexity by design

    The motivation for research into helicopter agility stems from the realization that marked improvements relative to current operational types are possible, yet there is a dearth of useful criteria for flying qualities at high performance levels. Several research laboratories are currently investing resources in developing second generation airborne rotorcraft simulators. The UK's focus has been the exploitation of agility through active control technology (ACT); this paper reviews the results of studies conducted to date. The conflict between safety and performance in flight research is highlighted and the various forms of safety net to protect against system failures are described. The role of the safety pilot, and the use of actuator and flight envelope limiting are discussed. It is argued that the deep complexity of a research ACT system can only be tamed through a requirement specification assembled using design principles and cast in an operational simulation form. Work along these lines conducted at DRA is described, including the use of the Jackson System Development method and associated Ada simulation

    Evaluation of containers as a virtualisation alternative for HEP workloads

    In this paper the emerging technology of Linux containers is examined and evaluated for use in the High Energy Physics (HEP) community. Key technologies required to enable containerisation will be discussed along with emerging technologies used to manage container images. An evaluation of the requirements for containers within HEP will be made and benchmarking will be carried out to assess performance over a range of HEP workflows. The use of containers will be placed in a broader context and recommendations on future work will be given

    Simulation of intrinsic parameter fluctuations in nano-CMOS devices

    As devices are scaled to gate lengths of sub-100 nm the effects of intrinsic parameter fluctuations will become increasingly important. This work presents a systematic simulation study of intrinsic parameter fluctuations, consisting of random dopant fluctuations, line edge roughness and oxide thickness fluctuations, in a real 35 nm MOSFET developed by Toshiba. The simulations are calibrated against experimental data for the real device and it is found that discrete random dopants have the greatest impact on both the threshold voltage and leakage current fluctuations, with a σVT of 33.2 mV and a percentage increase in the average leakage current of 50%. Line edge roughness has the second greatest impact, with a σVT of 19 mV and a percentage increase in the average leakage current of 45.5%. The smallest impact is caused by oxide thickness variations, resulting in a σVT of 1.8 mV and a 13% increase in the average leakage current. The combined effects of pairs of fluctuations are also studied, showing that these sources of intrinsic parameter fluctuations are statistically independent, and a calculated σVT of 39 mV is given for all of the sources combined. This value is on par with that reported in the literature for the 90 nm technology node
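If the three sources are statistically independent, as the abstract states, their standard deviations would be expected to combine in quadrature (root-sum-square). The sketch below applies that textbook rule to the three σVT values quoted above; it is an illustration only, since the abstract does not say how the authors calculated their combined figure. The function name is hypothetical.

```python
import math


def combined_sigma(*sigmas_mv: float) -> float:
    """Root-sum-square of independent standard deviations (same units in, same out)."""
    return math.sqrt(sum(s * s for s in sigmas_mv))


if __name__ == "__main__":
    # sigma-VT values quoted in the abstract, in mV:
    # random dopants, line edge roughness, oxide thickness variation.
    print(round(combined_sigma(33.2, 19.0, 1.8), 1))
```

The quadrature sum of the quoted values comes to about 38.3 mV, close to the 39 mV reported in the abstract; the small difference may reflect rounding of the individual values or a different calculation.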

    Multiple interface management and flow mobility in next generation networks

    Includes bibliographical references (leaves 79-80). Next Generation networks will consist of a number of different access networks interconnected to provide ubiquitous access to the global resources available on the Internet. The coverage of these access networks will also overlap, allowing users a choice of access networks. Increasingly, mobile devices have more than one type of radio access interface built in. In current mobile devices, a single primary radio interface performs all communications with the service provider. The availability of multiple different radio interfaces proves most beneficial if all these interfaces can connect with the service provider and carry data in collaboration or individually. This means that a control system is needed to route the correct traffic over each different interface, depending on the requirements of that traffic. Having multiple interfaces available provides the opportunity to aggregate two or more interfaces for faster transfer speeds and can provide redundancy: if one interface is experiencing high packet loss or no coverage, an alternate interface will be available. Multiple interface schemes aim to enable traditional networks to support devices with more than one interface. This is usually achieved by introducing a new agent into the network architecture that acts as the packet redirection point. Incoming packet flows are routed to the different interfaces of the mobile device by this agent according to the traffic type of each packet flow. In this thesis an evaluation platform is developed to investigate whether the possible functionality of a multiple interfaced device provides useful traffic routing options. The evaluation platform consists of three key components evident in schemes from the literature, namely a Corresponding Node, Mobile Node and Router. The Router is emulated with script-based routing software and configured as the packet redirection point in the evaluation platform.
    Four test scenarios emulate traffic travelling over two interfaces of a practical mobile node. A mid-flow handover from one interface to the other is investigated and shown to be seamless under certain conditions. Dual Interface Aggregation shows good performance when the limits of each interface are not exceeded. A distinct improvement in the combined packet loss of two lossy links carrying duplicate packet streams shows that two interfaces can provide a reliable link in critical situations where both interfaces perform poorly when used separately. Finally, a Bandwidth-on-Demand scenario shows that having two interfaces can allow automatic bandwidth allocation when the data rate is increased beyond the limits of one interface
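The duplicate-stream result can be motivated with a simple probability sketch. It assumes that packet losses on the two links are independent, which the abstract does not state, so that a packet is lost overall only when both copies are lost. The function name and the example loss rates are hypothetical and are not measurements from the thesis.

```python
def duplicated_stream_loss(loss_a: float, loss_b: float) -> float:
    """Combined loss probability when every packet is sent over both links.

    Assumption: losses on the two links are independent, so a packet is
    lost only if it is dropped on link A and on link B.
    """
    for p in (loss_a, loss_b):
        if not 0.0 <= p <= 1.0:
            raise ValueError("loss rates must be between 0 and 1")
    return loss_a * loss_b


if __name__ == "__main__":
    # Hypothetical loss rates of 20% and 10% on the two interfaces.
    print(duplicated_stream_loss(0.2, 0.1))
```

Under the independence assumption, links losing 20% and 10% of packets individually would lose only about 2% of packets when carrying duplicate streams. Correlated losses, for example from shared interference, would reduce this benefit.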

    Storageless and caching Tier-2 models in the UK context

    Operational and other pressures have led to WLCG experiments moving increasingly to a stratified model for Tier-2 resources, where "fat" Tier-2s ("T2Ds") and "thin" Tier-2s ("T2Cs") provide different levels of service. In the UK, this distinction is also encouraged by the terms of the current GridPP5 funding model. In anticipation of this, testing has been performed on the implications, and potential implementation, of such a distinction in our resources. In particular, this presentation reports the results of testing of storage T2Cs, where the "thin" nature is expressed by the site having either no local data storage, or only a thin caching layer; data is streamed or copied from a "nearby" T2D when needed by jobs. In OSG, this model has been adopted successfully for CMS AAA sites, but the network topology and capacity in the USA are significantly different from those in the UK (and much of Europe). We present the results of several operational tests: the in-production University College London (UCL) site, which runs ATLAS workloads using storage at the Queen Mary University of London (QMUL) site; the Oxford site, which has had scaling tests performed against T2Ds in various locations in the UK (to test network effects); and the Durham site, which has been testing the specific ATLAS caching solution of "Rucio Cache" integration with ARC's caching layer