766,383 research outputs found

    Coding for Fast Content Download

    Full text link
    We study the fundamental trade-off between storage and content download time. We show that the download time can be significantly reduced by dividing the content into chunks, encoding it to add redundancy and then distributing it across multiple disks. We determine the download time for two content access models - the fountain and fork-join models that involve simultaneous content access, and individual access from enqueued user requests respectively. For the fountain model we explicitly characterize the download time, while in the fork-join model we derive the upper and lower bounds. Our results show that coding reduces download time, through the diversity of distributing the data across more disks, even for the total storage used.Comment: 8 pages, 6 figures, conferenc

    Enabling object storage via shims for grid middleware

    Get PDF
    The Object Store model has quickly become the basis of most commercially successful mass storage infrastructure, backing so-called "Cloud" storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in Object Store design are similar, but not identical, to concepts in the design of Grid Storage Elements, although the requirement for "POSIX-like" filesystem structures on top of SEs makes the disjunction seem larger. As modern Object Stores provide many features that most Grid SEs do not (block level striping, parallel access, automatic file repair, etc.), it is of interest to see how easily we can provide interfaces to typical Object Stores via plugins and shims for Grid tools, and how well experiments can adapt their data models to them. We present evaluation of, and first-deployment experiences with, (for example) Xrootd-Ceph interfaces for direct object-store access, as part of an initiative within GridPP[1] hosted at RAL. Additionally, we discuss the tradeoffs and experience of developing plugins for the currently-popular Ceph parallel distributed filesystem for the GFAL2 access layer, at Glasgow

    Analysing I/O bottlenecks in LHC data analysis on grid storage resources

    Get PDF
    We describe recent I/O testing frameworks that we have developed and applied within the UK GridPP Collaboration, the ATLAS experiment and the DPM team, for a variety of distinct purposes. These include benchmarking vendor supplied storage products, discovering scaling limits of SRM solutions, tuning of storage systems for experiment data analysis, evaluating file access protocols, and exploring I/O read patterns of experiment software and their underlying event data models. With multiple grid sites now dealing with petabytes of data, such studies are becoming essential. We describe how the tests build, and improve, on previous work and contrast how the use-cases differ. We also detail the results obtained and the implications for storage hardware, middleware and experiment software

    Performance models of access latency in cloud storage systems

    Get PDF
    Access latency is a key performance metric for cloud storage systems and has great impact on user experience, but most papers focus on other performance metrics such as storage overhead, repair cost and so on. Only recently do some models argue that coding can reduce access latency. However, they are developed for special scenarios, which may not reflect reality. To fill the gaps between existing work and practice, in this paper, we propose a more practical model to measure access latency. This model can also be used to compare access latency of different codes used by different companies. To the best of our knowledge, this model is the first to provide a general method to compare access latencies of different erasure codes.postprin

    Predicting I/O performance in HPC using artificial neural networks

    Get PDF
    The prediction of file access times is an important part for the modeling of supercomputer's storage systems. These models can be used to develop analysis tools which support the users to integrate efficient I/O behavior. In this paper, we analyze and predict the access times of a Lustre file system from the client perspective. Therefore, we measure file access times in various test series and developed different models for predicting access times. The evaluation shows that in models utilizing artificial neural networks the average prediciton error is about 30% smaller than in linear models. A phenomenon in the distribution of file access times is of particular interest: File accesses with identical parameters show several typical access times.The typical access times usually differ by orders of magnitude and can be explained with a different processing of the file accesses in the storage system - an alternative I/O path. We investigate a method to automatically determine the alternative I/O path and quantify the significance of knowledge about the internal processing. It is shown that the prediction error is improved significantly with this approach
    • …
    corecore