    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    The increasing demand for exabyte-scale storage capacity by high-end computing applications requires a higher level of scalability and dependability than current file and storage systems provide. The proposal addresses file-systems research on metadata management for scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a Scalable and Adaptive Metadata Management (SAM2) toolkit to extend the features of, and fully leverage the peak performance promised by, state-of-the-art cluster-based parallel and distributed file storage systems used by the high-performance computing community. There is a large body of research on scaling data movement and management; however, the need to scale the handling of the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. Understanding the characteristics of metadata traffic, and applying appropriate load-balancing, caching, prefetching, and grouping mechanisms to metadata management accordingly, will lead to high scalability. It is anticipated that plugging scalable and adaptive metadata management components into state-of-the-art cluster-based parallel and distributed file storage systems could increase the performance of applications and file systems, and help translate the high peak performance promised by such systems into real application performance improvements. The project involves the following components:
    1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns.
    2. Develop scalable and adaptive file-name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability.
    3. Develop decentralized, locality-aware metadata grouping schemes to facilitate bulk metadata operations such as prefetching.
    4. Develop an adaptive cache-coherence protocol using a distributed shared-object model for client-side and server-side metadata caching.
    5. Prototype the SAM2 components in the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework at a DOE CMS Tier-2 site at the University of Nebraska-Lincoln, and conduct benchmark, evaluation, and validation studies.
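
    The duplicative Bloom filter array in component 2 can be illustrated with a short sketch. The Python fragment below is a minimal illustration, not the SAM2 code itself, and all names and parameters are hypothetical: each metadata server advertises a Bloom filter of the file names it owns, the array of filters is replicated at every client, and a lookup probes the local copies to find candidate home servers instead of broadcasting to the whole cluster.

        import hashlib

        class BloomFilter:
            """Minimal Bloom filter: k hash positions derived from SHA-256
            over an m-bit array (m must be a multiple of 8)."""
            def __init__(self, m=8192, k=4):
                self.m, self.k = m, k
                self.bits = bytearray(m // 8)

            def _positions(self, key):
                for i in range(self.k):
                    digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
                    yield int.from_bytes(digest[:8], "big") % self.m

            def add(self, key):
                for pos in self._positions(key):
                    self.bits[pos // 8] |= 1 << (pos % 8)

            def __contains__(self, key):
                return all(self.bits[pos // 8] & (1 << (pos % 8))
                           for pos in self._positions(key))

        class BloomFilterArray:
            """One filter per metadata server, replicated at every client.
            A lookup probes all filters locally and returns candidate home
            servers, so most requests avoid a cluster-wide broadcast."""
            def __init__(self, num_servers):
                self.filters = [BloomFilter() for _ in range(num_servers)]

            def record(self, path, server_id):
                self.filters[server_id].add(path)

            def lookup(self, path):
                # False positives can add extra candidates; the client
                # contacts the candidates in turn to resolve the true owner.
                return [i for i, f in enumerate(self.filters) if path in f]

        bfa = BloomFilterArray(4)
        bfa.record("/scratch/job42/out.dat", server_id=2)
        print(bfa.lookup("/scratch/job42/out.dat"))  # [2], barring false positives

    A lookup normally yields a single candidate; an occasional false positive adds extras, which the client resolves by contacting candidates in turn, so the common case costs one message rather than a broadcast.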

    Metadata And Data Management In High Performance File And Storage Systems

    With the advent of emerging e-Science applications, today's scientific research increasingly relies on petascale-and-beyond computing over data sets of the same magnitude. While the computational power of supercomputers has recently entered the petascale era, the performance of their storage systems lags behind by many orders of magnitude. This places an imperative demand on revolutionizing their underlying I/O systems, in which the management of both metadata and data has significant performance implications. Prefetching/caching and data-locality-aware optimizations, as conventional and effective management techniques for improving metadata and data I/O performance, still play crucial roles in current parallel and distributed file systems. In this study, we examine the limitations of existing prefetching/caching techniques and explore the untapped potential of data locality optimization techniques in the new era of petascale computing. For metadata I/O access, we propose a novel weighted-graph-based prefetching technique, built on both direct and indirect successor relationships, to reap performance benefits from prefetching specifically for clustered metadata servers, an arrangement envisioned as necessary for petabyte-scale distributed storage systems. For data I/O access, we design and implement Segment-structured On-disk data Grouping and Prefetching (SOGP), a combined prefetching and data placement technique that boosts local data read performance for parallel file systems, especially for applications with partially overlapped access patterns. A high-performance local I/O software package from the SOGP work, comprising about 2,000 lines of C code for the Parallel Virtual File System, was released to Argonne National Laboratory in 2007 for potential integration into production.
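
    The weighted-graph-based prefetching idea can be sketched in a few lines; the fragment below is a simplified illustration with hypothetical names, since the paper's actual construction also distributes the graph across clustered metadata servers. Each observed access strengthens edges from recently accessed files to the new one, weighting direct successors more heavily than indirect ones, and a prediction simply returns the heaviest outgoing edges.

        from collections import defaultdict, deque

        class SuccessorGraphPrefetcher:
            """Weighted successor graph: when file b is accessed within
            `window` steps after file a, edge (a, b) gains weight
            1/distance, so direct successors weigh more than indirect ones."""
            def __init__(self, window=3, top_k=2):
                self.top_k = top_k
                self.weights = defaultdict(lambda: defaultdict(float))
                self.recent = deque(maxlen=window)

            def record(self, path):
                # Strengthen edges from each recently seen file to the new one.
                for dist, prev in enumerate(reversed(self.recent), start=1):
                    self.weights[prev][path] += 1.0 / dist
                self.recent.append(path)

            def predict(self, path):
                # Heaviest outgoing edges = metadata worth prefetching next.
                succ = self.weights.get(path, {})
                return sorted(succ, key=succ.get, reverse=True)[:self.top_k]

        p = SuccessorGraphPrefetcher()
        for f in ["a.meta", "b.meta", "c.meta", "a.meta", "b.meta"]:
            p.record(f)
        print(p.predict("a.meta"))  # ['b.meta', 'c.meta']: b directly follows a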

    Bloom Filters, Adaptivity, and the Dictionary Problem

    The Bloom filter (or, more generally, an approximate membership query data structure, AMQ) maintains a compact, probabilistic representation of a set S of keys from a universe U. An AMQ supports lookups, inserts, and (for some AMQs) deletes. A query for an x in S is guaranteed to return "present." A query for x not in S returns "absent" with probability at least 1-epsilon, where epsilon is a tunable false-positive probability. If a query returns "present" but x is not in S, then x is a false positive of the AMQ. Because AMQs have a nonzero probability of false positives, they require far less space than explicit set representations. AMQs are widely used to speed up dictionaries that are stored remotely (e.g., on disk or across a network). Most AMQs offer weak guarantees on the number of false positives they will return on a sequence of queries: the false-positive probability of epsilon holds only for a single query. It is easy for an adversary to drive an AMQ's false-positive rate towards 1 by simply repeating false positives. This paper shows what it takes to get strong guarantees on the number of false positives. We say that an AMQ is adaptive if it guarantees a false-positive probability of epsilon for every query, regardless of answers to previous queries. First, we prove that it is impossible to build a small adaptive AMQ, even when the AMQ is told immediately whenever it returns a false positive. We then show how to build an adaptive AMQ that partitions its state into a small local component and a larger remote component. In addition to being adaptive, the local component of our AMQ dominates existing AMQs in all regards: it uses optimal space up to lower-order terms and supports queries and updates in worst-case constant time, with high probability. Thus, we show that adaptivity has no cost.
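
    The local/remote split can be illustrated with a deliberately naive sketch, reusing the BloomFilter class from the first sketch above; this is not the paper's construction, and all names are hypothetical. It mimics adaptivity by recording every caught false positive in a correction table, which is precisely the state the paper's lower bound shows cannot remain small on the local side, motivating the remote component.

        class NaiveAdaptiveAMQ:
            """Toy adaptivity: a compact local filter backed by the exact
            remote set. A detected false positive is recorded so the same
            query can never err twice. The paper proves such correction
            state cannot all fit in small local memory, which is why their
            construction moves it into a larger remote component."""
            def __init__(self, local_filter, remote_set):
                self.local = local_filter    # approximate, e.g. a BloomFilter
                self.remote = remote_set     # ground truth, expensive to reach
                self.corrections = set()     # false positives caught so far

            def query(self, key):
                if key in self.corrections:
                    return False             # adapted: known false positive
                if key not in self.local:
                    return False             # AMQs never report false negatives
                if key in self.remote:       # one remote access settles it
                    return True
                self.corrections.add(key)    # adapt: this mistake never repeats
                return False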

    Bloom Filters in Adversarial Environments

    Many efficient data structures use randomness, allowing them to improve upon deterministic ones. Usually, their efficiency and correctness are analyzed using probabilistic tools under the assumption that the inputs and queries are independent of the internal randomness of the data structure. In this work, we consider data structures in a more robust model, which we call the adversarial model. Roughly speaking, this model allows an adversary to choose inputs and queries adaptively according to previous responses. Specifically, we consider the data structure known as the "Bloom filter" and prove a tight connection between Bloom filters in this model and cryptography. A Bloom filter represents a set S of elements approximately, using fewer bits than a precise representation. The price for succinctness is allowing some errors: for any x in S it should always answer 'Yes', and for any x not in S it should answer 'Yes' only with small probability. In the adversarial model, we consider both efficient adversaries (that run in polynomial time) and computationally unbounded adversaries that are bounded only in the number of queries they can make. For computationally bounded adversaries, we show that non-trivial (memory-wise) Bloom filters exist if and only if one-way functions exist. For unbounded adversaries, we show that there exists a Bloom filter for sets of size n and error epsilon that is secure against t queries and uses only O(n log(1/epsilon) + t) bits of memory. In comparison, n log(1/epsilon) bits is the best possible under a non-adaptive adversary.
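
    The adaptive attack that motivates this model (also noted in the previous abstract) is easy to demonstrate; the sketch below uses hypothetical names and reuses the BloomFilter class from the first sketch. The adversary harvests false positives with fresh random queries and then replays them, driving the observed error rate to 1 no matter how small epsilon is.

        import random
        import string

        def random_key(rng, length=12):
            return "".join(rng.choices(string.ascii_lowercase, k=length))

        def adversarial_error_rate(bloom, member_set, probes=20000, replays=20000):
            """Phase 1: probe fresh random non-members and harvest any false
            positives. Phase 2: replay only harvested false positives. A
            plain Bloom filter answers deterministically, so every replay
            errs and the observed error rate becomes 1.0, even though each
            fresh query errs only with probability epsilon."""
            rng = random.Random(42)
            harvested = [key for key in (random_key(rng) for _ in range(probes))
                         if key not in member_set and key in bloom]
            if not harvested:
                return 0.0
            errors = sum(1 for _ in range(replays)
                         if rng.choice(harvested) in bloom)
            return errors / replays

        bf = BloomFilter(m=4096, k=3)        # BloomFilter from the first sketch
        members = {f"key{i}" for i in range(500)}
        for key in members:
            bf.add(key)
        print(adversarial_error_rate(bf, members))  # 1.0 once any FP is harvested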

    Drug development progress in duchenne muscular dystrophy

    Duchenne muscular dystrophy (DMD) is a severe, progressive, and incurable X-linked disorder caused by mutations in the dystrophin gene. Patients with DMD lack functional dystrophin protein, which results in chronic damage to muscle fibers during contraction, leading to deterioration of muscle quality and loss of muscle mass over time. Although there is currently no cure for DMD, improvements in treatment, care, and management can delay disease progression and improve quality of life, thereby prolonging life expectancy for these patients. Furthermore, active research efforts are ongoing to develop therapeutic strategies that target dystrophin deficiency, such as gene replacement therapies, exon skipping, and readthrough therapy, as well as strategies that target the secondary pathology of DMD, such as novel anti-inflammatory compounds, myostatin inhibitors, and cardioprotective compounds. In addition, longitudinal modeling approaches have been used to characterize the progression of MRI and functional endpoints for predictive purposes, to inform go/no-go decisions in drug development. This review showcases approved drugs and drug candidates along their development paths, and also provides information on the primary endpoints and enrollment sizes of phase 2/3 and phase 3 trials in the DMD space.