10 research outputs found

    Space-Efficient Predictive Block Management

    With growing disk and storage capacities, managing the metadata required to track every block in a system becomes a daunting task in itself. In previous work, we demonstrated a system software approach to predictive data grouping that reduces power consumption and latency on hard disks. The structures used, very similar to prior efforts in prefetching and prefetch caching, track access successor information at the block level, keeping a fixed number of immediate successors per block. While this approach provides powerful predictive expansion capabilities and requires less metadata than many previous strategies, there remains a growing concern about how much metadata is actually required. In this paper, we present a novel method of storing equivalent information: SESH, a Space Efficient Storage of Heredity. This method exploits the high degree of block-level predictability observed in a number of workload trace sets to reduce overall metadata storage by up to 99% without any loss of information. As a result, we are able to provide a predictive tool that is adaptive, accurate, and robust in the face of workload noise, for a tiny fraction of the metadata cost previously anticipated; in some cases, the required size drops from 12 gigabytes to less than 150 megabytes.
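The baseline structure this abstract describes, a fixed number of immediate successors kept per block, can be illustrated with a minimal sketch. It does not reproduce SESH, whose encoding the abstract does not describe. The class name `SuccessorTable`, the parameter `k`, and the least-recently-seen eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict, defaultdict


class SuccessorTable:
    """Hypothetical per-block successor table (not the SESH encoding)."""

    def __init__(self, k=2):
        self.k = k                                   # successors kept per block (assumed)
        self.successors = defaultdict(OrderedDict)   # block -> ordered set of successors
        self.prev = None                             # previously accessed block

    def record(self, block):
        """Record one block access from the trace."""
        if self.prev is not None:
            succ = self.successors[self.prev]
            succ.pop(block, None)        # re-insert so `block` becomes most recent
            succ[block] = True
            if len(succ) > self.k:
                succ.popitem(last=False)  # evict the least recently seen successor
        self.prev = block

    def predict(self, block):
        """Return known successors of `block`, most recently seen first."""
        return list(self.successors.get(block, ()))[::-1]


table = SuccessorTable(k=2)
for b in [1, 2, 1, 3, 1, 4]:
    table.record(b)
print(table.predict(1))
```

With `k` successors per block, the table grows linearly with the number of tracked blocks, which is the metadata cost the abstract sets out to reduce.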

    Metadata And Data Management In High Performance File And Storage Systems

    With the advent of emerging e-Science applications, today's scientific research increasingly relies on petascale-and-beyond computing over data sets of the same magnitude. While the computational power of supercomputers has recently entered the petascale era, the performance of their storage systems lags behind by many orders of magnitude. This places an imperative demand on revolutionizing their underlying I/O systems, in which the management of both metadata and data has significant performance implications. Prefetching/caching and data locality awareness optimizations, as conventional and effective techniques for improving metadata and data I/O performance, still play crucial roles in current parallel and distributed file systems. In this study, we examine the limitations of existing prefetching/caching techniques and explore the untapped potential of data locality optimization techniques in the new era of petascale computing. For metadata I/O access, we propose a novel weighted-graph-based prefetching technique, built on both direct and indirect successor relationships, to reap performance benefits from prefetching specifically for clustered metadata servers, an arrangement envisioned as necessary for petabyte-scale distributed storage systems. For data I/O access, we design and implement Segment-structured On-disk data Grouping and Prefetching (SOGP), a combined prefetching and data placement technique to boost local data read performance for parallel file systems, especially for applications with partially overlapped access patterns. A high-performance local I/O software package from the SOGP work, about 2000 lines of C for the Parallel Virtual File System, was released to Argonne National Laboratory in 2007 for potential integration into production.
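One way to picture a weighted successor graph built on direct and indirect successor relationships is the sketch below. It is not the thesis's algorithm: the look-back `window`, the `1/distance` edge weighting, and the function names are assumptions chosen only to make the idea concrete.

```python
from collections import defaultdict, deque


def build_weighted_graph(trace, window=3):
    """Build graph[pred][succ] = weight from an access trace (assumed weighting)."""
    graph = defaultdict(lambda: defaultdict(float))
    recent = deque(maxlen=window)   # the last `window` accessed items
    for item in trace:
        # distance 1 is a direct successor; larger distances are indirect
        for distance, pred in enumerate(reversed(recent), start=1):
            if pred != item:
                graph[pred][item] += 1.0 / distance
        recent.append(item)
    return graph


def prefetch_candidates(graph, item, n=2):
    """Return up to `n` successors of `item`, heaviest edge first."""
    edges = graph.get(item, {})
    return sorted(edges, key=lambda s: (-edges[s], str(s)))[:n]


graph = build_weighted_graph(["a", "b", "c", "a", "b", "c"], window=2)
print(prefetch_candidates(graph, "a"))
```

Giving indirect successors a smaller weight lets the graph still capture relationships when unrelated requests are interleaved between two related accesses.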

    Transaction-filtering data mining and a predictive model for intelligent data management

    This thesis first proposes a new data mining paradigm, transaction-filtering association rule mining, which addresses the time cost of repeatedly scanning the original transaction database in conventional association rule mining algorithms. An in-memory transaction filter is designed to discard infrequent items in the pruning steps. This filter is a data structure that is updated at the end of each iteration. Results on an IBM benchmark show an execution time reduction of 10% to 19% compared with the base case. Next, a data mining-based predictive model is established, contributing to intelligent data management within the context of the Centre for Grid Computing. The ability to discover unseen rules, patterns and correlations makes data mining techniques attractive in areas where massive amounts of data are generated. The past behaviour of two typical scenarios (network file systems and Data Grids) has been analyzed to build the model. Using this predictor on the given real system traces, the future popularity of files can be forecast with an accuracy of 90%. A further step towards intelligent policy design is achieved by analyzing the predictions of files' future popularity. Simulations based on real system traces have shown improvements of 2 to 4 times in data response time in the network file system scenario and a 24% reduction in mean job time in Data Grids compared with conventional cases.
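The transaction-filtering idea can be sketched as an Apriori-style loop over an in-memory working copy of the transactions that shrinks after every pass. The thesis's actual filter structure is not given in the abstract, so the function name `frequent_itemsets`, the `max_size` cap, and the brute-force counting are assumptions for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=3):
    """Return {itemset tuple: support count} using a shrinking in-memory filter."""
    filtered = [frozenset(t) for t in transactions]  # in-memory working copy
    result = {}
    for size in range(1, max_size + 1):
        counts = Counter()
        for t in filtered:
            for itemset in combinations(sorted(t), size):
                counts[itemset] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        if not frequent:
            break
        result.update(frequent)
        # Filter update at the end of the iteration: drop items that occur in
        # no frequent itemset, then drop transactions too short for the next pass.
        keep = {item for s in frequent for item in s}
        filtered = [t & keep for t in filtered]
        filtered = [t for t in filtered if len(t) > size]
    return result


data = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"], ["a", "b", "c", "d"]]
print(frequent_itemsets(data, min_support=3))
```

The pruning is safe because every item of a frequent itemset of size k+1 must already appear in some frequent itemset of size k, so later passes scan less data without losing any frequent itemset.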

    Optimizing clusters as multimedia platforms using multithreaded predictive clients

    This doctoral thesis studies the main problems of computer clusters running multimedia applications. We confront both sides, architecture and applications, and we do so in a "real" way, without simulations. We start from a real platform, consisting of a high-speed cluster, and a multimedia application flexible enough to emulate the traffic patterns of other applications. This application injects the multimedia load into the cluster and imposes particular requirements. The great current challenge for clusters is high-performance input/output, since high-performance computing has already been achieved. We present the main problems and provide a solution that improves on the results obtained initially. There is a large body of interesting work on input/output, each piece contributing something new with the aim of improving some weak point of input/output. In our work, we have made an exhaustive study of all of this prior work in order to propose a new hybrid data prefetching method for client-server architectures, in one of the most widely used network file systems today, NFS (Network File System). This is not only an algorithmic and theoretical proposal: it is implemented in the operating system kernel itself, where NFS appears as a module, and experiments are carried out to confirm the improvements of the developed implementation. Initially, we considered changing the way the server works, but we then moved the work to the client, which is much more manageable and open to improvements of this kind. We present the design of the hybrid prefetching technique, a technique based on access graphs, with the aim of prefetching data not only from the file currently being read (which was already done) but also across different files. We also present the results with this new predictive client and obtain a significant reduction in read times, along with significant values for the gain achieved. The usefulness of techniques of this kind for NFS is thus demonstrated.

    The case for efficient file access pattern modeling

    Modern I/O systems treat each file access independently. However, events in a computer system are driven by programs. Thus, accesses to files occur in consistent patterns and are by no means independent. The result is that modern I/O systems ignore useful information. Using traces of file system activity we show that file accesses are strongly correlated with preceding accesses. In fact, a simple last-successor model (one that predicts each file access will be followed by the same file that followed the last time it was accessed) successfully predicted the next file 72% of the time. We examine the ability of two previously proposed models for file access prediction in comparison to this baseline model and see a stark contrast in accuracy and high overheads in state space. We then enhance one of these models to address the issues of model space requirements. This new model is able to improve an additional 10% on the accuracy of the last-successor model, while working within a state space that is within a constant factor (relative to the number of files) of the last-successor model. While this work was motivated by the use of file relationships for I/O prefetching, information regarding the likelihood of file access patterns has several other uses such as disk layout and file clustering for disconnected operation.
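The last-successor model described in this abstract is simple enough to sketch in a few lines. The function name and the way accuracy is counted here (only over accesses for which a prediction already exists) are assumptions; the paper may measure accuracy differently.

```python
def last_successor_accuracy(trace):
    """Fraction of predictions the last-successor model gets right on `trace`."""
    last_successor = {}   # file -> file that followed it the last time
    correct = 0
    predictions = 0
    prev = None
    for f in trace:
        if prev is not None:
            if prev in last_successor:
                predictions += 1
                if last_successor[prev] == f:
                    correct += 1
            last_successor[prev] = f   # remember the most recent successor
        prev = f
    return correct / predictions if predictions else 0.0


trace = ["a", "b", "c", "a", "b", "d", "a", "b"]
print(last_successor_accuracy(trace))
```

The model stores one entry per file, which is the state-space baseline the abstract compares the other models against.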
