10 research outputs found

Space-Efficient Predictive Block Management

Author: Amer Ahmed
Essary David
Publication venue: Department of Computer Science, University of Pittsburgh
Publication date: 01/01/2009

With growing disk and storage capacities, the amount of required metadata for tracking all blocks in a system becomes a daunting task by itself. In previous work, we have demonstrated a system software effort in the area of predictive data grouping for reducing power and latency on hard disks. The structures used, very similar to prior efforts in prefetching and prefetch caching, track access successor information at the block level, keeping a fixed number of immediate successors per block. While providing powerful predictive expansion capabilities and being more space efficient in the amount of required metadata than many previous strategies, there remains a growing concern of how much data is actually required. In this paper, we present a novel method of storing equivalent information, SESH, a Space Efficient Storage of Heredity. This method utilizes the high amount of block-level predictability observed in a number of workload trace sets to reduce the overall metadata storage by up to 99% without any loss of information. As a result, we are able to provide a predictive tool that is adaptive, accurate, and robust in the face of workload noise, for a tiny fraction of the metadata cost previously anticipated; in some cases, reducing the required size from 12 gigabytes to less than 150 megabytes

D-Scholarship@Pitt

Metadata And Data Management In High Performance File And Storage Systems

Author: Gu Peng
Publication venue: 'Information Bulletin on Variable Stars (IBVS)'
Publication date: 01/01/2008

With the advent of emerging e-Science applications, today's scientific research increasingly relies on petascale-and-beyond computing over large data sets of the same magnitude. While the computational power of supercomputers has recently entered the era of petascale, the performance of their storage systems lags behind by many orders of magnitude. This places an imperative demand on revolutionizing their underlying I/O systems, on which the management of both metadata and data is deemed to have significant performance implications. Prefetching/caching and data locality awareness optimizations, as conventional and effective management techniques for metadata and data I/O performance enhancement, still play crucial roles in current parallel and distributed file systems. In this study, we examine the limitations of existing prefetching/caching techniques and explore the untapped potential of data locality optimization techniques in the new era of petascale computing. For metadata I/O access, we propose a novel weighted-graph-based prefetching technique, built on both direct and indirect successor relationships, to reap performance benefits from prefetching specifically for clustered metadata servers, an arrangement envisioned as necessary for petabyte-scale distributed storage systems. For data I/O access, we design and implement Segment-structured On-disk data Grouping and Prefetching (SOGP), a combined prefetching and data placement technique to boost the local data read performance for parallel file systems, especially for those applications with partially overlapped access patterns. A high-performance local I/O software package from the SOGP work, about 2000 lines of C for the Parallel Virtual File System, was released to Argonne National Laboratory in 2007 for potential integration into the production mode.

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)
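
The abstract above mentions a weighted graph built on direct and indirect successor relationships but gives no construction details. The following is only a minimal sketch of that general idea, not the thesis's algorithm: the class name `WeightedSuccessorGraph`, the lookahead `window`, and the `1 / distance` weighting are all illustrative assumptions.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Hashable, List


class WeightedSuccessorGraph:
    """Hypothetical weighted access graph (illustrative, not from the thesis).

    An access that follows `p` immediately is a direct successor of `p`;
    one that follows it within `window` accesses is an indirect successor.
    Edge weights decay with distance (assumed rule: 1 / distance).
    """

    def __init__(self, window: int = 3) -> None:
        self.weights: Dict[Hashable, Dict[Hashable, float]] = defaultdict(
            lambda: defaultdict(float)
        )
        # The last `window` accesses, oldest first.
        self._recent: Deque[Hashable] = deque(maxlen=window)

    def observe(self, item: Hashable) -> None:
        """Record one access and strengthen edges from recent predecessors."""
        for distance, predecessor in enumerate(reversed(self._recent), start=1):
            if predecessor != item:
                self.weights[predecessor][item] += 1.0 / distance
        self._recent.append(item)

    def prefetch_candidates(self, item: Hashable, k: int = 2) -> List[Hashable]:
        """Return up to k successors of `item`, heaviest edge first."""
        edges = self.weights.get(item, {})
        ranked = sorted(edges.items(), key=lambda kv: (-kv[1], str(kv[0])))
        return [successor for successor, _ in ranked[:k]]


graph = WeightedSuccessorGraph(window=2)
for name in ["a", "b", "c", "a", "b", "d"]:
    graph.observe(name)
candidates = graph.prefetch_candidates("a", k=2)
```

With this toy trace, `b` is ranked first for `a` because it followed `a` directly twice, while `c` and `d` each appear only once as indirect successors; ties are broken alphabetically purely to keep the output deterministic.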

Transaction-filtering data mining and a predictive model for intelligent data management

Author: Liao ChenHan
Wang Frank ZhiGang
Publication date: 01/01/2008

This thesis first proposes a new data mining paradigm (transaction-filtering association rule mining) that addresses the time cost of the repeated scans of the original transaction database in conventional association rule mining algorithms. An in-memory transaction filter is designed to discard infrequent items in the pruning steps. This filter is a data structure that is updated at the end of each iteration. Results based on an IBM benchmark show an execution time reduction of 10% - 19% compared with the base case. Next, a data mining-based predictive model is established, contributing to intelligent data management within the context of the Centre for Grid Computing. The capability of discovering unseen rules, patterns and correlations makes data mining techniques favourable in areas where massive amounts of data are generated. The past behaviours of two typical scenarios (network file systems and Data Grids) have been analyzed to build the model. The future popularity of files can be forecast with an accuracy of 90% by deploying the above predictor on the given real system traces. A further step towards intelligent policy design is achieved by analyzing the prediction results of files' future popularity. Real system trace-based simulations have shown improvements of 2-4 times in data response time in the network file system scenario and a 24% reduction in mean job time in Data Grids compared with conventional cases.

EThOS - Electronic Theses Online Service

OpenGrey Repository
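
The abstract above describes an in-memory transaction filter that discards infrequent items and is updated at the end of each iteration, without specifying its design. As a rough sketch only, the idea can be grafted onto a plain Apriori-style loop as follows; the function name `frequent_itemsets` and the exact filtering rule are assumptions for illustration, not the thesis's data structure.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def frequent_itemsets(
    transactions: Iterable[Iterable[str]], min_support: int
) -> Dict[FrozenSet[str], int]:
    """Apriori-style mining over an in-memory copy that shrinks each pass."""
    # Working copy held in memory; the original database is read only once.
    filtered: List[FrozenSet[str]] = [frozenset(t) for t in transactions]
    result: Dict[FrozenSet[str], int] = {}
    k = 1
    candidates = {frozenset([item]) for t in filtered for item in t}

    while candidates:
        counts = {c: 0 for c in candidates}
        for t in filtered:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {c: n for c, n in counts.items() if n >= min_support}
        if not frequent:
            break
        result.update(frequent)

        # Filter update at the end of the iteration (assumed rule): drop
        # items that occur in no frequent k-itemset, then drop transactions
        # too short to contain any (k+1)-itemset.
        surviving = frozenset().union(*frequent)
        filtered = [t & surviving for t in filtered]
        filtered = [t for t in filtered if len(t) > k]

        # Generate (k+1)-candidates and keep only those whose k-subsets
        # are all frequent (the usual Apriori pruning step).
        k += 1
        joined = {a | b for a, b in combinations(frequent, 2) if len(a | b) == k}
        candidates = {
            c
            for c in joined
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
    return result


baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
found = frequent_itemsets(baskets, min_support=2)
```

Both filtering steps are safe for support counting: every item of a frequent (k+1)-itemset must already appear in some frequent k-itemset, and a transaction with k or fewer items cannot contain a (k+1)-itemset.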

Optimización de clusters como plataformas multimedia utilizando clientes predictivos multihilo (Optimizing clusters as multimedia platforms using multithreaded predictive clients)

Author: Olivares Montes Teresa
Publication venue: Ediciones de la Universidad de Castilla-La Mancha
Publication date: 01/01/2003

This doctoral thesis studies the main problems of computer clusters running multimedia applications. We bring together the two sides, architecture and applications, and we do so in a "real" way, without simulations. The starting point is a real platform, consisting of a high-speed cluster, and a multimedia application flexible enough to emulate the traffic pattern of others. This application injects the multimedia load into the cluster and imposes particular requirements. The great current challenge for clusters is high-performance input/output, since high-performance computing has already been achieved. We present the main problems and give a solution that improves on the results initially obtained. There is a large body of interesting work on input/output, all of it trying to contribute something new in order to improve some weak spot of I/O. In our work we carried out an exhaustive study of all this work in order to propose a new hybrid data prefetching method for client-server architectures, in one of the most widely used network file systems today, NFS (Network File System). It is not only an algorithmic and theoretical proposal: it is implemented in the operating system kernel itself, where NFS appears as a module, and experiments are carried out to confirm the improvements of the implementation developed. Initially we considered changing the way the server works, but we then moved the work to the client, which is much more manageable and open to improvements of this kind. We present the design of the hybrid prefetching technique, a technique based on access graphs, whose purpose is to prefetch data not only from the file currently being read (which was already done) but also across different files.
We also present the results obtained with this new predictive client: a significant reduction in read times and significant values for the gain achieved. The usefulness of techniques of this kind for NFS is thereby demonstrated.

Universidad de Castilla-La Mancha: Repositorio Universitario Institucional de Recursos Abiertos (RUIdeRA)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

The case for efficient file access pattern modeling

Author: Kroeger TM
Publication date: 23/12/2022

Ezid

The Case for Efficient File Access Pattern Modeling

Author: Kroeger Thomas M
Long Darrell
Publication venue: eScholarship, University of California
Publication date: 28/03/1999

Most modern I/O systems treat each file access independently. However, events in a computer system are driven by programs. Thus, accesses to files occur in consistent patterns and are by no means independent. The result is that modern I/O systems ignore useful information. Using traces of file system activity we show that file accesses are strongly correlated with preceding accesses. In fact, a simple last-successor model (one that predicts each file access will be followed by the same file that followed the last time it was accessed) successfully predicted the next file 72% of the time. We examine the ability of two previously proposed models for file access prediction in comparison to this baseline model and see a stark contrast in accuracy and high overheads in state space. We then enhance one of these models to address the issues of model space requirements. This new model is able to improve an additional 10% on the accuracy of the last-successor model, while working within a state space that is within a constant factor (relative to the number of files) of the last-successor model. While this work was motivated by the use of file relationships for I/O prefetching, information regarding the likelihood of file access patterns has several other uses such as disk layout and file clustering for disconnected operation

eScholarship - University of California
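
The last-successor model described in the abstract above (predict that each file will be followed by the same file that followed it last time) is simple enough to sketch directly. The class and function names below are illustrative, and the scoring rule in `hit_rate` (counting only accesses for which a prediction exists) is an assumption; the paper's own accounting may differ.

```python
from typing import Dict, Hashable, Iterable, Optional


class LastSuccessorPredictor:
    """Remembers, for each file, the file that followed it most recently."""

    def __init__(self) -> None:
        self._last_successor: Dict[Hashable, Hashable] = {}
        self._previous: Optional[Hashable] = None

    def predict(self, file_id: Hashable) -> Optional[Hashable]:
        """Return the predicted next file, or None if `file_id` is unseen."""
        return self._last_successor.get(file_id)

    def record(self, file_id: Hashable) -> None:
        """Feed one access from the trace into the model."""
        if self._previous is not None:
            self._last_successor[self._previous] = file_id
        self._previous = file_id


def hit_rate(trace: Iterable[Hashable]) -> float:
    """Fraction of predictions that matched the next access.

    Assumption: accesses for which the model has no prediction yet are
    skipped rather than counted as misses.
    """
    model = LastSuccessorPredictor()
    hits = predictions = 0
    previous: Optional[Hashable] = None
    for file_id in trace:
        if previous is not None:
            guess = model.predict(previous)
            if guess is not None:
                predictions += 1
                hits += int(guess == file_id)
        model.record(file_id)
        previous = file_id
    return hits / predictions if predictions else 0.0


rate = hit_rate(["a", "b", "c", "a", "b", "c", "a", "d"])
```

The model stores one entry per file, which is why the abstract treats its state space as the baseline that richer models are compared against.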

The Case for Efficient File Access Pattern Modeling

Most modern I/O systems treat each file access independently. However, events in a computer system are driven by programs. Thus, accesses to files occur in consistent patterns and are by no means independent. The result is that modern I/O systems ignore useful information. Using traces of file system activity we show that file accesses are strongly correlated with preceding accesses. In fact, a simple last-successor model (one that predicts each file access will be followed by the same file that followed the last time it was accessed) successfully predicted the next file 72% of the time. We examine the ability of two previously proposed models for file access prediction in comparison to this baseline model and see a stark contrast in accuracy and high overheads in state space. We then enhance one of these models to address the issues of model space requirements. This new model is able to improve an additional 10% on the accuracy of the last-successor model, while working within a state space that is within a constant factor (relative to the number of files) of the last-successor model. While this work was motivated by the use of file relationships for I/O prefetching, information regarding the likelihood of file access patterns has several other uses such as disk layout and file clustering for disconnected operation.

CiteSeerX

The Case for Efficient File Access Pattern Modeling

CiteSeerX