12 research outputs found

    The applicability of a use value-based file retention method

    Determining the relative value of its files is important for an organization when setting a retrieval service level for those files and a corresponding file retention policy. Via a literature review, this paper discusses methods for developing file retention policies based on the use values of files; on the basis of these results, we propose an enhanced version of one of them. In a case study, we demonstrate how one can develop a customized file retention policy by testing causal relations between file parameters and the use value of files. This case shows that, contrary to suggestions of previous research, the file type has no significant relation with the value of a file and thus should be excluded from a retention policy in this case. The case study also shows a strong relation between the position of a file's user and the value of that file. Furthermore, we have improved the Information Value Questionnaire (IVQ) for subjective valuation of files. However, the resulting method needs software to be efficient in its application. Therefore, we developed a prototype for the automatic execution of a file retention policy. We conclude with a discussion.
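
    The prototype itself is not described in this abstract; as a rough illustration of how an automatically executed, use-value-based retention policy might look, the sketch below scores files by their owner's position (found to matter) while deliberately ignoring file type (found not to), and maps the score to a retention tier. All weights, field names, and tiers are invented for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights: the case study found the user's position strongly
# related to file value, and file type unrelated, so type is ignored here.
POSITION_WEIGHTS = {"executive": 3.0, "manager": 2.0, "staff": 1.0}

@dataclass
class FileRecord:
    path: str
    owner_position: str
    days_since_access: int

def use_value(f: FileRecord) -> float:
    """Toy use-value estimate: owner position matters, recency decays it."""
    recency = max(0.0, 1.0 - f.days_since_access / 365.0)
    return POSITION_WEIGHTS.get(f.owner_position, 1.0) * recency

def retention_tier(f: FileRecord) -> str:
    """Map the estimated use value to an illustrative service level."""
    v = use_value(f)
    if v >= 2.0:
        return "online"      # fast retrieval
    if v >= 0.5:
        return "nearline"    # slower, cheaper storage
    return "archive"         # candidate for eventual deletion

if __name__ == "__main__":
    f = FileRecord("reports/q3.xlsx", "manager", days_since_access=30)
    print(retention_tier(f), round(use_value(f), 2))
```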

    Simulation of Automated File Migration in Information Lifecycle Management

    Information Lifecycle Management (ILM) is a strategic concept for the storage of information and documents. ILM is based on the idea that, within an enterprise, different pieces of information have different values, and information of different value is stored on different tiers of the storage hierarchy. ILM offers significant potential cost savings through tiered storage, and 90% of decision makers consider implementing it (Linden 2006). Nonetheless, there are too few experience reports, and experimenting and researching on real systems are too expensive. This paper addresses this issue and contributes to supporting and assisting IT managers in their decision-making process. ILM automation needs migration rules. There are well-known static, heuristic migration rules, and we present a new dynamic migration rule for ILM. These migration rules are implemented in an ILM simulator, and we compare the performance of the new dynamic rule with the heuristics. The simulative approach has two advantages: it offers predictions about the dynamic behaviour of an ILM migration rule, and it dispenses with real storage hardware. Simulation leads to decisions under certainty. When making a decision under certainty, the major problem is determining the trade-off among different objectives; cost-benefit analysis can be used for this purpose. A decision matrix is laid out in which rows represent choices and columns represent states of nature. The simulated results support the choice of migration rules and help to avoid mismanagement and poor investments in advance. The results raise awareness of choosing the best alternative.
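
    The abstract does not specify the migration rules; the sketch below simulates one well-known kind of static heuristic that such an ILM simulator could evaluate, demoting files to an archive tier after a fixed period without access and recalling them on access, then totalling storage cost over a synthetic trace. Tier costs, the 30-day threshold, and the workload are all assumptions.

```python
import random

random.seed(42)

TIER_COST = {"fast": 1.0, "archive": 0.1}   # assumed cost per file-day
MIGRATE_AGE = 30                             # static rule: days without access

def simulate(days=365, n_files=100):
    """Run a toy static age-based migration rule over a synthetic trace."""
    last_access = {f: 0 for f in range(n_files)}
    tier = {f: "fast" for f in range(n_files)}
    cost = 0.0
    for day in range(days):
        # Synthetic workload: a random 10% of files are accessed each day.
        for f in random.sample(range(n_files), n_files // 10):
            last_access[f] = day
            tier[f] = "fast"                 # recall on access
        # Static heuristic: demote files untouched for MIGRATE_AGE days.
        for f in range(n_files):
            if tier[f] == "fast" and day - last_access[f] > MIGRATE_AGE:
                tier[f] = "archive"
        cost += sum(TIER_COST[tier[f]] for f in range(n_files))
    return cost

print(f"total storage cost: {simulate():.0f}")
```

    A dynamic rule would adjust the threshold or tier assignment from observed access statistics rather than a fixed constant; comparing the total cost of such variants over the same trace is the kind of decision-matrix input the paper describes.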

    Modeling Information Lifecycle Management


    Evaluating the Applicability of a Use Value-Based File Retention Method

    A well-constructed file retention policy can help a company determine the relative value, and the corresponding retrieval service level, of the different files it owns. Though such a retention policy is useful, the method one can use to arrive at such a policy is under-researched. This paper discusses how one can arrive at a method (based on a systematic literature review) for developing file retention policies based on the use values of files. In a case study, we demonstrate how one can develop a file retention policy by testing causal relations between file retention policy parameters and the use value of files. This case study shows that, contrary to suggestions of previous research, the file type has no significant causal relation with the value of a file and thus should be excluded from a retention policy in this case. The case study also shows a strong causal relation between the position of a file's user and the value of that file. Furthermore, we have amended an existing subjective file valuation method, namely the Information Value Questionnaire (IVQ). However, to make file retention methods effective and reliable, substantially more case experience needs to be collected.
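
    The statistical procedure is not given in the abstract; one standard way to test whether a categorical file parameter such as file type is related to a file's value rating is a chi-square test of independence, sketched below with scipy. The contingency table is fabricated example data, not the study's.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Fabricated contingency table: rows = file types, columns = use-value
# ratings (low, medium, high) from a hypothetical file survey.
observed = np.array([
    [30, 25, 28],   # .docx
    [22, 27, 24],   # .xlsx
    [26, 23, 25],   # .pdf
])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.2f}, p={p:.3f}")
# A large p-value (> 0.05) means no significant relation between file
# type and file value, i.e. type could be dropped from the policy.
```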

    Space-Efficient Predictive Block Management

    With growing disk and storage capacities, the amount of metadata required for tracking all blocks in a system becomes a daunting problem by itself. In previous work, we have demonstrated a system software effort in the area of predictive data grouping for reducing power and latency on hard disks. The structures used, very similar to prior efforts in prefetching and prefetch caching, track access successor information at the block level, keeping a fixed number of immediate successors per block. While these structures provide powerful predictive expansion capabilities and are more space efficient in the amount of required metadata than many previous strategies, there remains a growing concern about how much metadata is actually required. In this paper, we present a novel method of storing equivalent information: SESH, a Space-Efficient Storage of Heredity. This method utilizes the high amount of block-level predictability observed in a number of workload trace sets to reduce the overall metadata storage by up to 99% without any loss of information. As a result, we are able to provide a predictive tool that is adaptive, accurate, and robust in the face of workload noise, for a tiny fraction of the metadata cost previously anticipated; in some cases, reducing the required size from 12 gigabytes to less than 150 megabytes.
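
    SESH's actual encoding is not described here; the sketch below shows only the baseline structure it would compress: a per-block table of the K most recent immediate successors, as used for block-level access prediction. The table width K and the update policy are assumptions.

```python
from collections import defaultdict, deque

K = 4  # assumed number of immediate successors kept per block

class SuccessorTable:
    """Track, for each block, the last K distinct blocks accessed next."""

    def __init__(self):
        self.successors = defaultdict(lambda: deque(maxlen=K))
        self.prev = None

    def record_access(self, block: int) -> None:
        if self.prev is not None and self.prev != block:
            succ = self.successors[self.prev]
            if block in succ:          # refresh an already-known successor
                succ.remove(block)
            succ.append(block)
        self.prev = block

    def predict(self, block: int) -> list[int]:
        """Most recently seen successors first."""
        return list(reversed(self.successors[block]))

table = SuccessorTable()
for b in [1, 2, 3, 1, 2, 4, 1, 2]:
    table.record_access(b)
print(table.predict(1))   # -> [2]; block 2 always followed block 1
print(table.predict(2))   # -> [4, 3]
```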

    EFFECTIVE GROUPING FOR ENERGY AND PERFORMANCE: CONSTRUCTION OF ADAPTIVE, SUSTAINABLE, AND MAINTAINABLE DATA STORAGE

    The performance gap between processors and storage systems has been increasingly critical over the years. Yet the performance disparity remains, and further, storage energy consumption is rapidly becoming a new critical problem. While smarter caching and predictive techniques do much to alleviate this disparity, the problem persists, and data storage remains a growing contributor to latency and energy consumption.

    Attempts have been made at data layout maintenance, or intelligent physical placement of data, yet in practice, basic heuristics remain predominant. Problems that early studies sought to solve via layout strategies were proven to be NP-Hard, and data layout maintenance today remains more art than science. With unknown potential and a domain inherently full of uncertainty, layout maintenance persists as an area largely untapped by modern systems. But uncertainty in workloads does not imply randomness; access patterns have exhibited repeatable, stable behavior. Predictive information can be gathered, analyzed, and exploited to improve data layouts. Our goal is a dynamic, robust, sustainable predictive engine, aimed at improving existing layouts by replicating data at the storage device level.

    We present a comprehensive discussion of the design and construction of such a predictive engine, including workload evaluation, where we present and evaluate classical workloads as well as our own highly detailed traces collected over an extended period. We demonstrate significant gains through an initial static grouping mechanism, and compare against an optimal grouping method of our own construction, and further show significant improvement over competing techniques. We also explore and illustrate the challenges faced when moving from static to dynamic (i.e. online) grouping, and provide motivation and solutions for addressing these challenges. These challenges include metadata storage, appropriate predictive collocation, online performance, and physical placement. We reduced the metadata needed by several orders of magnitude, reducing the required volume from more than 14% of total storage down to less than 12%. We also demonstrate how our collocation strategies outperform competing techniques. Finally, we present our complete model and evaluate a prototype implementation against real hardware. This model was demonstrated to be capable of reducing device-level accesses by up to 65%.
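
    As a minimal sketch of the static-grouping idea, namely collocating blocks that are accessed close together, the code below counts pairwise co-occurrences within a sliding window of a block trace and greedily merges frequent pairs into groups. The window size, threshold, and greedy merging are assumptions, not the dissertation's algorithm.

```python
from collections import Counter
from itertools import combinations

WINDOW = 3        # assumed: blocks within 3 accesses count as "together"
MIN_COACCESS = 3  # assumed threshold for grouping a pair

def static_groups(trace: list[int]) -> list[set]:
    """Greedily merge blocks that frequently co-occur within a window."""
    pair_counts = Counter()
    for i in range(len(trace)):
        window = set(trace[i:i + WINDOW])
        for a, b in combinations(sorted(window), 2):
            pair_counts[(a, b)] += 1

    group_of: dict = {}
    for (a, b), n in pair_counts.most_common():
        if n < MIN_COACCESS:
            break
        ga, gb = group_of.get(a), group_of.get(b)
        if ga is None and gb is None:
            g = {a, b}
            group_of[a] = group_of[b] = g
        elif ga is not None and gb is None:
            ga.add(b); group_of[b] = ga
        elif ga is None and gb is not None:
            gb.add(a); group_of[a] = gb
        elif ga is not gb:              # merge two existing groups
            ga |= gb
            for x in gb:
                group_of[x] = ga
    # Deduplicate the group objects.
    seen, groups = set(), []
    for g in group_of.values():
        if id(g) not in seen:
            seen.add(id(g)); groups.append(g)
    return groups

# Two interleaved hot pairs: yields two groups, {5, 9} and {2, 7}.
print(static_groups([5, 9, 5, 9, 2, 7, 2, 7, 5, 9]))
```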

    FairSky: reliable and optimized data management over multiple Internet storage clouds

    Dissertation submitted for the degree of Master in Informatics Engineering (Engenharia Informática). The simultaneous use of multiple heterogeneous clouds for remote data storage, available as different solutions from various Internet service providers, can be viewed as an approach that organizes a data repository as a cloud of storage clouds. The design of a system based on this vision involves open research questions, particularly in contexts of reliable, optimized use of critical data, where the approach is of particular interest. The goal of this dissertation is to propose, implement, and evaluate a dependability solution that combines the maintenance of security properties and of tolerance to faults or intrusions in the storage clouds used, with criteria for the optimized use of data distributed and replicated across those multiple clouds. The dissertation proposes FairSky, a system that realizes a repository of data geographically replicated across different storage clouds, exploiting their diversity and heterogeneity while optimizing or balancing different criteria: resilience against Byzantine faults or intrusions occurring independently in each cloud; maintenance of confidentiality, integrity, and authenticity requirements for the stored data; optimization of data-access latency; and optimization of the cost inherent in using different clouds from different providers. FairSky spares applications the complexity of balancing these criteria. It presents itself as a generic middleware solution intended to support different applications, transparently optimizing the above criteria on the basis of the characteristics of the different clouds that can be used, dynamic characterization of those clouds' operation, and the applications' operation and workload profiles. Data management (through the middleware) thus proceeds transparently, with controlled guarantees of availability, fault tolerance, security, access optimization, and financial-cost optimization, taking the applications' own requirements into account.
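
    FairSky's placement policy is not given in the abstract; purely as an illustration of the general idea, the sketch below scores candidate clouds on latency and price with application-supplied weights and picks 3f+1 of them, a common quorum size for tolerating f Byzantine storage clouds (whether FairSky uses this bound is not stated). All provider names and numbers are invented.

```python
# Illustrative multi-cloud replica selection; every value here is assumed.
CLOUDS = {
    "cloud-a": {"latency_ms": 40, "usd_per_gb": 0.023},
    "cloud-b": {"latency_ms": 75, "usd_per_gb": 0.010},
    "cloud-c": {"latency_ms": 55, "usd_per_gb": 0.020},
    "cloud-d": {"latency_ms": 90, "usd_per_gb": 0.008},
    "cloud-e": {"latency_ms": 60, "usd_per_gb": 0.015},
}

W_LATENCY, W_COST = 0.5, 0.5   # assumed application preference weights

def score(c: dict) -> float:
    """Lower is better: weighted, roughly normalized latency and price."""
    return W_LATENCY * c["latency_ms"] / 100 + W_COST * c["usd_per_gb"] / 0.025

def pick_replicas(f: int = 1) -> list:
    """Choose 3f+1 clouds (a common Byzantine quorum size) by score."""
    n = 3 * f + 1
    ranked = sorted(CLOUDS, key=lambda name: score(CLOUDS[name]))
    return ranked[:n]

print(pick_replicas(f=1))   # 4 clouds tolerating 1 Byzantine cloud
```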