154 research outputs found

    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    The increasing demand for exabyte-scale storage capacity from high-end computing applications requires a higher level of scalability and dependability than current file and storage systems provide. This proposal addresses file systems research for metadata management in scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit to extend the features of, and fully leverage the peak performance promised by, state-of-the-art cluster-based parallel and distributed file storage systems used by the high-performance computing community. There is a large body of research on scaling data movement and management; however, the need to scale the handling of the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. Understanding the characteristics of metadata traffic, and applying appropriate load-balancing, caching, prefetching, and grouping mechanisms to metadata management accordingly, will lead to high scalability. It is anticipated that by appropriately plugging the scalable and adaptive metadata management components into state-of-the-art cluster-based parallel and distributed file storage systems, one could potentially increase the performance of applications and file systems, and help translate the promised high peak performance of such systems into real application performance improvements. The project involves the following components:
    1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns.
    2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability (see the sketch after this list).
    3. Develop decentralized, locality-aware metadata grouping schemes to facilitate bulk metadata operations such as prefetching.
    4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching.
    5. Prototype the SAM2 components in the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework at a DOE CMS Tier 2 site at the University of Nebraska-Lincoln, and conduct benchmark, evaluation, and validation studies.
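    The Bloom filter array of component 2 can be pictured as one filter per metadata server, with the filters replicated on every node so a filename lookup resolves locally in the common case. A minimal Python sketch of this idea; the class names and parameters are illustrative and not part of the SAM2 toolkit:

```python
import hashlib

class BloomFilter:
    """Simple Bloom filter over a fixed-size bit array."""
    def __init__(self, size=8192, hashes=4):
        self.size, self.hashes, self.bits = size, hashes, bytearray(size)

    def _positions(self, key):
        # Derive k independent positions from salted SHA-256 digests.
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def __contains__(self, key):
        return all(self.bits[p] for p in self._positions(key))

class MetadataRouter:
    """Routes a filename to candidate metadata servers by consulting one
    Bloom filter per server; replicating the array on each node avoids a
    network round trip for most lookups."""
    def __init__(self, num_servers):
        self.filters = [BloomFilter() for _ in range(num_servers)]

    def insert(self, server_id, path):
        self.filters[server_id].add(path)

    def lookup(self, path):
        # False positives are possible; a miss on every filter would fall
        # back to a broadcast query (not shown).
        return [i for i, f in enumerate(self.filters) if path in f]
```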

    Survey on securing data storage in the cloud

    Cloud computing has become a well-known paradigm, and many researchers and companies are embracing this technology with haste. In the meantime, security and privacy challenges are coming to the fore as the number of cloud storage users increases rapidly. In this work, we conduct an in-depth survey of recent research on cloud storage security in association with cloud computing. After an overview of the cloud storage system and its security problems, we focus on the key security requirement triad: data integrity, data confidentiality, and availability. For each of the three security objectives, we discuss the new challenges unique to cloud storage services, summarize key issues discussed in the current literature, examine and compare the existing and emerging approaches proposed to meet those challenges, and point out possible extensions and future research opportunities. The goal of our paper is to provide state-of-the-art knowledge to new researchers who would like to join this exciting field.
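    As a concrete instance of the data integrity objective, the simplest client-side mechanism is to fingerprint data before upload and re-verify after retrieval. A minimal sketch, purely illustrative rather than any scheme from the survey:

```python
import hashlib

def digest(data: bytes) -> str:
    """Client-side fingerprint, kept outside the cloud provider."""
    return hashlib.sha256(data).hexdigest()

# Before upload: record the digest locally.
blob = b"quarterly-report.pdf contents"
local_fingerprint = digest(blob)

# After download: detect silent corruption or tampering.
retrieved = blob  # stand-in for a real GET from the provider
assert digest(retrieved) == local_fingerprint, "integrity violation"
```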

    Data security and reliability in cloud backup systems with deduplication.

    Cloud storage is an emerging service model that enables individuals and enterprises to outsource the storage of data backups to remote cloud providers at low cost. This thesis presents methods to ensure the data security and reliability of cloud backup systems. In the first part of this thesis, we present FadeVersion, a secure cloud backup system that serves as a security layer on top of today's cloud storage services. FadeVersion follows the standard version-controlled backup design, which eliminates the storage of redundant data across different versions of backups. On top of this, FadeVersion applies cryptographic protection to data backups. Specifically, it enables fine-grained assured deletion: cloud clients can assuredly delete particular backup versions or files on the cloud and make them permanently inaccessible to anyone, while other versions or files that share the common data of the deleted ones remain unaffected. We implement a proof-of-concept prototype of FadeVersion and conduct an empirical evaluation atop Amazon S3. We show that FadeVersion adds only minimal performance overhead over a traditional cloud backup service that does not support assured deletion. In the second part of this thesis, we present CFTDedup, a distributed proxy system designed to provide storage efficiency via deduplication in cloud storage while ensuring crash fault tolerance among proxies. It synchronizes deduplication metadata among proxies to provide strong consistency, and batches metadata updates to mitigate synchronization overhead. We implement a preliminary prototype of CFTDedup and evaluate its runtime performance in deduplication storage for virtual machine images via testbed experiments. We also discuss several open issues on how to provide reliable, high-performance deduplication storage; our CFTDedup prototype provides a platform to explore such issues. (Rahumed, Arthur. M.Phil. thesis, Chinese University of Hong Kong, 2012.)
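    FadeVersion's assured deletion rests on layered encryption: each file is encrypted with its own data key, and data keys are wrapped under per-policy control keys, so discarding a control key renders everything under that policy unreadable even if the provider keeps the raw bytes. A minimal Python sketch of this idea using the third-party `cryptography` package; the function names and in-memory key stores are illustrative, not FadeVersion's actual design:

```python
from cryptography.fernet import Fernet

policy_keys = {}     # policy id -> control key (held by a key manager)
stored_objects = {}  # object id -> (policy id, wrapped data key, ciphertext)

def backup(obj_id, policy, plaintext: bytes):
    data_key = Fernet.generate_key()                 # per-file key
    policy_keys.setdefault(policy, Fernet.generate_key())
    wrapped = Fernet(policy_keys[policy]).encrypt(data_key)
    stored_objects[obj_id] = (policy, wrapped,
                              Fernet(data_key).encrypt(plaintext))

def restore(obj_id) -> bytes:
    policy, wrapped, ct = stored_objects[obj_id]
    data_key = Fernet(policy_keys[policy]).decrypt(wrapped)
    return Fernet(data_key).decrypt(ct)

def assured_delete(policy):
    # Discarding the control key makes every data key wrapped under it,
    # and hence every ciphertext, permanently undecryptable. Backups
    # under other policies are unaffected.
    del policy_keys[policy]
```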

    Analysis of outsourcing data to the cloud using autonomous key generation

    Cloud computing, a technology that enables users to store and manage their data at low cost and high availability, has been emerging for the past few decades because of the many services it provides. One of these services is data storage. Many users of this service are still reluctant to outsource their data because of the integrity and confidentiality issues, as well as the performance and cost issues, that come with it. These issues make it necessary to encrypt data prior to outsourcing it to the cloud. However, encrypting data prior to outsourcing makes searching over the data infeasible, lowering the functionality of the cloud. Most existing cloud storage schemes prioritize security over performance and functionality, or vice versa. In this thesis, the cloud storage service is explored, and the aspects of security, performance, and functionality are analyzed in order to investigate the trade-offs of the service. DSB-SEIS, a scheme featuring encryption intensity selection, an autonomous key generation algorithm that allows users to control the encryption intensity of their files, is developed in order to find a balance between performance, security, and functionality; its other features include deduplication, assured deletion, and searchable encryption. The effect of encryption intensity selection on encryption, decryption, and key generation is explored, and the performance and security of DSB-SEIS are evaluated. The MapReduce framework is also used to investigate the performance of the DSB-SEIS algorithm with big data. Analysis demonstrates that the encryption intensity selection algorithm generates a manageable number of encryption keys based on the confidentiality of data while not adding significant overhead to encryption or decryption.
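    The notion of encryption intensity selection can be pictured as scaling key length and key derivation cost with a file's confidentiality level. A hedged Python sketch; the level names and parameter choices are invented for illustration and are not taken from DSB-SEIS:

```python
import os, hashlib

INTENSITY = {  # confidentiality level -> (key bits, KDF iterations)
    "public":       (128, 10_000),
    "internal":     (192, 100_000),
    "confidential": (256, 500_000),
}

def generate_key(passphrase: str, level: str) -> tuple[bytes, bytes]:
    """Derive a key whose strength and derivation cost track the
    sensitivity of the data being protected."""
    bits, iters = INTENSITY[level]
    salt = os.urandom(16)  # stored alongside the ciphertext in practice
    key = hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt,
                              iters, dklen=bits // 8)
    return salt, key

salt, key = generate_key("correct horse battery staple", "confidential")
```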

    Challenges and requirements of heterogeneous research data management in environmental sciences: a qualitative study

    This research focuses on the challenges and requirements of heterogeneous research data management in environmental sciences. Environmental research involves diverse data types, and effective management and integration of these data sets are crucial. The issue at hand is the lack of specific guidance on how to select and plan an appropriate data management practice to address the challenges of handling and integrating diverse data types in environmental research. The objective of the research is to identify the issues associated with the current data storage approach in research data management and to determine the requirements for an appropriate system to address these challenges. The research adopts a qualitative approach, using semi-structured interviews to collect data; content analysis is employed to analyze the gathered data and identify relevant issues and requirements. The study reveals various issues in the current data management process, including inconsistencies in data treatment, the risk of unintentional data deletion, loss of knowledge due to staff turnover, lack of guidelines, and data scattered across multiple locations. The requirements identified through the interviews emphasize the need for a data management system that integrates automation, open access, centralized storage, online electronic lab notes, systematic data management, secure repositories, reduced hardware storage, and version control with metadata support. The research identifies the challenges researchers currently face in heterogeneous data management and compiles a list of requirements for an effective solution. The findings contribute to existing knowledge on research-related problems and provide a foundation for developing tailored solutions that meet the specific needs of researchers in environmental sciences.

    Cloud technology options towards Free Flow of Data

    This whitepaper collects the technology solutions that the projects in the Data Protection, Security and Privacy Cluster propose to address the challenges raised by the working areas of the Free Flow of Data initiative. The document describes the technologies, methodologies, models, and tools researched and developed by the clustered projects, mapped to the ten areas of work of the Free Flow of Data initiative. The aim is to facilitate identification of the state of the art in technology options for solving the data security and privacy challenges posed by the Free Flow of Data initiative in Europe. The document provides references to the Cluster, the individual projects, and the technologies they produced.

    On Personal Storage Systems: Architecture and Design Considerations

    Increasingly, end users demand larger amounts of online storage space to store their personal information. This challenge motivates researchers to devise novel personal storage infrastructures. In this thesis, we focus on two popular personal storage architectures: Personal Clouds (centralized) and social storage systems (decentralized). In our view, despite their growing popularity among users and researchers, critical aspects of these systems remain to be addressed. In Part I of this dissertation, we examine various aspects of the internal operation and performance of several Personal Clouds. Concretely, we first contribute by unveiling the internal structure of a global-scale Personal Cloud, namely UbuntuOne (U1), including its architecture, its metadata service, and its data storage interactions. Moreover, we provide a back-end analysis of U1 that includes a study of the storage workload, user behavior, and the performance of the U1 metadata store. We also suggest improvements to U1 (storage optimizations, user behavior detection, and security) that can benefit similar systems. From an external viewpoint, we actively measure various Personal Clouds through their REST APIs to characterize their QoS, such as transfer speed, variability, and failure rate (see the sketch below). We also demonstrate that combining open APIs and free accounts may lead to abuse by malicious parties, which motivates us to propose countermeasures to limit the impact of abusive applications in this scenario. In Part II of this thesis, we study the storage QoS of social storage systems in terms of data availability, load balancing, and transfer times. Our main interest is to understand the way intrinsic phenomena, such as the dynamics of users and the structure of their social relationships, limit the storage QoS of these systems, as well as to research novel mechanisms to ameliorate these limitations. Finally, we design and evaluate a hybrid architecture that combines user resources and cloud storage to enhance the QoS achieved by a social storage system, letting users strike their own balance between control over their data and QoS.
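    The active measurement described above amounts to repeatedly transferring a test object through a service's REST API while recording speed, variability, and failures. A minimal sketch using the `requests` library; the endpoint and token are placeholders, and this is not the dissertation's actual measurement harness:

```python
import time, statistics, requests

def measure(url: str, token: str, rounds: int = 20) -> dict:
    """Download a test object `rounds` times and summarize QoS."""
    speeds, failures = [], 0
    for _ in range(rounds):
        t0 = time.monotonic()
        try:
            r = requests.get(url, timeout=30,
                             headers={"Authorization": f"Bearer {token}"})
            r.raise_for_status()
            speeds.append(len(r.content) / (time.monotonic() - t0))  # B/s
        except requests.RequestException:
            failures += 1
    return {
        "mean_bps": statistics.mean(speeds) if speeds else 0.0,
        "stdev_bps": statistics.stdev(speeds) if len(speeds) > 1 else 0.0,
        "failure_rate": failures / rounds,
    }
```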

    Adapting a quality model for a Big Data application: the case of a feature prediction system

    In the last decade, we have witnessed a considerable increase in projects based on Big Data applications. Some of the most popular types of these applications have been recommendation, feature prediction, and decision-making systems. In this new context, several proposals have arisen for quality models applied to Big Data applications; given the great heterogeneity of such applications, selecting the ideal quality model for developing a specific type of Big Data application is difficult. In this Master's thesis, a Systematic Mapping Study (SMS) is conducted, starting from two key research questions. The first concerns the state of the art in identifying risks, issues, problems, or challenges in Big Data applications. The second concerns which quality models have been applied to date to Big Data applications, specifically to feature prediction systems. The main objective is to analyze the available quality models and adapt, from the existing ones, a quality model that can be applied to a specific type of Big Data application: feature prediction systems. The defined model comprises a set of quality characteristics and a set of quality metrics to evaluate them. Finally, the model is applied to a case study, the defined quality characteristics are evaluated through their quality metrics, and the results are presented and discussed.
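    One way to operationalize such a model is to score each quality characteristic as a weighted aggregate of its normalized metrics. A minimal sketch with invented characteristic names, metrics, and weights; the thesis's actual model is not reproduced here:

```python
# Quality model: characteristic -> weighted metrics (weights sum to 1).
MODEL = {
    "accuracy":   {"rmse_norm": 0.6, "r2": 0.4},
    "timeliness": {"latency_norm": 1.0},
}

def score(measurements: dict) -> dict:
    """Aggregate metric values (normalized into [0, 1], 1 = best)
    into one score per quality characteristic."""
    return {
        characteristic: sum(w * measurements[m] for m, w in weights.items())
        for characteristic, weights in MODEL.items()
    }

print(score({"rmse_norm": 0.8, "r2": 0.7, "latency_norm": 0.9}))
# -> {'accuracy': 0.76, 'timeliness': 0.9}
```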

    Resurrection: Rethinking Magnetic Tapes For Cost Efficient Data Preservation

    With the advent of Big Data technologies, that is, the capacity to store and efficiently process large sets of data, doors have opened to opportunities for developing business intelligence that was previously unknown. Each phase in the processing of this data requires specialized infrastructure. One such phase, the preservation and archiving of data, has proven its usefulness time and again: data archives are processed using novel data mining methods to elicit vital information gathered over long periods of time and to efficiently audit the growth of a business or an organization. Data preservation is also an important aspect of business processes that helps avoid loss of important information due to system failures, human errors, and natural calamities. This thesis investigates the need for, discusses the possibilities of, and presents a novel, highly cost-effective, unified, long-term storage solution for data. Some of the common processes followed in large-scale data warehousing systems are analyzed for overlooked shortcomings, and an economically feasible solution is conceived for them. The gap between the general needs of efficient long-term storage and common current functionality is analyzed. An attempt to bridge this gap is made through the use of a hybrid, hierarchical-media-based, performance-enhancing middleware and a monolithic namespace filesystem in a new storage architecture, Tape Cloud. Our study interprets the effects of using heterogeneous storage media in terms of operational behavior, average latency of data transactions, and power consumption. The results show the advantages of the new storage system by demonstrating the difference in operating costs, personnel costs, and total cost of ownership from varied perspectives in a business model.
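    The core placement decision in a hierarchical, tape-backed architecture like Tape Cloud is when to demote data to slower, cheaper media. A minimal sketch of an idle-time-based demotion policy; the tier names and thresholds are illustrative figures, not measurements or parameters from the thesis:

```python
import time

TIERS = [  # (tier name, max idle seconds before demotion to the next tier)
    ("ssd_cache", 3600),        # recently accessed, lowest latency
    ("disk",      30 * 86400),  # warm data
    ("tape",      None),        # cold archive, cheapest per GB
]

def place(last_access_ts: float, now: float = None) -> str:
    """Pick the fastest tier whose idle-time budget covers this object."""
    idle = (now if now is not None else time.time()) - last_access_ts
    for name, max_idle in TIERS:
        if max_idle is None or idle <= max_idle:
            return name
    return TIERS[-1][0]

print(place(time.time() - 90 * 86400))  # untouched for 90 days -> "tape"
```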