146 research outputs found

    SEARS: Space Efficient And Reliable Storage System in the Cloud

    Full text link
    Today's cloud storage services must offer storage reliability and fast data retrieval for large amount of data without sacrificing storage cost. We present SEARS, a cloud-based storage system which integrates erasure coding and data deduplication to support efficient and reliable data storage with fast user response time. With proper association of data to storage server clusters, SEARS provides flexible mixing of different configurations, suitable for real-time and archival applications. Our prototype implementation of SEARS over Amazon EC2 shows that it outperforms existing storage systems in storage efficiency and file retrieval time. For 3 MB files, SEARS delivers retrieval time of 2.52.5 s compared to 77 s with existing systems.Comment: 4 pages, IEEE LCN 201

    Personal Cloud Storage Benchmarks and Comparison

    Get PDF
    The large amount of space offered by personal cloud storage services (e.g., Dropbox and OneDrive), together with the possibility of synchronizing devices seamlessly, keep attracting customers to the cloud. Despite the high public interest, little information about system design and actual implications on performance is available when selecting a cloud storage service. Systematic benchmarks to assist in comparing services and understanding the effects of design choices are still lacking. This paper proposes a methodology to understand and benchmark personal cloud storage services. Our methodology unveils their architecture and capabilities. Moreover, by means of repeatable and customizable tests, it allows the measurement of performance metrics under different workloads. The effectiveness of the methodology is shown in a case study in which 11 services are compared under the same conditions. Our case study reveals interesting differences in design choices. Their implications are assessed in a series of benchmarks. Results show no clear winner, with all services having potential for improving performance. In some scenarios, the synchronization of the same files can take 20 times longer. In other cases, we observe a wastage of twice as much network capacity, questioning the design of some services. Our methodology and results are thus useful both as benchmarks and as guidelines for system design

    Malleable coding for updatable cloud caching

    Full text link
    In software-as-a-service applications provisioned through cloud computing, locally cached data are often modified with updates from new versions. In some cases, with each edit, one may want to preserve both the original and new versions. In this paper, we focus on cases in which only the latest version must be preserved. Furthermore, it is desirable for the data to not only be compressed but to also be easily modified during updates, since representing information and modifying the representation both incur cost. We examine whether it is possible to have both compression efficiency and ease of alteration, in order to promote codeword reuse. In other words, we study the feasibility of a malleable and efficient coding scheme. The tradeoff between compression efficiency and malleability cost-the difficulty of synchronizing compressed versions-is measured as the length of a reused prefix portion. The region of achievable rates and malleability is found. Drawing from prior work on common information problems, we show that efficient data compression may not be the best engineering design principle when storing software-as-a-service data. In the general case, goals of efficiency and malleability are fundamentally in conflict.This work was supported in part by an NSF Graduate Research Fellowship (LRV), Grant CCR-0325774, and Grant CCF-0729069. This work was presented at the 2011 IEEE International Symposium on Information Theory [1] and the 2014 IEEE International Conference on Cloud Engineering [2]. The associate editor coordinating the review of this paper and approving it for publication was R. Thobaben. (CCR-0325774 - NSF Graduate Research Fellowship; CCF-0729069 - NSF Graduate Research Fellowship)Accepted manuscrip

    A survey and classification of software-defined storage systems

    Get PDF
    The exponential growth of digital information is imposing increasing scale and efficiency demands on modern storage infrastructures. As infrastructure complexity increases, so does the difficulty in ensuring quality of service, maintainability, and resource fairness, raising unprecedented performance, scalability, and programmability challenges. Software-Defined Storage (SDS) addresses these challenges by cleanly disentangling control and data flows, easing management, and improving control functionality of conventional storage systems. Despite its momentum in the research community, many aspects of the paradigm are still unclear, undefined, and unexplored, leading to misunderstandings that hamper the research and development of novel SDS technologies. In this article, we present an in-depth study of SDS systems, providing a thorough description and categorization of each plane of functionality. Further, we propose a taxonomy and classification of existing SDS solutions according to different criteria. Finally, we provide key insights about the paradigm and discuss potential future research directions for the field.This work was financed by the Portuguese funding agency FCT-Fundacao para a Ciencia e a Tecnologia through national funds, the PhD grant SFRH/BD/146059/2019, the project ThreatAdapt (FCT-FNR/0002/2018), the LASIGE Research Unit (UIDB/00408/2020), and cofunded by the FEDER, where applicable

    A systematic review on cloud storage mechanisms concerning e-healthcare systems

    Get PDF
    As the expenses of medical care administrations rise and medical services experts are becoming rare, it is up to medical services organizations and institutes to consider the implementation of medical Health Information Technology (HIT) innovation frameworks. HIT permits health associations to smooth out their considerable cycles and offer types of assistance in a more productive and financially savvy way. With the rise of Cloud Storage Computing (CSC), an enormous number of associations and undertakings have moved their healthcare data sources to distributed storage. As the information can be mentioned whenever universally, the accessibility of information becomes an urgent need. Nonetheless, outages in cloud storage essentially influence the accessibility level. Like the other basic variables of cloud storage (e.g., reliability quality, performance, security, and protection), availability also directly impacts the data in cloud storage for e-Healthcare systems. In this paper, we systematically review cloud storage mechanisms concerning the healthcare environment. Additionally, in this paper, the state-of-the-art cloud storage mechanisms are critically reviewed for e-Healthcare systems based on their characteristics. In short, this paper summarizes existing literature based on cloud storage and its impact on healthcare, and it likewise helps researchers, medical specialists, and organizations with a solid foundation for future studies in the healthcare environment.Qatar University [IRCC-2020-009]

    Efficient, Dependable Storage of Human Genome Sequencing Data

    Get PDF
    A compreensão do genoma humano impacta várias áreas da vida. Os dados oriundos do genoma humano são enormes pois existem milhões de amostras a espera de serem sequenciadas e cada genoma humano sequenciado pode ocupar centenas de gigabytes de espaço de armazenamento. Os genomas humanos são críticos porque são extremamente valiosos para a investigação e porque podem fornecer informações delicadas sobre o estado de saúde dos indivíduos, identificar os seus dadores ou até mesmo revelar informações sobre os parentes destes. O tamanho e a criticidade destes genomas, para além da quantidade de dados produzidos por instituições médicas e de ciências da vida, exigem que os sistemas informáticos sejam escaláveis, ao mesmo tempo que sejam seguros, confiáveis, auditáveis e com custos acessíveis. As infraestruturas de armazenamento existentes são tão caras que não nos permitem ignorar a eficiência de custos no armazenamento de genomas humanos, assim como em geral estas não possuem o conhecimento e os mecanismos adequados para proteger a privacidade dos dadores de amostras biológicas. Esta tese propõe um sistema de armazenamento de genomas humanos eficiente, seguro e auditável para instituições médicas e de ciências da vida. Ele aprimora os ecossistemas de armazenamento tradicionais com técnicas de privacidade, redução do tamanho dos dados e auditabilidade a fim de permitir o uso eficiente e confiável de infraestruturas públicas de computação em nuvem para armazenar genomas humanos. As contribuições desta tese incluem (1) um estudo sobre a sensibilidade à privacidade dos genomas humanos; (2) um método para detetar sistematicamente as porções dos genomas que são sensíveis à privacidade; (3) algoritmos de redução do tamanho de dados, especializados para dados de genomas sequenciados; (4) um esquema de auditoria independente para armazenamento disperso e seguro de dados; e (5) um fluxo de armazenamento completo que obtém garantias razoáveis de proteção, segurança e confiabilidade a custos modestos (por exemplo, menos de 1/Genoma/Ano),integrandoosmecanismospropostosaconfigurac\co~esdearmazenamentoapropriadasTheunderstandingofhumangenomeimpactsseveralareasofhumanlife.Datafromhumangenomesismassivebecausetherearemillionsofsamplestobesequenced,andeachsequencedhumangenomemaysizehundredsofgigabytes.Humangenomesarecriticalbecausetheyareextremelyvaluabletoresearchandmayprovidehintsonindividuals’healthstatus,identifytheirdonors,orrevealinformationaboutdonors’relatives.Theirsizeandcriticality,plustheamountofdatabeingproducedbymedicalandlife−sciencesinstitutions,requiresystemstoscalewhilebeingsecure,dependable,auditable,andaffordable.Currentstorageinfrastructuresaretooexpensivetoignorecostefficiencyinstoringhumangenomes,andtheylacktheproperknowledgeandmechanismstoprotecttheprivacyofsampledonors.Thisthesisproposesanefficientstoragesystemforhumangenomesthatmedicalandlifesciencesinstitutionsmaytrustandafford.Itenhancestraditionalstorageecosystemswithprivacy−aware,data−reduction,andauditabilitytechniquestoenabletheefficient,dependableuseofmulti−tenantinfrastructurestostorehumangenomes.Contributionsfromthisthesisinclude(1)astudyontheprivacy−sensitivityofhumangenomes;(2)todetectgenomes’privacy−sensitiveportionssystematically;(3)specialiseddatareductionalgorithmsforsequencingdata;(4)anindependentauditabilityschemeforsecuredispersedstorage;and(5)acompletestoragepipelinethatobtainsreasonableprivacyprotection,security,anddependabilityguaranteesatmodestcosts(e.g.,lessthan1/Genoma/Ano), integrando os mecanismos propostos a configurações de armazenamento apropriadasThe understanding of human genome impacts several areas of human life. Data from human genomes is massive because there are millions of samples to be sequenced, and each sequenced human genome may size hundreds of gigabytes. Human genomes are critical because they are extremely valuable to research and may provide hints on individuals’ health status, identify their donors, or reveal information about donors’ relatives. Their size and criticality, plus the amount of data being produced by medical and life-sciences institutions, require systems to scale while being secure, dependable, auditable, and affordable. Current storage infrastructures are too expensive to ignore cost efficiency in storing human genomes, and they lack the proper knowledge and mechanisms to protect the privacy of sample donors. This thesis proposes an efficient storage system for human genomes that medical and lifesciences institutions may trust and afford. It enhances traditional storage ecosystems with privacy-aware, data-reduction, and auditability techniques to enable the efficient, dependable use of multi-tenant infrastructures to store human genomes. Contributions from this thesis include (1) a study on the privacy-sensitivity of human genomes; (2) to detect genomes’ privacy-sensitive portions systematically; (3) specialised data reduction algorithms for sequencing data; (4) an independent auditability scheme for secure dispersed storage; and (5) a complete storage pipeline that obtains reasonable privacy protection, security, and dependability guarantees at modest costs (e.g., less than 1/Genome/Year) by integrating the proposed mechanisms with appropriate storage configurations

    Network Traffic Measurements, Applications to Internet Services and Security

    Get PDF
    The Internet has become along the years a pervasive network interconnecting billions of users and is now playing the role of collector for a multitude of tasks, ranging from professional activities to personal interactions. From a technical standpoint, novel architectures, e.g., cloud-based services and content delivery networks, innovative devices, e.g., smartphones and connected wearables, and security threats, e.g., DDoS attacks, are posing new challenges in understanding network dynamics. In such complex scenario, network measurements play a central role to guide traffic management, improve network design, and evaluate application requirements. In addition, increasing importance is devoted to the quality of experience provided to final users, which requires thorough investigations on both the transport network and the design of Internet services. In this thesis, we stress the importance of users’ centrality by focusing on the traffic they exchange with the network. To do so, we design methodologies complementing passive and active measurements, as well as post-processing techniques belonging to the machine learning and statistics domains. Traffic exchanged by Internet users can be classified in three macro-groups: (i) Outbound, produced by users’ devices and pushed to the network; (ii) unsolicited, part of malicious attacks threatening users’ security; and (iii) inbound, directed to users’ devices and retrieved from remote servers. For each of the above categories, we address specific research topics consisting in the benchmarking of personal cloud storage services, the automatic identification of Internet threats, and the assessment of quality of experience in the Web domain, respectively. Results comprise several contributions in the scope of each research topic. In short, they shed light on (i) the interplay among design choices of cloud storage services, which severely impact the performance provided to end users; (ii) the feasibility of designing a general purpose classifier to detect malicious attacks, without chasing threat specificities; and (iii) the relevance of appropriate means to evaluate the perceived quality of Web pages delivery, strengthening the need of users’ feedbacks for a factual assessment

    Survey on securing data storage in the cloud

    Get PDF
    Cloud Computing has become a well-known primitive nowadays; many researchers and companies are embracing this fascinating technology with feverish haste. In the meantime, security and privacy challenges are brought forward while the number of cloud storage user increases expeditiously. In this work, we conduct an in-depth survey on recent research activities of cloud storage security in association with cloud computing. After an overview of the cloud storage system and its security problem, we focus on the key security requirement triad, i.e., data integrity, data confidentiality, and availability. For each of the three security objectives, we discuss the new unique challenges faced by the cloud storage services, summarize key issues discussed in the current literature, examine, and compare the existing and emerging approaches proposed to meet those new challenges, and point out possible extensions and futuristic research opportunities. The goal of our paper is to provide a state-of-the-art knowledge to new researchers who would like to join this exciting new field
    • …
    corecore