
    ViPEr-HiSS: A Case for Storage Design Tools

    The viability of large-scale multimedia applications depends on the performance of storage systems. Providing cost-effective access to vast amounts of video, image, audio, and text data requires (a) proper configuration of storage hierarchies as well as (b) efficient resource management techniques at all levels of the storage hierarchy. The resulting complexities of the hardware/software co-design in turn contribute to difficulties in making accurate predictions about the performance, scalability, and cost-effectiveness of a storage system. Moreover, poor decisions at design time can be costly and problematic to correct in later stages of development. Hence, measuring systems after they have been developed is not a desirable approach to predicting their performance. What is needed is the ability to evaluate a system's design while there are still opportunities to correct fundamental design flaws. In this paper we describe the framework of ViPEr-HiSS, a tool which facilitates the design, development, and subsequent performance evaluation of multimedia storage hierarchies by providing mechanisms for relatively easy experimentation with (a) system configurations as well as (b) application- and media-aware resource management techniques. (Also cross-referenced as UMIACS-TR-99-69.)
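    The abstract does not describe ViPEr-HiSS internals, but the kind of design-time evaluation it advocates can be illustrated with a toy capacity-planning check. The sketch below is purely hypothetical: the function names, disk parameters, and workload numbers are invented for illustration, and a real tool would model the full hierarchy and its resource-management policies rather than a single tier.

```python
# A hypothetical back-of-the-envelope check in the spirit of the paper's
# argument: evaluate a candidate storage configuration at design time,
# before anything is built. All names and numbers here are invented;
# they are not ViPEr-HiSS internals.

def max_streams(disks: int, disk_mbps: float, stream_mbps: float,
                overhead: float = 0.2) -> int:
    """Upper bound on concurrent video streams a disk array can sustain,
    discounting a fraction of raw bandwidth for seeks and metadata."""
    usable = disks * disk_mbps * (1.0 - overhead)
    return int(usable // stream_mbps)

def evaluate(config: dict, required_streams: int) -> bool:
    """Accept or reject a candidate configuration against a workload."""
    n = max_streams(config["disks"], config["disk_mbps"],
                    config["stream_mbps"])
    print(f"{config['name']}: {n} streams (need {required_streams})")
    return n >= required_streams

if __name__ == "__main__":
    candidates = [
        {"name": "8-disk array",  "disks": 8,  "disk_mbps": 40.0,
         "stream_mbps": 4.0},
        {"name": "16-disk array", "disks": 16, "disk_mbps": 40.0,
         "stream_mbps": 4.0},
    ]
    for cfg in candidates:
        evaluate(cfg, required_streams=100)  # only the 16-disk design passes
```

    Even a crude model like this catches an under-provisioned design (the 8-disk array supports roughly 64 of the 100 required streams) before any hardware is purchased, which is exactly the argument the paper makes against measure-after-build evaluation.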

    Doctor of Philosophy

    In the past few years, we have seen a tremendous increase in digital data being generated. By 2011, storage vendors had shipped 905 PB of purpose-built backup appliances. By 2013, the number of objects stored in Amazon S3 had reached 2 trillion. Facebook had stored 20 PB of photos by 2010. All of this requires an efficient storage solution. To improve space efficiency, compression and deduplication are widely used. Compression works by identifying repeated strings and replacing them with more compact encodings, while deduplication partitions data into fixed-size or variable-size chunks and removes duplicate chunks. While these two approaches yield great improvements in space efficiency, they still have limitations. First, traditional compressors are limited in their ability to detect redundancy across a large range because they search for redundant data at a fine granularity (the string level). For deduplication, metadata embedded in an input file changes more frequently than the data itself, which introduces unnecessary unique chunks and leads to poor deduplication. In addition, cloud storage systems suffer from unpredictable and inefficient performance because of interference among different types of workloads. This dissertation proposes techniques to improve the effectiveness of traditional compressors and deduplication in improving space efficiency, and a new I/O scheduling algorithm to improve performance predictability and efficiency for cloud storage systems. The common idea is to exploit similarity. To improve the effectiveness of compression and deduplication, similarity in content is used to transform an input file into a compression- or deduplication-friendly format. We propose Migratory Compression, a generic data transformation that identifies similar data at a coarse granularity (the block level) and then groups similar blocks together. It can be used as a preprocessing stage for any traditional compressor. We find that metadata has a huge impact in reducing the benefit of deduplication. To isolate the impact of metadata, we propose to separate metadata from data. Three approaches are presented for use cases with different constraints. For the commonly used tar format, we propose Migratory Tar: a data transformation and a new tar format that deduplicates better. We also present a case study in which we use deduplication to reduce storage consumption for disk images while achieving high performance in image deployment. Finally, we apply the same principle of exploiting similarity to I/O scheduling to prevent interference between random and sequential workloads, leading to efficient, consistent, and predictable performance for sequential workloads and high disk utilization.
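    The abstract's description of Migratory Compression (find similar blocks at coarse granularity, group them together, then compress) can be made concrete with a small sketch. The similarity feature below, a few minimum shingle hashes per block, is a simplified stand-in for the real system's feature extraction and clustering, and the fixed 4 KB block size is an arbitrary choice for illustration.

```python
# Minimal sketch of the Migratory Compression idea: reorder similar
# fixed-size blocks next to each other, then hand the stream to an
# ordinary compressor. The similarity sketch here is a simplified
# stand-in, not the actual system's super-feature scheme.

import zlib

BLOCK = 4096

def features(block: bytes, k: int = 4) -> tuple:
    """Crude similarity sketch: the k smallest CRCs of 8-byte shingles.
    Similar blocks tend to share small hash values, so sorting blocks
    by this tuple places similar blocks near each other."""
    hs = {zlib.crc32(block[i:i + 8])
          for i in range(0, max(len(block) - 7, 1), 8)}
    return tuple(sorted(hs)[:k])

def migratory_compress(data: bytes) -> tuple[bytes, list[int]]:
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    order = sorted(range(len(blocks)), key=lambda i: features(blocks[i]))
    reordered = b"".join(blocks[i] for i in order)
    # The permutation must be stored with the payload so the original
    # block order can be restored after decompression.
    return zlib.compress(reordered, 9), order

def migratory_decompress(payload: bytes, order: list[int]) -> bytes:
    reordered = zlib.decompress(payload)
    blocks = [reordered[i * BLOCK:(i + 1) * BLOCK]
              for i in range(len(order))]
    restored = [b""] * len(order)
    for pos, original_index in enumerate(order):
        restored[original_index] = blocks[pos]
    return b"".join(restored)
```

    The point of the reordering is that a conventional compressor's limited match window can now exploit redundancy that was originally far apart in the file, which addresses the fine-granularity limitation the abstract identifies; the saved permutation makes the transformation lossless.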

    Scalable File Systems for High Performance Computing Final Report

    Fourth NASA Goddard Conference on Mass Storage Systems and Technologies

    This report contains copies of all those technical papers received in time for publication just prior to the Fourth Goddard Conference on Mass Storage Systems and Technologies, held March 28-30, 1995, at the University of Maryland, University College Conference Center, in College Park, Maryland. This series of conferences continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include new storage technology, stability of recorded media, performance studies, storage system solutions, the National Information Infrastructure (Infobahn), the future of storage technology, and lessons learned from various projects. There will also be an update on the IEEE Mass Storage System Reference Model Version 5, on which the final vote was taken in July 1994.

    The 9th Conference of PhD Students in Computer Science

    Sixth Goddard Conference on Mass Storage Systems and Technologies Held in Cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems

    This document contains copies of those technical papers received in time for publication prior to the Sixth Goddard Conference on Mass Storage Systems and Technologies, held in cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems at the University of Maryland University College Inn and Conference Center, March 23-26, 1998. As one of an ongoing series, this conference continues to provide a forum for discussion of issues relevant to the management of large volumes of data. The conference encourages all interested organizations to discuss long-term mass storage requirements and experiences in fielding solutions. Emphasis is on current and future practical solutions addressing issues in data management, storage systems and media, data acquisition, long-term retention of data, and data distribution. This year's discussion topics include architecture, tape optimization, new technology, performance, standards, site reports, and vendor solutions. Tutorials will be available on shared file systems, file system backups, data mining, and the dynamics of obsolescence.

    The Office of Science Data-Management Challenge

    Context-aware collaborative storage and programming for mobile users

    Since people generate and access most digital content from mobile devices, novel and innovative mobile apps and services become possible. Most people are interested in sharing this content with communities defined by friendship, similar interests, or geography in exchange for valuable services from these apps. At the same time, they want to own and control their content. Collaborative mobile computing is an ideal choice for this situation. However, due to the distributed nature of this computing environment and the limited resources on mobile devices, maintaining content availability and storage fairness as well as providing efficient programming frameworks are challenging. This dissertation explores several techniques to address these shortcomings of collaborative mobile computing platforms. First, it combines three techniques into one system, MobiStore, that offers content availability in mobile peer-to-peer networks: topology maintenance with robust connectivity, structural reorientation based on the current state of the network, and gossip-based hierarchical updates. Experimental results show that MobiStore outperforms a state-of-the-art comparison system in terms of content availability and resource-usage fairness. Next, the dissertation explores the use of social relationship properties (i.e., network centrality) to improve the fairness of resource allocation for collaborative computing in peer-to-peer online social networks (P2P-OSNs). The challenge is how to provide fairness in content replication in a P2P-OSN, given that the peers in these networks exchange information only with one-hop neighbors. The proposed solution, Philia, provides fairness by selecting the peers that replicate content based on their potential to introduce storage skewness, which is estimated from their structural properties in the network; it achieves higher content availability and storage fairness than several comparison systems. The dissertation concludes with a high-level distributed programming model that efficiently uses computing resources on a cloud-assisted, collaborative mobile computing platform. This platform pairs mobile devices with virtual machines (VMs) in the cloud for increased execution performance and availability. On such a platform, two important challenges arise: first, pairing the two computing entities into a seamless computation, communication, and storage unit; and second, using the computing resources in a cost-effective way. This dissertation proposes Moitree, a distributed programming model and middleware that translates high-level programming constructs into events and provides the illusion of a single computing entity over the mobile-VM pairs. From the programmers' viewpoint, the Moitree API models user collaborations as dynamic groups formed over location, time, or social hierarchies. Experimental results from a prototype implementation show that Moitree is scalable, suitable for real-time apps, and can improve the performance of collaborating apps in terms of latency and energy consumption.
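    The abstract gives only the high-level idea behind Philia's peer selection: using one-hop structural information to pick replica holders that keep storage balanced. One plausible reading can be sketched as follows; the graph representation, the degree-based centrality proxy, and the scoring function are all invented assumptions for illustration, not the system's actual algorithm.

```python
# Hypothetical sketch of centrality-aware replica placement in the spirit
# of Philia as summarized above. A peer only sees its one-hop neighbors,
# so it ranks them with purely local information: a degree-based
# centrality proxy and their currently used storage. The scoring is an
# invented stand-in for the system's real skewness estimate.

def pick_replicas(neighbors: dict, num_replicas: int) -> list[str]:
    """neighbors maps peer id -> (degree, used_storage_mb).

    Prefer well-connected peers (better availability) while penalizing
    peers that already store a lot (to avoid storage skew)."""
    def score(peer: str) -> float:
        degree, used = neighbors[peer]
        return degree / (1.0 + used)  # high connectivity, low load wins
    ranked = sorted(neighbors, key=score, reverse=True)
    return ranked[:num_replicas]

if __name__ == "__main__":
    one_hop = {
        "alice": (9, 120.0),  # well connected, moderately loaded
        "bob":   (3, 10.0),   # poorly connected, nearly empty
        "carol": (8, 800.0),  # well connected but heavily loaded
        "dave":  (7, 60.0),
    }
    print(pick_replicas(one_hop, num_replicas=2))  # ['bob', 'dave']
```

    The interesting design constraint the abstract highlights is locality: because a peer never sees the whole network, any fairness mechanism must work from one-hop observations like these rather than from global centrality scores.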

    Edge Computing Platforms and Protocols

    Cloud computing has created a radical shift in expanding the reach of application usage and has emerged as the de facto method of providing low-cost and highly scalable computing services to users. The existing cloud infrastructure is a composition of large-scale networks of datacenters spread across the globe. These datacenters are carefully installed in isolated locations and are heavily managed by cloud providers to ensure reliable performance for their users. In recent years, novel applications, such as the Internet of Things, augmented reality, and autonomous vehicles, have proliferated across the Internet. The majority of such applications are time-critical and enforce strict computational delay requirements for acceptable performance. Traditional cloud offloading techniques are inefficient for handling such applications because of the additional network delay incurred while uploading prerequisite data to distant datacenters. Furthermore, as computations involving such applications often rely on sensor data from multiple sources, simultaneous data upload to the cloud also results in significant congestion in the network. Edge computing is a new cloud paradigm which aims to bring existing cloud services and utilities near end users. Also termed edge clouds, the central objective behind this upcoming cloud platform is to reduce the network load on the cloud by utilizing compute resources in the vicinity of users and IoT sensors. Dense geographical deployment of edge clouds in an area not only allows for optimal operation of delay-sensitive applications but also provides support for mobility, context awareness, and data aggregation in computations. However, the added functionality of edge clouds comes at the cost of incompatibility with existing cloud infrastructure. For example, while datacenter servers are closely monitored by cloud providers to ensure reliability and security, edge servers are meant to operate in unmanaged, publicly shared environments. Moreover, several edge cloud approaches aim to incorporate crowdsourced compute resources, such as smartphones, desktops, and tablets, near the location of end users to support stringent latency demands. The resulting infrastructure is an amalgamation of heterogeneous, resource-constrained, and unreliable compute-capable devices that aims to replicate cloud-like performance. This thesis provides a comprehensive collection of novel protocols and platforms for integrating edge computing into the existing cloud infrastructure. At its foundation lies an all-inclusive edge cloud architecture which allows for the unification of several co-existing edge cloud approaches in a single logically classified platform. The thesis further addresses several open problems in three core categories of edge computing: hardware, infrastructure, and platform. For hardware, it contributes a deployment framework which enables interested cloud providers to effectively identify optimal locations for deploying edge servers in any geographical region. For infrastructure, it proposes several protocols and techniques for efficient task allocation, data management, and network utilization in edge clouds, with the end objective of maximizing the operability of the platform as a whole.
Finally, the thesis presents a virtualization-dependent platform that lets application owners transparently utilize the underlying distributed infrastructure of edge clouds, in conjunction with other co-existing cloud environments, without much management overhead.
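    The edge-server deployment problem mentioned above (picking good locations in a geographical region) can be illustrated with a simple placement heuristic. The k-means-style clustering below is an invented stand-in for the thesis's actual deployment framework: it places k servers at the centroids of user demand so that average user-to-server distance, a crude proxy for access latency, stays small.

```python
# Illustrative edge-server placement: cluster user locations and put one
# edge server at each cluster centroid. This k-means heuristic is an
# assumption for illustration, not the thesis's deployment framework.

import random

def place_edge_servers(users, k, iterations=50, seed=0):
    """users: list of (x, y) coordinates of demand points.
    Returns k server locations (cluster centroids)."""
    rng = random.Random(seed)
    centers = rng.sample(users, k)
    for _ in range(iterations):
        # Assign each user to its nearest current server location.
        clusters = [[] for _ in range(k)]
        for ux, uy in users:
            i = min(range(k), key=lambda c: (ux - centers[c][0]) ** 2 +
                                            (uy - centers[c][1]) ** 2)
            clusters[i].append((ux, uy))
        # Move each server to the centroid of its assigned users.
        centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers

if __name__ == "__main__":
    rng = random.Random(1)
    # Two artificial population hot spots standing in for a demand map.
    users = [(rng.gauss(2, 0.5), rng.gauss(2, 0.5)) for _ in range(100)] + \
            [(rng.gauss(8, 0.5), rng.gauss(8, 0.5)) for _ in range(100)]
    print(place_edge_servers(users, k=2))  # roughly (2, 2) and (8, 8)
```

    A production framework would weigh far more than geometric distance (backhaul capacity, site cost, demand variation over time), but the sketch captures the core trade-off the hardware contribution addresses: dense, well-placed edge sites shorten the network path that makes distant datacenters unsuitable for delay-sensitive applications.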