136 research outputs found

    Content, Topology and Cooperation in In-network Caching

    In-network caching aims to improve content delivery and relieve pressure on network bandwidth by leveraging universally networked caches. This thesis studies the design of cooperative in-network caching strategies from three perspectives: content, topology and cooperation. It focuses specifically on the mechanisms of content delivery and cooperation policy and their impact on the performance of cache networks. The main contributions of this thesis are twofold. From a measurement perspective, we show that the conventional hit-rate metric is not sufficient for evaluating a caching strategy on non-trivial topologies; we therefore introduce footprint reduction and coupling factor, which contain richer information. We show that cooperation policy is the key to balancing the various tradeoffs in caching strategy design, and further investigate the performance impact of the content itself via different chunking schemes. From a design perspective, we first show that different caching heuristics and smart routing schemes can significantly improve caching performance and facilitate content delivery. We then incorporate a well-defined fairness metric into the design and derive the unique optimal caching solution on the Pareto boundary within a bargaining game framework. In addition, our study of the functional relationship between cooperation overhead and neighborhood size indicates that collaboration should be constrained to a small neighborhood, because its cost grows exponentially on general network topologies.

    Soft-TTL: Time-Varying Fractional Caching

    Standard Time-to-Live (TTL) cache management prescribes the storage of entire files, or possibly fractions thereof, for a given amount of time after a request. As a generalization of this approach, this work proposes the storage of a time-varying, diminishing, fraction of a requested file. Accordingly, the cache progressively evicts parts of the file over an interval of time following a request. The strategy, which is referred to as soft-TTL, is justified by the fact that traffic traces are often characterized by arrival processes that display a decreasing, but non-negligible, probability of observing a request as the time elapsed since the last request increases. An optimization-based analysis of soft-TTL is presented, demonstrating the important role played by the hazard function of the inter-arrival request process, which measures the likelihood of observing a request as a function of the time since the most recent request
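The hazard function mentioned in the abstract above can be illustrated with a short sketch. For an inter-arrival density f(t) with survival function S(t), the hazard is h(t) = f(t) / S(t). The two distributions below are assumptions chosen only for illustration (they are not taken from the paper): exponential inter-arrivals give a constant hazard, while Lomax (Pareto type II) inter-arrivals give a hazard that decreases with the time since the last request, which is the kind of behavior the abstract cites as motivation for soft-TTL.

```python
import math


def hazard_exponential(t: float, rate: float) -> float:
    """Hazard of exponential inter-arrival times: f(t) / S(t), constant in t."""
    pdf = rate * math.exp(-rate * t)
    survival = math.exp(-rate * t)
    return pdf / survival


def hazard_lomax(t: float, shape: float, scale: float) -> float:
    """Hazard of Lomax inter-arrival times: equals shape / (scale + t)."""
    pdf = (shape / scale) * (1.0 + t / scale) ** (-(shape + 1.0))
    survival = (1.0 + t / scale) ** (-shape)
    return pdf / survival
```

Evaluating `hazard_lomax` at increasing values of `t` shows the request likelihood decaying gradually rather than dropping to zero, whereas `hazard_exponential` returns the same value for every `t`.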

    Multidimensional content modeling and caching in D2D edge networks

    © 2019 IEEE. The future Internet is going to be shaped by networked multimedia services, with exploding video traffic becoming the dominant payload. That evolution requires a remedial shift from the connection-oriented architecture to a content-centric one. Another technique to address this capacity crunch is to improve spectral utilization through new networking paradigms at the wireless network edge. To this end, Device-to-Device (D2D) communications have the potential to boost the capacity and energy efficiency of content-centric networking. To design and implement efficient content-centric D2D networks, rigorous content modeling and in-network caching mechanisms based on such models are crucial. In this work, we develop a multidimensional content model based on popularity, chunking and layering, and devise caching schemes through this model. Our main motivation is to improve the system performance via our caching strategies. The numerical analysis shows the interplay among different system parameters and performance metrics: while our schemes perform slightly worse in terms of system goodput, they also decrease the system energy expenditure. Overall, this improvement dominates the loss in goodput, leading to greater energy efficiency compared to the commonly used caching technique Least Recently Used (LRU).

    Distributed Caching for Processing Raw Arrays

    As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem through a series of techniques. In-situ mechanisms provide direct access to raw data in the original format, without loading and partitioning. Parallel processing scales to the largest datasets. In-memory caching reduces latency when the same data are accessed across a workload of queries. However, we are not aware of any work on distributed caching of multi-dimensional raw arrays. In this paper, we introduce a distributed framework for cost-based caching of multi-dimensional arrays in native format. Given a set of files that contain portions of an array and an online query workload, the framework computes an effective caching plan in two stages. First, the plan identifies the cells to be cached locally from each of the input files by continuously refining an evolving R-tree index. In the second stage, the framework determines an optimal assignment of cells to nodes that collocates dependent cells in order to minimize the overall data transfer. We design cache eviction and placement heuristic algorithms that consider the historical query workload. A thorough experimental evaluation over two real datasets in three file formats confirms the superiority of the proposed framework over existing techniques, by as much as two orders of magnitude, in terms of cache overhead and workload execution time.

    Distributed Caching for Complex Querying of Raw Arrays

    As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem through a series of techniques. In-situ mechanisms provide direct access to raw data in the original format, without loading and partitioning. Parallel processing scales to the largest datasets. In-memory caching reduces latency when the same data are accessed across a workload of queries. However, we are not aware of any work on distributed caching of multi-dimensional raw arrays. In this paper, we introduce a distributed framework for cost-based caching of multi-dimensional arrays in native format. Given a set of files that contain portions of an array and an online query workload, the framework computes an effective caching plan in two stages. First, the plan identifies the cells to be cached locally from each of the input files by continuously refining an evolving R-tree index. In the second stage, the framework determines an optimal assignment of cells to nodes that collocates dependent cells in order to minimize the overall data transfer. We design cache eviction and placement heuristic algorithms that consider the historical query workload. A thorough experimental evaluation over two real datasets in three file formats confirms the superiority of the proposed framework over existing techniques, by as much as two orders of magnitude, in terms of cache overhead and workload execution time.

    Hardware accelerated redundancy elimination in network system

    With the tremendous growth in the amount of information stored at remote locations and on cloud systems, many service providers are seeking ways to reduce the amount of redundant information sent across networks by using data de-duplication techniques. Data de-duplication can reduce network traffic without loss of information, and consequently increase available network bandwidth by reducing redundant traffic. However, due to the heavy computation required for detecting and reducing redundant data transmission, de-duplication itself can become a bottleneck on high-capacity links. We completed two parts of work in this research study: Hardware Accelerated Redundancy Elimination in Network Systems (HARENS) and Distributed Redundancy Elimination System Simulation (DRESS). HARENS can significantly improve the performance of the redundancy elimination algorithm in a network system by leveraging General-Purpose Graphics Processing Unit (GPGPU) techniques as well as other big data optimizations such as a hierarchical multi-threaded pipeline, single-machine Map-Reduce, and memory efficiency techniques. Our results indicate that throughput can be increased by a factor of 9 compared to a naive implementation of the data de-duplication algorithm, providing a net transmission increase of up to 3.0 Gigabits per second (Gbps). DRESS provides further acceleration of redundancy elimination in a network system by deploying HARENS as the server-side redundancy elimination module, together with four cooperative distributed byte caches on the client side. A client-side distributed byte cache broadcasts its cached chunks by sending hash values to the other byte caches, so that they can keep a record of all the chunks in the cooperative distributed cache system. When duplication is detected, a client-side byte cache can fetch a chunk directly from either its own cache or a peer byte cache rather than from the server-side redundancy elimination module. Our results indicate that the bandwidth savings of the redundancy elimination system with cooperative distributed byte caches can be increased by 12% compared to the system without distributed byte caches, when transferring about 48 Gigabits of data.
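The hash-based chunk exchange described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the HARENS or DRESS implementation: fixed-size chunking, the 64-byte chunk size, SHA-256 as the chunk hash, and the names `split_chunks`, `encode` and `decode` are all hypothetical choices made here for clarity.

```python
import hashlib

CHUNK_SIZE = 64  # hypothetical fixed chunk size in bytes (assumption)


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut the payload into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def encode(data: bytes, known_hashes: set) -> list:
    """Sender side: send a hash reference for chunks already known downstream."""
    messages = []
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).digest()
        if digest in known_hashes:
            messages.append(("ref", digest))   # duplicate: send only the hash
        else:
            messages.append(("raw", chunk))    # first sighting: send the bytes
            known_hashes.add(digest)
    return messages


def decode(messages, own_cache: dict, peer_caches=()) -> bytes:
    """Receiver side: resolve references from the own cache, then from peers."""
    out = bytearray()
    for kind, payload in messages:
        if kind == "raw":
            own_cache[hashlib.sha256(payload).digest()] = payload
            out += payload
        else:
            for cache in (own_cache, *peer_caches):
                if payload in cache:
                    out += cache[payload]
                    break
            else:
                raise KeyError("chunk not found in any byte cache")
    return bytes(out)
```

A reference costs 32 bytes here regardless of chunk size, so savings appear only when chunks are larger than the hash; a real system would choose chunk boundaries and sizes accordingly.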

    Efficient Methods on Reducing Data Redundancy in the Internet

    The transformation of the Internet from a client-server paradigm to a content-based one has left many fundamental network designs outdated. The increase in user-generated content, instant sharing, flash popularity, etc., creates the need for an Internet that is ready for these trends and can handle the needs of small-scale content providers. The Internet, as of today, carries and stores a large amount of duplicate, redundant data, primarily due to a lack of duplication detection mechanisms and caching principles. This redundancy costs the network in different ways: it consumes energy from the network elements that need to process the extra data; it makes the network caches store duplicate data, thus causing the tail of the data distribution to be swapped out of the caches; and it places more load on the content servers, as they always have to serve the less popular content. In this dissertation, we have analyzed the aforementioned phenomena and proposed several methods to reduce the redundancy of the network at a low cost. The proposals involve different approaches, including data chunk level redundancy detection and elimination, rerouting-based caching mechanisms in information-centric networks, and energy-aware content distribution techniques. Using these approaches, we have demonstrated how redundancy elimination can be performed with low overhead and low processing power. We have also demonstrated that by using local or global cooperation methods, we can increase the storage efficiency of existing caches many-fold. In addition, this work shows that it is possible to remove a sizable amount of traffic from the core network using collaborative content download mechanisms, while simultaneously reducing the energy consumption of client devices.

    Proxcache: A new cache deployment strategy in information-centric network for mitigating path and content redundancy

    One of the promising paradigms for resource sharing that maintains the basic Internet semantics is Information-Centric Networking (ICN). ICN differs from the current Internet in its ability to refer to content by name, partly dissociating from the host-to-host practice of Internet Protocol addresses. Moreover, content caching in ICN is the major means of achieving content networking and reducing the amount of server access. The current caching practice in ICN, Leave Copy Everywhere (LCE), gives rise to problems of excessive content deposition, known as content redundancy, as well as path redundancy, lower cache-hit rates in heterogeneous networks and lower content diversity. This study proposes a new cache deployment strategy, referred to as ProXcache, which acquires node relationships using the hyperedge concept of hypergraphs for cache positioning. The study formulates the relationships through path and distance approximation to mitigate content and path redundancy. The study adopted the Design Research Methodology approach to achieve the stated research objectives. ProXcache was evaluated by simulation on the Abilene, GEANT and DTelekom network topologies against the LCE and ProbCache caching strategies, with the Zipf distribution used to differentiate content categories. The results show that overall content and path redundancy are minimized, with fewer caching operations: six depositions per request, compared to nine and nineteen for ProbCache and LCE respectively. ProXcache yields a better content diversity ratio of 80%, against 20% and 49% for LCE and ProbCache respectively, as the cache sizes varied. ProXcache also improves the cache-hit ratio through proxy positions. These results thus have a significant influence on the development of ICN for better management of content in the Future Internet.
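To make the baseline in the abstract above concrete, the sketch below shows Zipf popularity weights and the Leave Copy Everywhere behavior of depositing a copy at every cache on the delivery path. It is an illustrative sketch only, not the study's simulator: the unbounded set-based caches, the single linear path, and the `diversity_ratio` definition (distinct items divided by total stored copies) are assumptions introduced here and may differ from the definitions used in the study.

```python
def zipf_pmf(n_items: int, alpha: float) -> list:
    """Zipf popularity: probability of the k-th ranked item is proportional to 1 / k**alpha."""
    weights = [1.0 / (k ** alpha) for k in range(1, n_items + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def lce_depositions(path_caches: list, item) -> int:
    """Serve `item` over a path of caches (index 0 is nearest the client).

    Under LCE, every cache between the client and the first hit (or the
    origin server, if no cache holds the item) stores a copy.
    Returns the number of copies deposited for this request.
    """
    hit_index = next(
        (i for i, cache in enumerate(path_caches) if item in cache),
        len(path_caches),
    )
    for cache in path_caches[:hit_index]:
        cache.add(item)
    return hit_index


def diversity_ratio(path_caches: list) -> float:
    """Assumed metric: distinct cached items divided by total cached copies."""
    copies = sum(len(cache) for cache in path_caches)
    distinct = len(set().union(*path_caches))
    return distinct / copies if copies else 0.0
```

On a three-cache path, the first request for an item deposits three copies and leaves the assumed diversity ratio at 1/3, which illustrates why LCE tends toward high redundancy and low diversity.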