
    Runtime Optimization of Join Location in Parallel Data Management Systems

    Applications running on parallel systems often need to join a streaming relation or a stored relation with data indexed in a parallel data storage system. Some applications also compute user-defined functions (UDFs) on the joined tuples. The join can be done at the data storage nodes, corresponding to reduce-side joins, or by fetching data from the storage system to compute nodes, corresponding to map-side joins. Both may be suboptimal: reduce-side joins may cause skew, while map-side joins may lead to a lot of data being transferred and replicated. In this paper, we present techniques to make runtime decisions between the two options on a per-key basis, in order to improve the throughput of the join, accounting for UDF computation if any. Our techniques are based on an extended ski-rental algorithm and provide worst-case performance guarantees with respect to the optimal point in the space of options we consider. Our techniques perform load balancing, taking into account CPU, network, and I/O costs as well as the load on compute and storage nodes. We have implemented our techniques on Hadoop, Spark, and the Muppet stream processing engine. Our experiments show that our optimization techniques provide a significant improvement in throughput over existing techniques.
    Comment: 17 pages
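
    The per-key decision builds on the classic ski-rental rule. Below is a minimal, hypothetical sketch of that rule applied to join location: shipping a key's tuples to the storage node is "renting" (reduce-side), a one-time fetch of its indexed data to a compute node is "buying" (map-side); once cumulative shipping cost reaches the fetch cost, switch. The class, cost constants, and key names are illustrative only; the paper's algorithm additionally accounts for UDF, CPU, network, and I/O costs and node load.

```python
# Ski-rental-style per-key join-location choice (illustrative sketch).
# "Renting" = shipping each incoming tuple for a key to the storage node
# (reduce-side join); "buying" = a one-time fetch of the indexed data for
# that key to the compute node (map-side join). Costs are hypothetical.

class KeyRouter:
    def __init__(self, fetch_cost: float, ship_cost: float):
        self.fetch_cost = fetch_cost      # one-time "buy" cost per key
        self.ship_cost = ship_cost        # per-tuple "rent" cost
        self.spent = {}                   # cumulative rent per key
        self.fetched = set()              # keys already moved to compute side

    def route(self, key) -> str:
        if key in self.fetched:
            return "map-side"             # data already local, join here
        self.spent[key] = self.spent.get(key, 0.0) + self.ship_cost
        # Break-even rule: once shipping has cost as much as fetching
        # would, pay the fetch; total cost is at most 2x the optimum.
        if self.spent[key] >= self.fetch_cost:
            self.fetched.add(key)
            return "fetch-then-map-side"
        return "reduce-side"

router = KeyRouter(fetch_cost=100.0, ship_cost=1.0)
for i in range(150):
    decision = router.route("hot_key")
print(decision)                           # hot keys end up joined map-side
```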

    Kelly Cache Networks

    We study networks of M/M/1 queues in which nodes act as caches that store objects. Exogenous requests for objects are routed towards nodes that store them; as a result, object traffic in the network is determined not only by demand but, crucially, by where objects are cached. We determine how to place objects in caches to attain a certain design objective, such as minimizing network congestion or retrieval delays. We show that for a broad class of objectives, including minimizing both the expected network delay and the sum of network queue lengths, this optimization problem can be cast as an NP-hard submodular maximization problem. We show that the so-called continuous greedy algorithm attains a ratio arbitrarily close to $1 - 1/e \approx 0.63$ using a deterministic estimation via a power series; this drastically reduces execution time over prior art, which resorts to sampling. Finally, we show that our results generalize, beyond M/M/1 queues, to networks of M/M/k and symmetric M/D/1 queues.
    Comment: This is the extended version of the Infocom 2019 paper with the same title. The authors gratefully acknowledge support from National Science Foundation grant NeTS-1718355, as well as from research grants by Intel Corp. and Cisco Systems
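
    To make the optimization concrete, here is a plain greedy sketch for a monotone submodular cache-gain objective under per-node capacity constraints. This is a simpler relative of the paper's method: plain greedy guarantees only a 1/2 ratio under such constraints, while the continuous greedy algorithm the paper analyzes attains 1 - 1/e. The objective and all numbers below are hypothetical.

```python
# Plain greedy placement for a monotone submodular cache-gain function F.
# The paper uses the continuous greedy algorithm with a deterministic
# power-series estimator; this simpler variant only shows the shape of
# the problem. F, capacities, nodes, and objects are made-up stand-ins.

from itertools import product

def greedy_placement(nodes, objects, capacity, F):
    placement = set()                     # set of (node, object) pairs
    load = {v: 0 for v in nodes}
    while True:
        best, best_gain = None, 0.0
        for v, o in product(nodes, objects):
            if load[v] >= capacity[v] or (v, o) in placement:
                continue
            gain = F(placement | {(v, o)}) - F(placement)
            if gain > best_gain:
                best, best_gain = (v, o), gain
        if best is None:
            return placement
        placement.add(best)
        load[best[0]] += 1

# Toy objective: each cached copy of object o contributes demand[o], with
# diminishing returns for duplicate copies (hence submodular).
demand = {"a": 5.0, "b": 2.0}
F = lambda S: sum(demand[o] / (1 + i)
                  for o in demand
                  for i in range(sum(1 for (_, oo) in S if oo == o)))
print(greedy_placement(["v1", "v2"], ["a", "b"], {"v1": 1, "v2": 1}, F))
```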

    Wireless Video Caching and Dynamic Streaming under Differentiated Quality Requirements

    This paper considers one-hop device-to-device (D2D)-assisted wireless caching networks that cache video files of varying quality levels, under the assumption that the base station can control the video quality but cache-enabled devices cannot. Two problems arise in such a caching network: a file placement problem and a node association problem. This paper suggests a method to cache videos of different qualities, and thus of varying file sizes, by maximizing the sum of video quality measures that users can enjoy. There exists an interesting trade-off between video quality and video diversity, i.e., the ability to provision diverse video files. By caching high-quality files, a cache-enabled device can provide high-quality video but cannot cache a variety of files. Conversely, when the device caches various files, it cannot provide high quality to file-requesting users. In addition, when multiple devices cache the same file at different qualities, advanced node association is required for file delivery. This paper proposes a node association algorithm that maximizes time-averaged video quality for multiple users under a playback delay constraint. In this algorithm, we also consider request collision, the situation where several users request files from the same device at the same time, and we propose two ways to cope with the collision: scheduling of one user and non-orthogonal multiple access. Simulation results verify that the proposed caching method and the node association algorithm work reliably.
    Comment: 13 pages, 11 figures, accepted for publication in IEEE Journal on Selected Areas in Communications
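
    The quality-versus-diversity trade-off can be made concrete with a small, hypothetical greedy placement for a single device: each cached item is a (file, quality) pair, higher quality costs more space, and the device repeatedly takes the best marginal expected-quality gain per unit of cache space. This is only a sketch under made-up numbers; the paper optimizes placement jointly across devices and handles delivery separately via node association.

```python
# Illustrative greedy for the quality-vs-diversity trade-off on one
# device: caching a higher quality takes more space, so fewer distinct
# files fit. All popularities, sizes, and scores are hypothetical.

def place(cache_size, popularity, qualities):
    """popularity: file -> request prob; qualities: quality -> (size, score)."""
    chosen, used = {}, 0
    candidates = [(f, q) for f in popularity for q in qualities]
    while True:
        def gain(fq):
            # marginal expected-quality gain per unit of cache space
            f, q = fq
            size, score = qualities[q]
            old = chosen.get(f)
            old_score = qualities[old][1] if old else 0.0
            extra = size - (qualities[old][0] if old else 0)
            if extra <= 0 or used + extra > cache_size:
                return 0.0
            return popularity[f] * (score - old_score) / extra
        best = max(candidates, key=gain)
        if gain(best) <= 0:
            return chosen
        f, q = best
        used += qualities[q][0] - (qualities[chosen[f]][0] if f in chosen else 0)
        chosen[f] = q

pop = {"A": 0.6, "B": 0.3, "C": 0.1}
quals = {"480p": (1, 1.0), "1080p": (3, 2.0)}   # (size units, quality score)
print(place(cache_size=4, popularity=pop, qualities=quals))
# -> popular files get high quality, the least popular file is left out
```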

    Modeling and Optimization of Latency in Erasure-coded Storage Systems

    As consumers engage increasingly in social networking and e-commerce, businesses grow to rely on big data analytics for intelligence, and traditional IT infrastructures continue to migrate to the cloud and edge, demand for distributed data storage is rising at an unprecedented speed. Erasure coding has quickly emerged as a promising technique to reduce storage cost while providing reliability similar to that of replicated systems, and has been widely adopted by companies like Facebook, Microsoft, and Google. However, it also brings new challenges in characterizing and optimizing access latency when erasure codes are used in distributed storage. The aim of this monograph is to provide a review of recent progress, both theoretical and practical, on systems that employ erasure codes for distributed storage. We first identify the key challenges and a taxonomy of the research problems, and then give an overview of the different approaches that have been developed to quantify and model the latency of erasure-coded storage. This includes recent work leveraging MDS-Reservation, Fork-Join, Probabilistic, and Delayed-Relaunch scheduling policies, as well as their applications in characterizing the access latency (e.g., mean, tail, and asymptotic latency) of erasure-coded distributed storage systems. We also extend the problem to the case where users stream videos from erasure-coded distributed storage systems. Next, we bridge the gap between theory and practice and discuss lessons learned from prototype implementations. In particular, we discuss exemplary implementations of erasure-coded storage, illuminate key design degrees of freedom and trade-offs, and summarize the remaining challenges in real-world storage systems such as content delivery and caching. Open problems for future research are discussed at the end of each chapter.
    Comment: Monograph for use by researchers interested in latency aspects of distributed storage systems
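
    The Fork-Join policy mentioned above is easy to illustrate: with an (n, k) MDS code, a read forks to n servers and completes when the fastest k chunk reads return, so access latency is the k-th order statistic of the chunk service times. The Monte Carlo sketch below assumes exponential service times and an arbitrary rate purely for illustration.

```python
# Monte Carlo sketch of fork-join access latency for an (n, k) MDS code:
# a read forks to n servers and completes when the fastest k chunk reads
# finish, so latency is the k-th order statistic of the service times.
# Exponential service times and the rates below are illustrative only.

import random

def fork_join_latency(n, k, rate, trials=100_000):
    total = 0.0
    for _ in range(trials):
        times = sorted(random.expovariate(rate) for _ in range(n))
        total += times[k - 1]          # k-th fastest chunk completes the read
    return total / trials

# Replication (n=3, k=1) vs. a (3, 2) code: coding stores less but waits
# for the 2nd-fastest server, so its mean latency is higher.
print("n=3, k=1:", fork_join_latency(3, 1, rate=1.0))
print("n=3, k=2:", fork_join_latency(3, 2, rate=1.0))
```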

    On the Complexity of Optimal Routing and Content Caching in Heterogeneous Networks

    We investigate the problem of optimal request routing and content caching in a heterogeneous network supporting in-network content caching, with the goal of minimizing average content access delay. Here, content can either be accessed directly from a back-end server (where content resides permanently) or be obtained from one of multiple in-network caches. To access a piece of content, a user must decide whether to route its request to a cache or to the back-end server. Additionally, caches must decide which content to cache. We investigate the complexity of two problem formulations, in which the direct path to the back-end server is modeled as either i) a congestion-sensitive or ii) a congestion-insensitive path, reflecting whether or not the delay of the uncached path to the back-end server depends on the user request load. We show that the problem is NP-complete in both cases. We prove that under the congestion-insensitive model the problem can be solved optimally in polynomial time if each piece of content is requested by only one user, or when there are at most two caches in the network. We also identify a structural property of the user-cache graph that potentially makes the problem NP-complete. For the congestion-sensitive model, we prove that the problem remains NP-complete even if there is only one cache in the network and each content is requested by only one user. We show that approximate solutions can be found for both models within a (1 - 1/e) factor of the optimal solution, and present a greedy algorithm that is found to be within 1% of optimal for small problem sizes. Through trace-driven simulations we evaluate the performance of our greedy algorithms, which show up to a 50% reduction in average delay over solutions based on LRU content caching.
    Comment: Infocom
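
    A small sketch of the distinction between the two formulations, assuming a fixed caching decision and, purely for illustration, an M/M/1-style delay 1/(mu - lambda) for the congestion-sensitive path; the paper's congestion-sensitive model need not take this exact form.

```python
# Illustrative contrast of the two back-end path models for a fixed
# caching decision. Cached requests see a fixed cache delay; uncached
# requests go to the back-end server, whose delay is either a constant
# (congestion-insensitive) or grows with offered load (here modeled
# M/M/1-style as 1/(mu - lambda), an assumption for illustration).

def avg_delay(request_rate, hit_ratio, d_cache, d_server, mu=None):
    miss_rate = request_rate * (1 - hit_ratio)
    if mu is None:                       # congestion-insensitive path
        d_back = d_server
    else:                                # congestion-sensitive path
        assert miss_rate < mu, "back-end path is overloaded"
        d_back = 1.0 / (mu - miss_rate)
    return hit_ratio * d_cache + (1 - hit_ratio) * d_back

# With a congestion-sensitive path, raising the hit ratio helps twice:
# fewer requests pay the back-end delay, and that delay itself shrinks.
for h in (0.2, 0.5, 0.8):
    print(h, avg_delay(10.0, h, d_cache=1.0, d_server=5.0),
          avg_delay(10.0, h, d_cache=1.0, d_server=None, mu=9.0))
```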

    Bandana: Using Non-volatile Memory for Storing Deep Learning Models

    Typical large-scale recommender systems use deep learning models that are stored on a large amount of DRAM. These models often rely on embeddings, which consume most of the required memory. We present Bandana, a storage system that reduces the DRAM footprint of embeddings by using Non-volatile Memory (NVM) as the primary storage medium, with a small amount of DRAM as a cache. The main challenge in storing embeddings on NVM is its limited read bandwidth compared to DRAM. Bandana uses two primary techniques to address this limitation: first, it stores embedding vectors that are likely to be read together in the same physical location, using hypergraph partitioning; second, it decides how many embedding vectors to cache in DRAM by simulating dozens of small caches. These techniques allow Bandana to increase the effective read bandwidth of NVM by 2-3x and thereby significantly reduce the total cost of ownership.
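
    The second technique, sizing the DRAM cache by simulating many small caches, can be sketched roughly as follows: replay a sampled access trace through several scaled-down LRU caches to estimate the hit-ratio curve, then choose the smallest size whose marginal benefit is still acceptable. The synthetic trace and sizes below are stand-ins; Bandana's actual simulation details differ.

```python
# Sketch of the "many small caches" idea for DRAM sizing: replay a
# (sampled) access trace through several scaled-down LRU caches to
# estimate the hit-ratio curve, then pick a size from the curve.

from collections import OrderedDict
import random

def lru_hit_ratio(trace, size):
    cache, hits = OrderedDict(), 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)       # mark as most recently used
        else:
            if len(cache) >= size:
                cache.popitem(last=False)  # evict least recently used
            cache[key] = True
    return hits / len(trace)

random.seed(0)
# heavy-tailed key popularity, roughly mimicking embedding accesses
trace = [int(random.paretovariate(1.2)) for _ in range(50_000)]
for size in (16, 64, 256, 1024):
    print(size, round(lru_hit_ratio(trace, size), 3))
```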

    Forwarding, Caching and Congestion Control in Named Data Networks

    Emerging information-centric networking architectures seek to optimally utilize both bandwidth and storage for efficient content distribution. This highlights the need for joint design of traffic engineering and caching strategies, in order to optimize network performance in view of both current traffic loads and future traffic demands. We present a systematic framework for joint dynamic interest request forwarding and dynamic cache placement and eviction, within the context of the Named Data Networking (NDN) architecture. The framework employs a virtual control plane, which operates on the user demand rates for data objects in the network, and an actual plane, which handles Interest Packets and Data Packets. We develop distributed algorithms within the virtual plane to achieve network load balancing through dynamic forwarding and caching, thereby maximizing the user demand rate that the NDN network can satisfy. Next, we show that congestion control can be optimally combined with forwarding and caching within this framework to maximize user utilities subject to network stability. Numerical experiments within a number of network settings demonstrate the superior performance of the resulting algorithms for the actual plane in terms of high user utilities, low user delay, and high rate of cache hits.
    Comment: This version of the paper contains a new section on congestion control for NDN networks. A previous version of the paper appeared in the Proceedings of the ACM Conference on Information-Centric Networking (ICN), Paris, France, September 24-26, 2014
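
    A minimal sketch of one virtual-plane decision, assuming (as a simplification) a per-object virtual queue at each node driven by demand rates: a virtual interest is forwarded to the neighbor with the largest positive queue differential, in the spirit of backpressure routing. The topology, queue values, and the coupling to actual Interest and Data Packets are hypothetical here.

```python
# Minimal sketch of a virtual-plane forwarding decision: each node keeps
# a virtual queue per data object (driven by demand rates), and a virtual
# interest for object k is forwarded to the neighbor with the largest
# positive queue differential (backpressure). Values are hypothetical.

def forward_choice(node, obj, vq, neighbors):
    """vq[(node, obj)] is the virtual queue length for obj at node."""
    best, best_diff = None, 0.0
    for nbr in neighbors[node]:
        diff = vq[(node, obj)] - vq[(nbr, obj)]
        if diff > best_diff:
            best, best_diff = nbr, diff
    return best          # None means: hold, no useful direction right now

vq = {("a", "k"): 9.0, ("b", "k"): 4.0, ("c", "k"): 7.0}
neighbors = {"a": ["b", "c"]}
print(forward_choice("a", "k", vq, neighbors))   # -> "b"
```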

    The TrieJax Architecture: Accelerating Graph Operations Through Relational Joins

    Graph pattern matching (e.g., finding all cycles and cliques) has become an important component in many critical domains such as social networks, biology, and cyber-security. This development has motivated research into faster algorithms for graph pattern matching. In recent years, the database community has shown that mapping graph pattern matching problems to a new class of relational join algorithms provides an efficient framework for computing these problems. In this paper, we argue that this new class of relational join algorithms is highly amenable to specialized hardware acceleration thanks to two fundamental properties: improved locality and inherent concurrency. The improved locality is a result of the provably bounded number of intermediate results these algorithms generate, which results in smaller working sets. In addition, their inherent concurrency can be leveraged for effective hardware acceleration and for hiding memory latency. We demonstrate the hardware amenability of this new class of algorithms by introducing TrieJax, a hardware accelerator for graph pattern matching. The TrieJax design leverages the improved locality and high concurrency properties to dramatically accelerate graph pattern matching, and can be tightly integrated into existing manycore processors. We evaluate TrieJax on a set of standard graph pattern matching queries and datasets. Our evaluation shows that TrieJax outperforms recently proposed hardware accelerators for graph and database processing that do not employ the new class of algorithms by 7-63x on average (up to 539x), while consuming 15-179x less energy (up to 1750x). It also outperforms systems that do incorporate modern relational join algorithms by 9-20x on average (up to 45x), while consuming 59-110x less energy (up to 372x).
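
    The "new class" of join algorithms referenced here is the worst-case optimal family (e.g., Leapfrog Triejoin, which the TrieJax name evokes): it binds one attribute at a time and intersects sorted indexes rather than materializing pairwise joins. Below is a minimal software sketch for the triangle query R(a,b), S(b,c), T(a,c) over a single edge set; plain Python sets stand in for the trie indexes the accelerator would traverse.

```python
# Attribute-at-a-time triangle enumeration in the style of worst-case
# optimal joins: bind a, then b, then obtain c by intersecting indexes.
# The bounded intermediate-result size is the locality property that
# specialized hardware can exploit.

def triangles(adj):
    """adj: vertex -> neighbor set (a real engine uses sorted tries)."""
    out = []
    for a in adj:                                  # bind attribute a
        for b in adj[a]:                           # bind b from R(a, b)
            if b <= a:
                continue                           # avoid duplicate orderings
            for c in adj[a] & adj[b]:              # bind c by intersection
                if c > b:
                    out.append((a, b, c))
    return out

edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
print(triangles(adj))                              # [(1, 2, 3)]
```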

    Generalization of LRU Cache Replacement Policy with Applications to Video Streaming

    Caching plays a crucial role in networking systems by reducing the load on the network, and it is commonly employed by content delivery networks (CDNs) to improve performance. One of the most commonly used mechanisms, Least Recently Used (LRU), works well for identical file sizes, but its performance deteriorates when file sizes differ. This paper proposes an adaptation of the LRU strategy, called gLRU, in which each file is sub-divided into equal-sized chunks. In this strategy, a chunk of the newly requested file is added to the cache, and a chunk of the least-recently-used file is removed from the cache. Although an approximate analysis of the hit rate exists for LRU, it does not extend to gLRU, because the cache can hold partial files and the hit rate is no longer the metric of interest. This paper provides a novel approximation analysis for this policy in which the cache may contain partial file contents. The approximation approach is validated by simulations. Further, gLRU outperforms the LRU strategy in file download time under a Zipf file popularity distribution and a censored Pareto file size distribution. Video streaming applications can further use the partial cache contents to reduce stall duration significantly, and the numerical results indicate significant improvements (32%) in stall duration using the gLRU strategy as compared to the LRU strategy. Furthermore, the gLRU replacement policy compares favorably to two other cache replacement policies when simulated on MSR Cambridge Traces obtained from the SNIA IOTTA repository.
    Comment: Accepted to ACM TOMPECS, Jun 201
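
    A compact sketch of the gLRU rule as described above: files are split into equal-sized chunks, each request adds one missing chunk of the requested file, and when the cache is full one chunk of the least-recently-used file is evicted. Bookkeeping details (for instance, which chunk of a file is dropped) are simplified here.

```python
# Sketch of the gLRU policy: a request grows the requested file by one
# chunk and, if the cache is over capacity, shrinks the least-recently-
# used file by one chunk. Partial files are the norm under this policy.

from collections import OrderedDict

class GLRUCache:
    def __init__(self, capacity_chunks, file_chunks):
        self.cap = capacity_chunks
        self.total = file_chunks          # file -> total number of chunks
        self.have = OrderedDict()         # file -> cached chunks, LRU order

    def request(self, f):
        cached = self.have.get(f, 0)
        self.have[f] = cached
        self.have.move_to_end(f)          # touch: f is most recently used
        if cached < self.total[f]:        # grow f by one chunk
            self.have[f] = cached + 1
            if sum(self.have.values()) > self.cap:
                lru = next(iter(self.have))   # least recently used file
                self.have[lru] -= 1           # drop one of its chunks
                if self.have[lru] == 0:
                    del self.have[lru]
        return cached / self.total[f]     # fraction served from cache

cache = GLRUCache(capacity_chunks=4, file_chunks={"x": 4, "y": 4})
for f in ["x", "x", "y", "x", "y", "y"]:
    print(f, cache.request(f))
```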

    Prefix based Chaining Scheme for Streaming Popular Videos using Proxy servers in VoD

    Streaming high-quality video consumes a significantly large amount of network resources. In this context, request-to-service delay, network traffic, congestion, and server overloading are the main parameters affecting quality of service (QoS) in video streaming over the Internet. In this paper, we propose an efficient architecture, a cluster of proxy servers and clients, that uses a peer-to-peer (P2P) approach to cooperatively stream video using a chaining technique. We consider two key issues in the proposed architecture: (1) a prefix caching technique to accommodate more videos close to the client, and (2) cooperative client and proxy chaining to achieve network efficiency. Our simulation results show that the proposed approach yields prefix caching close to the optimal solution, and that chaining significantly reduces WAN bandwidth usage on the server-proxy path, server load, and client rejection ratio by utilizing proxy-client and client-client path bandwidth, which is much cheaper than the expensive server-proxy path bandwidth.
    Comment: 9 pages, IEEE format, International Journal of Computer Science and Information Security, IJCSIS 2009, ISSN 1947-5500, Impact Factor 0.423, http://sites.google.com/site/ijcsis
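
    To illustrate prefix caching, the sketch below allocates proxy cache space to video prefixes in proportion to popularity times length, so playback can start immediately from the nearby proxy while the remainder streams over chained clients or the WAN path. The proportional rule and all numbers are hypothetical stand-ins for the paper's placement scheme.

```python
# Illustrative prefix-caching allocation for a proxy: give each video a
# prefix whose length grows with its popularity, so playback starts from
# the proxy while the suffix streams over chained peers or the WAN path.

def allocate_prefixes(cache_size, videos):
    """videos: name -> (length, popularity); returns name -> prefix length."""
    weight = sum(length * pop for length, pop in videos.values())
    alloc = {}
    for name, (length, pop) in videos.items():
        share = cache_size * (length * pop) / weight
        alloc[name] = min(length, round(share))   # never exceed the video
    return alloc

videos = {"news": (30, 0.5), "movie": (120, 0.4), "clip": (5, 0.1)}
print(allocate_prefixes(cache_size=60, videos=videos))
# Popular long videos get long prefixes; a request first drains the proxy
# prefix, then chains to a client already viewing the same video.
```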