
    Leveraging Program Analysis to Reduce User-Perceived Latency in Mobile Applications

    Full text link
    Reducing network latency in mobile applications is an effective way of improving the mobile user experience and has tangible economic benefits. This paper presents PALOMA, a novel client-centric technique for reducing network latency by prefetching HTTP requests in Android apps. Our work leverages string analysis and callback control-flow analysis to automatically instrument apps using PALOMA's rigorous formulation of scenarios that address "what" and "when" to prefetch. PALOMA has been shown to yield significant runtime savings (several hundred milliseconds per prefetchable HTTP request), both on a reusable evaluation benchmark we have developed and on real applications.
    Comment: ICSE 2018
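
    A minimal sketch of the underlying idea, assuming a hypothetical trigger callback that fires before the callback that actually issues the request (PALOMA instruments the app automatically; the class, URL, and callback names here are purely illustrative):

    ```python
    # Hypothetical "what"/"when" prefetching: a trigger callback starts the
    # fetch early, and the original request point consumes the cached result.
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    class PrefetchCache:
        def __init__(self):
            self._pool = ThreadPoolExecutor(max_workers=4)
            self._futures = {}  # url -> Future holding the response body

        def prefetch(self, url):
            # "When": called from the trigger callback, ahead of real use.
            if url not in self._futures:
                self._futures[url] = self._pool.submit(
                    lambda: urllib.request.urlopen(url, timeout=5).read())

        def fetch(self, url):
            # Called at the original request point; blocks only for whatever
            # part of the transfer has not yet completed.
            future = self._futures.pop(url, None)
            if future is not None:
                return future.result()
            return urllib.request.urlopen(url, timeout=5).read()

    cache = PrefetchCache()
    cache.prefetch("https://example.com/api/items")      # e.g. in onScrollStarted()
    body = cache.fetch("https://example.com/api/items")  # later, in onClick()
    ```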

    Efficient Proactive Caching for Supporting Seamless Mobility

    Full text link
    We present a distributed proactive caching approach that exploits user mobility information to decide where to proactively cache data to support seamless mobility, while efficiently utilizing cache storage through a congestion pricing scheme. The proposed approach is applicable to the case where objects have different sizes and to a two-level cache hierarchy, for both of which the proactive caching problem is hard. Additionally, our modeling framework considers both the case where the delay is independent of the requested data object size and the case where the delay is a function of the object size. Our evaluation results show how various system parameters influence the delay gains of the proposed approach, which achieves robust and good performance relative to an oracle and to an optimal scheme for a flat cache structure.
    Comment: 10 pages, 9 figures
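
    As a rough illustration of the trade-off being optimized, the sketch below caches an object at a neighboring cell only when the mobility-weighted delay saving exceeds a congestion price; the probabilities, delay models, and prices are invented inputs, and the paper's actual formulation (including the two-level hierarchy) is more involved.

    ```python
    # Illustrative placement rule: cache where expected delay gain > price.
    def proactive_placement(obj_size, transition_prob, remote_delay,
                            local_delay, price_per_byte):
        """transition_prob: {cell: probability the user moves there next};
        remote_delay/local_delay: fetch delay (s) as a function of object size;
        price_per_byte: {cell: congestion price reflecting cache contention}."""
        placement = set()
        for cell, p in transition_prob.items():
            expected_gain = p * (remote_delay(obj_size) - local_delay(obj_size))
            if expected_gain > price_per_byte[cell] * obj_size:
                placement.add(cell)
        return placement

    print(proactive_placement(
        obj_size=2_000_000,
        transition_prob={"cell_a": 0.7, "cell_b": 0.2, "cell_c": 0.1},
        remote_delay=lambda s: 0.05 + s / 10e6,    # delay grows with size
        local_delay=lambda s: 0.005 + s / 100e6,
        price_per_byte={"cell_a": 1e-9, "cell_b": 5e-9, "cell_c": 5e-8},
    ))  # -> {'cell_a', 'cell_b'}: cell_c's congestion price outweighs its gain
    ```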

    Characterizing Deep-Learning I/O Workloads in TensorFlow

    Full text link
    The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. During training, a large number of relatively small files are first loaded and pre-processed on CPUs and then moved to the accelerator for computation. In addition, checkpoint and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow read bandwidth by a maximum of 2.3x and 7.8x on our two benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator with the input pipeline on the CPU, eliminating the effective cost of I/O on overall performance. Using a burst buffer to checkpoint to fast, small-capacity storage and asynchronously copy the checkpoints to slower, large-capacity storage yielded a 2.6x performance improvement over checkpointing directly to the slower storage on our benchmark environment.
    Comment: Accepted for publication at pdsw-DISCS 2018
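
    The prefetching result maps directly onto TensorFlow's tf.data API; the burst buffer is sketched here as a plain background copy, which is an assumption rather than the paper's implementation (paths and the parse function are placeholders):

    ```python
    # Sketch: a prefetching input pipeline plus burst-buffer-style checkpointing.
    import shutil
    import threading
    import tensorflow as tf

    def parse_record(record):
        # Placeholder: decode one serialized example into a float tensor.
        return tf.io.parse_tensor(record, tf.float32)

    dataset = (
        tf.data.TFRecordDataset(tf.io.gfile.glob("/data/train-*.tfrecord"))
        .map(parse_record, num_parallel_calls=tf.data.AUTOTUNE)  # parallel CPU decode
        .batch(128)
        .prefetch(tf.data.AUTOTUNE)  # overlap input pipeline with accelerator compute
    )

    def burst_buffer_checkpoint(ckpt, fast_dir, slow_dir):
        ckpt.write(f"{fast_dir}/ckpt")  # fast, blocking write to the burst buffer
        # Drain the checkpoint files to large, slower storage asynchronously.
        threading.Thread(target=shutil.copytree, args=(fast_dir, slow_dir),
                         kwargs={"dirs_exist_ok": True}, daemon=True).start()
    ```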

    Towards Scalable Web Documents

    Get PDF
    The current Web is running into serious scalability problems. The standard solution is to apply techniques like caching, replication, and distribution. Unfortunately, as the variety of Web applications continues to grow, it will be impossible to find a single solution that fits all needs. The authors advocate a different approach to tackling scaling problems. Instead of seeking a general-purpose solution, they argue that it makes more sense to look at each Web document separately. For each document, three issues need to be addressed: placement of replicas, required coherence, and the best coherence protocol. The authors examine each of these issues and identify the alternatives. However, forcing developers to decide on the best alternatives would turn the Web into an unworkable system; therefore, a number of possible ways to reduce this complexity are indicated. The authors also briefly discuss a wide-area infrastructure that can be used as a flexible basis for developing per-document solutions.
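
    One way to picture the per-document approach is a small policy record attached to each document; the fields mirror the three issues above, but the names and example values are illustrative, not a scheme the authors propose.

    ```python
    # Hypothetical per-document distribution policy.
    from dataclasses import dataclass, field

    @dataclass
    class DocumentPolicy:
        placement: list = field(default_factory=list)  # where replicas live
        coherence: str = "eventual"                    # required consistency
        protocol: str = "invalidate"                   # how replicas stay coherent

    policies = {
        "/news/front.html": DocumentPolicy(["eu-west", "us-east"], "bounded-staleness", "push-update"),
        "/static/logo.png": DocumentPolicy(["everywhere"], "eventual", "ttl-expire"),
        "/cart/session":    DocumentPolicy(["origin-only"], "strong", "primary-copy"),
    }
    ```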

    Accelerated Data Delivery Architecture

    Get PDF
    This paper introduces the Accelerated Data Delivery Architecture (ADDA). ADDA establishes a framework that distributes transactional data and controls consistency to achieve fast data access, distributed scalability, and non-blocking concurrency control through a clean declarative interface. It is designed to be used with web-based business applications. The framework combines a traditional Relational Database Management System (RDBMS) with a distributed Not Only SQL (NoSQL) database and a browser-based database. It uses a single physical and conceptual database schema designed for a standard RDBMS-driven application. The design allows the architect to assign consistency levels to entities, which determine the storage location and query methodology. The implementation of these levels is flexible and requires no database schema changes in order to change the level of an entity. A data leasing system is also employed for critical data items to enforce concurrency control in a non-blocking manner. The system further ensures that all data is available for query from the RDBMS server. This means that the system can have the performance advantages of a distributed DBMS (DDBMS) and the ACID qualities of a single-site RDBMS without the complex design considerations of traditional DDBMS systems.
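
    A hedged sketch of what such declarative levels could look like (the level names and routing are assumptions, not ADDA's actual interface); the point is that changing an entity's level changes only this mapping, never the database schema:

    ```python
    from enum import Enum

    class Level(Enum):
        TRANSACTIONAL = 1  # single-site RDBMS, full ACID
        DISTRIBUTED = 2    # NoSQL replicas, asynchronously synced to the RDBMS
        LOCAL = 3          # browser-based database, synced opportunistically

    ENTITY_LEVELS = {"Order": Level.TRANSACTIONAL,
                     "Catalog": Level.DISTRIBUTED,
                     "UiPreference": Level.LOCAL}

    def store_for(entity):
        # Reads and writes route by declared level; writes at every level are
        # eventually propagated to the RDBMS so all data stays queryable there.
        return {Level.TRANSACTIONAL: "rdbms",
                Level.DISTRIBUTED: "nosql",
                Level.LOCAL: "browser_db"}[ENTITY_LEVELS[entity]]
    ```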

    On the Intrinsic Locality Properties of Web Reference Streams

    Full text link
    There has been considerable work done in the study of Web reference streams: sequences of requests for Web objects. In particular, many studies have looked at the locality properties of such streams, because of the impact of locality on the design and performance of caching and prefetching systems. However, a general framework for understanding why reference streams exhibit given locality properties has not yet emerged. In this work we take a first step in this direction, based on viewing the Web as a set of reference streams that are transformed by Web components (clients, servers, and intermediaries). We propose a graph-based framework for describing this collection of streams and components. We identify three basic stream transformations that occur at nodes of the graph: aggregation, disaggregation, and filtering, and we show how these transformations can be used to abstract the effects of different Web components on their associated reference streams. This view allows a structured approach to the analysis of why reference streams show given properties at different points in the Web. Applying this approach to the study of locality requires good metrics for locality. These metrics must meet three criteria: 1) they must accurately capture temporal locality; 2) they must be independent of trace artifacts such as trace length; and 3) they must not involve manual procedures or model-based assumptions. We describe two metrics meeting these criteria that each capture a different kind of temporal locality in reference streams. The popularity component of temporal locality is captured by entropy, while the correlation component is captured by the interreference coefficient of variation. We argue that these metrics are more natural and more useful than previously proposed metrics for temporal locality. We use this framework to analyze a diverse set of Web reference traces. We find that this framework can shed light on how and why locality properties vary across different locations in the Web topology. For example, we find that filtering and aggregation have opposing effects on the popularity component of temporal locality, which helps to explain why multilevel caching can be effective in the Web. Furthermore, we find that all transformations tend to diminish the correlation component of temporal locality, which has implications for the utility of different cache replacement policies at different points in the Web.
    Funding: National Science Foundation (ANI-9986397, ANI-0095988); CNPq-Brazil
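
    Both metrics are cheap to compute from a trace. The sketch below gives one plausible reading of them: entropy over the request popularity distribution for the popularity component, and the coefficient of variation of inter-reference gaps, pooled across objects, for the correlation component; the paper's exact aggregation may differ.

    ```python
    from collections import Counter
    from math import log2, sqrt

    def entropy(stream):
        # Popularity component: H = -sum(p_i * log2(p_i)) over object frequencies.
        counts, n = Counter(stream), len(stream)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    def interreference_cv(stream):
        # Correlation component: std/mean of the gaps between successive
        # references to the same object.
        last, gaps = {}, []
        for i, obj in enumerate(stream):
            if obj in last:
                gaps.append(i - last[obj])
            last[obj] = i
        mean = sum(gaps) / len(gaps)
        return sqrt(sum((g - mean) ** 2 for g in gaps) / len(gaps)) / mean

    trace = ["a", "b", "a", "a", "c", "b", "a", "c"]
    print(entropy(trace), interreference_cv(trace))  # ~1.5, ~0.39
    ```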

    Cooperative announcement-based caching for video-on-demand streaming

    Get PDF
    Recently, video-on-demand (VoD) streaming services like Netflix and Hulu have gained a lot of popularity. This has led to a strong increase in bandwidth capacity requirements in the network. To reduce this network load, the design of appropriate caching strategies is of the utmost importance. Based on the fact that a video stream is typically segmented temporally into smaller chunks that can be accessed and decoded independently, cache replacement strategies have been developed that take advantage of this temporal structure in the video. In this paper, two caching strategies are proposed that additionally take advantage of binge watching, the phenomenon in which users stream multiple consecutive episodes of the same series, which recent user behavior studies report is becoming everyday behavior. Taking this information into account allows us to predict future segment requests even before the video playout has started. The two proposed strategies differ in the level of coordination between the caches in the network. Using a VoD request trace based on binge-watching user characteristics, the presented algorithms have been thoroughly evaluated in multiple network topologies with different characteristics, showing their general applicability. In a realistic scenario, the proposed election-based caching strategy outperforms the state of the art by 20% in terms of cache hit ratio while using 4% less network bandwidth.
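
    A minimal sketch of the announcement idea under two simplifying assumptions (a one-episode lookahead and a single LRU cache; the paper's election-based coordination between caches is not modeled): a request for the start of episode k announces episode k+1, whose first segments are prefetched before playout reaches them.

    ```python
    from collections import OrderedDict

    class AnnouncingCache:
        def __init__(self, capacity):
            self.capacity = capacity
            self.store = OrderedDict()  # (series, episode, segment) -> data

        def _insert(self, key, data):
            self.store[key] = data
            self.store.move_to_end(key)
            while len(self.store) > self.capacity:
                self.store.popitem(last=False)  # evict least recently used

        def request(self, series, episode, segment, origin):
            key = (series, episode, segment)
            hit = key in self.store
            if not hit:
                self._insert(key, origin(key))  # fetch from origin on a miss
            self.store.move_to_end(key)
            if segment == 0:  # announcement: next episode is likely to follow
                for seg in range(5):  # prefetch its first few segments
                    nxt = (series, episode + 1, seg)
                    if nxt not in self.store:
                        self._insert(nxt, origin(nxt))
            return hit

    cache = AnnouncingCache(capacity=1000)
    cache.request("series_x", 1, 0, origin=lambda key: b"segment-bytes")
    ```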