
    Improving Mobile Video Streaming with Mobility Prediction and Prefetching in Integrated Cellular-WiFi Networks

    We present and evaluate a procedure that utilizes mobility and throughput prediction to prefetch video streaming data in integrated cellular and WiFi networks. The effective integration of such heterogeneous wireless technologies will be significant for supporting high-performance and energy-efficient video streaming in ubiquitous networking environments. Our evaluation is based on trace-driven simulation considering empirical measurements and shows how various system parameters influence the performance, in terms of the number of paused video frames and the energy consumption; these parameters include the number of video streams, the mobile, WiFi, and ADSL backhaul throughput, and the number of WiFi hotspots. Also, we assess the procedure's robustness to time and throughput variability. Finally, we present our initial prototype that implements the proposed approach. Comment: 7 pages, 15 figures

    Metadata And Data Management In High Performance File And Storage Systems

    With the advent of emerging e-Science applications, today's scientific research increasingly relies on petascale-and-beyond computing over large data sets of the same magnitude. While the computational power of supercomputers has recently entered the era of petascale, the performance of their storage systems lags behind by many orders of magnitude. This places an imperative demand on revolutionizing their underlying I/O systems, on which the management of both metadata and data is deemed to have significant performance implications. Prefetching/caching and data locality awareness optimizations, as conventional and effective management techniques for metadata and data I/O performance enhancement, still play crucial roles in current parallel and distributed file systems. In this study, we examine the limitations of existing prefetching/caching techniques and explore the untapped potential of data locality optimization techniques in the new era of petascale computing. For metadata I/O access, we propose a novel weighted-graph-based prefetching technique, built on both direct and indirect successor relationships, to reap performance benefit from prefetching specifically for clustered metadata servers, an arrangement envisioned necessary for petabyte-scale distributed storage systems. For data I/O access, we design and implement Segment-structured On-disk data Grouping and Prefetching (SOGP), a combined prefetching and data placement technique to boost the local data read performance for parallel file systems, especially for those applications with partially overlapped access patterns. A high-performance local I/O software package from the SOGP work for the Parallel Virtual File System, about 2000 lines of C, was released to Argonne National Laboratory in 2007 for potential integration into the production mode

    POOR MAN’S TRACE CACHE: A VARIABLE DELAY SLOT ARCHITECTURE

    We introduce a novel fetch architecture called Poor Man’s Trace Cache (PMTC). PMTC constructs taken-path instruction traces via instruction replication in static code and inserts them after unconditional direct and select conditional direct control transfer instructions. These traces extend to the end of the cache line. Since available space for trace insertion may vary by the position of the control transfer instruction within the line, we refer to these fetch slots as variable delay slots. This approach ensures traces are fetched along with the control transfer instruction that initiated the trace. Branch, jump and return instruction semantics as well as the fetch unit are modified to utilize traces in delay slots. PMTC yields the following benefits: 1. Average fetch bandwidth increases as the front end can fetch across taken control transfer instructions in a single cycle. 2. The dynamic number of instruction cache lines fetched by the processor is reduced as multiple noncontiguous basic blocks along a given path are encountered in one fetch cycle. 3. Replication of a branch instruction along multiple paths provides path separability for branches, which positively impacts branch prediction accuracy. The PMTC mechanism requires minimal modifications to the processor’s fetch unit, and the trace insertion algorithm can easily be implemented within the assembler without compiler support

    Empirical and Statistical Application Modeling Using On-Chip Performance Monitors

    To analyze the performance of applications and architectures, both programmers and architects desire formal methods to explain anomalous behavior. To this end, we present various methods that utilize non-intrusive, performance-monitoring hardware only recently available on microprocessors to provide further explanations of observed behavior. All the methods attempt to characterize and explain the instruction-level parallelism achieved by codes on different architectures. We also present a prototype tool automating the analysis process to exploit the advantages of the empirical and statistical methods proposed. The empirical, statistical and hybrid methods are discussed and explained with case study results provided. The given methods further the wealth of tools available to programmers and architects for generally understanding the performance of scientific applications. Specifically, the models and tools presented provide new methods for evaluating and categorizing application performance. The empirical memory model serves to quantify the hierarchical memory performance of applications by inferring the incurred latencies of codes after the effects of latency-hiding techniques are realized. The instruction-level model and its extensions model on-chip performance analytically, giving insight into inherent performance bottlenecks in superscalar architectures. The statistical model and its hybrid extension provide other methods of categorizing codes via their statistical variations. The PTERA performance tool automates the use of performance counters for use by these methods across platforms, making the modeling process easier still. These unique methods provide alternatives to performance modeling and categorizing not available previously in an attempt to utilize the inherent modeling capabilities of performance monitors on commodity processors for scientific applications

    TailoredRE: A Personalized Cloud-based Traffic Redundancy Elimination for Smartphones

    The exceptional rise in usage of mobile devices such as smartphones and tablets has contributed to a massive increase in wireless network traffic, both Cellular (3G/4G/LTE) and WiFi. The unprecedented growth in wireless network traffic not only strains the battery of the mobile devices but also bogs down the last-hop wireless access links. Interestingly, a significant part of this data traffic exhibits a high level of redundancy due to repeated access of popular contents on the web. Hence, a good amount of research both in academia and in industry has studied, analyzed and designed diverse systems that attempt to eliminate redundancy in the network traffic. Several of the existing Traffic Redundancy Elimination (TRE) solutions either do not improve last-hop wireless access links or involve inefficient use of compute resources from resource-constrained mobile devices. In this research, we propose TailoredRE, a personalized cloud-based traffic redundancy elimination system. The main objective of TailoredRE is to tailor the TRE mechanism such that TRE is performed against selected applications rather than application agnostically, thus improving efficiency by avoiding caching of unnecessary data chunks. In our system, we leverage the rich resources of the cloud to conduct TRE by offloading most of the operational cost from the smartphones or mobile devices to their clones (proxies) available in the cloud. We cluster the multiple individual user clones in the cloud based on the factors of connectedness among users, such as usage of similar applications, common interests in specific web contents, etc., to improve the efficiency of caching in the cloud. This thesis encompasses motivation, system design along with detailed analysis of the results obtained through simulation and real implementation of the TailoredRE system

    Quality-driven management of video streaming services in segment-based cache networks


    Using Rollback Avoidance to Mitigate Failures in Next-Generation Extreme-Scale Systems

    High-performance computing (HPC) systems enable scientists to numerically model complex phenomena in many important physical systems. The next major milestone in the development of HPC systems is the construction of the first supercomputer capable of executing more than an exaflop, 10^18 floating point operations per second. On systems of this scale, failures will occur much more frequently than on current systems. As a result, resilience is a key obstacle to building next-generation extreme-scale systems. Coordinated checkpointing is currently the most widely used mechanism for handling failures on HPC systems. Although coordinated checkpointing remains effective on current systems, increasing the scale of today's systems to build next-generation systems will increase the cost of fault tolerance as more and more time is taken away from the application to protect against or recover from failure. Rollback avoidance techniques seek to mitigate the cost of checkpoint/restart by allowing an application to continue its execution rather than rolling back to an earlier checkpoint when failures occur. These techniques include failure prediction and preventive migration, replicated computation, fault-tolerant algorithms, and software-based memory fault correction. In this thesis, I examine how rollback avoidance techniques can be used to address failures on extreme-scale systems. Using a combination of analytic modeling and simulation, I evaluate the potential impact of rollback avoidance on these systems. I then present a novel rollback avoidance technique that exploits similarities in application memory. Finally, I examine the feasibility of using this technique to protect against memory faults in kernel memory