
    FIFO anomaly is unbounded

    Virtual memory of computers is usually implemented by demand paging. For some page replacement algorithms, the number of page faults may increase as the number of page frames increases. Belady, Nelson and Shedler constructed reference strings for which the FIFO page replacement algorithm produces nearly twice as many page faults in a larger memory than in a smaller one, and conjectured that 2 is a general bound. We prove that this ratio can be arbitrarily large.
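
    The anomaly is easy to reproduce. Below is a minimal Python sketch (ours, not the paper's) that counts FIFO page faults; on Belady's classic reference string it yields 9 faults with 3 frames but 10 with 4.

        from collections import deque

        def fifo_faults(refs, num_frames):
            """Count page faults under FIFO replacement for a given frame count."""
            frames = deque()  # longest-resident page sits at the left end
            faults = 0
            for page in refs:
                if page not in frames:
                    faults += 1
                    if len(frames) == num_frames:
                        frames.popleft()  # evict the page resident longest
                    frames.append(page)
            return faults

        # Belady's classic reference string: more frames, more faults.
        refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
        print(fifo_faults(refs, 3))  # 9 faults
        print(fifo_faults(refs, 4))  # 10 faults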

    GreedyDual-Join: Locality-Aware Buffer Management for Approximate Join Processing Over Data Streams

    We investigate adaptive buffer management techniques for approximate evaluation of sliding window joins over multiple data streams. In many applications, data stream processing systems have limited memory or must handle very high speed data streams. In both cases, computing the exact results of joins between these streams may not be feasible, mainly because the buffers used to compute the joins hold far fewer tuples than the sliding windows themselves. A stream buffer management policy is therefore needed, and we show that the buffer replacement policy is an important determinant of the quality of the produced results. To that end, we propose GreedyDual-Join (GDJ), an adaptive and locality-aware buffering technique for managing these buffers. GDJ exploits the temporal correlations (at both long and short time scales) that we found to be prevalent in many real data streams. Our algorithm is readily applicable to multiple data streams and multiple joins and requires almost no additional system resources. We report results of an experimental study using both synthetic and real-world data sets. Our results demonstrate the superiority and flexibility of our approach when contrasted with other recently proposed techniques.
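
    As a rough illustration of the GreedyDual family of policies that GDJ builds on (the benefit function and names below are our illustration, not the paper's), each buffered entry carries a priority H = L + benefit, where L is a global aging value raised to each victim's priority, so recently useful entries resist eviction:

        import heapq

        class GreedyDualBuffer:
            """Minimal GreedyDual-style buffer with lazy heap deletion.
            How 'benefit' is computed is illustrative; GDJ derives it from
            observed temporal locality of join keys."""

            def __init__(self, capacity):
                self.capacity = capacity
                self.L = 0.0        # global aging value
                self.heap = []      # lazy-deleted (H, key) records
                self.entries = {}   # key -> current priority H

            def access(self, key, benefit):
                """Admit or refresh key; evict the lowest-priority entry when full."""
                if key not in self.entries and len(self.entries) >= self.capacity:
                    while True:
                        h, victim = heapq.heappop(self.heap)
                        if self.entries.get(victim) == h:   # skip stale records
                            del self.entries[victim]
                            self.L = h   # age everything up to the victim's priority
                            break
                h = self.L + benefit     # useful entries float above the aging floor
                self.entries[key] = h
                heapq.heappush(self.heap, (h, key))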

    VIRTUAL MEMORY ON A MANY-CORE NOC

    Many-core devices are likely to become increasingly common in real-time and embedded systems as computational demands grow and as expectations for higher performance can generally only be met by increasing core counts rather than relying on higher clock speeds. Network-on-chip (NoC) devices, where multiple cores share a single slice of silicon and employ packetised communications, are a widely-deployed many-core option for system designers. As NoCs are expected to run larger and more complex programs, the small amount of fast, on-chip memory available to each core is unlikely to be sufficient for all but the simplest of tasks, and it is necessary to find an efficient, effective, and time-bounded means of accessing resources stored in off-chip memory, such as DRAM or Flash storage. The abstraction of paged virtual memory is a familiar technique for managing similar tasks in general computing, but has often been shunned by real-time developers because of concerns about time predictability. We show it can be a poor choice for a many-core NoC system as, unmodified, it typically uses page sizes optimised for interaction with spinning disks rather than solid state media, and transports significant volumes of subsequently unused data across already congested links. In this work we outline and simulate an efficient partial paging algorithm where only those memory resources that are locally accessed are transported between global and local storage. We further show that smaller page sizes add to efficiency. We examine the factors that lead to timing delays in such systems, and show we can predict worst case execution times at even safety-critical thresholds by using statistical methods from extreme value theory. We also show these results are applicable to systems with a variety of connections to memory.
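
    A toy model of the partial-paging idea (the sizes and names are illustrative, not the thesis's simulator) makes the traffic saving concrete: only the small block that is actually touched crosses the interconnect, rather than a whole disk-optimised page:

        PAGE_SIZE = 4096
        BLOCK_SIZE = 128   # smaller transfer unit suited to on-chip links (illustrative)

        local_store = {}   # (page, block) -> bytes, the core's small local memory
        traffic = 0        # bytes moved over the interconnect

        def read(addr, global_memory):
            """Fetch only the block containing addr on a local miss."""
            global traffic
            page, offset = divmod(addr, PAGE_SIZE)
            block = offset // BLOCK_SIZE
            if (page, block) not in local_store:
                start = page * PAGE_SIZE + block * BLOCK_SIZE
                local_store[(page, block)] = global_memory[start:start + BLOCK_SIZE]
                traffic += BLOCK_SIZE  # vs. PAGE_SIZE under whole-page demand paging
            return local_store[(page, block)][offset % BLOCK_SIZE]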

    Hyperswitch communication network

    The Hyperswitch Communication Network (HCN) is a large scale parallel computer prototype being developed at JPL; commercial versions of the HCN computer are planned. The HCN computer being designed is a message passing multiple instruction multiple data (MIMD) computer, and offers many advantages over traditional uniprocessors and bus based multiprocessors in price-performance ratio, reliability and availability, and manufacturing. The HCN operating system provides a uniquely flexible environment that combines both parallel processing and distributed processing. This programming paradigm can achieve a balance among the following competing factors: performance in processing and communications, user friendliness, and fault tolerance. The prototype is being designed to accommodate a maximum of 64 state-of-the-art microprocessors; the HCN is classified as a distributed supercomputer. The HCN system is described, and the performance/cost analysis and other competing factors within the system design are reviewed.

    Index Translation Schemes for Adaptive Computations on Distributed Memory Multicomputers

    Current research in parallel programming is focused on closing the gap between globally indexed algorithms and the separate address spaces of processors on distributed memory multicomputers. A set of index translation schemes has been implemented as part of the CHAOS runtime support library, so that the library functions can be used to implement a global index space across a collection of separate local index spaces. These schemes include software-cached translation schemes aimed at adaptive irregular problems as well as a distributed translation table technique for statically irregular problems. To evaluate and demonstrate the efficiency of the software-cached translation schemes, experiments have been performed with an adaptively irregular loop kernel and a full-fledged 3D DSMC code from NASA Langley on the Intel Paragon and Cray T3D. This paper also discusses and analyzes the operational conditions under which each scheme can produce optimal performance. (Also cross-referenced as UMIACS-TR-95-28.)
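
    As a rough sketch of the distributed translation table idea (the block layout and the lookup_remote callback are our illustration, not the CHAOS API), translation entries are dealt out in fixed-size blocks so any processor can compute which peer owns a given global index, and remote answers are software-cached for reuse on irregular accesses:

        NUM_PROCS = 4
        BLOCK = 1000          # translation entries per owning processor (illustrative)

        translation_cache = {}  # software cache: global index -> (proc, local offset)

        def table_owner(global_index):
            """Processor that stores the translation entry for this global index."""
            return (global_index // BLOCK) % NUM_PROCS

        def translate(global_index, lookup_remote):
            """Resolve global index -> (owning processor, local offset).
            lookup_remote is a hypothetical RPC to the owning processor;
            its answers are cached so repeated irregular accesses avoid
            further communication."""
            if global_index not in translation_cache:
                translation_cache[global_index] = lookup_remote(
                    table_owner(global_index), global_index)
            return translation_cache[global_index]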

    GPU High-Performance Framework for PIC-like Simulation Methods Using the Vulkan® Explicit API

    Within computational continuum mechanics there exists a large category of simulation methods which operate by tracking Lagrangian particles over an Eulerian background grid. These Lagrangian/Eulerian hybrid methods, descendants of the Particle-In-Cell method (PIC), have proven highly effective at simulating a broad range of materials and mechanics including fluids, solids, granular materials, and plasma. These methods remain an area of active research after several decades, and their applications can be found across scientific, engineering, and entertainment disciplines. This thesis presents a GPU-driven PIC-like simulation framework created using the Vulkan® API. Vulkan is a cross-platform and open-standard explicit API for graphics and GPU compute programming. Compared to its predecessors, Vulkan offers lower overhead, support for host parallelism, and finer grain control over both device resources and scheduling. This thesis harnesses those advantages to create a programmable GPU compute pipeline backed by a Vulkan adaptation of the SPgrid data structure and multi-buffered particle arrays. The CPU host system works asynchronously with the GPU to maximize utilization of both the host and device. The framework is demonstrated to be capable of supporting Particle-In-Cell-like simulation methods, making it viable for GPU acceleration of many Lagrangian-particle-on-Eulerian-grid hybrid methods. This novel framework is the first of its kind to be created using Vulkan® and to take advantage of GPU sparse memory features for grid sparsity.
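
    For readers unfamiliar with the numerical pattern being accelerated, here is a minimal 1-D NumPy sketch of the particle-to-grid transfer at the heart of PIC-style methods; the real framework runs such kernels as Vulkan compute shaders over a sparse grid, so this only shows the arithmetic, not the thesis's implementation:

        import numpy as np

        def particles_to_grid(positions, masses, num_cells, dx):
            """Scatter particle mass onto an Eulerian grid with linear weights."""
            grid = np.zeros(num_cells)
            cell = np.floor(positions / dx).astype(int)  # grid node left of particle
            frac = positions / dx - cell                 # fractional offset in [0, 1)
            np.add.at(grid, cell, masses * (1 - frac))   # weight to left node
            np.add.at(grid, cell + 1, masses * frac)     # weight to right node
            return grid

        # Two unit-mass particles deposited onto a 5-node grid with spacing 1.0.
        grid = particles_to_grid(np.array([0.3, 1.7]), np.array([1.0, 1.0]),
                                 num_cells=5, dx=1.0)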

    ACR: An Adaptive Cost-Aware Buffer Replacement Algorithm for Flash Storage Devices

    Flash disks are widely used as an alternative to conventional magnetic disks. Although they are accessed through the same interface by applications, their distinguishing feature, namely the different time costs of read and write operations, makes it necessary to reconsider the design of existing replacement algorithms to leverage their performance potential. Unlike existing flash-aware buffer replacement policies that focus on the asymmetry of read and write operations, we address the discrepancy of that asymmetry across different flash disks, a fact that has existed for a long time but has drawn little attention from researchers, since most existing flash-aware buffer replacement policies are based on the assumption that the cost of a read operation is negligible compared with that of a write operation. In fact, this is not true for current flash disks on the market. We propose an adaptive cost-aware replacement policy (ACR) that uses three cost-based heuristics to select the victim page, and can thus make a fair trade-off between clean pages (whose content remains unchanged) and dirty pages (whose content has been modified); hence, it works well for different types of flash disks with large discrepancy. Further, in ACR, buffer pages are divided into a clean list and a dirty list; newly entered pages are not inserted at the MRU position of either list, but at some position in the middle, so once-requested pages can be flushed out of the buffer quickly while frequently-requested pages stay in the buffer for a longer time. This mechanism makes ACR adaptive to workloads of different access patterns. Experimental results on different traces and flash disks show that ACR not only adaptively tunes itself to workloads of different access patterns, but also works well for different kinds of flash disks compared with existing methods.
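
    A simplified sketch of the clean/dirty split and a cost-weighted victim choice follows; it is inspired by ACR's structure, but the single heuristic and the cost constants are our illustration, not the paper's three heuristics:

        from collections import deque

        READ_COST, WRITE_COST = 1.0, 8.0   # per-device asymmetry (illustrative values)

        clean, dirty = deque(), deque()    # page lists, left = LRU end, right = MRU end

        def choose_victim():
            """Called only when the buffer is full. Weight each list by its miss
            cost: clean pages are cheap to re-read, so they are evicted first
            unless the clean list has shrunk far below the cost balance."""
            if not dirty or (clean and
                             len(clean) * WRITE_COST >= len(dirty) * READ_COST):
                return clean.popleft(), False   # victim needs no flash write-back
            return dirty.popleft(), True        # victim must be written back to flash

        def insert(page, lst):
            """New pages enter mid-list: once-requested pages drain out quickly,
            frequently-requested pages earn their way toward the MRU end."""
            lst.insert(len(lst) // 2, page)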

    Adaptive Prefetching for Device-Independent File I/O

    Device-independent I/O has been a holy grail for operating system designers since the early days of UNIX. Unfortunately, existing operating systems fall short of this goal for multimedia applications. Techniques such as caching and sequential read-ahead can help mask I/O latency in some cases, but in others they increase latency and add substantial jitter. Multimedia applications, such as video players, are sensitive to vagaries in performance, since I/O latency and jitter affect the quality of presentation. Our solution uses adaptive prefetching to reduce both latency and jitter. Applications submit file access plans to the prefetcher, which then generates I/O requests to the operating system and manages the buffer cache to isolate the application from variations in device performance. Our experiments show that device independence can be achieved: an MPEG video player sees the same latency when reading from a local disk or an NFS server. Moreover, our approach reduces jitter substantially.
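
    A minimal sketch of plan-driven prefetching (the class and method names are illustrative, not the paper's API): the application declares its access plan up front, and a background thread stays a fixed number of reads ahead, absorbing device latency and jitter:

        import threading

        class PlanPrefetcher:
            """Walk a declared access plan ahead of the consumer, keeping up to
            DEPTH reads buffered so the reader rarely blocks on the device."""

            DEPTH = 4  # buffered reads to keep ahead of the consumer (illustrative)

            def __init__(self, f, plan):
                self.f, self.plan = f, plan        # plan: ordered (offset, length)
                self.buffers = {}                  # plan index -> prefetched bytes
                self.ready = [threading.Event() for _ in plan]
                self.slots = threading.Semaphore(self.DEPTH)
                threading.Thread(target=self._fill, daemon=True).start()

            def _fill(self):
                for i, (offset, length) in enumerate(self.plan):
                    self.slots.acquire()           # bound memory: stay DEPTH ahead
                    self.f.seek(offset)
                    self.buffers[i] = self.f.read(length)
                    self.ready[i].set()

            def read(self, i):
                self.ready[i].wait()               # blocks only on a prefetch miss
                data = self.buffers.pop(i)
                self.slots.release()               # free the slot for the filler
                return data

        # Usage: PlanPrefetcher(open("movie.mpg", "rb"), [(0, 65536), (65536, 65536)])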

    Working Sets Past and Present
