94 research outputs found

    ISIM: The simulator for the impulse adaptable memory system

    Get PDF
    technical reportThis document describes ISIM, the simulator for the Impulse Adaptable Memory System. Impulse adds two new features to a conventional memory system. First, it supports a configurable, extra level of address remapping at the memory controller. Second, it supports prefetching at the memory controller. consequently, two new units, a remapping controller and a memory controller cache, are added to a traditional memory system to support the new Impulse features. ISIM is based on Paint, a PA-RISC instruction set interpreter. ISIM extends Paint with a detailed Impulse memory system model which includes a primary data cache, a secondary data cache, a system bus, an Impulse memory controller, and a renovated DRAM backend. Note that this document focuses on the Impulse extensions only. The reader should consult the Paint technical report [2] for an overview of the Paint simulation environment and terminology

    Cooperative caching and prefetching in parallel/distributed file systems

    Get PDF
    If we examine the structure of the applications that run on parallel machines, we observe that their I/O needs increase tremendously every day. These applications work with very large data sets which, in most cases, do not fit in memory and have to be kept in the disk. The input and output data files are also very large and have to be accessed very fast. These large applications also want to be able to checkpoint themselves without wasting too much time. These facts constantly increase the expectations placed on parallel and distributed file systems. Thus, these file systems have to improve their performance to avoid becoming the bottleneck in parallel/distributed environments. On the other hand, while the performance of the new processors, interconnection networks and memory increases very rapidly, no such thing happens with the disk performance. This lack of improvement is due to the mechanical parts used to build the disks. These components are slow and limit both the latency and t..

    EFFECTIVE GROUPING FOR ENERGY AND PERFORMANCE: CONSTRUCTION OF ADAPTIVE, SUSTAINABLE, AND MAINTAINABLE DATA STORAGE

    Get PDF
    The performance gap between processors and storage systems has been increasingly critical overthe years. Yet the performance disparity remains, and further, storage energy consumption israpidly becoming a new critical problem. While smarter caching and predictive techniques domuch to alleviate this disparity, the problem persists, and data storage remains a growing contributorto latency and energy consumption.Attempts have been made at data layout maintenance, or intelligent physical placement ofdata, yet in practice, basic heuristics remain predominant. Problems that early studies soughtto solve via layout strategies were proven to be NP-Hard, and data layout maintenance todayremains more art than science. With unknown potential and a domain inherently full of uncertainty,layout maintenance persists as an area largely untapped by modern systems. But uncertainty inworkloads does not imply randomness; access patterns have exhibited repeatable, stable behavior.Predictive information can be gathered, analyzed, and exploited to improve data layouts. Ourgoal is a dynamic, robust, sustainable predictive engine, aimed at improving existing layouts byreplicating data at the storage device level.We present a comprehensive discussion of the design and construction of such a predictive engine,including workload evaluation, where we present and evaluate classical workloads as well asour own highly detailed traces collected over an extended period. We demonstrate significant gainsthrough an initial static grouping mechanism, and compare against an optimal grouping method ofour own construction, and further show significant improvement over competing techniques. We also explore and illustrate the challenges faced when moving from static to dynamic (i.e. online)grouping, and provide motivation and solutions for addressing these challenges. These challengesinclude metadata storage, appropriate predictive collocation, online performance, and physicalplacement. We reduced the metadata needed by several orders of magnitude, reducing the requiredvolume from more than 14% of total storage down to less than 12%. We also demonstrate how ourcollocation strategies outperform competing techniques. Finally, we present our complete modeland evaluate a prototype implementation against real hardware. This model was demonstrated tobe capable of reducing device-level accesses by up to 65%

    Prefetching techniques for client server object-oriented database systems

    Get PDF
    The performance of many object-oriented database applications suffers from the page fetch latency which is determined by the expense of disk access. In this work we suggest several prefetching techniques to avoid, or at least to reduce, page fetch latency. In practice no prediction technique is perfect and no prefetching technique can entirely eliminate delay due to page fetch latency. Therefore we are interested in the trade-off between the level of accuracy required for obtaining good results in terms of elapsed time reduction and the processing overhead needed to achieve this level of accuracy. If prefetching accuracy is high then the total elapsed time of an application can be reduced significantly otherwise if the prefetching accuracy is low, many incorrect pages are prefetched and the extra load on the client, network, server and disks decreases the whole system performance. Access pattern of object-oriented databases are often complex and usually hard to predict accurately. The ..

    Software and hardware methods for memory access latency reduction on ILP processors

    Get PDF
    While microprocessors have doubled their speed every 18 months, performance improvement of memory systems has continued to lag behind. to address the speed gap between CPU and memory, a standard multi-level caching organization has been built for fast data accesses before the data have to be accessed in DRAM core. The existence of these caches in a computer system, such as L1, L2, L3, and DRAM row buffers, does not mean that data locality will be automatically exploited. The effective use of the memory hierarchy mainly depends on how data are allocated and how memory accesses are scheduled. In this dissertation, we propose several novel software and hardware techniques to effectively exploit the data locality and to significantly reduce memory access latency.;We first presented a case study at the application level that reconstructs memory-intensive programs by utilizing program-specific knowledge. The problem of bit-reversals, a set of data reordering operations extensively used in scientific computing program such as FFT, and an application with a special data access pattern that can cause severe cache conflicts, is identified in this study. We have proposed several software methods, including padding and blocking, to restructure the program to reduce those conflicts. Our methods outperform existing ones on both uniprocessor and multiprocessor systems.;The access latency to DRAM core has become increasingly long relative to CPU speed, causing memory accesses to be an execution bottleneck. In order to reduce the frequency of DRAM core accesses to effectively shorten the overall memory access latency, we have conducted three studies at this level of memory hierarchy. First, motivated by our evaluation of DRAM row buffer\u27s performance roles and our findings of the reasons of its access conflicts, we propose a simple and effective memory interleaving scheme to reduce or even eliminate row buffer conflicts. Second, we propose a fine-grain priority scheduling scheme to reorder the sequence of data accesses on multi-channel memory systems, effectively exploiting the available bus bandwidth and access concurrency. In the final part of the dissertation, we first evaluate the design of cached DRAM and its organization alternatives associated with ILP processors. We then propose a new memory hierarchy integration that uses cached DRAM to construct a very large off-chip cache. We show that this structure outperforms a standard memory system with an off-level L3 cache for memory-intensive applications.;Memory access latency has become a major performance bottleneck for memory-intensive applications. as long as DRAM technology remains its most cost-effective position for making main memory, the memory performance problem will continue to exist. The studies conducted in this dissertation attempt to address this important issue. Our proposed software and hardware schemes are effective and applicable, which can be directly used in real-world memory system designs and implementations. Our studies also provide guidance for application programmers to understand memory performance implications, and for system architects to optimize memory hierarchies

    05141 Abstracts Collection -- Power-aware Computing Systems

    Get PDF
    From 03.04.05 to 08.04.05, the Dagstuhl Seminar 05141 ``Power-aware Computing Systems\u27\u27 was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and discussed open problems. Abstracts of the presentations given during the seminar as well as abstracts of seminar results and ideas are collected in this paper. The first section describes the seminar topics and goals. Links to extended abstracts or full papers are provided, if available

    Model-driven dual caching For nomadic service-oriented architecture clients

    Get PDF
    Mobile devices have evolved over the years from resource constrained devices that supported only the most basic tasks to powerful handheld computing devices. However, the most significant step in the evolution of mobile devices was the introduction of wireless connectivity which enabled them to host applications that require internet connectivity such as email, web browsers and maybe most importantly smart/rich clients. Being able to host smart clients allows the users of mobile devices to seamlessly access the Information Technology (IT) resources of their organizations. One increasingly popular way of enabling access to IT resources is by using Web Services (WS). This trend has been aided by the rapid availability of WS packages/tools, most notably the efforts of the Apache group and Integrated Development Environment (IDE) vendors. But the widespread use of WS raises questions for users of mobile devices such as laptops or PDAs; how and if they can participate in WS. Unlike their “wired” counterparts (desktop computers and servers) they rely on a wireless network that is characterized by low bandwidth and unreliable connectivity.The aim of this thesis is to enable mobile devices to host Web Services consumers. It introduces a Model-Driven Dual Caching (MDDC) approach to overcome problems arising from temporarily loss of connectivity and fluctuations in bandwidth
    corecore