31 research outputs found

    An Effective Cache Replacement Policy For Multicore Processors

    Cache management is increasingly important on multicore systems, since the available cache space is shared by a growing number of cores. Optimal caching is generally impossible at the system or hardware level; the goal of cache management is to maximize data reuse. In this paper, a collaborative caching system is implemented that allows a program to choose different caching methods (generally LRU or MRU) for its data. We have developed an LRU-MRU cache replacement policy for multicore processors, simulated using the Multi2Sim simulator. We compare instructions committed per cycle (IPC) and branch prediction accuracy of the multicore processor under different replacement policies: LRU, MRU, FIFO, Random, and LRU-MRU. We use two benchmark suites, SPEC2006 and SPLASH-2, to see how our LRU-MRU replacement behaves and whether its IPC is lower or higher than that of LRU, MRU, FIFO, and Random. We find that the Random replacement policy performs worst on the multicore processor, while LRU and MRU perform almost identically. Here, performance refers to the IPC value: the higher the IPC, the better the performance of the replacement policy used.
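
    To make the policy concrete, the sketch below (not from the paper; the class name, hint convention, and toy trace are illustrative assumptions) shows one plausible reading of a collaborative LRU-MRU cache: accesses hinted as LRU are inserted at the most-recently-used end, while accesses hinted as MRU are parked at the eviction end, so streaming data is sacrificed before reused data.

```python
from collections import OrderedDict

class CollaborativeCache:
    """Sketch of a hinted LRU-MRU cache (hypothetical interface)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()  # keys ordered from eviction end to MRU end
        self.hits = 0
        self.misses = 0

    def access(self, block, hint="LRU"):
        if block in self.store:
            self.hits += 1
            del self.store[block]
        else:
            self.misses += 1
            if len(self.store) >= self.capacity:
                self.store.popitem(last=False)  # always evict from the LRU end
        self.store[block] = True
        # 'LRU' hint -> keep at the MRU end; 'MRU' hint -> park at the eviction end.
        self.store.move_to_end(block, last=(hint == "LRU"))

if __name__ == "__main__":
    cache = CollaborativeCache(capacity=2)
    for block, hint in [("a", "LRU"), ("b", "MRU"), ("c", "MRU"), ("a", "LRU")]:
        cache.access(block, hint)
    print(f"hits={cache.hits} misses={cache.misses}")  # hits=1 misses=3; plain LRU of the same size would score 0 hits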

    CLAM: Compiler Lease of Cache Memory

    Caching is a common solution to the data movement performance bottleneck of today’s computational systems and networks. Traditional caching examines program behavior and cache optimization separately, limiting performance. Recently, a new cache policy called Compiler Lease of cAche Memory (CLAM) has been suggested for program-based cache management. CLAM manages cache memory by allowing the compiler to assign leases, or lifespans, to cached items over a hardware-software interface, known as lease cache. Lease cache affords new performance potential by way of program-driven cache optimization. It is applicable to existing cache architecture optimizations and can be used to emulate other cache policies. This paper presents the first functional hardware implementation of lease cache for CLAM support. The lease cache hardware architecture is presented first, along with CLAM hardware support systems. The cache is emulated on an FPGA and benchmarked using a collection of scientific kernels from the PolyBench/C suite, for three CLAM lease assignment policies: Compiler Assigned Reference Leasing (CARL), Phased Reference Leasing (PRL), and Fixed Uniform Leasing (FUL). CARL and PRL achieve performance superior to Least Recently Used (LRU) replacement, while FUL is shown to serve as a safety mechanism for CLAM. A novel spectrum-based cache tenancy analysis verifies PRL’s effectiveness in limiting cache utilization and can identify changes in the working set that cause the policy to perform adversely. This suggests that CLAM is extendable to more complex workloads if working-set transitions can elicit a similar change in lease policy. Being able to do so could yield appreciable performance improvements for large and highly iterative workloads such as tensor computations.
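
    As a rough illustration of lease-based management, the sketch below assumes leases are measured in logical time (the number of subsequent accesses); the reference IDs and lease values are hypothetical, not CLAM's actual compiler assignments.

```python
class LeaseCache:
    """Sketch of lease-based caching with a compiler-supplied lease table."""

    def __init__(self, lease_table, default_lease=0):
        self.lease_table = lease_table      # reference id -> lease length (in accesses)
        self.default_lease = default_lease  # lease for references not in the table
        self.expiry = {}                    # block -> logical time at which its lease ends
        self.clock = 0
        self.hits = 0
        self.misses = 0

    def access(self, ref_id, block):
        self.clock += 1
        # Evict every block whose lease has expired by now.
        self.expiry = {b: t for b, t in self.expiry.items() if t >= self.clock}
        if block in self.expiry:
            self.hits += 1
        else:
            self.misses += 1
        # Renew (or start) the block's lease as assigned to this reference.
        lease = self.lease_table.get(ref_id, self.default_lease)
        self.expiry[block] = self.clock + lease

if __name__ == "__main__":
    # Hypothetical assignment: reference r1 reuses its datum soon (lease 3),
    # reference r2 streams through data that is never reused (lease 0).
    cache = LeaseCache({"r1": 3, "r2": 0})
    for ref_id, block in [("r1", "x"), ("r2", "s1"), ("r2", "s2"), ("r1", "x")]:
        cache.access(ref_id, block)
    print(f"hits={cache.hits} misses={cache.misses}")  # hits=1 misses=3
```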

    Improving Storage Performance with Non-Volatile Memory-based Caching Systems

    University of Minnesota Ph.D. dissertation. April 2017. Major: Computer Science. Advisor: David Du. 1 computer file (PDF); ix, 104 pages.

    With the rapid development of new types of non-volatile memory (NVRAM), e.g., 3D XPoint, NVDIMM, and STT-MRAM, these technologies have been or will be integrated into current computer systems to work together with traditional DRAM. Compared with DRAM, which loses data when the power fails or the system crashes, NVRAM's non-volatile nature makes it a better candidate for caching. In the meantime, storage performance needs to keep up to process and accommodate the rapidly generated amounts of data around the world (a.k.a. the big data problem). Throughout my Ph.D. research, I have been focusing on building novel NVRAM-based caching systems to provide cost-effective ways to improve storage system performance. To show the benefits of designing novel NVRAM-based caching systems, I target four representative storage devices and systems: solid state drives (SSDs), hard disk drives (HDDs), disk arrays, and high-performance computing (HPC) parallel file systems (PFSs).

    For SSDs, to mitigate their wear-out problem and extend their lifespan, we propose two NVRAM-based buffer cache policies which can work together in different layers to maximally reduce SSD write traffic: a main memory buffer cache design named Hierarchical Adaptive Replacement Cache (H-ARC) and an internal SSD write buffer design named Write Traffic Reduction Buffer (WRB). H-ARC considers four factors (dirty, clean, recency, and frequency) to reduce write traffic and improve cache hit ratios in the host. WRB further reduces block erasures and write traffic inside an SSD by effectively exploiting temporal and spatial localities.

    For HDDs, to exploit their fast sequential access speed to improve I/O throughput, we propose a buffer cache policy, named I/O-Cache, that regroups and synchronizes long sets of consecutive dirty pages to take advantage of HDDs' fast sequential access speed and the non-volatile property of NVRAM. In addition, our new policy can dynamically separate the whole cache into a dirty cache and a clean cache, according to the characteristics of the workload, to decrease storage writes.

    For disk arrays, although numerous cache policies have been proposed, most are either targeted at main memory buffer caches or manage NVRAM as write buffers and separately manage DRAM as read caches. To the best of our knowledge, cooperative hybrid volatile and non-volatile memory buffer cache policies specifically designed for storage systems using newer NVRAM technologies have not been well studied. Based on our elaborate study of storage server block I/O traces, we propose a novel cooperative HybrId NVRAM and DRAM Buffer cACHe polIcy for storage arrays, named Hibachi. Hibachi treats read cache hits and write cache hits differently to maximize cache hit rates and judiciously adjusts the clean and the dirty cache sizes to capture workloads' tendencies. In addition, it converts random writes to sequential writes for high disk write throughput and further exploits storage server I/O workload characteristics to improve read performance.

    For modern complex HPC systems (e.g., supercomputers), data generated during checkpointing are bursty and so dominate HPC I/O traffic that relying solely on PFSs will slow down the whole HPC system. In order to increase HPC checkpointing speed, we propose an NVRAM-based burst buffer coordination system for PFSs, named collaborative distributed burst buffer (CDBB). Inspired by our observations of HPC application execution patterns and experiments on HPC clusters, we design CDBB to coordinate all the available burst buffers, based on their priorities and states, to help overburdened burst buffers and maximize resource utilization.
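
    The dirty/clean split that recurs in H-ARC, I/O-Cache, and Hibachi can be pictured with the sketch below; it is an illustrative simplification rather than the dissertation's algorithms, and the adaptation rule and toy workload are assumptions.

```python
from collections import OrderedDict

class SplitBufferCache:
    """Sketch of a dirty/clean split NVRAM buffer cache (hypothetical, simplified)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.clean = OrderedDict()         # clean pages in LRU order
        self.dirty = OrderedDict()         # dirty pages in LRU order
        self.dirty_target = capacity // 2  # adaptive split point between the two lists
        self.flushes = 0                   # write-backs forced by eviction

    def _evict_one(self):
        # Prefer evicting a clean page: evicting dirty data costs a device write,
        # which the NVRAM cache is trying to minimise.
        if len(self.dirty) > self.dirty_target and self.dirty:
            self.dirty.popitem(last=False)
            self.flushes += 1
        elif self.clean:
            self.clean.popitem(last=False)
        else:
            self.dirty.popitem(last=False)
            self.flushes += 1

    def access(self, page, is_write):
        hit_list = self.dirty if page in self.dirty else (self.clean if page in self.clean else None)
        # ARC-style nudge: grow whichever side the workload is currently hitting.
        if hit_list is self.dirty:
            self.dirty_target = min(self.capacity, self.dirty_target + 1)
        elif hit_list is self.clean:
            self.dirty_target = max(0, self.dirty_target - 1)
        self.clean.pop(page, None)
        self.dirty.pop(page, None)
        if hit_list is None and len(self.clean) + len(self.dirty) >= self.capacity:
            self._evict_one()
        # Writes always land in the dirty list; reads keep the page's current state.
        target = self.dirty if (is_write or hit_list is self.dirty) else self.clean
        target[page] = None
        return hit_list is not None

if __name__ == "__main__":
    cache = SplitBufferCache(capacity=4)
    for page, is_write in [(1, True), (2, False), (3, True), (1, True),
                           (4, False), (5, False), (1, True)]:
        cache.access(page, is_write)
    print(f"pages written back under eviction pressure: {cache.flushes}")  # 0 here: clean pages absorbed the evictions
```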

    Optimizing Hierarchical Storage Management For Database System

    Caching is a classic and effective way to improve system performance. Servers, such as database servers and storage servers, therefore contain significant amounts of memory that acts as a fast cache. Meanwhile, as new storage devices such as flash-based solid state drives (SSDs) are added to storage systems over time, using the memory cache is not the only way to improve system performance. In this thesis, we address the problems of how to manage the cache of a storage server and how to utilize the SSD in a hybrid storage system. Traditional caching policies are known to perform poorly for storage server caches. One promising approach to solving this problem is to use hints from the storage clients to manage the storage server cache. Previous hinting approaches are ad hoc, in that a predefined reaction to specific types of hints is hard-coded into the caching policy. With ad hoc approaches, it is difficult to ensure that the best hints are being used, and it is difficult to accommodate multiple types of hints and multiple client applications. In this thesis, we propose CLient-Informed Caching (CLIC), a generic hint-based technique for managing storage server caches. CLIC automatically interprets hints generated by storage clients and translates them into a server caching policy. It does this without explicit knowledge of the application-specific hint semantics. We demonstrate using trace-based simulation of database workloads that CLIC outperforms hint-oblivious and state-of-the-art hint-aware caching policies. We also demonstrate that the space required to track and interpret hints is small. SSDs are becoming a part of the storage system. Adding SSDs to a storage system not only raises the question of how to manage the SSD, but also the question of whether current buffer pool algorithms will still work effectively. We are interested in the use of hybrid storage systems, consisting of SSDs and hard disk drives (HDDs), for database management. We present cost-aware replacement algorithms for both the DBMS buffer pool and the SSD. These algorithms are aware of the different I/O performance of HDDs and SSDs. In such a hybrid storage system, the physical access pattern to the SSD depends on the management of the DBMS buffer pool. We study the impact of buffer pool caching policies on the access patterns of the SSD. Based on these studies, we design a caching policy to effectively manage the SSD. We implement these algorithms in MySQL's InnoDB storage engine and use the TPC-C workload to demonstrate that these cost-aware algorithms outperform previous algorithms.
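
    One way to picture cost-aware replacement for a hybrid SSD/HDD buffer pool is the classic GreedyDual scheme, shown below as a stand-in for the algorithms in the thesis (the class name, cost values, and trace are assumptions): a page's eviction priority is its re-fetch cost on the device that backs it, so cheap-to-reload SSD pages are evicted before expensive HDD pages.

```python
class CostAwareBufferPool:
    """Sketch of cost-aware replacement using GreedyDual priorities (hypothetical)."""

    def __init__(self, capacity, refetch_cost):
        self.capacity = capacity
        self.refetch_cost = refetch_cost   # e.g. {"ssd": 1.0, "hdd": 10.0}
        self.inflation = 0.0               # GreedyDual aging term
        self.priority = {}                 # page -> current priority
        self.device = {}                   # page -> backing device

    def access(self, page, device):
        hit = page in self.priority
        if not hit and len(self.priority) >= self.capacity:
            # Evict the lowest-priority page and raise the aging term so that
            # long-resident cheap pages eventually give way to newcomers.
            victim = min(self.priority, key=self.priority.get)
            self.inflation = self.priority.pop(victim)
            self.device.pop(victim)
        # On a hit or an insert, value the page at its re-fetch cost above the baseline.
        self.device[page] = device
        self.priority[page] = self.inflation + self.refetch_cost[device]
        return hit

if __name__ == "__main__":
    pool = CostAwareBufferPool(capacity=2, refetch_cost={"ssd": 1.0, "hdd": 10.0})
    pool.access("a", "hdd")   # expensive to re-read
    pool.access("b", "ssd")   # cheap to re-read
    pool.access("c", "ssd")   # pool is full: the SSD-backed page "b" is evicted first
    print(sorted(pool.priority))  # ['a', 'c']
```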

    Performance Analysis and Optimisation of In-network Caching for Information-Centric Future Internet

    The rapid development in wireless technologies and multimedia services has radically shifted the major function of the current Internet from host-centric communication to service-oriented content dissemination, resulting in a mismatch between the protocol design and the current usage patterns. Motivated by this significant change, Information-Centric Networking (ICN), which has been attracting ever-increasing attention from the communication networks research community, has emerged as a new clean-slate networking paradigm for the future Internet. Through identifying and routing data by unified names, ICN aims at providing natural support for efficient information retrieval over the Internet. As a crucial characteristic of ICN, in-network caching enables users to efficiently access popular contents from on-path routers equipped with ubiquitous caches, leading to the enhancement of service quality and reduction of network loads. Performance analysis and optimisation have been and continue to be key research interests in ICN. This thesis focuses on the development of efficient and accurate analytical models for the performance evaluation of ICN caching and the design of optimal caching management schemes under practical network configurations. This research starts with the proposition of a new analytical model for caching performance under bursty multimedia traffic. The bursty characteristic is captured and closed-form formulas for the cache hit ratio are derived. To investigate the impact of topology and heterogeneous caching parameters on the performance, a comprehensive analytical model is developed to gain valuable insight into the caching performance with heterogeneous cache sizes, service intensity and content distribution under arbitrary topologies. The accuracy of the proposed models is validated by comparing the analytical results with those obtained from extensive simulation experiments. The analytical models are then used as cost-efficient tools to investigate the impact of key network and content parameters on the performance of caching in ICN. Bursty traffic and heterogeneous caching features have a significant influence on the performance of ICN. Therefore, in order to obtain optimal performance results, a caching resource allocation scheme, which leverages the proposed model and aims at minimising the total traffic within the network and improving the hit probability at the nodes, is proposed. The performance results reveal that the caching allocation scheme can achieve better caching performance and network resource utilisation than the default homogeneous and random caching allocation strategies. To attain a thorough understanding of the trade-off between the economic aspect and service quality, a cost-aware Quality-of-Service (QoS) optimisation caching mechanism is further designed, aiming for cost-efficiency and QoS guarantees in ICN. A cost model is proposed to take into account the installation and operation cost of ICN under a realistic ISP network scenario, and a QoS model is presented to formulate the service delay and delay jitter in the presence of heterogeneous service requirements and a general probabilistic caching strategy. Numerical results show the effectiveness of the proposed mechanism in achieving better service quality and lower network cost. In this thesis, the proposed analytical models are used to efficiently and accurately evaluate the performance of ICN and investigate the key performance metrics. 
Leveraging the insights discovered by the analytical models, the proposed caching management schemes are able to optimise and enhance the performance of ICN. To extend the outcomes achieved in this thesis, several interesting yet challenging research directions are pointed out.
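
    As a loose illustration of model-driven cache allocation (a sketch only; the greedy marginal-gain rule, the Zipf popularity assumption, and the router parameters below are not taken from the thesis), a total cache budget can be spread across routers by repeatedly giving the next cache slot to the router where it saves the most expected traffic.

```python
import heapq

def allocate_cache(budget, routers, zipf_alpha=0.8, catalogue=1000):
    """Greedy cache-budget allocation across routers (illustrative assumptions)."""
    # Zipf popularity of content ranked 1..catalogue.
    weights = [1.0 / (r ** zipf_alpha) for r in range(1, catalogue + 1)]
    total = sum(weights)
    popularity = [w / total for w in weights]

    allocation = {name: 0 for name, _, _ in routers}
    # Max-heap of (negated marginal traffic saving, router name, request rate, hops saved).
    heap = [(-rate * hops * popularity[0], name, rate, hops)
            for name, rate, hops in routers]
    heapq.heapify(heap)
    for _ in range(budget):
        neg_gain, name, rate, hops = heapq.heappop(heap)
        allocation[name] += 1
        nxt = allocation[name]  # popularity rank of the next content this router would capture
        if nxt < catalogue:
            heapq.heappush(heap, (-rate * hops * popularity[nxt], name, rate, hops))
    return allocation

if __name__ == "__main__":
    # Hypothetical routers: (name, request rate, hop distance saved per hit).
    routers = [("edge", 100.0, 4), ("aggregation", 60.0, 2), ("core", 30.0, 1)]
    print(allocate_cache(budget=20, routers=routers))
```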

    CACHE MANAGEMENT ALGORITHMS: SINGLE AND NETWORKED CACHES

    Ph.D. (Doctor of Philosophy)

    Enabling Distributed Applications Optimization in Cloud Environment

    The past few years have seen dramatic growth in the popularity of public clouds, such as Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Container-as-a-Service (CaaS). In both commercial and scientific fields, quick environment setup and application deployment have become mandatory requirements. As a result, more and more organizations choose cloud environments instead of setting up the environment themselves from scratch. Cloud computing resources such as server engines, orchestration, and the underlying server resources are served to users as a service from a cloud provider. Most of the applications that run in public clouds are distributed applications, also called multi-tier applications, which require a set of servers, a service ensemble, that cooperate and communicate to jointly provide a certain service or accomplish a task. However, only a few research efforts have been devoted to providing an overall solution for distributed application optimization in the public cloud. In this dissertation, we present three systems that enable distributed application optimization: (1) the first part introduces DocMan, a toolset for detecting containerized applications’ dependencies in CaaS clouds; (2) the second part introduces a system to deal with hot/cold blocks in distributed applications; (3) the third part introduces FP4S, a novel fragment-based parallel state recovery mechanism that can handle many simultaneous failures for a large number of concurrently running stream applications.

    Using Tracing To Enhance Data Cache Performance in CPUs: The creation of a Trace-Assisted Cache to increase cache hits and decrease runtime

    The processor-memory gap is widening every year with no prospect of reprieve. More and more latency is being added to program runtimes as memory cannot satisfy the demands of CPUs quickly enough. In the past, this has been alleviated through caches of increasing complexity or techniques like prefetching, to give the illusion of faster memory. However, these techniques have drawbacks because they are reactive or rely on incomplete information, and in general this leads to large amounts of latency in programs due to processor stalls. It is our contention that by tracing a program's data accesses and feeding this information back to the cache, overall program runtime can be reduced. This is achieved through a new piece of hardware called a Trace-Assisted Cache (TAC), which uses traces to gain foreknowledge of the memory requests the processor is likely to make, allowing them to be actioned before the processor requests the data and overlapping memory and computation instructions. Comparing the TAC against a standard CPU without a cache, we see improvements in runtimes of up to 65%. However, we see degraded performance of around 8% on average when compared to set-associative and direct-mapped caches, because the improvements are swamped by high overheads and synchronisation times between components. We also see that benchmarks exhibiting certain qualities, namely a balance of computation and memory instructions and data that is kept well spread out in memory, fare better with the TAC than other benchmarks on the same hardware. Overall, this demonstrates that whilst there is potential to reduce runtime by increasing the agency of the cache through trace assistance, it requires a highly efficient implementation to be competitive; otherwise, any potential gains are negated by the increase in overheads.
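
    A software sketch of the trace-assisted idea is shown below (the real TAC is a hardware unit; the lookahead scheme, the toy trace, and the hit/miss accounting here are simplifying assumptions): a recorded address trace from a prior run is replayed ahead of the demand stream to warm the cache before the processor asks.

```python
from collections import OrderedDict

class TraceAssistedCache:
    """Sketch of trace-assisted prefetching into an LRU cache (hypothetical model)."""

    def __init__(self, capacity, trace, lookahead=2):
        self.capacity = capacity
        self.trace = trace          # recorded address sequence from a prior run
        self.lookahead = lookahead  # how far ahead of the program the prefetcher runs
        self.lines = OrderedDict()  # cached addresses in LRU order
        self.pos = 0                # position of the current access in the trace
        self.hits = 0
        self.misses = 0

    def _install(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)
        self.lines[addr] = None

    def access(self, addr):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)
        else:
            self.misses += 1
            self._install(addr)
        # Use the recorded trace to warm the cache ahead of the processor.
        for future in self.trace[self.pos + 1: self.pos + 1 + self.lookahead]:
            self._install(future)
        self.pos += 1

if __name__ == "__main__":
    trace = [0x10, 0x20, 0x30, 0x10, 0x40, 0x20]
    cache = TraceAssistedCache(capacity=4, trace=trace, lookahead=2)
    for addr in trace:          # replay matches the recorded trace exactly
        cache.access(addr)
    print(f"hits={cache.hits} misses={cache.misses}")  # hits=5 misses=1
```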