
4 research outputs found

CPHASH: A cache-partitioned hash table

Author: Kaashoek M. Frans
Metreveli Zviad
Zeldovich Nickolai
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 26/11/2011

CPHash is a concurrent hash table for multicore processors. CPHash partitions its table across the caches of cores and uses message passing to transfer lookups/inserts to a partition. CPHash's message passing avoids the need for locks, pipelines batches of asynchronous messages, and packs multiple messages into a single cache line transfer. Experiments on an 80-core machine with 2 hardware threads per core show that CPHash has ~1.6x higher throughput than a hash table implemented using fine-grained locks. An analysis shows that CPHash wins because it experiences fewer cache misses and its cache misses are less expensive, owing to less contention for the on-chip interconnect and DRAM. CPServer, a key/value cache server using CPHash, achieves ~5% higher throughput than a key/value cache server that uses a hash table with fine-grained locks, and both achieve better throughput and scalability than memcached. The throughput of CPHash and CPServer also scales near-linearly with the number of cores.

Funding: Quanta Computer (Firm); National Science Foundation (U.S.), Award 915164

CiteSeerX

DSpace@MIT

Crossref

A Software Approach to Unifying Multicore Caches

Author: Boyd-Wickizer Silas
Kaashoek M. Frans
Morris Robert
Zeldovich Nickolai
Publication date: 28/06/2011

Multicore chips will have large amounts of fast on-chip cache memory, along with relatively slow DRAM interfaces. The on-chip cache memory, however, will be fragmented and spread over the chip; this distributed arrangement is hard for certain kinds of applications to exploit efficiently, and can lead to needless slow DRAM accesses. First, data accessed from many cores may be duplicated in many caches, reducing the amount of distinct data cached. Second, data in a cache distant from the accessing core may be slow to fetch via the cache coherence protocol. Third, software on each core can only allocate space in the small fraction of total cache memory that is local to that core. A new approach called software cache unification (SCU) addresses these challenges for applications that would be better served by a large shared cache. SCU chooses the on-chip cache in which to cache each item of data. As an application thread reads data items, SCU moves the thread to the core whose on-chip cache contains each item. This allows the thread to read the data quickly if it is already on-chip; if it is not, moving the thread causes the data to be loaded into the chosen on-chip cache. A new file cache for Linux, called MFC, uses SCU to improve performance of file-intensive applications, such as Unix file utilities. An evaluation on a 16-core AMD Opteron machine shows that MFC improves the throughput of file utilities by a factor of 1.6. Experiments with a platform that emulates future machines with less DRAM throughput per core show that MFC will provide benefit to a growing range of applications.

This material is based upon work supported by the National Science Foundation under grant number 0915164

CiteSeerX

DSpace@MIT

Compiler and Runtime Optimizations for Fine-Grained Distributed Shared Memory Systems

Author: Veldema R.S.
Publication date: 01/01/2003

Promotor: Bal, H.E.

VU Research Portal

Dynamic Computation Migration in DSM Systems

Author: M. Frans Kaashoek
William E. Weihl
Wilson C. Hsieh
Publication venue: Society Press
Publication date: 01/01/1996

Dynamic computation migration is the runtime choice between computation and data migration. Dynamic computation migration speeds up access to concurrent data structures with unpredictable read/write patterns. This paper describes the design, implementation, and evaluation of dynamic computation migration in a multithreaded distributed shared-memory system, MCRL. Two policies are studied, STATIC and REPEAT. Both migrate computation for writes. STATIC migrates data for reads, while REPEAT maintains a limited history of accesses and sometimes migrates computation for reads. On a concurrent, distributed B-tree with 50% lookups and 50% inserts, STATIC improves performance by about 17% on both Alewife and the CM-5. REPEAT generally performs better than STATIC. With 80% lookups and 20% inserts, REPEAT improves performance by 23% on Alewife, and by 46% on the CM-5.

Keywords: computation migration, data migration, replication, coherence

CiteSeerX

Crossref