1,474 research outputs found

    The Virtual Block Interface: A Flexible Alternative to the Conventional Virtual Memory Framework

    Computers continue to diversify with respect to system designs, emerging memory technologies, and application memory demands. Unfortunately, continually adapting the conventional virtual memory framework to each possible system configuration is challenging, and often results in performance loss or requires non-trivial workarounds. To address these challenges, we propose a new virtual memory framework, the Virtual Block Interface (VBI). We design VBI based on the key idea that delegating memory management duties to hardware can reduce the overheads and software complexity associated with virtual memory. VBI introduces a set of variable-sized virtual blocks (VBs) to applications. Each VB is a contiguous region of the globally-visible VBI address space, and an application can allocate each semantically meaningful unit of information (e.g., a data structure) in a separate VB. VBI decouples access protection from memory allocation and address translation: while the OS controls which programs have access to which VBs, dedicated hardware in the memory controller manages the physical memory allocation and address translation of the VBs. This approach enables several architectural optimizations that (1) efficiently and flexibly cater to increasingly diverse system configurations, and (2) eliminate key inefficiencies of conventional virtual memory. We demonstrate the benefits of VBI with two important use cases: (1) reducing the overheads of address translation (for both native execution and virtual machine environments), as VBI reduces the number of translation requests and associated memory accesses; and (2) two heterogeneous main memory architectures, where VBI increases the effectiveness of managing fast memory regions. For both cases, VBI significantly improves performance over conventional virtual memory.
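    To make the abstract's decoupling concrete, here is a minimal, self-contained C++ sketch of the idea (an illustration only, not the paper's interface; every type and name below is hypothetical): an OS-side table records only access rights per (process, VB) pair, while a separate controller-side table owns physical allocation and translation.

```cpp
// Hypothetical illustration of VBI's separation of concerns; not the
// paper's API. The OS tracks only *who may access which VB*; a separate
// "memory controller" component owns allocation and translation.
#include <cstdint>
#include <cstdio>
#include <map>
#include <optional>
#include <set>
#include <utility>

using VBId = uint32_t;

// OS-side state: access protection only (no physical addresses here).
std::set<std::pair<int, VBId>> access_table;  // allowed (process id, VB) pairs

// Controller-side state: physical allocation and translation only.
struct VBMapping { uint64_t phys_base; uint64_t size; };
std::map<VBId, VBMapping> translation_table;

// The controller translates (VB, offset) to a physical address; it never
// needs to know which process issued the access.
std::optional<uint64_t> translate(VBId vb, uint64_t offset) {
    auto it = translation_table.find(vb);
    if (it == translation_table.end() || offset >= it->second.size)
        return std::nullopt;                  // unallocated VB or out of bounds
    return it->second.phys_base + offset;
}

int main() {
    // OS grants process 7 access to VB 1; the controller backs VB 1
    // with a 4 KiB physical region.
    access_table.insert({7, 1});
    translation_table[1] = {0x100000, 4096};

    int pid = 7; VBId vb = 1; uint64_t off = 128;
    if (access_table.count({pid, vb})) {      // protection check (OS state)
        if (auto pa = translate(vb, off))     // translation (controller state)
            printf("process %d: VB %u + %llu -> phys 0x%llx\n",
                   pid, vb,
                   (unsigned long long)off, (unsigned long long)*pa);
    }
}
```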

    Optimizing the MapReduce Framework on Intel Xeon Phi Coprocessor

    With its ease of programming, flexibility, and efficiency, MapReduce has become one of the most popular frameworks for building big-data applications. MapReduce was originally designed for distributed computing, and has since been extended to various architectures, e.g., multi-core CPUs, GPUs, and FPGAs. In this work, we focus on optimizing the MapReduce framework on the Xeon Phi, Intel's latest product based on the Many Integrated Core (MIC) architecture. To the best of our knowledge, this is the first work to optimize the MapReduce framework on the Xeon Phi. We utilize advanced features of the Xeon Phi to achieve high performance. To take advantage of the SIMD vector processing units, we propose a vectorization-friendly technique for the map phase that assists auto-vectorization, and we develop SIMD hash computation algorithms. Furthermore, we utilize MIMD hyper-threading to pipeline the map and reduce phases and improve resource utilization. For some applications, we also eliminate multiple local arrays in favor of low-cost atomic operations on a global array, which improves thread scalability and data locality thanks to the coherent L2 caches. Finally, for a given application, our framework can either automatically detect suitable techniques to apply or provide guidelines to users at compilation time. We conduct comprehensive experiments to benchmark the Xeon Phi and compare our optimized MapReduce framework with a state-of-the-art multi-core-based MapReduce framework (Phoenix++). Evaluating six real-world applications, the experimental results show that our optimized framework is 1.2X to 38X faster than Phoenix++ on the Xeon Phi.
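    The global-array optimization mentioned above can be illustrated with a short, self-contained C++ sketch (an illustration only, not the paper's code; all names are hypothetical): instead of each map thread filling a private local array that must later be merged, the threads issue low-cost atomic increments directly on one shared array.

```cpp
// Hypothetical sketch of the "atomic operations on a global array"
// technique: per-thread local histograms are replaced by relaxed atomic
// increments on a single shared array, removing the merge step.
#include <algorithm>
#include <atomic>
#include <cstdio>
#include <thread>
#include <vector>

constexpr int kBuckets = 256;
std::atomic<unsigned> global_hist[kBuckets];  // zero-initialized shared array

// Each "map" thread classifies its slice of the input and updates the
// shared array directly with an atomic increment.
void map_worker(const std::vector<unsigned char>& data,
                size_t begin, size_t end) {
    for (size_t i = begin; i < end; ++i)
        global_hist[data[i]].fetch_add(1, std::memory_order_relaxed);
}

int main() {
    std::vector<unsigned char> data(1 << 20);
    for (size_t i = 0; i < data.size(); ++i) data[i] = i % kBuckets;

    const unsigned n = std::max(1u, std::thread::hardware_concurrency());
    std::vector<std::thread> pool;
    const size_t chunk = data.size() / n;
    for (unsigned t = 0; t < n; ++t) {
        size_t begin = t * chunk;
        size_t end = (t + 1 == n) ? data.size() : begin + chunk;
        pool.emplace_back(map_worker, std::cref(data), begin, end);
    }
    for (auto& th : pool) th.join();

    printf("bucket 0 count: %u\n", global_hist[0].load());
}
```

    Whether this beats per-thread local arrays depends on contention and on the cache-coherence cost of the shared lines, which is why the abstract applies it only to some applications.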

    Characterization and Acceleration of High Performance Compute Workloads
