667 research outputs found

A Transparent Runtime Data Distribution Engine for OpenMP

Authors: Ayguade E.; Labarta J.; Nikolopoulos Dimitrios; Papatheodorou T.S.; Polychronopoulos C.D.
Publication date: 01/12/2000

Queen's University Belfast Research Portal

Programming a Distributed System Using Shared Objects

Authors: Bal H.E.; Kaashoek M.F.; Tanenbaum A.S.
Publication date: 01/01/1993

Building the hardware for a high-performance distributed computer system is a lot easier than building its software. The authors describe a model for programming distributed systems based on abstract data types that can be replicated on all machines that need them. Read operations are done locally, without requiring network traffic. Writes can be done using a reliable broadcast algorithm if the hardware supports broadcasting; otherwise, a point-to-point protocol is used. The authors have built such a system based on the Amoeba microkernel, and implemented a language, Orca, on top of it. For Orca applications that have a high ratio of reads to writes, they measure good speedups on a system with 16 processors.

VU Research Portal

MAi: Memory Affinity Interface

Authors: Méhaut Jean-François; Pousa Ribeiro Christiane
Publication venue: HAL CCSD
Publication date: 01/01/2010

In this document, we describe an interface called MAI. This interface allows developers to manage memory affinity in NUMA architectures. The affinity unit in MAI is an array of the parallel application. A set of memory policies implemented in MAI can be applied to these arrays in a simple way. High-level functions implemented in MAI minimize developers' work when managing memory affinity on NUMA machines. MAI's performance has been evaluated on two different NUMA machines using several parallel applications. Results obtained with MAI show important gains when compared with the standard memory affinity solutions.

INRIA a CCSD electronic archive server

Minas: Memory Affinity Management Framework

Authors: Méhaut Jean-François; Pousa Ribeiro Christiane
Publication venue: HAL CCSD
Publication date: 01/01/2009

In this document, we introduce Minas, a memory affinity management framework for cache-coherent NUMA (Non-Uniform Memory Access) platforms, which provides either explicit or automatic memory affinity management, with efficiency and architecture abstraction, for numerical scientific applications. The explicit tuning is based on an API called MAi (Memory Affinity interface), which provides simple functions to manage allocation and data placement using an extensive set of memory policies. An automatic tuning mechanism is provided by the preprocessor named MApp (Memory Affinity preprocessor). MApp analyses both the application source code and the target cache-coherent NUMA platform characteristics in order to automatically apply MAi functions at compile time. Minas's efficiency and architecture abstraction have been evaluated on two cache-coherent NUMA platforms using three numerical scientific HPC applications. The results have shown significant gains when compared to other solutions available on Linux (First-touch, libnuma and numactl).

INRIA a CCSD electronic archive server

Hybrid Caching for Chip Multiprocessors Using Compiler-Based Data Classification

Author: Li Yong
Publication date: 26/01/2011

The high performance delivered by modern computer systems keeps scaling with an increasing number of processors connected using a distributed on-chip network. As a result, memory access latency, largely dominated by remote data cache access and inter-processor communication, is becoming a critical performance bottleneck. To relieve this problem, it is necessary to localize data access as much as possible while keeping on-chip cache memory utilization efficient. Achieving this, however, is application dependent and needs a keen insight into the memory access characteristics of the applications. This thesis demonstrates how, using fairly simple and thus inexpensive compiler analysis, memory accesses can be classified into private data access and shared data access. In addition, we introduce a third classification named probably private access and demonstrate the impact of this category compared to the traditional private and shared memory classification. The memory access classification information from the compiler analysis is then provided to the runtime system through a modified memory allocator and page table to facilitate a hybrid private-shared caching technique. The hybrid cache mechanism is aware of the different data access classifications and adopts appropriate placement and search policies accordingly to improve performance. Our analysis demonstrates that many applications have a significant amount of both private and shared data and that compiler analysis can identify the private data effectively for many applications. Experimental results show that the implemented hybrid caching scheme achieves a 4.03% performance improvement over state-of-the-art NUCA-based caching.

D-Scholarship@Pitt

Multicore Architecture-aware Scientific Applications

Author: Srinivasa Avinash
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2011

Modern high performance systems are becoming increasingly complex and powerful due to advancements in processor and memory architecture. In order to keep up with this increasing complexity, applications have to be augmented with certain capabilities to fully exploit such systems. These may be at the application level, such as static or dynamic adaptations, or at the system level, such as having strategies in place to override some of the default operating system policies, the main objective being to improve the computational performance of the application. The current work proposes two such capabilities with respect to multi-threaded scientific applications, in particular a large-scale physics application computing ab-initio nuclear structure. The first involves using a middleware tool to invoke dynamic adaptations in the application, so as to be able to adjust to the changing computational resource availability at run-time. The second involves a strategy for effective placement of data in main memory, to optimize memory access latencies and bandwidth. These capabilities, when included, were found to have a significant impact on the application performance, resulting in average speedups of as much as two to four times.

Digital Repository @ Iowa State University (ISU)

UNT Digital Library