OpenCL + OpenSHMEM Hybrid Programming Model for the Adapteva Epiphany Architecture
There is interest in exploring hybrid OpenSHMEM + X programming models to
extend the applicability of the OpenSHMEM interface to more hardware
architectures. We present a hybrid OpenCL + OpenSHMEM programming model for
device-level programming for architectures like the Adapteva Epiphany many-core
RISC array processor. The Epiphany architecture comprises a 2D array of
low-power RISC cores with minimal uncore functionality connected by a 2D mesh
Network-on-Chip (NoC). The Epiphany architecture offers high computational
energy efficiency for integer and floating point calculations as well as
parallel scalability. The Epiphany-III is available as a coprocessor in
platforms that also utilize an ARM CPU host. OpenCL provides good functionality
for supporting a co-design programming model in which the host CPU offloads
parallel work to a coprocessor. However, the OpenCL memory model is
inconsistent with the Epiphany memory architecture and lacks support for
inter-core communication. We propose a hybrid programming model in which
OpenSHMEM replaces the non-standard OpenCL extensions that were previously
introduced to achieve high performance on the Epiphany architecture. We
demonstrate the proposed programming model for matrix-matrix
multiplication based on Cannon's algorithm, showing that the hybrid model
addresses the deficiencies of using OpenCL alone and achieves good benchmark
performance.
Comment: 12 pages, 5 figures, OpenSHMEM 2016: Third Workshop on OpenSHMEM and
Related Technologies
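The abstract above builds its benchmark on Cannon's algorithm, which distributes an n×n matrix product over a p×p grid of cores using only nearest-neighbor circular shifts, a natural fit for a 2D mesh NoC like Epiphany's. As a minimal illustrative sketch (not the paper's OpenCL + OpenSHMEM implementation), the grid can be simulated serially in Python with NumPy; on real hardware each block would live in one core's local memory and the shifts would be inter-core transfers.

```python
import numpy as np

def cannon_matmul(A, B, p):
    """Multiply A @ B via Cannon's algorithm on a simulated p x p core grid.

    Executed serially here for illustration; in the hybrid model each block
    resides in one core's local memory and shifts become NoC transfers.
    """
    n = A.shape[0]
    assert n % p == 0, "matrix order must be divisible by grid size"
    b = n // p
    # Partition A, B into p x p blocks; C accumulates the result blocks.
    Ab = [[A[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    Bb = [[B[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    Cb = [[np.zeros((b, b)) for _ in range(p)] for _ in range(p)]
    # Initial skew: shift row i of A left by i, column j of B up by j.
    Ab = [[Ab[i][(j + i) % p] for j in range(p)] for i in range(p)]
    Bb = [[Bb[(i + j) % p][j] for j in range(p)] for i in range(p)]
    for _ in range(p):
        # Each grid position multiplies its currently resident blocks.
        for i in range(p):
            for j in range(p):
                Cb[i][j] += Ab[i][j] @ Bb[i][j]
        # Circular shift: A blocks move left by one, B blocks move up by one.
        Ab = [[Ab[i][(j + 1) % p] for j in range(p)] for i in range(p)]
        Bb = [[Bb[(i + 1) % p][j] for j in range(p)] for i in range(p)]
    return np.block(Cb)
```

After the initial skew, position (i, j) holds A block (i, (i+j) mod p) and B block ((i+j) mod p, j), so the p shift-and-multiply rounds accumulate every term of C_ij = Σ_k A_ik B_kj while each core communicates only with its immediate neighbors.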
An integrated general practice and pharmacy-based intervention to promote the use of appropriate preventive medications among individuals at high cardiovascular disease risk: protocol for a cluster randomized controlled trial
Background: Cardiovascular diseases (CVD) are responsible for significant morbidity, premature mortality, and economic burden. Despite established evidence that supports the use of preventive medications among patients at high CVD risk, treatment gaps remain. Building on prior evidence and a theoretical framework, a complex intervention has been designed to address these gaps among high-risk, under-treated patients in the Australian primary care setting. This intervention comprises a general practice quality improvement tool incorporating clinical decision support and audit/feedback capabilities; availability of a range of CVD polypills (fixed-dose combinations of two blood pressure lowering agents, a statin ± aspirin) for prescription when appropriate; and access to a pharmacy-based program to support long-term medication adherence and lifestyle modification.
Methods: Following a systematic development process, the intervention will be evaluated in a pragmatic cluster randomized controlled trial including 70 general practices for a median period of 18 months. The 35 general practices in the intervention group will work with a nominated partner pharmacy, whereas those in the control group will provide usual care without access to the intervention tools. The primary outcome is the proportion of patients at high CVD risk who were inadequately treated at baseline who achieve target blood pressure (BP) and low-density lipoprotein cholesterol (LDL-C) levels at the study end. The outcomes will be analyzed using data from electronic medical records, utilizing a validated extraction tool. Detailed process and economic evaluations will also be performed.
Discussion: The study intends to establish evidence about an intervention that combines technological innovation with team collaboration between patients, pharmacists, and general practitioners (GPs) for CVD prevention.
Trial registration: Australian New Zealand Clinical Trials Registry ACTRN1261600023342
Cache-Integrated Network Interfaces: Flexible On-Chip Communication and Synchronization for Large-Scale CMPs
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models
The upcoming many-core architectures require software developers to exploit
concurrency to utilize available computational power. Today's high-level
language virtual machines (VMs), which are a cornerstone of software
development, do not provide sufficient abstraction for concurrency concepts. We
analyze concrete and abstract concurrency models and identify the challenges
they impose for VMs. To provide sufficient concurrency support in VMs, we
propose to integrate concurrency operations into VM instruction sets.
Since there will always be VMs optimized for special purposes, our goal is to
develop a methodology to design instruction sets with concurrency support.
Therefore, we also propose a list of trade-offs that have to be investigated to
inform the design of such instruction sets.
As a first experiment, we implemented one instruction set extension for
shared memory and one for non-shared memory concurrency. From our experimental
results, we derived a list of requirements for a full-grown experimental
environment for further research.
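The idea of lifting concurrency operations into a VM instruction set can be illustrated with a toy bytecode interpreter. The opcodes below (SPAWN, SEND, RECV) are hypothetical names invented for this sketch, not the instruction-set extensions the abstract describes; the point is only that task creation and message passing appear as first-class VM instructions rather than library calls.

```python
import queue
import threading

def run(program, pc=0, chan=None, stack=None):
    """Tiny stack-machine interpreter whose instruction set includes
    concurrency ops: SPAWN starts a new interpreter thread at a target pc,
    SEND/RECV pass values over a channel (a non-shared-memory style op)."""
    stack = [] if stack is None else stack
    threads = []
    while True:
        op, arg = program[pc]
        pc += 1
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "SPAWN":
            # The VM itself, not a library, creates the concurrent task.
            t = threading.Thread(target=run, args=(program, arg, chan))
            t.start()
            threads.append(t)
        elif op == "SEND":
            chan.put(stack.pop())
        elif op == "RECV":
            stack.append(chan.get())
        elif op == "HALT":
            for t in threads:
                t.join()
            return stack

# Program: the spawned task computes 2 + 3 and sends the result back.
prog = [
    ("SPAWN", 3),    # 0: start a child interpreter at pc 3
    ("RECV", None),  # 1: block until the child's result arrives
    ("HALT", None),  # 2: join children and return the stack
    ("PUSH", 2),     # 3: child task starts here
    ("PUSH", 3),
    ("ADD", None),
    ("SEND", None),
    ("HALT", None),
]
result = run(prog, chan=queue.Queue())  # result == [5]
```

Exposing such operations at the instruction-set level is what lets the VM, rather than each language runtime, choose the concrete mapping (threads, actors, event loops) for an abstract concurrency model.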
Tiny but Mighty: Designing and Realizing Scalable Latency Tolerance for Manycore SoCs
Modern computing systems employ significant heterogeneity and specialization to meet performance targets at manageable power. However, memory latency bottlenecks remain problematic, particularly for sparse neural network and graph analytic applications where indirect memory accesses (IMAs) challenge the memory hierarchy.
Decades of prior art have proposed hardware and software mechanisms to mitigate IMA latency, but they fail to analyze real-chip considerations, especially when used in SoCs and manycores. In this paper, we revisit many of these techniques while taking into account manycore integration and verification.
We present the first system implementation of latency tolerance hardware that provides significant speedups without requiring any memory hierarchy or processor tile modifications. This is achieved through a Memory Access Parallel-Load Engine (MAPLE), integrated through the Network-on-Chip (NoC) in a scalable manner. Our hardware-software co-design allows programs to perform long-latency memory accesses asynchronously from the core, avoiding pipeline stalls and enabling greater memory-level parallelism (MLP).
In April 2021 we taped out a manycore chip that includes tens of MAPLE instances for efficient data supply. MAPLE demonstrates a full RTL implementation of out-of-core latency-mitigation hardware, with virtual memory support and automated compilation targeting it. This paper evaluates MAPLE integrated with a dual-core FPGA prototype running applications with full SMP Linux, and demonstrates geomean speedups of 2.35× and 2.27× over software-based prefetching and decoupling, respectively. Compared to state-of-the-art hardware, it provides geomean speedups of 1.82× and 1.72× over prefetching and decoupling techniques.
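The decoupling idea behind MAPLE, separating the issuing of long-latency indirect loads from the compute that consumes them, can be sketched in software. The sketch below is a rough analogy only: Python threads stand in for MAPLE's NoC-attached load engine, and a bounded queue stands in for its hardware buffer; the function and parameter names are invented for this example.

```python
import queue
import threading

def gather_sum_decoupled(data, indices, depth=16):
    """Decoupled access/execute sketch for an indirect-access kernel
    (sum of data[indices[i]]): an 'access' thread streams loaded values
    through a bounded queue while the 'execute' loop consumes them.

    The queue depth plays the role of the hardware buffer that determines
    how much memory latency can be tolerated.
    """
    q = queue.Queue(maxsize=depth)

    def access():
        for i in indices:
            q.put(data[i])   # the long-latency indirect load, issued early
        q.put(None)          # end-of-stream marker

    threading.Thread(target=access, daemon=True).start()

    total = 0
    while (v := q.get()) is not None:
        total += v           # compute proceeds as each value arrives
    return total
```

In hardware the access side runs far ahead of the core, so many loads are in flight at once; that overlap, rather than any per-load speedup, is the source of the memory-level parallelism the abstract describes.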
