18,550 research outputs found
DART-MPI: An MPI-based Implementation of a PGAS Runtime System
A Partitioned Global Address Space (PGAS) approach treats a distributed
system as if the memory were shared on a global level. Given such a global view
on memory, the user may program applications very much like shared memory
systems. This greatly simplifies the tasks of developing parallel applications,
because no explicit communication has to be specified in the program for data
exchange between different computing nodes. In this paper we present DART, a
runtime environment, which implements the PGAS paradigm on large-scale
high-performance computing clusters. A specific feature of our implementation
is the use of one-sided communication of the Message Passing Interface (MPI)
version 3 (i.e. MPI-3) as the underlying communication substrate. We evaluated
the performance of the implementation with several low-level kernels in order
to determine overheads and limitations in comparison to the underlying MPI-3.Comment: 11 pages, International Conference on Partitioned Global Address
Space Programming Models (PGAS14
Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays
The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributuions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism
EbbRT: a framework for building per-application library operating systems
Efficient use of high speed hardware requires operating system components be customized to the application work- load. Our general purpose operating systems are ill-suited for this task. We present EbbRT, a framework for constructing per-application library operating systems for cloud applications. The primary objective of EbbRT is to enable high-performance in a tractable and maintainable fashion. This paper describes the design and implementation of EbbRT, and evaluates its ability to improve the performance of common cloud applications. The evaluation of the EbbRT prototype demonstrates memcached, run within a VM, can outperform memcached run on an unvirtualized Linux. The prototype evaluation also demonstrates an 14% performance improvement of a V8 JavaScript engine benchmark, and a node.js webserver that achieves a 50% reduction in 99th percentile latency compared to it run on Linux
EbbRT: a customizable operating system for cloud applications
Efficient use of hardware requires operating system components be customized to the application workload. Our general purpose operating systems are ill-suited for this task. We present Genesis, a new operating system that enables per-application customizations for cloud applications. Genesis achieves this through a novel heterogeneous distributed structure, a partitioned object model, and an event-driven execution environment. This paper describes the design and prototype implementation of Genesis, and evaluates its ability to improve the performance of common cloud applications. The evaluation of the Genesis prototype demonstrates memcached, run within a VM, can outperform memcached run on an unvirtualized Linux. The prototype evaluation also demonstrates an 14% performance improvement of a V8 JavaScript engine benchmark, and a node.js webserver that achieves a 50% reduction in 99th percentile latency compared to it run on Linux
Temporal regularity effects on pre-attentive and attentive processing of deviance
Temporal regularity allows predicting the temporal locus of future information thereby potentially facilitating cognitive processing. We applied event-related brain potentials (ERPs) to investigate how temporal regularity impacts pre-attentive and attentive processing of deviance in the auditory modality. Participants listened to sequences of sinusoidal tones differing exclusively in pitch. The inter-stimulus interval (ISI) in these sequences was manipulated to convey either isochronous or random temporal structure. In the pre-attentive session, deviance processing was unaffected by the regularity manipulation as evidenced in three event-related-potentials (ERPs): mismatch negativity (MMN), P3a, and reorienting negativity (RON). In the attentive session, the P3b was smaller for deviant tones embedded in irregular temporal structure, while the N2b component remained unaffected. These findings confirm that temporal regularity can reinforce cognitive mechanisms associated with the attentive processing of deviance. Furthermore, they provide evidence for the dynamic allocation of attention in time and dissociable pre-attentive and attention-dependent temporal processing mechanisms
AirSync: Enabling Distributed Multiuser MIMO with Full Spatial Multiplexing
The enormous success of advanced wireless devices is pushing the demand for
higher wireless data rates. Denser spectrum reuse through the deployment of
more access points per square mile has the potential to successfully meet the
increasing demand for more bandwidth. In theory, the best approach to density
increase is via distributed multiuser MIMO, where several access points are
connected to a central server and operate as a large distributed multi-antenna
access point, ensuring that all transmitted signal power serves the purpose of
data transmission, rather than creating "interference." In practice, while
enterprise networks offer a natural setup in which distributed MIMO might be
possible, there are serious implementation difficulties, the primary one being
the need to eliminate phase and timing offsets between the jointly coordinated
access points.
In this paper we propose AirSync, a novel scheme which provides not only time
but also phase synchronization, thus enabling distributed MIMO with full
spatial multiplexing gains. AirSync locks the phase of all access points using
a common reference broadcasted over the air in conjunction with a Kalman filter
which closely tracks the phase drift. We have implemented AirSync as a digital
circuit in the FPGA of the WARP radio platform. Our experimental testbed,
comprised of two access points and two clients, shows that AirSync is able to
achieve phase synchronization within a few degrees, and allows the system to
nearly achieve the theoretical optimal multiplexing gain. We also discuss MAC
and higher layer aspects of a practical deployment. To the best of our
knowledge, AirSync offers the first ever realization of the full multiuser MIMO
gain, namely the ability to increase the number of wireless clients linearly
with the number of jointly coordinated access points, without reducing the per
client rate.Comment: Submitted to Transactions on Networkin
- …