Scalable data abstractions for distributed parallel computations
The ability to express a program as a hierarchical composition of parts is an
essential tool in managing the complexity of software, and a key abstraction
this provides is the separation of the representation of data from the computation.
Many current parallel programming models use a shared memory model to provide
data abstraction but this doesn't scale well with large numbers of cores due to
non-determinism and access latency. This paper proposes a simple programming
model that allows scalable parallel programs to be expressed with distributed
representations of data and it provides the programmer with the flexibility to
employ shared or distributed styles of data-parallelism where applicable. It is
capable of an efficient implementation, and with the provision of a small set
of primitive capabilities in the hardware, it can be compiled to operate
directly on the hardware, in the same way stack-based allocation operates for
subroutines in sequential machines.
Decentralized Delay Optimal Control for Interference Networks with Limited Renewable Energy Storage
In this paper, we consider delay minimization for interference networks with
a renewable energy source, where the transmission power of a node comes from both
the conventional utility power (AC power) and the renewable energy source. We
assume the transmission power of each node is a function of the local channel
state, local data queue state and local energy queue state only. In turn, we
consider two delay optimization formulations, namely the decentralized
partially observable Markov decision process (DEC-POMDP) and Non-cooperative
partially observable stochastic game (POSG). In DEC-POMDP formulation, we
derive a decentralized online learning algorithm to determine the control
actions and Lagrangian multipliers (LMs) simultaneously, based on the policy
gradient approach. Under some mild technical conditions, the proposed
decentralized policy gradient algorithm converges almost surely to a local
optimal solution. On the other hand, in the non-cooperative POSG formulation,
the transmitter nodes are non-cooperative. We extend the decentralized policy
gradient solution and establish the technical proof for almost-sure convergence
of the learning algorithms. In both cases, the solutions are very robust to
model variations. Finally, the delay performance of the proposed solutions is
compared with conventional baseline schemes for interference networks, and it is
illustrated that substantial delay performance gain and energy savings can be
achieved.
Scalability of broadcast performance in wireless network-on-chip
Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such a design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of the communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.
Distributive Stochastic Learning for Delay-Optimal OFDMA Power and Subband Allocation
In this paper, we consider the distributive queue-aware power and subband
allocation design for a delay-optimal OFDMA uplink system with one base
station, users and independent subbands. Each mobile has an uplink
queue with heterogeneous packet arrivals and delay requirements. We model the
problem as an infinite horizon average reward Markov Decision Problem (MDP)
where the control actions are functions of the instantaneous Channel State
Information (CSI) as well as the joint Queue State Information (QSI). To
address the distributive requirement and the issue of exponential memory
requirement and computational complexity, we approximate the subband allocation
Q-factor by the sum of the per-user subband allocation Q-factor and derive a
distributive online stochastic learning algorithm to estimate the per-user
Q-factor and the Lagrange multipliers (LM) simultaneously and determine the
control actions using an auction mechanism. We show that under the proposed
auction mechanism, the distributive online learning converges almost surely
(with probability 1). For illustration, we apply the proposed distributive
stochastic learning framework to an application example with exponential packet
size distribution. We show that the delay-optimal power control has the
multi-level water-filling structure where the CSI determines the instantaneous
power allocation and the QSI determines the water-level. The proposed algorithm
has linear signaling overhead and computational complexity ,
which is desirable from an implementation perspective. Comment: To appear in Transactions on Signal Processing
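The water-filling structure described in this abstract can be illustrated with a minimal sketch. It assumes the classic rule in which power equals the water level minus the inverse channel gain, clipped at zero. The function names and the linear mapping from queue length to water level (`base_level`, `slope`) are hypothetical placeholders; in the paper the water level comes from the learned per-user Q-factor and Lagrange multipliers, which are not reproduced here.

```python
def water_filling_power(channel_gain, water_level):
    """Classic water-filling: fill power up to water_level above the floor 1/gain."""
    if channel_gain <= 0:
        return 0.0  # an unusable channel receives no power
    return max(0.0, water_level - 1.0 / channel_gain)


def queue_aware_water_level(queue_length, base_level=1.0, slope=0.5):
    """Hypothetical monotone mapping: a longer queue raises the water level.

    This linear form is an assumption made for illustration only.
    """
    return base_level + slope * queue_length


def allocate_power(channel_gain, queue_length):
    """The CSI (gain) sets the instantaneous power; the QSI (queue) sets the level."""
    level = queue_aware_water_level(queue_length)
    return water_filling_power(channel_gain, level)


if __name__ == "__main__":
    for gain, queue in [(2.0, 0), (2.0, 4), (0.5, 0)]:
        print(gain, queue, allocate_power(gain, queue))
```

With these placeholder parameters, a user with a poor channel and an empty queue gets no power, while the same channel gain attracts more power as the queue grows.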
Optimal Distributed Scheduling in Wireless Networks under the SINR interference model
Radio resource sharing mechanisms are key to ensuring good performance in
wireless networks. In their seminal paper \cite{tassiulas1}, Tassiulas and
Ephremides introduced the Maximum Weighted Scheduling algorithm, and proved its
throughput-optimality. Since then, there have been extensive research efforts
to devise distributed implementations of this algorithm. Recently, distributed
adaptive CSMA scheduling schemes \cite{jiang08} have been proposed and shown to
be optimal, without the need for message passing among transmitters. However,
their analysis relies on the assumption that interference can be accurately
modelled by a simple interference graph. In this paper, we consider the more
realistic and challenging SINR interference model. We present the first
distributed scheduling algorithms that (i) are optimal under the SINR
interference model, and (ii) do not require any message passing. They are
based on a combination of a simple and efficient power allocation strategy
referred to as Power Packing and randomization techniques. We first
devise algorithms that are rate-optimal in the sense that they perform as well
as the best centralized scheduling schemes in scenarios where each transmitter
is aware of the rate at which it should send packets to the corresponding
receiver. We then extend these algorithms so that they reach
throughput-optimality.
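For orientation, the centralized Maximum Weighted Scheduling baseline mentioned in this abstract can be sketched under the simple interference-graph model. This is not the paper's SINR-based distributed algorithm or its Power Packing strategy. The sketch assumes unit link rates, so a link's weight is its queue length, and it uses brute-force search, which is only practical for a handful of links. The function name and the example links are hypothetical.

```python
from itertools import combinations


def max_weight_schedule(queues, conflicts):
    """Return the conflict-free set of links with the largest total queue length.

    queues:    dict mapping link name -> queue length (used as the weight)
    conflicts: set of frozensets, each holding two links that interfere
    """
    links = list(queues)
    best, best_weight = (), 0
    for size in range(1, len(links) + 1):
        for subset in combinations(links, size):
            # Skip any subset that contains two interfering links.
            if any(frozenset(pair) in conflicts
                   for pair in combinations(subset, 2)):
                continue
            weight = sum(queues[link] for link in subset)
            if weight > best_weight:
                best, best_weight = subset, weight
    return set(best), best_weight


if __name__ == "__main__":
    queues = {"a": 5, "b": 3, "c": 4}
    conflicts = {frozenset({"a", "b"}), frozenset({"b", "c"})}
    print(max_weight_schedule(queues, conflicts))
```

In this example, links `a` and `c` do not interfere with each other, so they are scheduled together in preference to the single link `b`.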
EbbRT: a customizable operating system for cloud applications
Efficient use of hardware requires operating system components be customized to the application workload. Our general purpose operating systems are ill-suited for this task. We present Genesis, a new operating system that enables per-application customizations for cloud applications. Genesis achieves this through a novel heterogeneous distributed structure, a partitioned object model, and an event-driven execution environment. This paper describes the design and prototype implementation of Genesis, and evaluates its ability to improve the performance of common cloud applications. The evaluation of the Genesis prototype demonstrates that memcached, run within a VM, can outperform memcached run on an unvirtualized Linux. The prototype evaluation also demonstrates a 14% performance improvement of a V8 JavaScript engine benchmark, and a node.js webserver that achieves a 50% reduction in 99th percentile latency compared to the same server run on Linux.