Performance Aware Design of Communication Systems
In this paper we present a methodology for the performance-aware design of communication systems, including protocols and devices. The goal is to evaluate the performance already in early stages of the design process to avoid costly re-designs of bottlenecks in finished products. From protocol specifications given as sequence diagrams, we derive multiclass queueing networks as a means to estimate the performance of the system architecture before any executable prototype exists. This makes queueing theory accessible to system and protocol designers even if the designer is not familiar with details of queueing network theory. The methodology supports an easy evaluation of design alternatives without affecting the functional model. To achieve this, it incorporates the performance information into the system model in a non-invasive fashion, such that the system model remains meaningful without this information.
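To illustrate the kind of early estimate a queueing model can give, here is a minimal sketch. It is not the paper's methodology: it assumes a single-class open network in which every device on one message path behaves as an independent M/M/1 station, and sums the per-station mean response times. The station names, service times, and arrival rate are made-up values.

```python
def mm1_response_time(arrival_rate, service_time):
    """Mean response time (waiting plus service) of one M/M/1 station."""
    utilization = arrival_rate * service_time
    if utilization >= 1.0:
        raise ValueError("station is saturated (utilization >= 1)")
    return service_time / (1.0 - utilization)


def path_response_time(arrival_rate, service_times):
    """End-to-end mean response time over stations visited in sequence."""
    return sum(mm1_response_time(arrival_rate, s) for s in service_times.values())


# Hypothetical devices on one sequence-diagram path; mean service times in seconds.
stations = {"client": 0.002, "gateway": 0.005, "server": 0.008}

# Assumed arrival rate of 50 messages per second at every station.
total = path_response_time(arrival_rate=50.0, service_times=stations)

# The station with the largest service time saturates first as load grows.
bottleneck = max(stations, key=stations.get)

print(f"end-to-end mean response time: {total * 1000:.2f} ms")
print(f"likely bottleneck: {bottleneck}")
```

Changing one entry in `stations` and re-running gives a quick comparison of design alternatives, in the spirit of the early evaluation the abstract describes.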
Model-Free Design of Control Systems over Wireless Fading Channels
Wireless control systems replace traditional wired communication with
wireless networks to exchange information between actuators, plants and sensors
in a control system. The noise in wireless channels renders ideal control
policies suboptimal, and their performance is moreover directly dependent on
the way in which wireless resources are allocated between control loops. Proper
design of the control policy and the resource allocation policy based on both
plant states and wireless fading states is then critical to achieve good
performance. The resulting problem of co-designing control-aware resource
allocation policies and communication-aware controllers, however, is
challenging due to its infinite dimensionality, existence of system constraints
and need for explicit knowledge of the plants and wireless network models. To
overcome those challenges, we rely on constrained reinforcement learning
algorithms to propose a model-free approach to the design of wireless control
systems. We demonstrate the near optimality of control system performance and
stability using near-universal policy parametrizations and present a practical
model-free algorithm to learn the co-design policy. Numerical experiments show
the strong performance of learned policies over baseline solutions. Comment: Submitted to IEEE TS
Modeling and optimization of high-performance many-core systems for energy-efficient and reliable computing
Thesis (Ph.D.)--Boston University
Many-core systems, ranging from small-scale many-core processors to large-scale high performance computing (HPC) data centers, have become the main trend in computing system design owing to their potential to deliver higher throughput per watt. However, power densities and temperatures increase following the growth in the performance capacity, and bring major challenges in energy efficiency, cooling costs, and reliability. These challenges require a joint assessment of performance, power, and temperature tradeoffs as well as the design of runtime optimization techniques that monitor and manage the interplay among them. This thesis proposes novel modeling and runtime management techniques that evaluate and optimize the performance, energy, and reliability of many-core systems.
We first address the energy and thermal challenges in 3D-stacked many-core processors. 3D processors with stacked DRAM have the potential to dramatically improve performance owing to lower memory access latency and higher bandwidth. However, the performance increase may cause 3D systems to exceed the power budgets or create thermal hot spots. In order to provide an accurate analysis and enable the design of efficient management policies, this thesis introduces a simulation framework to jointly analyze performance, power, and temperature for 3D systems. We then propose a runtime optimization policy that maximizes the system performance by characterizing the application behavior and predicting the operating points that satisfy the power and thermal constraints. Our policy reduces the energy-delay product (EDP) by up to 61.9% compared to existing strategies.
Performance, cooling energy, and reliability are also critical aspects in HPC data centers. In addition to causing reliability degradation, high temperatures increase the required cooling energy. Communication cost, on the other hand, has a significant impact on system performance in HPC data centers. This thesis proposes a topology-aware technique that maximizes system reliability by selecting between workload clustering and balancing. Our policy improves the system reliability by up to 123.3% compared to existing temperature balancing approaches. We also introduce a job allocation methodology to simultaneously optimize the communication cost and the cooling energy in a data center. Our policy reduces the cooling cost by 40% compared to cooling-aware and performance-aware policies, while achieving performance comparable to that of the performance-aware policy.
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?
Dense Multi-GPU systems have recently gained a lot of attention in the HPC
arena. Traditionally, MPI runtimes have been primarily designed for clusters
with a large number of nodes. However, with the advent of MPI+CUDA applications
and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important
to address efficient communication schemes for such dense Multi-GPU nodes. This,
coupled with new application workloads brought forward by Deep Learning
frameworks like Caffe and Microsoft CNTK, poses additional design constraints due
to very large message communication of GPU buffers during the training phase.
In this context, special-purpose libraries like NVIDIA NCCL have been proposed
for GPU-based collective communication on dense GPU systems. In this paper, we
propose a pipelined chain (ring) design for the MPI_Bcast collective operation
along with an enhanced collective tuning framework in MVAPICH2-GDR that enables
efficient intra-/inter-node multi-GPU communication. We present an in-depth
performance landscape for the proposed MPI_Bcast schemes along with a
comparative analysis of NVIDIA NCCL Broadcast and NCCL-based MPI_Bcast. The
proposed designs for MVAPICH2-GDR enable up to 14X and 16.6X improvement,
compared to NCCL-based solutions, for intra- and inter-node broadcast latency,
respectively. In addition, the proposed designs provide up to 7% improvement
over NCCL-based solutions for data parallel training of the VGG network on 128
GPUs using Microsoft CNTK. Comment: 8 pages, 3 figures
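The benefit of pipelining a chain broadcast can be sketched with a textbook alpha-beta cost model. This is an illustrative assumption, not the tuning model used in MVAPICH2-GDR: `alpha` is a per-message startup latency in seconds, `beta` is the per-byte transfer time, and all numeric values below are made up.

```python
import math


def chain_bcast_time(num_procs, msg_bytes, chunk_bytes, alpha, beta):
    """Estimated time of a pipelined chain broadcast under an alpha-beta model.

    The root splits the message into chunks and forwards them down a chain of
    num_procs processes. The last chunk starts after (num_chunks - 1) steps and
    then needs (num_procs - 1) hops, for (num_chunks + num_procs - 2) steps in
    total, each costing alpha + chunk_bytes * beta.
    """
    if num_procs < 2:
        return 0.0
    num_chunks = math.ceil(msg_bytes / chunk_bytes)
    step_cost = alpha + chunk_bytes * beta
    return (num_chunks + num_procs - 2) * step_cost


# Hypothetical setup: 8 GPUs, 64 MiB buffer, 5 us startup, about 10 GB/s links.
procs = 8
message = 64 * 1024 * 1024
alpha, beta = 5e-6, 1e-10

# Sending the whole message as one chunk is the unpipelined chain.
unpipelined = chain_bcast_time(procs, message, message, alpha, beta)
pipelined = chain_bcast_time(procs, message, 1024 * 1024, alpha, beta)

print(f"unpipelined chain: {unpipelined * 1e3:.3f} ms")
print(f"pipelined chain (1 MiB chunks): {pipelined * 1e3:.3f} ms")
```

Under this model, small chunks pay the startup cost `alpha` many times while large chunks lose overlap between hops, which is one reason a tuning framework that picks the chunk size per message size and topology is useful.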
Demand-Aware Network Design with Steiner Nodes and a Connection to Virtual Network Embedding
Emerging optical and virtualization technologies enable the design of more
flexible and demand-aware networked systems, in which resources can be
optimized toward the actual workload they serve. For example, in a demand-aware
datacenter network, frequently communicating nodes (e.g., two virtual machines
or a pair of racks in a datacenter) can be placed topologically closer,
reducing communication costs and hence improving the overall network
performance.
This paper revisits the bounded-degree network design problem underlying such
demand-aware networks. Namely, given a distribution over communicating server
pairs, we want to design a network with bounded maximum degree that minimizes
expected communication distance. In addition to this known problem, we
introduce and study a variant where we allow Steiner nodes (i.e., additional
routers) to be added to augment the network.
We improve the understanding of this problem domain in several ways. First,
we shed light on the complexity and hardness of the aforementioned problems,
and study a connection between them and the virtual network embedding
problem. We then provide a constant-factor approximation algorithm for the
Steiner node version of the problem, and use it to improve over prior
state-of-the-art algorithms for the original version of the problem with sparse
communication distributions. Finally, we investigate various heuristic
approaches to the bounded-degree network design problem, in particular providing a
reliable heuristic algorithm with good experimental performance.
We report on an extensive empirical evaluation, using several real-world
traffic traces from datacenters, and find that our approach results in improved
demand-aware network designs.
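The objective described in this abstract, expected communication distance under a demand distribution, can be evaluated for a candidate network with a few lines of code. The sketch below is only an evaluator, not the paper's approximation algorithm or heuristic. It assumes unweighted, connected, undirected graphs given as adjacency lists, and the node names, demand values, and topologies are invented for illustration.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def expected_distance(adjacency, demand):
    """Sum over pairs of probability times hop distance (graph must be connected)."""
    cache = {}
    total = 0.0
    for (u, v), prob in demand.items():
        if u not in cache:
            cache[u] = bfs_distances(adjacency, u)
        total += prob * cache[u][v]
    return total


def max_degree(adjacency):
    """Largest node degree, to check the degree bound."""
    return max(len(neighbors) for neighbors in adjacency.values())


# Hypothetical demand distribution over server pairs (probabilities sum to 1).
demand = {("a", "b"): 0.5, ("c", "d"): 0.3, ("a", "d"): 0.2}

# Two candidate networks with maximum degree 2.
ring = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
path = {"a": ["c"], "c": ["a", "b"], "b": ["c", "d"], "d": ["b"]}

for name, net in (("ring", ring), ("path", path)):
    print(name, "max degree:", max_degree(net),
          "expected distance:", expected_distance(net, demand))
```

In this toy instance the ring places every communicating pair one hop apart, while the path ignores the demand and pays for it with longer expected routes.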
Control Aware Radio Resource Allocation in Low Latency Wireless Control Systems
We consider the problem of allocating radio resources over wireless
communication links to control a series of independent wireless control
systems. Low-latency transmissions are necessary in enabling time-sensitive
control systems to operate over wireless links with high reliability. Achieving
fast data rates over wireless links thus comes at the cost of reliability in
the form of high packet error rates compared to wired links due to channel
noise and interference. However, the effect of the communication link errors on
the control system performance depends dynamically on the control system state.
We propose a novel control-communication co-design approach to the low-latency
resource allocation problem. We incorporate control and channel state
information to make scheduling decisions over time on frequency, bandwidth and
data rates across the next-generation Wi-Fi based wireless communication links
that close the control loops. Control systems that are closer to instability or
further from a desired range in a given control cycle are given higher packet
delivery rate targets to meet. Rather than a simple priority ranking, we derive
precise packet error rate targets for each system needed to satisfy stability
targets and make scheduling decisions to meet such targets while reducing total
transmission time. The resulting Control-Aware Low Latency Scheduling (CALLS)
method is tested in numerous simulation experiments that demonstrate its
effectiveness in meeting control-based goals under tight latency constraints
relative to control-agnostic scheduling.
Quarc: an architecture for efficient on-chip communication
The exponential downscaling of the feature size has enforced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems.
Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in the system on chip realm.
By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, prevents networks on chip from delivering multicast messages as efficiently. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation, while conforming to resource constraints of the network on chip paradigm. Multicast communication in the majority of these networks on chip is implemented by establishing a connection between the source and all multicast destinations before the message transmission commences. Establishing the connections incurs an overhead and is therefore undesirable, particularly in latency-sensitive services such as cache coherence.
To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis covers a detailed representation of the building blocks of the architecture, including topology, router and network interface.
The cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that the Quarc architecture is a highly efficient architecture.
Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication