658,301 research outputs found

    Performance Aware Design of Communication Systems

    Full text link
    In this paper we present a methodology for a perfor-mance aware design of communication systems including protocols and devices. The goal is to evaluate the perfor-mance already in early stages of the design process to avoid costly re-designs of bottlenecks in finished products. From protocol specifications given as sequence diagrams, we de-rive multiclass queueing networks as a means to estimate the performance of the system architecture before any exe-cutable prototype exists. This makes queueing theory acces-sible to system and protocol designers even if the designer is not familiar with details of queueing network theory. The methodology supports an easy evaluation of design alterna-tives without affecting the functional model. To achieve this, it incorporates the performance information into the system model in a non-invasive fashion, such that the system model remains meaningful without this information.

    Model-Free Design of Control Systems over Wireless Fading Channels

    Full text link
    Wireless control systems replace traditional wired communication with wireless networks to exchange information between actuators, plants and sensors in a control system. The noise in wireless channels renders ideal control policies suboptimal, and their performance is moreover directly dependent on the way in which wireless resources are allocated between control loops. Proper design of the control policy and the resource allocation policy based on both plant states and wireless fading states is then critical to achieve good performance. The resulting problem of co-designing control-aware resource allocation policies and communication-aware controllers, however, is challenging due to its infinite dimensionality, existence of system constraints and need for explicit knowledge of the plants and wireless network models. To overcome those challenges, we rely on constrained reinforcement learning algorithms to propose a model-free approach to the design of wireless control systems. We demonstrate the near optimality of control system performance and stability using near-universal policy parametrizations and present a practical model-free algorithm to learn the co-design policy. Numerical experiments show the strong performance of learned policies over baseline solutions.Comment: Submitted to IEEE TS

    Modeling and optimization of high-performance many-core systems for energy-efficient and reliable computing

    Full text link
    Thesis (Ph.D.)--Boston UniversityMany-core systems, ranging from small-scale many-core processors to large-scale high performance computing (HPC) data centers, have become the main trend in computing system design owing to their potential to deliver higher throughput per watt. However, power densities and temperatures increase following the growth in the performance capacity, and bring major challenges in energy efficiency, cooling costs, and reliability. These challenges require a joint assessment of performance, power, and temperature tradeoffs as well as the design of runtime optimization techniques that monitor and manage the interplay among them. This thesis proposes novel modeling and runtime management techniques that evaluate and optimize the performance, energy, and reliability of many-core systems. We first address the energy and thermal challenges in 3D-stacked many-core processors. 3D processors with stacked DRAM have the potential to dramatically improve performance owing to lower memory access latency and higher bandwidth. However, the performance increase may cause 3D systems to exceed the power budgets or create thermal hot spots. In order to provide an accurate analysis and enable the design of efficient management policies, this thesis introduces a simulation framework to jointly analyze performance, power, and temperature for 3D systems. We then propose a runtime optimization policy that maximizes the system performance by characterizing the application behavior and predicting the operating points that satisfy the power and thermal constraints. Our policy reduces the energy-delay product (EDP) by up to 61.9% compared to existing strategies. Performance, cooling energy, and reliability are also critical aspects in HPC data centers. In addition to causing reliability degradation, high temperatures increase the required cooling energy. Communication cost, on the other hand, has a significant impact on system performance in HPC data centers. This thesis proposes a topology-aware technique that maximizes system reliability by selecting between workload clustering and balancing. Our policy improves the system reliability by up to 123.3% compared to existing temperature balancing approaches. We also introduce a job allocation methodology to simultaneously optimize the communication cost and the cooling energy in a data center. Our policy reduces the cooling cost by 40% compared to cooling-aware and performance-aware policies, while achieving comparable performance to performance-aware policy

    Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL?

    Full text link
    Dense Multi-GPU systems have recently gained a lot of attention in the HPC arena. Traditionally, MPI runtimes have been primarily designed for clusters with a large number of nodes. However, with the advent of MPI+CUDA applications and CUDA-Aware MPI runtimes like MVAPICH2 and OpenMPI, it has become important to address efficient communication schemes for such dense Multi-GPU nodes. This coupled with new application workloads brought forward by Deep Learning frameworks like Caffe and Microsoft CNTK pose additional design constraints due to very large message communication of GPU buffers during the training phase. In this context, special-purpose libraries like NVIDIA NCCL have been proposed for GPU-based collective communication on dense GPU systems. In this paper, we propose a pipelined chain (ring) design for the MPI_Bcast collective operation along with an enhanced collective tuning framework in MVAPICH2-GDR that enables efficient intra-/inter-node multi-GPU communication. We present an in-depth performance landscape for the proposed MPI_Bcast schemes along with a comparative analysis of NVIDIA NCCL Broadcast and NCCL-based MPI_Bcast. The proposed designs for MVAPICH2-GDR enable up to 14X and 16.6X improvement, compared to NCCL-based solutions, for intra- and inter-node broadcast latency, respectively. In addition, the proposed designs provide up to 7% improvement over NCCL-based solutions for data parallel training of the VGG network on 128 GPUs using Microsoft CNTK.Comment: 8 pages, 3 figure

    Demand-Aware Network Design with Steiner Nodes and a Connection to Virtual Network Embedding

    Full text link
    Emerging optical and virtualization technologies enable the design of more flexible and demand-aware networked systems, in which resources can be optimized toward the actual workload they serve. For example, in a demand-aware datacenter network, frequently communicating nodes (e.g., two virtual machines or a pair of racks in a datacenter) can be placed topologically closer, reducing communication costs and hence improving the overall network performance. This paper revisits the bounded-degree network design problem underlying such demand-aware networks. Namely, given a distribution over communicating server pairs, we want to design a network with bounded maximum degree that minimizes expected communication distance. In addition to this known problem, we introduce and study a variant where we allow Steiner nodes (i.e., additional routers) to be added to augment the network. We improve the understanding of this problem domain in several ways. First, we shed light on the complexity and hardness of the aforementioned problems, and study a connection between them and the virtual networking embedding problem. We then provide a constant-factor approximation algorithm for the Steiner node version of the problem, and use it to improve over prior state-of-the-art algorithms for the original version of the problem with sparse communication distributions. Finally, we investigate various heuristic approaches to bounded-degree network design problem, in particular providing a reliable heuristic algorithm with good experimental performance. We report on an extensive empirical evaluation, using several real-world traffic traces from datacenters, and find that our approach results in improved demand-aware network designs

    Control Aware Radio Resource Allocation in Low Latency Wireless Control Systems

    Full text link
    We consider the problem of allocating radio resources over wireless communication links to control a series of independent wireless control systems. Low-latency transmissions are necessary in enabling time-sensitive control systems to operate over wireless links with high reliability. Achieving fast data rates over wireless links thus comes at the cost of reliability in the form of high packet error rates compared to wired links due to channel noise and interference. However, the effect of the communication link errors on the control system performance depends dynamically on the control system state. We propose a novel control-communication co-design approach to the low-latency resource allocation problem. We incorporate control and channel state information to make scheduling decisions over time on frequency, bandwidth and data rates across the next-generation Wi-Fi based wireless communication links that close the control loops. Control systems that are closer to instability or further from a desired range in a given control cycle are given higher packet delivery rate targets to meet. Rather than a simple priority ranking, we derive precise packet error rate targets for each system needed to satisfy stability targets and make scheduling decisions to meet such targets while reducing total transmission time. The resulting Control-Aware Low Latency Scheduling (CALLS) method is tested in numerous simulation experiments that demonstrate its effectiveness in meeting control-based goals under tight latency constraints relative to control-agnostic scheduling

    Quarc: an architecture for efficient on-chip communication

    Get PDF
    The exponential downscaling of the feature size has enforced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems. Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in system on chip realm. By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, inhibits the multicast messages to be delivered as efficiently by networks on chip. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation, while conforming to resource constraints of the network on chip paradigm. Multicast communication in majority of these networks on chip is implemented by establishing a connection between source and all multicast destinations before the message transmission commences. Establishing the connections incurs an overhead and, therefore, is not desirable; in particular in latency sensitive services such as cache coherence. To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis covers a detailed representation of the building blocks of the architecture, including topology, router and network interface. The cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that the Quarc architecture is a highly efficient architecture. Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication
    • …
    corecore