175 research outputs found

    Energy consumption in networks on chip : efficiency and scaling

    Computer architecture design is in a new era where performance is increased by replicating processing cores on a chip rather than making CPUs larger and faster. This design strategy is motivated by the superior energy efficiency of the multi-core architecture compared to the traditional monolithic CPU. If the trend continues as expected, the number of cores on a chip is predicted to grow exponentially over time as the density of transistors on a die increases. A major challenge to the efficiency of multi-core chips is the energy used for communication among cores over a Network on Chip (NoC). As the number of cores increases, this energy also increases, imposing serious constraints on design and performance of both applications and architectures. Therefore, understanding the impact of different design choices on NoC power and energy consumption is crucial to the success of the multi- and many-core designs. This dissertation proposes methods for modeling and optimizing energy consumption in multi- and many-core chips, with special focus on the energy used for communication on the NoC. We present a number of tools and models to optimize energy consumption and model its scaling behavior as the number of cores increases. We use synthetic traffic patterns and full system simulations to test and validate our methods. Finally, we take a step back and look at the evolution of computer hardware in the last 40 years and, using a scaling theory from biology, present a predictive theory for power-performance scaling in microprocessor systems

    Efficient Synthesis of Room Acoustics via Scattering Delay Networks

    An acoustic reverberator consisting of a network of delay lines connected via scattering junctions is proposed. All parameters of the reverberator are derived from physical properties of the enclosure it simulates. It allows for simulation of unequal and frequency-dependent wall absorption, as well as directional sources and microphones. The reverberator renders the first-order reflections exactly, while making progressively coarser approximations of higher-order reflections. The rate of energy decay is close to that obtained with the image method (IM) and consistent with the predictions of Sabine and Eyring equations. The time evolution of the normalized echo density, which was previously shown to be correlated with the perceived texture of reverberation, is also close to that of IM. However, its computational complexity is one to two orders of magnitude lower, comparable to the computational complexity of a feedback delay network (FDN), and its memory requirements are negligible
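
The Sabine and Eyring predictions mentioned in the abstract can be illustrated with a small calculation. The sketch below uses the standard textbook forms of the two equations in metric units; the function names, the room dimensions, and the single area-averaged absorption coefficient are assumptions chosen for illustration and are not taken from the paper.

```python
import math

# Standard metric constant (s/m) for a speed of sound near 343 m/s.
SABINE_CONSTANT = 0.161


def sabine_rt60(volume_m3, surface_m2, mean_absorption):
    """Sabine reverberation time: 0.161 * V / (S * alpha)."""
    return SABINE_CONSTANT * volume_m3 / (surface_m2 * mean_absorption)


def eyring_rt60(volume_m3, surface_m2, mean_absorption):
    """Eyring reverberation time: 0.161 * V / (-S * ln(1 - alpha))."""
    return SABINE_CONSTANT * volume_m3 / (
        -surface_m2 * math.log(1.0 - mean_absorption)
    )


# Hypothetical 5 m x 4 m x 3 m shoebox room with uniform absorption 0.2.
length, width, height = 5.0, 4.0, 3.0
volume = length * width * height
surface = 2.0 * (length * width + length * height + width * height)

rt_sabine = sabine_rt60(volume, surface, 0.2)
rt_eyring = eyring_rt60(volume, surface, 0.2)
```

For any absorption coefficient strictly between 0 and 1, the Eyring estimate is shorter than the Sabine estimate, and the gap widens as the walls become more absorbent.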

    Cooperative Navigation for Low-bandwidth Mobile Acoustic Networks.

    This thesis reports on the design and validation of estimation and planning algorithms for underwater vehicle cooperative localization. While attitude and depth are easily instrumented with bounded-error, autonomous underwater vehicles (AUVs) have no internal sensor that directly observes XY position. The global positioning system (GPS) and other radio-based navigation techniques are not available because of the strong attenuation of electromagnetic signals in seawater. The navigation algorithms presented herein fuse local body-frame rate and attitude measurements with range observations between vehicles within a decentralized architecture. The acoustic communication channel is both unreliable and low bandwidth, precluding many state-of-the-art terrestrial cooperative navigation algorithms. We exploit the underlying structure of a post-process centralized estimator in order to derive two real-time decentralized estimation frameworks. First, the origin state method enables a client vehicle to exactly reproduce the corresponding centralized estimate within a server-to-client vehicle network. Second, a graph-based navigation framework produces an approximate reconstruction of the centralized estimate onboard each vehicle. Finally, we present a method to plan a locally optimal server path to localize a client vehicle along a desired nominal trajectory. The planning algorithm introduces a probabilistic channel model into prior Gaussian belief space planning frameworks. In summary, cooperative localization reduces XY position error growth within underwater vehicle networks. Moreover, these methods remove the reliance on static beacon networks, which do not scale to large vehicle networks and limit the range of operations. Each proposed localization algorithm was validated in full-scale AUV field trials. The planning framework was evaluated through numerical simulation. PhD, Mechanical Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113428/1/jmwalls_1.pd

    Systems with Massive Number of Antennas: Distributed Approaches

    As 5G is entering maturity, research interest has shifted towards 6G, and especially the new use cases that the future telecommunication infrastructure needs to support. These new use cases impose much higher requirements, specifically: higher communication data-rates, a larger number of users, higher accuracy in localization, and the possibility of wirelessly charging devices, among others. The radio access network (RAN) has already gone through an evolution on the path towards 5G. One of the main changes was a large increase in the number of antennas in the base-station. Some base-stations may even reach 100 elements, in what is commonly referred to as Massive MIMO. New proposals for the 6G RAN point in the direction of continuing this increase in the number of antennas and locating them throughout a certain area of service. Different technologies have been proposed in this direction, such as cell-free Massive MIMO, distributed MIMO, and large intelligent surface (LIS). In this thesis we focus on LIS, for which theoretical studies promise the fulfillment of the aforementioned requirements. While the theoretical capabilities of LIS have been conveniently analyzed, little has been done in terms of implementing this type of system. When the number of antennas grows to hundreds or thousands, there are numerous challenges that need to be solved for a successful implementation. The most critical challenges are the interconnection data-rate and the computational complexity. In the present thesis we introduce the implementation challenges and show that centralized processing architectures are no longer adequate for this type of system. We also present different distributed processing architectures and show the benefits of this type of scheme. This work aims to give a system-design guideline that helps the system designer make the right decisions when designing these types of systems. For that, we provide algorithms, performance analysis and comparisons, including first-order evaluation of the interconnection data-rate, processing latency, memory and energy consumption. These numbers are based on models and available data in the literature. Exact values depend on the selected technology, and will be accurately determined after building and testing these types of systems. The thesis concentrates mostly on the topic of communication, with additional exploration of other areas, such as localization. In the case of localization, we benefit from the high spatial resolution of a very-large array that provides very rich channel state information (CSI). A CSI-based fingerprinting technique using a neural network is selected for this case, with promising results. As the communication and localization services are both based on the acquisition of CSI, we foresee a common system architecture capable of supporting both cases. Further work in this direction is recommended, with the possibility of including other applications such as sensing. The obtained results indicate that the implementation of these very-large array systems is feasible, but the challenges are numerous. The proposed solutions provide encouraging results that need to be verified with hardware implementations and real measurements

    Distributed synchronization algorithms for wireless sensor networks

    The ability to distribute time and frequency among a large population of interacting agents is of interest for diverse disciplines, inasmuch as it enables complex cooperative tasks to be carried out. In a wireless sensor network (WSN), time/frequency synchronization allows the implementation of distributed signal processing and coding techniques, and the realization of coordinated access to the shared wireless medium. Large multi-hop WSNs constitute a new regime for network synchronization, as they call for the development of scalable, fully distributed synchronization algorithms. While most previous research focused on synchronization at the application layer, this thesis considers synchronization at the lowest layers of the communication protocol stack of a WSN, namely the physical and the medium access control (MAC) layers. At the physical layer, the focus is on the compensation of carrier frequency offsets (CFO), while time synchronization is studied for application at the MAC layer. In both cases, the problem of realizing network-wide synchronization is approached by employing distributed clock control algorithms based on the classical concept of coupled phase and frequency locked loops (PLL and FLL). The analysis takes into account communication, signaling and energy consumption constraints arising in the novel context of multi-hop WSNs. In particular, the robustness of the algorithms is checked against packet collision events, infrequent sync updates, and errors introduced by different noise sources, such as transmission delays and clock frequency instabilities. By observing that WSNs allow for greater flexibility in the design of the synchronization network architecture, this work also examines the relative merits of both peer-to-peer (mutually coupled, MC) and hierarchical (master-slave, MS) architectures. With both MC and MS architectures, synchronization accuracy degrades smoothly with the network size, provided that loop parameters are conveniently chosen. In particular, MS topologies guarantee faster synchronization, but they are hindered by higher noise accumulation, while MC topologies allow for an almost uniform error distribution at the price of much slower convergence. For all the considered cases, synchronization algorithms based on adaptive PLL and FLL designs are shown to provide robust and scalable network-wide time and frequency distribution in a WSN
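
The difference between the mutually coupled and master-slave arrangements can be sketched with a toy phase-update rule. This is only a noise-free, first-order illustration on a line of nodes, assuming a fixed loop gain and synchronous updates; the function names, the topology, and the gain value are hypothetical and do not reproduce the adaptive PLL/FLL designs of the thesis.

```python
def mc_step(phases, gain):
    """Mutually coupled: each node moves toward the mean phase of its line neighbours."""
    n = len(phases)
    updated = []
    for i, phase in enumerate(phases):
        neighbours = [phases[j] for j in (i - 1, i + 1) if 0 <= j < n]
        error = sum(neighbours) / len(neighbours) - phase
        updated.append(phase + gain * error)
    return updated


def ms_step(phases, gain):
    """Master-slave chain: node 0 is the master, node i tracks node i - 1."""
    updated = [phases[0]]  # the master never adjusts its clock
    for i in range(1, len(phases)):
        error = phases[i - 1] - phases[i]
        updated.append(phases[i] + gain * error)
    return updated


def run(step, phases, gain=0.5, iterations=200):
    """Apply the chosen update rule repeatedly and return the final phases."""
    for _ in range(iterations):
        phases = step(phases, gain)
    return phases


def spread(phases):
    """Largest phase disagreement in the network."""
    return max(phases) - min(phases)


initial = [0.0, 1.0, -2.0, 3.5, 0.7]  # arbitrary initial clock phases
mc_final = run(mc_step, initial)
ms_final = run(ms_step, initial)
```

In this sketch the master-slave chain settles on the master's initial phase, whereas the mutually coupled network settles on a common value that depends on all initial phases.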

    Energy-Efficient On-Board Radio Resource Management for Satellite Communications via Neuromorphic Computing

    The latest satellite communication (SatCom) missions are characterized by a fully reconfigurable on-board software-defined payload, capable of adapting radio resources to the temporal and spatial variations of the system traffic. As pure optimization-based solutions have been shown to be computationally tedious and to lack flexibility, machine learning (ML)-based methods have emerged as promising alternatives. We investigate the application of energy-efficient brain-inspired ML models for on-board radio resource management. Apart from software simulation, we report extensive experimental results leveraging the recently released Intel Loihi 2 chip. To benchmark the performance of the proposed model, we implement conventional convolutional neural networks (CNN) on a Xilinx Versal VCK5000, and provide a detailed comparison of accuracy, precision, recall, and energy efficiency for different traffic demands. Most notably, for relevant workloads, spiking neural networks (SNNs) implemented on Loihi 2 yield higher accuracy, while reducing power consumption by more than 100× as compared to the CNN-based reference platform. Our findings point to the significant potential of neuromorphic computing and SNNs in supporting on-board SatCom operations, paving the way for enhanced efficiency and sustainability in future SatCom systems. Comment: currently under review at IEEE Transactions on Machine Learning in Communications and Networkin

    RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems

    Distributed deep learning (DDL) systems strongly depend on network performance. Current electronic packet switched (EPS) network architectures and technologies suffer from variable diameter topologies, low bisection bandwidth and over-subscription, affecting the completion time of communication and collective operations. We introduce a near-exascale, full-bisection bandwidth, all-to-all, single-hop, all-optical network architecture with nanosecond reconfiguration called RAMP, which supports large-scale distributed and parallel computing systems (12.8 Tbps per node for up to 65,536 nodes). For the first time, a custom RAMP-x MPI strategy and a network transcoder are proposed to run MPI collective operations across the optical circuit switched (OCS) network in a schedule-less and contention-less manner. RAMP achieves a 7.6-171× speed-up in completion time across all MPI operations compared to realistic EPS and OCS counterparts. It can also deliver a 1.3-16× and 7.8-58× reduction in Megatron and DLRM training time respectively, while offering a 42-53× and 3.3-12.4× improvement in energy consumption and cost respectively

    Event-Based Control and Estimation with Stochastic Disturbances

    This thesis deals with event-based control and estimation strategies, motivated by certain bottlenecks in the control loop. Two kinds of implementation constraints are considered: closing one or several control loops over a data network, and sensors that report measurements only as intervals (e.g. with quantization). The proposed strategies depend critically on events, which occur when a data packet is sent or when a change in the measurement signal is received. The value of events is that they communicate new information about stochastic process disturbances. A data network in the control loop imposes constraints on the event timing, modelled as a minimum time between packets. A threshold-based control strategy is suggested and shown to be optimal for first-order systems with impulse control. Different ways to find the optimal threshold are investigated for single and multiple control loops sharing one network. With a single loop, the major gain compared to linear time-invariant (LTI) control is a greatly reduced communication rate, which with multiple loops can be traded for a similarly reduced regulation error. With the bottleneck that sensors report only intervals, both the theoretical and practical control problems become more complex. We focus on the estimation problem, where the optimal solution is known but intractable. Two simplifications are explored to find a realistic state estimator: reformulation to a mixed stochastic/worst-case scenario and joint maximum a posteriori estimation. The latter approach is simplified and evaluated experimentally on a moving cart with quantized position measurements controlled by a low-end microcontroller. The examples considered demonstrate that event-based control considerably outperforms LTI control when the bottleneck addressed is a genuine performance constraint on the latter
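
The threshold-based strategy with a minimum time between packets can be illustrated with a short simulation. The sketch below assumes a discretized integrator driven by Gaussian noise and an impulse control that resets the state to zero; the function name, step size, threshold, and gap values are hypothetical and are not taken from the thesis.

```python
import random


def simulate_threshold_control(threshold, steps=10_000, dt=1e-3,
                               min_gap_steps=0, seed=0):
    """Simulate dx = dw with an impulse reset to zero whenever |x| >= threshold.

    An event (packet plus impulse) is only allowed if at least
    `min_gap_steps` steps have passed since the previous event.
    Returns the number of events and the state after each step.
    """
    rng = random.Random(seed)
    x = 0.0
    events = 0
    last_event = -min_gap_steps  # permits an event at the very first step
    trace = []
    for k in range(steps):
        x += rng.gauss(0.0, dt ** 0.5)  # Brownian increment over one step
        if abs(x) >= threshold and k - last_event >= min_gap_steps:
            x = 0.0  # impulse control resets the state
            events += 1
            last_event = k
        trace.append(x)
    return events, trace


# Unconstrained network: the state never stays at or above the threshold.
free_events, free_trace = simulate_threshold_control(threshold=0.1)

# Constrained network: at most one packet every 50 steps.
gap_events, gap_trace = simulate_threshold_control(threshold=0.1,
                                                   min_gap_steps=50)
```

With no timing constraint every recorded state lies strictly inside the threshold band, while the minimum gap caps the number of events and lets the state overshoot the threshold between packets.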