
    Bayes-Optimal Joint Channel-and-Data Estimation for Massive MIMO with Low-Precision ADCs

    This paper considers a multiple-input multiple-output (MIMO) receiver with very low-precision analog-to-digital converters (ADCs), motivated by the goal of developing massive MIMO antenna systems with minimal cost and power. Previous studies demonstrated that the training duration should be relatively long to obtain acceptable channel state information. To address this requirement, we adopt a joint channel-and-data (JCD) estimation method based on Bayes-optimal inference. This method yields minimal mean square errors with respect to the channels and payload data. We develop a Bayes-optimal JCD estimator using a recent technique based on approximate message passing. We then present an analytical framework to study the theoretical performance of the estimator in the large-system limit. Simulation results confirm our analytical results, which allow the efficient evaluation of the performance of quantized massive MIMO systems and provide insights into effective system design. Comment: accepted in IEEE Transactions on Signal Processing.
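    For orientation, the following is a minimal NumPy sketch of the kind of quantized measurement model such a receiver faces: payload symbols pass through a random channel, noise is added, and each I/Q rail is quantized by a low-precision ADC. The dimensions, quantizer resolution, clipping level, and SNR below are illustrative placeholders rather than values from the paper, and the sketch does not implement the Bayes-optimal JCD estimator itself.

        import numpy as np

        rng = np.random.default_rng(0)

        def low_precision_adc(x, bits, clip):
            """Mid-rise uniform quantizer applied element-wise to the real and
            imaginary parts, standing in for a low-precision ADC."""
            levels = 2 ** bits
            step = 2 * clip / levels
            def q(v):
                idx = np.clip(np.floor(v / step), -levels // 2, levels // 2 - 1)
                return (idx + 0.5) * step
            return q(x.real) + 1j * q(x.imag)

        # Illustrative sizes: N receive antennas, K single-antenna users, QPSK payload.
        N, K, snr_db = 64, 8, 10
        H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
        x = (rng.choice([-1.0, 1.0], K) + 1j * rng.choice([-1.0, 1.0], K)) / np.sqrt(2)
        noise_var = K * 10 ** (-snr_db / 10)
        w = np.sqrt(noise_var / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

        y_unquantized = H @ x + w                               # ideal observation
        y = low_precision_adc(y_unquantized, bits=2, clip=3.0)  # what the estimator sees

    A JCD estimator of the kind summarized above would recover both H and x from y, with only a short pilot phase instead of the long training sequences required by conventional channel estimation.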

    Modulation Diversity in Fading Channels with Quantized Receiver

    In this paper, we address the design of codes which achieve modulation diversity in block-fading single-input single-output (SISO) channels with signal quantization at the receiver and low-complexity decoding. With an unquantized receiver, coding based on algebraic rotations is known to achieve modulation coding diversity. On the other hand, with a quantized receiver, algebraic rotations may not guarantee diversity. Through analysis, we propose specific rotations which result in the codewords having equidistant component-wise projections. We show that the proposed coding scheme achieves maximum modulation diversity with a low-complexity minimum distance decoder and perfect channel knowledge. Relaxing the perfect channel knowledge assumption, we propose a novel training/estimation and receiver control technique to estimate the channel. We show that our coding/training/estimation scheme and minimum distance decoding achieve an error probability performance similar to that achieved with perfect channel knowledge.
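    To illustrate the basic idea behind rotation-based modulation diversity, the sketch below rotates pairs of 4-PAM symbols so that every codeword differs from every other in both coordinates, then performs minimum-distance decoding over a block-fading channel with perfect channel knowledge. The rotation angle, constellation, and noise level are arbitrary illustrative choices, not the specific rotations derived in the paper (which additionally account for receiver quantization).

        import numpy as np

        rng = np.random.default_rng(1)

        # Rotate pairs of 4-PAM symbols; a good rotation makes all codeword pairs
        # differ in both coordinates, so a deep fade on one coordinate is survivable.
        theta = np.arctan(2.0)  # placeholder angle, not the paper's optimized rotation
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        pam = np.array([-3.0, -1.0, 1.0, 3.0])
        codebook = np.array([R @ np.array([a, b]) for a in pam for b in pam])

        # Block-fading channel: each codeword component sees an independent fade.
        h = np.abs(rng.standard_normal(2))
        tx = codebook[rng.integers(len(codebook))]
        rx = h * tx + 0.1 * rng.standard_normal(2)

        # Minimum-distance decoding with perfect channel knowledge.
        dists = np.sum((rx - h * codebook) ** 2, axis=1)
        decoded = codebook[np.argmin(dists)]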

    Linear Precoding with Low-Resolution DACs for Massive MU-MIMO-OFDM Downlink

    We consider the downlink of a massive multiuser (MU) multiple-input multiple-output (MIMO) system in which the base station (BS) is equipped with low-resolution digital-to-analog converters (DACs). In contrast to most existing results, we assume that the system operates over a frequency-selective wideband channel and uses orthogonal frequency division multiplexing (OFDM) to simplify equalization at the user equipments (UEs). Furthermore, we consider the practically relevant case of oversampling DACs. We theoretically analyze the uncoded bit error rate (BER) performance with linear precoders (e.g., zero forcing) and quadrature phase-shift keying using Bussgang's theorem. We also develop a lower bound on the information-theoretic sum-rate throughput achievable with Gaussian inputs, which can be evaluated in closed form for the case of 1-bit DACs. For the case of multi-bit DACs, we derive approximate, yet accurate, expressions for the distortion caused by low-precision DACs, which can be used to establish lower bounds on the corresponding sum-rate throughput. Our results demonstrate that, for a massive MU-MIMO-OFDM system with a 128-antenna BS serving 16 UEs, only 3 to 4 DAC bits are required to achieve an uncoded BER of 10^-4 with a negligible performance loss compared to the infinite-resolution case, at the cost of additional out-of-band emissions. Furthermore, our results highlight the importance of taking into account the inherent spatial and temporal correlations caused by low-precision DACs.
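    As a rough illustration of the Bussgang-based viewpoint used in the analysis, the sketch below applies zero-forcing precoding to QPSK symbols over a single narrowband flat-fading snapshot, passes the result through 1-bit DACs, and estimates the Bussgang gain empirically so that the DAC output splits into a scaled useful term plus distortion that is uncorrelated with the input. The single-carrier setup and the per-realization gain estimate are simplifications for illustration; the paper treats the wideband OFDM case with oversampling DACs analytically.

        import numpy as np

        rng = np.random.default_rng(2)

        B, U = 128, 16                                    # BS antennas, single-antenna UEs
        H = (rng.standard_normal((U, B)) + 1j * rng.standard_normal((U, B))) / np.sqrt(2)
        s = (rng.choice([-1.0, 1.0], U) + 1j * rng.choice([-1.0, 1.0], U)) / np.sqrt(2)  # QPSK

        P = H.conj().T @ np.linalg.inv(H @ H.conj().T)    # zero-forcing precoding matrix
        z = P @ s                                         # infinite-resolution precoded signal
        x = (np.sign(z.real) + 1j * np.sign(z.imag)) / np.sqrt(2 * B)  # 1-bit DAC output per rail

        # Bussgang decomposition x = F*z + d, with distortion d uncorrelated with z;
        # F is estimated empirically from one realization here, for illustration only.
        F = np.real(np.vdot(z, x) / np.vdot(z, z))
        d = x - F * z
        received = H @ x                                  # noiseless received symbols at the UEs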

    Hardware-aware training for large-scale and diverse deep learning inference workloads using in-memory computing-based accelerators

    Analog in-memory computing (AIMC) -- a promising approach for energy-efficient acceleration of deep learning workloads -- computes matrix-vector multiplications (MVMs) only approximately, due to nonidealities that are often non-deterministic or nonlinear. This can adversely impact the achievable deep neural network (DNN) inference accuracy compared to a conventional floating-point (FP) implementation. While retraining has previously been suggested to improve robustness, prior work has explored only a few DNN topologies, using disparate and overly simplified AIMC hardware models. Here, we use hardware-aware (HWA) training to systematically examine the accuracy of AIMC for multiple common artificial intelligence (AI) workloads across multiple DNN topologies, and investigate sensitivity and robustness to a broad set of nonidealities. By introducing a new and highly realistic AIMC crossbar model, we improve significantly on earlier retraining approaches. We show that many large-scale DNNs of various topologies, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers, can in fact be successfully retrained to show iso-accuracy on AIMC. Our results further suggest that AIMC nonidealities that add noise to the inputs or outputs, rather than the weights, have the largest impact on DNN accuracy, and that RNNs are particularly robust to all nonidealities. Comment: 35 pages, 7 figures, 5 tables.
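    The core mechanic of hardware-aware training is to replace the ideal MVM in the forward pass with a noisy one, so the network learns weights that tolerate the injected perturbations. Below is a minimal NumPy sketch of such a noisy forward MVM; the noise model (additive Gaussian on weights and outputs) and the magnitudes are deliberately crude placeholders, far simpler than the realistic crossbar model introduced in the paper.

        import numpy as np

        rng = np.random.default_rng(3)

        def noisy_mvm(weights, x, w_noise_std=0.02, out_noise_std=0.03):
            """Matrix-vector multiply with additive weight and output noise,
            a crude stand-in for AIMC nonidealities (magnitudes are illustrative)."""
            w_perturbed = weights + w_noise_std * rng.standard_normal(weights.shape)
            y = w_perturbed @ x
            return y + out_noise_std * rng.standard_normal(y.shape)

        # During HWA training this noisy forward pass replaces the ideal MVM;
        # the backward pass typically uses the ideal (noise-free) computation.
        W = 0.1 * rng.standard_normal((32, 64))
        x = rng.standard_normal(64)
        y = noisy_mvm(W, x)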

    Using the IBM Analog In-Memory Hardware Acceleration Kit for Neural Network Training and Inference

    Analog In-Memory Computing (AIMC) is a promising approach to reduce the latency and energy consumption of Deep Neural Network (DNN) inference and training. However, the noisy and non-linear device characteristics and the non-ideal peripheral circuitry in AIMC chips require adapting DNNs to be deployed on such hardware in order to achieve accuracy equivalent to digital computing. In this tutorial, we provide a deep dive into how such adaptations can be achieved and evaluated using the recently released IBM Analog Hardware Acceleration Kit (AIHWKit), freely available at https://github.com/IBM/aihwkit. The AIHWKit is a Python library that simulates inference and training of DNNs using AIMC. We present an in-depth description of the AIHWKit design, functionality, and best practices for properly performing inference and training. We also present an overview of the Analog AI Cloud Composer, which provides the benefits of the AIHWKit simulation platform in a fully managed cloud setting. Finally, we show examples of how users can expand and customize AIHWKit for their own needs. This tutorial is accompanied by comprehensive Jupyter Notebook code examples that can be run using AIHWKit and downloaded from https://github.com/IBM/aihwkit/tree/master/notebooks/tutorial.
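    For a flavor of what working with the library looks like, here is a small training sketch in the spirit of AIHWKit's basic usage examples: an analog fully connected layer trained with the analog-aware SGD wrapper. The class names, arguments, and the parameter-regrouping call are recalled from the library's basic documentation and may differ between releases; the notebooks linked above are the authoritative reference.

        from torch import Tensor
        from torch.nn.functional import mse_loss

        from aihwkit.nn import AnalogLinear      # fully connected layer mapped to analog tiles
        from aihwkit.optim import AnalogSGD      # SGD variant aware of analog tile updates

        x = Tensor([[0.1, 0.2, 0.4, 0.3], [0.2, 0.1, 0.1, 0.3]])
        y = Tensor([[1.0, 0.5], [0.7, 0.3]])

        model = AnalogLinear(4, 2)
        optimizer = AnalogSGD(model.parameters(), lr=0.1)
        optimizer.regroup_param_groups(model)

        for _ in range(100):
            optimizer.zero_grad()
            loss = mse_loss(model(x), y)
            loss.backward()
            optimizer.step()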

    Development of a low-cost multi-camera star tracker for small satellites

    This thesis presents a novel small satellite star tracker that uses an array of low-cost, off-the-shelf imaging sensors to achieve high-accuracy attitude determination performance. A theoretical analysis of the improvement in star detectability achieved by stacking images from multiple cameras is presented. An image processing algorithm is developed to combine images from multiple cameras with arbitrary focal lengths, principal point offsets, distortions, and misalignments. The star tracker also implements other algorithms, including the region-growing algorithm, the intensity-weighted centroid algorithm, the geometric voting algorithm for star identification, and the singular value decomposition algorithm for attitude determination. A star tracker software simulator is used to test the algorithms by generating star images with sensor noise, lens defocusing, and lens distortion. A hardware prototype is being assembled for eventual night sky testing to verify the simulated performance levels. Star tracker flight hardware is being developed in the Laboratory for Advanced Space Systems at Illinois (LASSI) at the University of Illinois at Urbana-Champaign for future CubeSat missions.
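    As one concrete piece of such a pipeline, the intensity-weighted centroid reduces a detected star blob to a sub-pixel position by weighting pixel coordinates with pixel intensity (and, for uncorrelated noise, stacking N frames improves the signal-to-noise ratio by roughly a factor of sqrt(N), which is what makes fainter stars detectable). The sketch below computes such a centroid for a toy 5x5 window; the pixel values are made up for illustration and background subtraction is assumed to have already been done.

        import numpy as np

        def intensity_weighted_centroid(window):
            """Sub-pixel star centroid of a small, background-subtracted image window:
            pixel coordinates are weighted by pixel intensity."""
            window = np.asarray(window, dtype=float)
            total = window.sum()
            ys, xs = np.indices(window.shape)
            return (xs * window).sum() / total, (ys * window).sum() / total

        # Toy example: a star point-spread function spread over a 5x5 window.
        patch = [[0, 0, 1, 0, 0],
                 [0, 2, 6, 2, 0],
                 [1, 6, 9, 6, 1],
                 [0, 2, 6, 2, 0],
                 [0, 0, 1, 0, 0]]
        cx, cy = intensity_weighted_centroid(patch)   # both ~2.0 (the window center)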

    Finite precision deep learning with theoretical guarantees

    Recent successes of deep learning have been achieved at the expense of very high computational and parameter complexity. Today, deployment of both inference and training of deep neural networks (DNNs) is predominantly in the cloud. A recent alternative trend is to deploy DNNs onto untethered, resource-constrained platforms at the edge. To realize on-device intelligence, the gap between algorithmic requirements and available resources needs to be closed. One popular way of doing so is via implementation in finite precision. While ad hoc trial-and-error techniques in finite precision deep learning abound, theoretical guarantees on network accuracy are elusive. The work presented in this dissertation builds a theoretical framework for the implementation of deep learning in finite precision. For inference, we theoretically analyze the worst-case accuracy drop in the presence of weight and activation quantization. Furthermore, we derive an optimal clipping criterion (OCC) to minimize the precision of dot-product outputs. For implementations using in-memory computing, OCC lowers ADC precision requirements. We analyze fixed-point training and present a methodology for implementing quantized back-propagation with close-to-minimal per-tensor precision. Finally, we study accumulator precision for reduced-precision floating-point training using variance analysis techniques.

    We first introduce our work on fixed-point inference with accuracy guarantees. Theoretical bounds on the mismatch between limited- and full-precision networks are derived. Proper precision assignments can be readily obtained from these bounds, and weight-activation as well as per-layer precision trade-offs are derived. Applied to a variety of networks and datasets, the presented analysis is found to be tight to within 2 bits. Furthermore, it is shown that a minimum precision network can have up to ∼3.5× lower hardware complexity than a binarized network at iso-accuracy. In general, a minimum precision network can reduce complexity by up to ∼10× compared to a full precision baseline while maintaining accuracy. Per-layer precision analysis indicates that the precision requirements of common networks vary from 2 bits to 10 bits to guarantee an accuracy close to the floating-point baseline.

    Then, we study DNN implementation using in-memory computing (IMC), where we propose OCC to minimize the column ADC precision. The signal-to-quantization-noise ratio (SQNR) of OCC is shown to be within 0.8 dB of the well-known optimal Lloyd-Max quantizer. OCC improves the SQNR of the commonly employed full-range quantizer by 14 dB, which translates to a 3-bit ADC precision reduction. The input-serial weight-parallel (ISWP) IMC architecture is studied. Using bit-slicing techniques, significant energy savings can be achieved with minimal accuracy loss. Indeed, we prove that a dot-product can be realized with a single memory access while suffering an SQNR drop of no more than 2 dB. Combining the proposed OCC and ISWP noise analysis with our DNN precision analysis, we demonstrate a ∼6× reduction of energy consumption in DNN implementations at iso-accuracy.

    Furthermore, we study the quantization of the back-propagation training algorithm. We propose a systematic methodology to obtain close-to-minimal per-layer precision requirements that guarantee statistical similarity between fixed-point and floating-point training. The challenges of quantization noise, inter-layer and intra-layer precision trade-offs, dynamic range, and stability are jointly addressed. Applied to several benchmarks, fixed-point training is demonstrated to achieve high fidelity to the baseline, with an accuracy drop no greater than 0.56%. The derived precision assignment is shown to be within 1 bit per tensor of the minimum. The methodology is found to reduce the representational, computational, and communication costs of training by up to 6×, 8×, and 4×, respectively, compared to the baseline and related works.

    Finally, we address the problem of reduced-precision floating-point training. In particular, we study accumulation precision requirements. We present the variance retention ratio (VRR), an analytical metric measuring the suitability of accumulation mantissa precision. The analysis expands on concepts employed in variance engineering for weight initialization. An analytical expression for the VRR is derived and used to determine the accumulation bit-width for precise tailoring of computation hardware. The VRR also quantifies the benefits of effective summation reduction techniques such as chunked accumulation and sparsification. Experimentally, the validity and tightness of our analysis are verified across multiple deep learning benchmarks.
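    To make the clipping trade-off at the heart of an optimal clipping criterion concrete, the sketch below quantizes Gaussian-distributed dot-product outputs with a symmetric uniform quantizer at several clipping levels and reports the resulting SQNR: a small clip range causes saturation error, a large one wastes resolution on rare outliers, and the SQNR peaks in between. The Gaussian model, 4-bit resolution, and sweep values are illustrative choices, not the OCC derivation from the dissertation.

        import numpy as np

        rng = np.random.default_rng(4)

        def clipped_uniform_quantize(x, bits, clip):
            """Symmetric mid-rise uniform quantizer with saturation at +/- clip."""
            levels = 2 ** bits
            step = 2 * clip / levels
            xc = np.clip(x, -clip, clip - 1e-12)
            return (np.floor(xc / step) + 0.5) * step

        def sqnr_db(x, xq):
            return 10 * np.log10(np.mean(x ** 2) / np.mean((x - xq) ** 2))

        # Dot-product outputs modeled as zero-mean Gaussian (illustrative only).
        x = rng.standard_normal(100_000)
        for clip in (1.0, 2.0, 3.0, 4.0, 6.0):
            xq = clipped_uniform_quantize(x, bits=4, clip=clip)
            print(f"clip = {clip:.1f}: SQNR = {sqnr_db(x, xq):.2f} dB")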