195 research outputs found

    Digital and Mixed Domain Hardware Reduction Algorithms and Implementations for Massive MIMO

    Get PDF
    Emerging 5G and 6G based wireless communications systems largely rely on multiple-input-multiple-output (MIMO) systems to reduce inherently extensive path losses, facilitate high data rates, and high spatial diversity. Massive MIMO systems used in mmWave and sub-THz applications consists of hundreds perhaps thousands of antenna elements at base stations. Digital beamforming techniques provide the highest flexibility and better degrees of freedom for phased antenna arrays as compared to its analog and hybrid alternatives but has the highest hardware complexity. Conventional digital beamformers at the receiver require a dedicated analog to digital converter (ADC) for every antenna element, leading to ADCs for elements. The number of ADCs is the key deterministic factor for the power consumption of an antenna array system. The digital hardware consists of fast Fourier transform (FFT) cores with a multiplier complexity of (N log2N) for an element system to generate multiple beams. It is required to reduce the mixed and digital hardware complexities in MIMO systems to reduce the cost and the power consumption, while maintaining high performance. The well-known concept has been in use for ADCs to achieve reduced complexities. An extension of the architecture to multi-dimensional domain is explored in this dissertation to implement a single port ADC to replace ADCs in an element system, using the correlation of received signals in the spatial domain. This concept has applications in conventional uniform linear arrays (ULAs) as well as in focal plane array (FPA) receivers. Our analysis has shown that sparsity in the spatio-temporal frequency domain can be exploited to reduce the number of ADCs from N to where . By using the limited field of view of practical antennas, multiple sub-arrays are combined without interferences to achieve a factor of K increment in the information carrying capacity of the ADC systems. Applications of this concept include ULAs and rectangular array systems. Experimental verifications were done for a element, 1.8 - 2.1 GHz wideband array system to sample using ADCs. This dissertation proposes that frequency division multiplexing (FDM) receiver outputs at an intermediate frequency (IF) can pack multiple (M) narrowband channels with a guard band to avoid interferences. The combined output is then sampled using a single wideband ADC and baseband channels are retrieved in the digital domain. Measurement results were obtained by employing a element, 28 GHz antenna array system to combine channels together to achieve a 75% reduction of ADC requirement. Implementation of FFT cores in the digital domain is not always exact because of the finite precision. Therefore, this dissertation explores the possibility of approximating the discrete Fourier transform (DFT) matrix to achieve reduced hardware complexities at an allowable cost of accuracy. A point approximate DFT (ADFT) core was implemented on digital hardware using radix-32 to achieve savings in cost, size, weight and power (C-SWaP) and synthesized for ASIC at 45-nm technology

    Towards Complete Emulation of Quantum Algorithms using High-Performance Reconfigurable Computing

    Get PDF
    Quantum computing is a promising technology that can potentially demonstrate supremacy over classical computing in solving specific classically-intractable problems. However, in its current nascent stage, quantum computing faces major challenges. Two of the main challenges are quantum state decoherence and low scalability of current quantum devices. Decoherence is a process in which the state of the quantum computer is destroyed by interaction with the environment. Decoherence places constraints on the realistic applicability of quantum algorithms as real-life applications usually require complex equivalent quantum circuits to be realized. For example, encoding classical data on quantum computers for solving I/O and data-intensive applications generally requires complex quantum circuits that violate decoherence constraints. In addition, current quantum devices are of intermediate scale, having low quantum bit (qubit) counts and often producing inaccurate or noisy measurements. Consequently, benchmarking of existing quantum algorithms and the investigation of new applications are heavily dependent on classical simulations that use costly, resource-intensive computing platforms. Hardware-based emulation has been alternatively proposed as a more cost-effective and power-efficient approach. Hardware-based emulation methods can take advantage of hardware parallelism and acceleration to produce results at a higher throughput and lower power requirements.This work proposes a hardware-based emulation methodology for quantum algorithms, using cost-effective Field Programmable Gate Array (FPGA) technology. The proposed methodology consists of three components that are required for complete emulation of quantum algorithms; the first component models classical-to-quantum (C2Q) data encoding, the second emulates the behavior of quantum algorithms, and the third models the process of measuring the quantum state and extracting classical information, i.e., quantum-to-classical (Q2C) data decoding. The proposed emulation methodology is used to investigate and optimize methods for C2Q/Q2C data encoding/decoding, as well as several important quantum algorithms such as Quantum Fourier Transform (QFT), Quantum Haar Transform (QHT), and Quantum Grover’s Search (QGS). This work delivers contributions in terms of reducing complexities of quantum circuits, extending and optimizing quantum algorithms, and developing new quantum applications. For example, decoherence-optimized circuits for C2Q/Q2C data encoding/decoding are proposed and evaluated using the proposed emulation methodology. Multi-level decomposable forms of optimized QHT circuits are presented and used to demonstrate dimension reduction of high-resolution data. Additionally, a novel extension to the QGS algorithm is proposed to enable search for dynamically changing multi-patterns of unordered data. Finally, a novel quantum application is presented that combines QHT and dynamic multi-pattern QGS to perform pattern recognition using dimension reduction on high-resolution spatio-spectral data. For higher emulation performance and scalability of the framework, hardware design techniques and hardware architectural optimizations are investigated and proposed. The emulation architectures are designed and implemented on a high-performance reconfigurable computer (HPRC). For reference and comparison, implementations of the proposed quantum circuits are also performed on a state-of-the-art quantum computer. Experimental results show that the proposed hardware architectures enable emulation of quantum algorithms with higher scalability, higher accuracy, and higher throughput, compared to existing hardware-based emulators. As a case study, quantum image processing using multi-spectral images is considered for the experimental evaluations. The analysis and results of this work demonstrate that quantum computers and methodologies based on quantum algorithms will be highly useful in realistic data-intensive domains such as remote-sensing hyperspectral imagery and high-energy physics (HEP)

    Energy-Efficient Recurrent Neural Network Accelerators for Real-Time Inference

    Full text link
    Over the past decade, Deep Learning (DL) and Deep Neural Network (DNN) have gone through a rapid development. They are now vastly applied to various applications and have profoundly changed the life of hu- man beings. As an essential element of DNN, Recurrent Neural Networks (RNN) are helpful in processing time-sequential data and are widely used in applications such as speech recognition and machine translation. RNNs are difficult to compute because of their massive arithmetic operations and large memory footprint. RNN inference workloads used to be executed on conventional general-purpose processors including Central Processing Units (CPU) and Graphics Processing Units (GPU); however, they have un- necessary hardware blocks for RNN computation such as branch predictor, caching system, making them not optimal for RNN processing. To accelerate RNN computations and outperform the performance of conventional processors, previous work focused on optimization methods on both software and hardware. On the software side, previous works mainly used model compression to reduce the memory footprint and the arithmetic operations of RNNs. On the hardware side, previous works also designed domain-specific hardware accelerators based on Field Pro- grammable Gate Arrays (FPGA) or Application Specific Integrated Circuits (ASIC) with customized hardware pipelines optimized for efficient pro- cessing of RNNs. By following this software-hardware co-design strategy, previous works achieved at least 10X speedup over conventional processors. Many previous works focused on achieving high throughput with a large batch of input streams. However, in real-time applications, such as gaming Artificial Intellegence (AI), dynamical system control, low latency is more critical. Moreover, there is a trend of offloading neural network workloads to edge devices to provide a better user experience and privacy protection. Edge devices, such as mobile phones and wearable devices, are usually resource-constrained with a tight power budget. They require RNN hard- ware that is more energy-efficient to realize both low-latency inference and long battery life. Brain neurons have sparsity in both the spatial domain and time domain. Inspired by this human nature, previous work mainly explored model compression to induce spatial sparsity in RNNs. The delta network algorithm alternatively induces temporal sparsity in RNNs and can save over 10X arithmetic operations in RNNs proven by previous works. In this work, we have proposed customized hardware accelerators to exploit temporal sparsity in Gated Recurrent Unit (GRU)-RNNs and Long Short-Term Memory (LSTM)-RNNs to achieve energy-efficient real-time RNN inference. First, we have proposed DeltaRNN, the first-ever RNN accelerator to exploit temporal sparsity in GRU-RNNs. DeltaRNN has achieved 1.2 TOp/s effective throughput with a batch size of 1, which is 15X higher than its related works. Second, we have designed EdgeDRNN to accelerate GRU-RNN edge inference. Compared to DeltaRNN, EdgeDRNN does not rely on on-chip memory to store RNN weights and focuses on reducing off-chip Dynamic Random Access Memory (DRAM) data traffic using a more scalable architecture. EdgeDRNN have realized real-time inference of large GRU-RNNs with submillisecond latency and only 2.3 W wall plug power consumption, achieving 4X higher energy efficiency than commercial edge AI platforms like NVIDIA Jetson Nano. Third, we have used DeltaRNN to realize the first-ever continuous speech recognition sys- tem with the Dynamic Audio Sensor (DAS) as the front-end. The DAS is a neuromorphic event-driven sensor that produces a stream of asyn- chronous events instead of audio data sampled at a fixed sample rate. We have also showcased how an RNN accelerator can be integrated with an event-driven sensor on the same chip to realize ultra-low-power Keyword Spotting (KWS) on the extreme edge. Fourth, we have used EdgeDRNN to control a powered robotic prosthesis using an RNN controller to replace a conventional proportional–derivative (PD) controller. EdgeDRNN has achieved 21 μs latency of running the RNN controller and could maintain stable control of the prosthesis. We have used DeltaRNN and EdgeDRNN to solve these problems to prove their value in solving real-world problems. Finally, we have applied the delta network algorithm on LSTM-RNNs and have combined it with a customized structured pruning method, called Column-Balanced Targeted Dropout (CBTD), to induce spatio-temporal sparsity in LSTM-RNNs. Then, we have proposed another FPGA-based accelerator called Spartus, the first RNN accelerator that exploits spatio- temporal sparsity. Spartus achieved 9.4 TOp/s effective throughput with a batch size of 1, the highest among present FPGA-based RNN accelerators with a power budget around 10 W. Spartus can complete the inference of an LSTM layer having 5 million parameters within 1 μs

    Advanced Image Acquisition, Processing Techniques and Applications

    Get PDF
    "Advanced Image Acquisition, Processing Techniques and Applications" is the first book of a series that provides image processing principles and practical software implementation on a broad range of applications. The book integrates material from leading researchers on Applied Digital Image Acquisition and Processing. An important feature of the book is its emphasis on software tools and scientific computing in order to enhance results and arrive at problem solution

    Sustainable marine ecosystems: deep learning for water quality assessment and forecasting

    Get PDF
    An appropriate management of the available resources within oceans and coastal regions is vital to guarantee their sustainable development and preservation, where water quality is a key element. Leveraging on a combination of cross-disciplinary technologies including Remote Sensing (RS), Internet of Things (IoT), Big Data, cloud computing, and Artificial Intelligence (AI) is essential to attain this aim. In this paper, we review methodologies and technologies for water quality assessment that contribute to a sustainable management of marine environments. Specifically, we focus on Deep Leaning (DL) strategies for water quality estimation and forecasting. The analyzed literature is classified depending on the type of task, scenario and architecture. Moreover, several applications including coastal management and aquaculture are surveyed. Finally, we discuss open issues still to be addressed and potential research lines where transfer learning, knowledge fusion, reinforcement learning, edge computing and decision-making policies are expected to be the main involved agents.Postprint (published version

    Low-Noise Amplifier and Noise/Distortion Shaping Beamformer

    Get PDF
    The emergence of advanced technologies has increased the need for fast and efficient mobile communication that can facilitate transferring large amounts of data and simultaneously serve multiple users. Future wireless systems will rely on millimeter-wave frequencies, enabled by recent silicon hardware advancements. High-frequency millimeter-wave technology and low-noise receiver front ends and amplifiers are key for improved performance and energy efficiency. This thesis proposes two LNA topologies that offer wide input-power-matched bandwidths and low noise figures, eliminating the need for complex matching networks at the LNA input. These topologies use intrinsic feedback through gate-drain networks and/or the resistance of the SOI-transistor back-gate terminal to achieve the real part of the input impedance. The two LNAs are experimentally demonstrated with two 22-nm FDSOI LNAs. One LNA, matched with the assistance of the gate-drain network, exhibits a bandwidth ranging from 7.7-33.3 GHz, which is further improved to 6-38.7 GHz through the application of the back-gate-resistance method. The two LNAs have noise-figure minima of 1.8 and 1.9 dB, maximum gains of 14.7 and 15.6 dB, and maximum IP1dBs of -9.1 and -7.8 dBm while consuming 10 and 7.8 mW of power and occupying 0.04 and 0.03 mm^2 of active areas, respectively. This thesis also presents the first experimental demonstration of noise/distortion (ND) shaping beamformer. The NDs originating in the receiver itself are spatio-temporally shaped away from the beamformer region of support, thereby permitting their suppression by the beamformer. The demonstrator is a 24.3-28.7 GHz, 79.28 mW 4-port receiver for a 4-element antenna array implemented in 22-nm FDSOI CMOS. When shaping was enabled, the concept demonstrator provided average improvements to the NF and IP1dB of 1.6 dB and 2.25 dB, respectively (compared to a reference design), and achieved NF=2.6 dB and IP1dB=-18.7dBm while consuming 19.8 mW/channel
    corecore