15 research outputs found

    PRUNE: Dynamic and Decidable Dataflow for Signal Processing on Heterogeneous Platforms

    Get PDF
    The majority of contemporary mobile devices and personal computers are based on heterogeneous computing platforms that consist of a number of CPU cores and one or more Graphics Processing Units (GPUs). Despite the high volume of these devices, there are few existing programming frameworks that target full and simultaneous utilization of all CPU and GPU devices of the platform. This article presents a dataflow-flavored Model of Computation (MoC) that has been developed for deploying signal processing applications to heterogeneous platforms. The presented MoC is dynamic and allows describing applications with data dependent run-time behavior. On top of the MoC, formal design rules are presented that enable application descriptions to be simultaneously dynamic and decidable. Decidability guarantees compile-time application analyzability for deadlock freedom and bounded memory. The presented MoC and the design rules are realized in a novel Open Source programming environment "PRUNE" and demonstrated with representative application examples from the domains of image processing, computer vision and wireless communications. Experimental results show that the proposed approach outperforms the state-of-the-art in analyzability, flexibility and performance.Comment: This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publicatio

    Software Defined Radio Solutions for Wireless Communications Systems

    Get PDF
    Wireless technologies have been advancing rapidly, especially in the recent years. Design, implementation, and manufacturing of devices supporting the continuously evolving technologies require great efforts. Thus, building platforms compatible with different generations of standards and technologies has gained a lot of interest. As a result, software defined radios (SDRs) are investigated to offer more flexibility and scalability, and reduce the design efforts, compared to the conventional fixed-function hardware-based solutions.This thesis mainly addresses the challenges related to SDR-based implementation of today’s wireless devices. One of the main targets of most of the wireless standards has been to improve the achievable data rates, which imposes strict requirements on the processing platforms. Realizing real-time processing of high throughput signal processing algorithms using SDR-based platforms while maintaining energy consumption close to conventional approaches is a challenging topic that is addressed in this thesis.Firstly, this thesis concentrates on the challenges of a real-time software-based implementation for the very high throughput (VHT) Institute of Electrical and Electronics Engineers (IEEE) 802.11ac amendment from the wireless local area networks (WLAN) family, where an SDR-based solution is introduced for the frequency-domain baseband processing of a multiple-input multipleoutput (MIMO) transmitter and receiver. The feasibility of the implementation is evaluated with respect to the number of clock cycles and the consumed power. Furthermore, a digital front-end (DFE) concept is developed for the IEEE 802.11ac receiver, where the 80 MHz waveform is divided to two 40 MHz signals. This is carried out through time-domain digital filtering and decimation, which is challenging due to the latency and cyclic prefix (CP) budget of the receiver. Different multi-rate channelization architectures are developed, and the software implementation is presented and evaluated in terms of execution time, number of clock cycles, power, and energy consumption on different multi-core platforms.Secondly, this thesis addresses selected advanced techniques developed to realize inband fullduplex (IBFD) systems, which aim at improving spectral efficiency in today’s congested radio spectrum. IBFD refers to concurrent transmission and reception on the same frequency band, where the main challenge to combat is the strong self-interference (SI). In this thesis, an SDRbased solution is introduced, which is capable of real-time mitigation of the SI signal. The implementation results show possibility of achieving real-time sufficient SI suppression under time-varying environments using low-power, mobile-scale multi-core processing platforms. To investigate the challenges associated with SDR implementations for mobile-scale devices with limited processing and power resources, processing platforms suitable for hand-held devices are selected in this thesis work. On the baseband processing side, a very long instruction word (VLIW) processor, optimized for wireless communication applications, is utilized. Furthermore, in the solutions presented for the DFE processing and the digital SI canceller, commercial off-the-shelf (COTS) multi-core central processing units (CPUs) and graphics processing units (GPUs) are used with the aim of investigating the performance enhancement achieved by utilizing parallel processing.Overall, this thesis provides solutions to the challenges of low-power, and real-time software-based implementation of computationally intensive signal processing algorithms for the current and future communications systems

    On Design and Optimization of Convolutional Neural Network for Embedded Systems

    Get PDF
    This work presents the research on optimizing neural networks and deploying them for real-time practical applications. We analyze different optimization methods, namely binarization, separable convolution and pruning. We implement each method for the application of vehicle classification and we empirically evaluate and analyze the results. The objective is to make large neural networks suitable for real-time applications by reducing the computation requirements through these optimization approaches. The data set is of vehicles from 4 classes of vehicle types, and a convolutional model was used to solve the problem initially. Our results show that these optimization methods offer many performance benefits in this application in terms of reduced execution time (by up to 5 ×), reduced model storage requirements, with out largely impacting accuracy, making them a suitable tool for use in streamlining heavy neural networks to be used on resource-constrained envrionments. The platforms used in the research are a desktop platform, and two embedded platforms

    Reduced-complexity Digital Predistortion in Flexible Radio Spectrum Access

    Get PDF
    Wireless communications is nowadays seen as one of the main foundations of technological advancements in, e.g., healthcare, education, agriculture, transportation, computing, personal communications, media, and entertainment. This requires major technological developments and advances at different levels of the wireless communication systems and networks. In particular, it is required to utilize the currently available frequency spectrum in a more and more efficient way, while also adopting new spectral bands. Moreover, it is required that cheaper and smaller electronic components are used to build future wireless communication systems to facilitate increasingly cost-effective solutions. Meanwhile, energy efficiency becomes extremely important in wide scale deployments of the networks both from a running cost point of view, and from an environmental impact point of view. This is the big picture, or the so called ‘bird’s eye view’ of the challenges that are yet to be met in this very interesting and fast developing field of science.The power amplifier (PA) is the most power-hungry component in most RF transmitters. Consequently, its energy efficiency significantly contributes to the overall energy efficiency of the transmitter, and in fact the whole wireless network. Unfortunately, energy efficiency enhancement implies operating the PA closer to its saturation region, which typically results in severe nonlinear distortion that can deteriorate the signal quality and cause interference to neighboring users, both of which negatively impact the system spectral efficiency. Moreover, in flexible spectrum access scenarios, which are essential for improving the spectral efficiency, particular in the form of non-contiguous radio spectrum access, the nonlinear distortion due to the PA becomes even more severe and can significantly impact the overall network performance. For example, in noncontiguous carrier aggregation (CA) in LTE-Advanced, it has been demonstrated that in addition to the classical in-band distortion and regrowth around the main carriers, harmful spurious emission components are generated which can easily violate the spurious emission limits even in the case of user equipment (UE) transmitters.Technological advances in the digital electronics domain have enabled us to approach this problem from a digital signal processing point of view in the form of widely-adopted and researched digital predistortion (DPD) technology. However, when the signal bandwidth gets larger, and flexible or non-contiguous spectrum access is introduced, the complexity of the DPD increases and the power consumed in the digital domain by the DPD itself becomes higher and higher, to the extent that it might be close to, or even surpass, the energy savings achieved from using a more efficient PA. The problem becomes even more challenging at the UE side which has relatively limited computational capabilities and lower transmit power. This dilemma can be resolved by developing novel reduced-complexity DPD solutions in such flexible spectrum access and/or wide bandwidth scenarios while not sacrificing the DPD performance, which is the main topic area that this thesis work contributes to.The first contribution of this thesis is the development of a spur-injection based sub-band DPD structure for spurious emission mitigation in noncontiguous transmission scenarios. A novel and effective learning algorithm is also introduced, for the proposed sub-band DPD, based on the decorrelation principle. Mathematical models of the unwanted emissions are formulated based on realistic PA models with memory, followed by developing an efficient DPD structure for mitigating these emissions with reducedcomplexity in both the DPD main processing and learning paths while providing excellent spurious emission suppression. In the special case when the spurious emissions overlap with the own RX band in frequency division duplexing (FDD) transceivers, a novel subband DPD solution is also developed that uses the main RX for DPD learning without requiring any additional observation RX, thus further reducing the DPD complexity.The second contribution is the development of a novel reduced-complexity concurrent DPD, with a single-feedback receiver path, for carrier aggregation-like scenarios. The proposed solution is based on a simple and flexible DPD structure with decorrelationbased parameter learning. Practical simulations and RF measurements demonstrate that the proposed concurrent DPD provides excellent linearization performance, in terms of in-band error vector magnitude (EVM) and adjacent channel leakage ratio (ACLR), when compared to state-of-the-art concurrent DPD solutions, despite its reduced computational complexity in both the DPD main path processing and parameter learning.The third contribution is the development of a new and novel frequency-optimized DPD solution which can tailor its linearization capabilities to any particular regions of the spectrum. Detailed mathematical expressions of the power spectrum at the PA output as a function of the DPD coefficients are formulated. A Newton-Raphson optimization routine is then utilized to optimize the suppression of unwanted emissions at arbitrary pre-specified frequencies at the PA output. From a complexity reduction perspective, this means that for a given linearization performance at a particular frequency range, an optimized and reduced-complexity DPD can be used.Detailed quantitative complexity analysis, of all the proposed DPD solutions, is performed in this thesis. The complexity and linearization performance are also compared to state-of-the-art DPD solutions in the literature to validate and demonstrate the complexity reduction aspect without sacrificing the linearization performance. Moreover, all the DPD solutions developed in this thesis are tested in practical RF environments using real cellular power amplifiers that are commercially used in the latest wireless communication systems, both at the base station side and at the mobile terminal side. These experiments, along with the strong theoretical foundation of the developed DPD solutions prove that they can be commercially used as such to enhance the performance, energy efficiency, and cost effectiveness of next generation wireless transmitters

    DESIGN SPACE EXPLORATION FOR SIGNAL PROCESSING SYSTEMS USING LIGHTWEIGHT DATAFLOW GRAPHS

    Get PDF
    Digital signal processing (DSP) is widely used in many types of devices, including mobile phones, tablets, personal computers, and numerous forms of embedded systems. Implementation of modern DSP applications is very challenging in part due to the complex design spaces that are involved. These design spaces involve many kinds of configurable parameters associated with the signal processing algorithms that are used, as well as different ways of mapping the algorithms onto the targeted platforms. In this thesis, we develop new algorithms, software tools and design methodologies to systematically explore the complex design spaces that are involved in design and implementation of signal processing systems. To improve the efficiency of design space exploration, we develop and apply compact system level models, which are carefully formulated to concisely capture key properties of signal processing algorithms, target platforms, and algorithm-platform interactions. Throughout the thesis, we develop design methodologies and tools for integrating new compact system level models and design space exploration methods with lightweight dataflow (LWDF) techniques for design and implementation of signal processing systems. LWDF is a previously-introduced approach for integrating new forms of design space exploration and system-level optimization into design processes for DSP systems. LWDF provides a compact set of retargetable application programming interfaces (APIs) that facilitates the integration of dataflow-based models and methods. Dataflow provides an important formal foundation for advanced DSP system design, and the flexible support for dataflow in LWDF facilitates experimentation with and application of novel design methods that are founded in dataflow concepts. Our developed methodologies apply LWDF programming to facilitate their application to different types of platforms and their efficient integration with platform-based tools for hardware/software implementation. Additionally, we introduce novel extensions to LWDF to improve its utility for digital hardware design and adaptive signal processing implementation. To address the aforementioned challenges of design space exploration and system optimization, we present a systematic multiobjective optimization framework for dataflow-based architectures. This framework builds on the methodology of multiobjective evolutionary algorithms and derives key system parameters subject to time-varying and multidimensional constraints on system performance. We demonstrate the framework by applying LWDF techniques to develop a dataflow-based architecture that can be dynamically reconfigured to realize strategic configurations in the underlying parameter space based on changing operational requirements. Secondly, we apply Markov decision processes (MDPs) for design space exploration in adaptive embedded signal processing systems. We propose a framework, known as the Hierarchical MDP framework for Compact System-level Modeling (HMCSM), which embraces MDPs to enable autonomous adaptation of embedded signal processing under multidimensional constraints and optimization objectives. The framework integrates automated, MDP-based generation of optimal reconfiguration policies, dataflow-based application modeling, and implementation of embedded control software that carries out the generated reconfiguration policies. Third, we present a new methodology for design and implementation of signal processing systems that are targeted to system-on-chip (SoC) platforms. The methodology is centered on the use of LWDF concepts and methods for applying principles of dataflow design at different layers of abstraction. The development processes integrated in our approach are software implementation, hardware implementation, hardware-software co-design, and optimized application mapping. The proposed methodology facilitates development and integration of signal processing hardware and software modules that involve heterogeneous programming languages and platforms. Through three case studies involving complex applications, we demonstrate the effectiveness of the proposed contributions for compact system level design and design space exploration: a digital predistortion (DPD) system, a reconfigurable channelizer for wireless communication, and a deep neural network (DNN) for vehicle classification

    Observing and Modeling the Physical Layer Phenomena in Open Optical Systems for Network planning and management

    Get PDF
    L'abstract è presente nell'allegato / the abstract is in the attachmen

    REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS

    Get PDF
    New radar applications need to perform complex algorithms and process a large quantity of data to generate useful information for the users. This situation has motivated the search for better processing solutions that include low-power high-performance processors, efficient algorithms, and high-speed interfaces. In this work, hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization is presented, and is based on a System-on-Chip architecture for reconfigurable hardware devices. This study also evaluates the performance of dedicated coprocessors as hardware accelerator units to speed up and improve the computation of computing-intensive tasks such matrix multiplication and matrix inversion, which are essential units to solve the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operations using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms

    Energy-Efficient Recurrent Neural Network Accelerators for Real-Time Inference

    Full text link
    Over the past decade, Deep Learning (DL) and Deep Neural Network (DNN) have gone through a rapid development. They are now vastly applied to various applications and have profoundly changed the life of hu- man beings. As an essential element of DNN, Recurrent Neural Networks (RNN) are helpful in processing time-sequential data and are widely used in applications such as speech recognition and machine translation. RNNs are difficult to compute because of their massive arithmetic operations and large memory footprint. RNN inference workloads used to be executed on conventional general-purpose processors including Central Processing Units (CPU) and Graphics Processing Units (GPU); however, they have un- necessary hardware blocks for RNN computation such as branch predictor, caching system, making them not optimal for RNN processing. To accelerate RNN computations and outperform the performance of conventional processors, previous work focused on optimization methods on both software and hardware. On the software side, previous works mainly used model compression to reduce the memory footprint and the arithmetic operations of RNNs. On the hardware side, previous works also designed domain-specific hardware accelerators based on Field Pro- grammable Gate Arrays (FPGA) or Application Specific Integrated Circuits (ASIC) with customized hardware pipelines optimized for efficient pro- cessing of RNNs. By following this software-hardware co-design strategy, previous works achieved at least 10X speedup over conventional processors. Many previous works focused on achieving high throughput with a large batch of input streams. However, in real-time applications, such as gaming Artificial Intellegence (AI), dynamical system control, low latency is more critical. Moreover, there is a trend of offloading neural network workloads to edge devices to provide a better user experience and privacy protection. Edge devices, such as mobile phones and wearable devices, are usually resource-constrained with a tight power budget. They require RNN hard- ware that is more energy-efficient to realize both low-latency inference and long battery life. Brain neurons have sparsity in both the spatial domain and time domain. Inspired by this human nature, previous work mainly explored model compression to induce spatial sparsity in RNNs. The delta network algorithm alternatively induces temporal sparsity in RNNs and can save over 10X arithmetic operations in RNNs proven by previous works. In this work, we have proposed customized hardware accelerators to exploit temporal sparsity in Gated Recurrent Unit (GRU)-RNNs and Long Short-Term Memory (LSTM)-RNNs to achieve energy-efficient real-time RNN inference. First, we have proposed DeltaRNN, the first-ever RNN accelerator to exploit temporal sparsity in GRU-RNNs. DeltaRNN has achieved 1.2 TOp/s effective throughput with a batch size of 1, which is 15X higher than its related works. Second, we have designed EdgeDRNN to accelerate GRU-RNN edge inference. Compared to DeltaRNN, EdgeDRNN does not rely on on-chip memory to store RNN weights and focuses on reducing off-chip Dynamic Random Access Memory (DRAM) data traffic using a more scalable architecture. EdgeDRNN have realized real-time inference of large GRU-RNNs with submillisecond latency and only 2.3 W wall plug power consumption, achieving 4X higher energy efficiency than commercial edge AI platforms like NVIDIA Jetson Nano. Third, we have used DeltaRNN to realize the first-ever continuous speech recognition sys- tem with the Dynamic Audio Sensor (DAS) as the front-end. The DAS is a neuromorphic event-driven sensor that produces a stream of asyn- chronous events instead of audio data sampled at a fixed sample rate. We have also showcased how an RNN accelerator can be integrated with an event-driven sensor on the same chip to realize ultra-low-power Keyword Spotting (KWS) on the extreme edge. Fourth, we have used EdgeDRNN to control a powered robotic prosthesis using an RNN controller to replace a conventional proportional–derivative (PD) controller. EdgeDRNN has achieved 21 μs latency of running the RNN controller and could maintain stable control of the prosthesis. We have used DeltaRNN and EdgeDRNN to solve these problems to prove their value in solving real-world problems. Finally, we have applied the delta network algorithm on LSTM-RNNs and have combined it with a customized structured pruning method, called Column-Balanced Targeted Dropout (CBTD), to induce spatio-temporal sparsity in LSTM-RNNs. Then, we have proposed another FPGA-based accelerator called Spartus, the first RNN accelerator that exploits spatio- temporal sparsity. Spartus achieved 9.4 TOp/s effective throughput with a batch size of 1, the highest among present FPGA-based RNN accelerators with a power budget around 10 W. Spartus can complete the inference of an LSTM layer having 5 million parameters within 1 μs

    Optics for AI and AI for Optics

    Get PDF
    Artificial intelligence is deeply involved in our daily lives via reinforcing the digital transformation of modern economies and infrastructure. It relies on powerful computing clusters, which face bottlenecks of power consumption for both data transmission and intensive computing. Meanwhile, optics (especially optical communications, which underpin today’s telecommunications) is penetrating short-reach connections down to the chip level, thus meeting with AI technology and creating numerous opportunities. This book is about the marriage of optics and AI and how each part can benefit from the other. Optics facilitates on-chip neural networks based on fast optical computing and energy-efficient interconnects and communications. On the other hand, AI enables efficient tools to address the challenges of today’s optical communication networks, which behave in an increasingly complex manner. The book collects contributions from pioneering researchers from both academy and industry to discuss the challenges and solutions in each of the respective fields

    VLSI Design

    Get PDF
    This book provides some recent advances in design nanometer VLSI chips. The selected topics try to present some open problems and challenges with important topics ranging from design tools, new post-silicon devices, GPU-based parallel computing, emerging 3D integration, and antenna design. The book consists of two parts, with chapters such as: VLSI design for multi-sensor smart systems on a chip, Three-dimensional integrated circuits design for thousand-core processors, Parallel symbolic analysis of large analog circuits on GPU platforms, Algorithms for CAD tools VLSI design, A multilevel memetic algorithm for large SAT-encoded problems, etc
    corecore