17 research outputs found

    THOR: A Neuromorphic Processor with 7.29G TSOP²/mm²·J·s Energy-Throughput Efficiency

    Get PDF
    Neuromorphic computing using biologically inspired Spiking Neural Networks (SNNs) is a promising solution to meet the Energy-Throughput (ET) efficiency needed for edge computing devices. Neuromorphic hardware architectures that emulate SNNs in the analog/mixed-signal domain have been proposed to achieve order-of-magnitude higher energy efficiency than all-digital architectures, however at the expense of limited scalability, susceptibility to noise, complex verification, and poor flexibility. On the other hand, state-of-the-art digital neuromorphic architectures focus on achieving either high energy efficiency (Joules/synaptic operation (SOP)) or high throughput efficiency (SOPs/second/area), resulting in poor ET efficiency. In this work, we present THOR, an all-digital neuromorphic processor with a novel memory hierarchy and neuron update architecture that addresses both the energy consumption and throughput bottlenecks. We implemented THOR in 28 nm FDSOI CMOS technology, and our post-layout results demonstrate an ET efficiency of 7.29G TSOP²/mm²·J·s at 0.9 V, 400 MHz, a 3X improvement over state-of-the-art digital neuromorphic processors.
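The combined ET efficiency figure can be read as the product of the two metrics the abstract contrasts; a minimal sketch under that assumption (the unit TSOP²/mm²·J·s is dimensionally consistent with SOP/s/mm² multiplied by SOP/J; the numbers in the example are illustrative, not THOR's):

```python
def et_efficiency(throughput_sops_per_s: float, area_mm2: float,
                  energy_j_per_sop: float) -> float:
    """Assumed composition of Energy-Throughput efficiency:
    (throughput efficiency in SOP/s/mm^2) * (energy efficiency in SOP/J),
    yielding SOP^2 / (mm^2 * J * s)."""
    throughput_eff = throughput_sops_per_s / area_mm2  # SOP/s/mm^2
    energy_eff = 1.0 / energy_j_per_sop                # SOP/J
    return throughput_eff * energy_eff

# Illustrative (made-up) numbers: 1 GSOP/s on 2 mm^2 at 1 pJ/SOP
print(et_efficiency(1e9, 2.0, 1e-12))
```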

    Mixture of Experts Neural Network for Modeling of Power Amplifiers

    Get PDF
    A new Mixture of Experts Neural Network (ME-NN) approach is proposed for modeling of nonlinear RF power amplifiers (PAs). The proposed ME-NN is compared with various piece-wise polynomial models and the time-delay neural network (TDNN) regarding their ability to scale in terms of modeling accuracy and parameter count. To this end, measurements with a GaN Doherty PA at 1.8 GHz and a load-modulated balanced amplifier (LMBA) PA operating at 2.1 GHz with strong nonlinear behavior and dynamics are employed, assessing the potential benefits of the ME-NN over the existing models. Implementation-related advantages of the proposed ME-NN over TDNNs at increasing network sizes are furthermore discussed. The measurement results show that the ME-NN approach offers increased modeling accuracy, particularly in the LMBA PA case, compared to the existing reference methods.
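The general mixture-of-experts idea behind such a model can be sketched as follows; this is a generic illustration, not the paper's exact ME-NN architecture (the linear experts and the gating form are assumptions for demonstration):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

class MixtureOfExperts:
    """Generic mixture-of-experts regressor sketch: each expert is a
    simple linear model, and a gating network produces softmax weights
    that blend the expert outputs into one prediction."""
    def __init__(self, experts_W, experts_b, gate_W, gate_b):
        self.experts_W, self.experts_b = experts_W, experts_b
        self.gate_W, self.gate_b = gate_W, gate_b

    def predict(self, x):
        gates = softmax(self.gate_W @ x + self.gate_b)  # (K,) weights
        outputs = self.experts_W @ x + self.experts_b   # (K,) expert outputs
        return float(gates @ outputs)                   # weighted blend
```

The appeal for PA modeling is that each expert can specialize in one operating region (e.g. one load-modulation state), with the gate switching smoothly between them.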

    Low-Complexity Feedback Data Compression for Closed-Loop Digital Predistortion

    Get PDF
    This paper proposes sample combining as a low-complexity and effective feedback data compression technique that significantly reduces the computational effort and buffering needs for parameter adaptation in a closed-loop digital predistortion (DPD) system. Compression is achieved by applying an integrate & dump operation to an undersampled feedback signal. The proposed method is experimentally validated for RF measurement-based behavioral modeling as well as closed-loop DPD of a 3.5 GHz GaN Doherty PA, also taking quantization effects of the feedback path into account. Our results demonstrate that the proposed technique is as capable as state-of-the-art histogram-based sample selection, however at a much lower complexity.
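The described compression step can be sketched in a few lines; this is an assumed form of the operation (undersample first, then block-average), not the paper's exact implementation:

```python
import numpy as np

def integrate_and_dump(x, undersample: int, block: int):
    """Sketch of feedback data compression by sample combining:
    retain every `undersample`-th sample of the feedback signal, then
    average ('integrate & dump') consecutive blocks of the result."""
    y = x[::undersample]                 # undersampling
    n = (len(y) // block) * block        # drop the ragged tail
    return y[:n].reshape(-1, block).mean(axis=1)
```

The compression ratio is `undersample * block`, which is where the reduction in buffering and adaptation effort comes from.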

    An adaptive solution for power efficiency and QoS optimization in WLAN 802.11n

    No full text
    The widespread use of IEEE Wireless LAN 802.11 in battery-operated mobile devices introduced the need for power consumption optimization while meeting the Quality-of-Service (QoS) requirements of applications connected through the wireless network. The IEEE 802.11 standard specifies a baseline power saving mechanism, hereafter referred to as standard Power Save Mode (PSM), and the IEEE 802.11e standard specifies the Automatic Power Save Delivery (APSD) enhancement, which provides support for real-time applications with QoS requirements. The latest amendment to the WLAN 802.11 standard is the IEEE 802.11n standard, which enables the use of much higher data rates by including enhancements in the Physical and MAC layers. In this thesis, different 802.11n MAC power saving and QoS optimization possibilities are analyzed and compared against existing power saving mechanisms. Initially, the performance of the existing power saving mechanisms PSM and Unscheduled-APSD (UAPSD) is evaluated using the 802.11n process model in the OPNET simulator, and the impact of the frame aggregation feature introduced in the MAC layer of 802.11n on these power saving mechanisms is analyzed. From the performance analysis it can be concluded that frame aggregation is efficient under congested network conditions. When the network congestion level increases, the signaling load in UAPSD saturates the channel capacity and hence results in poor performance compared to PSM. Since PSM cannot guarantee the minimum QoS requirements for delay-sensitive applications, a better mechanism for performance enhancement of UAPSD under dynamic network conditions is proposed. The functionality and performance of the proposed algorithm are evaluated under different network conditions and using different contention settings. From the performance results it can be concluded that the proposed algorithm dynamically reduces the congestion level in the network, thereby providing better power saving and QoS by utilizing the frame aggregation feature efficiently.

    A 28-nm Coarse Grain 2D-Reconfigurable Array with Data Forwarding

    No full text
    To answer the ever-increasing demand for computational power, coarse-grain reconfigurable architectures (CGRAs) are experiencing a revival. As they provide efficient, dedicated datapaths to match the dataflow graphs of a wide range of applications, CGRAs successfully manage high-performance tasks. Yet, modern CGRAs access data through classical memory hierarchies or global scratchpad memories, failing to take advantage of data locality. This reveals memory bandwidth as a key bottleneck for these reconfigurable architectures. This letter introduces a heterogeneous 2-D CGRA which distributes data buffers across its computational grid. Together with a single-cycle data forwarding path, this allows the CGRA to better take advantage of data locality in order to minimize data transfers. A 28-nm CMOS instantiation of this concept realizes the mapping and execution of a variety of compute-intensive kernels, such as a real-time 512-point FFT or a 5×5 convolutional filter, at high power efficiency. The CGRA demonstrates a peak energy efficiency of 584.9 GOPS/W during a 103-tap FIR filter, marking a 2.9× improvement over state-of-the-art 2-D CGRAs.

    A generic, scalable and globally arbitrated memory tree for shared DRAM access in real-time systems

    No full text
    Predictable arbitration policies, such as Time Division Multiplexing (TDM) and Round-Robin (RR), are used to provide firm real-time guarantees to the multiple memory clients sharing a single memory resource (DRAM) in multi-core real-time systems. Traditional centralized implementations of predictable arbitration policies in a shared memory bus or interconnect are not scalable in terms of the number of clients. On the other hand, existing distributed memory interconnects are either globally arbitrated, which does not offer diverse service according to heterogeneous client requirements, or locally arbitrated, which suffers from larger area, power and latency overheads. Moreover, selecting the right arbitration policy according to the diverse and dynamic client requirements in reusable platforms requires a generic re-configurable architecture supporting different arbitration policies. The main contributions of this paper are: (1) We propose a novel generic, scalable and globally arbitrated memory tree (GSMT) architecture for distributed implementation of several predictable arbitration policies. (2) We present an RTL-level implementation of the Accounting and Priority Assignment (APA) logic of GSMT that can be configured with five different arbitration policies typically used for shared memory access in real-time systems. (3) We compare the performance of GSMT with different centralized implementations by synthesizing the designs in a 40 nm process. Our experiments show that with 64 clients GSMT can run up to four times faster than traditional architectures, with over 51% and 37% reductions in area and power consumption, respectively.
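The two predictable policies named above can be illustrated with minimal behavioral models; these sketches are purely illustrative and unrelated to the GSMT RTL, which implements the policies in distributed hardware:

```python
class RoundRobinArbiter:
    """Round-Robin sketch: grant the next requesting client after
    the most recently granted one."""
    def __init__(self, n_clients: int):
        self.n = n_clients
        self.last = -1
    def grant(self, requests):
        for i in range(1, self.n + 1):
            c = (self.last + i) % self.n
            if requests[c]:
                self.last = c
                return c
        return None  # no client is requesting

class TDMArbiter:
    """TDM sketch: each time slot is statically owned by one client;
    the owner is granted only if it is actually requesting."""
    def __init__(self, schedule):
        self.schedule = schedule
        self.t = 0
    def grant(self, requests):
        owner = self.schedule[self.t % len(self.schedule)]
        self.t += 1
        return owner if requests[owner] else None
```

TDM gives each client a fixed worst-case latency bound regardless of other clients' behavior, while RR distributes unused bandwidth but couples a client's latency to how many others are requesting.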

    μ-Genie: A Framework for Memory-Aware Spatial Processor Architecture Co-Design Exploration

    No full text
    Spatial processor architectures are essential to meet the increasing demand for performance and energy efficiency in both embedded and high-performance computing systems. Due to the growing performance gap between memories and processors, the memory system often determines the overall performance and power consumption in silicon. The interdependency between the memory system and spatial processor architectures suggests that they should be co-designed. For the same reason, state-of-the-art design methodologies for processor architectures are ineffective for spatial processor architectures because they do not include the memory system. In this paper, we present μ-Genie: an automated framework for co-design-space exploration of spatial processor architectures and the memory system, starting from an application description in a high-level programming language. In addition, we propose a spatial processor architecture template that can be configured at design-time for optimal hardware implementation. To demonstrate the effectiveness of our approach, we show a case study of co-designing a spatial processor using different memory technologies.

    Digital Predistortion with Compressed Observations for Cloud-Based Learning

    Get PDF
    This paper presents a novel system architecture for digital predistortion (DPD) of power amplifiers (PAs), where the training of the DPD model is done in a remote compute infrastructure, i.e., the cloud or a distributed unit (DU). In beyond-5G systems it is no longer feasible to perform computationally intensive tasks such as DPD training locally in the radio unit front-end, which has stringent power consumption requirements. Thus, we propose to split the DPD system and perform the compute-intensive DPD training in the DU, where more processing resources are available. To enable the remote training, the observed PA output, i.e., the observation signal, must be available; however, sending the data-intensive observation signal to the DU adds communication overhead to the system. In this paper, a low-complexity compression method is proposed to reduce the bit resolution of the observation signal by removing the known linear part of the observation, so that fewer bits are needed to represent the remaining information. Our numerical simulations show a 50% reduction in bits per sample for accurate training of the DPD model. The DPD performance was evaluated in simulation for a strongly driven PA operated at 28 GHz with a 200 MHz wide OFDM signal.
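The core compression idea, subtracting the known linear part and quantizing the smaller residual with fewer bits, can be sketched as follows; the uniform mid-rise quantizer and the `gain` model of the linear part are assumptions for illustration, not the paper's exact scheme:

```python
import numpy as np

def compress_observation(y, x, gain, n_bits, full_scale):
    """Subtract the known linear part (gain * x) from the PA
    observation y, then uniformly quantize the residual to n_bits."""
    residual = y - gain * x
    step = 2 * full_scale / (2 ** n_bits)
    codes = np.round(residual / step)
    return np.clip(codes, -(2 ** (n_bits - 1)),
                   2 ** (n_bits - 1) - 1).astype(int)

def decompress_observation(q, x, gain, n_bits, full_scale):
    """Reconstruct the observation at the trainer: dequantize the
    residual and add the known linear part back."""
    step = 2 * full_scale / (2 ** n_bits)
    return q * step + gain * x
```

Because the residual has a much smaller dynamic range than the full observation, the same quantization noise floor is reached with fewer bits per sample.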