307 research outputs found

    Investigating the Dirac operator evaluation with FPGAs

    Get PDF
    In recent years the computational capacity of single Field Programmable Gate Arrays (FPGA) devices as well as their versatility has increased significantly. Adding to that the High Level Synthesis frameworks allowing to program such processors in a high level language like C++, makes modern FPGA devices a serious candidate as building blocks of a general purpose High Performance Computing solution. In this contribution we describe benchmarks which we performed using a Lattice QCD code, a highly compute-demanding HPC academic code for elementary particle simulations. We benchmark the performance of a single FPGA device running in two modes: using the external or embedded memory. We discuss both approaches in detail using the Xilinx U250 device and provide estimates for the necessary memory throughput and the minimal amount of resources needed to deliver optimal performance depending on the available hardware platform.Comment: 8 pages, 5 figure

    Decentralized Massive MIMO Processing Exploring Daisy-chain Architecture and Recursive Algorithms

    Full text link
    Algorithms for Massive MIMO uplink detection and downlink precoding typically rely on a centralized approach, by which baseband data from all antenna modules are routed to a central node in order to be processed. In the case of Massive MIMO, where hundreds or thousands of antennas are expected in the base-station, said routing becomes a bottleneck since interconnection throughput is limited. This paper presents a fully decentralized architecture and an algorithm for Massive MIMO uplink detection and downlink precoding based on the Stochastic Gradient Descent (SGD) method, which does not require a central node for these tasks. Through a recursive approach and very low complexity operations, the proposed algorithm provides a good trade-off between performance, interconnection throughput and latency. Further, our proposed solution achieves significantly lower interconnection data-rate than other architectures, enabling future scalability.Comment: Manuscript accepted for publication in IEEE Transactions on Signal Processin

    Complexity Analysis of MMSE Detector Architectures for MIMO OFDM Systems

    Get PDF
    In this paper, a field programmable gate array (FPGA) implementation of a linear minimum mean square error (LMMSE) detector is considered for MIMO-OFDM systems. Two square root free algorithms based on QR decomposition (QRD) are introduced for the implementation of LMMSE detector. Both algorithms are based on QRD via Givens rotations, namely coordinate rotation digital computer (CORDIC) and squared Givens rotation (SGR) algorithms. Linear and triangular shaped array architectures are considered to exploit the parallelism in the computations. An FPGA hardware implementation is presented and computational complexity of each implementation is evaluated and compared.ElekrobitNokiaTexas InstrumentsNational Technology Agency of FinlandTeke

    A High-Speed QR Decomposition Processor for Carrier-Aggregated LTE-A Downlink Systems

    Get PDF
    This paper presents a high-speed QR decomposition (QRD) processor targeting the carrier-aggregated 4 Ă— 4 Long Term Evolution-Advanced (LTE-A) receiver. The processor provides robustness in spatially correlated channels with reduced complexity by using modifications to the Householder transform, such as decomposing-target redefinition and matrix real-valued decomposition. In terms of hardware design, we extensively explore flexibilities in systolic architectures using a high-level synthesis tool to achieve area-power efficiency. In a 65 nm CMOS technology, the processor occupies a core area of 0.77mm2 and produces 72MQRD per second, the highest reported throughput. The power consumed in the proposed processor is 219mW

    Energy Efficient VLSI Circuits for MIMO-WLAN

    Get PDF
    Mobile communication - anytime, anywhere access to data and communication services - has been continuously increasing since the operation of the first wireless communication link by Guglielmo Marconi. The demand for higher data rates, despite the limited bandwidth, led to the development of multiple-input multiple-output (MIMO) communication which is often combined with orthogonal frequency division multiplexing (OFDM). Together, these two techniques achieve a high bandwidth efficiency. Unfortunately, techniques such as MIMO-OFDM significantly increase the signal processing complexity of transceivers. While fast improvements in the integrated circuit (IC) technology enabled to implement more signal processing complexity per chip, large efforts had and have to be done for novel algorithms as well as for efficient very large scaled integration (VLSI) architectures in order to meet today's and tomorrow's requirements for mobile wireless communication systems. In this thesis, we will present architectures and VLSI implementations of complete physical (PHY) layer application specific integrated circuits (ASICs) under the constraints imposed by an industrial wireless communication standard. Contrary to many other publications, we do not elaborate individual components of a MIMO-OFDM communication system stand-alone, but in the context of the complete PHY layer ASIC. We will investigate the performance of several MIMO detectors and the corresponding preprocessing circuits, being integrated into the entire PHY layer ASIC, in terms of achievable error-rate, power consumption, and area requirement. Finally, we will assemble the results from the proposed PHY layer implementations in order to enhance the energy efficiency of a transceiver. To this end, we propose a cross-layer optimization of PHY layer and medium access control (MAC) layer

    Systems with Massive Number of Antennas: Distributed Approaches

    Get PDF
    As 5G is entering maturity, the research interest has shifted towards 6G, and specially the new use cases that the future telecommunication infrastructure needs to support. These new use cases encompass much higher requirements, specifically: higher communication data-rates, larger number of users, higher accuracy in localization, possibility to wirelessly charge devices, among others.The radio access network (RAN) has already gone through an evolution on the path towards 5G. One of the main changes was a large increment of the number of antennas in the base-station. Some of them may even reach 100 elements, in what is commonly referred as Massive MIMO. New proposals for 6G RAN point in the direction of continuing this path of increasing the number of antennas, and locate them throughout a certain area of service. Different technologies have been proposed in this direction, such as: cell-free Massive MIMO, distributed MIMO, and large intelligent surface (LIS). In this thesis we focus on LIS, whose conducted theoretical studies promise the fulfillment of the aforementioned requirements.While the theoretical capabilities of LIS have been conveniently analyzed, little has been done in terms of implementing this type of systems. When the number of antennas grow to hundreds or thousands, there are numerous challenges that need to be solved for a successful implementation. The most critical challenges are the interconnection data-rate and the computational complexity.In the present thesis we introduce the implementation challenges, and show that centralized processing architectures are no longer adequate for this type of systems. We also present different distributed processing architectures and show the benefits of this type of schemes. This work aims at giving a system-design guideline that helps the system designer to make the right decisions when designing these type of systems. For that, we provide algorithms, performance analysis and comparisons, including first order evaluation of the interconnection data-rate, processing latency, memory and energy consumption. These numbers are based on models and available data in the literature. Exact values depend on the selected technology, and will be accurately determined after building and testing these type of systems.The thesis concentrates mostly on the topic of communication, with additional exploration of other areas, such as localization. In case of localization, we benefit from the high spatial resolution of a very-large array that provides very rich channel state information (CSI). A CSI-based fingerprinting via neural network technique is selected for this case with promising results. As the communication and localization services are based on the acquisition of CSI, we foresee a common system architecture capable of supporting both cases. Further work in this direction is recommended, with the possibility of including other applications such as sensing.The obtained results indicate that the implementation of these very-large array systems is feasible, but the challenges are numerous. The proposed solutions provide encouraging results that need to be verified with hardware implementations and real measurements
    • …
    corecore