307 research outputs found
Investigating the Dirac operator evaluation with FPGAs
In recent years the computational capacity and versatility of single Field
Programmable Gate Array (FPGA) devices have increased significantly.
Combined with High Level Synthesis frameworks, which allow such devices to be
programmed in a high-level language such as C++, this makes modern FPGAs
serious candidates as building blocks of a general-purpose High Performance
Computing solution. In this contribution we describe benchmarks which we
performed using a Lattice QCD code, a highly compute-demanding HPC academic
code for elementary particle simulations. We benchmark the performance of a
single FPGA device running in two modes: using the external or embedded memory.
We discuss both approaches in detail using the Xilinx U250 device and provide
estimates for the necessary memory throughput and the minimal amount of
resources needed to deliver optimal performance depending on the available
hardware platform. Comment: 8 pages, 5 figures
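The memory-throughput estimate mentioned above follows roofline-style reasoning: the bandwidth required equals the target flop rate divided by the kernel's arithmetic intensity. The sketch below is not the authors' model; it assumes the standard Wilson-Dirac operator (1320 flops per lattice site), no gauge-link compression, and no reuse of neighbouring spinors in on-chip memory. The function names and the 100 GFLOP/s target are hypothetical.

```python
"""Back-of-the-envelope memory-throughput estimate for a Wilson-Dirac kernel.

Assumptions (not taken from the paper): Wilson fermions, 1320 flops per site,
no gauge-link compression, no reuse of neighbours in on-chip memory.
"""

FLOPS_PER_SITE = 1320  # commonly quoted flop count for one Wilson-Dirac site


def bytes_per_site(bytes_per_real: int = 4) -> int:
    """Memory traffic per lattice site when nothing is reused."""
    spinor_reals = 24  # 4 spins x 3 colours x (re, im)
    link_reals = 18    # SU(3) link: 3 x 3 complex entries
    neighbours = 8     # forward and backward neighbour in each of 4 dimensions
    # read 8 neighbour spinors + 8 links, write 1 output spinor
    reals = neighbours * spinor_reals + neighbours * link_reals + spinor_reals
    return reals * bytes_per_real


def required_bandwidth_gbs(target_gflops: float, bytes_per_real: int = 4) -> float:
    """Bandwidth (GB/s) needed so that memory does not cap target_gflops."""
    intensity = FLOPS_PER_SITE / bytes_per_site(bytes_per_real)  # flop per byte
    return target_gflops / intensity


if __name__ == "__main__":
    for precision, nbytes in (("single", 4), ("double", 8)):
        ai = FLOPS_PER_SITE / bytes_per_site(nbytes)
        print(f"{precision}: {bytes_per_site(nbytes)} B/site, "
              f"intensity {ai:.3f} flop/B, "
              f"100 GFLOP/s needs {required_bandwidth_gbs(100.0, nbytes):.1f} GB/s")
```

Under these assumptions the kernel performs slightly less than one flop per byte in single precision, so its performance is set by memory bandwidth rather than arithmetic; caching neighbours in embedded memory or compressing the gauge links would raise the intensity and lower the required bandwidth.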
Decentralized Massive MIMO Processing Exploring Daisy-chain Architecture and Recursive Algorithms
Algorithms for Massive MIMO uplink detection and downlink precoding typically
rely on a centralized approach, by which baseband data from all antenna modules
are routed to a central node in order to be processed. In the case of Massive
MIMO, where hundreds or thousands of antennas are expected at the base station,
this routing becomes a bottleneck because interconnection throughput is limited.
This paper presents a fully decentralized architecture and an algorithm for
Massive MIMO uplink detection and downlink precoding based on the Stochastic
Gradient Descent (SGD) method, which does not require a central node for these
tasks. Through a recursive approach and very low complexity operations, the
proposed algorithm provides a good trade-off between performance,
interconnection throughput and latency. Further, our proposed solution achieves
significantly lower interconnection data-rate than other architectures,
enabling future scalability. Comment: Manuscript accepted for publication in IEEE Transactions on Signal
Processing
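A minimal sketch of daisy-chained SGD detection, assuming the uplink model y = Hx + n with M antenna modules and K users. Each module holds only its own channel row h_m and received sample y_m, takes one stochastic gradient step on |y_m - h_m x|^2, and forwards the K-dimensional estimate to the next module. The normalised step size (which turns each step into a Kaczmarz projection), the number of passes, and the name daisy_chain_sgd_detect are illustrative assumptions, not the paper's exact algorithm.

```python
import random


def daisy_chain_sgd_detect(H, y, num_users, passes=1, step=1.0):
    """Uplink detection by SGD steps passed along a chain of antenna modules.

    H: M rows (one per antenna module), each a list of K complex channel gains.
    y: M received samples, y[m] belonging to module m.
    Only the K-dimensional estimate x_hat travels between modules.
    """
    x_hat = [0j] * num_users
    for _ in range(passes):
        for h_m, y_m in zip(H, y):  # module m in the daisy chain
            # Local residual, computed from this module's data only.
            r = y_m - sum(h * x for h, x in zip(h_m, x_hat))
            # Normalised step size (assumption); step=1.0 is a Kaczmarz projection.
            mu = step / sum(abs(h) ** 2 for h in h_m)
            # SGD update on |y_m - h_m x|^2:  x <- x + mu * conj(h_m) * r
            x_hat = [x + mu * h.conjugate() * r for h, x in zip(h_m, x_hat)]
    return x_hat


if __name__ == "__main__":
    random.seed(0)
    M, K = 64, 4  # hypothetical number of antenna modules and users
    H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(K)]
         for _ in range(M)]
    x = [complex(random.choice((-1, 1)), random.choice((-1, 1))) for _ in range(K)]  # QPSK
    y = [sum(h * s for h, s in zip(row, x)) for row in H]  # noise-free for clarity
    x_hat = daisy_chain_sgd_detect(H, y, K, passes=3)
    print("max error:", max(abs(a - b) for a, b in zip(x_hat, x)))
```

Each link carries only K complex values per received vector, independent of M; this is the property behind the lower interconnection data rate claimed in the abstract.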
Temporal unpredictability detection of real-time video sequence
Imperial Users only
Complexity Analysis of MMSE Detector Architectures for MIMO OFDM Systems
In this paper, a field programmable gate array (FPGA) implementation of a linear minimum mean square error (LMMSE) detector is considered for MIMO-OFDM systems. Two square-root-free algorithms based on QR decomposition (QRD) are introduced for the implementation of the LMMSE detector. Both algorithms perform QRD via Givens rotations, namely the coordinate rotation digital computer (CORDIC) and squared Givens rotation (SGR) algorithms. Linear and triangular array architectures are considered to exploit the parallelism in the computations. An FPGA hardware implementation is presented, and the computational complexity of each implementation is evaluated and compared.
Elektrobit; Nokia; Texas Instruments; National Technology Agency of Finland; Tekes
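The LMMSE estimate can be obtained from a QR decomposition without forming a matrix inverse. Assuming unit-energy symbols and noise variance sigma^2, x_hat = (H^T H + sigma^2 I)^-1 H^T y equals R^-1 times the first K entries of Q^T [y; 0], where [H; sigma I] = QR, because the augmented matrix satisfies [H; sigma I]^T [H; sigma I] = H^T H + sigma^2 I. The sketch below uses conventional Givens rotations on real-valued data; unlike the CORDIC and SGR variants in the paper it does compute square roots (via math.hypot). The function name and the real-valued simplification are assumptions for illustration.

```python
import math


def lmmse_qrd_givens(H, y, sigma):
    """Real-valued LMMSE estimate via Givens QR of the augmented matrix [H; sigma*I]."""
    m, k = len(H), len(H[0])
    # Augmented system [H; sigma*I | y; 0]; the last column ends up holding Q^T y~.
    A = [list(row) + [yi] for row, yi in zip(H, y)]
    A += [[sigma if c == r else 0.0 for c in range(k)] + [0.0] for r in range(k)]
    rows = m + k
    for j in range(k):                    # column being triangularised
        for i in range(rows - 1, j, -1):  # zero A[i][j] against A[i-1][j]
            a, b = A[i - 1][j], A[i][j]
            if b == 0.0:
                continue
            r = math.hypot(a, b)          # the square root that CORDIC/SGR avoid
            c, s = a / r, b / r
            for col in range(j, k + 1):   # columns < j are already zero here
                u, v = A[i - 1][col], A[i][col]
                A[i - 1][col] = c * u + s * v
                A[i][col] = -s * u + c * v
    # Back-substitution on the K x K upper-triangular R: R x = (Q^T y~)[:K].
    x = [0.0] * k
    for i in range(k - 1, -1, -1):
        acc = A[i][k] - sum(A[i][c] * x[c] for c in range(i + 1, k))
        x[i] = acc / A[i][i]
    return x


if __name__ == "__main__":
    H = [[1.0, 2.0], [3.0, -1.0], [0.5, 4.0]]  # hypothetical 3 x 2 real channel
    y = [1.0, 2.0, 3.0]
    print(lmmse_qrd_givens(H, y, sigma=0.3))
```

The augmented-matrix form lets the same triangular array that performs the QRD also absorb the regularisation term, which is why QRD-based LMMSE detectors map well to linear or triangular arrays.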
A High-Speed QR Decomposition Processor for Carrier-Aggregated LTE-A Downlink Systems
This paper presents a high-speed QR decomposition (QRD) processor targeting the carrier-aggregated 4 × 4 Long Term Evolution-Advanced (LTE-A) receiver. The processor provides robustness in spatially correlated channels with reduced complexity by applying modifications to the Householder transform, such as decomposing-target redefinition and real-valued matrix decomposition. In terms of hardware design, we extensively explore the flexibility of systolic architectures using a high-level synthesis tool to achieve area and power efficiency. In a 65 nm CMOS technology, the processor occupies a core area of 0.77 mm² and produces 72 M QR decompositions per second, the highest reported throughput. The processor consumes 219 mW.
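Real-valued decomposition maps a complex n × n channel to a 2n × 2n real matrix [[Re H, -Im H], [Im H, Re H]], so the 4 × 4 complex case becomes an 8 × 8 real QRD that needs no complex arithmetic. The sketch below pairs that mapping with a plain Householder QR; it does not reproduce the paper's decomposing-target redefinition or its systolic mapping, and the function names are hypothetical.

```python
import math
import random


def real_valued_decomposition(H):
    """Map an n x n complex matrix to the 2n x 2n real matrix [[Re, -Im], [Im, Re]]."""
    n = len(H)
    top = [[H[i][j].real for j in range(n)] + [-H[i][j].imag for j in range(n)]
           for i in range(n)]
    bottom = [[H[i][j].imag for j in range(n)] + [H[i][j].real for j in range(n)]
              for i in range(n)]
    return top + bottom


def householder_qr(A):
    """Plain Householder QR of a square real matrix; returns (Q, R) with A = Q R."""
    n = len(A)
    R = [row[:] for row in A]
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        x = [R[i][k] for i in range(k, n)]
        # Reflect x onto alpha * e1; the sign choice avoids cancellation.
        alpha = -math.copysign(math.sqrt(sum(v * v for v in x)), x[0])
        v = x[:]
        v[0] -= alpha
        vnorm2 = sum(t * t for t in v)
        if vnorm2 == 0.0:
            continue
        # R <- P R with P = I - 2 v v^T / (v^T v), acting on rows k..n-1.
        for j in range(n):
            f = 2.0 * sum(v[i] * R[k + i][j] for i in range(len(v))) / vnorm2
            for i in range(len(v)):
                R[k + i][j] -= f * v[i]
        # Q <- Q P (P is symmetric), acting on columns k..n-1.
        for i in range(n):
            f = 2.0 * sum(Q[i][k + t] * v[t] for t in range(len(v))) / vnorm2
            for t in range(len(v)):
                Q[i][k + t] -= f * v[t]
    return Q, R


if __name__ == "__main__":
    random.seed(0)
    n = 4  # 4 x 4 complex channel, as in the carrier-aggregated LTE-A case
    Hc = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
          for _ in range(n)]
    Q, R = householder_qr(real_valued_decomposition(Hc))
    print(len(R), "x", len(R[0]), "real upper-triangular factor")
```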
ASIP Design and Prototyping for Wireless Communication Applications
International audience
Energy Efficient VLSI Circuits for MIMO-WLAN
Mobile communication, meaning anytime, anywhere access to data and communication services, has grown continuously since Guglielmo Marconi operated the first wireless communication link. The demand for higher data rates despite limited bandwidth led to the development of multiple-input multiple-output (MIMO) communication, which is often combined with orthogonal frequency division multiplexing (OFDM). Together, these two techniques achieve high bandwidth efficiency. Unfortunately, MIMO-OFDM significantly increases the signal processing complexity of transceivers. While rapid improvements in integrated circuit (IC) technology have made it possible to implement more signal processing per chip, considerable effort has been and must still be devoted to novel algorithms and to efficient very large scale integration (VLSI) architectures in order to meet today's and tomorrow's requirements for mobile wireless communication systems. In this thesis, we present architectures and VLSI implementations of complete physical (PHY) layer application specific integrated circuits (ASICs) under the constraints imposed by an industrial wireless communication standard. Unlike many other publications, we do not study individual components of a MIMO-OFDM communication system in isolation, but in the context of the complete PHY layer ASIC. We investigate the performance of several MIMO detectors and the corresponding preprocessing circuits, integrated into the entire PHY layer ASIC, in terms of achievable error rate, power consumption, and area. Finally, we combine the results from the proposed PHY layer implementations to enhance the energy efficiency of a transceiver. To this end, we propose a cross-layer optimization of the PHY layer and the medium access control (MAC) layer.
Systems with Massive Number of Antennas: Distributed Approaches
As 5G is entering maturity, research interest has shifted towards 6G, and especially towards the new use cases that future telecommunication infrastructure needs to support. These use cases impose much higher requirements, among them higher communication data rates, a larger number of users, higher localization accuracy, and the possibility to charge devices wirelessly.
The radio access network (RAN) has already evolved on the path towards 5G. One of the main changes was a large increase in the number of antennas at the base station, with some reaching 100 elements, in what is commonly referred to as Massive MIMO. New proposals for the 6G RAN continue along this path by further increasing the number of antennas and distributing them throughout a service area. Several technologies have been proposed in this direction, such as cell-free Massive MIMO, distributed MIMO, and large intelligent surfaces (LIS). In this thesis we focus on LIS, for which theoretical studies promise to fulfil the aforementioned requirements.
While the theoretical capabilities of LIS have been properly analyzed, little has been done in terms of implementing such systems. When the number of antennas grows to hundreds or thousands, numerous challenges must be solved for a successful implementation, the most critical being the interconnection data rate and the computational complexity.
In the present thesis we introduce these implementation challenges and show that centralized processing architectures are no longer adequate for this type of system. We also present different distributed processing architectures and show the benefits of such schemes. This work aims to provide a system-design guideline that helps designers make the right decisions when designing this type of system. To that end, we provide algorithms, performance analyses and comparisons, including first-order evaluations of the interconnection data rate, processing latency, memory and energy consumption. These numbers are based on models and data available in the literature; exact values depend on the selected technology and will be accurately determined only after such systems are built and tested.
The thesis concentrates mostly on communication, with additional exploration of other areas such as localization. For localization, we benefit from the high spatial resolution of a very large array, which provides very rich channel state information (CSI). A CSI-based fingerprinting technique using a neural network is selected for this case, with promising results. Since both communication and localization services are based on the acquisition of CSI, we foresee a common system architecture capable of supporting both. Further work in this direction is recommended, possibly including other applications such as sensing.
The obtained results indicate that implementing these very large array systems is feasible, but the challenges are numerous. The proposed solutions provide encouraging results that need to be verified with hardware implementations and real measurements.
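The first-order interconnection data-rate argument can be illustrated with a simple model, which is an assumption and not the thesis' own: in a centralized architecture every antenna forwards I/Q baseband samples to one node, whereas in a daisy-chained distributed scheme each link carries only a K-dimensional complex estimate per symbol. All parameter values below (1024 antennas, 16 users, 100 Msample/s, 12-bit I/Q components) and the function names are hypothetical.

```python
def centralized_rate_gbps(num_antennas, sample_rate_msps, bits_per_component):
    """Aggregate rate into a central node: every antenna sends I/Q samples."""
    return num_antennas * sample_rate_msps * 1e6 * 2 * bits_per_component / 1e9


def daisy_chain_link_rate_gbps(num_users, symbol_rate_msps, bits_per_component):
    """Rate on one daisy-chain link: only a K-dimensional complex estimate is forwarded."""
    return num_users * symbol_rate_msps * 1e6 * 2 * bits_per_component / 1e9


if __name__ == "__main__":
    M, K, rate_msps, bits = 1024, 16, 100, 12  # hypothetical system parameters
    print(f"centralized: {centralized_rate_gbps(M, rate_msps, bits):.1f} Gb/s")
    print(f"daisy-chain link: {daisy_chain_link_rate_gbps(K, rate_msps, bits):.1f} Gb/s")
```

In this model the centralized figure grows linearly with the number of antennas, whereas the per-link figure depends only on the number of users; real designs add overheads (control traffic, word-length growth, latency constraints) that the model ignores.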
- …