Abstract: RF chain circuits play a major role in digital receiver architectures, allowing passband communication signals to be processed in baseband. When operating at high frequencies, these circuits tend to be costly. This increased cost imposes a major limitation on future multiple-input multiple-output (MIMO) communication technologies. A common approach to mitigate the increased cost is to utilize hybrid architectures, in which the received signal is combined in analog into a lower dimension, thus reducing the number of RF chains. In this work we present a hardware prototype implementing analog combining for RF chain reduction. The prototype consists of a specially designed configurable combining board as well as a dedicated experimental setup. Our hardware prototype enables evaluating the effect of analog combining in MIMO systems using actual communication signals. The experimental study, which focuses on channel estimation accuracy in MIMO channels, demonstrates that with the proposed prototype, the achievable channel estimation performance is within a small gap, in a statistical sense, of that obtained using a costly receiver in which each antenna is connected to a dedicated RF chain. Furthermore, in the considered scenarios, this gap becomes negligible when the reduction rate, i.e., the ratio of the number of RF chains to the number of antennas, is above 62.5%.
lead to increased cost and power consumption when operating at high carrier frequencies. This increased cost becomes a major practical bottleneck when implementing a MIMO antenna array operating at millimeter wave bands in which each antenna is connected to an RF chain.
One of the common approaches to mitigate this increased cost is to utilize fewer RF chains, namely, implement RF chain reduction. In such systems, the analog signal observed at the antenna array is combined into a lower dimension digital signal using dedicated hardware [5]. When utilizing such hybrid architectures, the number of RF chains, and hence the number of inputs processed in the digital domain, is smaller than the number of antennas. Receiver designs implementing RF chain reduction using analog combining have been the focus of extensive research in recent years [5]-[19]. These previous works include the design of different analog combining structures [5], [9]-[11], [16], [17], hybrid architectures with low-resolution analog-to-digital conversion [6], [7], [12]-[15], [18], and the integration of dynamic analog combining as part of the physical antenna technology [8]. The theoretical maturity of the concept of MIMO communications with RF chain reduction gives rise to the need to demonstrate and evaluate the implementation of such systems in hardware, which is the focus of this work.
Here we present a prototype of a configurable analog combining hardware board. Our analog combiner hardware can realize different RF chain reduction strategies, including complex gain combiners [18], [19], phase shifter networks [5], [9], and antenna selection techniques [14]. The board is designed to experimentally evaluate MIMO communications with RF chain reduction applied to actual up-converted passband signals. Our main design criterion is the channel estimation accuracy, using an analog combining configuration algorithm which extends and improves upon the method proposed in [9]. The proposed algorithm uses alternating optimization to obtain a suitable analog combining configuration under structure constraints, such as the requirement to utilize unit gain coefficients arising in phase shifter networks. Our method is capable of achieving closer feasible approximations of the optimal unconstrained combiner compared to the algorithm of [9] by introducing an additional degree of freedom which is exploited to improve the approximation under structure constraints. While we focus on the task of channel estimation, we emphasize that the proposed prototype can be used to implement analog combiners designed according to alternative objectives, e.g., maximizing the achievable rate, as in [12], or signal recovery, as suggested in [9].
The proposed hardware prototype, which combines our specially designed analog combiner board with a dedicated experimental hardware setup, demonstrates the feasibility of hybrid architectures in wireless networks. In particular, we show that, using a phase shifter network configured with our analog combiner board and the proposed design algorithm, one can achieve channel estimation accuracy which is within a small gap, in a statistical sense, of that achievable using a MIMO receiver with a complex controllable gain combiner. Furthermore, the channel estimation accuracy gap from that achievable using a costly fully digital MIMO receiver, in which each antenna is connected to a dedicated RF chain, is shown to become negligible when the RF chain reduction rate is above 62.5% for the considered scenarios.
The rest of this paper is organized as follows: In Section II, we formulate the model for wireless communication with RF chain reduction and present the channel estimation problem. Section III details the algorithm implemented for configuring the analog combiner, and Section IV presents a detailed description of the prototype and its components. Experimental results are given in Section V, and Section VI provides concluding remarks.
Throughout the paper, we use boldface lower-case letters to denote vectors, e.g., $\mathbf{x}$, and the $i$th element of $\mathbf{x}$ is written as $(\mathbf{x})_i$. Boldface upper-case letters are used for matrices, e.g., $\mathbf{M}$, whose $(i,k)$th entry is denoted $(\mathbf{M})_{i,k}$ and whose $i$th column is $(\mathbf{M})_{:,i}$. We use $\mathrm{vec}(\cdot)$ to denote the vectorization operator, $\mathbf{I}_n$ is the $n \times n$ identity matrix, $\otimes$ is the Kronecker product, $\|\cdot\|$ is the $\ell_2$ norm, $\|\cdot\|_F$ is the Frobenius norm, $\mathrm{tr}(\cdot)$ is the trace operator, while $(\cdot)^T$ and $(\cdot)^*$ denote the transpose and conjugate transpose, respectively. The proper-complex Gaussian distribution is denoted $\mathcal{CN}$, and $\mathbb{C}$ is the set of complex numbers.
II. SYSTEM MODEL
Our hardware prototype implements RF chain reduction for MIMO receivers. In particular, we design the hybrid architecture to facilitate channel estimation in multi-antenna cellular base stations (BSs). To formulate the setup and the design objectives, we first detail the problem formulation in Subsection II-A, after which we present the considered model for the unknown channel in Subsection II-B.
A. Problem Formulation
Consider a single-cell network in which a BS is equipped with $N_{\rm bs}$ antennas and serves $K$ single-antenna user terminals (UTs). We focus on the uplink, namely, the transmission from the UTs to the BS. The BS utilizes an analog combiner, and thus observes the channel output after it has been linearly combined and acquired using $N_{\rm rf} \le N_{\rm bs}$ RF chains. The analog combiner network is denoted via the matrix $\mathbf{W} \in \mathcal{W}$, where $\mathcal{W} \subseteq \mathbb{C}^{N_{\rm rf} \times N_{\rm bs}}$ represents the feasible set of analog combiners. This set is determined by the specific hardware of the analog combiner, and can represent, e.g., complex gains and phase shifter networks, see [5], [9].
Let $\mathbf{H} \in \mathbb{C}^{N_{\rm bs} \times K}$ denote the wireless channel matrix and $\mathbf{S} \in \mathbb{C}^{\tau \times K}$ be the transmitted symbols of all the UTs in the cell over $\tau$ time instances. We can express the received baseband (BB) signal via

$$\mathbf{Y} = \mathbf{W}\left(\mathbf{H}\mathbf{S}^T + \mathbf{N}\right), \qquad (1)$$

where $\mathbf{N} \in \mathbb{C}^{N_{\rm bs} \times \tau}$ represents the additive white Gaussian noise (AWGN) corrupting the channel output, modeled as having i.i.d. zero-mean proper-complex Gaussian entries with variance $p_n > 0$. The resulting model is illustrated in Fig. 1.
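To make the dimensions of the signal model concrete, the following NumPy sketch draws one realization of each quantity and forms the combined observation as $\mathbf{Y} = \mathbf{W}(\mathbf{H}\mathbf{S}^T + \mathbf{N})$. The sizes, the noise variance, the random seed, and the helper name `crandn` are illustrative assumptions, not values or names taken from the prototype.

```python
import numpy as np

rng = np.random.default_rng(0)


def crandn(*shape):
    """i.i.d. zero-mean, unit-variance proper-complex Gaussian entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


# Illustrative sizes and noise level (assumptions, not prototype values).
N_bs, N_rf, K, tau = 8, 4, 3, 3
p_n = 0.1

H = crandn(N_bs, K)                    # channel matrix, N_bs x K
S = np.linalg.qr(crandn(tau, K))[0]    # pilot matrix with orthonormal columns, tau x K
N = np.sqrt(p_n) * crandn(N_bs, tau)   # AWGN at the antennas, N_bs x tau
W = crandn(N_rf, N_bs)                 # an arbitrary complex-gain combiner, N_rf x N_bs

X = H @ S.T + N                        # signal observed at the antenna array
Y = W @ X                              # signal seen by the N_rf RF chains
```

Note that the combiner acts on the noise as well as on the signal, so the noise at the RF chain outputs is in general no longer white.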
Our hardware prototype implements the analog combining matrix W. In particular, we consider two different feasible sets for the analog combining weights W:
1) Complex-gain analog combiner (CGAC): such analog combiners can realize any form of analog combining, namely, $\mathcal{W} = \mathbb{C}^{N_{\rm rf} \times N_{\rm bs}}$. This architecture is implemented using a hardware network with controllable gains and phase shifters.
2) Phase-shifter-only analog combiner (PSOAC): here the elements of the combiner matrix have a fixed unit magnitude. Such analog combiners are implemented using adjustable phase shifters, and thus tend to be less costly and simpler to implement compared to CGACs.

Since we focus on facilitating channel estimation, we design the analog combining hardware to minimize the mean-squared error (MSE) in recovering the channel matrix $\mathbf{H}$ from the observed channel output $\mathbf{Y}$. We assume that $\mathbf{S}$ represents an a-priori known orthogonal pilot sequence, i.e., that channel estimation is carried out in a pilot-aided fashion, where $\mathbf{S}\mathbf{S}^* = \mathbf{I}_\tau$ and $\tau \ge K$. The design objective can thus be formulated as

$$\min_{\mathbf{W} \in \mathcal{W}} E\left\{\left\|\mathbf{H} - E\{\mathbf{H} | \mathbf{Y}\}\right\|_F^2\right\}. \qquad (2)$$
We emphasize that the proposed approach can also be extended to multiple cells, in which the channel output is corrupted by an additional interference term, as well as to signal recovery scenarios. In signal recovery, the observed output $\mathbf{Y}$ in (1) is used to recover the transmitted symbols $\mathbf{S}$, assuming that knowledge of the channel matrix $\mathbf{H}$ (or a reliable estimate of it) is available.
B. Channel Model
As detailed in the above subsection, we consider minimum MSE (MMSE) estimation of the unknown channel matrix $\mathbf{H}$ from the channel output $\mathbf{Y}$. We model the distribution of the channel matrix using the common Kronecker model [20]-[22]. Accordingly, the matrix $\mathbf{H}$ can be written as

$$\mathbf{H} = \mathbf{Q}^{1/2}\tilde{\mathbf{H}}\mathbf{P}^{1/2}, \qquad (3)$$
where $\mathbf{Q} \in \mathbb{C}^{N_{\rm bs} \times N_{\rm bs}}$ and $\mathbf{P} \in \mathbb{C}^{K \times K}$ are the deterministic non-singular receive side and transmit side correlation matrices, respectively; and $\tilde{\mathbf{H}} \in \mathbb{C}^{N_{\rm bs} \times K}$ models Rayleigh fading, i.e., its entries are i.i.d. zero-mean unit variance proper-complex Gaussian random variables (RVs). We henceforth assume that the transmit side correlation matrix is a scalar multiple of the identity matrix, i.e., $\exists \alpha > 0$ such that $\mathbf{P} = \alpha \mathbf{I}_K$, representing the scenario in which the UTs are distributed in the cell in an i.i.d. manner. The correlation matrices $\mathbf{Q}$ and $\mathbf{P}$ are assumed to be known to the BS, and can be utilized for recovering the unknown channel $\mathbf{H}$, as discussed in the following section.
It is emphasized that part of our motivation for focusing on the Kronecker model (3) with this specific setting of $\mathbf{P}$ as the main scenario used in our hardware prototype stems from the fact that it facilitates deriving the optimal unconstrained CGAC, i.e., the solution to (2) when $\mathcal{W} = \mathbb{C}^{N_{\rm rf} \times N_{\rm bs}}$. Nonetheless, our algorithm for designing PSOACs, detailed in Section III, is not restricted to a specific model and only requires an unconstrained CGAC to approximate. Furthermore, our hardware prototype detailed in Section IV is model independent, and can support any analog combiner configuration.
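As an illustration of the Kronecker model, the following NumPy sketch draws channel realizations of the form $\mathbf{Q}^{1/2}\tilde{\mathbf{H}}\mathbf{P}^{1/2}$ with $\mathbf{P} = \alpha\mathbf{I}_K$ and checks empirically that the covariance of $\mathrm{vec}(\mathbf{H})$ approaches $\mathbf{P} \otimes \mathbf{Q}$. The matrix sizes, the value of $\alpha$, the randomly drawn $\mathbf{Q}$, and the helper names are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(1)


def crandn(*shape):
    """i.i.d. zero-mean, unit-variance proper-complex Gaussian entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def psd_sqrt(A):
    """Hermitian square root of a Hermitian positive semidefinite matrix."""
    w, V = np.linalg.eigh(A)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.conj().T


N_bs, K, alpha = 4, 2, 1.5              # illustrative values (assumed)
G = crandn(N_bs, N_bs)
Q = G @ G.conj().T / N_bs               # a random receive side correlation matrix
P = alpha * np.eye(K)                   # transmit side correlation, P = alpha * I_K
Q_half, P_half = psd_sqrt(Q), psd_sqrt(P)

n_trials = 20000
H_tilde = crandn(n_trials, N_bs, K)     # Rayleigh fading realizations
H = Q_half @ H_tilde @ P_half           # matmul broadcasts over the trial axis

# Column-major vectorization of every realization: entry (i, k) -> index k * N_bs + i.
h = H.transpose(0, 2, 1).reshape(n_trials, N_bs * K)
C_emp = h.T @ h.conj() / n_trials       # sample estimate of E{h h^*}
C_model = np.kron(P, Q)
rel_err = np.linalg.norm(C_emp - C_model) / np.linalg.norm(C_model)
```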
III. ANALOG COMBINER DESIGN ALGORITHM
We now detail the proposed algorithm for designing the analog combining matrix W based on the objective (2). Our design method improves upon the minimal gap iterative quantization (MaGiQ) algorithm suggested in [9] , and consists of two steps: First, we reformulate the MSE objective as a matrix trace expression. Then, we show how this objective can be utilized to design the analog combiner.
A. MSE Objective
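In order to specialize the MSE objective (2) to our channel model detailed in Subsection II-B, we write the channel input-output relationship (1) in vector form. In particular, it holds from (1) that $\mathbf{y} \triangleq \mathrm{vec}(\mathbf{Y})$ can be written as

$$\mathbf{y} = (\mathbf{S} \otimes \mathbf{W})\,\mathbf{h} + (\mathbf{I}_\tau \otimes \mathbf{W})\,\mathbf{n}, \qquad (4)$$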
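where $\mathbf{h} \triangleq \mathrm{vec}(\mathbf{H}) \sim \mathcal{CN}(\mathbf{0}, \mathbf{P} \otimes \mathbf{Q})$ is the unknown channel in vector form and $\mathbf{n} = \mathrm{vec}(\mathbf{N}) \sim \mathcal{CN}(\mathbf{0}, p_n \mathbf{I}_{N_{\rm bs} \cdot \tau})$ is the additive noise. Since $\mathbf{h}$ and $\mathbf{n}$ are mutually independent, it follows from (4) that $\mathbf{y}$ and $\mathbf{h}$ are jointly Gaussian. Hence, the MMSE estimator is given by the linear MMSE estimator [23, Ch. 8], which can be written as

$$\hat{\mathbf{h}} = (\mathbf{P} \otimes \mathbf{Q})(\mathbf{S} \otimes \mathbf{W})^* \left((\mathbf{S} \otimes \mathbf{W})(\mathbf{P} \otimes \mathbf{Q})(\mathbf{S} \otimes \mathbf{W})^* + p_n \mathbf{I}_\tau \otimes \mathbf{W}\mathbf{W}^*\right)^{-1} \mathbf{y}. \qquad (5)$$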
Accordingly, the MSE $E\{\|\mathbf{h} - \hat{\mathbf{h}}\|^2\}$ is given by [23, Ch.
As expected, the MSE (6) is determined by the pilot symbols S, the second order statistical moments of the channel, represented by the correlation matrices Q, P, and the analog combiner W. For a-priori known and fixed S, P, and Q, the design of the analog combiner under the MSE objective (6) is detailed in the following subsection.
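The dependence of the MSE on $\mathbf{S}$, $\mathbf{Q}$, $\mathbf{P}$, and $\mathbf{W}$ can be explored numerically. The sketch below assumes the vectorized model $\mathbf{y} = (\mathbf{S} \otimes \mathbf{W})\mathbf{h} + (\mathbf{I}_\tau \otimes \mathbf{W})\mathbf{n}$, verifies the underlying identity $\mathrm{vec}(\mathbf{W}\mathbf{H}\mathbf{S}^T) = (\mathbf{S} \otimes \mathbf{W})\mathrm{vec}(\mathbf{H})$, and evaluates the standard linear MMSE error $\mathrm{tr}(\mathbf{C}_h) - \mathrm{tr}(\mathbf{C}_{hy}\mathbf{C}_y^{-1}\mathbf{C}_{yh})$. The function name `lmmse_mse`, the dimensions, and the noise levels are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)


def crandn(*shape):
    """i.i.d. zero-mean, unit-variance proper-complex Gaussian entries."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)


def vec(M):
    """Column-major vectorization."""
    return M.reshape(-1, order="F")


def lmmse_mse(W, S, Q, P, p_n):
    """Return (MSE, estimator matrix) for estimating h = vec(H) from y = vec(W (H S^T + N))."""
    tau = S.shape[0]
    A = np.kron(S, W)                                  # y = A h + (I (x) W) n
    C_h = np.kron(P, Q)                                # prior covariance of h
    C_y = A @ C_h @ A.conj().T + p_n * np.kron(np.eye(tau), W @ W.conj().T)
    C_hy = C_h @ A.conj().T
    gain = C_hy @ np.linalg.inv(C_y)                   # h_hat = gain @ y
    mse = np.trace(C_h).real - np.trace(gain @ C_hy.conj().T).real
    return mse, gain


N_bs, N_rf, K, tau = 6, 3, 2, 2                        # illustrative sizes (assumed)
G = crandn(N_bs, N_bs)
Q = G @ G.conj().T / N_bs
P = np.eye(K)
S = np.linalg.qr(crandn(tau, K))[0]
W = crandn(N_rf, N_bs)

# Check the vectorization identity behind the model.
H = crandn(N_bs, K)
lhs, rhs = vec(W @ H @ S.T), np.kron(S, W) @ vec(H)

mse_low_noise, _ = lmmse_mse(W, S, Q, P, p_n=0.01)
mse_high_noise, gain = lmmse_mse(W, S, Q, P, p_n=1.0)
prior_power = np.trace(np.kron(P, Q)).real
```

For a fixed combiner, the resulting error grows with the noise variance and always stays below the prior power $\mathrm{tr}(\mathbf{P} \otimes \mathbf{Q})$.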
B. Analog Combiner Design
The minimization of MSE (6) is equivalent to the maximization of the second trace term of (6), i.e.,
where
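Let $\mathcal{U}$ and $\mathcal{D}$ be the sets of unitary $N_{\rm rf} \times N_{\rm rf}$ matrices and diagonal $N_{\rm rf} \times N_{\rm rf}$ matrices with positive diagonal entries, respectively. The analog combiner matrix $\mathbf{W}_{\rm o}$ for unconstrained analog combiners, i.e., $\mathcal{W} = \mathbb{C}^{N_{\rm rf} \times N_{\rm bs}}$, is given in the following theorem:

Theorem 1. Let $\mathbf{U}_{\rm o}$ be an $N_{\rm bs} \times N_{\rm rf}$ matrix whose columns are the $N_{\rm rf}$ eigenvectors of $\bar{\mathbf{Q}}$ corresponding to its $N_{\rm rf}$ largest eigenvalues, where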
Then, for any V ∈ U and D ∈ D, the optimization problem (7) is solved by setting
Proof: The proof is detailed in the appendix.

We note that in the high signal-to-noise ratio (SNR) regime, i.e., when $p_n \approx 0$, $\mathbf{U}_{\rm o}$ in Theorem 1 specializes to the first $N_{\rm rf}$ eigenvectors of $\alpha\mathbf{Q}$, coinciding with the derivation in [10, Sec. IV], which assumed noiseless setups.
The analog combiner in (10) is achievable for any $\mathbf{V}$ and $\mathbf{D}$ using the CGAC architecture, in which each element of the analog combining matrix can be any complex value. For the PSOAC case, the entries of the combiner matrix are restricted to have unit amplitude, a condition which may not be satisfied for a matrix of the form (10). Following [9], we propose to exploit the non-uniqueness of $\mathbf{W}_{\rm o}$ to facilitate its approximation using a feasible PSOAC matrix. In particular, we seek the selection of the non-unique $\mathbf{V}$ and $\mathbf{D}$ for which the resulting $\mathbf{W}_{\rm o}$ can be closely approximated using a feasible $\mathbf{W}$, namely, our design objective is
In [9, Sec. V], only the non-uniqueness in the unitary V is exploited, and the diagonal D is assumed to be the identity matrix I N rf . Consequently, our proposed design criterion generalizes that of MaGiQ [9] , and is capable of recovering PSOAC matrices which better approximate the unconstrained MSE minimizing analog combiner compared to MaGiQ [9] . We propose to tackle the optimization problem (11) in an alternating fashion. Our design method is based on the following lemma:
ij is the projection operator. Furthermore, for any W ∈ W and D ∈ D, by letting L and R be the left and right singular vectors matrices of WDU * , respectively, it holds that
Finally, letting η > 0 be some lower bound on the diagonal entries of the matrices in D, guaranteeing that these values are strictly positive, it holds that for any W ∈ W and V ∈ U, the diagonal entries ofD = arg min
for all j = 1, . . . , N bs .
Proof. The lemma directly follows from [8, Lem 2] .
Lemma 1 implies that the optimization problem (11) can be solved using alternating optimization. In particular, we propose to update each of the three matrices $\mathbf{W}$, $\mathbf{V}$, and $\mathbf{D}$ in turn, while fixing the remaining two matrices, and to repeat this process iteratively. Since our objective in (11) is the minimization of the convex Frobenius norm, it follows from [24, Thm. 2] that the convergence of such an alternating approach is guaranteed. The proposed iterative alternating optimization method is summarized in Algorithm 1. It is noted that for a given $\mathbf{U}_{\rm o}$, MaGiQ [9] can be considered as a special case of the proposed algorithm with $\mathbf{D}$ fixed to the identity matrix, i.e., without step 8. This additional degree of freedom allows our algorithm to obtain close feasible approximations of the unconstrained optimal analog combiner $\mathbf{W}_{\rm o}$.

Algorithm 1
6: Obtain $\mathbf{W}_{i+1}$ via (12a) with $\mathbf{T} = \mathbf{T}_i$ and $\mathbf{D} = \mathbf{D}_i$.
7: Obtain $\mathbf{T}_{i+1}$ via (12b) with $\mathbf{W} = \mathbf{W}_{i+1}$ and $\mathbf{D} = \mathbf{D}_i$.
8: Obtain $\mathbf{D}_{i+1}$ via (12c) with $\mathbf{W} = \mathbf{W}_{i+1}$ and $\mathbf{T} = \mathbf{T}_{i+1}$.
9: $i := i + 1$.

The fact that our algorithm obtains a better approximation of $\mathbf{W}_{\rm o}$ compared to MaGiQ also translates to improved channel estimation accuracy, as demonstrated in the following numerical study. We consider a multi-user MIMO system in which a BS equipped with $N_{\rm bs} = 80$ antennas and $N_{\rm rf} = 20$ RF chains serves $K = 40$ UTs. The receive side correlation matrix $\mathbf{Q}$ follows Jakes' model with antenna spacing of 0.2 carrier wavelength [25], and the transmit side correlation $\mathbf{P}$ is set to $\mathbf{I}_K$. In Fig. 2 we evaluate the normalized MSE, defined as

$$\frac{E\{\|\mathbf{h} - \hat{\mathbf{h}}\|^2\}}{E\{\|\mathbf{h}\|^2\}}.$$
The normalized MSEs are computed using the unconstrained CGAC $\mathbf{W}_{\rm o}$ as well as PSOACs designed using MaGiQ and the proposed algorithm, respectively. Observing Fig. 2, we note that the PSOAC designed using the proposed algorithm achieves effectively the same performance as the CGAC $\mathbf{W}_{\rm o}$, which requires controllable gains, while the normalized MSE achieved using MaGiQ is within a small gap from $\mathbf{W}_{\rm o}$. It is emphasized that this small gap in normalized MSE can lead to substantial gaps in MSE, particularly when the expected squared norm of the channel, $E\{\|\mathbf{h}\|^2\}$, is large, as is common in massive MIMO systems. This numerical study demonstrates the superiority of the proposed algorithm over previous PSOAC design methods. Consequently, our hardware prototype and its experimental system use Algorithm 1 when realizing PSOACs.
In our prototype we implement both CGACs, designed via (10) by setting $\mathbf{V} = \mathbf{D} = \mathbf{I}_{N_{\rm rf}}$, and PSOACs, configured using Algorithm 1. The architecture of the prototype, which allows it to implement the aforementioned hybrid design in a dynamic manner, is discussed in the following section.
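The alternating structure of Algorithm 1 can be sketched in NumPy as follows. The sketch assumes that the design objective takes the form $\|\mathbf{W} - \mathbf{V}\mathbf{D}\mathbf{U}_{\rm o}^*\|_F^2$ with $\mathbf{U}_{\rm o}$ having orthonormal columns, and it uses the closed-form per-block minimizers of that assumed objective: an entrywise projection onto unit modulus for $\mathbf{W}$, an orthogonal Procrustes step for $\mathbf{V}$, and a clipped least-squares step for $\mathbf{D}$. These steps stand in for the updates (12a)-(12c), whose exact form may differ; all function and variable names are hypothetical.

```python
import numpy as np


def objective(W, V, d, U):
    """Squared Frobenius distance between W and V diag(d) U^*."""
    return np.linalg.norm(W - (V * d) @ U.conj().T) ** 2


def alternating_psoac(U, n_iter=50, eta=1e-3, fix_d_to_identity=False):
    """Alternate over W (unit modulus), V (unitary) and d (diagonal of D, >= eta).

    U is N_bs x N_rf with orthonormal columns. Setting fix_d_to_identity=True
    skips the D update, mimicking a design that only exploits V.
    """
    n_rf = U.shape[1]
    V = np.eye(n_rf, dtype=complex)
    d = np.ones(n_rf)
    history = []
    for _ in range(n_iter):
        # W step: closest unit-modulus matrix to the current target, entry by entry.
        W = np.exp(1j * np.angle((V * d) @ U.conj().T))
        # V step: orthogonal Procrustes solution from the SVD of W U D.
        L, _, Rh = np.linalg.svd((W @ U) * d)
        V = L @ Rh
        # D step: per-entry least squares, clipped from below at eta.
        if not fix_d_to_identity:
            d = np.maximum(eta, np.real(np.diag(V.conj().T @ W @ U)))
        history.append(objective(W, V, d, U))
    return W, V, d, history


# Illustrative target: top eigenvectors of a random correlation matrix (assumed sizes).
rng = np.random.default_rng(3)
N_bs, N_rf = 16, 4
G = rng.standard_normal((N_bs, N_bs)) + 1j * rng.standard_normal((N_bs, N_bs))
_, E = np.linalg.eigh(G @ G.conj().T)
U_o = E[:, -N_rf:]                       # eigenvectors of the N_rf largest eigenvalues

W, V, d, history = alternating_psoac(U_o)
```

Because each step minimizes the assumed objective over one block with the other two held fixed, the recorded objective values are non-increasing across iterations.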
IV. SYSTEM ARCHITECTURE
In this section, we elaborate on the system architecture of the hardware prototype which realizes the RF chain reduction scheme detailed in the previous section. To that aim, we first present the high-level system architecture in Subsection IV-A, after which we discuss the concrete structure of each of the hardware components in Subsection IV-B.
A. High-Level Design
1) Experimental Environment: Our hardware prototype implements a configurable analog combiner, which is evaluated using a dedicated experimental setup at microwave frequencies. The experimental setup consists of a Matlab-based host application and a field-programmable gate array (FPGA) board. The former simulates the BB channel output and processes the signal captured after analog combining. The latter acts as an interface between the digital signals generated and processed by the host application, and the analog signals which are utilized by the analog combiner hardware. In particular, the input and output signals of the analog combiner hardware are generated as follows:
• Analog combiner input: The digital baseband channel outputs simulated by the host application are transferred by an Ethernet cable from the host application to the FPGA board in real-time. The FPGA board generates the baseband input signal which is up-converted on the combiner board using a carrier waveform generated by a VSG25A vector signal generator. The resulting analog passband signal represents the multivariate channel output observed at the BS antenna array.
• Analog combiner output: The analog combined passband signal, representing the signal fed to the RF chains at the BS, is down-converted with the same carrier waveform as for the up-conversion on the combiner board, followed by an analog-to-digital conversion implemented on the FPGA. These digital outputs are transferred from the FPGA board to the host application where they are utilized for estimating the underlying channel.

The host application is also capable of adjusting the weights of the analog combining hardware via the FPGA board. The experimental procedure is illustrated in the flow chart at the top of Fig. 3.
2) Analog combiner implementation: Several different architectures for analog combiners can be found in the literature [5], [9]. The most common is arguably a controllable network of fully connected phase shifters, i.e., the PSOAC. Alternative architectures include fully connected complex gain networks, such as the CGAC; fully connected networks of phase shifters and switches; and flexible partially connected phase shifter networks with sub-arrays. We refer the readers to [5, Sec. II] for a detailed account of these analog combining architectures. In order to incorporate a large family of architectures, our analog combiner hardware consists of a controllable network of gains and phase shifters. The outputs of the adjustable gains and phase shifters are then summed up by a combiner and fed to an RF chain. In particular, our hardware system consists of 4 input ports and 2 output ports, namely, it can be utilized in a BS with 4 antennas and 2 RF chains. This setup can also be used for experimenting with analog combiners with a larger number of antennas and RF chains using a virtual channel extension.
By using the flexible and controllable network of gains and phase shifters, we implement the proposed designs for both CGACs as well as PSOACs. The baseline is to use the basic hardware which models a MIMO BS with 4 antenna inputs and 2 RF chains. The setup can also be used as a MIMO BS with 8 antenna inputs and 4 RF chains as well as 16 antenna inputs and 8 RF chains, using a virtual channel extension. This virtual channel approach is based on a sequential utilization of the basic hardware to obtain an overall combined result.
To present the virtual channel extension, we note that the two outputs of the analog combiner are obtained as a linear combination of four antenna inputs with different weights, which are determined by some basic combining matrix $\mathbf{W}_b$. Defining $\mathbf{X}_b \triangleq \mathbf{H}\mathbf{S}^T + \mathbf{N}$, we denote the channel output with this basic combining as

$$\mathbf{Y}_b = \mathbf{W}_b \mathbf{X}_b.$$
In order to simulate more antenna inputs and RF chains, we utilize multiple time instances to form a single analog combining input-output pair. 
It follows from (14) that we need to sequentially utilize the basic hardware four times to complete the divided four blocks of the analog combiner matrix. Following the same approach as the realization of 8 antenna inputs and 4 RF chains, a setup with 16 antenna inputs and 8 RF chains can be obtained by utilizing 16 different 4 × 2 basic combiners to complete the divided 16 blocks of the analog combiner matrix. Clearly, analog combiners with different analog combining ratios, namely, in which the number of antennas is not twice the number of RF chains, can be realized following the same guidelines by using only the required number of input and output ports. We conclude that the proposed hardware prototype is capable of testing a multitude of different analog combining systems and architectures in a physical configurable setup.
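The block-wise realization described above can be sketched as follows. The sketch assumes that the partial outputs of the sequential uses of the basic $2 \times 4$ hardware are accumulated digitally; the function names are hypothetical, and the basic combiner is modeled as an ideal matrix product.

```python
import numpy as np


def basic_combiner(W_b, X_b):
    """One use of the basic hardware: 4 inputs, 2 outputs, Y_b = W_b X_b."""
    assert W_b.shape == (2, 4) and X_b.shape[0] == 4
    return W_b @ X_b


def virtual_combine(W, X):
    """Realize a larger combiner W by sequential uses of the basic 2 x 4 combiner.

    W is split into 2 x 4 blocks; each block is applied to the matching group
    of four antenna inputs and the partial outputs are accumulated.
    Returns the combined output and the number of basic-hardware uses.
    """
    n_rf, n_bs = W.shape
    assert n_rf % 2 == 0 and n_bs % 4 == 0
    Y = np.zeros((n_rf, X.shape[1]), dtype=complex)
    uses = 0
    for r in range(0, n_rf, 2):          # pair of RF chain outputs
        for c in range(0, n_bs, 4):      # group of four antenna inputs
            Y[r:r + 2] += basic_combiner(W[r:r + 2, c:c + 4], X[c:c + 4])
            uses += 1
    return Y, uses


rng = np.random.default_rng(4)
tau = 3
W8 = rng.standard_normal((4, 8)) + 1j * rng.standard_normal((4, 8))
X8 = rng.standard_normal((8, tau)) + 1j * rng.standard_normal((8, tau))
Y8, uses8 = virtual_combine(W8, X8)      # 8 antennas, 4 RF chains

W16 = rng.standard_normal((8, 16)) + 1j * rng.standard_normal((8, 16))
X16 = rng.standard_normal((16, tau)) + 1j * rng.standard_normal((16, tau))
Y16, uses16 = virtual_combine(W16, X16)  # 16 antennas, 8 RF chains
```

The counts of basic-hardware uses returned by the sketch match the four and sixteen blocks mentioned in the text.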
B. Prototype Physical Entities
The overall prototype system is depicted at the bottom of Fig. 3 . The prototype consists of the following components: a controller and display, providing the GUI; a computing center running the Matlab-based host application; an FPGA board; and the analog combiner hardware. In the following we elaborate on each of these entities. 
1) Controller and display:
The GUI allows configuring and evaluating the experimental setup in a user-friendly environment. In particular, our GUI provides the ability to change the main parameters of the experiment and to numerically compare the normalized MSEs obtained in two display modes: with respect to SNR or with respect to the number of RF chains. The main controllable parameters include the number of UTs, training symbols, and receive antennas, as well as the rank of the receive side correlation matrix. Details of the supported parameter combinations are summarized in Table I.
Once the experimental setup is configured and a test is launched, the GUI presents in real-time the selected values used to configure the analog combining hardware for both PSOAC as well as CGAC. Furthermore, the GUI presents an updated normalized MSE curve during the test and after it is concluded, comparing the performance of the utilized PSOAC and CGAC to analog combiners in which the RF chains are directly connected to randomly selected antennas, as well as to a fully digital setup, namely, the performance achievable without analog combining when each antenna feeds a dedicated RF chain, constituting a fundamental lower bound on the channel estimation MSE with RF chain reduction. An overview of the GUI is depicted in Fig. 4.

2) Computing Center: The computing center is a 64-bit laptop with 4 CPU cores and 16GB RAM running the Matlab-based host application. The application is controlled by the GUI, and implements the following functionalities:
• The host application computes the analog combiner weights using the algorithm detailed in Section III, and adjusts the analog combiner hardware before each simulation test.
• The application generates the digital baseband signals, i.e., the pilot sequences, as well as the wireless channel outputs, which are fed to the FPGA to generate the analog combiner input.
• On the receive side, the application processes the baseband channel outputs and produces the MMSE channel estimate via (5) .
The communication between the computing center and the hardware board is carried out over an Ethernet cable. Through the cable, the generated digital baseband channel outputs and analog combiner matrix are transmitted to the built-in memory of the FPGA board, and the baseband RF chain outputs are acquired.
3) FPGA board: The FPGA board consists of an off-the-shelf Xilinx VC707 evaluation board. The evaluation board utilizes a 4DSP FMC204 16-bit DAC mezzanine card for baseband waveform generation, as well as an eight-channel 4DSP FMC168 16-bit digitizer card for sampling of the combined analog signal.
In the baseband analog signal generation process, the waveforms are stored as digital baseband I/Q pairs on the built-in memory. Then, the FPGA device reads out the pre-stored waveforms from the memory and employs an 8 Gbps Serializer/Deserializer (SerDes) device to transfer them to the 16-bit DAC mezzanine card. The DAC card then interpolates and converts the stored waveforms to analog baseband signals at a sample rate of 250 Msps. The analog baseband signals are transmitted to the analog combiner board through coaxial cables, where they are up-converted to passband and linearly combined.
The analog combiner outputs, which are down-converted to baseband on the combiner board, are digitized using four out of eight channels of the 16-bit digitizer card with a 250 MHz sampling rate. Each I/Q pair occupies two channels of the digitizer card. The 1.2 Gbps SerDes transfers the sampled data to the FPGA, which then writes the data to a digital first-in-first-out (FIFO) buffer for reading by the computing center.
The FPGA also produces selection and control commands for configuring the analog combiner. Once the host application produces an analog combiner configuration, the weights matrix is provided to the FPGA board via the FIFO buffer. These weights are then transferred to the analog combiner hardware using a serial peripheral interface (SPI) protocol, which is used to control the dedicated analog combiner board.
4) Analog combiner board:
The analog combiner board is custom-designed dedicated hardware, which realizes a controllable analog combiner network, and can serve as any of the common combiner architectures, e.g., phase shifter networks, switching networks [9], and DFT beamforming [17]. The designed analog combiner supports two different types of input signals: passband signals in the frequency band up to 4.5 GHz, and BB signals, each with a 125 MHz maximum bandwidth. In our experimental setup we use BB signals as our inputs, where up-conversion and down-conversion are carried out on the analog combiner board. The board consists of five different blocks, implementing the main different functionalities, namely:
C1 Up-conversion of incoming signal and signal splitting.
C2 Passband signal input and amplification.
C3 Weights (phases and gains) application and configuration.
C4 Summing up of the incoming signals.
C5 Down-conversion and low-pass filtering of combined signals.

The individual blocks of the analog combiner board are marked in Fig. 5, which depicts the block diagram of the hardware as well as the circuit board.
In block C1, each of the four inputs is a complex BB signal transmitted from the FPGA, feeding an amplifier followed by a mixer. The BB signals, whose maximal frequency range is 125 MHz, are up-converted to RF signals at a frequency of 1 GHz. The RF signals represent the passband signals observed by the BS antennas. The power level of each RF signal is tuned to 0 dBm, after which the signal is split into two identical signals. In order to support passband inputs, one must only replace block C1 with the corresponding passband signals observed by the antennas.
Each of the passband signals is forwarded to block C2, where it is fed to an amplifier with 16 dB fixed gain. Specifically, we use an ADL5566 dual RF operational amplifier for each split pair of passband signals. This amplifier has a low latency and a very flat frequency response, supporting input signals with up to 4.5 GHz bandwidth without having the amplification vary over frequency. Thus, the receiver contains four ADL5566 amplifiers for the four RF input signals. The wideband property of the ADL5566 makes our hardware suitable for a broad range of RF signals used in actual communication systems.
The signals from the amplifiers are separated into two groups. Each group has four inputs, including one signal from each split pair. Then, each signal is split again into two signals with a 90-degree offset, which are used as the inputs to a phase shifter and gain block. This additional splitting with phase offset is needed because we use differential phase shifters which operate on such inputs. In block C3, each of the four channels in one of the groups is activated by an ADL5390 analog vector multiplier, implementing the phase and gain of each analog combining weight applied to the input signal. The weights applied in the vector multiplications are determined by the output DC levels of an AD7808 octal 10-bit DAC with serial load capabilities, which receives control commands from the FPGA board; these commands configure the analog combining weights by means of a lookup table. The usage of controllable gains and phases requires a calibration stage when the interconnections are established to guarantee that the configured weights are correctly translated into the desired phase and gain values.
Arguably, the most common analog combiner architecture is based on phase shifters [5], [12], [13]. In practice, applied phase shifters are digitally controlled with phase resolutions typically above 5 degrees. This crude resolution may significantly degrade the system performance by inducing quantization errors. In our novel combiner prototype, we use the ADL5390 vector multiplier as an analog phase shifter controlled by the 10-bit DAC, allowing the realization of combiners with controllable gains as well as providing the ability to reach an improved phase resolution of less than 1.5 degrees.
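The effect of phase resolution on the realized combiner weights can be illustrated with a simple idealized model, in which every unit-modulus weight is rounded to the nearest point on a uniform phase grid. This uniform quantizer is an assumption made for illustration and does not model the DAC-controlled vector multiplier or its calibration; the function name and the matrix size are hypothetical. With a grid step of $\delta$, the phase error is at most $\delta/2$, so the error in each weight is at most $2\sin(\delta/4)$.

```python
import numpy as np


def quantize_phases(W, step_deg):
    """Round the phase of every entry of W to a uniform grid with the given step."""
    step = np.deg2rad(step_deg)
    return np.exp(1j * step * np.round(np.angle(W) / step))


def worst_case_bound(step_deg):
    """Largest possible per-weight error for a uniform phase grid: 2 sin(step / 4)."""
    return 2 * np.sin(np.deg2rad(step_deg) / 4)


rng = np.random.default_rng(5)
W = np.exp(1j * rng.uniform(-np.pi, np.pi, size=(4, 250)))   # random unit-modulus weights

max_error = {}
fro_error = {}
for step_deg in (5.0, 1.5):
    W_q = quantize_phases(W, step_deg)
    max_error[step_deg] = np.abs(W_q - W).max()
    fro_error[step_deg] = np.linalg.norm(W_q - W)
```

Under this model, moving from a 5 degree grid to a 1.5 degree grid reduces the worst-case per-weight error bound from about 0.044 to about 0.013.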
The outputs of each group are then summed up by a combiner in block C4. Then, the combined signals are down-converted in block C5 using a set of ADL5382 down-converters with the same local oscillator used for up-conversion. The down-converted signals are then filtered to baseband signals with a maximum 125 MHz bandwidth. Finally, the signals of the two outputs are forwarded to the FPGA where they are acquired using a 4DSP FMC168 16-bit digitizer card, obtaining the digital outputs for further digital signal processing in the computing center.
V. EXPERIMENT RESULTS
In the experimental study, we evaluate the channel estimation performance when the observed channel output is combined using our analog combining hardware, configured both as a CGAC and as a PSOAC. The channel estimation accuracy is compared to that achievable with random antenna selection, which randomly selects $N_{rf}$ out of the $N_{bs}$ antennas, and to the traditional fully digital setup, where each antenna is connected to a dedicated RF chain. In all the presented experiments, the number of UTs is fixed to $K = 3$, and the pilot sequence length is $\tau = K = 3$. Following [10], we fix the transmit side correlation matrix $P$ to the $K \times K$ identity matrix. The receive side correlation matrix $Q$ is generated as follows: for a fixed $N_Q > 0$, we draw an $N_{bs} \times N_Q$ proper-complex Gaussian matrix with i.i.d. zero-mean unit-variance entries, denoted $\tilde{Q}$, and set $Q = \tilde{Q}\tilde{Q}^*$. We consider two settings of $N_Q$: $N_{rf} < N_Q \le N_{bs}$, referred to as the regular setting, and $N_Q = N_{rf}$, referred to as the best setting. We generate 1000 independent realizations of $Q$. The performance metric is the normalized MSE, computed by averaging the channel estimation MSE, normalized by $E\{\|h\|^2\}$, over the generated receive correlation matrices. For each Monte Carlo simulation, new realizations of the channel matrix, noise vector, and pilot matrix are generated. The noise and the channel matrix obey the model detailed in Section IV, while the pilot matrix is obtained from $K$ eigenvectors of a $\tau \times \tau$ random matrix drawn from a proper-complex Gaussian distribution with i.i.d. zero-mean unit-variance entries.
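The generation of the receive correlation matrices and pilots described above can be sketched in Python with NumPy as follows. This is a minimal sketch, not the authors' simulation code: the function names, the seed, and the choice $N_Q = 6$ for the regular setting are hypothetical. In addition, the text does not specify how the eigenvectors are extracted, so the sketch assumes they are taken from the Hermitian matrix $GG^*$ built from a complex Gaussian $G$, which yields orthonormal pilots.

```python
import numpy as np


def complex_gaussian(rng, rows, cols):
    """i.i.d. proper-complex Gaussian entries with zero mean and unit variance."""
    real = rng.standard_normal((rows, cols))
    imag = rng.standard_normal((rows, cols))
    return (real + 1j * imag) / np.sqrt(2)


def receive_correlation(rng, n_bs, n_q):
    """Q = Q_tilde Q_tilde^*, a Hermitian matrix of rank at most n_q."""
    q_tilde = complex_gaussian(rng, n_bs, n_q)
    return q_tilde @ q_tilde.conj().T


def orthonormal_pilots(rng, tau, k):
    """K eigenvectors of a random tau x tau matrix (assumption: G G^* is used)."""
    g = complex_gaussian(rng, tau, tau)
    _, vecs = np.linalg.eigh(g @ g.conj().T)
    return vecs[:, :k]


rng = np.random.default_rng(0)
n_bs, n_rf, k = 8, 4, 3
Q_regular = receive_correlation(rng, n_bs, n_q=6)   # N_rf < N_Q <= N_bs
Q_best = receive_correlation(rng, n_bs, n_q=n_rf)   # N_Q = N_rf
S = orthonormal_pilots(rng, tau=k, k=k)

print("rank of Q (regular):", np.linalg.matrix_rank(Q_regular))
print("rank of Q (best):", np.linalg.matrix_rank(Q_best))
```

In a full Monte Carlo run, this generation would be repeated for each of the 1000 realizations of $Q$.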
A. Normalized MSE versus SNR
We first evaluate the normalized MSE of the channel estimate versus SNR in the range [0, 30] dB, for an analog combiner with $N_{bs} = 8$ antennas and $N_{rf} = 4$ RF chains. Note that this analog combiner model is obtained using the virtual channel extension approach detailed in Section IV. The results are depicted in Figs. 6(a)-6(b) for high rank (regular) receive correlation and for low rank (best) receive correlation, respectively. Observing Figs. 6(a)-6(b), we note that using the proposed hardware prototype, the PSOAC configured via Algorithm 1, which represents a practical family of analog combiners, achieves channel estimation accuracy within a very small gap of the costly CGAC architecture. Furthermore, both the PSOAC and the CGAC architectures notably outperform the naive approach of random antenna selection. As expected, the fully digital architecture achieves the lowest MSE, as it has access to the observed channel output without dimensionality reduction.
For the high rank (regular) receive correlation case, it is observed in Fig. 6(a) that every approach which implements analog combining meets an error floor at high SNRs (above 20 dB). This error floor is due to the model mismatch induced by the dimensionality reduction, which becomes the main performance bottleneck in this SNR regime. However, when the rank of the receive correlation matrix is not larger than the number of RF chains, the optimal MSE performance observed using fully digital receivers is also achievable using our analog combiner prototype, in line with the theoretical results of [9].
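The contrast between the two correlation settings can be illustrated with a short linear-algebra sketch in Python with NumPy. It measures how much of $\mathrm{tr}(Q)$ lies outside the span of the $N_{rf}$ dominant eigenvectors of $Q$. The function names and the seed are hypothetical, and the sketch only quantifies the energy discarded by an ideal $N_{rf}$-dimensional projection; it does not model the estimator, the noise, or the hardware.

```python
import numpy as np


def receive_correlation(rng, n_bs, n_q):
    """Q = Q_tilde Q_tilde^* with an n_bs x n_q complex Gaussian Q_tilde."""
    real = rng.standard_normal((n_bs, n_q))
    imag = rng.standard_normal((n_bs, n_q))
    q_tilde = (real + 1j * imag) / np.sqrt(2)
    return q_tilde @ q_tilde.conj().T


def residual_fraction(Q, n_rf):
    """Fraction of tr(Q) outside the span of the n_rf dominant eigenvectors."""
    eigvals = np.linalg.eigvalsh(Q)            # returned in ascending order
    eigvals = np.clip(eigvals, 0.0, None)      # remove tiny negative round-off
    return eigvals[:-n_rf].sum() / eigvals.sum()


rng = np.random.default_rng(0)
n_bs, n_rf = 8, 4
best = residual_fraction(receive_correlation(rng, n_bs, n_rf), n_rf)
regular = residual_fraction(receive_correlation(rng, n_bs, n_bs), n_rf)

print("residual fraction, best setting (N_Q = N_rf):", best)
print("residual fraction, regular setting (N_Q = N_bs):", regular)
```

In the best setting the residual is zero up to numerical precision, so no channel energy is lost by reducing the dimension to $N_{rf}$. In the regular setting a nonzero fraction is discarded, which is consistent with the mismatch that limits performance at high SNR.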
B. Normalized MSE versus Number of RF Chains
Next, we numerically evaluate the normalized MSE performance achievable using our hardware analog combiner prototype for different numbers of RF chains. Here, we let the number of RF chains $N_{rf}$ vary in the range $[2, N_{bs}]$, for a fixed SNR level of 15 dB and for a high rank (regular) receive correlation setup. The results of this experiment are depicted in Figs. 6(c), 6(d), and 6(e) for $N_{bs} \in \{4, 8, 16\}$, respectively. Observing Figs. 6(c)-6(e), we note that for all considered scenarios the normalized MSEs of our hybrid architectures approach the normalized MSE of the fully digital receiver as the number of RF chains increases. These results are consistent with the fact that for $N_{rf} = N_{bs}$ these networks implement a fully digital receiver. Specifically, when the reduction rate $N_{rf}/N_{bs}$ is above 62.5%, the performance gaps between the proposed analog combiners and the fully digital receiver become negligible. This behavior is due to the numerical observation that most simulated channels can be reliably characterized by roughly 5/8 of their eigenmodes, which are reliably recovered using our analog combiner hardware prototype. Moreover, among the hybrid receivers, the PSOAC achieves normalized MSE values which are only slightly higher than those of the costly CGAC, emphasizing the benefits of its design via Algorithm 1. Finally, both the CGAC and the PSOAC achieve lower normalized MSEs than random antenna selection.
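To make the 62.5% threshold concrete, the following Python sketch lists, for each considered array size, the smallest number of RF chains whose reduction rate exceeds 5/8. The function name is hypothetical, and the sketch assumes that "above 62.5%" is read as a strict inequality; it is simple arithmetic and does not reproduce any measured result.

```python
from fractions import Fraction

THRESHOLD = Fraction(5, 8)  # 62.5%


def min_rf_chains_above(n_bs, threshold=THRESHOLD):
    """Smallest N_rf in [2, N_bs] with N_rf / N_bs strictly above threshold."""
    for n_rf in range(2, n_bs + 1):
        if Fraction(n_rf, n_bs) > threshold:
            return n_rf
    return None


for n_bs in (4, 8, 16):
    n_rf = min_rf_chains_above(n_bs)
    rate = 100 * float(Fraction(n_rf, n_bs))
    print(f"N_bs = {n_bs:2d}: N_rf >= {n_rf:2d} (reduction rate {rate:.2f}%)")
```

Under this reading, the threshold corresponds to at least 3, 6, and 11 RF chains for arrays of 4, 8, and 16 antennas, respectively.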
These results demonstrate the ability of our proposed configurable analog combiner hardware to efficiently implement desirable RF chain reduction while inducing a minimal performance degradation on the overall communications system.
VI. CONCLUSIONS
In this work, we presented a hardware prototype of a MIMO receiver with RF chain reduction via configurable analog combining. Our proposed prototype consists of a specially designed combiner board as well as a dedicated experimental setup which allows testing and adjusting the analog combiner weights. We configured the analog combiner to optimize the channel estimation accuracy in MIMO systems by proposing an algorithm which improves upon state-of-the-art design methods. Using our hardware prototype, we were able to achieve MIMO channel estimation accuracy comparable to that achievable using costly fully digital receivers.
APPENDIX A PROOF OF THEOREM 1
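Let $W = V \Lambda U^*$ be the singular value decomposition of $W$, where $V \in \mathbb{C}^{N_{rf} \times N_{rf}}$ and $U \in \mathbb{C}^{N_{bs} \times N_{bs}}$ are its left and right singular matrices, respectively, and $\Lambda \in \mathbb{C}^{N_{rf} \times N_{bs}}$ is the diagonal matrix of its singular values. To prove the theorem, we first note that the objective (7) is invariant to the setting of the unitary matrix $V \in \mathcal{U}$, and thus

$$f(W) = f(\Lambda U^*) = \mathrm{tr}\Big( (P \otimes Q)(S^* \otimes U \Lambda^*) \big[ (S \otimes \Lambda U^*)\big((P \otimes Q) + p_n I_{K N_{bs}}\big)(S^* \otimes U \Lambda^*) \big]^{-1} (S \otimes \Lambda U^*)(P \otimes Q) \Big). \tag{15}$$

Next, we write $\Lambda U^* = \tilde{\Lambda} \tilde{U}^*$, where $\tilde{\Lambda} \in \mathcal{D}$ and $\tilde{U} \in \mathbb{C}^{N_{bs} \times N_{rf}}$ consist of the first $N_{rf}$ columns of $\Lambda$ and $U$, respectively. Using this formulation, it is noted that (15) is invariant to the setting of $\tilde{\Lambda}$, and can be written as

$$f(W) = f(\tilde{U}^*) = \mathrm{tr}\Big( (P \otimes Q)(S^* \otimes \tilde{U}) \big[ (S \otimes \tilde{U}^*)\big((P \otimes Q) + p_n I_{K N_{bs}}\big)(S^* \otimes \tilde{U}) \big]^{-1} (S \otimes \tilde{U}^*)(P \otimes Q) \Big) \overset{(a)}{=} \alpha^2 \tau \, \mathrm{tr}\Big( \big[ \tilde{U}^* (\alpha Q + p_n I_{N_{bs}}) \tilde{U} \big]^{-1} \tilde{U}^* Q^2 \tilde{U} \Big), \tag{16}$$

where (a) follows from the cyclic invariance property of the trace operator and by substituting $P = \alpha I_K$ and $S S^* = I_\tau$. Note that the optimization in (16) is solved using $\tilde{U} = U_o$ by [9, Prop. 2], thus concluding the proof of the theorem.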
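The two invariance steps used in the proof can be checked numerically with the following Python sketch based on NumPy. It is a sanity check under stated assumptions rather than part of the proof: the objective is assumed to take the form $\mathrm{tr}\big(R A^* [A (R + p_n I) A^*]^{-1} A R\big)$ with $R = P \otimes Q$ and $A = S \otimes W$, and the function names, dimensions, noise power, and seed are hypothetical.

```python
import numpy as np


def cgauss(rng, rows, cols):
    """i.i.d. proper-complex Gaussian entries with zero mean and unit variance."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)


def objective(W, S, P, Q, p_n):
    """Assumed objective: tr(R A^* [A (R + p_n I) A^*]^{-1} A R)."""
    R = np.kron(P, Q)
    A = np.kron(S, W)
    G = A @ (R + p_n * np.eye(R.shape[0])) @ A.conj().T
    return np.trace(R @ A.conj().T @ np.linalg.solve(G, A @ R)).real


rng = np.random.default_rng(1)
n_bs, n_rf, k, p_n, alpha = 8, 4, 3, 0.1, 1.0
Q_tilde = cgauss(rng, n_bs, n_bs)
Q = Q_tilde @ Q_tilde.conj().T
P = alpha * np.eye(k)
S, _ = np.linalg.qr(cgauss(rng, k, k))     # unitary pilots, so S S^* = I

W = cgauss(rng, n_rf, n_bs)                # arbitrary full-rank combiner
_, _, Uh = np.linalg.svd(W)                # rows of Uh are right singular vectors
U_tilde_h = Uh[:n_rf, :]                   # plays the role of U_tilde^*

f_W = objective(W, S, P, Q, p_n)
f_U = objective(U_tilde_h, S, P, Q, p_n)

# Reduced form obtained after substituting P = alpha I and S S^* = I.
M = U_tilde_h @ (alpha * Q + p_n * np.eye(n_bs)) @ U_tilde_h.conj().T
C = U_tilde_h @ Q @ Q @ U_tilde_h.conj().T
closed = alpha ** 2 * k * np.trace(np.linalg.solve(M, C)).real

print(f_W, f_U, closed)
```

All three printed values agree up to numerical precision, which illustrates that, for this assumed objective, only the row space of $W$ affects the result.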
