ABSTRACT In massive multiple-input multiple-output (MIMO) systems using a large number of antennas, it would be difficult to connect high-resolution analog-to-digital converters (ADCs) to each antenna component due to high cost and energy consumption problems. To resolve these issues, there has been much work on implementing symbol detectors and channel estimators using low-resolution ADCs for massive MIMO systems. Although it is intuitively true that using low-resolution ADCs makes it possible to save a large amount of energy consumption in massive MIMO systems, the relationship between energy consumption using low-resolution ADCs and detection performance has not been properly analyzed yet. In this paper, the tradeoff between different detectors and total baseband energy consumption including flexible ADCs is thoroughly analyzed taking the optimal fixed-point operations performed during the detection processes into account. In order to minimize the energy consumption for the given channel condition, the proposed scheme selects the best mode among various processing options while supporting the target frame error rate. The numerous case studies reveal that the proposed work remarkably saves the energy consumption of the massive MIMO processing compared with the existing schemes.
I. INTRODUCTION
Massive multiple-input multiple-output (MIMO) exploiting a number of antennas at base stations (BSs) is considered as one of the key ingredients of 5G and beyond wireless communications [1] - [5] . It was shown that simple linear processing, i.e., maximum-ratio combining (MRC) and matched beamforming (MBF) become optimal with infinite number of antennas [3] and that the transmission power can be scaled down proportional to the number of antennas [6] , [7] . Using a large number of antennas, however, would experience practical issues, e.g., high power consumption due to numerous power-hungry hardware components.
To cope with realization issues, many works [8] , [9] have shown that it is possible to use non-ideal hardware in massive MIMO systems. Among many possible non-ideal hardware components, using low-resolution ADCs at the BS is of great interest. It is well known that the ADC power consumption increases exponentially with the ADC resolution bits [10] ; i.e., using current state-of-the-art ADC architectures that exploit 12 to 16 bits resolutions would easily consume more than 100 Watts in massive MIMO systems [11] . Therefore, it is possible to save a large amount of power consumption (and also realization cost) during uplink transmissions by using low-resolution ADCs in massive MIMO systems.
Recently, there has been much work on designing symbol detectors and channel estimators for massive MIMO using low-resolution ADCs [12] - [19] . Near-optimal nonlinear detector and channel estimator for massive MIMO using one-bit ADCs were proposed in [12] while joint data detection and channel estimation with low-resolution ADCs were proposed in [13] . In [17] , the linear minimum mean squared error (LMMSE) channel estimator based on the Bussgang decomposition [20] was proposed and achievable rates using linear combiners were analyzed. It was shown that it is also possible to use low-resolution ADCs in wideband massive MIMO systems [14] , [16] , [18] . Mixed use of high-and low-resolution ADCs was analyzed in [15] and [19] where it was shown that using a few number of high-resolution ADCs on top of many low-resolution ADCs has negligible performance degradation. For detailed discussions, we refer to [21] that has well summarized and compared possible channel estimation and symbol detection techniques using low-resolution ADCs.
In spite of their attractive detection performance, however, the algorithms associated with low-resolution ADCs are in general expensive in terms of hardware complexity and processing energy, as they require numerous processing iterations associated with non-linear components [12], [13]. Moreover, if the baseband system targets the high data rate of the 5G specification, these performance-oriented algorithms are no longer acceptable as a practical solution. Therefore, linear detection algorithms such as MRC, zero-forcing (ZF), and MMSE are regarded as the realistic candidates for a practical massive MIMO baseband system due to their simple and non-iterative matrix operations. In addition, the energy-performance trade-offs can be utilized to further reduce the processing costs for the given channel condition. Considering a number of processing options from the algorithm level to the circuit-design level, we present in this paper, for the first time, an efficient multi-mode massive MIMO baseband processing scheme that dynamically selects the energy-optimized computing option while still satisfying the target frame error rate (FER). Including the conversion energy of flexible ADCs, experimental results show that the proposed processing method can save the energy consumption of the massive MIMO system by up to 93.12% compared to the straightforward detection scheme that only considers the baseband complexity.
The rest of this paper is organized as follows. Section II presents the backgrounds of massive MIMO systems. Section III describes the proposed unified symbol detector with the optimized computing resolutions. Including the conversion energy of flexible ADCs, in Section IV, the multi-mode computing solution is introduced to select the energy-optimized mode. Using the proposed method, a number of case studies are compared to the previous works in Section V. Finally, the conclusions are made in Section VI.
II. BACKGROUNDS OF MASSIVE MIMO SYSTEMS
We first describe the system model of massive MIMO using low-resolution ADCs. Then we explain two linear symbol detectors for massive MIMO, i.e., MRC and ZF detectors. With a large number of antennas and perfect channel state information (CSI), it is well known that these linear detectors become near optimal with perfect ADCs [3] . We assume the BS has perfect CSI in this paper as accurate channel estimation is possible even with low-resolution ADCs in massive MIMO [12] , [13] , [16] , [17] , [21] .
A. MASSIVE MIMO USING LOW-RESOLUTION ADCS
We consider a single-cell uplink system with $K$ single-antenna users where the BS is equipped with $M$ receive antennas with $M \gg K$. Assuming a transmit power constraint of $\rho$ for each user, the received signal $\mathbf{y} = [y_1, \ldots, y_M]^T \in \mathbb{C}^M$ is given by
$$\mathbf{y} = \sqrt{\rho}\,\mathbf{H}\mathbf{x} + \mathbf{n} \tag{1}$$
where $\mathbf{H} \in \mathbb{C}^{M \times K}$ is the channel matrix between the BS and the $K$ users. The $k$-th column of the channel matrix, $\mathbf{h}_k = [h_{1k}, \ldots, h_{Mk}]^T$, is the channel vector between the BS and the $k$-th user, where $h_{mk}$ denotes the channel coefficient between the $m$-th antenna at the BS and the $k$-th user. The data vector $\mathbf{x} \in \mathbb{C}^K$ is the concatenated data symbols of the $K$ users, i.e., $\mathbf{x} = [x_1, \ldots, x_K]^T$. To obey the power constraint of $\rho$, the data symbol of the $k$-th user, denoted as $x_k$, satisfies $\mathbb{E}[|x_k|^2] = 1$. The noise vector $\mathbf{n} \in \mathbb{C}^M$ is the additive white Gaussian noise (AWGN), i.e., $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$, where $\mathbf{0}$ is the $M \times 1$ zero vector and $\mathbf{I}_M$ is the $M \times M$ identity matrix. Therefore, the signal-to-noise ratio (SNR) is $\rho$.
To reduce the power consumption, the BS applies $B$-bit low-resolution ADCs elementwise to both the in-phase and the quadrature components of $\mathbf{y}$, which gives a total of $2M$ low-resolution ADCs at the BS. The quantized output of the $m$-th receive antenna is given by
$$\hat{y}_m = Q(\mathrm{Re}(y_m)) + jQ(\mathrm{Im}(y_m)) \tag{2}$$
where the function $Q(\cdot)$ is the $B$-bit quantizer with the $i$-th quantization level $\hat{r}_i \in \mathbb{R}$, $i = 1, \ldots, 2^B$, associated with its quantization region $\mathcal{R}_i \subset \mathbb{R}$. Note that the functions $\mathrm{Re}(\cdot)$ and $\mathrm{Im}(\cdot)$ return the real and imaginary parts of the given signal, respectively. The quantized value of $r \in \mathbb{R}$ is given by
$$\hat{r} = Q(r) = \hat{r}_i \quad \text{if } r \in \mathcal{R}_i \tag{3}$$
The quantized received signal is then given by
$$\hat{\mathbf{y}} = [\hat{y}_1, \ldots, \hat{y}_M]^T \tag{4}$$
Note that a conceptual diagram of the system described until now is illustrated in Fig. 1 .
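To make the signal chain of this subsection concrete, the following minimal sketch generates one received vector and quantizes its in-phase and quadrature parts separately. It assumes the model y = sqrt(rho) H x + n with unit-power symbols, and it assumes a mid-rise uniform quantizer with saturation; the quantizer type, the step size, and all function names are illustrative choices rather than details taken from this paper.

```python
import math
import random

def cn(rng):
    """Draw one CN(0, 1) sample (unit-variance circularly symmetric Gaussian)."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def uniform_quantize(r, bits, step):
    """Mid-rise uniform quantizer with 2**bits levels (assumed quantizer type)."""
    levels = 2 ** bits
    index = math.floor(r / step) + levels // 2
    index = min(max(index, 0), levels - 1)   # saturate outside the covered range
    return (index - levels / 2 + 0.5) * step

def quantize_complex(y, bits, step):
    """Quantize the in-phase and quadrature parts with two separate ADCs."""
    return complex(uniform_quantize(y.real, bits, step),
                   uniform_quantize(y.imag, bits, step))

def received_signal(H, x, rho, rng):
    """y = sqrt(rho) * H x + n with n ~ CN(0, I_M)."""
    return [math.sqrt(rho) * sum(h * xk for h, xk in zip(row, x)) + cn(rng)
            for row in H]

rng = random.Random(0)
M, K, B = 16, 2, 3                      # small illustrative sizes
H = [[cn(rng) for _ in range(K)] for _ in range(M)]
x = [complex(1, 1) / math.sqrt(2), complex(-1, 1) / math.sqrt(2)]  # unit-power QPSK
y = received_signal(H, x, rho=1.0, rng=rng)
y_hat = [quantize_complex(v, B, step=0.5) for v in y]
```

Each entry of `y_hat` takes one of at most 2^B values per real dimension, which is the only property of the quantizer that the rest of the processing relies on.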
B. MRC DETECTOR
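For the MRC detector, the BS processes its received signal by multiplying it by the conjugate transpose of the channel matrix $\mathbf{H}$,
$$\hat{\mathbf{x}}_{\mathrm{MRC}} = \mathbf{H}^H \hat{\mathbf{y}} \tag{5}$$
where $\hat{\mathbf{x}}_{\mathrm{MRC}}$ is the MRC estimate of $\mathbf{x}$, and $\mathbf{H}^H$ is the conjugate transpose matrix of $\mathbf{H}$. In general, MRC detectors attempt to maximize the SNR of user $k$ without considering inter-user interference. Hence, MRC detectors suffer from significant performance degradation due to inter-user interference. However, it is reported that the MRC detector becomes optimal with an infinite number of antennas and perfect CSI at the BS [3].
To analyze the effect of deploying an unlimited number of antennas at the BS, we rewrite (5) elementwise, neglecting the quantization error (i.e., taking $\hat{\mathbf{y}} = \mathbf{y}$ as with perfect ADCs), as
$$\hat{x}_{\mathrm{MRC},k} = \sqrt{\rho}\,\|\mathbf{h}_k\|^2 x_k + \sqrt{\rho}\sum_{i \neq k} \mathbf{h}_k^H \mathbf{h}_i x_i + \mathbf{h}_k^H \mathbf{n} \tag{6}$$
where $\hat{x}_{\mathrm{MRC},k}$ is the $k$-th element of $\hat{\mathbf{x}}_{\mathrm{MRC}}$. As $M \to \infty$, the effective channel coefficients $\|\mathbf{h}_k\|^2$ and $\mathbf{h}_k^H \mathbf{h}_i$ in (6), normalized by $M$, become deterministic in probability, i.e., [3]
$$\frac{1}{M}\|\mathbf{h}_k\|^2 \xrightarrow{p} 1 \tag{7}$$
$$\frac{1}{M}\mathbf{h}_k^H \mathbf{h}_i \xrightarrow{p} 0, \quad i \neq k \tag{8}$$
where (7) and (8) follow from the law of large numbers (LLN). The implication of (7) and (8) is that, for an unlimited number of antennas with perfect CSI, the MRC detector perfectly eliminates inter-user interference, maximizing the SNR of the desired signal.
VOLUME 7, 2019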
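The law-of-large-numbers argument can be checked numerically. The sketch below assumes i.i.d. CN(0, 1) channel coefficients (the Rayleigh model) and prints the normalized desired gain (1/M)||h_k||^2 and the normalized cross term (1/M)|h_k^H h_i| for growing M; the function names are hypothetical.

```python
import math
import random

def cn(rng):
    """Draw one CN(0, 1) channel coefficient."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def inner(a, b):
    """Hermitian inner product a^H b."""
    return sum(ai.conjugate() * bi for ai, bi in zip(a, b))

def normalized_gains(M, rng):
    """Return ((1/M)||h_k||^2, (1/M)|h_k^H h_i|) for two independent users."""
    h_k = [cn(rng) for _ in range(M)]
    h_i = [cn(rng) for _ in range(M)]
    return inner(h_k, h_k).real / M, abs(inner(h_k, h_i)) / M

rng = random.Random(1)
for M in (8, 64, 512, 4096):
    desired, interference = normalized_gains(M, rng)
    print(M, round(desired, 3), round(interference, 3))
```

As M grows, the first printed value should settle near 1 and the second near 0, with fluctuations on the order of 1/sqrt(M). This also indicates why a finite antenna count leaves residual inter-user interference for MRC.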
C. ZF DETECTOR
The analysis of MRC detectors reveals that, as the number of antennas grows without limit, inter-user interference is removed and the performance of MRC detectors approaches that of an optimal detector. However, a drawback of this argument is that the assumption of unlimited antennas cannot be applied to more realistic situations where the number of BS antennas per user is limited to a finite value. In [7], under practical situations where $M \to \infty$ while the ratio $K/M$ is kept fixed, it was shown that MMSE detectors are superior to MRC detectors by comparing the degrees of freedom (DoF) per user required to achieve the same performance. Now, consider an uplink system employing the ZF detector. The ZF estimate of $\mathbf{x}$ is obtained by multiplying the received signal by the pseudo inverse of the channel matrix $\mathbf{H}$,
$$\hat{\mathbf{x}}_{\mathrm{ZF}} = \mathbf{H}^{\dagger} \hat{\mathbf{y}} \tag{9}$$
where $\mathbf{H}^{\dagger}$ denotes the pseudo inverse matrix of $\mathbf{H}$. Note that the ZF detector is equivalent to the MMSE detector in the high-SNR regime [22]; hence, the ZF detector would outperform the MRC detector by arguments similar to those made in [7]. Moreover, as discussed in [23], it is easy to show that the ZF detector can perform perfect detection even with one-bit ADCs as the number of antennas at the BS goes to infinity. Using the algebraic formula of the pseudo inverse matrix $\mathbf{H}^{\dagger}$, the ZF estimate can be expressed as
$$\hat{\mathbf{x}}_{\mathrm{ZF}} = (\mathbf{H}^H \mathbf{H})^{-1} \mathbf{H}^H \hat{\mathbf{y}} = (\mathbf{H}^H \mathbf{H})^{-1} \hat{\mathbf{x}}_{\mathrm{MRC}} \tag{10}$$
As a result of (10), we can exploit the MRC estimate of $\mathbf{x}$ to calculate the ZF estimate by multiplying $\hat{\mathbf{x}}_{\mathrm{MRC}}$ by $(\mathbf{H}^H \mathbf{H})^{-1}$.
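The reuse of the MRC result in the ZF computation can be illustrated with a small example. The sketch below assumes K = 2 users so that the inverse of the 2 x 2 Gram matrix H^H H can be written in closed form, and it uses a noiseless, unquantized received signal so that the ZF estimate should recover the transmitted symbols exactly; all names and sizes are illustrative.

```python
import math
import random

def cn(rng):
    """Draw one CN(0, 1) channel coefficient."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def mrc(H, y):
    """x_mrc = H^H y for an M x K channel stored as a list of rows."""
    K = len(H[0])
    return [sum(row[k].conjugate() * ym for row, ym in zip(H, y)) for k in range(K)]

def gram(H):
    """G = H^H H (K x K)."""
    K = len(H[0])
    return [[sum(row[i].conjugate() * row[j] for row in H) for j in range(K)]
            for i in range(K)]

def zf_from_mrc_2users(H, y):
    """x_zf = (H^H H)^{-1} x_mrc, using the closed-form 2 x 2 inverse."""
    (a, b), (c, d) = gram(H)
    det = a * d - b * c
    m0, m1 = mrc(H, y)                     # the MRC result is reused here
    return [(d * m0 - b * m1) / det, (a * m1 - c * m0) / det]

rng = random.Random(3)
M = 16
H = [[cn(rng), cn(rng)] for _ in range(M)]
x = [complex(1, 1) / math.sqrt(2), complex(-1, -1) / math.sqrt(2)]
y = [row[0] * x[0] + row[1] * x[1] for row in H]   # noiseless, unquantized
x_zf = zf_from_mrc_2users(H, y)
```

In this idealized setting `x_zf` matches `x` up to floating-point error, whereas `mrc(H, y)` alone still contains the inter-user term.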
III. COMPUTING MODULES FOR MIMO PROCESSING
A. PROPOSED UNIFIED DETECTOR ARCHITECTURE
To support efficient massive MIMO baseband processing, it is in general important to develop optimized hardware architectures for matrix operations, i.e., matrix additions, multiplications, and inversions [24], [25]. In order to provide energy-performance tradeoffs, the proposed symbol detector supports two different detection algorithms, i.e., the MRC detector shown in (5) and the ZF detector derived in (9). With a practical number of antennas, the ZF detector generally provides better FER performance while requiring more energy for the matrix inversion process. On the other hand, the MRC scheme uses only a single matrix multiplication, which is more energy-efficient at the cost of FER performance [26]. As the results of the MRC algorithm can be reused for the processing of the ZF method, as depicted in (10), we introduce in this work the unified detector architecture. Fig. 2 shows the conceptual diagram of the proposed unified symbol detector. Note that the number of receive antennas, denoted as $M$, is determined at design time, whereas the number of single-antenna users, denoted as $K$, is fixed to 8 in this work to show a concrete example of the proposed detector. To support the ADC speed in this work, i.e., 40 Msample/s, the unified detector is designed to operate at 160 MHz in 65 nm CMOS technology, and thus accepts $M$ input samples every 4 cycles from the $2M$ ADCs. For the different detection algorithms, as shown in Fig. 2, the proposed detector introduces three primitive computing modules, i.e., two types of inner-product modules (IP1 and IP2) and the matrix inversion unit (INV). IP1 and IP2 have the same architecture, except that IP1 calculates inner products of vectors of length $M$, whereas IP2 targets vectors whose length is fixed to $K$.
In fact, it is quite straightforward to realize the MRC scheme in (5), i.e., only 2 parallel IP1 units followed by the vector synchronizer are utilized for calculating $\hat{\mathbf{x}}_{\mathrm{MRC}}$, as depicted in Fig. 2. In order to increase the operating frequency up to 160 MHz, as depicted in Fig. 3, the internal operations of an IP1 module are pipelined into 4 processing cycles. More precisely, in the first cycle, $M$ element-wise multiplications are performed simultaneously, followed by $M/4$ two-level binary adder trees that partially accumulate the multiplication results. These results are temporarily stored in $M/4$ pipeline registers, and the remaining accumulation is then performed by the following 3-stage pipelined binary adder tree, as shown in Fig. 3. Note that the vector synchronizer is necessary to collect the serially computed elements from the IP1 units, finally constructing the MRC result as shown in Fig. 2.
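The IP1 data flow described above can be mimicked in software as a sanity check of the adder-tree organization. The following behavioral sketch is not a hardware model: it assumes M is a power of two and that the tree levels remaining after the first cycle are split evenly over the three later stages, which is an assumption made here for illustration.

```python
def pairwise_level(values):
    """One level of a binary adder tree: add neighbouring entries pairwise."""
    return [values[i] + values[i + 1] for i in range(0, len(values), 2)]

def ip1(h, y, tail_stages=3):
    """Inner product h^H y, organised like the 4-cycle IP1 pipeline."""
    m = len(h)
    assert m == len(y) and m >= 4 and (m & (m - 1)) == 0, "M must be a power of two"
    # Cycle 1: M parallel multiplications, then M/4 two-level adder trees.
    products = [hc.conjugate() * yc for hc, yc in zip(h, y)]
    partial = pairwise_level(pairwise_level(products))  # held in M/4 pipeline registers
    # Cycles 2-4: the remaining tree levels, split evenly over the stages (assumed).
    levels_left = len(partial).bit_length() - 1
    per_stage = -(-levels_left // tail_stages)          # ceiling division
    for _stage in range(tail_stages):
        for _level in range(per_stage):
            if len(partial) > 1:
                partial = pairwise_level(partial)
    return partial[0]
```

For M = 256 the first cycle leaves 64 partial sums, so each of the three later stages would cover two adder levels under this assumption.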
FIGURE 4. SER performance of ZF detector with exact matrix inversion (Exact) and Neumann series approximation using one iteration (P = 1).
In contrast to the simple MRC detection, it is much more complex to support the ZF scheme in (10), as the algorithm necessitates multiple matrix multiplications followed by a matrix inversion. In order to support the ZF operation, as shown in Fig. 2, the proposed unified detector first calculates $\mathbf{H}^H \mathbf{H}$ from the input channel matrix $\mathbf{H}$, requiring numerous IP1 modules in parallel. To reduce the number of inner products, the unified detector utilizes the fact that $\mathbf{H}^H \mathbf{H}$ is a Hermitian matrix, i.e., it is identical to its own conjugate transpose. Using this property, we compute only the upper (or the lower) triangular part of $\mathbf{H}^H \mathbf{H}$, reducing the number of inner products from 64 to 36. In order to accept new input symbols every 4 cycles, we utilize 9 parallel IP1 modules, taking their pipelined processing into account. Similar to the MRC detection, the matrix synchronizer rearranges the results of the IP1 units to construct $\mathbf{H}^H \mathbf{H}$, and the $\mathbf{A}^{-1}$ approximation module generates the proper inputs to the matrix inversion in (10). The proposed INV unit is based on the Neumann series approximation, replacing the impractical matrix division operations with a series of multiplications [27]. Although there exist many ways to perform matrix inversion [28]-[30], we implemented the INV unit using the Neumann series approximation as it has relatively small computational complexity when the number of iterations is small. According to the Neumann series, the inverse of the input matrix $\mathbf{Z}$ can be approximated as
$$\mathbf{Z}^{-1} \approx \sum_{p=0}^{P} \left(\mathbf{I} - \mathbf{A}^{-1}\mathbf{Z}\right)^{p} \mathbf{A}^{-1} \tag{11}$$
where $P$ stands for the number of iterations of the Neumann series and $\mathbf{A}$ represents the initial approximation of $\mathbf{Z}$. It is well known that the approximation converges to the correct inverse $\mathbf{Z}^{-1}$ when $P$ is large enough [31]. Moreover, the choice of the initial approximation $\mathbf{A}$ plays a critical role in determining the speed of convergence. Fortunately, in massive MIMO processing with a large number of antennas, the target matrix $\mathbf{Z} = \mathbf{H}^H \mathbf{H}$ tends to be diagonally dominant, so we can simply choose the initial approximation $\mathbf{A}$ as the diagonal elements of $\mathbf{H}^H \mathbf{H}$, as shown in Fig. 2. As depicted in Fig. 4, simulation results show that the proposed initial guess is acceptable, as the inversion process with only one iteration already approaches the accurate result with negligible errors. In this work, therefore, we set $P = 1$ for a cost-effective solution. Then, our Neumann series approximation becomes
$$\mathbf{Z}^{-1} \approx 2\mathbf{A}^{-1} - \mathbf{A}^{-1}\mathbf{Z}\mathbf{A}^{-1} \tag{12}$$
Note that $\mathbf{A}^{-1}$ is the diagonal matrix collecting the reciprocals of the diagonal elements of $\mathbf{Z}$, and that this $\mathbf{A}^{-1}$ estimation is performed in parallel with the matrix synchronizer in Fig. 2. Considering the simplified matrix inversion process in (12), the proposed INV unit contains two matrix multiplier modules (MUL) and one matrix subtractor module (SUB). Handling the input 8×8 matrices in a fully parallel manner, the INV unit needs 4 processing cycles to complete its operation. After calculating $(\mathbf{H}^H \mathbf{H})^{-1}$, as depicted in Fig. 2, the unified detector reuses the buffered results of the MRC scheme, $\hat{\mathbf{x}}_{\mathrm{MRC}}$, finally obtaining the ZF estimate, $\hat{\mathbf{x}}_{\mathrm{ZF}}$. This last step is performed by the IP2 modules, as shown in Fig. 2. Similar to the IP1 architecture, we adopt a 4-stage multiplication process for realizing the IP2 architecture, which handles a different vector size. As each major computing step takes exactly 4 processing cycles, the proposed unified symbol detector can accept new input symbols every 4 cycles for both algorithms. Fig. 5 illustrates the processing sequence of the proposed unified detector. Note that the processing sequence of the $N$-th detection process, denoted as #N, is serially allocated to the pipelined processing units without causing unwanted waiting periods. Hence, the proposed unified detector successfully supports a seamless detection scenario, as depicted in Fig. 5, which is necessary for high-speed massive MIMO processing.
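The two ideas used for the ZF path, namely computing only one triangle of the Hermitian Gram matrix and replacing the exact inverse by a one-iteration Neumann approximation, can be sketched as follows. The sketch assumes the one-iteration form Z^{-1} ≈ 2A^{-1} − A^{-1} Z A^{-1} with A = diag(Z), works in floating point rather than fixed point, and uses an i.i.d. Rayleigh channel; function names are hypothetical.

```python
import math
import random

def cn(rng):
    """Draw one CN(0, 1) channel coefficient."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def gram_hermitian(H):
    """Z = H^H H from the upper triangle only, K(K+1)/2 inner products."""
    K = len(H[0])
    Z = [[0j] * K for _ in range(K)]
    count = 0
    for i in range(K):
        for j in range(i, K):
            Z[i][j] = sum(row[i].conjugate() * row[j] for row in H)
            Z[j][i] = Z[i][j].conjugate()   # lower triangle by Hermitian symmetry
            count += 1
    return Z, count

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def neumann_inverse_p1(Z):
    """One-iteration Neumann approximation with A = diag(Z)."""
    K = len(Z)
    A_inv = [[(1.0 / Z[i][i].real if i == j else 0.0) for j in range(K)]
             for i in range(K)]
    T = matmul(matmul(A_inv, Z), A_inv)     # two matrix multiplications (MUL)
    return [[2.0 * A_inv[i][j] - T[i][j] for j in range(K)]
            for i in range(K)]              # one matrix subtraction (SUB)

rng = random.Random(4)
M, K = 256, 8
H = [[cn(rng) for _ in range(K)] for _ in range(M)]
Z, n_inner = gram_hermitian(H)
Z_inv = neumann_inverse_p1(Z)
residual = matmul(Z, Z_inv)
max_err = max(abs(residual[i][j] - (1.0 if i == j else 0.0))
              for i in range(K) for j in range(K))
```

For K = 8 the Gram matrix needs 36 inner products, and `max_err` measures how far Z times the approximate inverse is from the identity. The residual equals the square of I − Z A^{-1}, so it shrinks as M grows and Z becomes more diagonally dominant.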
B. OPTIMIZED COMPUTING RESOLUTIONS
For the practical realization of the unified symbol detector, all the computations should be operated with the fixed-point numbers, and it is important to precisely determine the proper computing resolution of each processing unit [28] . In order to minimize the energy consumption, in general, we need to reduce the computing resolution as low as possible. Based on the numerous simulations associated with different MIMO configurations and fading channel conditions, in this work, we optimize the computing resolutions of IP1, IP2, and INV operators. There are two interesting observations on the computing resolution in Table 1 . The first is that we need more bits for integer parts when the number of antennas increases. This is a natural observation that more antennas lead to the enlarged vector size on IP1 unit, and the internal peak values during inner products become bigger. The second point is related to the different resolutions between IP1 and other processing units. Starting from the INV operation, more precisely, we use much more bits on fractional parts as shown in Table 1 . As the IP1 module only performs multiply and accumulate operations, in fact, an IP1 unit does not need such a high level of accuracy in fractional parts. However, the INV performs the inverse operation of the input Hermitian matrix depicted in (10) . As the value of each diagonal element in Hermitian matrix H H H is driven by accumulating M elements of H, the outputs of inversion process consequently contain extremely small values, requiring more resolutions on fractional parts. Based on the optimized computing resolutions, therefore, the proposed unified symbol detector minimizes the hardware costs for both MRC and ZF algorithms, still providing the acceptable FER performances.
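The need for additional fractional bits after the inversion can be seen with a small fixed-point rounding helper. The bit widths below are arbitrary illustrations and are not the values of Table 1; the convention that `int_bits` excludes the sign bit is also an assumption of this sketch.

```python
def to_fixed(value, int_bits, frac_bits):
    """Round to a signed fixed-point grid with saturation.

    int_bits counts integer bits excluding the sign bit (assumed convention).
    Python's round() rounds exact ties to the nearest even code.
    """
    scale = 1 << frac_bits
    max_code = (1 << (int_bits + frac_bits)) - 1
    min_code = -(1 << (int_bits + frac_bits))
    code = max(min(round(value * scale), max_code), min_code)
    return code / scale

# A diagonal entry of H^H H accumulates M terms, so its reciprocal is on the
# order of 1/M. With M = 256, a coarse fractional grid loses it entirely.
reciprocal = 1.0 / 256
coarse = to_fixed(reciprocal, int_bits=2, frac_bits=4)
fine = to_fixed(reciprocal, int_bits=2, frac_bits=10)
```

With 4 fractional bits the reciprocal rounds to zero, while 10 fractional bits represent it exactly. This is the effect behind the larger fractional resolution of the INV and IP2 stages.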
IV. ENERGY-OPTIMIZED MULTI-MODE COMPUTING SCHEME FOR MASSIVE MIMO PROCESSING
In this section, the multi-mode massive MIMO processing algorithm is introduced for minimizing the symbol detection energy. Based on the proposed unified detector architecture, we first have two different options for the detection strategy, i.e., the MRC and ZF algorithms. In order to provide more processing modes, we additionally adopt a flexible ADC architecture that can control its quantization resolution dynamically. Note that changing the detection mode directly affects the overall energy consumption as well as the FER performance. Hence, it is important to find a way to select the optimal MIMO mode, i.e., the one consuming the minimum amount of energy while providing acceptable FER performance.
In our selection algorithm, depending on the channel condition, we first construct the candidate set to choose the optimal processing configuration. By examining the FER of each mode, the candidate set only includes the configurations achieving the target FER for the given SNR. For the sake of simplicity, in this work, we use the target FER of 10 −1 , which is widely accepted for the practical applications [32] , [33] . To calculate the FER, after the MIMO processing, we adopt 1024b, 0.5-rate polar codes for 5G specification [34] , [35] . Note that we also assume the M × 8 massive MIMO system where the number of receive antennas and the modulation scheme are pre-determined during the system-design level. After constructing the candidate set, we estimate the overall energy consumption of each mode to find the optimal one in terms of energy consumption. This seems to be obvious, but estimating the energy consumed by two major processing units, i.e., the ADC and the symbol detection, requires more intensive circuit-level studies as follows.
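The two-step selection described above (build the candidate set from the FER target, then take the lowest-energy candidate) can be sketched as follows. All FER and energy numbers in the example table are made-up placeholders for one hypothetical SNR point, not measured results, and the `Mode` record and `select_mode` function are names introduced here for illustration.

```python
from typing import NamedTuple, Optional, Sequence

class Mode(NamedTuple):
    detector: str      # "MRC" or "ZF"
    adc_bits: int      # 3, 5, or 9
    fer: float         # FER at the current SNR (placeholder value)
    energy_nj: float   # detector plus ADC energy per detection (placeholder value)

def select_mode(modes: Sequence[Mode], target_fer: float = 0.1) -> Optional[Mode]:
    """Return the lowest-energy mode meeting the FER target, or None."""
    candidates = [m for m in modes if m.fer <= target_fer]
    if not candidates:
        return None
    return min(candidates, key=lambda m: m.energy_nj)

# Placeholder table for a single SNR point (illustrative numbers only).
modes = [
    Mode("MRC", 3, 0.90, 7.0),
    Mode("MRC", 5, 0.50, 11.0),
    Mode("MRC", 9, 0.08, 97.0),
    Mode("ZF", 3, 0.09, 41.0),
    Mode("ZF", 5, 0.05, 45.0),
    Mode("ZF", 9, 0.02, 132.0),
]
best = select_mode(modes)
```

With these placeholder values, an MRC-preferred rule would pick MRC with 9-bit ADCs, while the energy-based rule picks ZF with 3-bit ADCs because the ADC energy dominates the total.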
A. ENERGY IN THE UNIFIED SYMBOL DETECTOR
To calculate the energy consumption precisely, the unified symbol detector hardware dedicated to each of the different massive MIMO configurations is designed and fabricated in 65 nm CMOS technology. Note that all the detector designs operate at 160 MHz by adopting the balanced pipelined architecture. By detecting 8 symbols every 4 cycles, as described in the previous section, the unified detector in this work can support high-speed ADCs with conversion rates of up to 40 Msample/s, which is well suited to next-generation wireless communications [1]. In order to realize the energy-optimized detector for different MIMO configurations, all the internal processing units of each design follow the optimal computing resolutions listed in Table 1. Table 2 summarizes the energy consumption for handling a set of $M$ received samples in the unified symbol detectors targeting the different MIMO configurations. As the ZF algorithm is inherently more complex than the MRC algorithm, the energy consumption of the symbol detection process naturally increases when ZF is adopted. For the 256×8 massive MIMO system with 64QAM modulation, for example, the proposed unified symbol detector consumes only 6.28 nJ for processing the MRC algorithm, whereas the ZF scheme requires 6.36 times more energy for detecting the received symbols. In previous research, therefore, the MRC algorithm would have a higher priority than the ZF algorithm whenever both algorithms provide acceptable detection performance for the given SNR, due to the low energy consumption of the MRC algorithm [30]. As the proposed work provides precise tradeoffs between the FER performance and the energy consumption by adopting the flexible ADCs, however, this simple selection rule for the detection algorithm should be reconsidered by also checking the energy consumed by the multiple ADCs.
In other words, the MRC scheme associated with high-resolution ADCs and the ZF algorithm followed by the low-resolution ADCs are fairly compared in this work to find more energy-optimized baseband operation.
B. ENERGY CONSUMPTION IN THE FLEXIBLE ADCS
In RF-analog systems, the ADC is in general the most critical circuit component and often determines the system performance. As ADCs have a wide variety of performance characteristics, it is necessary to condense the performance of an ADC into a single metric to understand its effect on the overall system. To do this, a figure of merit (FoM) can be defined from the trade-off among power consumption, resolution, and bandwidth. In general, there are the Walden FoM ($\mathrm{FoM}_W$) [10] and the Schreier FoM ($\mathrm{FoM}_S$) [36]:
$$\mathrm{FoM}_W = \frac{\mathrm{Power}}{f_{s\_nyq} \times 2^{\mathrm{ENOB}}} \tag{13}$$
$$\mathrm{FoM}_S = \mathrm{SNDR} + 10\log_{10}\!\left(\frac{f_{s\_nyq}/2}{\mathrm{Power}}\right) \tag{14}$$
where $f_{s\_nyq}$ is the Nyquist frequency, SNDR is the signal-to-noise-and-distortion ratio, and ENOB is the effective number of bits, related to the ADC resolution. It is well known that ENOB is calculated from SNDR as [10]
$$\mathrm{ENOB} = (\mathrm{SNDR} - 1.76)/6.02 \tag{15}$$
Note that $\mathrm{FoM}_W$ implies that 2× power is needed to increase the resolution by 1 bit, while $\mathrm{FoM}_S$ implies that 4× power is required due to the thermal noise limit. For this reason, state-of-the-art ADCs to date tend to follow $\mathrm{FoM}_W$ up to 8-bit resolution, and $\mathrm{FoM}_S$ for higher ENOBs, where the thermal noise of the capacitor limits the performance [37]. From the ADC survey in [37], this work selects only the results with the target Nyquist frequency (40∼80 MHz) in 65 nm process designs, and calculates their average FoM values, $\mathrm{FoM}_W = 352$ fJ/conversion-step and $\mathrm{FoM}_S = 158$ dB, as shown in Fig. 6. Note that this ADC survey only includes state-of-the-art designs, and generally does not include the power of the power-hungry ADC input buffer or reference drivers. In addition, flexible ADCs can be implemented with minimal overhead by using various circuit techniques depending on the type of ADC, such as power gating (pipeline) [38], capacitor switching (SAR) [39], [40], and sampling frequency scaling (delta-sigma). Therefore, it can be said that the calculated $\mathrm{FoM}_S$ and $\mathrm{FoM}_W$ are not overestimated values and appropriately predict the ADC energy consumption required in this study.
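As a rough illustration of how a Walden-type FoM translates into conversion energy, the sketch below rearranges the Walden definition to energy per conversion, E = FoM_W × 2^ENOB, and scales it to the 2M ADCs of one detection. It assumes that ENOB equals the nominal resolution B and that the averaged FoM_W of 352 fJ per conversion step applies at every resolution; at higher ENOBs the Schreier trend described above may be the more appropriate model, so this is only an estimate.

```python
FOM_W = 352e-15   # J per conversion step, the averaged Walden FoM quoted above

def enob_from_sndr(sndr_db):
    """ENOB = (SNDR - 1.76) / 6.02, with SNDR in dB."""
    return (sndr_db - 1.76) / 6.02

def adc_energy_per_sample(bits, fom_w=FOM_W):
    """Energy of one conversion, taking ENOB equal to the nominal bits (assumed)."""
    return fom_w * 2 ** bits

def adc_energy_per_detection(bits, num_antennas):
    """2M ADCs (I and Q per antenna), one conversion each per detection."""
    return 2 * num_antennas * adc_energy_per_sample(bits)

for b in (3, 5, 9):
    print(b, "bits:", round(adc_energy_per_detection(b, 256) * 1e9, 2), "nJ")
```

Under these assumptions the conversion energy grows by a factor of 2 per additional bit, so the 9-bit setting costs 64 times more ADC energy than the 3-bit setting for the same antenna count.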
V. CASE STUDIES
In this section, we present several case studies to show the impact of the proposed energy-optimized MIMO detection scheme. Based on the unified detector architecture, for fair comparisons, three different detection schemes are applied to each study: 1) the straightforward MRC-preferred method (the MRC scheme has higher priority when both MRC and ZF meet the target) with 9-bit fixed ADCs, 2) the MRC-preferred algorithm based on the flexible ADCs, and 3) the proposed energy-optimized scheme that considers the processing energy of the unified detector as well as the conversion energy of the flexible ADCs. Increasing the flexibility of the ADC resolution obviously leads to finer trade-offs between the energy consumption and the FER performance. Because of the additional control logic, however, it may also increase the complexity of each ADC. Based on FER simulations with different ADC resolutions, therefore, the flexible ADC in this work is assumed to have only 3 options for its resolution, i.e., 3, 5, and 9 bits, which is a practical assumption for providing ADC flexibility with acceptable complexity overhead [41]. We assume two channel models to represent realistic communication environments in various situations. The first one is the Rayleigh fading channel, which describes only non-line-of-sight (NLoS) signals, and the other one is the Rician fading channel, which includes both NLoS and line-of-sight (LoS) signals [42]. In the Rayleigh channel, the channel matrix $\mathbf{H} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_N)$ is associated with Rayleigh-distributed random components. In the Rician fading channel, on the other hand, $\mathbf{H}$ is modeled as
where $\mathbf{H}_\omega$ describes the NLoS signals, each element of $\mathbf{H}_\omega$ being a complex normal random variable whose absolute value follows the Rayleigh distribution, and $\bar{\mathbf{H}}$ describes the LoS signals, which can be expressed as
where $\theta_n$ is the angle of arrival (AoA) of the $n$-th user [42], [43]. Considering practical cases, we assume that cells are sectorized and users are evenly distributed in the cell; therefore, the AoA is distributed within the interval $[-\pi/3, \pi/3]$. In addition, the Rician $K$-factor is set to 10 dB [43].
A. CASE STUDIES ON RAYLEIGH FADING CHANNELS
Fig. 7 illustrates the FER performance of different symbol detection algorithms and ADC resolutions for 256×8 MIMO processing with 16QAM modulation in Rayleigh fading channels. In terms of the ideal FER performance, as discussed above, the complex ZF algorithm is a better option than the simple MRC scheme. Note that with 9-bit ADCs the FER always converges to the ideal performance of each detection algorithm. On the other hand, the results with 5-bit and 3-bit ADCs show gradually degraded FER performance, but they still meet the target FER depending on the channel condition. For the different detection scenarios, Fig. 8 shows the energy consumption per MIMO processing, i.e., detecting 8 received symbols at a time. With the fixed 9-bit ADCs, the straightforward algorithm changes the detection method from ZF to MRC when SNR > −11.92 dB. By utilizing the flexible ADCs, as shown in the figure, the simple MRC-preferred method changes the resolution of the ADCs and reduces the energy consumption in each SNR region. However, when the SNR is in the region −11.92 dB < SNR < −10.41 dB, the MRC-preferred scheme with flexible ADCs, depicted by triangle symbols in Fig. 8, selects the MRC scheme with 9-bit ADCs, consuming 97.33 nJ per detection. For the same condition, on the other hand, the proposed scheme, depicted by cross symbols, uses only 45.03 nJ and 41.12 nJ by utilizing the ZF algorithm with 5-bit and 3-bit ADCs, respectively, thereby selecting a more energy-efficient processing mode.
By checking the required energy in detail, our proposed energy-optimized detection method always adopts the most energy-efficient scenario, saving the detection energy by up to 92.30% and 57.75% compared to the MRC-preferred scheme with fixed and flexible ADCs, respectively. Fig. 9 shows the second case study, based on 512×8 MIMO processing with 64QAM modulation. Similar to the previous study, lowering the ADC resolution sacrifices the FER performance accordingly. Compared to the previous works, as depicted in Fig. 10, the proposed method successfully reduces the detection energy. Under Rayleigh fading channel conditions, as a result, our energy-optimized scheme effectively minimizes the required symbol detection energy.
B. CASE STUDIES ON RICIAN FADING CHANNELS
In this case study, the FER performance on the Rician channels is slightly different from that on the Rayleigh channel due to the strong LoS signals [44]. More precisely, each linear detector can achieve the target FER with fewer antennas in our working scenarios. In addition, the performance gap between MRC detection and ZF detection becomes negligible as the number of antennas is increased. To account for realistic scenarios, therefore, we use 128 antennas for the Rician channel in this case study, which is fewer than in the case studies on the Rayleigh fading channel. Fig. 11 shows the FER performance of different symbol detection algorithms and ADC resolutions for 128×8 MIMO processing with 64QAM modulation. Similar to the studies on Rayleigh fading channels, 9-bit ADCs achieve the ideal performance, while 3-bit and 5-bit ADCs gradually degrade the performance. As shown in Fig. 12, it can be seen again that the previous MRC-preferred scheme is inefficient in a certain SNR region. More precisely, the proposed energy-optimized scheme consumes only 21.92 nJ if the SNR is in the range −3.79 dB < SNR < −3.11 dB, while the conventional scheme consumes 48.23 nJ. As a result, the proposed work saves symbol detection energy by up to 93.12% and 54.55% compared to the MRC-preferred scheme with fixed and flexible ADCs, respectively.
Regardless of channel types and conditions, the proposed work always provides the best processing mode to minimize the symbol detection energy, leading to the cost-effective massive MIMO processing for the next-generation wireless systems.
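The savings reported against the flexible-ADC MRC-preferred scheme appear to follow directly from the per-detection energies quoted in the two case studies. The short check below uses only those quoted energies; the helper name `saving_percent` is introduced here for illustration.

```python
def saving_percent(reference_nj, proposed_nj):
    """Relative energy saving of the proposed mode against a reference mode."""
    return 100.0 * (1.0 - proposed_nj / reference_nj)

# Rayleigh case (256x8, 16QAM): MRC with 9-bit ADCs vs. ZF with 3-bit ADCs.
rayleigh = saving_percent(97.33, 41.12)
# Rician case (128x8, 64QAM): conventional scheme vs. proposed scheme.
rician = saving_percent(48.23, 21.92)
print(round(rayleigh, 2), round(rician, 2))
```

Rounded to two decimals, the two values agree with the 57.75% and 54.55% figures stated in the text.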
VI. CONCLUSION
In this paper, we have proposed a new energy-optimized multi-mode symbol detection architecture. In the proposed scheme, we adopt flexible ADCs to find the optimal ADC resolution that can minimize the symbol detection energy while satisfying the target FER. For the precise estimation on the required energy, moreover, the unified symbol detector is implemented and optimized to support different major linear detection algorithms in practical applications. To verify the proposed concept, we performed a number of case studies by changing the resolution of ADCs, the number of antennas, the modulation schemes, and even the channel modeling. As a result, the proposed energy-optimized detecting scheme was shown to always minimize the required detection energy by considering both the flexible ADCs and the baseband processing at the same time, which provides more accurate trade-offs between the energy consumption and the FER performance.
