Analog-to-digital converters (ADCs) are a major contributor to the power consumption of multiple-input multiple-output (MIMO) communication systems with a large number of antennas. The use of low resolution ADCs has been proposed as a means to decrease power consumption in MIMO receivers. However, reducing the ADC resolution leads to performance loss in terms of achievable transmission rates. In order to mitigate the rate-loss, the receiver can perform analog processing of the received signals before quantization. Prior works consider one-shot analog processing where, at each channel-use, analog linear combinations of the received signals are fed to a set of one-bit threshold ADCs. In this paper, a receiver architecture is proposed which uses a sequence of delay elements to allow for blockwise linear combining of the received analog signals. In the high signal to noise ratio regime, it is shown that the proposed architecture achieves the maximum achievable transmission rate given a fixed number of one-bit ADCs. Furthermore, a tradeoff between transmission rate and the number of delay elements is identified which quantifies the increase in maximum achievable rate as the number of delay elements is increased.
I. Introduction
One of the most significant challenges in the development of 5G cellular communication technologies is energy consumption. The use of large antenna arrays leads to energy demands which are inconsistent with the limited power budget available in mobile devices and small-cell access points [1]. Analog-to-digital converters (ADCs) are a major contributor to the power consumption in multiple-input multiple-output (MIMO) receivers. In conventional MIMO systems with digital beamforming, it is assumed that each receiver antenna is connected to a high resolution ADC [2]. In standard ADC design, the power consumption is proportional to the number of quantization bins and hence grows exponentially in the number of output bits [3], [4]. One method which has been proposed to address high power consumption in MIMO systems with a large number of antennas is to use low resolution ADCs (e.g. one-bit threshold ADCs) at each receiver antenna [5]-[12]. Reducing the ADC resolution decreases power consumption; however, it also results in lower transmission rates. This suggests a tradeoff between transmission rate and power consumption which is controlled by the number and resolution of the ADCs at the receiver.
In classical information theory, it is well-known that in order to achieve optimal transmission rates, communication must be performed over asymptotically large blocks of data [13] . More precisely, an optimal decoder performs a possibly nonlinear operation on an asymptotically large block of channel outputs. In MIMO systems using high resolution ADCs, the discretization loss is negligible due to the fine quantization grid. Simultaneous blockwise decoding is made possible by storing the digital output and performing the decoding operation over large blocklengths in the digital domain. However, when low resolution ADCs are used, discretizing the individual channel outputs prior to blockwise decoding leads to loss of information and suboptimal performance [6] . In particular, restricting to one-bit ADCs leads to large quantization noise, and a significant reduction in achievable rates [8] , [10] .
Rate-loss due to low resolution quantization can be attributed to two constituents which we call intrinsic and extrinsic rate-loss. To elaborate, consider the MIMO communication system shown in Fig. 1. Assume that the receiver is equipped with $n_q$ one-bit threshold ADCs. An upper bound on the channel capacity is given by $\min(n_q, C)$ bits per channel-use, where $C$ is the capacity of the MIMO channel when using ADCs with very high resolution. In other words, due to the restriction on the number of ADCs, the channel capacity is decreased by at least $C - \min(n_q, C)$ bits per channel-use. This intrinsic rate-loss cannot be reduced by improving the receiver architecture design without the use of additional one-bit ADCs. Considering the receiver architecture in Fig. 1(b), in practice only a limited set of analog operations, illustrated as $f_a(\cdot)$ in the figure, may be implemented. Prior works have studied the use of one-shot analog linear combiners and threshold ADCs [8]-[12]. It has been shown that the maximum rate achievable using the architecture in Fig. 1 is less than $\min(n_q, C)$ due to practical limitations in analog processing [8]. More precisely, the communication system suffers an additional extrinsic rate-loss of $\min(n_q, C) - R^*$ bits per channel-use, where $R^*$ is the maximum achievable rate when these practical limitations are taken into account. In theory, the extrinsic rate-loss may be reduced by improving the receiver architecture design.
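The decomposition above is simple arithmetic, and a short sketch may make it concrete. The function name and the numerical values below are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def rate_loss(n_q, c_full, r_star):
    """Split the rate-loss into intrinsic and extrinsic parts (bits per channel-use).

    n_q    : number of one-bit ADCs
    c_full : capacity C with very high resolution ADCs
    r_star : maximum achievable rate R* of the practical receiver
    """
    upper = min(n_q, c_full)            # upper bound min(n_q, C) on the capacity
    if r_star > upper:
        raise ValueError("R* cannot exceed min(n_q, C)")
    intrinsic = c_full - upper          # loss caused by the limited number of ADCs
    extrinsic = upper - r_star          # loss caused by limited analog processing
    return intrinsic, extrinsic


# Hypothetical example: 4 one-bit ADCs, C = 6.5 bits, R* = 3 bits.
print(rate_loss(n_q=4, c_full=6.5, r_star=3.0))
```

Only the second component depends on the receiver architecture, which is why the rest of the paper focuses on it.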
In this work, we consider communication over MIMO channels where one-bit threshold ADCs are used at the receiver. We propose a blockwise analog processing module in which delay elements are used to reduce the extrinsic rate-loss due to one-bit ADCs. We show that for a large class of MIMO channels, in the high signal to noise ratio (SNR) regime, the proposed architecture completely eliminates the extrinsic rate-loss and achieves the maximum transmission rate among all receiver architectures with a fixed number of one-bit ADCs. We show the existence of a fundamental tradeoff between the number of delay elements and the maximum achievable rate for the proposed architecture. In addition, we show that given a fixed number of one-bit ADCs and delay elements, non-zero thresholds are necessary to achieve optimal transmission rates, whereas using asymptotically large numbers of delay elements leads to optimal rates without requiring non-zero thresholds.
In the receiver architecture proposed in this paper, the ADC thresholds are chosen according to the channel gain matrix and are assumed to be fixed throughout the transmission block. In a companion paper [14], we propose a class of adaptive threshold receiver architectures, where the quantization thresholds at each channel-use are dependent on the channel outputs in the previous channel-uses. While the fixed threshold architecture in this paper is simpler to implement, the adaptive threshold architecture in [14] is more amenable to analysis for point-to-point (PtP) communication in the low SNR regime and multiterminal communications.
The rest of the paper is organized as follows. Section II describes the system model. Section III includes the proposed receiver architecture along with an analysis of the resulting achievable rate region. Section IV concludes the paper.
Notation: The random variable $\mathbb{1}_E$ is the indicator function of the event $E$. The set of numbers $\{1, 2, \cdots, n\}$, $n \in \mathbb{N}$, is represented by $[n]$. For a given $n \in \mathbb{N}$, the $n$-length vector $(x_1, x_2, \ldots, x_n)$ is written as $x^n$. The subvector $(x_k, x_{k+1}, \cdots, x_n)$ is denoted by $x_k^n$. We write $\|x^n\|_2$ to denote the $L_2$-norm of $x^n$. An $n \times m$ matrix is written as $h^{n \times m}$. Let $h^n(i)$, $i \in [m]$, be a sequence of column vectors; the notation $[h^n(1), h^n(2), \cdots, h^n(m)]^{\dagger}$ represents the column vector of length $mn$ consisting of the concatenation of the original vectors. The $n \times n$ identity matrix is shown by $I_n$. We write $a^{n \times m} \otimes b^{n' \times m'}$ to denote the Kronecker product of matrices. The value of $i$ modulo $k$ is represented by $\mathrm{mod}_k(i)$, $i, k \in \mathbb{N}$. The binary entropy function is denoted by $h_b(\cdot)$.
II. System Model and Preliminaries

A. System Model
We consider a PtP communication system characterized by the triple $(n_t, n_r, h^{n_r \times n_t})$, where $n_t$ is the number of transmitter antennas, $n_r$ is the number of receiver antennas, and $h^{n_r \times n_t}$ is the channel gain matrix. The matrix $h^{n_r \times n_t}$ is assumed to be fixed over the transmission block, and known at the transmitter and receiver. The channel input and output vector pair $(X^{n_t}, Y^{n_r})$ is related through

$$Y^{n_r} = h^{n_r \times n_t} X^{n_t} + N^{n_r}, \tag{1}$$

where $N^{n_r}$ is a vector of independent and identically distributed Gaussian variables with unit variance and zero mean, and the channel input has average power constraint $P$. It is assumed that $n_q$ one-bit threshold ADCs are available at the receiver. The receiver uses the architecture shown in Fig. 1(b), which consists of an analog signal processing step prior to quantization and a digital signal processing step afterwards. The channel output is processed in the analog domain and the resulting vector is input to the ADCs. The output of the ADCs is processed in the digital domain to reconstruct the message. In its most general form, the analog processor may have causal memory. More precisely, the output of $f_a(\cdot)$ at time $i$ may depend on the matrix of received channel outputs $Y^{i \times n_r}$, where the $j$th row of $Y^{i \times n_r}$ is the channel output at time $j$, $j \le i$. Let $n \in \mathbb{N}$ be the length of the transmission block and define $G_a = \{ f_a : \mathbb{R}^{n \times n_r} \to \mathbb{R}^{n n_q} \}$ as the space of all functions with causal memory. Due to practical considerations, only a subset of the functions in $G_a$ are implementable. We denote the space of implementable functions by $F_a$. The set of implementable functions $F_a$ which are considered in this paper will be discussed in Section III. The communication problem is formalized below.
Definition 1 (QMIMO). A PtP MIMO system with one-bit ADCs (QMIMO) is characterized by the tuple $(n_t, n_r, h^{n_r \times n_t}, n_q, F_a)$, where $n_q$ is the number of one-bit ADCs, and $F_a \subseteq G_a$. Let $n, \Theta \in \mathbb{N}$ be a pair of natural numbers, $f_a \in F_a$ an implementable analog function and $t^{n n_q} \in \mathbb{R}^{n n_q}$ a vector of quantization thresholds, where $t^{i n_q}_{i n_q - n_q + 1}$, $i \in [n]$, is the threshold vector in the $i$th channel-use. An $(n, \Theta, f_a, t^{n n_q})$-transmission system consists of a pair of encoding and decoding functions $(e, d)$ where $X^{n \times n_t} = e(M)$ is the channel input over $n$ channel-uses
a sequence of one-bit ADCs. Achievability is defined in the standard Shannon sense. The capacity maximized over all implementable analog functions is denoted by $C_Q(h^{n_r \times n_t}, n_q, F_a)$.
In [8], a receiver architecture is considered where the analog processing module consists of linear combiners along with non-zero threshold ADCs. The architecture is shown in Fig. 2. The linear combiner matrix $v^{n_q \times n_r}$ is applied to the received signals at each channel-use. This receiver architecture does not allow for temporal processing of the received signals in the analog domain. More precisely, the set $F_a$ considered in [8] consists of all memoryless and linear analog processors:
where $Y^{(k+1)n}_{kn+1}$, $k \in \{0, 1, \cdots, n_r - 1\}$, is equal to the $k$th row of $Y^{n \times n_r}$. We call this receiver architecture one-shot. The channel capacity using this one-shot architecture maximized over all input distributions, threshold vectors and linear combining matrices is denoted by $C_{OS}(h^{n_r \times n_t}, n_q)$.
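To illustrate the one-shot receiver at a single channel-use, the following is a minimal sketch that passes an input through the linear channel with Gaussian noise, applies a linear combiner and thresholds, and quantizes each entry with a one-bit ADC. It assumes the ADC input is the combiner output plus the threshold vector, as in Section II-B, and it assumes the quantizer convention "output 1 if the input is positive, else 0", which is not fixed by the text here. All function names and the example matrices are hypothetical.

```python
import random


def matvec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def one_bit_adc(w):
    """One-bit quantizer; assumed convention: 1 if the input is positive, else 0."""
    return [1 if w_i > 0 else 0 for w_i in w]


def one_shot_receive(h, v, t, x, noise_std=1.0, rng=None):
    """Simulate one channel-use of the one-shot receiver.

    h : n_r x n_t channel gain matrix
    v : n_q x n_r linear combiner matrix
    t : threshold vector of length n_q
    x : channel input of length n_t
    """
    rng = rng or random.Random(0)
    # Channel output: h x plus i.i.d. Gaussian noise.
    y = [hx_i + rng.gauss(0.0, noise_std) for hx_i in matvec(h, x)]
    # ADC input: combiner output shifted by the thresholds.
    w = [vy_i + t_i for vy_i, t_i in zip(matvec(v, y), t)]
    return one_bit_adc(w)


# Hypothetical noiseless example with n_t = n_r = 2 and n_q = 3.
h = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
t = [0.0, 0.0, -3.0]
print(one_shot_receive(h, v, t, x=[2.0, -1.0], noise_std=0.0))
```

Because the same combiner is applied independently at every channel-use, nothing in this sketch carries information from one channel-use to the next, which is the restriction the blockwise architecture removes.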
It is known that one-shot processing of the analog signals leads to a significant extrinsic rate-loss [8], [10]. In fact, the one-shot capacity is shown to grow at most logarithmically in the number of one-bit ADCs. Consequently, in the high SNR regime, the extrinsic rate-loss due to the application of one-shot receiver architectures is at least $n_q - O(\log n_q)$ and becomes arbitrarily large as the number of ADCs is increased.
In Section III, we introduce blockwise analog processing architectures which use delay elements to allow for temporal processing of the analog signals before quantization. We show in Theorem 1 that this allows us to completely eliminate the extrinsic rate-loss in a large class of MIMO systems in the high SNR regime including MIMO systems where the number of one-bit ADCs is at least twice the number of transmitter antennas and receiver antennas. To analyze the performance of the proposed architectures, we utilize a geometric interpretation introduced in [8] , [10] . The geometric interpretation is especially helpful in analyzing the set of achievable rates in the high SNR regime.
The geometric interpretation and combinatorial background are briefly described in the next subsection.
B. Combinatorial Background
Loosely speaking, as the SNR is increased, the effect of noise in Equation (1) becomes negligible and the channel output is almost equal to $h^{n_r \times n_t} X^{n_t}$. In fact, in the absence of noise, the channel output space is $\mathrm{Im}(h^{n_r \times n_t})$, the image of the channel gain matrix $h^{n_r \times n_t}$. In the following, we describe the relation between partitions of the subspace $\mathrm{Im}(h^{n_r \times n_t})$ and the maximum transmission rate when one-shot receiver architectures are used.
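Consider the one-shot architecture described in Fig. 2. For a given channel output vector $y^{n_r}$, let $j_i = Q(w_i)$, $i \in [n_q]$, where $w^{n_q} = v^{n_q \times n_r} y^{n_r} + t^{n_q}$ is the input vector to the one-bit ADCs. The binary vector $j^{n_q}$ is the vector of ADC outputs. The set of all channel output vectors $y^{n_r}$ which result in the ADC output vector $j^{n_q}$ is

$$B_{j_1, j_2, \cdots, j_{n_q}} = \left\{ y^{n_r} \in \mathrm{Im}(h^{n_r \times n_t}) : Q\big( (v^{n_q \times n_r} y^{n_r} + t^{n_q})_i \big) = j_i, \ i \in [n_q] \right\}.$$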
Fig. 2. A one-shot receiver architecture, where the linear combiner is characterized by the matrix $v^{n_q \times n_r}$, and the ADC thresholds are $t^{n_q} = (t_1, t_2, \cdots, t_{n_q})$.

For a given pair $(t^{n_q}, v^{n_q \times n_r})$, the collection of sets $B(t^{n_q}, v^{n_q \times n_r}) = \{B_{j_1, j_2, \cdots, j_{n_q}} \mid j_i \in \{0, 1\}, i \in [n_q]\}$ is a partition of $\mathrm{Im}(h^{n_r \times n_t})$. The number of non-empty partition elements corresponds to the number of messages which can be transmitted reliably as the SNR is taken to be asymptotically large. Note that for some binary vectors $j^{n_q}$, the set $B_{j_1, j_2, \cdots, j_{n_q}}$ may be empty. For instance, let $n_t = n_r = 1$, $n_q = 2$, $h^{n_r \times n_t} = 1$, $v^{n_q \times n_r} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$, and $t^{n_q} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$. Then $B_{0,1} = \{y \mid Q(y) = 0, Q(y) = 1\} = \emptyset$; similarly, $B_{1,0} = \emptyset$. As a result, the number of non-empty partition elements may be less than $2^{n_q}$. In order to increase the transmission rate, it is desirable to choose $(t^{n_q}, v^{n_q \times n_r})$ such that the number of non-empty partition elements is maximized. We use the following proposition throughout the paper.
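The scalar example above can be checked numerically. The sketch below scans a grid of noiseless scalar channel outputs and counts the distinct ADC output vectors, each of which corresponds to one non-empty partition element. It assumes the quantizer convention "output 1 if the input is positive, else 0"; the function names, the grid range and the non-zero threshold values are hypothetical choices for illustration.

```python
def adc_pattern(y, v, t):
    """ADC output vector for a scalar channel output y (n_r = 1).

    Assumed convention: each one-bit ADC outputs 1 if v_i * y + t_i > 0, else 0.
    """
    return tuple(1 if v_i * y + t_i > 0 else 0 for v_i, t_i in zip(v, t))


def count_cells(v, t, lo=-5.0, hi=5.0, steps=2001):
    """Count distinct ADC output vectors seen on a uniform grid over [lo, hi]."""
    ys = [lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
    return len({adc_pattern(y, v, t) for y in ys})


# The example from the text: both ADCs see the same input, so only
# the outputs (0, 0) and (1, 1) can occur.
print(count_cells(v=[1, 1], t=[0, 0]))

# With distinct non-zero thresholds (hypothetical values), a third cell appears.
print(count_cells(v=[1, 1], t=[1, -1]))
```

A grid scan only finds cells that contain a grid point, so it is a sanity check for small examples rather than a general counting method.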
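Proposition 1. ([15]) The maximum number of non-empty partition elements is given by

$$\sum_{i=0}^{\mathrm{rank}(h^{n_r \times n_t})} \binom{n_q}{i}.$$

Additionally, if the threshold vector is taken to be the all-zero vector, then the maximum number of non-empty partition elements is

$$2 \sum_{i=0}^{\mathrm{rank}(h^{n_r \times n_t}) - 1} \binom{n_q - 1}{i}.$$

The maximum number of non-empty partition regions grows exponentially in $n_q$ since $\log \binom{n}{k} = n h_b\!\left(\frac{k}{n}\right) + O(\log n)$ [13].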
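As a rough numerical companion to Proposition 1, the sketch below evaluates the two region counts and checks the binomial approximation. It assumes the counts are the standard ones for hyperplane arrangements in general position: a sum of binomial coefficients up to the dimension `d` for non-zero thresholds, and twice a sum up to `d - 1` with `n_q - 1` hyperplanes for zero thresholds, where `d` stands for the rank of the channel matrix. Base-2 logarithms are also an assumption here, and the function names are hypothetical.

```python
from math import comb, log2


def max_cells(n_q, d):
    """Regions cut out of a d-dimensional space by n_q hyperplanes in general position."""
    return sum(comb(n_q, i) for i in range(d + 1))


def max_cells_zero_threshold(n_q, d):
    """Same count when every hyperplane passes through the origin (zero thresholds)."""
    return 2 * sum(comb(n_q - 1, i) for i in range(d))


def h_b(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


# Scalar channel with two ADCs, as in the example of Section II-B.
print(max_cells(2, 1), max_cells_zero_threshold(2, 1))

# Gap between log2 C(n, k) and n * h_b(k / n) for n = 1000, k = 300.
n, k = 1000, 300
print(log2(comb(n, k)) - n * h_b(k / n))
```

For the scalar example with two ADCs, the two functions return 3 and 2, which matches the two non-empty cells found above for zero thresholds.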
III. Blockwise Receiver Architectures
We propose blockwise receiver architectures in which delay elements are used to perform blockwise temporal processing of the received signals before quantization. Communication is performed in $n = \ell b$ channel-uses, where $n$, $\ell$, and $b$ are called the blocklength, inner blocklength, and outer blocklength, respectively. The blockwise receiver architecture uses a delay network consisting of $2\ell$ delay elements as shown in Fig. 3, where each delay element $D^{n_r}(\cdot)$ takes the vector of received signals at the $i$th channel-use $Y^{n_r}(i)$ and outputs $Y^{n_r}(i-1)$. In other words, $D^{n_r}(\cdot)$ delays the received analog vector by one channel-use. The stored analog signals are combined using the linear combining matrix $v^{\ell n_q \times \ell n_r}$ over $\ell$ channel-uses.
To clarify the linear combination process, let us describe the first $3\ell$ channel-uses. In the first $\ell$ channel-uses, the received signals $Y^{n_r}(i)$, $i \in [\ell]$, are stored in the delay network. In the next $\ell$ channel-uses, the second batch of received signals $Y^{n_r}(i)$, $\ell + 1 \le i \le 2\ell$, is stored in the delay network while the linear combiner operates on the previously stored signals $Y^{n_r}(i)$, $i \in [\ell]$. More precisely, for the $i$th channel-use where
, $Y^{n_r}(i))$ and $i \ge 2\ell$.
In the third $\ell$ channel-uses, the third batch of received signals $Y^{n_r}(i)$, $2\ell + 1 \le i \le 3\ell$, is stored in the delay network while the linear combiner operates on the previously stored second batch $Y^{n_r}(i)$, $\ell + 1 \le i \le 2\ell$. This process continues for $b$ blocks of length $\ell$ until the $n$th channel-use, where $n$ is the blocklength. The output of the linear combiner is given to the $n_q$ one-bit threshold ADCs. The threshold vector used in the one-bit ADCs changes periodically with a period of $\ell$ channel-uses. More precisely, let $t^{\ell n_q} \in \mathbb{R}^{\ell n_q}$ and define $t^{n_q}(k) = t^{k n_q}_{k n_q - n_q + 1}$, $k \in [\ell]$. For the first $\ell$ channel-uses, the threshold vector $t^{n_q}(i)$ is used in the $i$th channel-use. For the second $\ell$ channel-uses, the threshold vector $t^{n_q}(i - \ell)$ is used in the $i$th channel-use. Generally, for $i \in [n]$, let $k = \mathrm{mod}_{\ell}(i)$; the vector $t^{n_q}(k)$ is used as the threshold vector for the one-bit ADCs at the $i$th channel-use.
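The batching described above can be sketched in a few lines. This is an illustration under stated assumptions rather than the architecture's exact input-output map: the inner blocklength is written `ell`, indices are 0-based, the previously stored batch is stacked in time order, and at position `k` within the current inner block the `n_q` ADCs are assumed to read rows `k*n_q` to `(k+1)*n_q - 1` of the combiner output together with the matching slice of the threshold vector. The quantizer convention (1 if positive, else 0) and all names are hypothetical.

```python
def matvec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def blockwise_adc_outputs(received, v, t, ell, n_q):
    """Blockwise combining of the previously stored batch of ell received vectors.

    received : list of n received vectors, each of length n_r
    v        : (ell * n_q) x (ell * n_r) combining matrix
    t        : threshold vector of length ell * n_q, reused every ell channel-uses
    Returns one ADC output vector per channel-use; the first ell entries are None
    because the delay network is still being filled.
    """
    n = len(received)
    outputs = [None] * n
    for i in range(ell, n):
        prev = i // ell - 1              # index of the previously stored batch
        k = i % ell                      # position inside the current inner block
        stored = [s for y in received[prev * ell:(prev + 1) * ell] for s in y]
        rows = v[k * n_q:(k + 1) * n_q]  # rows assumed to feed the ADCs at position k
        thr = t[k * n_q:(k + 1) * n_q]   # periodic threshold slice
        w = [w_i + t_i for w_i, t_i in zip(matvec(rows, stored), thr)]
        outputs[i] = [1 if w_i > 0 else 0 for w_i in w]
    return outputs


# Hypothetical example: ell = 2, n_r = 1, n_q = 1, identity combiner, zero thresholds.
received = [[1.0], [-2.0], [3.0], [4.0]]
v = [[1.0, 0.0], [0.0, 1.0]]
print(blockwise_adc_outputs(received, v, t=[0.0, 0.0], ell=2, n_q=1))
```

With the identity combiner the sketch simply reproduces the sign of each stored sample one inner block later; a general combining matrix mixes all samples of the stored batch before quantization.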
We call the resulting communication system a D-QMIMO system, where D refers to delay. The set of implementable analog functions for this architecture is:
where $Y^{j n_r}_{(j - 2\ell) n_r + 1} = (Y^{n_r}(j - 2\ell + 1), Y^{n_r}(j - 2\ell + 2), \cdots, Y^{n_r}(j - 1), Y^{n_r}(j))$, where $2\ell \le j \le n$. The channel capacity optimized over all analog combining matrices and threshold vectors is denoted by $C_{\ell}(h^{n_r \times n_t}, n_q)$ for a given delay $\ell$.
Note that D-QMIMO systems are a special class of QMIMO systems where the analog processing is restricted to linear operations. Furthermore, the one-shot setup described in Fig. 2 is a special case of D-QMIMO where the length of each inner block is equal to one (i.e. $\ell = 1$). As a result, $C_{OS}(h^{n_r \times n_t}, n_q) = C_1(h^{n_r \times n_t}, n_q) \le C_{\ell}(h^{n_r \times n_t}, n_q)$, where $\ell \ge 2$.
The bounds in Theorem 1 below are derived by analyzing the performance of the following coding strategy. Consider a D-QMIMO communication system where $(t^{\ell n_q}, v^{\ell n_q \times \ell n_r})$ are taken so that the partition $B(t^{\ell n_q}, v^{\ell n_q \times \ell n_r})$ has the maximum number of non-empty elements as described in Proposition 1.
In other words, $(t^{\ell n_q}, v^{\ell n_q \times \ell n_r})$ are chosen such that the number of non-empty partition elements is equal to $\sum_{i=0}^{\ell\,\mathrm{rank}(h^{n_r \times n_t})} \binom{\ell n_q}{i}$. We use the fact that $\log \binom{n}{k} = n h_b\!\left(\frac{k}{n}\right) + O(\log n)$ [13] and perform a second order analysis of the number of non-empty partition elements as the number of delay elements is increased asymptotically to characterize the set of achievable rates for high SNRs.
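To see how the count of non-empty partition elements translates into a rate, the sketch below evaluates the normalized logarithm of that count for a finite inner blocklength and compares it with the first-order term obtained from the binomial approximation. The inner blocklength is written `ell`, base-2 logarithms are assumed, and the function names and parameter values are hypothetical; the second-order terms analyzed in the paper are not reproduced here.

```python
from math import comb, log2


def h_b(p):
    """Binary entropy function in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)


def high_snr_rate(n_q, rank, ell):
    """(1 / ell) * log2 of the number of non-empty cells, in bits per channel-use."""
    cells = sum(comb(ell * n_q, i) for i in range(ell * rank + 1))
    return log2(cells) / ell


def limiting_rate(n_q, rank):
    """First-order limit n_q * h_b(alpha) with alpha = min(rank / n_q, 1/2)."""
    alpha = min(rank / n_q, 0.5)
    return n_q * h_b(alpha)


# Hypothetical example: n_q = 8 one-bit ADCs and a rank-2 channel.
for ell in (1, 4, 16):
    print(ell, round(high_snr_rate(8, 2, ell), 3))
print("limit", round(limiting_rate(8, 2), 3))
```

In this example the rate increases with `ell` and stays below the limiting value, which is the tradeoff between delay elements and rate that Theorem 1 quantifies.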
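Theorem 1. For the D-QMIMO communication system with $n_q$ one-bit ADCs, the capacity $C_{\ell}$ satisfies the following
as SNR $\to \infty$, where $\alpha = \min\left\{ \frac{\mathrm{rank}(h^{n_r \times n_t})}{n_q}, \frac{1}{2} \right\}$, $\beta = \min\left\{ \frac{n_r}{n_q}, \frac{1}{2} \right\}$. Particularly, if $\mathrm{rank}(h^{n_r \times n_t}) = n_r$, then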
An outline of the proof is provided in the Appendix, where a general coding strategy for arbitrary SNRs is presented. The resulting rate is analyzed in the high SNR regime. The complete proof is given in [16] .
The following observations follow from Theorem 1:
I) The capacity approaches $n_q$ as $\ell \to \infty$ if $n_q \le 2\,\mathrm{rank}(h^{n_r \times n_t})$. Consequently, the extrinsic rate-loss is completely eliminated. This is in contrast with prior works (e.g. [8], [10]), where the high SNR capacity grows logarithmically in $n_q$.
II) The maximum achievable rate when using non-zero threshold ADCs is $\frac{1}{\ell} \log \sum_{i=0}^{\ell\,\mathrm{rank}(h^{n_r \times n_t})} \binom{\ell n_q}{i}$, whereas when zero threshold ADCs are used, the maximum rate is $\frac{1}{\ell} \log \left( 2 \sum_{i=0}^{\ell\,\mathrm{rank}(h^{n_r \times n_t}) - 1} \binom{\ell n_q - 1}{i} \right)$. The two values converge to each other as $\ell \to \infty$. This shows that when long delays can be tolerated, zero threshold ADCs can be used without any loss in transmission rate in scenarios where non-zero thresholds are costly to implement.
III) For a fixed number of transmitters $n_t$ and receivers $n_r$, as the number of one-bit ADCs $n_q$ is increased, the maximum achievable rate increases linearly when $n_q \le 2\,\mathrm{rank}(h^{n_r \times n_t})$ since $h_b(\alpha) = h_b\!\left(\frac{1}{2}\right) = 1$. The maximum achievable rate increases logarithmically when $n_q \gg 2\,\mathrm{rank}(h^{n_r \times n_t})$ since $n_q h_b\!\left( \frac{\mathrm{rank}(h^{n_r \times n_t})}{n_q} \right) = \mathrm{rank}(h^{n_r \times n_t}) (\log n_q - O(\log n_q))$. This is shown in Fig. 4, where for a MIMO system with $n_r = 10$, the achievable rate in Theorem 1 is plotted as a function of $n_q$ for $n_t \in \{2, 4, 6, 8\}$ as the number of delay elements is taken to be asymptotically large.
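Observation II can be explored numerically. The sketch below compares the normalized log-count for non-zero and zero thresholds as the inner blocklength, written `ell`, grows. The zero-threshold count is assumed to be the standard count for hyperplanes through the origin in general position (twice a binomial sum up to the dimension minus one, over one fewer hyperplane); base-2 logarithms, the function names and the parameter values are likewise assumptions made for illustration.

```python
from math import comb, log2


def rate_nonzero(n_q, rank, ell):
    """(1 / ell) * log2 of the maximum cell count with non-zero thresholds."""
    cells = sum(comb(ell * n_q, i) for i in range(ell * rank + 1))
    return log2(cells) / ell


def rate_zero(n_q, rank, ell):
    """(1 / ell) * log2 of the maximum cell count with all-zero thresholds."""
    cells = 2 * sum(comb(ell * n_q - 1, i) for i in range(ell * rank))
    return log2(cells) / ell


def threshold_gap(n_q, rank, ell):
    """Rate given up by restricting the ADCs to zero thresholds."""
    return rate_nonzero(n_q, rank, ell) - rate_zero(n_q, rank, ell)


# Hypothetical example: n_q = 8 one-bit ADCs and a rank-2 channel.
for ell in (1, 8, 32):
    print(ell, round(threshold_gap(8, 2, ell), 4))
```

In this example the gap is more than one bit per channel-use for `ell = 1` and shrinks as `ell` grows, in line with the convergence described in observation II.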
IV. Conclusion
We have considered point-to-point communication over MIMO systems when a limited number of one-bit ADCs are available at the receiver. We have proposed a receiver architecture which uses a sequence of delay elements to allow for blockwise linear combining of the received analog signals. In the high SNR regime, given a fixed number of one-bit ADCs, we have shown that the proposed architecture achieves the maximum transmission rate among all receiver architectures. Furthermore, we have characterized a tradeoff between transmission rate and the number of delay elements which quantifies the increase in maximum achievable rate as the number of delay elements is increased. In a companion paper [14], we propose a class of adaptive threshold architectures and analyze their performance in PtP communications in the low SNR regime and in broadcast channel communications.

Fig. 4. The figure shows the maximum achievable high SNR rate when the number of delay elements is taken to be asymptotically large for the MIMO system with $n_r = 10$ and $n_t \in \{2, 4, 6, 8\}$. The solid red line is the $R = n_q$ line, which is achievable if $n_t$, $n_r$ are asymptotically large. The channel is assumed to be full-rank.
Proof of Theorem 1
To prove the achievability (lower bound on $C_{\ell}$), we describe an outline of the coding strategy for arbitrary SNRs, where the average transmission power constraint is $E(\|X^{n_t}\|_2^2) \le P$. The resulting communication rate is then analyzed as SNR $\to \infty$. Fix $\ell$ and $b$, where $b$ is the outer code blocklength.
Consider the pair $(t^{\ell n_q}, v^{\ell n_q \times \ell n_r})$ which achieves the maximum number of non-empty sets in Proposition 1. For a given rate $R$, define $\Theta_n = 2^{nR}$, where $n = \ell b$. The message $M \in [\Theta_n]$ is transmitted over $b + 1$ transmission blocks, each of $\ell$ symbols, where each symbol is in $\mathbb{R}^{n_t}$. Let $B(t^{\ell n_q}, v^{\ell n_q \times \ell n_r}) = \{B_{j^{\ell n_q}} \mid j_i \in \{0, 1\}, i \in [\ell n_q]\}$ be the partition corresponding to the pair $(v^{\ell n_q \times \ell n_r}, t^{\ell n_q})$ as defined in Section II. Define $J = \{ j^{\ell n_q} \in \{0, 1\}^{\ell n_q} \mid B_{j^{\ell n_q}} \ne \emptyset \}$, and let $\hat{y}^{\ell n_r}_{j^{\ell n_q}} \in B_{j^{\ell n_q}}$, $j^{\ell n_q} \in J$, be a set of representatives for the partition elements. We define the input vector corresponding to $\hat{y}^{\ell n_r}_{j^{\ell n_q}}$ as

$$\hat{x}^{\ell n_t}_{j^{\ell n_q}} = \operatorname*{argmin}_{x^{\ell n_t} :\ \hat{y}^{\ell n_r}_{j^{\ell n_q}} = (h^{n_r \times n_t} \otimes I_{\ell}) x^{\ell n_t}} \|x^{\ell n_t}\|_2, \tag{4}$$

and the cost associated with $\hat{y}^{\ell n_r}_{j^{\ell n_q}}$ as:

$$c(\hat{y}^{\ell n_r}_{j^{\ell n_q}}) = \min_{x^{\ell n_t} :\ \hat{y}^{\ell n_r}_{j^{\ell n_q}} = (h^{n_r \times n_t} \otimes I_{\ell}) x^{\ell n_t}} \|x^{\ell n_t}\|_2.$$

We define the random vector $Z^{\ell n_r}$ through the transition probability $P_{Z^{\ell n_r} | \hat{Y}^{\ell n_r}}$, where

$$P_{Z^{\ell n_r} | \hat{Y}^{\ell n_r}}\big(\hat{y}^{\ell n_r}_{k^{\ell n_q}} \,\big|\, \hat{y}^{\ell n_r}_{j^{\ell n_q}}\big) = P\Big( v^{\ell n_q \times \ell n_r} \big(\hat{y}^{\ell n_r}_{j^{\ell n_q}} + N^{\ell n_r}\big) + t^{\ell n_q} \in B_{k^{\ell n_q}} \Big),$$

where $j^{\ell n_q}, k^{\ell n_q} \in J$. Let $C_{outer}$ be the capacity of the PtP channel with the transition probability $P_{Z^{\ell n_r} | \hat{Y}^{\ell n_r}}$ subject to the average power constraint $E(c(\hat{Y}^{\ell n_r})) \le P$. We first construct a family of capacity achieving codes for this channel using standard random coding methods. Each symbol $\hat{Y}^{\ell n_r}_{J^{\ell n_q}}$ in the randomly generated codewords has alphabet $\mathbb{R}^{\ell n_r}$. In order to transmit the symbol $\hat{Y}^{\ell n_r}_{J^{\ell n_q}}$, the transmitter finds the corresponding input $X^{\ell n_t}$ (Equation (4)) and transmits the vector over $\ell$ channel-uses. As the SNR goes to infinity, the channel $P_{Z^{\ell n_r} | \hat{Y}^{\ell n_r}}$ becomes noiseless. It can be shown that the capacity is equal to $\frac{1}{\ell} H(\hat{Y}^{\ell n_r}) = \frac{1}{\ell} \log \sum_{i=0}^{\ell\,\mathrm{rank}(h^{n_r \times n_t})} \binom{\ell n_q}{i}$. We use the fact that $\log \binom{n}{k} = n h_b\!\left(\frac{k}{n}\right) + O(\log n)$ to show that the expression converges to the achievable rate in Theorem 1. The converse follows by using Fano's inequality along with Proposition 1 and is omitted due to space limitations. The complete proof is given in [16].
