Abstract-Systems currently being developed to operate across wide bandwidths with high sensitivity requirements are limited by the inherent dynamic range of a receiver's analog and mixed-signal components. To increase a receiver's overall linearity, we have developed a digital NonLinear EQualization (NLEQ) processor that is capable of extending a receiver's dynamic range by one to three orders of magnitude. In this paper we describe the NLEQ architecture and present measurements of its performance.
I. INTRODUCTION
In radar systems as well as many other applications (e.g., emerging Ultra-WideBand (UWB) communications), there is a need for a large dynamic range relative to the background noise and all other distortions. The need for dynamic range in radar returns is driven by the ratio of strong signals (such as clutter) to weaker signals (returns from small targets). Systems currently being developed to operate across wide bandwidths with high sensitivity requirements are limited by the inherent dynamic range of the receiver's analog and mixed-signal components. Among these components (e.g., LNA, mixer), the ADC commonly has the lowest dynamic range [1]. A receiver's deviation from its ideal 'linear' performance is commonly characterized by its spurious- and/or intermodulation-free dynamic range (SFDR and IFDR), which is a frequency-domain measurement that determines the minimum signal level that can be distinguished from distortion components. The SFDR and IFDR of an ADC are typically dominated by circuit-based (e.g., buffer amplifier, sample-and-hold) nonlinearities that are distinct from the nonlinear process of ideal quantization, which in principle can be circumvented with processing gain [2]. In this paper we develop an NLEQ processor composed of nonlinear polynomial filters to reduce polynomial distortions generated by the LNA, mixer and ADC. We also address practical problems in identifying an equalization architecture, such as separating the distortions that are induced by the analog signal generator (i.e., the excitation) from those that are generated by the receiver itself.
The uniqueness of NLEQ is that it suppresses wideband receiver nonlinear distortions in a computationally efficient fashion. Existing approaches to achieving computationally efficient polynomial filter architectures for RF compensation, principally developed to mitigate distortions generated by power amplifiers in transmitters, limit the multidimensional signal space over which the architecture can suppress spectral regrowth and in-band spurs [3], [4]. In this paper, we develop a technique to construct a polynomial filter architecture that searches over an unrestricted multidimensional signal space to select polynomial components that yield the highest equalization performance for a given computational complexity. In particular, we develop a coordinate system representation for polynomial filters, and leverage compressed sensing techniques to identify a sparse polynomial representation of an inverse nonlinearity.
This work is sponsored by DARPA under Air Force contract FA8721-05-C-0002. Opinions, interpretations, conclusions and recommendations are those of the author and are not necessarily endorsed by the United States Government.
The rest of this paper is organized as follows. In Section II, we develop a coordinate system representation used to construct an NLEQ architecture, and present an optimization procedure for selecting processing elements in that coordinate system. In Section III, we demonstrate the performance of NLEQ on both Maxim and Analog Devices ADCs, and in Section IV we conclude with a brief summary.
II. NONLINEAR EQUALIZER CONSTRUCTION
To construct an NLEQ architecture, we will first develop a horizontal coordinate system representation in a polynomial basis, and then describe two separate optimization techniques for identifying a computationally efficient NLEQ architecture using components from that basis.
A. Polynomial Basis
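The nonlinear system response of an ADC can often be described with the Pth-order (truncated) polynomial series expansion [5]
$$y(n) = \sum_{p=1}^{P} y_p(n), \qquad y_p(n) = \sum_{m_1=0}^{N_p-1} \cdots \sum_{m_p=0}^{N_p-1} h_p(m_1,\ldots,m_p) \prod_{l=1}^{p} x(n-m_l), \tag{1}$$
where $N_p$ is the memory depth in each dimension of the pth-order Volterra kernel and $h_p(m_1,\ldots,m_p)$ are the pth-order kernel coefficients. The number of non-redundant terms in the pth-order kernel is $\binom{N_p+p-1}{p}$. Using N samples of x(n), the pth-order term of equation (1) can be rewritten in matrix form as
$$\mathbf{y}_p = \mathbf{X}_p \mathbf{h}_p \tag{2}$$
with
$$\mathbf{X}_p = [\mathbf{x}_p(0), \mathbf{x}_p(1), \ldots, \mathbf{x}_p(N-1)]^T,$$
where each of the vectors $\mathbf{x}_p(n)$ collects the non-redundant p-fold products of the delayed input samples $x(n), \ldots, x(n-N_p+1)$, and $\mathbf{h}_p$ stacks the corresponding kernel coefficients.
The full series can now be expressed in matrix form as
$$\mathbf{y} = \mathbf{X}\mathbf{h} \tag{3}$$
with nonlinear convolution matrix
$$\mathbf{X} = [\mathbf{X}_1, \mathbf{X}_2, \ldots, \mathbf{X}_P] \tag{4}$$
and $\mathbf{h} = [\mathbf{h}_1^T, \ldots, \mathbf{h}_P^T]^T$.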
It was shown in [6] that a large class of nonlinear systems can be approximated with arbitrarily small error using the polynomial representation in (1). However, this comes with the disadvantage that a relatively large number of parameters (factorial in N and p) are needed to represent systems with modest polynomial order and memory. To reduce the computational complexity when using (1) to model ADC nonlinearities, we develop an efficient representation of (1) in a new coordinate system. The kernels in equation (1) can be rewritten in terms of elements of a horizontal coordinate system (HCS) in which a pth-order processing element (PE) is formulated as
$$y^{(p)}_{\alpha_2,\ldots,\alpha_p}(n) = \left[\sum_{m=0}^{N_{taps}-1} h_p(m,\alpha_2,\ldots,\alpha_p)\, x(n-m+i_1)\right] \prod_{k=2}^{p} x(n-\alpha_k+i_k), \tag{5}$$
where $i_1$ is used to center the data over the taps of the filter. Hence, it is possible to sum all the pth-order HCS PEs to obtain the pth-order kernel
$$y_p(n) = \sum_{\alpha_2} \cdots \sum_{\alpha_p} y^{(p)}_{\alpha_2,\ldots,\alpha_p}(n), \tag{6}$$
where the variables $i_k$ in (5) are used to center the data over the taps of the multidimensional filter. Equation (5) geometrically corresponds to coefficients selected along a single horizontal (m) dimension while the other dimensions of the pth-order kernel ($\alpha_2$ to $\alpha_p$) remain fixed. The HCS representation has a very appealing interpretation, which is that we can represent (1) as the sum of one-dimensional convolutions multiplied by the product of time-delayed values of the input. Let the data matrix associated with the jth HCS PE of order $p_j$ be defined as $\mathbf{X}_{PE_j}$ with the cth column given by
$$[\mathbf{X}_{PE_j}]_c = \mathbf{x}_{c-i_1} \circ \mathbf{x}_{\alpha_2-i_2} \circ \cdots \circ \mathbf{x}_{\alpha_{p_j}-i_{p_j}}, \qquad \mathbf{x}_d = [x(-d), x(1-d), \ldots, x(N-1-d)]^T, \tag{7}$$
where $c = 0, \ldots, N^{PE_j}_{taps}-1$, $\circ$ represents the Hadamard product, N is the number of samples, $N^{PE_j}_{taps}$ is the number of filter taps in HCS processing element j and $p_j \in \{2, 3, \ldots, P\}$. Using (7), we can formulate an approximation to (1) in which
$$\mathbf{y} \approx \mathbf{X}_{\mathcal{P}} \mathbf{h}_{\mathcal{P}}, \tag{8}$$
where $\mathbf{X}_{\mathcal{P}} = [\mathbf{X}_{PE_1}, \ldots, \mathbf{X}_{PE_K}]$ is the data matrix associated with processing element set $\mathcal{P}$. The construction in (8) both simplifies the computational burden of using sequential estimation in architecture identification, described next, and provides a regular structure for hardware implementation.
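The interpretation of an HCS PE as a one-dimensional convolution multiplied by a product of time-delayed inputs can be illustrated with a minimal NumPy sketch. The helper names (`delayed`, `hcs_pe_matrix`), the zero-padding at the record edges, the choice to apply the centering offset only to the tap dimension, and the example coefficients are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def delayed(x, d):
    """Return out[n] = x[n - d], zero-padded at the edges (assumes |d| < len(x))."""
    out = np.zeros(len(x))
    if d > 0:
        out[d:] = x[:-d]
    elif d < 0:
        out[:d] = x[-d:]
    else:
        out[:] = x
    return out

def hcs_pe_matrix(x, alphas, n_taps, center=0):
    """Data matrix of one HCS PE (hypothetical helper).

    Column c holds x[n - c + center] multiplied sample-by-sample (Hadamard
    product) with the fixed product of delayed inputs x[n - alpha_k].
    """
    fixed = np.ones(len(x))
    for a in alphas:
        fixed *= delayed(x, a)
    cols = [delayed(x, c - center) * fixed for c in range(n_taps)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x = rng.standard_normal(256)

# One third-order PE: 3 taps along m, with alpha_2 = 1 and alpha_3 = 2 held fixed.
X_pe = hcs_pe_matrix(x, alphas=(1, 2), n_taps=3)
h_true = np.array([0.05, -0.02, 0.01])   # illustrative tap values
distortion = X_pe @ h_true               # simulated third-order distortion

# A least-squares fit recovers the PE taps from the simulated distortion.
h_hat, *_ = np.linalg.lstsq(X_pe, distortion, rcond=None)
print(np.round(h_hat, 4))
```

Because each PE contributes only a handful of columns, stacking several such matrices side by side gives a data matrix far smaller than the full nonlinear convolution matrix.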
B. Nonlinear Equalizer Architecture Identification
Using (8), we can derive an equalizer architecture for mitigating harmonic and intermodulation (nonlinear) distortion generated by an ADC, as illustrated in Fig. 1 , where the nonlinear response of the ADC is estimated and subtracted from an appropriately delayed version of the ADC output. The objective of this section is to present techniques that enable the construction of a polynomial equalizer using a minimum set of PEs by efficiently searching over the multidimensional signal space.
1) Forward-Backward Sequential Estimation in Architecture Identification: In this section we develop a sequential algorithm for selecting PEs that minimize the mean square error, ε. Let $\varepsilon(\mathbf{y}; \mathcal{P}) = \min_{\mathbf{h}} \|\mathbf{y} - \mathbf{X}_{\mathcal{P}}\mathbf{h}\|^2$ denote the modeling error, where $\mathbf{X}_{\mathcal{P}} = [\mathbf{X}_{PE_1}, \ldots, \mathbf{X}_{PE_L}]$, $\mathcal{P}$ is the set of PEs in the architecture with cardinality $|\mathcal{P}| = L$, and $\mathbf{y}$ is the vector of ADC output samples. Let $\mathcal{P} \subset \mathcal{X}$, where $\mathcal{X}$ is the set of user-defined candidate processing elements with $|\mathcal{X}| \gg L$; then the pseudo-code for sequentially selecting processing elements to construct an NLEQ architecture is shown in Fig. 2. The total number of processing elements, $|\mathcal{X}|$, can be adjusted according to computational considerations; the PEs that comprise a set are unique in their polynomial order, delay values and filter coefficients. The parameter δ in the outer loop of the pseudo-code is a threshold for the MMSE; alternatively a fixed number of iterations can be used. In either case, the parameters k and m, where m ≤ k, are user defined and can change from iteration to iteration.

    initialize P = ∅
    while ε(y; P) ≥ δ {
        // Add Processing Elements
        while |P| ≤ k {
            p ← arg min_{p ∈ X} ε(y; P ∪ {p})
            P ← P ∪ {p},  X ← X \ {p}
        };
        // Remove Processing Elements
        while |P| ≥ m {
            p ← arg min_{p ∈ P} ε(y; P \ {p})
            P ← P \ {p},  X ← X ∪ {p}
        };
    };

Fig. 2. Pseudo-code for forward-backward sequential estimation. Each new PE j from the set X that yields the best minimum mean square error (MMSE) performance in combination with the previous j − 1 PEs is added to the set P in the forward stage. In the backward stage, PEs are removed one at a time from P and placed back into X such that the PE removed has the least impact on MMSE performance.
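The forward-backward procedure can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function names, the dictionary of candidate data matrices, the `max_iter` safeguard, and the loop bounds (grow to exactly k PEs, then prune to exactly m) are assumptions chosen so the example terminates cleanly.

```python
import numpy as np

def modeling_error(y, blocks):
    """Mean square error of the least-squares fit of y onto the stacked PE matrices."""
    if not blocks:
        return float(np.mean(y ** 2))
    X = np.hstack(blocks)
    h, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((y - X @ h) ** 2))

def forward_backward(y, candidates, k, m, delta, max_iter=10):
    """Select PE names from `candidates` (name -> data matrix)."""
    selected, pool = [], list(candidates)

    def err(names):
        return modeling_error(y, [candidates[n] for n in names])

    for _ in range(max_iter):
        if err(selected) < delta:
            break
        # Forward stage: add the PE that lowers the error the most.
        while len(selected) < k and pool:
            best = min(pool, key=lambda n: err(selected + [n]))
            selected.append(best)
            pool.remove(best)
        # Backward stage: drop the PE whose removal hurts the least.
        while len(selected) > m:
            drop = min(selected, key=lambda n: err([s for s in selected if s != n]))
            selected.remove(drop)
            pool.append(drop)
    return selected

rng = np.random.default_rng(0)
N = 200
# Six hypothetical candidate PEs with two taps each; only pe1 and pe4 generate y.
candidates = {f"pe{j}": rng.standard_normal((N, 2)) for j in range(6)}
y = candidates["pe1"] @ np.array([1.0, -0.5]) + candidates["pe4"] @ np.array([0.3, 0.2])

selected = forward_backward(y, candidates, k=3, m=2, delta=1e-12)
print(sorted(selected))
```

In this toy case the forward stage over-selects by one PE and the backward stage removes it again, which is the behaviour the two-stage search is designed to provide.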
2) Global Estimation in Architecture Identification: An alternative to forward-backward sequential estimation is to formulate the problem of NLEQ architecture identification as a constrained optimization, in which the constraints are imposed to ensure a computationally efficient solution. In [7], basis pursuit was used to find a sparse set of coefficients in a linear system identification problem. However, unlike linear systems, the size of the convolution matrix X(x) representing the nonlinear combinations of the input data can grow prohibitively large. One approach to reducing the dimensionality leverages the following theorem:
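Theorem 1: Let $X \in \mathbb{R}^{N \times M}$, with $N \gg M$, be a full-column-rank matrix whose columns correspond to the p-fold products of the input, and let y correspond to some vector in $\mathbb{R}^N$. Then there exists a projection matrix $P \in \mathbb{R}^{M \times N}$ with $\tilde{X} = PX \in \mathbb{R}^{M \times M}$, and hence an orthogonal projection matrix $XX^{\dagger}$, such that
$$XX^{\dagger}y = P^{T}\tilde{X}\tilde{X}^{\dagger}\tilde{y},$$
where $\tilde{y} = Py$.
Proof: Consider a matrix X with full column rank having singular value decomposition
$$X = U \Sigma V^T$$
with $U \in \mathbb{R}^{N \times M}$, and $N \gg M$. Further, consider the solutions to the least squares problems $h = \arg\min_h \|y - Xh\|$ and $h^* = \arg\min_h \|Py - PXh\|$, where P is an M × N matrix that projects any N-dimensional vector v down to an M-dimensional space. Choosing $P = U^T$ gives $PX = \Sigma V^T$, which is square and invertible, so $h^* = V\Sigma^{-1}U^T y = X^{\dagger}y = h$.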
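The claim that projecting with the left singular vectors loses nothing can be checked numerically. The sketch below assumes a random full-column-rank matrix standing in for the nonlinear convolution matrix; the sizes and variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 500, 8
X = rng.standard_normal((N, M))    # stand-in for a tall, full-column-rank data matrix
y = rng.standard_normal(N)

# Thin SVD: U is N x M, so P = U^T maps R^N down to R^M.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P = U.T

# Least-squares solution in the original N-dimensional space.
h_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Solution of the reduced M x M system (P X) h = P y.
h_reduced = np.linalg.solve(P @ X, P @ y)

print(np.allclose(h_full, h_reduced))
```

The reduced system is only M × M, but forming P requires the SVD of the full matrix, which is the cost the following paragraphs seek to avoid.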
It is possible to use the left singular vectors to reduce the dimensionality without any loss in performance; however, because of the computational cost of computing the singular value decomposition (SVD) of a large matrix, and because Theorem 1 is only valid for N ≥ M, this approach is of very little practical value. We propose two techniques for reducing the dimensionality in a computationally efficient fashion. To reduce the dimensionality of the rows, we leverage the following lemma [8].
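Lemma 1 (Johnson-Lindenstrauss): For any $0 < \epsilon < 1$ and any integer n, let K be a positive integer such that $K \ge 4(\epsilon^2/2 - \epsilon^3/3)^{-1} \ln n$. Then for any set V of n points in $\mathbb{R}^N$, there is a map $f: \mathbb{R}^N \to \mathbb{R}^K$ such that for all $u, v \in V$,
$$(1-\epsilon)\|u-v\|^2 \le \|f(u)-f(v)\|^2 \le (1+\epsilon)\|u-v\|^2.$$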
Extensions to the Johnson-Lindenstrauss lemma [9] have used concentration of measure theory to show that if points in a vector space are projected onto a randomly selected subspace of suitably high dimension, then the distances between the points (in a Euclidean sense) are approximately preserved. Therefore, we can construct a matrix $\Phi \in \mathbb{R}^{K \times N}$ whose elements are randomly drawn from a zero-mean Gaussian distribution, so that the projected matrix $\Phi X \in \mathbb{R}^{K \times M}$, where K < N, does not impart a significant error during architecture identification.
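A small numerical experiment illustrates how a random Gaussian projection approximately preserves pairwise distances. The entry variance of 1/K, the problem sizes, and the value of the distortion parameter are assumptions made for this sketch and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
N, n_points, eps = 2000, 20, 0.5

# Row dimension from the bound in the Johnson-Lindenstrauss lemma.
K = int(np.ceil(4 * np.log(n_points) / (eps**2 / 2 - eps**3 / 3)))

V = rng.standard_normal((n_points, N))            # n points in R^N (one per row)
Phi = rng.standard_normal((K, N)) / np.sqrt(K)    # assumed entry variance of 1/K
W = V @ Phi.T                                     # projected points in R^K

def pairwise_sq_dists(A):
    """Squared Euclidean distances for all distinct pairs of rows."""
    diff = A[:, None, :] - A[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    return d2[np.triu_indices(len(A), k=1)]

# Ratios near 1 mean the projection kept the geometry of the point set.
ratios = pairwise_sq_dists(W) / pairwise_sq_dists(V)
print(K, ratios.min(), ratios.max())
```

Here the row dimension drops from 2000 to K while the distance ratios stay clustered around one, which is the property that makes the projected least-squares problem a usable proxy for the original.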
To further reduce the size of the matrix X, its columns can be pruned by projection and subtraction. First, the M columns of ΦX are unit normalized so that each column can be considered as the coordinate of a point on the surface of a unit hypersphere. From the set of M vectors, L vectors are selected one at a time, such that the mth vector chosen has the highest correlation with $\Phi y^{(m-1)}$ after the m − 1 projections of the previously selected columns have been subtracted off, that is
$$\Phi y^{(m)} = \Phi y^{(m-1)} - \langle (\Phi X)_{i(m)}, \Phi y^{(m-1)} \rangle\, (\Phi X)_{i(m)}, \qquad i(m) = \arg\max_i \left| \langle (\Phi X)_i, \Phi y^{(m-1)} \rangle \right|, \tag{10}$$
with $\Phi y^{(0)} = \Phi y$.
In (10), the symbol $(\Phi X)_{i(m)}$ is the mth column of ΦX that has the highest correlation with $\Phi y^{(m-1)}$, whose column index is given by i(m), and $\langle x, y \rangle$ is the dot product of x and y. The goal of projection and subtraction is to find the points on the unit hypersphere with widest angular separation that have a significant projected component on the received data. We use projection and subtraction to reduce the dimensionality of the data so that the subsequent constrained optimization is computationally tractable. Defining the matrix $\Theta \in \mathbb{R}^{M \times L}$ such that XΘ keeps only the columns of X we get from projection and subtraction in (10), the constrained optimization to find a sparse NLEQ architecture is given by
$$(\hat{h}_+, \hat{h}_-) = \arg\min_{h_+, h_-} e^T (h_+ + h_-) \quad \text{subject to} \quad \|\Phi y - \Phi X \Theta (h_+ - h_-)\|_2 \le \epsilon, \quad h_+ \ge 0, \quad h_- \ge 0, \tag{11}$$
where $e = [1\ 1\ \cdots\ 1]^T$ and $h_{opt} = \hat{h}_+ - \hat{h}_-$. Equation (11) is easily solved using second order cone programming (SOCP) [10]. The basis pursuit (BP) cost function, along with the constraint that h ≥ 0, in (11) is an $\ell_1$ norm that favors sparse solutions [7]. The scalar regularization parameter $\epsilon$ in the constraint balances the residual $\ell_2$ reconstruction error, that is, it ensures that the sparse solution $h_{opt}$ whose nonzero entries span the columns of ΦXΘ is in the cone of feasible solutions. Note that unlike forward-backward sequential estimation, the columns selected (non-zero entries of $h_{opt}$) correspond to single tap processing elements, where individual PEs are not connected with any specific coordinate system (e.g., horizontal, etc.).
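The column-pruning step (projection and subtraction) can be sketched as a greedy loop over unit-normalized columns. The function name, the rule that a column is never selected twice, and the synthetic data are assumptions for illustration; the subsequent basis pursuit stage would require an SOCP solver and is not shown.

```python
import numpy as np

def project_and_subtract(A, b, L):
    """Greedily pick L column indices of A by projection and subtraction."""
    A = A / np.linalg.norm(A, axis=0)         # unit-normalize the columns
    r = np.asarray(b, dtype=float).copy()     # running residual of the data
    chosen = []
    for _ in range(L):
        corr = A.T @ r                        # correlation with every column
        if chosen:
            corr[chosen] = 0.0                # never reselect a column
        i = int(np.argmax(np.abs(corr)))
        r = r - corr[i] * A[:, i]             # subtract the projection onto column i
        chosen.append(i)
    return chosen

rng = np.random.default_rng(2)
N, M, K = 400, 30, 120
X = rng.standard_normal((N, M))               # stand-in for the nonlinear data matrix
y = 2.0 * X[:, 3] - 1.5 * X[:, 17]            # only columns 3 and 17 are active

Phi = rng.standard_normal((K, N)) / np.sqrt(K)   # random row-reducing projection
chosen = project_and_subtract(Phi @ X, Phi @ y, L=4)
print(chosen)
```

The retained indices define which columns of the data matrix are passed on to the constrained optimization, so the later solve operates on L columns instead of M.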
III. MEASURED RESULTS

A. Test Setup
To evaluate NLEQ performance, we used the MIT Lincoln Laboratory NLEQ testbed depicted in Fig. 3. The analog outputs of three Agilent E8257D tone generators were combined, filtered, and injected into an ADC that was seated in a temperature-controlled chamber set to 20 °C. A Windows PC running MATLAB scheduled the tone generators, controlled the Agilent 16702B logic analyzer that was used to capture data at the output of the ADC, and transferred data from the logic analyzer's memory back to the PC's hard drive. We present NLEQ performance results using an 8-bit and a 14-bit ADC: the Maxim MAX108, sampling at a rate of 1500 MS/s, and the Analog Devices AD6645, sampling at a rate of 105 MS/s.
B. Training and Verification
We trained both devices with a series of one-, two-, and three-tone sets. We spaced the tones across a 40 MHz band of interest for the AD6645, and a 500 MHz band for the MAX108. The number and type of tone sets used to excite the linear and nonlinear modalities in the ADCs are listed in Table I. In all cases, the NLEQ architecture's computational complexity was constrained so that it could be efficiently implemented in hardware.
After identifying an NLEQ architecture with the techniques presented in Section II-B.1, we cross-validated NLEQ performance using a sequence of verification tone sets. These one-, two-, and three-tone verification sets were entirely different from the training sets used to derive the coefficients. We quantify dynamic range by taking the mean of the individual SFDR/IFDR measurements for each of the verification tone sets. This is the mean dynamic range (MDR) performance metric, which we measure in dBFS.
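As an illustration of the MDR metric, the sketch below measures the largest non-tone spectral peak for each capture and averages the results. The helper name, the coherent-sampling assumption (tones on exact FFT bins, no window), the full-scale amplitude of 1.0, and the synthetic spur levels are all assumptions for this example and do not reflect the measured data.

```python
import numpy as np

def spur_free_range_dbfs(x, tone_bins, full_scale=1.0, guard=1):
    """Distance in dB from full scale down to the largest non-tone spectral peak."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x)) / (n / 2)      # amplitude of a coherently sampled tone
    spec_db = 20 * np.log10(np.maximum(spec, 1e-12) / full_scale)
    mask = np.ones(len(spec), dtype=bool)
    mask[0] = False                              # ignore DC
    for b in tone_bins:                          # ignore the test tones themselves
        mask[max(b - guard, 0): b + guard + 1] = False
    return -float(np.max(spec_db[mask]))

n = 1024
t = np.arange(n)

def tone(amplitude, k):
    """Cosine that completes exactly k cycles in the n-sample record."""
    return amplitude * np.cos(2 * np.pi * k * t / n)

# Two synthetic one-tone verification captures, each with a single spur.
captures = [
    (tone(0.5, 100) + tone(5e-4, 300), [100]),
    (tone(0.5, 120) + tone(1e-4, 240), [120]),
]
values = [spur_free_range_dbfs(x, bins) for x, bins in captures]
mdr = float(np.mean(values))
print(values, mdr)
```

Averaging across many verification tone sets keeps the metric from being dominated by a single favorable or unfavorable tone placement.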
The individual tone generators did not impart intermodulation distortion; however, each tone generator did impart harmonic distortion. This makes it difficult to separate the distortions generated by the RF receiver from those of the excitation used during training. By operating in the second Nyquist zone (IF sampling), harmonic distortions fell out of band and were filtered off by the anti-aliasing filter preceding the ADC. Alternatively, if first Nyquist zone (baseband) sampling were required, ignoring harmonic distortions in training and instead spacing two tones very close to one another to mimic harmonic distortion yielded satisfactory results.
C. Performance
Table II and Figures 4 and 5 illustrate the measured performance of NLEQ operating on data from the Maxim MAX108 8-bit 1.5 GSps ADC and the Analog Devices AD6645 14-bit 105 MSps ADC. The MDR improvement of the MAX108 after NLEQ is roughly 22 dB, with an equalizer complexity of 217 operations per sample. The equalizer was constructed using forward/backward sequential estimation, which selected 8 HCS PEs on the forward pass. Backward iterations achieved only marginal improvement (< 1 dB). Basis pursuit yielded the same performance/computational complexity results as forward sequential estimation, indicating the robustness of the forward sequential estimation process in selecting HCS PEs. The MDR improvement of the AD6645 is only 10 dB; however, as is evident from Fig. 5, the distortions are virtually pushed down to the noise floor. The NLEQ architecture for the AD6645 required 9 HCS processing elements.
IV. SUMMARY
In this paper we developed a computationally efficient architecture for nonlinear equalization (NLEQ) in a new coordinate system to improve the dynamic range of RF receivers. We demonstrated a dynamic range improvement of over two orders of magnitude on the 8-bit, 1.5 GSps Maxim MAX108, and an order of magnitude improvement in the dynamic range of the 14-bit, 105 MSps Analog Devices AD6645. In both cases, the computational complexity of NLEQ was on the order of 200 operations per sample. We also demonstrated that NLEQ was capable of significantly improving the linearity of a device (AD6645) that already started out with a very high (90 dB) dynamic range.
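As a rough check of the order-of-magnitude statements, assuming the MDR improvements reported in Section III (roughly 22 dB for the MAX108 and 10 dB for the AD6645) are read as power ratios:

$$10^{22/10} \approx 158 > 10^{2}, \qquad 10^{10/10} = 10.$$

Under that assumption, 22 dB corresponds to slightly more than two orders of magnitude and 10 dB to exactly one.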
