An adaptive channel shortening equalizer design for multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) radio receivers is considered in this presentation. The proposed receiver has desirable features for cognitive and software defined radio implementations. It consists of two sections: a MIMO decision feedback equalizer (MIMO-DFE) and adaptive multiple Viterbi detection. In the MIMO-DFE section, a complete modified Gram-Schmidt orthogonalization of multichannel input data is accomplished using sequential processing multichannel Givens lattice stages, so that a Vertical Bell Laboratories Layered Space Time (V-BLAST) type MIMO-DFE is realized at the front-end section of the channel shortening equalizer. Matrix operations, a major bottleneck for receiver operations, are accordingly avoided, and only scalar operations are used. The result is a highly modular and regular radio receiver architecture whose structure is suitable for digital signal processing (DSP) chip and field programmable gate array (FPGA) implementations, which are important for software defined radio realizations. The MIMO-DFE section of the proposed receiver can also be reconfigured for spectrum sensing and positioning functions, which are important tasks for cognitive radio applications. In the adaptive multiple Viterbi detection section, a systolic array implementation is performed for each channel, so that a receiver architecture with high computational concurrency is attained. The total computational complexity is given in terms of equalizer and desired response filter lengths, alphabet size, and number of antennas. The performance of the proposed receiver is presented for the two-channel case by means of mean squared error (MSE) and probability of error evaluations, which are conducted for time-invariant and time-variant channel conditions, orthogonal and nonorthogonal transmissions, and two different modulation schemes.
Introduction
The fundamental problem in the design of future wireless communication systems is to reliably and efficiently transmit and receive information signals over imperfect channels at substantially high data rates. One successful approach adopted in several wireless standards such as digital audio broadcasting (DAB), digital video broadcasting (DVB-T), local area networking (LAN), and metropolitan area networking (MAN) is orthogonal frequency division multiplexing (OFDM), in which the entire bandwidth is divided into several narrow subbands so that the frequency response over each individual subband is relatively flat, and each subband channel occupies only a small fraction of the original bandwidth. Nevertheless, OFDM-based wireless communication systems can [...] receivers are reduction of complexity, i.e., avoidance of matrix inversions, and extension to broadband implementation. Consequently, recent research such as [6] [7] [8] has focused on reducing the complexity of V-BLAST receiver architectures for frequency selective channels by developing efficient matrix inversion operations.
An important problem in realizing OFDM system designs, however, is the appendage of a cyclic prefix (CP), with a length at least equal to the channel length, to each block of N IFFT coefficients. This approach may not be adequate when the length of the CP, ξ, is large relative to the data length, N, because the channel throughput is reduced by a factor of N/(N + ξ). In addition, information about the channel length may not even be available in some cases. Accordingly, it is desired to design systems that guarantee a certain amount of throughput, i.e., N/(N + ξ) ≅ 1, in all possible channel conditions. An elegant solution to this problem is to implement a time domain equalizer to shorten the channel memory and hence reduce the CP overhead [9].
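To make the CP overhead concrete, the throughput factor N/(N + ξ) can be evaluated for a few prefix lengths. The following is a minimal sketch; the function name `cp_throughput` and the numerical values are illustrative assumptions, not parameters taken from this paper.

```python
def cp_throughput(n_data: int, cp_len: int) -> float:
    """Fraction of transmitted samples that carry data: N / (N + xi)."""
    if n_data <= 0 or cp_len < 0:
        raise ValueError("n_data must be positive and cp_len non-negative")
    return n_data / (n_data + cp_len)


if __name__ == "__main__":
    n_data = 64  # hypothetical block length N
    for cp_len in (16, 8, 4):  # hypothetical CP lengths xi
        factor = cp_throughput(n_data, cp_len)
        print(f"xi = {cp_len:2d}: throughput factor = {factor:.3f}")
```

A shorter effective channel memory permits a shorter CP, which moves the factor closer to 1; this is the motivation for the time domain equalizer.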
A recent development in the design of next-generation wireless communication systems is the cognitive radio, built on a software radio, which functions as an intelligent system that is aware of its environment and uses the methodology of understanding-by-building to learn from the environment and adapt to statistical variations in the input stimuli in order to establish reliable communication by efficient utilization of the radio spectrum [10]. The concept of software radio, on the other hand, relies on the development of DSP technology that is flexible, reconfigurable, and reprogrammable by software to adapt to an environment where there are multiple services, standards, and frequency bands [11]. Correspondingly, the infrastructure in a software radio system is generally required to use reconfigurable VLSI hardware components such as DSP chip sets [12], FPGAs [13], embedded processors [14], and even general purpose processors [15].
A typical cognitive radio cycle includes spectrum sensing, analysis, reasoning, and adaptation to new operating parameter steps [16]. The cognitive radio can detect the availability of a portion of a frequency band through the spectrum sensing and analysis steps [17]. During the reasoning step, it determines the optimum operating parameters, so that its transmission generates no harmful interference to other users of the spectrum. In the adaptation step, the radio switches to transmission and reception mode using its reconfigurability and reprogrammability property and tunes its operating parameters according to its best response strategy. Another emerging requirement for cognitive radios is location and environment awareness, which involves modeling the capabilities of human beings and bats for realization of advanced and autonomous location and environment awareness features [18]. Adaptive positioning, determining the coordinates of a cognitive radio in space, is a step towards realization of accurate location awareness in cognitive radios. The author has recently proposed a spectrum estimation method in [19] and a range estimation method in [20] that are suitable for spectrum sensing and positioning functions of cognitive radios, respectively. In this paper, we focus on the reception mode of operation of cognitive MIMO-OFDM radios and propose a new minimum MSE channel shortening equalizer design, which consists of adaptive front-end MIMO-DFE and multiple Viterbi detection sections.
The optimum solution for the channel shortening equalization problem can be found using one of these constraints: (1) unit tap constraint and (2) unit energy constraint; and the performances under these two criteria were compared for single input single output (SISO) and MIMO channel shortening equalization in [21, 22] , respectively. Since the findings in these papers show that the unit energy constrained channel shortener equalization resulted in better performance, we have used unit energy constraint for the MIMO channel shortening optimization problem under consideration in this paper. Accordingly, the contributions of the paper can be stated as follows: (1) the proposed equalizer has a front-end MIMO-DFE as opposed to the MIMO feed forward equalizer (MIMO-FFE) in [22] , (2) a modified version of sequential processing multichannel lattice stages (SPMLSs) [23] is utilized in the design of front-end MIMO-DFE and a complete modified Gram-Schmidt orthogonalization of multichannel input data, which avoids matrix inversions, enables scalar only operations and contributes to the flexibility, reconfigurability, and reprogrammability of the receiver, is attained, (3) the proposed equalizer can be viewed as a V-BLAST receiver for frequency selective channels, (4) spectrum sensing or range estimation can be accomplished at no cost by simply reconfiguring the front-end MIMO-DFE as multichannel spectral analysis or positioning filter as shown in [19, 20] , respectively, and (5) a detailed computational complexity and performance analysis is presented. The first contribution is important from the perspective of interference removal and, by means of that, error performance, whereas the second one is considered the key since matrix inversion is a major bottleneck in the design of embedded receiver architectures that increases computational complexity [24] . 
The third one is relevant since the receiver operations of a V-BLAST system can be considered as performing Gram-Schmidt orthogonalization [25] , whereby inter-symbol interference (ISI) as well as interchannel interference (ICI) effects are suppressed. The fourth contribution is crucial from the point of view of cognitive radio operation cycle, so that the filter structure of MIMO-DFE can be reused for spectrum sensing and range estimation functions of cognitive radio, and finally, a comparative computational complexity and performance analysis of the proposed equalizer with respect to the other benchmark equalizers such as MIMO-DFE, MIMO-FFE, and multichannel Viterbi equalizer (VE), has been provided, which to the best of the author's knowledge, does not exist in the literature.
Various MIMO-DFEs for MIMO ISI channels have been proposed in the literature [26] [27] [28] after the introduction of the finite length MIMO-DFE in [29], and it has been delineated by Ginis and Cioffi in [30] that DFE is the basic principle behind the BLAST detection algorithm. Very recently, a QR decomposition-based MIMO-DFE has been presented by Wang et al. in [31]. In QR decomposition approaches, the Q matrix is implicitly formed and then used to compute the R matrix, whereas in the Gram-Schmidt approach, the inverse of R is implicitly formed and then used to compute the Q matrix. As a consequence of this fact, Regalia and Bellanger showed in [32] that there exists a duality between QR and lattice methods and that elements of both approaches can be combined to obtain new hybrid algorithms. With respect to developing these hybrid algorithms, Ling showed in [33] that an orthogonal Givens rotation-based algorithm algebraically coincides with the recursive modified Gram-Schmidt-based lattice algorithm in [34]. In accordance with this perspective, we modify the SPMLS by applying the Givens rotation-based lattice algorithms of [33] to its structure, so that a sequential processing multichannel Givens lattice stage (SPMGLS) is obtained. SPMLSs are known for their modular, order-recursive, and regular structure, and we previously used them in the decision feedback equalization of nonlinear communication channels in [35]. Additionally, the use of Givens rotation-based lattice algorithms brings good numerical properties.
Subsequent to the Givens lattice realization of the front-end MIMO-DFE, we perform a systolic array implementation of multiple adaptive Viterbi detectors [36], whereby a highly concurrent receiver structure is obtained. A two-channel (2 × 2) problem is considered in this presentation due to the ease of explanation and space limitations in developing the method. However, it is considered straightforward to apply the method to any number of channels, i.e., to massive MIMO implementations for next-generation wireless systems [37], at the expense of increased complexity. Even though complete orthogonality, and thereby the suppression of ISI and ICI, is accomplished in the minimum mean square error sense for any number of transmit and receive antennas, the performance achieved in terms of MSE as well as probability of error will depend on how ill-conditioned the channel is.
The organization of this paper is as follows. In Section 2, the adaptive multichannel channel shortening equalization optimization problem is introduced. In Section 3, we describe the adaptive multiple Viterbi detection section of the proposed equalizer. The computational complexity computations are treated in Section 4. In Section 5, we present the experimental results, and finally, Section 6 is concerned with the discussion of results and conclusions. $(\cdot)^*$ represents the complex conjugate of $(\cdot)$. $(\cdot)^T$ and $(\cdot)^H$ stand for the transpose and the Hermitian transpose of $(\cdot)$, respectively. The variables m, i, and n are global while all other variables are local. The variable m represents the stage number while i and n are the time indexes related to data and coefficients, respectively, until we equate them in Section 3 to have a single time index.
Optimization problem statement
We consider the discrete-time baseband equivalent 2 × 2 channel shortening equalization problem depicted in Figure 1 , where the number of transmitters (M T ) and receivers (M R ) is assumed equal, so that the number of
Note that the kth baseband channel in Figure 1 models the effects of serial-to-parallel (S/P) and parallel-to-serial (P/S) conversions, the addition and removal of CP, the IDFT and DFT operations, and the physical baseband channel itself as delineated in Figure 2 . Accordingly, the input signal to the kth receiver can be expressed as the sum of transmitted signals corrupted by ISI, ICI, and the noise:
in which $h_{\ell,k}(n)$ is the impulse response of the channel between the ℓth transmitter and the kth receiver, whereas $x_\ell(n)$ is the transmitted sequence, and $\mu_{\ell,k}$ is the memory of channel $h_{\ell,k}(n)$. Accordingly, $h_{\ell,k}(n)$ for $\ell \neq k$ constitute ICI, and the elements of $h_{\ell,k}(n)$ for $\ell = k$ and $\mu_{\ell,k} \neq 0$ amount to ISI. $u_k(n)$ denotes the kth channel noise.
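The signal model described above, in which each receiver observes the sum of all transmitted sequences convolved with their respective channels plus noise, can be sketched numerically. This is a minimal illustration under assumptions: the function name `received_signals`, the nested-list layout `h[l][k]`, and the circular complex Gaussian noise are choices made here, not details given in the paper.

```python
import numpy as np


def received_signals(x, h, noise_std=0.0, rng=None):
    """Assumed model: y_k(n) = sum_l sum_j h[l][k][j] * x_l(n - j) + u_k(n).

    x : array of shape (n_tx, n_samples), transmitted sequences.
    h : nested list, h[l][k] is the impulse response from transmitter l
        to receiver k (responses may have different lengths).
    """
    x = np.asarray(x, dtype=complex)
    n_tx, n_samples = x.shape
    n_rx = len(h[0])
    y = np.zeros((n_rx, n_samples), dtype=complex)
    for k in range(n_rx):
        for l in range(n_tx):
            # l == k contributes the direct path (ISI if memory > 0),
            # l != k contributes interchannel interference (ICI).
            y[k] += np.convolve(x[l], h[l][k])[:n_samples]
    if noise_std > 0:
        rng = np.random.default_rng() if rng is None else rng
        noise = rng.standard_normal(y.shape) + 1j * rng.standard_normal(y.shape)
        y += (noise_std / np.sqrt(2)) * noise
    return y
```

Samples before time zero are taken as zero here, which is a simplification of the block-based OFDM processing shown in Figure 2.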
In the adaptive two-channel channel shortening equalizer design problem, the objective is to find an exponentially windowed, least squares (LS) solution for the coefficients of the kth adaptive DFE and the corresponding kth adaptive desired impulse response (ADIR) filter that minimizes the kth cost function:
at each time instant n, where k = 1, 2 and β is the exponential weighting factor. Herein, the error signal, $e_k^n(i)$, is given by:
The kth desired signal, $d_k^n(i)$, at the output of the kth ADIR is expressed as:
whereas the estimate of the kth desired signal, $\hat{d}_k^n(i)$, in training mode of operation, is equal to the output signal of the kth DFE:
and it is identical to the delayed output signal of the kth 
, while it is equal to the kth detected signal in decision directed mode of operation of the receiver ($\tilde{x}_k(i) = \hat{x}_k(i)$). Herein, $D_k$ represents the delay experienced by the kth input signal, $x_k(i)$, through the corresponding channel during the training mode of operation.
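The quantities introduced in this paragraph can be collected in one place. The following is a sketch in assumed notation: the Hermitian-transpose convention, the lower summation limit, and the vector symbols $\tilde{\mathbf{x}}_k(i)$ (ADIR input vector), $\mathbf{y}_k(i)$ (DFE input vector), $\mathbf{w}_k(n)$, and $\mathbf{p}_k(n)$ are chosen to agree with the surrounding definitions and are not a verbatim reproduction of the original equations.

```latex
\begin{align}
J_k(n) &= \sum_{i=0}^{n} \beta^{\,n-i}\,\bigl|e_k^n(i)\bigr|^2, \\
e_k^n(i) &= d_k^n(i) - \hat{d}_k^n(i), \\
d_k^n(i) &= \mathbf{w}_k^H(n)\,\tilde{\mathbf{x}}_k(i), \\
\hat{d}_k^n(i) &= z_k^n(i) = \mathbf{p}_k^H(n)\,\mathbf{y}_k(i).
\end{align}
```

Under these assumptions, the cost is an exponentially weighted sum of squared differences between the ADIR output and the DFE output, with older samples discounted by powers of β.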
Subsequently, we define the input vector to the kth DFE, y k (i), at time instant i, as:
and the corresponding coefficient vector, p k (n), at time instant n, as:
The input vector to the kth ADIR filter at time instant i and its coefficient vector at the time instant n are also defined as:
Note that we assume, without loss of generality, that $N_{b_k} \le N_{f_k}$ for the kth DFE. The main concern of the exponentially weighted LS problem under consideration is thus to find, at each time n, the kth optimal coefficient vectors, $\mathbf{p}_k(n)$ and $\mathbf{w}_k(n)$, that minimize the cost function:
which can be expressed in matrix form as follows:
Herein,
, which is given by:
and y k (n), which can be expressed as:
and $R_{\tilde{x}_k \tilde{x}_k}(n)$ is the $N_{d_k} \times N_{d_k}$ autocorrelation matrix of the kth ADIR filter input data vector $\tilde{\mathbf{x}}_k(i)$, and is found by:
Note that $R_{\tilde{x}_k y_k}(n) = R^H_{y_k \tilde{x}_k}(n)$. Subsequently, the kth optimal coefficient vector for the equalizer is determined by differentiating $J_k(n)$ with respect to $\mathbf{p}_k(n)$, setting the derivative to zero, and solving for $\mathbf{p}_k(n)$:
In order to find a solution for the optimal coefficient vector of the kth ADIR filter, we substitute Equation (16) back into the cost function in Equation (12) and attain the following quadratic form in w k (n):
Then, the expression enclosed by square brackets in Equation (17), a symmetric $N_{d_k} \times N_{d_k}$ matrix, is defined in Equation (18), so that the cost function in Equation (17) can be restated as:
In minimizing the expression in Equation (19), a unit energy constraint, $\mathbf{w}_k^H(n)\mathbf{w}_k(n) = 1$, is applied to the kth ADIR filter coefficients to avoid the trivial null equalizer solution [21], and the following Lagrangian expression is formed:
After taking the derivative of the expression in (20) and equating to zero, we get:
which shows that the optimal kth ADIR coefficient vector $\mathbf{w}_k^{opt}(n)$ and λ are a unit magnitude eigenvector and the corresponding eigenvalue of the matrix $R_{\tilde{x}_k|y_k}(n)$ defined in Equation (18), respectively. If the expression on the right-hand side of Equation (21) is substituted for $R_{\tilde{x}_k|y_k}(n)\mathbf{w}_k(n)$ in Equation (19), and $\mathbf{w}_k^H(n)$ in (19) is also replaced with its optimal value, then the minimum cost can be stated as in Equation (22), which demonstrates that the cost function is minimized by choosing $\mathbf{w}_k^{opt}(n)$ to be the eigenvector of $R_{\tilde{x}_k|y_k}(n)$ that corresponds to its minimum eigenvalue, $\lambda_{min}$. Consequently, the optimal coefficient vectors for the kth equalizer and the kth ADIR filter, $\mathbf{p}_k^{opt}(n)$ and $\mathbf{w}_k^{opt}(n)$, are given by Equations (16) and (21), respectively.
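The unit energy constrained solution described above can be checked numerically. The sketch below is a direct (batch) evaluation with NumPy under assumed conventions: the desired signal is taken as w^H x̃(i), the DFE output as p^H y(i), and the correlation matrices as exponentially weighted sums of outer products. The function names are hypothetical, and the direct linear solve used here is precisely the kind of matrix operation that the lattice realization of the paper is designed to avoid.

```python
import numpy as np


def unit_energy_shortener(x_vecs, y_vecs, beta=0.99):
    """Batch solution of the unit-energy-constrained shortening problem.

    x_vecs : (n, N_d) array, row i is the ADIR input vector at time i.
    y_vecs : (n, N_y) array, row i is the DFE input vector at time i.
    Returns (w_opt, p_opt, lambda_min).
    """
    x_vecs = np.asarray(x_vecs, dtype=complex)
    y_vecs = np.asarray(y_vecs, dtype=complex)
    n = x_vecs.shape[0]
    weights = beta ** np.arange(n - 1, -1, -1)  # newest sample has weight 1
    r_xx = (x_vecs.T * weights) @ x_vecs.conj()
    r_xy = (x_vecs.T * weights) @ y_vecs.conj()
    r_yy = (y_vecs.T * weights) @ y_vecs.conj()
    r_yx = r_xy.conj().T
    # Matrix of the quadratic form in w after eliminating p.
    r_cond = r_xx - r_xy @ np.linalg.solve(r_yy, r_yx)
    r_cond = 0.5 * (r_cond + r_cond.conj().T)  # enforce Hermitian symmetry
    eigvals, eigvecs = np.linalg.eigh(r_cond)  # eigenvalues in ascending order
    w_opt = eigvecs[:, 0]                      # unit-norm eigenvector
    p_opt = np.linalg.solve(r_yy, r_yx @ w_opt)
    return w_opt, p_opt, float(eigvals[0])


def weighted_cost(x_vecs, y_vecs, w, p, beta=0.99):
    """Exponentially weighted sum of |w^H x(i) - p^H y(i)|^2."""
    x_vecs = np.asarray(x_vecs, dtype=complex)
    y_vecs = np.asarray(y_vecs, dtype=complex)
    n = x_vecs.shape[0]
    weights = beta ** np.arange(n - 1, -1, -1)
    err = x_vecs @ w.conj() - y_vecs @ p.conj()
    return float(np.sum(weights * np.abs(err) ** 2))
```

Evaluating `weighted_cost` at the returned vectors should reproduce the returned minimum eigenvalue up to rounding, which mirrors the statement that the minimum cost equals λ_min.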
V-BLAST type MIMO-DFE
We would like to use a V-BLAST type design approach for the front-end filter of the proposed equalizer, and we also require the design of a single, multichannel, and compact equalizer structure, so that two separate equalizers and direct evaluations as in (16) are avoided, and the same filter can be reconfigured as spectral analysis or positioning filter. These objectives can be accomplished by considering the equivalence of V-BLAST and modified Gram-Schmidt orthogonalization operations, and therefore by completely orthogonalizing the two-channel input data of DFE using SPMGLSs, which provide scalar only operations, good numerical properties as well as modularity, regularity, order recursiveness, and reconfigurability to the solution of equalization problem under consideration. Hereupon, we present the modifications we propose to make in SPMLSs so as to obtain SPMGLSs and then the design of front-end multichannel DFE using SPMGLSs.
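The role of orthogonalization in replacing matrix inversion by scalar divisions can be illustrated with a batch modified Gram-Schmidt procedure. This is a sketch of the general principle only: it operates on a block of data rather than recursively in time and order, it does not use Givens rotations, and the function names are hypothetical. Full column rank of the data matrix is assumed.

```python
import numpy as np


def modified_gram_schmidt(data):
    """Return a matrix whose columns are mutually orthogonal (not normalized)
    and span the same nested subspaces as the columns of `data`."""
    q = np.array(data, dtype=complex)
    n_cols = q.shape[1]
    for j in range(n_cols):
        energy = np.vdot(q[:, j], q[:, j]).real
        for m in range(j + 1, n_cols):
            # Remove from every later column its component along column j.
            coef = np.vdot(q[:, j], q[:, m]) / energy
            q[:, m] -= coef * q[:, j]
    return q


def estimate_with_scalar_divisions(q, desired):
    """Least squares estimate of `desired` from orthogonal columns of q.
    Each coefficient needs one scalar division, no matrix inverse."""
    desired = np.asarray(desired, dtype=complex)
    estimate = np.zeros_like(desired)
    for j in range(q.shape[1]):
        gain = np.vdot(q[:, j], desired) / np.vdot(q[:, j], q[:, j]).real
        estimate += gain * q[:, j]
    return estimate
```

Because the orthogonalized columns have a diagonal correlation matrix, the estimate coincides with the ordinary least squares fit computed from the original columns.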
SPMGLS
A SPMGLS has a block structure as shown in Figure 3 , and the input signal vectors to a SPMGLS are defined as follows: the input forward prediction error vector:
the backward prediction error vector:
and the estimation error vector:
The elements of input forward and backward prediction error vectors in Equations (23) and (24) are orthogonalized by using self-orthogonalization processors (SOPs), which are triangular-shaped processors in Figure 3 . The outputs of SOPs are the orthogonalized forward prediction error vector:
and the orthogonalized backward prediction error vector:
The elements of the orthogonalized forward prediction error vector are fed into a forward prediction reference-orthogonalization processor (ROP) in order to predict the elements of $\mathbf{b}_{\ell-1}(i-1)$ and to produce the stage output backward prediction error vector $\mathbf{b}_\ell(i)$. The elements of the orthogonalized backward prediction error vector are fed into a ROP to perform p-channel joint process estimation and to produce the stage output estimation error vector $\mathbf{e}_\ell(i)$. Subsequently, the elements of the orthogonalized backward prediction error vector are delayed and are also fed into another ROP to obtain the stage output forward prediction error vector
There are two types of processing cells, single and double circular processors, in a SPMGLS as in the original SPMLS in [23] . Nevertheless, we change the processing equations implemented in these processing cells with the equations of the square root version of the Givens algorithm in [33] . The interconnections and signals propagating through these processing cells are shown in Figure 4 . The processing cells symbolized with a double circle, which are also called boundary (angle computer) cells, perform the following equations:
From Equations (28), (29), and (30), it can be shown that the parameters c(i) and s(i) satisfy the equation:
The connection between the input, γ in (i), and output, γ out (i), likelihood variables is defined as:
On the other hand, the processing cells symbolized with a single circle, which are called internal (rotator) cells, perform the following equations:
and
Accordingly, Equations (31), (33), and (34) can be combined into:
where
with |Q(i)| = 1, so that it performs Givens plane rotation in a complex plane, and thereby the stability of Givens algorithm is guaranteed.
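The division of labor between boundary (angle computer) cells and internal (rotator) cells can be sketched with a textbook complex Givens rotation. This is an illustration under assumptions and not the exact square root recursion of [33]: the names `boundary_cell` and `internal_cell`, the optional forgetting factor `beta`, and the specific form of the rotation matrix are choices made here.

```python
import math


def boundary_cell(r: float, x: complex, beta: float = 1.0):
    """Angle computer: choose c (real) and s (complex) so that the rotation
    maps [sqrt(beta) * r, x] onto [r_new, 0]."""
    r_scaled = math.sqrt(beta) * r
    r_new = math.sqrt(r_scaled ** 2 + abs(x) ** 2)
    if r_new == 0.0:
        return 1.0, 0j, 0.0  # nothing to rotate; use the identity
    return r_scaled / r_new, x / r_new, r_new


def internal_cell(c: float, s: complex, r: complex, x: complex, beta: float = 1.0):
    """Rotator: apply Q = [[c, conj(s)], [-s, c]] to [sqrt(beta) * r, x]."""
    r_scaled = math.sqrt(beta) * r
    r_new = c * r_scaled + s.conjugate() * x
    x_out = -s * r_scaled + c * x
    return r_new, x_out
```

With this choice, c² + |s|² = 1 and the determinant of Q equals 1, so the rotation preserves energy; this norm-preserving property is commonly cited as the reason for the good numerical behavior of Givens-based recursions.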
Sequential Givens lattice orthogonalization
The V-BLAST processing is made possible by utilizing SPMGLSs in the design of the MIMO-DFE, so that the number of channels at different sections of the proposed multichannel lattice DFE is different due to the sequential processing nature of SPMGLSs. Therefore, we carry out the exponentially weighted LS optimization problem by taking into consideration each of these sections separately and assume that the proposed equalizer is comprised of three cascaded equalizers, which are two-channel, three-channel, and four-channel lattice sections; and we use a different index for each section while using m to indicate a stage in the whole equalizer. Henceforth, we focus on the case N without unduly complicating the development and without loss of generality.
In order to sequentially solve the exponentially weighted LS optimization problem under consideration, we first organize the elements of input signal vectors y 1 
T , according to the natural ordering of SPMGLSs as: 1 . Accordingly, we redefine Equations (13) and (14) using this new data vector as follows:
and
where k = 1, 2. The orthogonalization of input data using SPMGLSs corresponds to the transformation of (38) and (39) into:
respectively. Here, (n) is the 2 × 2 lower triangular transformation matrix and is realized stage-by-stage using 2 × 2 lower triangular transformation matrices:
whose diagonal elements are all equal to unity at time instant n, andκ (n) is the reflection coefficient computed at the single circular cell in the triangular shaped self-orthogonalization processor of the th two-channel SPMGLS. Then, the lattice joint process estimation coefficients are computed by means of:
where k (n) represents the kth row of the 2 × 2 lattice joint process estimation reflection coefficient matrix (n) which is also sequentially computed stage-bystage using 2 × 2 joint process estimation coefficient matrices:
in which $\kappa_{\ell,k,j}(n)$ is the jth reflection coefficient related to the estimation of the kth desired signal, and it is computed at the (k, j)th single circular cell of the square shaped reference-orthogonalization processor related to joint process estimation at the ℓth two-channel SPMGLS. Note that the matrix inversion operation in Equation (16) is transformed into a simple scalar inversion operation in (43) due to the diagonal nature of the transformed correlation matrix. After the processing of input signals by two-channel lattice stages, the first estimation error signal, $x_1(i) = e_1(i)$, which corresponds to the detected and fed back signal of the first channel, is incorporated at the ($N_{f_1} - N_{b_1} + 1$)th stage as the third channel. Accordingly, we expand the optimization problem by organizing the elements of the input data vectors y 1
T as follows:
and input to the three-channel lattice section, where the stage number (m) takes values in the range given by
Subsequently, we solve the optimization problem in (43) once again with the new input vector, in which case α (n) and α (n) are the 3α × 3α lower triangular transformation and the 3 × 3α lattice joint process estimation coefficient matrices, respectively. α (n) is computed sequentially by means of 3 × 3 lower triangular transformation matrices, L α (n) and α (n), and is similarly realized stage-by-stage making use of 3 × 3 joint process estimation coefficient matrices, α (n), at time instant n.
Finally, the optimization problem is expanded one more time with the inclusion of the second estimation error signal,x 2 (i) = e 2 (i), which is related to the detected and fed back signal of the second channel, and this time, the elements of input data vectors y 1 
T are organized as:
where the stage number (m) is in the range given by
2 due to four-channel processing. Similar to two-channel and three-channel cases, we solve the optimization problem in (43) using the new data vector in Equation (46), in which case ϑ (n) and ϑ (n) are 4ϑ × 4ϑ lower triangular transformation, and 4 × 4ϑ joint process estimation error coefficient matrices at the time instant n, respectively. Similar to previous cases, these matrices are computed stage-by-stage by the use of 4 × 4 lower triangular transformation matrices, L ϑ (n), and 4×4 joint process estimation error coefficient matrices, ϑ (n), at time instant n, respectively.
Computation of error order updates
Due to the sequential nature of the proposed lattice structure, we carry out the multichannel error order update task by taking into consideration two-channel, three-channel, and four-channel sections separately, and therefore we assume that the filter is comprised of three cascaded filters as described in the previous subsection. The prediction and joint state estimation errors for the end of the observation interval n = i at the output of the ℓth order two-channel equalizer section, where $0 < m \le N_{f_1} - N_{b_1}$, can be stated in terms of lattice parameters and the (ℓ − 1)th order prediction errors as follows:
(n − 1)
where the lower triangular and square coefficient matrices are generated in triangular-shaped self-orthogonalization and square-shaped reference-orthogonalization processors in a two-channel SPMGLS as previously defined in Equations (42) and (44). The joint process estimation error updates are accordingly given as:
We then multiply the lower triangular and square coefficient matrices in Equations (47), (48), and (49), and make the following definitions:
in order to obtain compact versions of the Equations (47), (48), and (49) as follows:
The development of prediction and joint process estimation error order updates from (α − 1)th order to αth for the three-channel section, where the stage number (m) takes values in the range given by
2 , is carried out in a similar fashion to the two-channel section, and they can be expressed in compact form with the following equations:
where 3, 2 (n)κ e α, 3, 3 (n)
The prediction and joint process estimation error order update equations from (ϑ −1)th order to ϑth order for the four-channel section, where the stage number (m) is in the range given by
, can be derived by following a similar procedure to the two- and three-channel sections, and 4 × 1 error order update matrices can be subsequently obtained with 4 × 4 lower triangular and square coefficient matrices.
Matrix visualization
In order to visualize the cascading and functioning of two-channel, three-channel, and four-channel sections as a single equalizer, we provide a matrix representation of sequential Givens lattice orthogonalization by considering N DFEs for the first and second channels, and also organizing the elements of input data vectors y 1 
by taking into consideration different numbers of parameters in the feed forward and feedback channels and shifting properties of input data. This matrix helps us to visualize the orthogonalization process, and thus to draw a diagram of the four-channel DFE structure under consideration as in Figure 5. Note that the elements of the first and second rows are related to the input signals of the first and the second channels of the DFE under consideration, while the third and fourth rows are associated with the detected and fed back signals. Lattice orthogonalization begins with the elements of the first two rows using two-channel sequential lattice processing stages until the first fed back channel is incorporated as the new channel at a transitional stage, which is the ($N_{f_1} - N_{b_1} + 1$)th stage. Then, the orthogonalization continues with three-channel lattice stages until the fourth channel, which is related to the detected and fed back signal of the second channel, is taken into the process at another transitional stage, which is the ($N_{f_2} - N_{b_2} + 1$)th stage. The orthogonalization of input data then finalizes with four-channel stages when the mean squared estimation error performance requirements are met, and thereby the kth desired signal, $d_k(n)$, is sequentially estimated using self-orthogonalized backward prediction error signals as follows:
Here, the first and second summations represent the estimation accomplished by the two-channel and three-channel sections, respectively, and the third summation is connected with the four-channel estimation section. In each section, $\kappa_{m,k,j}(n)$ represents the jth estimation reflection coefficient at the mth stage related to the kth channel as defined in the previous subsection, and
Adaptive multiple systolic Viterbi detection
In order to achieve an all systolic equalizer architecture, we propose to use the systolic array processor approach in [36] for the design and implementation of Viterbi detection, so that a high degree of computational concurrency is obtained by operating simultaneously and in synchronization with the rest of equalizer circuitry. Accordingly, the most computationally intensive operation in the Viterbi detection of sent data is related to the comparator metric, and a systolic computation of this metric is accomplished by multiply and accumulate operations:
for υ branches leading from states at time instant i − 1 to each state at time instant i as illustrated in Figure 6 . Herein, υ stands for the alphabet size, OSMI 
) related to the kth channel of the equalizer, and $z_k^n(i)$ is the kth equalizer output as given in (5). Note that the elements of the coefficient vector $\mathbf{c}_k(i)$:
are computed as:
where $\mathbf{w}_k$ represents the coefficient vector for the kth ADIR filter as defined in (10). Note that a processing element (PE) of the systolic array in Figure 6 is symbolized with a circled PE, and the memory of the computation cycle is designated as $L = 3N_{d_k} - 3$ for ease of illustration.
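The multiply and accumulate structure of the path metric can be illustrated with a plain, sequential Viterbi detector operating on a shortened impulse response. This is a sketch under stated assumptions and not the systolic array of [36]: the squared Euclidean branch metric, the known starting state, and the names `channel_output` and `viterbi_detect` are choices made for this example, and `w` plays the role of an ADIR response with real or complex taps.

```python
import itertools
import numpy as np


def channel_output(symbols, w, start_state):
    """Noise-free output of the FIR response w. `start_state` holds the
    symbols sent before time 0, most recent first (length len(w) - 1)."""
    history = list(start_state)
    out = []
    for a in symbols:
        out.append(w[0] * a + sum(w[j + 1] * history[j] for j in range(len(history))))
        history = ([a] + history)[:len(start_state)]
    return np.array(out)


def viterbi_detect(z, w, alphabet, start_state):
    """Maximum likelihood sequence detection for z(i) = sum_j w[j] a(i-j) + noise."""
    mem = len(w) - 1
    states = list(itertools.product(alphabet, repeat=mem))  # most recent symbol first
    index = {s: k for k, s in enumerate(states)}
    cost = np.full(len(states), np.inf)
    cost[index[tuple(start_state)]] = 0.0
    paths = [[] for _ in states]
    for z_i in z:
        new_cost = np.full(len(states), np.inf)
        new_paths = [[] for _ in states]
        for k, s in enumerate(states):
            if not np.isfinite(cost[k]):
                continue  # state not reachable yet
            for a in alphabet:  # one branch per alphabet symbol
                predicted = w[0] * a + sum(w[j + 1] * s[j] for j in range(mem))
                c = cost[k] + abs(z_i - predicted) ** 2
                nxt = index[((a,) + s)[:mem]]
                if c < new_cost[nxt]:  # keep the survivor path
                    new_cost[nxt] = c
                    new_paths[nxt] = paths[k] + [a]
        cost, paths = new_cost, new_paths
    return paths[int(np.argmin(cost))]
```

The number of states grows as the alphabet size raised to the memory of `w`, which is why shortening the effective response before detection reduces the detector workload.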
Computational complexity
The computational complexity can be calculated by considering the two main sections of the proposed channel shortening equalizer. The first section implements the MIMO-DFE while the second one is related to the Viterbi processing. The number of operations required for the MIMO-DFE can be calculated from the number of operations per stage and the number of stages. The numbers of operations for a single SPMGLS with two, three, and four channels have been computed by making use of the complexity calculations in [23] and [33] as 84, 171, and 288, respectively. There are N 
The complexity calculation for the systolic Viterbi section can be accomplished as follows. The total number of processing elements in the systolic array that implements Viterbi processing per channel is given as $\upsilon \times N_{d_k}$ in [36]. Each element in this array performs one addition and one multiplication, which are counted together as one operation. Then, the total number of operations for the systolic implementation of M Viterbi detectors is calculated as
k . Accordingly, the total computational complexity for the proposed equalizer taking into account both MIMO-DFE and multiple Viterbi detector sections becomes 84N
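The bookkeeping described in this section can be sketched as a small calculator. The per-stage counts (84, 171, and 288) and the per-channel processing element count (alphabet size times ADIR length) come from the text; everything else is an assumption made for illustration: the function names are hypothetical, the numbers of two-, three-, and four-channel stages are left as caller-supplied inputs because their expressions are not reproduced here, and all channels are assumed to share one ADIR length.

```python
# Operations per SPMGLS, indexed by the number of channels in the stage.
OPS_PER_STAGE = {2: 84, 3: 171, 4: 288}


def mimo_dfe_operations(n_two: int, n_three: int, n_four: int) -> int:
    """Operations of the lattice MIMO-DFE for the given stage counts."""
    return (OPS_PER_STAGE[2] * n_two
            + OPS_PER_STAGE[3] * n_three
            + OPS_PER_STAGE[4] * n_four)


def viterbi_operations(n_channels: int, alphabet_size: int, adir_length: int) -> int:
    """One operation per processing element, alphabet_size * adir_length
    elements per channel (equal ADIR lengths assumed)."""
    return n_channels * alphabet_size * adir_length


def total_operations(n_two, n_three, n_four, n_channels, alphabet_size, adir_length):
    """Sum of the MIMO-DFE and multiple Viterbi detector contributions."""
    return (mimo_dfe_operations(n_two, n_three, n_four)
            + viterbi_operations(n_channels, alphabet_size, adir_length))
```

For example, `total_operations(4, 2, 3, 2, 2, 3)` combines hypothetical stage counts with M = 2, υ = 2, and an ADIR length of 3.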
We compare the computational complexity of the proposed channel shortening equalizer using a front-end feed forward equalizer (CSFFE) and a front-end decision feedback equalizer (CSDFE) with that of the VE in [38], where each channel from transmitter to receiver is assumed to have a memory of μ. In Figure 7, we plot the complexity curves for M = 2, υ = 2, and μ = 4, 8, 16. The figure shows that the proposed method is not computationally advantageous when the channel memory (μ = 4) is two times the ADIR filter memory (N d − 1 = 2). However, the proposed method becomes advantageous, regardless of the feed forward filter length (N f ) used, when the channel memory is four times the ADIR filter memory, and far more beneficial when the channel memory is eight times the ADIR filter memory. Figure 8 shows that the computational advantage of the proposed method compared with the VE becomes more attractive when the alphabet size is increased from υ = 2 to υ = 4, even for the channel memory values of μ = 4, 8; and Figure 9 demonstrates that the computational advantage becomes less pronounced when the number of antennas increases from M = 2 to 4. The computational complexity vs. number of antennas analysis has also been carried out due to the recent interest in massive MIMO for next-generation wireless systems [37]. We would like to point out that the computational complexities of CSDFE and CSFFE are larger than those of DFE and FFE, respectively, by an amount of
In this analysis, we assumed that
In Figure 10, the computational complexity vs. number of antennas curves for CSDFE, CSFFE, DFE, and FFE are presented for υ = 256 and $N_d - 1 = 8$, with M increasing from 0 to 150. For smaller values of υ and $N_d - 1$, the difference between the computational complexities of CSDFE and DFE, or of CSFFE and FFE, is minor, and the corresponding curves cannot be distinguished. This implies that the performance increase obtained by implementing CSDFE instead of DFE, or CSFFE instead of FFE, comes at an almost negligible computational cost, as will become clearer in the next section. In Figure 11, we compare the computational complexity of the proposed method with that of VE when the channel memory values are μ = 4, 8, and 16, the alphabet size is υ = 2, the ADIR memory is $N_d - 1 = 2$, and M increases from 0 to 150. Finally, we repeat the same comparison in Figure 12 for υ = 4 in order to demonstrate the effect of using a larger alphabet size. Figure 11 shows that the proposed method is computationally advantageous compared with VE when the channel memory is larger than eight (μ > 8). Figure 12 shows that the computational complexity advantage of the proposed method improves when the alphabet size is increased from υ = 2 to υ = 4, and that the proposed method becomes less complex than VE even when μ = 8.
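The operation count of the systolic Viterbi section described at the start of this analysis can be sketched as follows. This is a minimal illustration, assuming that $N_{d_k}$ denotes the length of the $k$th ADIR filter (so that its memory is $N_{d_k} - 1$), that the per-channel element count is $\upsilon \times N_{d_k}$ as quoted from [36], and that the total for $M$ detectors is the sum of the per-channel counts. The function name `systolic_viterbi_ops` is hypothetical, and the MIMO-DFE part of the total complexity is not modeled.

```python
def systolic_viterbi_ops(alphabet_size, adir_lengths):
    """Operation count for the systolic Viterbi section (illustrative sketch).

    Assumption: channel k uses alphabet_size * N_dk processing elements,
    and each element costs one operation (one addition and one
    multiplication counted together).
    """
    return sum(alphabet_size * n_dk for n_dk in adir_lengths)


# Two-channel case with an ADIR filter memory of N_d - 1 = 2, i.e. N_d = 3.
ops_binary = systolic_viterbi_ops(alphabet_size=2, adir_lengths=[3, 3])
ops_quaternary = systolic_viterbi_ops(alphabet_size=4, adir_lengths=[3, 3])
print(ops_binary, ops_quaternary)  # 12 24
```

Under these assumptions the count grows linearly in the alphabet size, the ADIR filter length, and the number of channels.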
Experimental results
The performance of the proposed receiver was investigated by means of MSE and probability of error simulations. In these evaluations, we considered linear time-invariant channels with spectral nulls as well as time-varying channels. The following channel impulse response matrices were defined in order to be used in simulations that demonstrate performance of the proposed equalizer: 
where δ(n) represents the Dirac delta function, and the channel impulse response $h_a(n)$ is defined as $h_a(n) = \sum_{j=0}^{4} a_j \delta(n - j)$, which has spectral nulls and a large eigenvalue spread (χ = 1317.65). Herein, the channel coefficients have been taken from [39], and are given by:
Note that the eigenvalue spread (χ) was determined by using the method described in [40], assuming a feed forward equalizer with a memory of $N_f - 1 = 18$ and a noise variance of 0.001. The time-varying channel impulse response is defined as $h_b(n) = \sum_{j=0}^{4} b_j(n) \delta(n - j)$, where $b_j(n)$ represents the $j$th time-variant attenuation factor, generated independently using the improved version of Jakes' Rayleigh fading model in [41] by assuming a data rate of 100 kbytes/s and Doppler shifts of $f_D = 50$ Hz and $f_D = 10$ Hz; $b_j(n)$ is also normalized such that $\sum_{j=0}^{4} b_j^2(n) = 1$ for all $n$. By choosing the same channel for both the direct and indirect paths, we demonstrate the performance under distortion severe enough that both ISI and ICI are significant, which justifies the use of DFE. On the other hand, we could have generated a longer channel with the same spectral characteristics (or eigenvalue spread), in which case we would not be able to benchmark the performance against that of VE, as the simulation of VE becomes computationally cumbersome for longer channels. Note that ρ represents the gain factor for the effect of ICI and takes values between 0 and 1. In this presentation, we consider two values of the gain factor, ρ = 0 and ρ = 1, which correspond to completely orthogonal and nonorthogonal transmissions, respectively [42].
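The per-sample normalization of the time-varying taps, $\sum_{j=0}^{4} b_j^2(n) = 1$, can be sketched as below. The placeholder taps are independent Rayleigh-distributed magnitudes drawn with NumPy; this is an assumption made only to exercise the normalization and is not the improved Jakes model of [41]. The names `normalize_taps`, `raw_taps`, and `taps` are hypothetical.

```python
import numpy as np


def normalize_taps(b):
    """Scale each time instant so that sum_j b_j(n)**2 equals 1.

    b has shape (num_samples, num_taps); row n holds b_0(n), ..., b_4(n).
    """
    energy = np.sum(b ** 2, axis=1, keepdims=True)
    return b / np.sqrt(energy)


rng = np.random.default_rng(0)
# Placeholder fading only: NOT the improved Jakes model referenced in the text.
raw_taps = rng.rayleigh(scale=1.0, size=(1000, 5))
taps = normalize_taps(raw_taps)
tap_energy = np.sum(taps ** 2, axis=1)
print(np.allclose(tap_energy, 1.0))  # True
```

With this normalization the instantaneous channel energy stays constant over time, so fading redistributes energy among the five taps without changing the total.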
In the simulations for the performance evaluation of the proposed method (CSDFE/CSFFE) with respect to FFE, DFE, and VE, and also in the simulations for the performance comparison of the proposed method when different ADIR filter memories are used, the input signal x(n) applied to the channel consisted of uniformly distributed bipolar $\pm 1/\sqrt{2}$ random numbers because of the relative simplicity this provides in simulations. Note that the uniformly distributed bipolar random numbers represent BPSK modulation supported by the IEEE 802.11n WLAN standard [43]. In order to account for the modulations that are in both the IEEE 802.11n WLAN and IEEE 802.16e MAN standards and to demonstrate the effect of higher modulation on probability of error performance, we also performed CSFFE/CSDFE simulations by using an input signal made of uniformly distributed random numbers taking values from $(1/2 + i/2,\ -1/2 + i/2,\ -1/2 - i/2,\ 1/2 - i/2)$, which represents QPSK modulation ($i = \sqrt{-1}$), and compared the results against those of the proposed method for BPSK modulation in both the time-invariant and time-variant channel cases. Moreover, we assume complete knowledge of the time-invariant channel and knowledge of the time-variant channel memory in the VE and adaptive Viterbi equalizer (AVE) performance evaluations, respectively. On the other hand, the proposed method does not need any information about the channel; however, if information about the channel is already available and an evaluation of the badness of the channel can be carried out, the result of this evaluation can be used to determine the ADIR filter memories of the proposed equalizer so as to improve the performance.
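The two input alphabets described above can be generated as in the following sketch, which uses NumPy with hypothetical helper names (`bpsk_symbols`, `qpsk_symbols`). It only illustrates the symbol sets and is not the simulation code behind the reported results.

```python
import numpy as np

# QPSK alphabet as listed in the text, with i the imaginary unit.
QPSK_ALPHABET = np.array([0.5 + 0.5j, -0.5 + 0.5j, -0.5 - 0.5j, 0.5 - 0.5j])


def bpsk_symbols(num_symbols, rng):
    """Uniformly distributed bipolar symbols +1/sqrt(2) or -1/sqrt(2)."""
    bits = rng.integers(0, 2, size=num_symbols)
    return (2 * bits - 1) / np.sqrt(2)


def qpsk_symbols(num_symbols, rng):
    """Uniformly distributed symbols drawn from QPSK_ALPHABET."""
    idx = rng.integers(0, 4, size=num_symbols)
    return QPSK_ALPHABET[idx]


rng = np.random.default_rng(1)
x_bpsk = bpsk_symbols(200_000, rng)
x_qpsk = qpsk_symbols(200_000, rng)
bpsk_energy = np.mean(np.abs(x_bpsk) ** 2)
qpsk_energy = np.mean(np.abs(x_qpsk) ** 2)
print(bpsk_energy, qpsk_energy)
```

Both alphabets have an average symbol energy of 1/2, so the BPSK and QPSK inputs are compared at the same transmitted power.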
The channel noise was additive white Gaussian noise (AWGN) with zero mean, uncorrelated with the input signal. The received signal-to-noise ratio (SNR) per channel of the receiver is defined as:
where $\sigma_{n_k}^2$ is the variance of AWGN for the $k$th channel. Accordingly, SNRs for all channels of the receiver are equal, and the system SNR is defined as $\mathrm{SNR} = \mathrm{SNR}_k$ for k = 1, 2. The exponential weighting factors were 0.99 and 1.0 for the front-end equalizer and ADIR filter, respectively, when the channel was time-invariant. In the time-varying channel case, they were set to 0.975 and 1.0 in order to better track the signal. The probability of error evaluations were conducted using $4 \times 10^5$ samples, so that $2 \times 10^5$ samples per channel were used, and the simulations were carried out in the training mode of receiver operation. The delays ($D_k$) for the desired signals in this mode of operation were assumed equal ($D = D_k$) for k = 1, 2, and $D$ was chosen so as to minimize MSE; that is, $D = (N_f - 1 + \aleph - 1)/2$ when only the front-end FFE is utilized,
Note that the memories of feed forward and feedback sections of DFEs, ADIR filters, and channel impulse responses for k = 1, 2 were assumed equal, i.e.,
The noise variance per channel during MSE simulations was 0.001.
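As an illustration of how a noise variance can be set for a target SNR, the sketch below adds zero-mean white Gaussian noise to a real-valued signal. It assumes that the SNR is the ratio of the average received signal power to the noise variance, expressed in dB; the exact definition used in the text is not reproduced here. It also uses an ideal (identity) channel so that the received signal equals the BPSK input. The function name `add_awgn` is hypothetical.

```python
import numpy as np


def add_awgn(received, snr_db, rng):
    """Add real zero-mean white Gaussian noise for a target SNR in dB.

    Assumption: SNR = (average signal power) / (noise variance).
    Returns the noisy signal and the noise variance that was used.
    """
    signal_power = np.mean(np.abs(received) ** 2)
    noise_var = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_var), size=received.shape)
    return received + noise, noise_var


rng = np.random.default_rng(2)
# BPSK input over an identity channel, 2 x 10^5 samples as for one channel.
x = (2 * rng.integers(0, 2, size=200_000) - 1) / np.sqrt(2)
y, noise_var = add_awgn(x, snr_db=20.0, rng=rng)
measured_snr_db = 10 * np.log10(np.mean(x ** 2) / np.mean((y - x) ** 2))
print(noise_var, measured_snr_db)
```

For complex-valued inputs such as QPSK, the noise would have to be complex as well, with the variance split between the real and imaginary parts.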
Time-invariant channel
The objective of the simulations with the channel matrix $h_1(n)$ is to show the performance of the proposed CSDFE relative to CSFFE, FFE, DFE, and VE. In the simulations, we exploited the modularity and regularity of SPMGLSs: we started with FFE; after observing its performance, we altered the equalizer to CSFFE; we then added new SPMGLSs to alter the equalizer to DFE and subsequently to CSDFE. The memory of FFE ($N_f - 1$) was 18, the memory of the feedback channels ($N_b - 1$) for DFE was 4, and the memory of the ADIR filter ($N_d - 1$) was 2.
In Figure 13, we present the MSE performance of the proposed equalizer when orthogonal transmission is used, i.e., ρ = 0, and compare it with those of CSFFE, FFE, and DFE, together with the performance for the channel matrix $h_2(n)$. In Figure 14, we provide the corresponding probability of error performance for orthogonal transmission and compare the performance of CSDFE with those of CSFFE, FFE, DFE, and VE, and with the performance for the channel matrix $h_2(n)$. These figures show that CSDFE performs better than CSFFE, FFE, and DFE. It also has the closest probability of error performance to that of VE. Note that neither the MSE nor the probability of error performance for the channel matrix $h_2(n)$ with ρ = 0 changes with the choice of equalizer (CSDFE, CSFFE, DFE, or FFE), since this channel matrix with ρ = 0 does not include ISI and ICI components. Figures 15 and 16, on the other hand, display the MSE and probability of error comparisons when ρ = 1, and also provide a comparison with the performance of DFE when the channel matrix $h_2(n)$ is used, since the channel matrix $h_2(n)$ with ρ = 1 does not have ISI components. The effect of ICI on the performance of CSDFE and CSFFE can be seen by comparing the MSE values between Figures 13 and 15 and the probability of error values between Figures 14 and 16. It can be deduced from these comparisons that the performance improvement that can be achieved by combining CSDFE with orthogonal transmission is far greater than that obtained by using CSFFE with orthogonal transmission. Furthermore, Figures 17 and 18 show the performance improvement that can be achieved by using an ADIR filter memory ($N_d - 1$) of 3 instead of 2 for both CSFFE and CSDFE in the ρ = 0 and ρ = 1 cases, respectively.
In order to investigate the effect of channel memory for a given ADIR filter memory on the performance of the proposed method, we generated channels with longer impulse responses than that of $h_a(n)$, while making sure that these channels have exactly the same spectral characteristics, or eigenvalue spread, as $h_a(n)$. We then produced the corresponding channel matrices using the same channel impulse responses for the direct and indirect paths as in $h_1(n)$, repeated the aforementioned experiments, and found that the performance was not different from that displayed in Figures 13, 14, 15, 16, 17, and 18. Consequently, it can be said that the eigenvalue spread of the channel, not its memory as the name channel shortener implies, determines the performance of the proposed equalizer. Subsequently, we examined the effect of using a higher modulation scheme on the probability of error performance of the proposed method; Figures 19 and 20 present the performance degradation caused by switching the modulation from BPSK to QPSK in the orthogonal and nonorthogonal transmission cases, respectively.
Time-variant channel
Our objective in the simulations using the channel matrix $h_3(n)$ is to present the performance of the proposed equalizer under two different time-varying channel conditions. Figure 21 shows the MSE performances of CSFFE and CSDFE when ρ = 1 and ρ = 0 for $f_D = 50$ Hz. In Figure 22, we show the corresponding probability of error performances. Note that the probability of error values of CSFFE for ρ = 1 and ρ = 0 saturate at approximately SNR = 25 dB to $2 \times 10^{-2}$ and $1.5 \times 10^{-2}$, respectively. The probability of error values of CSDFE, on the other hand, saturate at approximately SNR = 27 dB to $10^{-3}$ and $6.3 \times 10^{-4}$, which are closer to the probability of error value of AVE when ρ = 0, which converges to approximately $23.4 \times 10^{-5}$ at SNR = 23 dB.
The second experiment under time-variant channel conditions was carried out using a lower Doppler frequency so that the effect of Doppler frequency on the equalizer performance can be demonstrated. When we compare the MSE results in Figure 23 with those in Figure 21, which shows the MSE performance for $f_D = 50$ Hz, we see that the MSE values of CSFFE for ρ = 1 and ρ = 0 when $f_D = 50$ Hz are 1.33 and 1.22 times higher, respectively, than those of CSFFE when $f_D = 10$ Hz. The same comparison for CSDFE shows that the MSE values when $f_D = 50$ Hz are 2 and 2.85 times higher for ρ = 0 and ρ = 1, respectively, than the MSE values when $f_D = 10$ Hz. Similar evaluations can be made for the probability of error performance: Figure 24 shows that the probability of error curves of CSFFE for the ρ = 0 and ρ = 1 cases saturate at lower values than those of CSFFE in Figure 22. Moreover, whereas the probability of error curves of CSDFE for ρ = 0 and ρ = 1 in Figure 22 reach a probability of error of $10^{-2}$ at SNRs of 14 and 17 dB, respectively, they reach the same probability of error at approximately 3.5 dB lower SNRs in Figure 24, i.e., at 10.5 and 13.5 dB.
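The exponential weighting factors quoted in the simulation setup for the front-end equalizer (0.99 for the time-invariant channel and 0.975 for the time-varying channel) can be read through the common rule of thumb that an exponentially weighted least-squares filter with weighting factor λ has an effective memory of about 1/(1 − λ) samples. This rule is brought in here as an assumption for illustration and is not stated in the text; the helper name `effective_memory` is hypothetical.

```python
def effective_memory(weighting_factor):
    """Approximate number of past samples an exponentially weighted filter
    averages over, using the 1 / (1 - lambda) rule of thumb (an assumption)."""
    if not 0.0 < weighting_factor < 1.0:
        raise ValueError("rule of thumb only applies for 0 < lambda < 1")
    return 1.0 / (1.0 - weighting_factor)


memory_time_invariant = effective_memory(0.99)
memory_time_varying = effective_memory(0.975)
print(round(memory_time_invariant), round(memory_time_varying))  # 100 40
```

Under this reading, lowering the factor from 0.99 to 0.975 shortens the averaging window, which is consistent with the stated aim of tracking the time-varying channel better. A factor of 1.0, as used for the ADIR filter, corresponds to an unbounded window and is rejected by the helper.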
We performed two more probability of error simulations for the time-variant channel case using $f_D = 50$ Hz. The first concerned the performance improvement that can be gained by using a larger ADIR filter memory, and the second concerned the probability of error performance when using QPSK modulation instead of BPSK in the ρ = 0 and ρ = 1 cases.
It can be seen in these figures that the performance improvement attainable through orthogonal transmission under time-variant channel conditions is not as significant as under time-invariant channel conditions.
Conclusions
A V-BLAST type channel shortening equalizer design for cognitive MIMO-OFDM radios has been presented. The V-BLAST property for frequency-selective channels is realized by completely orthogonalizing the input data using SPMGLSs, so that a systolic MIMO-DFE is obtained at the front end of the proposed channel shortening equalizer. Accordingly, ISI and ICI effects are suppressed owing to the completely orthogonalizing nature of the receiver structure. A systolic array implementation of multiple adaptive Viterbi detectors is also utilized in order to realize a channel shortening equalizer with a high degree of computational concurrency. Matrix inversions, which are significant bottlenecks in receiver design, are avoided, and only scalar operations are used. A highly modular, regular, order-recursive, and simple receiver architecture, which is suitable for DSP chip- and FPGA-based signal processing implementations of MIMO-OFDM wireless communication systems, is obtained. Spectrum sensing and positioning functions, important tasks for cognitive radios, can be accomplished at no cost by simply reconfiguring the front-end MIMO-DFE filter as spectral analysis and positioning filters, respectively. These properties make the proposed equalizer a good candidate for software defined cognitive radio receiver realizations of MIMO-OFDM systems.
The computational complexity of the proposed equalizer has been provided by separately taking into account the MIMO-DFE and multiple adaptive Viterbi detector sections. Then, the total complexity was compared to the complexity of VE for different channel and ADIR filter memories and different alphabet and antenna sizes.
The performance has been presented in terms of MSE and probability of error analyses for orthogonal and nonorthogonal transmissions under time-invariant and time-variant channel conditions using two different modulation schemes. It has been demonstrated that desirable performance results can be attained, particularly under time-invariant channel conditions, when orthogonal transmission and BPSK modulation are used together with the CSDFE implementation. It has also been shown that the performance of CSDFE lies between those of the Viterbi equalizer and the DFE under time-invariant as well as time-variant channel conditions.
It has been shown that the channel shortening equalizer is in effect a reduced-complexity Viterbi equalizer, with its ADIR filter memory functioning as a trade-off parameter between performance and complexity for a given channel matrix. Another important property of the proposed equalizer is that it does not need channel information.
Competing interests
The author declares that he has no competing interests.
