Systems And Methods For Lattice Reduction by Anderson, David Verl et al.
US008559544B2 
(12) United States Patent
Anderson et al. 
(10) Patent No.: US 8,559,544 B2
(45) Date of Patent: Oct. 15, 2013
(54) SYSTEMS AND METHODS FOR LATTICE 
REDUCTION 
(75) Inventors: David Verl Anderson, Alpharetta, GA (US); Brian Joseph Gestner, Atlanta, GA (US); Wei Zhang, Santa Clara, CA (US); Xiaoli Ma, Norcross, GA (US)
(73) Assignee: Georgia Tech Research Corporation, Atlanta, GA (US)
( * ) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 305 days.
(56) References Cited
U.S. PATENT DOCUMENTS
6,675,187 B1* 1/2004 Greenberger ................. 708/622
6,697,633 B1* 2/2004 Dogan et al. ................. 455/509
2007/0097856 A1* 5/2007 Wang et al. .................. 370/210
2007/0115909 A1* 5/2007 Wang et al. .................. 370/342
2008/0240277 A1* 10/2008 Anholt et al. ................ 375/262
2009/0003476 A1* 1/2009 Rog et al. ................... 375/260
2009/0116588 A1* 5/2009 McNamara et al. .............. 375/340
2009/0190683 A1* 7/2009 Awater et al. ................ 375/262
2009/0196379 A1* 8/2009 Gan et al. ................... 375/340
* cited by examiner
Primary Examiner - Shuwang Liu 
Assistant Examiner - David Huang 
(21) Appl. No.: 12/943,824
(22) Filed: Nov. 10, 2010
(65) Prior Publication Data
US 2012/0159122 A1 Jun. 21, 2012
Related U.S. Application Data
(60) Provisional application No. 61/259,878, filed on Nov. 10, 2009.
(51) Int. Cl.
H04B 7/02 (2006.01)
G06F 15/80 (2006.01)
(52) U.S. Cl.
USPC ............................................. 375/267; 712/31
(58) Field of Classification Search
USPC ..................... 375/267, 295; 712/31, E09.002
See application file for complete search history.
(74) Attorney, Agent, or Firm - Dustin B. Weeks, Esq.; Ryan A. Schneider, Esq.; Troutman Sanders LLP
(57) ABSTRACT
Disclosed herein are lattice reduction systems and methods for a MIMO communication system. One such method includes providing a channel matrix corresponding to a channel in a MIMO communication system, preprocessing the channel matrix to form at least an upper triangular matrix, implementing a relaxed size reduction process, and implementing a basis update process. Implementing the relaxed size reduction process comprises choosing a first relaxed size reduction parameter for a first-off-diagonal element of the upper triangular matrix, choosing a second relaxed size reduction parameter, which is greater than the first relaxed size reduction parameter, for a second-off-diagonal element of the upper triangular matrix, evaluating whether a first relaxed size reduction condition is satisfied for the first-off-diagonal element of the upper triangular matrix, and evaluating whether a second relaxed size reduction condition is satisfied for the second-off-diagonal element of the upper triangular matrix.
23 Claims, 5 Drawing Sheets
U.S. Patent Oct. 15, 2013 Sheet 1 of 5 US 8,559,544 B2
Figure 1 (flow chart of LR method 100):
Step 105: Providing a channel matrix corresponding to a channel in a MIMO communication system
Step 110: Preprocessing the channel matrix to form at least an upper triangular matrix
Step 115: Implementing a size reduction process on elements of the upper triangular matrix
Step 120: Implementing a basis update process on diagonal elements of the upper triangular matrix
Sheet 2 of 5: Figure 2
Sheet 3 of 5: Figure 3
Sheet 4 of 5: Figure 4
Sheet 5 of 5: Figure 5 (plot; horizontal axis: SNR (dB), 10 to 30)
SYSTEMS AND METHODS FOR LATTICE 
REDUCTION 
CROSS-REFERENCE TO RELATED 
APPLICATIONS 
This application claims the benefit of U.S. Provisional 
Patent Application No. 61/259,878, filed on 10 Nov. 2009, 
entitled "Modified Complex Lenstra, Lenstra, Lovasz Based 
Lattice Reduction Hardware Architecture for MIMO Detec-
tion", which is hereby incorporated by reference as if fully set 
forth below. 
FEDERALLY SPONSORED RESEARCH 
STATEMENT 
The invention described in this patent application was made with Government support under Contract No. DAAD19-01-2-0011, awarded by the U.S. Army Research Lab. The Government has certain rights in the invention described in this patent application.
FIELD OF THE INVENTION 
Embodiments of the present invention relate generally to 
signal processing systems and methods and, more particu-
larly, to systems, devices, and methods for multiple-input 
multiple-output signal transmission detection. 
BACKGROUND OF THE INVENTION 
Multiple-Input Multiple-Output ("MIMO") communication systems are becoming increasingly popular as a solution to increasing demands for higher data-rates and more reliable wireless communication systems. These systems comprise multiple antennas at a transmitter side of the communication system and multiple antennas at the receiver side of the communication system. Each transmitter antenna can transmit a different signal at a common frequency through a different channel of the communication system. Each receiver antenna may receive each signal from the multiple transmitter-antennas. During transit, the transmitted signals may encounter different obstacles such that the frequency response of each channel is different. Thus, a common goal of conventional systems is to attempt to efficiently detect the transmitted symbols by determining the frequency response of each channel in the communication system.
Although the optimal solution to the MIMO symbol detection problem, Maximum Likelihood ("ML") detection, is known, a brute-force ML detector implementation involves an exhaustive search over all possible transmitted symbol vectors. This approach is infeasible for hardware implementations when either a large signal constellation or a large number of transmit and receive antennas are employed. Hence, a goal of conventional systems is to design hardware for MIMO symbol detection that achieves comparable Bit-Error-Rate ("BER") performance to the ML detector while having low hardware complexity and meeting throughput and latency requirements.
Some conventional MIMO symbol detection systems employ methods of linear detection and Successive Interference Cancelation ("SIC"). Because most of the required processing for these detectors need only occur at the maximum packet-rate (preprocessing) and the required symbol-rate processing has relatively low complexity, the throughput requirements for certain wireless standards, such as 802.11n, can be achieved in these systems. These methods, however, do not collect the same diversity (negative logarithmic asymptotic slope of the BER versus Signal-to-Noise-Ratio ("SNR") curve) as ML detection. As a result, these methods exhibit greatly reduced system performance compared to ML detectors.
Other conventional symbol detection systems employ Sphere Decoding ("SD") algorithms. Hardware implementations of SD algorithms can achieve ML or near-ML performance. Unfortunately, these methods exhibit greatly increased symbol-rate processing complexity compared to linear or SIC detectors. The complexity of SD methods can also vary widely with changing channel conditions.
The maximum packet-rate of 802.11n is considerably less than the symbol-rate. Therefore, it is desirable to obtain detection systems and methods that achieve ML or near-ML performance at the cost of increased preprocessing complexity as opposed to increased symbol-rate processing complexity. Systems having these desired characteristics include Lattice Reduction ("LR") aided detectors, which, unlike SD methods, incorporate LR algorithms into the preprocessing part of linear or SIC detectors and only increase the symbol-rate processing complexity slightly. Specifically, LR systems and methods only require lattice reduction once per received packet (per subcarrier). LR-aided detectors also exhibit the desirable property of having a complexity that is independent of both the channel SNR and signal constellation (assuming individual arithmetic operations have O(1) complexity).
A variety of hardware realizations of LR-aided detectors have been explored to exploit these properties and to achieve near-ML performance. Various explorations have included a Very-Large-Scale Integration ("VLSI") implementation of a simplified Brun's LR algorithm and a software implementation of Seysen's LR algorithm on a reconfigurable baseband processor. Frequently explored variants, however, employ the Complex Lenstra-Lenstra-Lovász ("CLLL") LR algorithm.
The CLLL algorithm has the desirable property of requiring sorted QR-decomposition preprocessing instead of Direct Matrix Inversion ("DMI") preprocessing. Further, the CLLL algorithm has superior performance to the conventional MIMO detection systems and does not suffer from the scalability issues in some of the conventional systems. The CLLL algorithm can also be used to significantly reduce the complexity of SD algorithms. The conventional CLLL algorithm, however, cannot feasibly be implemented in a fixed-point hardware architecture.
BRIEF SUMMARY OF THE INVENTION
The present invention describes lattice reduction systems and methods for a multiple-input multiple-output communication system. An exemplary embodiment of the present invention provides a lattice reduction method including providing a channel matrix corresponding to a channel in a multiple-input multiple-output communication system, preprocessing the channel matrix to form at least an upper triangular matrix, implementing a size reduction process on elements of the upper triangular matrix, and implementing a basis update process on diagonal elements of the upper triangular matrix.
In an exemplary embodiment of the present invention, the size reduction process is a relaxed size reduction process, which includes choosing a first relaxed size reduction parameter for a first-off-diagonal element of the upper triangular matrix, choosing a second relaxed size reduction parameter, which is greater than the first relaxed size reduction parameter, for a second-off-diagonal element of the upper triangular matrix, evaluating whether a first relaxed size reduction condition is satisfied for the first-off-diagonal element of the upper triangular matrix, and evaluating whether a second relaxed size reduction condition is satisfied for the second-off-diagonal element of the upper triangular matrix.
In another exemplary embodiment of the present invention, the basis update process is a rapid basis update process, which includes choosing an efficient Siegel condition factor so that a first Siegel condition can be evaluated between a first pair of adjacent diagonal elements of the upper triangular matrix without using multiplication operations and evaluating whether the first Siegel condition is satisfied between the first pair of adjacent diagonal elements of the upper triangular matrix without using multiplication operations. The first pair of adjacent diagonal elements can include a first diagonal element and a second diagonal element.
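As an illustration of how a Siegel test can avoid multipliers, the sketch below assumes a Siegel condition factor that is a power of 4 and real, nonnegative fixed-point diagonal values held as integers, so that the squared comparison reduces to one shift and one compare. The function name, the parameter `half_log2_zeta`, and the integer representation are illustrative assumptions, not details taken from this disclosure.

```python
def siegel_violated(r_prev: int, r_cur: int, half_log2_zeta: int = 1) -> bool:
    """Return True when r_prev**2 > zeta * r_cur**2 for zeta = 4**half_log2_zeta.

    Assumes (hypothetically) that both diagonal entries are real, nonnegative
    fixed-point values stored as integers. Because both sides are nonnegative,
    taking square roots preserves the ordering, so the test is equivalent to
    r_prev > 2**half_log2_zeta * r_cur: a left shift and a comparison.
    """
    if r_prev < 0 or r_cur < 0:
        raise ValueError("diagonal magnitudes must be nonnegative")
    return r_prev > (r_cur << half_log2_zeta)


if __name__ == "__main__":
    # 9**2 = 81 > 4 * 4**2 = 64, so a basis update would be triggered.
    print(siegel_violated(9, 4))
    # 8**2 = 64 is not greater than 64, so no basis update.
    print(siegel_violated(8, 4))
```

A factor that is a power of 2 but not of 4 would instead need the shift applied to already-squared magnitudes, which is a different trade-off from the one sketched here.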
In yet another exemplary embodiment of the present invention, the basis update process is an iterative basis update process and includes computing a 2x2 unitary matrix using a number of vectoring iterations. In some embodiments, the step of computing the 2x2 unitary matrix is completed in a number of cycles equal to a number of pipeline stages plus the number of vectoring iterations minus one.
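As a worked illustration of the cycle count stated above, let $P$ denote the number of pipeline stages and $V$ the number of vectoring iterations. The numerical values below are hypothetical and are not taken from this disclosure.

```latex
N_{\text{cycles}} = P + V - 1,
\qquad \text{e.g. } P = 3,\; V = 8 \;\Rightarrow\; N_{\text{cycles}} = 3 + 8 - 1 = 10 .
```

One way to read this count is that $P$ cycles are needed before the first result leaves the pipeline, and each of the remaining $V - 1$ iterations then completes one cycle later.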
In addition to lattice reduction methods, the present invention provides lattice reduction systems. An exemplary embodiment of a lattice reduction system includes a master processor configured to transmit a complex-integer output to a first FIFO queue and a 2x2 unitary matrix output to a second FIFO queue, a first slave processor in indirect communication with the master processor by way of at least the first FIFO queue and configured to receive the complex-integer output from the first FIFO queue and process a unimodular matrix, and a second slave processor in indirect communication with the master processor by way of at least the second FIFO queue and configured to receive the 2x2 unitary matrix output from the second FIFO queue and process a second unitary matrix. In some embodiments of the lattice reduction system, the master processor, the first slave processor, and the second slave processor utilize a single multiplier pipeline structure.
These and other aspects of the present subject matter are described in the Detailed Description below and the accompanying figures. Other aspects and features of embodiments of the present invention will become apparent to those of ordinary skill in the art, upon reviewing the following description of specific, exemplary embodiments of the present invention in concert with the figures. While features of the present invention may be discussed relative to certain embodiments and figures, all embodiments of the present invention can include one or more of the features discussed herein. While one or more embodiments may be discussed as having certain advantageous features, one or more of such features may also be used with the various embodiments of the invention discussed herein. In similar fashion, while exemplary embodiments may be discussed below as system or method embodiments, it is to be understood that such exemplary embodiments can be implemented in various devices, systems, and methods of the present invention.
BRIEF DESCRIPTION OF THE FIGURES
The following Detailed Description of preferred embodiments is better understood when read in conjunction with the appended drawings. For the purposes of illustration, there is shown in the drawings exemplary embodiments. But, the subject matter is not limited to the specific elements and instrumentalities disclosed. In the drawings:
FIG. 1 provides a block diagram of an exemplary lattice reduction method.
FIG. 2 provides a schematic diagram for an exemplary embodiment of a single Newton-Raphson iteration-based integer-rounded divider.
FIG. 3 provides a schematic diagram for an exemplary embodiment of a single iteration per cycle Householder CORDIC architecture.
FIG. 4 provides a block diagram of an exemplary embodiment of a lattice reduction system.
FIG. 5 provides BER results obtained by some embodiments of the present invention in comparison to conventional lattice reduction systems and methods.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
To facilitate an understanding of the principles and features of the invention, various illustrative embodiments are explained below. In particular, the invention is described in the context of being systems and methods for lattice reduction for MIMO communication systems. Embodiments of the present invention may be applied to any wireless MIMO communication system standard including but not limited to IEEE 802.11n (Wi-Fi), 4G, 3GPP, Long Term Evolution, WiMAX, and HSPA+. Embodiments of the invention, however, are not limited to use in wireless MIMO communication systems. Rather, embodiments of the invention can be used for processing other MIMO communication systems, including, but not limited to, optical MIMO systems or other transmission systems having an architecture incorporating multiple transmitters.
The components described hereinafter as making up various elements of the invention are intended to be illustrative and not restrictive. Many suitable components or steps that would perform the same or similar functions as the components or steps described herein are intended to be embraced within the scope of the invention. Such other components or steps not described herein can include, but are not limited to, for example, similar components or steps that are developed after development of the invention.
The following paragraph describes the notation used herein. Superscript $H$ denotes Hermitian, $*$ conjugate, and $T$ transpose. The real and imaginary parts are denoted as $\Re[\cdot]$ and $\Im[\cdot]$ respectively. $|a|$ is reserved for the absolute value of scalar $a$ or the cardinality of $a$ if $a$ is a set, $\|a\|$ for the 2-norm of vector $a$, and $E[\cdot]$ for expectation. $I_N$ denotes the $N \times N$ identity matrix. Unless explicitly stated otherwise, the $n$-th element of a vector $x$ is denoted by $x_n$, and the $(m,n)$-th element of a matrix $X$ is denoted by $X_{m,n}$.
The present invention describes lattice reduction systems and methods for a MIMO communication system. FIG. 1 provides a block diagram of an exemplary LR method 100. An exemplary embodiment of the present invention provides an LR method 100 including providing a channel matrix corresponding to a channel in a MIMO communication system 105, preprocessing the channel matrix to form at least an upper triangular matrix 110, implementing a size reduction process on elements of the upper triangular matrix 115, and implementing a basis update process on diagonal elements of the upper triangular matrix 120.
The LR method 100 can be considered a general description of some methods of the present invention. In various embodiments, the LR method 100 can be modified according to the desires of particular embodiments of the present invention. For example, implementing a size reduction process 115 can be implementing a relaxed size reduction process, implementing a basis update process 120 can be implementing a rapid basis update process, or both.
In an exemplary embodiment of the LR method 100, implementing a size reduction process 115 can be implementing a relaxed size reduction process, which includes choosing a first relaxed size reduction parameter for a first-off-diagonal element of the upper triangular matrix, choosing a second relaxed size reduction parameter, which is greater than the first relaxed size reduction parameter, for a second-off-diagonal element of the upper triangular matrix, evaluating whether a first relaxed size reduction condition is satisfied for the first-off-diagonal element of the upper triangular matrix, and evaluating whether a second relaxed size reduction condition is satisfied for the second-off-diagonal element of the upper triangular matrix.
In another exemplary embodiment of the LR method, implementing a basis update process 120 can be implementing a rapid basis update process, which includes choosing an efficient Siegel condition factor so that a first Siegel condition can be evaluated between a first pair of adjacent diagonal elements of the upper triangular matrix without using multiplication operations and evaluating whether the first Siegel condition is satisfied between the first pair of adjacent diagonal elements of the upper triangular matrix without using multiplication operations. The first pair of adjacent diagonal elements can include a first diagonal element and a second diagonal element.
In yet another exemplary embodiment of the LR method, implementing a basis update process 120 can be implementing an iterative basis update process and includes computing a 2x2 unitary matrix using a number of vectoring iterations. In some embodiments, the step of computing the 2x2 unitary matrix is completed in a number of cycles equal to a number of pipeline stages plus the number of vectoring iterations minus one.
In addition to LR methods, the present invention provides LR systems. An exemplary embodiment of an LR system includes a master processor configured to transmit a complex-integer output to a first FIFO queue and a 2x2 unitary matrix output to a second FIFO queue, a first slave processor in indirect communication with the master processor by way of at least the first FIFO queue and configured to receive the complex-integer output from the first FIFO queue and process a unimodular matrix, and a second slave processor in indirect communication with the master processor by way of at least the second FIFO queue and configured to receive the 2x2 unitary matrix output from the second FIFO queue and process a second unitary matrix. In some embodiments of an LR system, the master processor, the first slave processor, and the second slave processor utilize a single multiplier pipeline structure.
Some embodiments of the present invention comprise a flat-fading MIMO communication system with $N_t$ transmit-antennas and $N_r$ receive-antennas. The data stream in these embodiments can be divided into $N_t$ sub-streams and transmitted through $N_t$ antennas. In some embodiments, $s = [s_1, s_2, \ldots, s_{N_t}]^T \in S^{N_t}$ can represent the $N_t \times 1$ transmitted data vector at one time slot, where $S$ can be the constellation set of each element in $s$. In some embodiments, $H$ can be the $N_r \times N_t$ channel matrix corresponding to a channel in a MIMO communication system. In some embodiments, $y = [y_1, y_2, \ldots, y_{N_r}]^T$ can denote the received signal at one time slot from the $N_r$ receive-antennas. In an exemplary embodiment of the present invention, the input-output relationship for a MIMO communication system can be,
Equation 1:
where $\eta = \frac{1}{N_t} E[s^H s]$ and $w = [w_1, w_2, \ldots, w_{N_r}]^T$ can be the white Gaussian noise vector observed at the $N_r$ receive-antennas with zero mean and covariance matrix $E[w w^H] = \sigma_w^2 I_{N_r}$. In some embodiments, the elements of $H$ can be independent identically distributed (i.i.d.) complex Gaussian distributed coefficients with zero mean and unit variance. Additionally, in some embodiments, a noise variance $\sigma_w^2$ is known at the receiver, and $H$ is known at the receiver but unknown at the transmitter. Given this model system, symbol detection can be the process of determining an estimate $\hat{s}$ of the symbol vector $s$ that was sent based on knowledge of $H$, $y$, and $\sigma_w^2$.
In an exemplary embodiment of the present invention, a Minimum Mean Square Error ("MMSE")-SIC detection method can be derived by starting with the MMSE equalizer equation. Based on the model in Equation 1, the MMSE equalizer equation can be,
Equation 2:
In an exemplary embodiment, a subsequent symbol detection step can comprise application of a quantization function $Q_S$, which can quantize each element to the closest symbol in $S$. This can yield a detection result of $\hat{s}^{(MMSE)} = Q_S[x^{(MMSE)}]$. In another exemplary embodiment, $x^{(MMSE)}$ can also be found by first defining,
Equation 3:
and then computing the least-squares solution to the (over-constrained) extended system $\bar{H} x^{(MMSE)} = \bar{y}$, which can yield,
Equation 4:
In some embodiments of the present invention, the MMSE-SIC solution $\hat{s}^{(MMSE\text{-}SIC)}$ can be determined by first finding the QR-decomposition, $\bar{H} = \bar{Q}\bar{R}$, where $\bar{Q}$ can be a $(N_r + N_t) \times N_t$ matrix and $\bar{R}$ can be a $N_t \times N_t$ upper triangular matrix. This factorization can then be substituted into Equation 4 to obtain (assuming $\bar{R}$ is invertible),
Equation 5: $\bar{R}\, x^{(MMSE)} = \bar{Q}^H \bar{y}$
In some embodiments, $\bar{b} = \bar{Q}^H \bar{y}$ and,
Equation 6: $\hat{s}_n^{(MMSE\text{-}SIC)} = Q_S\!\left[\dfrac{\bar{b}_n - \sum_{j=n+1}^{N_t} \bar{R}_{n,j}\, \hat{s}_j^{(MMSE\text{-}SIC)}}{\bar{R}_{n,n}}\right]$
can be used to complete the MMSE-SIC detection method. Unfortunately, the diversity order collected under some of these embodiments is $N_r - N_t + 1$.
To restore full receive diversity order $N_r$ to linear and SIC detectors, some embodiments of the present invention employ LR techniques in the detection process. Some embodiments of the LR methods comprise preprocessing $H$ to produce a reduced lattice basis $\tilde{H} = HT$, where $T$ is a unimodular matrix. This factorization allows Equation 1 to be rewritten as,
Equation 7:
Some embodiments of the LR systems and methods can comprise finding an estimate $\hat{z}$ of the transmitted symbol vector in the z-domain using linear detection or SIC. In some of these embodiments, $\hat{s}$ can be determined by transforming each element of $\hat{z}$ back to the original signal constellation using $\hat{s} = Q_S[T\hat{z}]$.
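The MMSE-SIC procedure built around Equations 3 to 6 can be sketched in a few lines of NumPy. The sketch is illustrative only: it assumes the unscaled model $y = Hs + w$ and the common extended-system construction in which $\sigma_w I_{N_t}$ is stacked under $H$ and $N_t$ zeros are stacked under $y$; neither the exact form of Equation 3 nor the function name `mmse_sic_detect` is taken from this disclosure.

```python
import numpy as np


def mmse_sic_detect(H, y, sigma_w, constellation):
    """MMSE-SIC detection via QR of an extended channel matrix (sketch).

    Assumed (not from the source): H_ext = [H; sigma_w * I], y_ext = [y; 0].
    """
    H = np.asarray(H, dtype=complex)
    y = np.asarray(y, dtype=complex)
    constellation = np.asarray(constellation, dtype=complex)
    n_t = H.shape[1]

    # Extended system whose least-squares solution is the MMSE estimate.
    H_ext = np.vstack([H, sigma_w * np.eye(n_t)])
    y_ext = np.concatenate([y, np.zeros(n_t, dtype=complex)])

    # Reduced QR: Q is (N_r + N_t) x N_t, R is N_t x N_t upper triangular.
    Q, R = np.linalg.qr(H_ext)
    b = Q.conj().T @ y_ext

    # Back-substitution with a hard decision at every step (Equation 6).
    s_hat = np.zeros(n_t, dtype=complex)
    for n in range(n_t - 1, -1, -1):
        interference = R[n, n + 1:] @ s_hat[n + 1:]
        x_n = (b[n] - interference) / R[n, n]
        # Q_S: nearest constellation point.
        s_hat[n] = constellation[np.argmin(np.abs(constellation - x_n))]
    return s_hat


if __name__ == "__main__":
    qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])
    H = 2 * np.eye(3) + 0.5j * np.ones((3, 3))
    s = np.array([1 + 1j, -1 + 1j, 1 - 1j])
    print(mmse_sic_detect(H, H @ s, sigma_w=1e-3, constellation=qpsk))
```

The per-symbol loop contains only a short dot product, one division, and one quantization, while the QR-decomposition is done once per channel realization, which mirrors the split between preprocessing and symbol-rate processing discussed in the background.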
In some embodiments, the complex-valued QR-decomposition formulation of the LLL algorithm, the CLLL algorithm, can operate on the QR-decomposition of $H$. The $\tilde{H} = \tilde{Q}\tilde{R}$ factorization returned by the CLLL algorithm can satisfy the size reduction condition in Equation 8 and the complex Lovász condition in Equation 9,
Equation 8:
Equation 9:
where the parameter $\delta$ is a relaxation parameter that can be arbitrarily chosen from (0.5, 1]. To reduce the complexity of the CLLL algorithm, in some embodiments of the present invention, the complex Lovász condition is replaced with the Siegel condition,
Equation 10: $|\tilde{R}_{k-1,k-1}|^2 \le \zeta\, |\tilde{R}_{k,k}|^2$
where $\zeta$ is a Siegel condition factor. In an exemplary embodiment, the Siegel condition factor is chosen from [2, 4]. Table 1 shows the pseudo-code for a non-limiting exemplary embodiment of a lattice reduction method with the Siegel condition (Line 9 of Table 1) and forms the starting point of other embodiments of the present invention.
TABLE 1
Exemplary Embodiment of a Lattice Reduction Method 100
Line 1: $[\tilde{Q}, \tilde{R}, T] = \mathrm{QR}(\bar{H})$; $k = 2$; $g = 1(1 + j)$
Line 2: while $k \le N_t$
Line 3: for $n = k - 1 : -1 : 1$ (size reduction, Lines 3 to 8)
Line 4: $u = \mathrm{round}(\tilde{R}_{n,k} / \tilde{R}_{n,n})$
Line 5: $\tilde{R}_{1:n,k} = \tilde{R}_{1:n,k} - u \cdot \tilde{R}_{1:n,n}$
Line 6: $T_{:,k} = T_{:,k} - u \cdot T_{:,n}$
Line 7: $g_n = g_n + u \cdot g_k$
Line 8: end
Line 9: if $|\tilde{R}_{k-1,k-1}|^2 > \zeta\, |\tilde{R}_{k,k}|^2$ (Siegel condition evaluation)
Line 10: $\Theta = \dfrac{1}{\|\tilde{R}_{k-1:k,k}\|} \begin{bmatrix} \tilde{R}^*_{k-1,k} & \tilde{R}_{k,k} \\ -\tilde{R}_{k,k} & \tilde{R}_{k-1,k} \end{bmatrix}$ (basis update, Lines 10 to 15)
Line 11: $\tilde{R}_{k-1:k,\,k-1:N_t} = \Theta\, \tilde{R}_{k-1:k,\,k-1:N_t}$
Line 12: $\tilde{Q}_{:,\,k-1:k} = \tilde{Q}_{:,\,k-1:k}\, \Theta^H$
Line 13: Swap $(k-1)$-th and $k$-th columns in $\tilde{R}$ and $T$
Line 14: Swap $(k-1)$-th and $k$-th rows in $g$
Line 15: $k = \max(k - 1, 2)$
Line 16: else
Line 17: $k = k + 1$
Line 18: end
Line 19: end
In some embodiments of the present invention, if the original signal constellation set comprises the infinite complex integer plane, then the signal constellation set in the z-domain can also comprise the infinite complex integer plane. In some of these embodiments, during the initial detection step in the z-domain, the $Q_S$ function in Equation 6 can be replaced with the element-wise integer-rounding operation. The signal constellation set for M-ary Quadrature Amplitude Modulation (QAM), however, can be $S = \{s \mid \Re[s], \Im[s] \in A\}$, where $A = \{-\sqrt{M} + 1, \ldots, -1, 1, \ldots, \sqrt{M} - 1\}$. Therefore, Equation 5 can be reformulated such that detection can be carried out as if the real and imaginary parts of the original constellation set are drawn from the consecutive integers.
In an exemplary embodiment of the present invention, a new constellation set can be defined as
The symbol vector in Equation 1 can then be characterized as a symbol vector $\breve{s} \in \breve{S}^{N_t}$ that has been transformed by $2\breve{s} - 1(1 + j)$. This idea can be applied to Equation 5 by making the substitution $x^{(MMSE)} = 2\breve{x}^{(MMSE)} - 1(1 + j)$ and simplifying,
Equation 11: $\bar{R}\, \breve{x}^{(MMSE)} = \frac{1}{2}\, \bar{Q}^H \left(\bar{y} + \bar{H}\, 1(1 + j)\right)$
In an exemplary embodiment of the present invention, Equation 11 can allow for the utilization of the CLLL algorithm with the MMSE-SIC detector in Equation 6. The CLLL algorithm can return the factorization $\bar{H}T = \tilde{Q}\tilde{R}$, where $\tilde{Q}$ can be a $(N_r + N_t) \times N_t$ matrix. Applying this relationship to Equation 11 yields,
Equation 12:
Using the substitution of $z = T^{-1}\breve{x}^{(MMSE)}$ and partitioning $\tilde{Q}$ into a $N_r \times N_t$ matrix $\tilde{Q}^{(1)}$ and a $N_t \times N_t$ matrix $\tilde{Q}^{(2)}$, such that $\tilde{Q} = [(\tilde{Q}^{(1)})^T\ (\tilde{Q}^{(2)})^T]^T$, a new form of Equation 5 is represented by,
Equation 13: $\tilde{R} z = \frac{1}{2}\left((\tilde{Q}^{(1)})^H y + \tilde{R}\, T^{-1} 1(1 + j)\right)$
Detection in the z-domain can be completed by first computing $\hat{z}$ using Equation 6 with the $Q_S$ function replaced with the integer-rounding function. Then, the estimated symbol vector can be determined by computing $\hat{s} = Q_S[2T\hat{z} - 1(1 + j)]$. Hence, CLLL-MMSE-SIC detection for QAM can require, in an exemplary embodiment, that the CLLL algorithm generate the vector $T^{-1} 1(1 + j)$ in addition to $\tilde{Q}^{(1)}$, $\tilde{R}$, and $T$.
Some embodiments of the invention provide an LR method that comprises providing a channel matrix, $H$, corresponding to a channel in a MIMO communication system 105. In some embodiments of the LR method, the step of preprocessing the channel matrix to form at least an upper triangular matrix 110 can be preprocessing the channel matrix, $H$, to generate a unitary $\tilde{Q}^{(1)}$ matrix, an upper triangular $\tilde{R}$ matrix, a unimodular $T$ matrix, and the vector $T^{-1} 1(1 + j)$ (denoted by $g$ in Table 1). An exemplary embodiment of the invention preprocesses the channel matrix, $H$, by performing a QR-decomposition preprocessing step on the channel matrix. An exemplary embodiment of QR-decomposition preprocessing is illustrated in Line 1 of Table 1. Some embodiments of the LR method comprise implementing a size reduction process on elements of the $\tilde{R}$ matrix 115. In an exemplary LR method, implementing a size reduction process on elements of the $\tilde{R}$ matrix comprises satisfying Equation 8 for progressively larger upper-left square sub-matrices of $\tilde{R}$.
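The loop structure of Table 1 can be sketched in floating-point NumPy as follows. This is an illustrative reading of the pseudo-code, not the fixed-point architecture of this disclosure. Several points are assumptions: indices are 0-based, plain (unsorted) `numpy.linalg.qr` is used for Line 1, the rotation uses conjugates on both first-row entries so that it remains unitary when a diagonal entry of R is not real, and the `max_iter` guard and the function name `clll_siegel` are hypothetical additions.

```python
import numpy as np


def clll_siegel(H, zeta=2.0, max_iter=1000):
    """CLLL-style lattice reduction with a Siegel test (sketch of Table 1).

    Returns Q, R, T, g with H @ T == Q @ R and g == inv(T) @ (1+1j) * ones.
    """
    H = np.asarray(H, dtype=complex)
    n_t = H.shape[1]
    Q, R = np.linalg.qr(H)               # Line 1 (unsorted QR, an assumption)
    T = np.eye(n_t, dtype=complex)
    g = np.full(n_t, 1 + 1j)
    k = 1                                # 0-based; Table 1 starts at k = 2
    iterations = 0                       # hypothetical safety guard
    while k < n_t and iterations < max_iter:
        iterations += 1
        # Size reduction of column k against columns k-1, ..., 0 (Lines 3-8).
        for n in range(k - 1, -1, -1):
            ratio = R[n, k] / R[n, n]
            u = np.round(ratio.real) + 1j * np.round(ratio.imag)
            if u != 0:
                R[: n + 1, k] -= u * R[: n + 1, n]
                T[:, k] -= u * T[:, n]
                g[n] += u * g[k]
        # Siegel condition evaluation (Line 9).
        if abs(R[k - 1, k - 1]) ** 2 > zeta * abs(R[k, k]) ** 2:
            a, b = R[k - 1, k], R[k, k]
            norm = np.hypot(abs(a), abs(b))
            # 2x2 unitary rotation that maps [a, b]^T to [norm, 0]^T.
            theta = np.array([[np.conj(a), np.conj(b)], [-b, a]]) / norm
            R[k - 1:k + 1, k - 1:] = theta @ R[k - 1:k + 1, k - 1:]
            Q[:, k - 1:k + 1] = Q[:, k - 1:k + 1] @ theta.conj().T
            # Column swap restores the upper triangular shape (Lines 13-14).
            R[:, [k - 1, k]] = R[:, [k, k - 1]]
            T[:, [k - 1, k]] = T[:, [k, k - 1]]
            g[[k - 1, k]] = g[[k, k - 1]]
            R[k, k - 1] = 0.0            # clear rounding residue
            k = max(k - 1, 1)
        else:
            k += 1
    return Q, R, T, g


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    Q, R, T, g = clll_siegel(H)
    print(np.allclose(H @ T, Q @ R), np.allclose(T @ g, np.full(4, 1 + 1j)))
```

Tracking `g` alongside `T` avoids an explicit inversion of `T`: each column operation on `T` corresponds to a single-element update of `g`, and each column swap in `T` corresponds to a row swap in `g`.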
Additionally, some embodiments of the LR method comprise implementing an iterative basis update process 120. In an exemplary embodiment of the LR method, implementing an iterative basis update process 120 comprises satisfying Equation 9 for progressively larger upper-left square sub-matrices of $\tilde{R}$. In another exemplary embodiment of the LR method, implementing an iterative basis update process 120 comprises satisfying Equation 10 for progressively larger upper-left square sub-matrices of $\tilde{R}$. Some embodiments of the invention iteratively update $g$ and the $\tilde{Q}^{(1)}$, $\tilde{R}$, and $T$ matrices as needed. In some embodiments, the variable $k$ in Table 1 can indicate the size of a currently active upper-left square sub-matrix of $\tilde{R}$.
In some embodiments of the present invention, it is desired 
to implement an LR system in fixed point hardware. Careful 
magnitude analysis of the R elements, based on the system 
model in Equation 1 and LR method operation, can aid in 
avoiding undesirable overflow behaviors. In some embodi-
ments of the invention, parts of the LR method are modified to 
reduce the method complexity and streamline the hardware 
realization. 
In some embodiments of the invention, preprocessing the channel matrix, H, using QR-decomposition preserves the column energy of H. In these embodiments, the squared magnitudes of the R elements in column i before the start of the LR method can be upper bounded by the column energy in the i-th H column for a constant $\sigma_w = \sigma_{max}$ using,

Equation 14:

where $2\|h_i\|^2$ can be Chi-square distributed with $2N_r$ degrees of freedom and $\sigma_{max}$ can be the maximum value of $\sigma_w$ when the step of preprocessing the channel matrix 105 is included in the LR method.
To determine an upper bound $B_{init}$ for the magnitudes of the R elements, some embodiments of the present invention treat each H matrix as a generation of $N_t$ i.i.d. trials of the random variable in Equation 14. Therefore, $B_{init}$ can be determined according to a target overflow probability. In an exemplary embodiment of the present invention, the probability of the column norm $\|h_i\|$ exceeding a bound $B_{init}$ corresponds to one overflow event every 22.7 years for an 802.11n system that requires the processing of 128 MIMO channel matrices every four microseconds. In some embodiments, if saturation quantization at the receiver is adopted, $B_{init}$ can safely upper bound the elements of R at the start of the LR method. In an exemplary embodiment, when $N_r=N_t=4$ and $\sigma_{max}=0.62$ (4.15 dB), then $B_{init}=2^{2.82}$.
In some embodiments, the diagonal elements of R can be upper bounded by $B_{init}$ during operation of the LR method, which can use the Lovasz condition. Because at Line 11 in Table 1 the $R_{k-1,k}$ element satisfies the size reduction condition of Equation 8, and $\zeta \ge 2$, it follows that diagonal elements of R can be upper bounded by $B_{init}$ in embodiments of the invention where the Siegel condition is used. Further, in these embodiments where the Siegel condition is used, the magnitudes of the real and imaginary parts of the R off-diagonal entries are bounded by

after implementing a size reduction process 115.
Some embodiments of the present invention involve modifications to a CLLL algorithm and can therefore be evaluated on algorithm complexity reduction, BER performance, and preservation of numerical properties, including tightly bounded R diagonal elements and tightly bounded size reduction operation results.
In an exemplary embodiment of the LR method, implementing a size reduction process comprises executing only the first inner-loop iteration, shown on Lines 4-7 of Table 1, during each outer-loop iteration. In some of these embodiments of the present invention, the symbol vector can be substantially unchanged by executing only a first inner-loop iteration, as opposed to multiple inner-loop iterations, during each outer-loop iteration.
The following proof illustrates that the symbol vector can be substantially unchanged by executing only a first inner-loop iteration during each outer-loop iteration, i.e., that the symbol vector is substantially unaffected by the size reduction of the entire matrix. First, a matrix $A^{(i)}$ can be defined to be a unimodular matrix that has a one on each diagonal, arbitrary complex integers in the upper off-diagonal elements of the i-th column, and zero on the remaining matrix entries. It is then assumed that $Q_S$ in Equation 6 is the element-wise integer rounding operation. If x is the SIC solution when an arbitrary $N_t \times N_t$ invertible upper-triangular matrix R with complex elements and an $N_t \times 1$ complex vector b are used in the Equation 6 SIC recursion, then the elements of the SIC solution x', when $R'=RA^{(i)}$ and b are used in Equation 6, are then equal to,

$$x'_n = \begin{cases} x_n, & i \le n \le N_t \\ x_n - [A^{(i)}]_{n,i}\,x_i, & 1 \le n \le i-1 \end{cases} \qquad \text{(Equation P-1)}$$

Given that only the i-th column elements of R' in rows of index less than i are affected, then $x'_n = x_n$ for $i \le n \le N_t$. Induction can then be used to prove that the second part of Equation P-1 is true for the base case n=i-1. Then it can be assumed that Equation P-1 is true for $l \le n \le N_t$ and shown that Equation P-2 is true for $l-1 \le n \le N_t$, where $2 \le l \le i-2$.
Now let QR=HT be the factorization of H (defined in Equation 3) produced by a CLLL algorithm. Also let z be the SIC solution to Equation 13,

$$Rz = \tfrac{1}{2}\,Q^{H}\left(y + H\mathbf{1}(1+j)\right) \qquad \text{(Equation P-2)}$$

where the right-hand side is written in equivalent form. The symbol vector estimate (before scaling, shifting, and quantization to the nearest symbol constellation) from this z-domain solution is then s=Tz. Also let $R^{(i)}=R^{(i-1)}B^{(i)}$ and $T^{(i)}=T^{(i-1)}B^{(i)}$ with $R^{(1)}=R$ and $T^{(1)}=T$. Let $B^{(i)}$ be generated by running the procedure in Table 2 with $R=R^{(i-1)}$ initially. If R is size-reduced to produce a new upper-triangular matrix $R'=R(B^{(2)} \cdots B^{(N_t)})$ and an updated unimodular matrix $T'=T(B^{(2)} \cdots B^{(N_t)})$, then the updated CLLL-MMSE-SIC symbol vector estimate s' (before scaling, shifting, and quantization to the nearest symbol constellation) is substantially unchanged, i.e. s'=s. Let $z^{(i)}$ be the SIC solution to Equation P-2 that has R replaced with $R^{(i)}$, and let $s^{(i)}$ be the subsequent symbol vector estimate. Then, because $s'=s^{(N_t)}$, induction on i can be used to prove s'=s. Beginning with the base case i=2, Equation P-1 can be used to show that $s^{(2)}=T^{(2)}z^{(2)}=s$.
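The statement being proved, that full size reduction leaves the symbol vector estimate unchanged, can be checked numerically. The following is a minimal sketch and not the patented implementation: it assumes that the Equation 6 SIC recursion is ordinary back-substitution with element-wise rounding, applies the Table 2 column operations to every column, and uses hypothetical helper names (`cround`, `sic`, `full_size_reduction`, `demo`) and random test data chosen only for illustration.

```python
import random

def cround(v):
    # Element-wise rounding to the nearest complex (Gaussian) integer.
    return complex(round(v.real), round(v.imag))

def sic(R, b):
    # Back-substitution with rounding: an ASSUMED form of the Equation 6 recursion.
    n = len(R)
    x = [0j] * n
    for i in reversed(range(n)):
        acc = b[i] - sum(R[i][m] * x[m] for m in range(i + 1, n))
        x[i] = cround(acc / R[i][i])
    return x

def full_size_reduction(R, T):
    # Table 2 applied to columns 2..N_t; T receives the same column operations.
    n = len(R)
    R = [row[:] for row in R]
    T = [row[:] for row in T]
    for i in range(1, n):                  # column being reduced (0-based)
        for r in reversed(range(i)):       # rows i-1 down to 0
            u = cround(R[r][i] / R[r][r])
            for row in range(n):
                R[row][i] -= u * R[row][r]
                T[row][i] -= u * T[row][r]
    return R, T

def matvec(M, v):
    return [sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M))]

def demo(seed=0, n=4):
    # Random upper-triangular R with a real positive diagonal (illustrative data only).
    rng = random.Random(seed)
    rnd = lambda s: complex(rng.uniform(-s, s), rng.uniform(-s, s))
    R = [[complex(rng.uniform(0.5, 1.5)) if c == r else rnd(4.0) if c > r else 0j
          for c in range(n)] for r in range(n)]
    T = [[complex(r == c) for c in range(n)] for r in range(n)]
    b = [rnd(10.0) for _ in range(n)]
    s_before = matvec(T, sic(R, b))        # s = T z
    R2, T2 = full_size_reduction(R, T)
    s_after = matvec(T2, sic(R2, b))       # s' = T' z'
    return s_before, s_after
```

Calling `demo()` returns the estimate before and after size reduction; under the stated assumptions the two lists should be equal, apart from the measure-zero case of an exact rounding tie.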
If it is assumed that $s^{(i)}=s$, then it can be shown that $s^{(i+1)}=s$ using,

$$z^{(i+1)} = \begin{bmatrix} z_1^{(i)} + u_{1,i+1}\,z_{i+1}^{(i)} \\ \vdots \\ z_i^{(i)} + u_{i,i+1}\,z_{i+1}^{(i)} \\ z_{i+1}^{(i)} \\ \vdots \\ z_{N_t}^{(i)} \end{bmatrix} \qquad \text{(Equation P-3)}$$

and

Equation P-4

where

$$T^{(i+1)}_{i+1} = T^{(i)}_{i+1} - \sum_{j=1}^{i} u_{j,i+1}\,T^{(i)}_{j}.$$

Finally,

$$s^{(i+1)} = T^{(i+1)}z^{(i+1)} = T^{(i)}z^{(i)} = s^{(i)} = s \qquad \text{(Equation P-5)}$$

TABLE 2
Generation of B(i) Matrices for Full Size Reduction on the i-th column of R

Line 1   B(i) = I_{N_t};
Line 2   for n = i-1 : -1 : 1
Line 3     u_{n,i} = round(R_{n,i} / R_{n,n});
Line 4     R_i = R_i - u_{n,i} R_n;  B_i(i) = B_i(i) - u_{n,i} B_n(i);
Line 5   end

This proof also illustrates that in some embodiments of the present invention, the symbol vector estimate is substantially unaffected by implementing a size reduction process 115 comprising performing operations on $R_{k-1,k}$ elements for $2 \le k \le N_t$. Although these embodiments have lower complexity than conventional CLLL algorithms, the R elements that do not undergo size reduction operations can increase uncontrollably, which is not desirable in fixed-point hardware implementation.
Therefore, in some embodiments of the present invention implementing a size reduction process 115 can be implementing a relaxed size reduction process. The relaxed size reduction process comprises choosing a first relaxed size reduction parameter $\phi_{n,k}$ for an element in a first off-diagonal of the R matrix, determining whether a relaxed size reduction condition is satisfied, and performing an iterative size reduction algorithm on an element of the R matrix if the relaxed size reduction condition is not satisfied. In an exemplary embodiment, the value of the first relaxed size reduction parameter is at least 0.5. In another exemplary embodiment, for the R matrix and a given k in Table 1, the $R_{n,k}$ element for n<k satisfies a relaxed size reduction condition, which is defined by $\phi_{n,k} \ge \frac{1}{2}$, if $|\Re[R_{n,k}]| \le \phi_{n,k}|R_{n,n}|$ and $|\Im[R_{n,k}]| \le \phi_{n,k}|R_{n,n}|$ are satisfied. In some embodiments of the invention, a size reduction process is performed on $R_{n,k}$ elements for n<k-1 only when the relaxed size reduction condition is not satisfied.
As the value of the relaxed size reduction parameter $\phi_{n,k}$ increases, how often the size reduction algorithm is performed decreases, which decreases the complexity of the algorithm while maintaining bounded size reduction results. Therefore, some embodiments of the present invention comprise choosing a second relaxed size reduction parameter, which is greater than the first relaxed size reduction parameter, for an element in a second or third off-diagonal of the R matrix. In an exemplary embodiment the value of the second relaxed size reduction parameter is set to greater than 1.5.
Some embodiments of the present invention comprise implementing an iterative basis update on elements of the R matrix 120. During the step of implementing a basis update process 120, the $R_{k,k-1}$ element, which becomes the k-th R diagonal if the column swap in Line 13 of Table 1 is performed, can be updated at Line 11 in Table 1, such that the magnitude of this element can be less than or equal to the magnitude of the $R_{k-1,k-1}$ element. In some embodiments, the $R_{k,k-1}$ element, which can become the (k-1)-th R diagonal after the column swap, can be updated such that the squared magnitude is equal to $|R_{k,k}|^2+|R_{k-1,k}|^2$. In some embodiments, by applying a failed Siegel condition and a relaxed size reduction condition, the following is true,

Equation 15

In some embodiments, if the size reduction condition on the $R_{k-1,k}$ element is relaxed, then implementing a basis update process 120 can increase the maximum squared magnitude of the R diagonal elements by a factor of

$$\left(\frac{1}{\zeta} + 2\phi_{k-1,k}^{2}\right).$$

Therefore, if the maximum number of basis updates in an iterative basis update process is G and all $\phi_{k-1,k}$ are the same, a new upper bound for the magnitude of the R diagonal elements can be defined as,

$$\begin{cases} \cdots, & \frac{1}{\zeta} + 2\phi_{k-1,k}^{2} \le 1 \\ \cdots, & \frac{1}{\zeta} + 2\phi_{k-1,k}^{2} > 1 \end{cases} \qquad \text{(Equation 16)}$$

As discussed above, in some embodiments of the present invention comprising implementing a relaxed size reduction process, a size reduction operation is required for each inner-loop iteration, Lines 4-7 of Table 1, if the $R_{n,k}$ element in Line 5 of Table 1 does not satisfy a relaxed size reduction condition. In some of these embodiments, the invention further comprises forcing a size reduction operation each inner-loop iteration when the magnitude of the real or imaginary part of the $R_{n,k}$ element in Line 5 exceeds $\frac{1}{2}B$.
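The relaxed size reduction rule, including the forced reduction above an absolute bound, can be illustrated with a short sketch. This is an illustration under stated assumptions rather than the method of Table 1: the diagonal of R is assumed real and positive, the parameter values 0.51 and 1.6 are example choices consistent with "at least 0.5" and "greater than 1.5", and the function name, its arguments, and the optional `half_B` bound are hypothetical.

```python
def cround(v):
    # Round real and imaginary parts separately (nearest Gaussian integer).
    return complex(round(v.real), round(v.imag))

def relaxed_size_reduce_column(R, k, phi_first=0.51, phi_other=1.6, half_B=None):
    """Relaxed size reduction of column k (0-based) of upper-triangular R, in place.

    Returns the number of size reduction operations actually performed.
    """
    performed = 0
    for n in reversed(range(k)):             # rows k-1 down to 0
        d = R[n][n].real                     # ASSUMPTION: diagonal is real and positive
        phi = phi_first if n == k - 1 else phi_other
        limit = phi * d
        if half_B is not None:               # forced reduction above the absolute bound
            limit = min(limit, half_B)
        e = R[n][k]
        if abs(e.real) <= limit and abs(e.imag) <= limit:
            continue                         # relaxed condition holds: no division needed
        u = cround(e / d)
        for row in range(n + 1):             # column update R_k -= u * R_n
            R[row][k] -= u * R[row][n]
        performed += 1
    return performed

# Example data (illustrative only): the first off-diagonal entry 0.76 would trigger
# a conventional reduction (0.76 > 0.5 * 1.5) but passes the relaxed test.
R = [[2.0 + 0j, 0.9 + 0.2j, 5.2 - 2.7j],
     [0j,       1.5 + 0j,   0.76 + 0.1j],
     [0j,       0j,         1.0 + 0j]]
count = relaxed_size_reduce_column(R, 2)
```

In this example only the top entry of the column is reduced, which shows how a larger parameter on the outer off-diagonals removes divider invocations while still bounding the entries.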
In some embodiments, for the size reduction operations on the k-th column, $R'_{n,k}$ can represent an intermediate value of $R_{n,k}$ after the first (k-n-1) inner-loop iterations but before the size reduction operations on the n-th row element of the k-th column. Additionally, $u_{l,k}$ can be equal to zero when execution of the (k-l)-th inner-loop iteration does not occur ($R_{n,k}$ at Line 4 of Table 1 satisfies the relaxed size reduction condition and the $\frac{1}{2}B$ upper bound) and can be equal to the u value in Line 4 of Table 1 at the (k-l)-th inner-loop iteration when execution of this inner-loop iteration does occur. This intermediate size reduction result can be written as:

$$R'_{n,k} = R_{n,k} - \sum_{l=n+1}^{k-1} u_{l,k}\,R_{n,l} \qquad \text{(Equation 17)}$$

The summation on the right-hand side of Equation 17 involves $R_{n,l}$ elements, which can be the result of size reduction operations during previous outer loops (when the system or method was operating on upper-left square matrices smaller than k×k). In some embodiments, by applying the relaxed size reduction condition to the $R_{n,l}$ element, the real component of $R'_{n,k}$ in Equation 17 can be upper bounded by:

$$|\Re[R_{n,k}]| + \sum_{l=n+1}^{k-1} \left(|\Re[u_{l,k}]| + |\Im[u_{l,k}]|\right)\phi_{n,l}\,|R_{n,n}| \qquad \text{(Equation 18)}$$

In some embodiments, to remove the dependence of Equation 18 on the $u_{l,k}$ from the definition of $R_{n,k}$ elements, the enforcement of the absolute $\frac{1}{2}B$ upper bound results in the following relation,

Equation 19

Using signed magnitude techniques, this can be written as,

Equation 20

In some embodiments, to remove the dependence of Equation 20 on $R_{l,l}$, the induction proof discussed earlier can be slightly modified to accommodate the Siegel condition in Equation 10, which results in,

$$|R_{n,n}| < \zeta^{\frac{l-n}{2}}\,|R_{l,l}| \qquad \text{(Equation 21)}$$

Additionally, in some embodiments, the substitution of Equation 21 into Equation 18 followed by substitution of Equation 20 results in,

$$|\Re[R'_{n,k}]| < |\Re[R_{n,k}]| + \sum_{l=n+1}^{k-1} \zeta^{\frac{l-n}{2}}\,\phi_{n,l}\left(B + |\Re[R'_{l,k}]| + |\Im[R'_{l,k}]|\right) \qquad \text{(Equation 22)}$$

In some embodiments, an upper bound on the magnitude of $\Im[R'_{n,k}]$ can be obtained by repeating the steps in Equations 18-22.
The upper bounds for n=k-1 can be trivially determined and then recursively substituted as the upper bounds for smaller n are determined. If the $\phi_{n,l}$'s do not change during execution of the system, then,

$$|\Re[R'_{n,k}]| < \gamma_{n,k}B + |\Re[R_{n,k}]| + \sum_{p=n+1}^{k-1} \alpha_{p,k}\left(|\Re[R_{p,k}]| + |\Im[R_{p,k}]|\right) \qquad \text{(Equation 23)}$$

where the $\alpha_{p,k}$'s and $\gamma_{n,k}$'s can be determined during the recursive substitution process.
In some embodiments, at the end of an outer loop for a particular k, $|R_{p,k}|$ can be upper bounded by B for p=k-1 as a result of possible basis updates and by $B/\sqrt{2}$ for p≠k-1, when a basis update does not occur. Therefore, in some embodiments, the maximum energy that can be re-distributed among the $R_{1:k-1,k}$ sub-vector elements as the result of subsequent basis updates (as Siegel conditions fail and an LR system or method operates on smaller matrix sizes) can be shown by,

$$\sum_{p=1}^{k-1} |R_{p,k}|^{2} \le B^{2}\left(1 + \frac{k-2}{2}\right) \qquad \text{(Equation 24)}$$

To maximize the right-hand side in Equation 23, in some embodiments, it can be assumed that subsequent basis updates distribute the energy among the sub-vector elements to maximize the upper bound. By solving this constrained maximum problem,

$$|\Re[R'_{n,k}]| < B\left(\gamma_{n,k} + \sqrt{k\left(\frac{1}{2} + \sum_{p=n+1}^{k-1} \alpha_{p,k}^{2}\right)}\right) \qquad \text{(Equation 25)}$$

and a similar bound can be reached for the imaginary components. In some embodiments of the present invention, the LR systems and methods can be safely utilized in fixed-point implementations by designing hardware around these upper bounds.
In an exemplary embodiment of the present invention, implementing a size reduction process 115 comprises computing an integer-rounded quotient (shown in Line 4 of Table 1). In some embodiments, however, this computation can often be avoided by noticing that

implies $\Re[u]=0$ and

implies $|\Re[u]|=1$. In some embodiments, the case of $|\Re[u]|, |\Im[u]| > 1$ can be handled by performing a relaxed size reduction process that comprises using an integer-rounded divider based on a single Newton-Raphson (NR) iteration if a first or second relaxed size reduction condition is not satisfied. These embodiments can take advantage of the divisor reuse that can be inherent in the LR method. In some embodiments, if it is assumed that reciprocals are being buffered, this reciprocation based approach is also useful for the subsequent detection step because the stored reciprocals can be used for the SIC recursion in Equation 6.
In some embodiments of the present invention, a formal description of the NR iteration based division method is used. To compute n/d for n,d>0, d can be first normalized such that $d2^{\psi}=d_n$, where $1 \le d_n < 2$. An estimate $r'_n$ of the reciprocal of $d_n$ can then be computed from an initial estimate $r_n+\epsilon$, which can be obtained from a look-up table (LUT), using,

Equation 26:

where $r_n=1/d_n$ and $\epsilon$ can be the error of the initial estimate.
In some embodiments of the present invention Equation 26 is altered for a fixed-point hardware implementation. This can be done by introducing the notation {w.f} to indicate an unsigned number representation having w integer bits and f fraction bits, letting the function Q1 indicate truncation quantization to {2.(f+1)}, and letting the function Q2 indicate truncation quantization to {2.f}. Some embodiments also allow an additional LUT for $(r_n+\epsilon)^2$. If it is assumed that both n and d have {w.f} representation and the $(r_n+\epsilon)$ LUT has {1.a} representation, then Equation 26 can be modified, which results in,

Equation 27:

In some embodiments, if $\epsilon'$ can be the reciprocal error such that $r'_n=r_n+\epsilon'$, then $\epsilon'<0$. The integer-rounded quotient can then be found by computing $q'=n2^{\psi}r'_n$ and then determining round(q'). If q=n/d, then for a fixed-point NR formulation in Equation 27, the following is true:

Equation 28

where $\epsilon_q=q-q'$ and $\xi=\mathrm{round}(q')-q$. Equation 28 can be proven true. Because $\epsilon'<0$, it follows that $\epsilon_q>0$. Then round(q') satisfies

$$q' - \tfrac{1}{2} < \mathrm{round}(q') \le q' + \tfrac{1}{2},$$

which can be rewritten as

$$-\epsilon_q - \tfrac{1}{2} < \xi \le -\epsilon_q + \tfrac{1}{2}.$$

Because the upper bound of this interval is less than $\frac{1}{2}$, $\xi$ satisfies Equation 28.
In some embodiments, round(q') can be computed by first computing $nr'_n$ to f+1 bits of precision (truncating the remaining bits), then applying $2^{\psi}$, and rounding to the nearest integer.
In some embodiments, an upper bound $M_{\gamma,a}$ for the relative error ($|n\epsilon'2^{\psi}|/q$) of the NR formulation in Equation 27 is derived, where $2^{\gamma}$ can be the number of entries in each LUT. This relative error allows some embodiments of the invention to take advantage of the relaxed size reduction condition. In an exemplary embodiment, given that the fixed-point NR in Equation 27 that has a maximum relative error $M_{\gamma,a}$ is used to compute the u for size reduction on the $R_{n,k}$ (Line 4 in Table 1), the relaxed size reduction condition for this entry can be satisfied if,

$$\frac{|\Re[R_{n,k}]|}{|R_{n,n}|},\ \frac{|\Im[R_{n,k}]|}{|R_{n,n}|} < \frac{\phi_{n,k} - \frac{1}{2}}{M_{\gamma,a}} \qquad \text{(Equation 29)}$$

where $\phi_{n,k}$ can be the relaxed size reduction condition factor associated with the $R_{n,k}$ entry. Equation 29 can be proven true by letting u' be the integer-rounded quotient produced by using the fixed-point NR formulation. If $n=|\Re[R_{n,k}]|$ and $d=|R_{n,n}|$, then $|\Re[u']|=\mathrm{round}(q')$. Equation 28 then implies,

Equation 30

The absolute error $\epsilon_q$ can be upper bounded by

which, in accordance with Equation 29, is bounded by

If this result is applied to the lower bound in Equation 30 and the $\phi_{n,k} \ge \frac{1}{2}$
assumption is applied to the upper bound in Equation 30, then,

$$-\phi_{n,k} + \frac{|\Re[R_{n,k}]|}{|R_{n,n}|} < |\Re[u']| < \frac{|\Re[R_{n,k}]|}{|R_{n,n}|} + \phi_{n,k} \qquad \text{(Equation 31)}$$

Equation 31 demonstrates that in some embodiments where $|\Re[u']|$ is used for the magnitude of the real part of u in Line 5 of Table 1, then the real part of the updated $R_{n,k}$ entry can satisfy the $\phi_{n,k}$ relaxed size reduction condition. These same arguments can be used to prove that in some embodiments where $|\Im[u']|$ is used for the magnitude of the imaginary part of u in Line 5 of Table 1, then the imaginary part of the updated $R_{n,k}$ entry can satisfy the $\phi_{n,k}$ relaxed size reduction condition.
In some embodiments, given a fixed-point NR in Equation 27 that has a maximum relative error $M_{\gamma,a}$, Equation 29 is satisfied if,

$$|\Re[u']|,\ |\Im[u']| < \frac{1-M_{\gamma,a}}{M_{\gamma,a}}\left(\phi_{n,k} - \frac{1}{2}\right) - \frac{1}{2} \qquad \text{(Equation 32)}$$

Equation 32 can be proven true because the application of Equation 28 and

together imply that

$$-\frac{1}{2} + \frac{|\Re[R_{n,k}]|}{|R_{n,n}|}\left(1 - M_{\gamma,a}\right) < |\Re[u']|.$$

This inequality, when reconciled with Equation 32, implies that Equation 29 is true. These same arguments can be used to prove Equation 32 true for $|\Im[u']|$.
In some embodiments of the present invention, in accordance with Equation 32, it is not necessary to perform extra rounding error detection and correction proposed by some conventional systems. When Equation 32 is not satisfied, this extra logic remains unnecessary if LR iterations are repeated until all u' values generated during an iteration satisfy Equation 32 and all off-diagonal elements in the k-th column of R satisfy the $\frac{1}{2}B$ absolute upper bound after the size reduction process is performed for that iteration. Because computed u' component magnitudes are always less than or equal to the actual component magnitudes, the previous analysis of the relaxed size reduction condition remains valid.
FIG. 2 provides a schematic diagram for an exemplary embodiment of a single Newton-Raphson iteration-based integer-rounded divider. In the exemplary embodiment, the reciprocation datapath, which can compute Equation 27 in four cycles, shares a multiplier with the reciprocal multiplication datapath, which can require four cycles. The divisors and corresponding reciprocals (normalized reciprocal $r'_n$ and shift value $\psi$) can be stored in a cache for use during subsequent iterations of the LR method. Further, a collection of comparators and straightforward logic can be used to detect trivial u values and evaluate the relaxed size reduction condition.
Some embodiments of the invention provide an LR method comprising implementing a basis update process 120. In some embodiments of the present invention, the implementing a basis update process 120 can comprise evaluating a Lovasz condition, as discussed earlier. In an exemplary embodiment of the present invention, implementing a basis update process 120 can comprise evaluating a Siegel condition. In another exemplary embodiment, the implementing a basis update process 120 can comprise computing a 2x2 unitary matrix, which can be denoted by $\Theta$. In systems and methods of the present invention, evaluating the Siegel condition (Line 9 in Table 1) can be relatively straightforward while computation of the $\Theta$ (Line 10 in Table 1) can require the inverse square-root operation, which has a high hardware complexity. Therefore, some embodiments of the present invention use a numerically stable and efficient method to compute $\Theta$, which becomes apparent by forming the vector,

$$v = \left[\,|R_{k,k}| \;\; \Re[R_{k-1,k}] \;\; \Im[R_{k-1,k}]\,\right]^{T} \qquad \text{(Equation 33)}$$

and viewing these computations as a vector normalization problem. If this is done, $\Theta$ can be formed from the elements of $v/\|v\|$, and the updated (k-1)-th diagonal after the $\Theta$ multiplication in Line 11 of Table 1 and column swap in Line 13 of Table 1 is $\|v\|$.
Some embodiments of the present invention solve this vector normalization problem by applying the Householder CORDIC algorithm. In an exemplary embodiment, vectoring iterations (rotating a vector to an axis) and rotation iterations (rotating a vector around an arbitrary axis) can be performed by low hardware complexity shifts and additions. A sequence of J Householder vectoring iterations can be used to compute $\|v\|$ to a certain precision within a constant CORDIC gain factor, $C=\prod_{i=1}^{J}(1+2^{-2i+1})$,

$$C(\|v\|+\epsilon)\,e_1 = A^{(J)} \cdots A^{(1)} v \qquad \text{(Equation 34)}$$

where $A^{(i)}$ can be determined from the sign of the vector elements at the end of each vectoring iteration and $(A^{(i)})^{T}A^{(i)}=(1+2^{-2i+1})^{2}I$, $e_1=[1\ 0\ 0]^{T}$, and $\epsilon$ is an error term introduced by the finite number of vectoring iterations, J. J can be defined to be a precision parameter equal to the number of vectoring iterations performed to compute $\|v\|$. In an exemplary embodiment J is equal to nine. However, J can also be equal to any other positive integer. In some embodiments, multiplication by $A^{(i)}$ can be implemented with addition operations and bit-shifts of length i and 2i, which can be easily realized on an FPGA using J:1 multiplexers. $v/\|v\|$ can then be computed by rotating the vector $(1/C)e_1$ using the transpose of the $A^{(i)}$ matrices in the opposite order:

Equation 35
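Setting the CORDIC details aside, the vector-normalization view of Equation 33 can be illustrated in floating point. The sketch below is only an illustration: Line 10 of Table 1 is not reproduced in this section, so the particular 2x2 form chosen for the unitary matrix is an assumption (a form commonly used in complex LLL descriptions), as are the function name and the requirement that $R_{k,k}$ be real and non-negative.

```python
import math

def basis_update_rotation(r_km1_k, r_k_k):
    """Build a 2x2 unitary matrix from v/||v|| (Equation 33 view).

    r_km1_k: complex entry R[k-1][k]; r_k_k: diagonal R[k][k], assumed real and >= 0.
    Returns (theta, norm) with norm = ||v||.
    """
    v = (abs(r_k_k), r_km1_k.real, r_km1_k.imag)
    norm = math.sqrt(sum(c * c for c in v))
    a = complex(v[1], v[2]) / norm          # normalized R[k-1][k]
    b = v[0] / norm                         # normalized |R[k][k]|
    # ASSUMED form of the rotation; it maps the swapped column to [norm, 0].
    theta = [[a.conjugate(), b],
             [-b,            a]]
    return theta, norm

# After the column swap, column k-1 of the active 2x2 block is [R[k-1][k], R[k][k]].
r_off, r_diag = 0.3 - 0.4j, 1.2
theta, norm = basis_update_rotation(r_off, r_diag)
new_diag = theta[0][0] * r_off + theta[0][1] * r_diag   # becomes the (k-1)-th diagonal
below = theta[1][0] * r_off + theta[1][1] * r_diag      # should vanish
```

With these example values the new diagonal equals the norm of v and the entry below it is zero, which is the property the text attributes to the multiplication and column swap.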
Due to the reversed order that the $(A^{(i)})^{T}$ matrices are applied in Equation 35, the normalized vector computation begins after these matrices are determined. In these embodiments, for a single-iteration-per-cycle Householder CORDIC architecture, 2J cycles can be required for the $\Theta$ matrix computation. Therefore, some embodiments of the present invention overlap the computation of $\|v\|$ and $v/\|v\|$. This can be done with a slight manipulation of Equation 34, which results in,

$$\frac{v_i}{\|v\|+\epsilon} = e_1^{T} A^{(J)} \cdots A^{(1)} \left(\frac{1}{C}\,e_i\right) \qquad \text{(Equation 36)}$$

where $v_i$ can be the i-th element of v and $e_i$ can be the i-th standard Euclidean basis vector. Thus, in some embodiments of the present invention, in accordance with Equation 36, the i-th element of $v/\|v\|$ can be computed by rotating $e_i/C$ using the $A^{(i)}$ in the same order as applied in the vectoring iterations.
In an exemplary embodiment, Equations 34 and 36 are implemented as part of the basis update process by employing a single iteration-per-cycle Householder CORDIC architecture that has been unrolled and includes multiple pipeline stages configured to concurrently execute at least one vectoring iteration and at least one rotation iteration. In an exemplary embodiment, the architecture includes four pipeline stages.
FIG. 3 provides a schematic diagram for an exemplary embodiment of a single iteration per cycle Householder CORDIC architecture. In this exemplary embodiment, the single iteration per cycle Householder CORDIC architecture has been unrolled and includes multiple pipeline stages configured to concurrently execute at least one vectoring iteration and at least one rotation iteration. In some embodiments, each pipeline stage can operate in vectoring mode (compute an $A^{(i)}$ and then apply $A^{(i)}$ on the input vector) or rotation mode (apply a previously computed $A^{(i)}$ on the input vector). The pipeline can be initially filled by inputting v into stage-0 in vectoring mode during the first cycle of initialization and inputting $e_1/C$, $e_2/C$, and $e_3/C$ in rotation mode during the next three cycles, respectively. The results of these vectoring and rotations proceed through the pipeline, feeding back to stage-0 when the end of the pipeline is reached. After J cycles, the computed $C(\|v\|+\epsilon)$ can exit the pipeline, and the computed elements of $v/(\|v\|+\epsilon)$ can exit the pipeline in the following three cycles. Hence, embodiments of the invention adopting this architecture can compute $\Theta$ in J+3 cycles. Although these embodiments may require four times the number of adders and shifters as embodiments using the approach in Equation 35, the complexity of the individual shifters is considerably decreased. Further, because unrolling can allow part of the shifting operations to be performed with wire shifts, each stage can have [J/(number of pipeline stages)]:1 multiplexers. In addition, the unrolling in some embodiments allows more effective register re-timing because automated synthesis tools can move registers across the stages to improve the critical path.
In some embodiments of the present invention, performing a basis update process occurs and the $\Theta$ matrix is computed only when the Siegel condition is false. Thus, $\Theta$ may not be speculatively computed. Therefore, in some embodiments of the LR method, implementing a basis update process 120 comprises employing a low complexity method for evaluating the Siegel condition. In an exemplary LR method, implementing a basis update process 120 can be implementing a rapid basis update process on diagonal elements in an upper triangular matrix R. The process can comprise choosing an efficient Siegel condition factor $\zeta$ such that a first Siegel condition can be evaluated between a first pair of adjacent diagonal elements of the upper triangular matrix without using multiplication operations. The first pair of adjacent diagonal elements can comprise a first diagonal element and a second diagonal element. In an exemplary embodiment, the Siegel condition factor is chosen to be 2.06640625. Those skilled in the art will appreciate that the Siegel condition factor can be chosen in accordance with the demands of a particular implementation of the present invention. In another exemplary embodiment, implementing a rapid basis update process comprises evaluating whether the first Siegel condition is satisfied between the first pair of adjacent diagonal elements without using multiplication operations. In yet another exemplary embodiment, implementing a rapid basis update process comprises evaluating whether a second Siegel condition is satisfied between a second pair of adjacent diagonal elements of the upper triangular matrix without using multiplication operations. The second pair of adjacent diagonal elements can comprise the second diagonal element and a third diagonal element. In still yet another exemplary embodiment, implementing a rapid basis update process comprises evaluating whether a third Siegel condition is satisfied between a third pair of adjacent diagonal elements of the upper triangular matrix without using multiplication operations. The third pair of adjacent diagonal elements can comprise the third diagonal element and a fourth diagonal element.
In some embodiments, the complexity of evaluating the
Siegel condition is determined in part by the value of the
Siegel condition factor $\zeta$. Therefore, some embodiments
implement the rapid basis update process with an efficient
Siegel condition factor. An efficient Siegel condition factor is
a Siegel condition factor that simplifies the complexity of
evaluating the Siegel condition. In an exemplary embodiment,
the efficient Siegel condition factor is a Siegel condition
factor with a value such that the Siegel condition can be
evaluated without using multiplication operations. In another
exemplary embodiment, the efficient Siegel condition factor
has a value such that the Siegel condition can be evaluated
with a comparator and two adders.
Evaluating the Siegel condition with a Siegel condition
parameter $\zeta$=2.06640625 is equivalent to evaluating the
following:
Equation 37: 
Thus, in an exemplary embodiment shown in FIG. 3, evalu-
ating a Siegel condition can be implemented very rapidly 
using only a comparator and two adders. The ability to 
quickly evaluate the Siegel conditions allows some embodi-
ments of the invention to easily incorporate re-evaluations of 
Siegel conditions after performing a basis update process into 
the Householder CORDIC architecture. In an exemplary 
embodiment, implementing a basis update process 120 com-
prises performing a basis update process and evaluating at 
least the first and third Siegel conditions, if the second Siegel 
condition is not satisfied. 
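One way a comparator and two adders can suffice is suggested by the arithmetic of the chosen factor: 2.06640625 is exactly 1.4375 squared, and 1.4375 = 1 + 1/2 - 1/16. The sketch below is a hedged illustration, not a transcription of Equation 37 (whose body is not reproduced in this section). It assumes the Siegel condition has the form $|R_{k-1,k-1}|^2 \le \zeta |R_{k,k}|^2$, that the diagonal elements are non-negative fixed-point integers, and it uses a hypothetical function name.

```python
def siegel_satisfied(d_prev, d_curr):
    """Multiplier-free Siegel test for zeta = 2.06640625 = 1.4375**2.

    d_prev, d_curr: R[k-1][k-1] and R[k][k] as non-negative fixed-point integers.
    ASSUMPTION: the condition is d_prev**2 <= zeta * d_curr**2, i.e.
    d_prev <= 1.4375 * d_curr for non-negative operands.
    """
    # 1.4375 * d_curr = d_curr + d_curr/2 - d_curr/16: one add, one subtract,
    # with the scalings realized as wire shifts (shifts truncate low-order bits).
    threshold = d_curr + (d_curr >> 1) - (d_curr >> 4)
    return d_prev <= threshold        # single comparator

# Example with operands that are multiples of 16, so the shifts are exact.
ok = siegel_satisfied(2300, 1600)      # 1.4375 * 1600 = 2300
fail = siegel_satisfied(2301, 1600)
```

Choosing a factor whose square root has a short signed-binary expansion is what removes the squaring and the multiplication; other factors with that property would work the same way.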
It can be noted that after a basis update for k=k', the state of the Siegel condition becomes uncertain for $\max(2,k'-1) \le k \le \min(k'+1, N_t)$. This observation can be utilized in a collection of state machines, one for each Siegel condition, that track whether each Siegel condition is satisfied. Each of the state machines can indicate either "satisfied" or "uncertain" for each Siegel condition. In an exemplary embodiment, a single state machine tracks at least whether the first Siegel condition is satisfied, whether the second Siegel condition is satisfied, and whether the third Siegel condition is satisfied. In some embodiments, the LR method terminates when the state machine indicates that at least the first, second, and third Siegel conditions are satisfied. In another exemplary embodiment, a first state machine tracks whether at least the first Siegel condition is satisfied, a second state machine tracks whether at least the second Siegel condition is satisfied, and a third state machine tracks whether at least the third Siegel condition is satisfied. In another exemplary embodiment, the LR method terminates when at least the first, second, and third state machines indicate that at least the first, second, and third Siegel conditions are satisfied. In some embodiments, when all Siegel conditions in the upper triangular matrix are satisfied, the symbol vector estimate is unaffected by further size reduction operations.
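The bookkeeping described above can be sketched in a few lines. This is a minimal software model under assumptions, not the hardware state machines themselves: the class and method names are hypothetical, and the Siegel conditions are indexed by k = 2..N_t so that the first, second, and third conditions of a four-input system correspond to k = 2, 3, and 4.

```python
class SiegelTracker:
    """Tracks each Siegel condition as 'satisfied' (True) or 'uncertain' (False)."""

    def __init__(self, n_t):
        self.n_t = n_t
        # Nothing has been evaluated yet, so every condition starts uncertain.
        self.satisfied = {k: False for k in range(2, n_t + 1)}

    def mark_satisfied(self, k):
        # Called after the condition for index k has been evaluated and holds.
        self.satisfied[k] = True

    def basis_update(self, k_prime):
        # A basis update at k' makes conditions max(2, k'-1) .. min(k'+1, N_t) uncertain.
        lo = max(2, k_prime - 1)
        hi = min(k_prime + 1, self.n_t)
        for k in range(lo, hi + 1):
            self.satisfied[k] = False

    def done(self):
        # The LR method can terminate once every condition is known to be satisfied.
        return all(self.satisfied.values())

tracker = SiegelTracker(4)
for k in (2, 3, 4):
    tracker.mark_satisfied(k)
finished_before = tracker.done()
tracker.basis_update(2)          # invalidates the conditions for k = 2 and k = 3
finished_after = tracker.done()
```

Only the conditions adjacent to the updated pair need re-evaluation, which is what keeps the termination check cheap.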
LR method iteration to be repeated when any u component
magnitude exceeds ten, size reduction operations for $|\Re[u]|$,
$|\Im[u]| > 10$ can be handled efficiently. In some embodiments,
choosing $\phi_{k-1,k}$ equal to 0.51 results in 9.6% of all inner-loop
iterations producing a u that has non-zero real and imaginary
parts.
FIG. 3 illustrates an exemplary embodiment of the present 
invention comprising a secondary datapath for evaluating the 
Siegel condition. The secondary datapath can also interface to 
an external data bus such that Siegel conditions can be evalu-
ated as the R matrix memory is being filled. The secondary 
datapath can also operate independently from the multilier in 
the Householder CORDIC architecture, which is used for 
both Householder CORDIC gain compensation (multiplica-
tion of C(llvll+E) by 1/C) and partial computation of basis 
updates. 
In some embodiments of the present invention, an NR-
based reciprocation datapath only requires three (unsigned) 
integer bits. In an exemplary embodiment, the dividends in 
10 Line 4 of Table 1 are the components of the R'k k elements, 
defined previously, and Equation 25 reveals that ~ll dividend 
magnitudes are bounded above by 26 ·60 (eight dividend inte-
ger bits). In some embodiments, after implementing a relaxed 
size reduction process with a particular k outer-loop iteration, 
15 the magnitude of each off-diagonal element in the k-th col-
umn is upper bounded by 
In an exemplary embodiment of the present invention, a LR 20 
method 100 for a MIMO communication system comprises 
providing a channel matrix corresponding to a channel in a 
MIMO communication system 105. The MIMO system can 
have any number ofNt inputs and any number ofNr outputs. 
In an exemplary embodiment, the MIMO system has four 25 
inputs and four outputs. The LR method 100 can further 
comprise preprocessing the channel matrix to form at least an 
upper triangular matrix 110. The upper triangular matrix can 
In some embodiments, no more than five integer bits are used 
at the beginning of the size reduction process to represent 
each real and imaginary component ofR. In some embodi-
ment, the magnitude of v in Equation 33 is upper bounded by 
Band the right hand side of Equation 34 is upper bounded by 
CB. 
In some embodiments, an orthogonality deficiency thresh-
old Eth parameter is used to affect how often the step of 
preprocessing the channel matrix 110 must be completed on 
the channel matrices to maintain a desired BER performance. 
In some embodiments, the orthogonoality threshold Eth is 
equal to 0.955. In some embodiments, when Eth=0.955, 40% 
be a NtxNr upper triangular matrix. In another exemplary 
embodiment, preprocessing the channel matrix to form at 30 
least an upper triangular matrix 110 can be preprocessing the 
channel matrix to form a unitary matrix, an upper triangular 
matrix, and a unimodular matrix. In another embodiment of 
the present invention, the preprocessing the channel matrix 
110 can be done by QR-decomposition 
In another embodiment, the lattice reduction method 100 
can comprise implementing a size reduction process on ele-
ments of the upper triangular matrix 115. In yet another 
embodiment, the implementing a size reduction process on 
elements of the upper triangular matrix 115 can be imple- 40 
menting a relaxed size reduction process on elements of the 
upper triangular matrix. In some embodiments, the relaxed 
size reduction process can comprise choosing a first relaxed 
size reduction parameter for a first-off-diagonal element of 
the upper triangular matrix. The first relaxed size reduction 45 
can be equal for all elements of the first off diagonal of the 
upper triangular matrix. 
35 of the channel matrices can be processed with at LR method 
or system to achieve 0.2 dB gap to ideal CLLL-MMSE-SIC 
detection. 
In another embodiment of the present invention, the 
relaxed size reduction process can comprise choosing a sec-
ond relaxed size reduction parameter for a second-off-diago- 50 
nal element of the upper triangular matrix. In an exemplary 
embodiment the value of the second relaxed size reduction 
parameter is greater than the value of the first relaxed size 
reduction parameter. In another exemplary embodiment of 
the present invention, the relaxed size reduction parameter, 55 
<Pn k is equal to 3/2 for all off diagonal elements except those 
In some embodiments of the present invention, when 
choosing design parameter values that affect computation 
precision of the hardware implementation, no loss of preci-
sion occurs while implementing a size reduction process 115 
because there is no expansion in the number of fraction bits. 
In some embodiments, an expansion of fraction bits occurs 
when a basis update occurs, which can involve computation 
of a 2x2 unitary matrix, which can be denoted by 8, and 
application of this matrix on an upper triangular matrix, 
which can be denoted by R, and a second unitary matrix, 
which can be denoted by Q. If Eth =0.955, a desired BER 
performance can be maintained with 13 fraction bits to rep-
resent the R and Q matrices, nine Householder CORDIC 
iterations, and up to 15 lattice reduction iterations. In some 
embodiments, the integer-rounded divider LUT requirements 
can be based on the R fraction bit choice. In some embodi-
ments y=5 and a=6 when ~.a <0.001. 
Developing a suitable top-level architecture for the LR 
systems and methods can be complicated by the fact that the 
dataflow of the systems and methods can be dynamic-each 
random channel matrix can result in a different sequence of 
memory accesses and operations. Careful inspection of the 
on" the first-off-diagonal. In some embodiments, when <Pn k is 
equal to 3/2 for all off diagonal elements except those on° the 
first-off-diagonal, size reduction on the elements where <Pn k is 
equal to 3/2 occurs approximately 6.6% ofall lsnsk-2 in"ner 
loop iterations (Lines 4-7 of Table 1). In some other embodi-
ments, the maximum encountered~ [u], :S[u] is ten. 
In some embodiments, by choosing a first relaxed size 
reduction parameter <Pk-I k equal to 0.51 and assuming G=15, 
which results in B being upper bounded by 7 .31, an integer 
rounded divider is designed such that My.a <0.001. In some 
embodiments, by representing u with 11 bits and allowing a 
60 systems and methods, however, indicates that that operations 
on T and g, in some embodiments, only depend on the gen-
erated u values from size reduction operations and operations 
on Q only depend on the El's generated from basis updates. 
Therefore, some embodiments of the present invention relate 
65 to a LR system comprising a master processor, a first slave 
processor, and a second slave processor. The master processor 
can be in indirect communication with the first slave proces-
US 8,559,544 B2 
23 
sor by way of at least a first first-in first-out (FIFO) queue. The 
master processer can be configured to transmit a complex-
integer output, which can be denoted by u, to the first FIFO 
queue. The first slave processor can be configured to receive 
the complex integer output from the first FIFO queue and 
process a unimodular matrix, which can be denoted by T. The 
master processor can also be in indirect communication with 
the second slave processor by way of at least a second FIFO 
queue. The master processor can be configured to transmit a 
2x2 unitary matrix output to the second FIFO queue. The 10 
second slave processor can be configured to receive the 2x2 
unitary matrix from the second FIFO queue and process a 
second unitary matrix, which can be denoted by Q. In some 
embodiments of the present invention, the first and second 15 
FIFO queues track the LR system state separately. 
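The decoupling of the master processor from the first slave processor through a FIFO queue can be illustrated with a small software model. This is a sketch under assumptions rather than the disclosed hardware: the function names, the queue entry layout `(n, k, u)`, and the column update T[:,k] ← T[:,k] − u·T[:,n] (the customary size-reduction update in LLL-type algorithms) are all hypothetical choices made for illustration.

```python
from collections import deque


def master_emit(u_fifo: deque, n: int, k: int, u: complex) -> None:
    """Master side: enqueue only non-zero u values, since zero values need no work."""
    if u != 0:
        u_fifo.append((n, k, u))


def t_processor_drain(u_fifo: deque, T: list) -> None:
    """Slave side: apply each queued u to the unimodular matrix T, independent of
    whatever the master is doing. Assumed update: T[:, k] -= u * T[:, n]."""
    while u_fifo:
        n, k, u = u_fifo.popleft()
        for row in T:
            row[k] -= u * row[n]


# T starts as the identity; columns are 0-indexed in this software model.
T = [[1, 0], [0, 1]]
u_fifo = deque()
master_emit(u_fifo, n=0, k=1, u=0)       # skipped: zero u is never queued
master_emit(u_fifo, n=0, k=1, u=2 + 1j)  # queued for the T processor
queued = len(u_fifo)
t_processor_drain(u_fifo, T)
```

Because the master only appends and the slave only pops, either side can run ahead of the other until the queue is empty or full, which is the property the FIFO-based architecture relies on.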
In some embodiments of the present invention, separate multiplier pipeline structures exist for each processor. In other embodiments, the master processor, first slave processor, and second slave processor share a multiplier pipeline structure. Because the generated complex-integer output values can be sparse when the relaxed size reduction condition is used and only a fraction of all LR iterations require a basis update process to be performed, it is advantageous toward high multiplier utilization to choose a shared multiplier/accumulator structure with arbitration. In some embodiments, a multiplier pipeline that implements complex multiplication via separate real and imaginary component multiplication can be used to exploit the low frequency of fully complex-integer output values.
FIG. 4 provides a block diagram of an exemplary embodiment of a lattice reduction system 400. The master processor can comprise the shared multiplier pipeline, column accumulator, and all remaining modules in the diagram except the first slave processor, labeled as "T Processor," and the second slave processor, labeled as "Q Processor."
Some embodiments of the LR systems 400 and methods 100 operate heavily on a single column each iteration while implementing a size reduction process 115. Therefore, in some embodiments, the master processor is based around a partial column buffer that stores the R_{1:k-1,k} intermediate size reduction results. This choice is advantageous because, in some embodiments, the R element magnitude upper bound while implementing a size reduction process 115 can be greater than the R element magnitude upper bound at both the start and end of the size reduction process. Therefore, in the partial column buffer architecture, the R memory need only be sufficiently wide to represent R at the start of the size reduction process.
In some embodiments, the master processor can be additionally based around a single-port, single complex entry memory for storing R. Address mapping can be employed for column swapping. The datapath modules can include the shared multiplier pipeline structure, an integer-rounded divider, and a Householder CORDIC architecture that has been partitioned into a CORDIC pipeline support (secondary datapath) and a CORDIC pipeline. Parallel operation among these modules can be enabled through a combination of forwarding paths, speculative execution, and reordering of the original CLLL algorithm.
In some embodiments, at the beginning of each LR iteration, a main controller can direct the R_{1:k-1,k} buffer in the column accumulator to be loaded from the R memory. Alternatively, the main controller can direct the current contents of the buffer from the previous LR iteration to be reused if k,.2 and a basis update was performed during the previous iteration.
In some embodiments, at the beginning of each LR iteration, a main controller can direct the R_{k-1,k} element to be sent to the CORDIC pipeline support from either the R memory or a forwarding path. The integer-rounded divider can receive this element and begin reciprocation, or it can reuse a stored reciprocal.
In some embodiments, at the beginning of each LR iteration, a main controller can direct the R_{k,k} element to be sent to the CORDIC pipeline support from either the R memory or a forwarding path. This module can then begin evaluating the Siegel condition according to Equation 37.
In some embodiments, during the step of implementing a size reduction process 115, the R_{k-1,k} element can be forwarded from the R_{1:k-1,k} buffer to the dividend input of the introduced divider, and the relaxed size reduction condition evaluation can be initiated in this module. The real and imaginary components can require one cycle for this evaluation operation. The results of the evaluation can be written into a small table that an R execution accesses. When this table indicates that a nonzero u has been generated, the R execution can begin fetching the required R column from the R memory and simultaneously issuing single u component multiplications to the multiplier pipeline, starting with the (k-1)-th row. The multiplier results can then be sequentially added via an add-1 adder to their corresponding elements in the R_{1:k-1,k} buffer as they exit the multiplier pipeline, and the buffer can be updated with these new values. As the updated R_{k-1,k} element is written to the buffer, it can be simultaneously forwarded to the integer-rounded divider. Because the integer-rounded divider can already contain the diagonal element and reciprocal (stored in caches), size reduction on the next R element can then begin as the remaining elements complete. This process can continue until size reduction on the k-th column is complete. The gradual write-back of the R_{1:k-1,k} buffer elements to the R memory can be overlapped with this operation.
In some embodiments, operation of the CORDIC pipeline support and the CORDIC pipeline is concurrent to implementing a size reduction process 115. If the CORDIC pipeline support indicates that a Siegel condition is true, then the main controller can be signaled that k can be incremented. The next iteration can be initiated once the size reduction process is complete, or the lattice reduction method can be terminated if all Siegel conditions are now satisfied. If the Siegel condition is not satisfied, then the CORDIC pipeline can wait until either the size-reduced R_{k-1,k} element is forwarded or the integer-rounded divider indicates that size reduction on this element is not required. The Θ (Line 10 of Table 1) calculation can then begin because the necessary operands have already been speculatively loaded at the start of the iteration. Once the specified number of CORDIC iterations have been completed, the uncompensated C‖v‖ result from Equation 34 streams out of the CORDIC pipeline to the CORDIC pipeline support for gain compensation. This can be followed by three cycles of the Θ elements streaming out to the CORDIC pipeline support, the R update, and the second slave processor (Q processor). As these elements input into the CORDIC pipeline support, the elements can be appropriately signed and multiplied by the buffered R_{k-1,k-1} element to form the updated elements of the (k-1)-th column (due to a basis update) in Line 11 of Table 1. These elements and the computed ‖v‖ can then be sent to the main controller for write-back. If there are remaining R elements that must be updated, then the main controller can mark these elements as "pending" in the scoreboard structure, trigger the R update to compute these remaining updates, decrement k, and effectively swap the R columns by updating the address mapping register. Concurrent with this operation can be the reevaluation of affected Siegel conditions.
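The column swap performed by updating an address mapping register, as described above, avoids moving any matrix data. A minimal software sketch of that idea follows; the class `ColumnMappedMatrix` and its methods are hypothetical names, and the column-major list layout is an assumption made only for illustration.

```python
class ColumnMappedMatrix:
    """Hypothetical model of a matrix memory with a logical-to-physical column map."""

    def __init__(self, columns: list) -> None:
        self.mem = [list(col) for col in columns]  # physical column storage
        self.col_map = list(range(len(columns)))   # logical column -> physical column

    def swap_columns(self, a: int, b: int) -> None:
        """Swap two logical columns by exchanging map entries; no data is copied."""
        self.col_map[a], self.col_map[b] = self.col_map[b], self.col_map[a]

    def read(self, row: int, col: int):
        return self.mem[self.col_map[col]][row]

    def write(self, row: int, col: int, value) -> None:
        self.mem[self.col_map[col]][row] = value


R = ColumnMappedMatrix([[1, 0, 0], [2, 3, 0], [4, 5, 6]])
R.swap_columns(0, 1)        # logical columns 0 and 1 trade places
first_entry = R.read(0, 0)  # now served from physical column 1
```

The stored columns are untouched by the swap; only subsequent reads and writes are redirected, which is why a basis update can swap columns in constant time.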
In some embodiments, as the main controller initiates the next iteration, the R update can gradually fetch required elements from the R memory and issue two multiplications to the multiplier pipeline when access is granted by the multiplier arbitration module. Therefore, each R updated by a Θ multiplication can require three accesses to this module. The Ru register and complex add-1 adder in the column accumulator can accumulate the partial Θ multiplication results from the multiplier pipeline. Upon the final accumulation for a particular element, the add-1 adder output can be written back to the R memory and the corresponding scoreboard entry can be updated.
In some embodiments, the main controller comprises a memory arbiter. A memory arbiter can be advantageous because multiple modules can access the R memory and basis updates on R can be overlapped with subsequent LR iterations. If no data dependency is present, then the highest priority can be assigned to memory reads associated with size reduction and the lowest priority can be assigned to memory read requests from the R update. If, instead, the scoreboard indicates that a currently requested element is "pending," then the master processor can stall and the R update read requests can be promoted to the highest priority. The master processor can remain in this "priority inversion" state until the dependency is resolved.
In some embodiments, the first slave processor is based on the hardware structures in the master processor that handle the size reduction process. The first slave processor can issue operations to the multiplier pipeline to implement Lines 6-7 of Table 1. The first slave processor can contain a single-port memory to store an augmented matrix that comprises T concatenated with g (if MMSE processing is desired). The first FIFO queue can contain both non-zero complex integer u values and control flags that indicate the state of the master processor when a u was generated. These can include flags that indicate if the currently retrieved u was the last u generated for that lattice reduction iteration and if k was incremented or decremented during that iteration. These flags, when combined with internal address mapping for column swapping, independent tracking of k, and the separate single-port memory, can allow the first slave processor to operate independent of the master processor state. In addition, each entry in the first FIFO queue can contain flags that can indicate if the currently retrieved u has real or imaginary components equal to ±1. Hence, the first slave processor can issue single u component multiplications or utilize the trivial ±1 multiplication path in the multiplier pipeline. Results of these integer operations can be accumulated using an add-2 adder and a straightforward shift register (Tk buffer) in the column accumulator.
In some embodiments, the second slave processor can be substantially similar to the R update except that the Θ parameters can be retrieved from a second FIFO queue. The second FIFO queue can also contain an entry for the value of k associated with the currently accessed Θ. Because the second slave processor can also contain a separate, single-port memory for Q, it can complete Q basis updates independently of the master processor state. Partial Θ multiplication results can be accumulated in the Qu register located in the column accumulator.
In some embodiments, to prevent the first FIFO queue or second FIFO queue entries from being overwritten before being processed, "nearly full" status flags can be included in the FIFO queues. For the first slave processor, this flag can be first asserted when the FIFO only has a sufficient number of empty entries to store the maximum number of possible non-zero complex-integer u values generated during a single LR iteration. For the second slave processor, this flag can be first asserted when the FIFO only has a sufficient number of empty entries to store one additional Θ. Therefore, in some embodiments of the LR systems or methods, the master processor is configured to stall if the first FIFO queue or the second FIFO queue has a minimum number of empty entries. In an exemplary embodiment of the present invention, the first FIFO queue has a depth of 16. In another exemplary embodiment of the present invention, the second FIFO queue has a depth of nine.
In some embodiments of the present invention, arbitration can be used to handle contention for access to the shared multiplier pipeline among the various modules. In some embodiments, the first slave processor utilizes the multiplier pipeline structure when the master processor is not utilizing the multiplier pipeline structure. In some embodiments, the second slave processor utilizes the multiplier pipeline structure when the master processor and the first slave processor are not utilizing the multiplier pipeline structure. In some exemplary embodiments, the multiplier pipeline structure has a utilization rate exceeding 80%. In some embodiments, an arbitration scheme can be adopted such that the master processor can progress through lattice reduction iterations as quickly as possible. Therefore, in some embodiments, when no data dependency exists, a multiplier arbitration module can assign highest priority to the R execution module followed by the first slave processor, the R update module, and the second slave processor. In some embodiments, when a data dependency exists ("priority inversion"), requests from the R update module can be promoted to highest priority, and requests from the R execution module can be demoted to lowest priority.
In some embodiments of the present invention, overlapped execution of the LR system can occur on multiple channel matrices among the three processors. This can be accomplished by employing two banks of memory in each processor. As a processor is operating on one memory bank, the other memory bank can be simultaneously filled with the next matrix/vector associated with the next channel matrix to process. Then, once the processor completes operations on the current memory bank, it can simultaneously output the current memory bank contents and immediately begin processing on the other memory bank (assuming the FIFO queues are not empty in the first slave and second slave processor case).
In an exemplary embodiment of the present invention, the LR systems and methods can be implemented in Verilog. In another exemplary embodiment, hardware realization can be completed using an FPGA flow comprising Synplify Pro for synthesis and Xilinx ISE 9.1 for place-and-route (PR). Tables 3 and 4 summarize hardware realization results for a variety of FPGA targets.
TABLE 3
Comparison of Hardware Realization Results for a Variety of FPGA Targets

                     A Conventional    A Conventional
                     Realization of    Realization of
                     Clarkson's        Seysen's
                     Algorithm         Algorithm         Exemplary Embodiments of Present Invention
Platform             XC2VP30-7         65 nm ASIC        XC2VP30-7      XC2VLX110-3    XC4VLX80-12
Multipliers          24                                  4              4              4
Hardware Use         7,349 slices      67,000 gates      3,571 slices   3,640 slices   1,758 slices
Clock (MHz)          100               400               173            140            206
Cycles per matrix    420 avg.          1368 worst-case   49 avg., 96 system avg., 447 worst-case

TABLE 4
Distribution of Required Slices

First Slave Processor         8%
Integer-Rounded Divider      14%
Master Processor             15%
CORDIC Pipeline Support       8%
CORDIC Pipeline              27%
Multiplier Pipeline           5%
Column Accumulator           17%
Second Slave Processor        6%
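The cycle counts and clock rates in Table 3 can be converted into processing latency with simple arithmetic. The sketch below is illustrative only; the helper name `latency_us` is hypothetical, and pairing the 447-cycle worst case with the 206 MHz clock is an assumption, chosen because it reproduces the 2.17 µs figure quoted in the description.

```python
def latency_us(cycles: int, clock_mhz: float) -> float:
    """Latency in microseconds: cycles divided by clock frequency in MHz."""
    return cycles / clock_mhz


# Worst-case figures taken from Table 3 (assumed pairing of cycles and clock).
exemplary_worst = latency_us(447, 206)   # exemplary embodiment
seysen_worst = latency_us(1368, 400)     # conventional Seysen's realization

# Relative reduction in worst-case latency.
reduction = 1 - exemplary_worst / seysen_worst

# Ratio of average cycles per matrix: conventional Clarkson's vs. exemplary embodiment.
cycle_ratio = 420 / 49

print(f"{exemplary_worst:.2f} us vs {seysen_worst:.2f} us, "
      f"reduction {reduction:.0%}, cycle ratio {cycle_ratio:.1f}x")
```

Under these assumptions the numbers agree with the description: roughly 2.17 µs against 3.42 µs, a reduction of about 37%, and an average cycle count more than eight times lower than the conventional realization of Clarkson's algorithm.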
FIG. 5 provides BER results obtained by some embodiments of the present invention in comparison to conventional LR systems and methods. The BER performance of previously implemented algorithms in FIG. 5 was obtained from ideal algorithm models (unlimited iterations and floating-point precision). From FIG. 5, it is shown that by implementing the LR systems and methods of the present invention on only 40% of all channel matrices on average, a considerable BER performance improvement is achieved over the MMSE detection of some conventional systems. FIG. 5 also illustrates that some embodiments of the present invention achieve a 5 dB improvement in BER performance compared to some conventional systems that employ Brun's algorithm and are within 1.5 dB of optimal ML detection.
Some embodiments of the present invention have also been evaluated from a system perspective by simulating the packet structure of an 802.11n system in Mixed Mode. The OFDM symbol length in this case is 4 µs, and there are 52 sub-carriers. In the simulation, it was assumed the sorted QR-decomposition of the channel matrix for each sub-carrier is completed just as the corresponding symbol vector associated with that sub-carrier in the first OFDM symbol is received. The simulation measured the latency at the end of the first transmitted OFDM symbol, used the Virtex5 synthesis results, and set ε_th=0.955. Simulations of this system configuration indicate that the probability of the latency exceeding 12.08 µs (3.02 OFDM symbols) is 0.5%, and the average latency is 5.7 µs. Hence, an LR processor with an OFDM symbol buffer is sufficient to handle medium to large size packets (10-100 OFDM symbols). To handle smaller packets, either additional LR processors can be adopted or the ε_th and s can be dynamically adjusted, which can reduce the complexity and latency. Adoption of this adaptive technique may require that the secondary datapath in FIG. 2 incorporate multiple Siegel condition approximations, which could be straightforwardly implemented using multiplexers on the inputs of the adders.
From Table 3, it is shown that current embodiments of the present invention achieve considerable improvement over conventional systems. The channel matrix processing latency is 2.17 µs in an exemplary embodiment of the present invention while 3.42 µs for the Seysen's algorithm implementation. The exemplary embodiment can achieve this 37% reduction in worst case latency while using about half the number of multipliers as the Seysen implementation.
Further, an exemplary embodiment of the present invention requires less than an eighth of the processing cycles compared to some conventional systems employing an implementation of Clarkson's algorithm. The significant improvement is achieved in a number of ways. First, the conventional implementation of Clarkson's algorithm utilizes a shared division unit for both computing u values (via reciprocation multiplication) and computing Θ matrices. An exemplary embodiment instead utilizes a reduced-precision reciprocation unit in addition to a collection of comparators for detecting trivial u values. In addition, the reciprocals are sufficiently accurate for use in the subsequent SIC detection step. Second, an exemplary embodiment uses a relaxed size reduction condition on the R elements as opposed to eliminating size reduction operations, as done in the conventional implementation of Clarkson's algorithm. This allows the exemplary embodiment to upper bound the R elements during the LR processing. The slight increase in the number of size reduction operations is more than compensated by the efficient utilization of the multiplier pipeline structure in an exemplary embodiment, which can be over 80% for the system. Third, the 3-dimension Householder CORDIC algorithm employed in an exemplary embodiment requires only one sequence of vectoring iterations, while the 2-dimension rotation-based CORDIC unit in the conventional implementation of Clarkson's algorithm requires two sequences of vectoring iterations. The unrolling inherent in the exemplary embodiment's Householder CORDIC architecture, which supports the concurrent Θ computation, results in the CORDIC pipeline requiring the largest percentage of hardware resources (as shown in Table 4). The critical path of the CORDIC pipeline, which limits the achievable clock frequency in some conventional systems, is improved by over 20%, and this module can be easily shared among LR processors. Fourth, an exemplary embodiment modifies how a Siegel condition is computed in conventional systems. In an exemplary embodiment, a low complexity approximation is employed that results in a negligible degradation in BER performance. As a result, the exemplary embodiment is able to re-evaluate Siegel conditions rapidly without multiplication and use this information to terminate the LR system earlier.
It is to be understood that the embodiments and claims of this invention are not limited to wireless MIMO communication systems, but as those of ordinary skill in the art would understand, the systems and methods of the present invention may be used in a large majority of MIMO communication systems.
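The multiplication-free Siegel condition evaluation mentioned above can be illustrated with the efficient Siegel condition factor of 2.06640625 recited in the claims, which decomposes exactly into 2 + 1/16 + 1/256 and can therefore be applied with two shifts and two additions. The sketch below is an assumption-laden illustration: the function names are hypothetical, values are treated as non-negative fixed-point integers, and the comparison form (previous diagonal magnitude against the scaled current diagonal magnitude) is a guess at the shape of the approximation, not the equation used in the disclosure.

```python
from fractions import Fraction

# The factor is a sum of three powers of two, so no multiplier is needed.
SIEGEL_FACTOR = Fraction(2) + Fraction(1, 16) + Fraction(1, 256)


def scale_by_siegel_factor(x: int) -> int:
    """Scale a non-negative fixed-point integer by 2 + 1/16 + 1/256 using shifts.
    Right shifts truncate, so the result is exact only when x is a multiple of 256."""
    return (x << 1) + (x >> 4) + (x >> 8)


def siegel_satisfied(diag_prev: int, diag_curr: int) -> bool:
    """Hypothetical comparison form: |diag_prev| <= factor * |diag_curr|."""
    return abs(diag_prev) <= scale_by_siegel_factor(abs(diag_curr))


scaled = scale_by_siegel_factor(256)  # 512 + 16 + 1
```

A hardware version would replace the shifts with fixed wiring, leaving only the adders and a comparator on the critical path.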
It is further to be understood that the embodiments and claims are not limited in their application to the details of construction and arrangement of the components set forth in the description and illustrated in the drawings. Rather, the description and the drawings provide examples of the embodiments envisioned. The embodiments and claims disclosed herein are further capable of other embodiments and of being practiced and carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein are for the purposes of description and should not be regarded as limiting the claims.
Accordingly, those skilled in the art will appreciate that the conception upon which the application and claims are based may be readily utilized as a basis for the design of other structures, methods, and systems for carrying out the several purposes of the embodiments and claims presented in this application. It is important, therefore, that the claims be regarded as including such equivalent constructions.
Furthermore, the purpose of the foregoing Abstract is to enable the U.S. Patent and Trademark Office and the public generally, and especially including the practitioners in the art who are not familiar with patent and legal terms or phraseology, to determine quickly from a cursory inspection the nature and essence of the technical disclosure of the application. The Abstract is neither intended to define the claims of the application, nor is it intended to be limiting to the scope of the claims in any way. It is intended that the application is defined by the claims appended hereto.
What is claimed is:
1. A lattice reduction method for a multiple-input multiple-output communication system, the method comprising:
providing a channel matrix corresponding to a channel in a multiple-input multiple-output communication system;
preprocessing the channel matrix to form at least an upper triangular matrix;
implementing a relaxed size reduction process on elements of the upper triangular matrix, comprising:
choosing a first relaxed size reduction parameter for a first-off-diagonal element of the upper triangular matrix;
choosing a second relaxed size reduction parameter, which is greater than the first relaxed size reduction parameter, for a second-off-diagonal element of the upper triangular matrix;
evaluating whether a first relaxed size reduction condition is satisfied for the first-off-diagonal element of the upper triangular matrix, the first relaxed size reduction condition based in part on the first relaxed size reduction parameter; and
evaluating whether a second relaxed size reduction condition is satisfied for the second-off-diagonal element of the upper triangular matrix, the second relaxed size reduction condition based in part on the second relaxed size reduction parameter; and
implementing a basis update process on diagonal elements of the upper triangular matrix.
2. The lattice reduction method according to claim 1, wherein implementing a relaxed size reduction process further comprises performing a relaxed size reduction process using an integer rounded divider based on a single Newton-Raphson iteration if the first or second relaxed size reduction condition is not satisfied.
3. The lattice reduction method according to claim 1, wherein implementing a relaxed size reduction process further comprises performing a relaxed size reduction process using only addition operations if the first or second relaxed size reduction condition is not satisfied.
4. The lattice reduction method according to claim 1, wherein implementing a relaxed size reduction process further comprises performing a relaxed size reduction process using only addition operations and an integer rounded divider based on a single Newton-Raphson iteration if the first or second relaxed size reduction condition is not satisfied.
5. A lattice reduction method for a multiple-input multiple-output communication system, the method comprising:
providing a channel matrix corresponding to a channel in a multiple-input multiple-output communication system;
preprocessing the channel matrix to form at least an upper triangular matrix;
performing a size reduction process on elements of the upper triangular matrix; and
implementing a rapid basis update process on diagonal elements in the upper triangular matrix, comprising
choosing an efficient Siegel condition factor; and
evaluating whether a first Siegel condition is satisfied between a first pair of adjacent diagonal elements of the upper triangular matrix comprising a first diagonal element and a second diagonal element without using multiplication operations, the first Siegel condition based in part on the efficient Siegel condition factor.
6. The lattice reduction method according to claim 5, wherein the implementing a rapid basis update process further comprises evaluating whether a second Siegel condition is satisfied between a second pair of adjacent diagonal elements of the upper triangular matrix comprising the second diagonal element and a third diagonal element without using multiplication operations.
7. The lattice reduction method according to claim 6, wherein the implementing a rapid basis update process further comprises evaluating whether a third Siegel condition is satisfied between a third pair of adjacent diagonal elements of the upper triangular matrix comprising the third diagonal element and a fourth diagonal element without using multiplication operations.
8. The lattice reduction method according to claim 7, wherein the efficient Siegel condition factor is 2.06640625.
9. The lattice reduction method according to claim 7, wherein a state machine tracks at least whether the first Siegel condition is satisfied, whether the second Siegel condition is satisfied, and whether the third Siegel condition is satisfied, and the lattice reduction method further comprises terminating when the state machine indicates that the first, second, and third Siegel conditions are satisfied.
10. The lattice reduction method according to claim 7, wherein a first state machine tracks whether at least the first Siegel condition is satisfied, a second state machine tracks whether at least the second Siegel condition is satisfied, and a third state machine tracks whether at least the third Siegel condition is satisfied, and the lattice reduction method further comprises terminating when at least the first, second, and third state machines indicate that at least the first, second, and third Siegel conditions are satisfied.
11. The lattice reduction method according to claim 7, 
wherein the evaluating whether the first Siegel condition is 
satisfied, the evaluating whether the second Siegel condition 
is satisfied, and the evaluating whether the third Siegel con-
dition is satisfied are performed using a comparator and two 
adders. 
12. The lattice reduction method according to claim 7, 
wherein the evaluating whether the first Siegel condition is 
satisfied, evaluating whether the second Siegel condition is 
satisfied, and evaluating whether the third Siegel condition is 
satisfied each occur while memory of the upper triangular 
matrix is being filled. 
13. The lattice reduction method according to claim 7, 
wherein implementing a rapid basis update process further 
comprises performing a basis update process and evaluating 
at least the first and third Siegel conditions, if the second 
Siegel condition is not satisfied. 
14. A lattice reduction method for a multiple-input multiple-output
communication system, the method comprising:
  providing a channel matrix corresponding to a channel in a
  multiple-input multiple-output communication system;
  preprocessing the channel matrix to form at least an upper
  triangular matrix;
  performing a size reduction process on elements of the upper
  triangular matrix; and
  implementing an iterative basis update process on elements in the
  upper triangular matrix, comprising computing a 2x2 unitary matrix
  using a number of vectoring iterations,
  wherein the computing the 2x2 unitary matrix is completed in a
  number of cycles equal to a number of pipeline stages plus the
  number of vectoring iterations minus one.
15. The lattice reduction method according to claim 14, wherein the
computing a 2x2 unitary matrix using a number of vectoring iterations
employs a single iteration per cycle Householder CORDIC architecture
that has been unrolled and includes the number of pipeline stages
configured to concurrently execute at least one vectoring iteration
and at least one rotation iteration.
16. The lattice reduction method according to claim 15, wherein each
of the number of pipeline stages operates in a vectoring mode or a
rotation mode.
17. The lattice reduction method according to claim 15, wherein each
of the number of pipeline stages comprises at least one multiplexer
with an input and a plurality of outputs equivalent to the number of
vectoring iterations divided by the number of pipeline stages.
18. A lattice reduction system, comprising:
  a master processor configured to transmit a complex-integer output
  to a first First-In First-Out ("FIFO") queue and a 2x2 unitary
  matrix output to a second FIFO queue;
  a first slave processor in indirect communication with the master
  processor by way of at least the first FIFO queue and configured to
  receive the complex-integer output from the first FIFO queue and
  process a unimodular matrix; and
  a second slave processor in indirect communication with the master
  processor by way of at least the second FIFO queue and configured
  to receive the 2x2 unitary matrix output from the second FIFO queue
  and process a second unitary matrix,
  wherein the master processor, the first slave processor, and the
  second slave processor utilize a single multiplier pipeline
  structure.
19. The lattice reduction system according to claim 18, wherein the
master processor, the first slave processor, and the second slave
processor each comprises two memory banks configured so that multiple
channel matrices may be processed concurrently.
20. The lattice reduction system according to claim 18, configured so
that the multiplier pipeline structure has a utilization rate greater
than 80%.
21. The lattice reduction system according to claim 18, wherein the
master processor is configured to stall if the first FIFO queue or the
second FIFO queue has a minimum number of empty entries.
22. The lattice reduction system according to claim 18, wherein the
first slave processor utilizes the multiplier pipeline structure when
the master processor is not utilizing the multiplier pipeline
structure.
23. The lattice reduction system according to claim 18, wherein the
second slave processor utilizes the multiplier pipeline structure when
the master processor and the first slave processor are not using the
multiplier pipeline structure.
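
By way of illustration only, and without limiting the claims, the factor 2.06640625 recited in claim 8 equals 2 + 2^-4 + 2^-8, so a scaling by that factor can be formed from bit shifts and two additions, followed by a single comparison. The Python sketch below shows that arithmetic. The function names, the use of non-negative fixed-point integers, and the form of the inequality (`upper <= factor * lower`) are assumptions made for this example; the claims do not state which quantities are compared or in what form.

```python
SIEGEL_FACTOR = 2.06640625  # value recited in claim 8; equals 2 + 2**-4 + 2**-8


def scale_by_siegel_factor_q8(x: int) -> int:
    """Return 256 * SIEGEL_FACTOR * x (that is, 529 * x) without multiplying.

    529 = 512 + 16 + 1, so the product is three shifted copies of x
    combined with two additions.
    """
    return (x << 9) + (x << 4) + x


def siegel_satisfied(upper: int, lower: int) -> bool:
    """Assumed condition: upper <= SIEGEL_FACTOR * lower.

    Both sides are scaled by 256 so the test is exact on integers:
    one shift on the left, two additions on the right, one comparison.
    `upper` and `lower` are hypothetical non-negative fixed-point values
    derived from adjacent diagonal elements.
    """
    return (upper << 8) <= scale_by_siegel_factor_q8(lower)
```

For example, `siegel_satisfied(529, 256)` evaluates the boundary case in which the ratio of the two inputs equals the factor exactly.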
* * * * * 
