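Hardware Accelerator for Interference Alignment in In-House Multi-User Communication Systems (original title: Hardwarebeschleuniger für Interference Alignment in In-House Mehrbenutzer-Kommunikationssystemen) by Markus Kock and Holger Blume
Institute of Microelectronic Systems (Institut für Mikroelektronische Systeme)
ITG Workshop “Sound, Vision & Games”, 22 September 2015, Hannover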
Markus Kock, 22.09.2015 Slide 2
Institute of Microelectronic Systems
• Interference Alignment (IA)
• Increased channel capacity in multi-user scenarios
• Physical layer technique
Objectives
• Dedicated hardware accelerator for Minimum Mean Square Error (MMSE) Interference Alignment (IA)
• Digital baseband processing
• Low-latency real-time operation (latency < 1 ms)
Multi-User MIMO Communication System, TDMA
• MIMO spatial multiplexing: multiple antennas per user, one data stream per antenna
• Channel capacity shared by all users
Multi-User MIMO Communication System, IA
• MIMO spatial multiplexing: multiple antennas per user, one data stream per antenna
• Channel capacity shared by all users
• Interference Alignment: users transmit simultaneously
• Channel capacity scales with the number of users K
• Scenario: multi-user point-to-point communication system
• Linear precoding and decoding at TX and RX, respectively
Received signal:
$$\mathbf{Y}_k = \mathbf{U}_k^T \left( \sum_{j=1}^{K} \mathbf{H}_{kj} \mathbf{V}_j \mathbf{X}_j + \mathbf{n}_k \right)$$
• K: number of users, d: data streams per user, Nt: antennas per TX, Nr: antennas per RX
• Problem formulation: determine $\mathbf{V}_k$ and $\mathbf{U}_k$ for given $\mathbf{H}_{kj}$
• Several approaches feasible: Max SINR, Max Sum-Rate, MMSE, …
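The received-signal model can be tried out numerically. The following is a minimal sketch, assuming NumPy, random complex Gaussian channels, and a small hypothetical configuration (K = 3, Nt = Nr = 2, d = 1); the helper names `crandn` and `received_signals` are illustrative and not part of the original slides.

```python
import numpy as np

def received_signals(H, V, U, X, noise):
    """Return Y[k] = U[k].T @ (sum_j H[k][j] @ V[j] @ X[j] + noise[k]) for all users k."""
    K = len(V)
    Y = []
    for k in range(K):
        # Superposition of all K precoded transmit signals at receiver k, plus noise.
        r = noise[k] + sum(H[k][j] @ V[j] @ X[j] for j in range(K))
        Y.append(U[k].T @ r)
    return Y

rng = np.random.default_rng(0)
K, Nt, Nr, d = 3, 2, 2, 1  # hypothetical small configuration

def crandn(*shape):
    """Complex Gaussian samples with unit variance (illustrative helper)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

H = [[crandn(Nr, Nt) for _ in range(K)] for _ in range(K)]  # H[k][j]: TX j -> RX k
V = [crandn(Nt, d) for _ in range(K)]                       # precoders
U = [crandn(Nr, d) for _ in range(K)]                       # decoders
X = [crandn(d, 1) for _ in range(K)]                        # data symbols
noise = [0.01 * crandn(Nr, 1) for _ in range(K)]

Y = received_signals(H, V, U, X, noise)
```

Each `Y[k]` has shape (d, 1): one decoded value per data stream of user k.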
Fast-changing channels
• Channel coherence time depends on the scenario
• Precoding matrices must be adapted to the channel within the channel coherence time
• High data throughput and low-latency real-time computation required
  → Low-latency computation of $\mathbf{V}_k$ and $\mathbf{U}_k$ (< 1 ms)
• Additional system latencies: channel estimation, CSI transmission, distribution of $\mathbf{V}_k$ and $\mathbf{U}_k$
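• MMSE criterion: minimize overall interference + noise
Algorithm [1]:
1. Start with arbitrary $\mathbf{V}_k$
2. Update
$$\mathbf{U}_k = \left( \sum_{j=1}^{K} \mathbf{H}_{kj} \mathbf{V}_j \mathbf{V}_j^H \mathbf{H}_{kj}^H + \sigma^2 \mathbf{I} \right)^{-1} \mathbf{H}_{kk} \mathbf{V}_k$$
$$\mathbf{V}_k = \left( \sum_{j=1}^{K} \mathbf{H}_{jk}^H \mathbf{U}_j \mathbf{U}_j^H \mathbf{H}_{jk} + \lambda_k \mathbf{I} \right)^{-1} \mathbf{H}_{kk}^H \mathbf{U}_k$$
3. Compute system MSE
4. Repeat steps 2 and 3 until convergence
Lagrange multiplier $\lambda_k$ determined iteratively to satisfy the transmit power constraint on $\|\mathbf{V}_k\|_F^2$ (bound: 1)
[1]: D. Schmidt, C. Shi, R. Berry, M. Honig, and W. Utschick, “Minimum Mean Squared Error Interference Alignment”, 2009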
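The alternating iteration can be prototyped in floating point before considering hardware. The sketch below is not the authors' implementation; it assumes NumPy, a receiver that applies the Hermitian transpose of U_k, a noise variance `sigma2`, a unit power bound on each precoder, plain bisection as the root-finder for λ_k (the slides only state that the search is iterative), and a fixed iteration count instead of a convergence test. All function names are hypothetical.

```python
import numpy as np

def crandn(rng, *shape):
    """Complex Gaussian samples with unit variance (illustrative helper)."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def update_U(H, V, k, sigma2):
    """Receive filter of user k: solve Q_k U_k = H_kk V_k."""
    Nr = H[k][k].shape[0]
    Q = sigma2 * np.eye(Nr, dtype=complex)
    for j in range(len(V)):
        HV = H[k][j] @ V[j]
        Q += HV @ HV.conj().T
    return np.linalg.solve(Q, H[k][k] @ V[k])

def update_V(H, U, k, bisect_steps=60, lam_min=1e-12):
    """Precoder of user k; lambda_k by bisection (assumed power bound: 1)."""
    Nt = H[k][k].shape[1]
    A = np.zeros((Nt, Nt), dtype=complex)
    for j in range(len(U)):
        HU = H[j][k].conj().T @ U[j]
        A += HU @ HU.conj().T
    b = H[k][k].conj().T @ U[k]

    def solve(lam):
        return np.linalg.solve(A + lam * np.eye(Nt), b)

    def power(lam):
        return np.linalg.norm(solve(lam)) ** 2

    if power(lam_min) <= 1.0:          # bound already met with (almost) no regularization
        return solve(lam_min)
    lo, hi = lam_min, 1.0
    while power(hi) > 1.0:             # power decreases as lambda grows
        hi *= 2.0
    for _ in range(bisect_steps):
        mid = 0.5 * (lo + hi)
        if power(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return solve(hi)                   # hi always satisfies the power bound

def system_mse(H, V, U, sigma2):
    """Sum over users of signal error, residual interference, and filtered noise."""
    K, total = len(V), 0.0
    for k in range(K):
        Uh = U[k].conj().T
        total += np.linalg.norm(Uh @ H[k][k] @ V[k] - np.eye(V[k].shape[1])) ** 2
        total += sum(np.linalg.norm(Uh @ H[k][j] @ V[j]) ** 2 for j in range(K) if j != k)
        total += sigma2 * np.linalg.norm(U[k]) ** 2
    return total

def mmse_ia(H, d, sigma2=1e-3, iterations=30, seed=0):
    rng = np.random.default_rng(seed)
    K, Nt = len(H), H[0][0].shape[1]
    V = [crandn(rng, Nt, d) for _ in range(K)]
    V = [v / np.linalg.norm(v) for v in V]         # arbitrary start, unit power
    history = []
    for _ in range(iterations):
        U = [update_U(H, V, k, sigma2) for k in range(K)]
        V = [update_V(H, U, k) for k in range(K)]
        history.append(system_mse(H, V, U, sigma2))
    return V, U, history

rng = np.random.default_rng(1)
K, N, d = 3, 2, 1                                  # hypothetical small configuration
H = [[crandn(rng, N, N) for _ in range(K)] for _ in range(K)]
V, U, history = mmse_ia(H, d)
```

`history` holds the system MSE after each iteration, so it can be inspected to choose an iteration count for a given latency budget.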
[Figures: data dependencies; maximum parallel hardware data flow]
Hardware System Architecture
Dedicated accelerator for integration in SDR SoCs
• All communication via OCP or AXI on-chip buses
• Variable number of processing elements (PEs) for computing $\mathbf{V}_k$ and $\mathbf{U}_k$
• Controller
[Figure: top-level architecture]
Processing Element
• Computes V or U for one user at a time (mode select)
• Main complexity: Gaussian elimination, shared by V and U modes
• Mode V: iterative root-finding for $\lambda_k$
• Mode U: no iterations required
[Figure: PE detail]
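• Inner loop contains a matrix inversion
• Solve the equation system instead
• Candidates: SVD, LU, QR, …
• Criterion: low latency
• Gaussian elimination
• Small matrix sizes, sufficient precision
• Latency: one multiplication per eliminated unknown
$$\mathbf{U}_k = \left( \sum_{j=1}^{K} \mathbf{H}_{kj} \mathbf{V}_j \mathbf{V}_j^H \mathbf{H}_{kj}^H + \sigma^2 \mathbf{I} \right)^{-1} \mathbf{H}_{kk} \mathbf{V}_k = \mathbf{Q}_k^{-1} \mathbf{B}_k$$
$$\mathbf{Q}_k \mathbf{U}_k = \mathbf{B}_k$$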
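Solving the linear system instead of forming an explicit inverse can be illustrated with a small Gauss-Jordan routine that reduces an augmented matrix [Q | B] to [I | U]. This is a floating-point sketch with NumPy, not the hardware algorithm; it does no pivoting and therefore assumes nonzero pivots, which holds, for example, when Q is Hermitian positive definite. The function name is hypothetical.

```python
import numpy as np

def gauss_jordan_solve(Q, B):
    """Reduce the augmented matrix [Q | B] to [I | U] and return U.

    No pivoting: assumes every pivot is nonzero
    (e.g. Q Hermitian positive definite).
    """
    N = Q.shape[0]
    M = np.hstack([Q, B]).astype(complex)
    for p in range(N):
        M[p] = M[p] / M[p, p]                  # normalize the pivot row
        for r in range(N):
            if r != p:
                M[r] = M[r] - M[r, p] * M[p]   # eliminate unknown p from row r
    return M[:, N:]

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Q = A @ A.conj().T + 0.1 * np.eye(3)           # Hermitian positive definite test matrix
B = rng.standard_normal((3, 2)) + 1j * rng.standard_normal((3, 2))
U = gauss_jordan_solve(Q, B)
```

The result can be compared against `np.linalg.solve(Q, B)`; both avoid computing the inverse of Q explicitly.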
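Two-Step Bareiss Algorithm
• Variation of Gaussian elimination
• Integer-preserving, division-free (elimination loop)
• Eliminates two unknowns per step
• Row-wise normalization after each elimination step
• One final division required per result coefficient
Augmented system for $\mathbf{Q}_k \mathbf{U}_k = \mathbf{B}_k$:
$$\left[ \mathbf{Q} \mid \mathbf{B} \right] = \left[ \begin{array}{cccc|ccc} q_{11} & q_{12} & \cdots & q_{1N} & b_{11} & \cdots & b_{1d} \\ q_{21} & q_{22} & \cdots & q_{2N} & b_{21} & \cdots & b_{2d} \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ q_{N1} & q_{N2} & \cdots & q_{NN} & b_{N1} & \cdots & b_{Nd} \end{array} \right] \rightarrow \left[ \mathbf{I} \mid \mathbf{U} \right]$$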
Two-Step Bareiss Algorithm: Result
[Figure: augmented matrix (entries q, b) after a two-step elimination]
• Repeat to obtain diagonal form
• Final division required for each result coefficient u
• One common factor per row, so the multiplication can be skipped
• Fewer operations compared to single-step elimination
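The idea of a division-free elimination loop with one final division per result coefficient can be shown on exact integers. The sketch below is a simplified single-step variant using cross-multiplication, not the two-step scheme from the slides; it does no pivoting, and the function name is hypothetical. Python integers are unbounded, so the growth of intermediate values that fixed-point hardware has to control by renormalization does not show up here.

```python
from fractions import Fraction

def division_free_solve(Q, B):
    """Solve Q U = B over the integers without dividing inside the elimination loop.

    Single-step simplification: each pass eliminates one unknown by
    cross-multiplication. No pivoting, so zero pivots are rejected.
    """
    n = len(Q)
    M = [list(q) + list(b) for q, b in zip(Q, B)]   # augmented matrix [Q | B]
    for p in range(n):
        pivot = M[p][p]
        if pivot == 0:
            raise ValueError("zero pivot; this sketch does no pivoting")
        for r in range(n):
            if r == p:
                continue
            factor = M[r][p]
            # row_r <- pivot * row_r - factor * row_p  (integers only, no division)
            M[r] = [pivot * x - factor * y for x, y in zip(M[r], M[p])]
    # Diagonal form reached: one division per result coefficient,
    # with one common divisor M[i][i] per row.
    cols = len(B[0])
    return [[Fraction(M[i][n + c], M[i][i]) for c in range(cols)] for i in range(n)]

Q = [[2, 1], [1, 3]]
B = [[3], [5]]
U = division_free_solve(Q, B)
print(U)  # [[Fraction(4, 5)], [Fraction(7, 5)]]
```

The common divisor per row is what allows the final step to be a single division per coefficient.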
• Fixed-point representation
• Only multiplications and additions/subtractions required
• Data shifted diagonally by two elements per elimination step
Systolic Array Processor: Critical Path
• Eliminate two unknowns
• Critical path: 2 MUL + 4 ADD
• Row-wise block renormalization (shift) after each elimination step
[Figure: critical-path expression over operands A, B, C, D, E]
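Row-wise block renormalization by shifting can be sketched as follows. This is an illustration under assumptions, not the accelerator's datapath: the 16-bit signed word width is hypothetical, entries are plain real integers (a complex datapath would treat real and imaginary parts alike), and the function name is made up. Because every entry of a row shares one scale factor, shifting the whole row by the same amount leaves the solution of that row's equation unchanged up to rounding.

```python
def renormalize_row(row, word_bits=16):
    """Right-shift all entries of a row by one common amount so that the
    largest magnitude fits into a signed word of word_bits bits."""
    limit = 1 << (word_bits - 1)          # magnitudes must stay below 2**(word_bits - 1)
    peak = max(abs(x) for x in row)
    shift = 0
    while (peak >> shift) >= limit:
        shift += 1
    # Python's >> on negative integers is an arithmetic (flooring) shift.
    return [x >> shift for x in row], shift

row, shift = renormalize_row([70000, -3, 12], word_bits=16)
print(row, shift)  # [17500, -1, 3] 2
```

The flooring shift introduces a small rounding error per entry, which is the price paid for keeping the word width fixed.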
Implementation Results
• Channel capacity within 0.1% of the floating-point MATLAB reference
• FPGA synthesis
  Target: Xilinx Virtex-6 XC6VLX550T-2
  Software: ISE 14.7
  Clock constraint: 50 MHz
• Latency: 520 μs for the worst-case system (Nt = Nr = 11, K = 19)

Configuration (labels lost)   Slice registers    LUTs              DSP slices
                              51531 (7.50%)      81560 (23.73%)    364 (42.13%)
1
                              20942 (3.05%)      32702 (9.52%)     148 (17.13%)
1                             16585 (2.41%)      25682 (7.47%)      80 (9.26%)
5 3 1 1
3                             32980 (4.80%)      56974 (16.58%)    330 (38.19%)
1                             24916 (3.62%)      43514 (12.66%)    232 (26.85%)
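The reported latency can be translated into a clock-cycle budget at the 50 MHz clock constraint. The short calculation below uses only the figures stated on the slides (520 μs worst-case latency, 1 ms objective, 50 MHz); the function name is hypothetical.

```python
def cycle_budget(latency_us: int, clock_mhz: int) -> int:
    """Clock cycles that fit into a latency window (microseconds x MHz = cycles)."""
    return latency_us * clock_mhz

reported = cycle_budget(520, 50)     # worst-case latency from the results
objective = cycle_budget(1000, 50)   # 1 ms latency objective
print(reported, objective, reported / objective)  # 26000 50000 0.52
```

At 50 MHz the worst-case computation therefore takes 26,000 cycles, about half of the 50,000 cycles available within the 1 ms objective.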
Conclusion
• Hardware acceleration required for very low-latency MMSE-IA
• Resource requirements prohibitive for large system configurations
• Worst-case processing latency < 520 μs is achievable
Thank you for your attention!
Questions?
