Design and Implementation of a Polar Codes Blind Detection Scheme by Condo, Carlo et al.
Design and Implementation of a Polar Codes Blind
Detection Scheme
Carlo Condo, Member, IEEE, Seyyed Ali Hashemi, Student Member, IEEE, Arash Ardakani,
Furkan Ercan, Student Member, IEEE, Warren J. Gross, Senior Member, IEEE
Abstract—In blind detection, a set of candidates has to be
decoded within a strict time constraint, to identify which trans-
missions are directed at the user equipment. Blind detection
is required by the 3GPP LTE/LTE-Advanced standard, and it
will be required in the 5th generation wireless communication
standard (5G) as well. Polar codes have been selected for use
in 5G: thus, the issue of blind detection of polar codes must
be addressed. We propose a polar code blind detection scheme
where the user ID is transmitted instead of some of the frozen
bits. A first, coarse decoding phase helps select a subset of
candidates that is decoded by a more powerful algorithm: an
early stopping criterion is also introduced for the second decoding
phase. Simulation results show good missed detection and false
alarm rates, along with substantial latency gains thanks to early
stopping. We then propose an architecture to implement the
devised blind detection scheme, based on a tunable decoder that
can be used for both phases. The architecture is synthesized
and implementation results are reported for various system
parameters. The reported area occupation and latency, obtained
in 65 nm CMOS technology, are able to meet 5G requirements,
and are guaranteed to meet them with even less resource usage
in the latest technology nodes.
I. INTRODUCTION
Blind decoding, also known as blind detection, requires the
receiver of a set of bits to identify if said bits compose a
codeword of a particular channel code. In 3GPP LTE/LTE-
Advanced standards blind detection is used by the user
equipment (UE) to receive control information related to the
downlink shared channel. The UE attempts the decoding of
a set of candidates, to identify if one of the candidates holds
its control information. Blind detection will be required in the
5th generation wireless communication standard (5G) as well:
ongoing discussions are considering a substantial reduction of
the time frame allocated to blind detection, from 16µs to 4µs.
Blind detection must be performed very frequently, and given
the high number of decoding attempts that must be performed
in a limited time [1], it can lead to large implementation costs
and high energy consumption. Blind detection solutions for
codes adopted in previous generation standards can be found
in [2]–[4].
C. Condo, S. A. Hashemi, A. Ardakani, F. Ercan and W. J. Gross are
with the Department of Electrical and Computer Engineering, McGill
University, Montréal, Québec, Canada. e-mail: carlo.condo@mail.mcgill.ca,
seyyed.hashemi@mail.mcgill.ca, arash.ardakani@mail.mcgill.ca,
furkan.ercan@mail.mcgill.ca, warren.gross@mcgill.ca.
Polar codes are a class of capacity-achieving error correcting
codes, introduced by Arıkan in [5]. They are characterized
by simple encoding and decoding algorithms, and have been
selected for use in 5G [6]. In [5], the successive-cancellation
(SC) decoding algorithm has been proposed as well. It is
optimal for infinite code lengths, but its error-correction
performance degrades quickly at moderate and short code
lengths. In its original formulation, it also suffers from long
decoding latency. SC list (SCL) decoding has been proposed
in [7] to improve the error-correction performance of SC, at
the cost of increased decoding latency. In [8]–[11], a series
of techniques has been proposed, aimed at improving the
decoding speed of both SC and SCL without sacrificing error-
correction performance.
Blind detection of polar codes has been recently addressed
in [12], where a blind detection scheme fitting within 3GPP
LTE-A and future 5G requirements has been proposed. It
is based on a two-step scheme: a first SC decoding phase
helps select a set of candidates, subsequently decoded with
SCL. An early stopping criterion for SCL is also proposed
to reduce average latency. Another recent work on polar
code blind detection [13] detaches itself from 4G-5G standard
requirements, and proposes a metric on which the outcome of
the blind detection can be based.
In this work, we extend the blind detection scheme pre-
sented in [12] and its early stopping criterion by considering
SCL also in the first decoding phase, and provide improved
detection accuracy results. We then propose an architecture
to implement the blind detection scheme: it relies on an SCL
decoder with tunable list size, that can be used for both the first
and second decoding stages. The architecture is synthesized
and implementation results are reported for various system
parameters.
The rest of the paper is organized as follows. Section II
introduces background information on polar codes and blind
detection. Section III details the proposed blind detection
scheme, and provides simulation results to evaluate its per-
formance. The architecture of the blind detection system is
detailed in Section IV, and implementation results are given
in Section V. Finally, Section VI draws the conclusion.
II. PRELIMINARIES
A. Polar Codes
A polar code P(N, K) is a linear block code of length N =
2^n and rate K/N, and it can be expressed as the concatenation
of two polar codes of length N/2. This is due to the fact
that the encoding process is represented by a modulo-2 matrix
multiplication as
\[ x = u G^{\otimes n}, \tag{1} \]
arXiv:1801.01820v1 [cs.IT] 4 Jan 2018
Fig. 1: Binary tree example for P(16, 8), with stages s = 0 to
s = 4. White circles at s = 0 are frozen bits, black circles at
s = 0 are information bits.
where u = {u_0, u_1, ..., u_{N−1}} is the input vector, x =
{x_0, x_1, ..., x_{N−1}} is the codeword, and the generator matrix
G^{\otimes n} is the n-th Kronecker power of the polarizing matrix
\[ G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}. \]
The polarization effect brought by polar codes
makes it possible to divide the N-bit input vector u between
reliable and unreliable bit-channels. The K information bits are
assigned to the most reliable bit-channels of u, while the remaining
N − K, called frozen bits, are set to a predefined value, usually
0. Codeword x is transmitted through the channel, and the
decoder receives the logarithmic likelihood ratio (LLR) vector
y = {y_0, y_1, ..., y_{N−1}}.
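The encoding in (1) can be illustrated with a short sketch. It is a minimal example assuming NumPy is available; the function names and the information positions chosen for the toy P(8, 4) code are hypothetical and used only for illustration, not taken from the paper.

```python
import numpy as np

def polar_generator(n: int) -> np.ndarray:
    """Return the n-th Kronecker power of G = [[1, 0], [1, 1]]."""
    G = np.array([[1, 0], [1, 1]], dtype=np.int64)
    Gn = np.array([[1]], dtype=np.int64)
    for _ in range(n):
        Gn = np.kron(Gn, G)
    return Gn

def polar_encode(u: np.ndarray) -> np.ndarray:
    """Compute x = u G^{(x)n} with modulo-2 arithmetic, as in (1)."""
    N = len(u)
    n = N.bit_length() - 1
    if (1 << n) != N:
        raise ValueError("code length must be a power of two")
    return (u.astype(np.int64) @ polar_generator(n)) % 2

# Toy P(8, 4) example. The information positions below are an
# assumption made for illustration; frozen bits are set to 0.
info_positions = [3, 5, 6, 7]
u = np.zeros(8, dtype=np.int64)
u[info_positions] = [1, 0, 1, 1]
x = polar_encode(u)
```

Since the generator matrix is its own inverse over GF(2), applying `polar_encode` twice returns the original input vector, which gives a quick sanity check of the sketch.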
In the seminal work on polar codes [5], the SC decoder is
proposed. The SC-based decoding process can be represented
as a binary tree search, in which the tree is explored depth
first, with priority given to the left branches. Fig. 1 shows
an example of SC decoding tree for P(16, 8), where nodes at
stage s contain 2^s bits. White leaf nodes are frozen bits, while
black leaf nodes are information bits.
Fig. 2 portrays the message passing among SC tree nodes.
Parents pass LLR values α to children, that send in return the
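hard bit estimates β. The left and right branch messages α^l
and α^r, in the hardware-friendly version of [14], are computed
as
\[ \alpha^l_i = \mathrm{sgn}(\alpha_i)\,\mathrm{sgn}(\alpha_{i+2^{s-1}})\,\min(|\alpha_i|, |\alpha_{i+2^{s-1}}|), \tag{2} \]
\[ \alpha^r_i = \alpha_{i+2^{s-1}} + (1 - 2\beta^l_i)\,\alpha_i, \tag{3} \]
while β is computed as
\[ \beta_i = \begin{cases} \beta^l_i \oplus \beta^r_i, & \text{if } i < 2^{s-1}, \\ \beta^r_{i-2^{s-1}}, & \text{otherwise}, \end{cases} \tag{4} \]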
where ⊕ denotes the bitwise XOR. The SC operations are
scheduled according to the following order: each node receives
α first, then sends α^l, receives β^l, sends α^r, receives β^r, and
finally sends β. When a leaf node is reached, β_i is set as the
estimated bit \hat{u}_i:
\[ \hat{u}_i = \begin{cases} 0, & \text{if } i \in F \text{ or } \alpha_i \geq 0, \\ 1, & \text{otherwise}, \end{cases} \tag{5} \]
where F is the set of frozen bits.
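The message-passing schedule and the update rules (2)-(5) can be sketched as a recursive function. This is a minimal illustration assuming NumPy, a natural-order (non bit-reversed) encoder, and the convention that a positive LLR favours bit 0; the function names and the toy frozen set are hypothetical.

```python
import numpy as np

def polar_encode(u):
    """Recursive natural-order encoder: x = (x_l XOR x_r, x_r)."""
    if len(u) == 1:
        return u.copy()
    half = len(u) // 2
    x_l, x_r = polar_encode(u[:half]), polar_encode(u[half:])
    return np.concatenate([x_l ^ x_r, x_r])

def sc_decode(alpha, frozen):
    """Return (u_hat, beta) for the subtree fed with LLR vector alpha."""
    if len(alpha) == 1:
        # Leaf rule (5): frozen bits and non-negative LLRs map to 0.
        bit = 0 if (frozen[0] or alpha[0] >= 0) else 1
        leaf = np.array([bit], dtype=np.int64)
        return leaf, leaf
    half = len(alpha) // 2
    a, b = alpha[:half], alpha[half:]
    # Left message (2), min-sum approximation.
    alpha_l = np.sign(a) * np.sign(b) * np.minimum(np.abs(a), np.abs(b))
    u_l, beta_l = sc_decode(alpha_l, frozen[:half])
    # Right message (3), using the left hard estimates.
    alpha_r = b + (1 - 2 * beta_l) * a
    u_r, beta_r = sc_decode(alpha_r, frozen[half:])
    # Combined hard estimates (4) returned to the parent.
    beta = np.concatenate([beta_l ^ beta_r, beta_r])
    return np.concatenate([u_l, u_r]), beta

# Toy P(8, 4) check on a noiseless channel (frozen set is an assumption).
frozen = np.array([True, True, True, False, True, False, False, False])
u = np.array([0, 0, 0, 1, 0, 0, 1, 1], dtype=np.int64)
llr = 2.0 * (1 - 2 * polar_encode(u))
u_hat, _ = sc_decode(llr, frozen)
```

The recursion visits left branches first, matching the depth-first schedule described above; a hardware decoder would unroll it into stages rather than recurse.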
The SC decoding process requires full tree exploration:
however, in [8], [15] it has been shown that it is possible to
prune the tree by identifying patterns in the sequence of frozen
and information bits, achieving substantial speed increments.
Fig. 2: Message passing in tree graph representation of SC
decoding (messages α, β, α^l, β^l, α^r, β^r exchanged between
stages s + 1, s, and s − 1).
This improved SC decoding is called fast simplified SC (Fast-
SSC).
SC decoding suffers from modest error-correction perfor-
mance with moderate and short code lengths. To improve it,
the SCL algorithm was proposed in [7]. It is based on the
same process as SC, but each time that a bit is estimated at a
leaf node, both its possible values 0 and 1 are considered. A
set of L codeword candidates is stored, so that a bit estimation
results in 2L new candidates, half of which must be discarded.
To this purpose, a path metric (PM) is associated to each
candidate and updated at every new estimate: the L paths with
the lowest PM survive. In the LLR-based SCL proposed in
[16], the hardware-friendly formulation of the PM is
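\[ \mathrm{PM}_{i_l} = \begin{cases} \mathrm{PM}_{i-1_l}, & \text{if } \hat{u}_{i_l} = \frac{1}{2}\left(1 - \mathrm{sgn}(\alpha_{i_l})\right), \\ \mathrm{PM}_{i-1_l} + |\alpha_{i_l}|, & \text{otherwise}, \end{cases} \tag{6} \]
where l is the path index and \hat{u}_{i_l} is the estimate of bit i at path
l. As with SC decoding, SCL tree pruning techniques relying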
on the identification of frozen-information bit patterns have
been proposed in [9], [11], called simplified SCL (SSCL) and
Fast-SSCL.
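The path metric update in (6) and the selection of the L surviving paths can be sketched as follows. This is a simplified software model, not the decoder of [16]: each path is represented as a hypothetical (PM, bit list) pair, and the leaf LLR of each path is passed in directly instead of being computed from the tree.

```python
def pm_update(pm, alpha, u_hat):
    """Path metric update (6): penalize estimates that contradict the LLR sign."""
    hard_decision = 0 if alpha >= 0 else 1  # equals (1 - sgn(alpha)) / 2
    return pm if u_hat == hard_decision else pm + abs(alpha)

def split_and_prune(paths, alphas, L):
    """Split every path on an information bit and keep the L lowest PMs.

    paths  : list of (pm, bits) tuples, one per active path
    alphas : leaf LLR seen by each path, in the same order
    """
    candidates = []
    for (pm, bits), alpha in zip(paths, alphas):
        for bit in (0, 1):
            candidates.append((pm_update(pm, alpha, bit), bits + [bit]))
    candidates.sort(key=lambda c: c[0])
    return candidates[:L]

# Two information-bit steps with list size L = 2 (LLR values are made up).
paths = [(0.0, [])]
paths = split_and_prune(paths, [-1.5], L=2)
paths = split_and_prune(paths, [2.0, 0.5], L=2)
```

A full sort is used here for brevity; hardware implementations typically rely on dedicated sorting networks for the 2L candidates.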
B. Blind Detection
The physical downlink control channel (PDCCH) is used
in 3GPP LTE/LTE-Advanced to transmit the downlink control
information (DCI) related to the downlink shared channel. The
DCI carries information regarding the channel resource allo-
cation, transport format and hybrid automatic repeat request,
and allows the UE to receive, demodulate and decode.
A cyclic redundancy check (CRC) is attached to the DCI
payload before transmission. The CRC is masked according to
an ID, like the radio network temporary identifier (RNTI), of
the UE to which the transmission is directed, or according to
one of the system-wide IDs. Finally, the DCI is encoded with
a convolutional code. The UE is not aware of the format with
which the DCI has been transmitted: it thus has to explore a
combination of PDCCH locations, PDCCH formats, and DCI
formats in the common search space (CSS) and UE-specific
search space (UESSS) and attempt decoding to identify useful
DCIs. This process is called blind decoding, or blind detection.
For each PDCCH candidate in the search space, the UE
performs channel decoding, and demasks the CRC with its
ID. If no error is found in the CRC, the DCI is considered as
carrying the UE control information.
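The CRC masking and demasking step can be sketched in a few lines. This is an illustration only: it uses the 16-bit CRC available in the Python standard library (`binascii.crc_hqx`) as a stand-in, and is not claimed to be bit-exact with the 3GPP procedure; the function names, payload, and ID values are hypothetical.

```python
import binascii

def masked_crc(payload: bytes, ue_id: int) -> int:
    """Compute a 16-bit CRC over the payload and mask (XOR) it with the ID."""
    return binascii.crc_hqx(payload, 0) ^ (ue_id & 0xFFFF)

def is_for_ue(payload: bytes, received_crc: int, ue_id: int) -> bool:
    """Demask the received CRC with the UE's own ID and check it."""
    return (received_crc ^ (ue_id & 0xFFFF)) == binascii.crc_hqx(payload, 0)

# Hypothetical DCI payload and 16-bit IDs.
dci = b"\x12\x34\x56"
crc_field = masked_crc(dci, ue_id=0x1A2B)
accepted = is_for_ue(dci, crc_field, ue_id=0x1A2B)
rejected = is_for_ue(dci, crc_field, ue_id=0x1A2C)
```

With an error-free payload, demasking with a different ID always fails the check, since the two masks differ in at least one bit.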
Based on LTE standard R8 [1], the performance specifica-
tions for the blind detection process are the following:
• The DCI of PDCCH is from 8 to 57 bits plus 16-bit CRC,
masked by 16-bit ID.
• In UESSS, a maximum of 2 DCI formats can be sent
per transmission time interval (TTI) for 2 potential frame
lengths. Therefore, 16 candidate locations in UESSS →
32 candidates.
• In CSS, a maximum of 2 DCI formats can be sent per
TTI for 2 potential frame lengths. Therefore, 6 candidate
locations in CSS → 12 candidates.
• Code length could be between 72 and 576 bits.
• Information length (including 16-bit CRC) could be be-
tween 24 and 73 bits.
• Target signal-to-noise ratio (SNR) is dependent on the
targeted block error rate (BLER): 10^{-2}.
• There are two types of false-alarm scenarios: Type-1,
when the UE ID is not transmitted but detected, and
Type-2, when the UE ID is transmitted but another one
is detected. The target false-alarm rate (FAR) is below
1.52 × 10^{-5}.
• Missed detection occurs when the UE ID is transmitted but
not detected. The missed detection rate (MDR) is close
to the BLER curve.
• The available time frame for blind detection is 16µs.
III. BLIND DETECTION SCHEME
In [12], polar codes have been considered within a blind
detection framework, and a blind detection scheme has been
proposed. Frozen bit positions are selected to instead transmit
the RNTI. Fig. 3 shows the block diagram of the devised
blind detection scheme. C1 candidates are received at the
same time: in this case, C1 = 44. The C1 candidates are
decoded with the simple SC algorithm, and a PM is obtained
for each candidate, equivalent to the LLR of the last decoded
bit: thanks to the serial nature of SC decoding, the LLR of
the last bit can be interpreted as a reliability measure on
the decoding process. The PMs are then sorted, to help the
selection of the best candidates to forward to the following
decoding phase. C2 candidates are in fact selected to be
decoded with the more powerful SCL decoding algorithm, that
guarantees a better error-correction performance, at a higher
implementation complexity. The C2 candidates are chosen as:
1) All candidates whose ID, after the first phase, matches
the one assigned to the UE. If more than C2 are present,
the ones with the highest PMs are selected.
2) If free slots among the C2 remain, the candidates with
the smallest PMs are selected. The candidates with large
PMs have higher probability to be correctly decoded:
if their ID does not match the one assigned to the
UE, it is probably a different one. On the other hand,
candidates with small PMs have a higher chance of being
incorrectly decoded, and a transmission to the UE might
be hiding among them.
After the SCL decoding phase, if one of the C2 candidates
matches the UE ID, it is selected, otherwise no selection is
attempted.
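The two selection rules listed above can be sketched as a small function. It follows the rules as stated in the text (ID-matching candidates first, ranked by highest PM; remaining slots filled with the smallest PMs); the function name and the example values are hypothetical.

```python
def select_candidates(pms, id_match, c2):
    """Return the indices of the candidates forwarded to the second phase.

    pms      : reliability metric of each first-phase candidate
    id_match : True where the decoded ID equals the UE ID
    c2       : number of second-phase slots
    """
    matching = [i for i, m in enumerate(id_match) if m]
    others = [i for i, m in enumerate(id_match) if not m]
    # Rule 1: ID-matching candidates, highest PMs first if too many.
    matching.sort(key=lambda i: pms[i], reverse=True)
    selected = matching[:c2]
    # Rule 2: fill the free slots with the smallest PMs among the rest.
    others.sort(key=lambda i: pms[i])
    selected += others[:c2 - len(selected)]
    return selected

# Six candidates, one ID match, three second-phase slots (made-up values).
chosen = select_candidates(
    pms=[5.0, 1.0, 3.0, 0.5, 4.0, 2.0],
    id_match=[False, True, False, False, False, False],
    c2=3,
)
```

In the paper's setting the call would use 44 first-phase candidates and C2 = 5; the small example above only keeps the behavior easy to follow.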
In [12], an early stopping criterion has been proposed as
well, to reduce the latency and energy expenditure of the
second phase of the blind detection scheme. The first phase
requires the full decoding of each candidate, to identify the
Fig. 3: Polar codes blind detection scheme (C1 candidates, SCL1
decoding, PM sorting and candidate selection, C2 candidates,
SCLmax decoding).
C2 codewords that will be sent to the second phase. In the
second phase, however, all codewords whose ID does not
match the UE ID will be discarded. Thus, as soon as the
ID is shown to be different, the decoding can be interrupted.
Since SC-based decoding algorithms estimate codeword bits
sequentially, the ID evaluation can be performed every time
an ID bit is estimated. In case the estimated bit is different
from the UE ID bit, the decoding is stopped.
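The early stopping rule can be modeled with a simple scan over the bits in decoding order. This is a single-path simplification (as in SC decoding), with hypothetical names; the bit stream stands in for the estimates that a real decoder would produce one at a time.

```python
def early_stop_scan(u_hat_stream, expected_id):
    """Scan bit estimates in decoding order and stop at the first ID mismatch.

    u_hat_stream : iterable of estimated bits, in decoding order
    expected_id  : dict mapping an ID bit position to the expected UE ID bit
    Returns (id_matches, number_of_bits_estimated).
    """
    estimated = 0
    for index, bit in enumerate(u_hat_stream):
        estimated += 1
        if index in expected_id and bit != expected_id[index]:
            return False, estimated  # decoding is interrupted here
    return True, estimated

# Toy example with 8 bits and two ID positions (values are made up).
stream = [0, 0, 1, 0, 1, 1, 0, 1]
mismatch = early_stop_scan(stream, {2: 1, 5: 0})
match = early_stop_scan(stream, {2: 1, 5: 1})
```

The ratio between the returned bit count and the code length corresponds to the fraction of estimated bits, which is the quantity that early stopping aims to reduce.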
Three methods have been described in [12] to
choose the bits assigned to the ID:
• ID mode 1: the ID bits are the 16 most reliable bits after
the K information bits.
• ID mode 2: the ID bits are the 16 most reliable bits, while
the K information bits are the most reliable bits after the
16 ID bits.
• ID mode 3: considering the order with which bits are
decoded in SC-based algorithms, the ID bits are the first
16 to be decoded among the K + 16 most reliable bits.
The three techniques yield negligible differences in terms
of error-correction performance, while ID mode 3 yields
considerable advantages over mode 1 and mode 2 when early
stopping is applied. In fact, since the ID bits are decoded
earlier, the average percentage of estimated bits decreases, and
the reduction in average latency is more substantial.
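The three ID modes can be compared with a short sketch. It assumes that a reliability ordering of the bit-channels is already available (most reliable first) and that bits are decoded in increasing index order; the function name and the toy ordering are hypothetical, and the paper uses 16 ID bits rather than the 2 used here.

```python
def assign_bits(reliability_order, k, n_id, mode):
    """Return (info_positions, id_positions) for the given ID mode.

    reliability_order : bit-channel indices, most reliable first
    k, n_id           : number of information bits and of ID bits
    """
    best = reliability_order[:k + n_id]
    if mode == 1:
        # ID bits are the most reliable positions after the K info bits.
        info, ids = best[:k], best[k:]
    elif mode == 2:
        # ID bits take the most reliable positions, info bits come next.
        ids, info = best[:n_id], best[n_id:]
    elif mode == 3:
        # ID bits are the first to be decoded among the K + n_id best.
        in_decoding_order = sorted(best)
        ids, info = in_decoding_order[:n_id], in_decoding_order[n_id:]
    else:
        raise ValueError("mode must be 1, 2 or 3")
    return sorted(info), sorted(ids)

# Toy N = 8 ordering (an assumption), K = 2 information bits, 2 ID bits.
order = [7, 6, 5, 3, 4, 2, 1, 0]
mode1 = assign_bits(order, 2, 2, mode=1)
mode2 = assign_bits(order, 2, 2, mode=2)
mode3 = assign_bits(order, 2, 2, mode=3)
```

In mode 3 the ID positions have the lowest indices among the selected set, so a mismatch is detected as early as possible in the decoding order.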
In this work, we generalize the blind detection scheme pro-
posed in [12], by considering SCL also for the first decoding
phase. In particular, we consider a list size L1 ≥ 1 for the
first decoding phase, and a list size Lmax > L1 for the second
decoding phase. It should be noted that when L1 = 1, the
blind detection scheme reverts to that of [12].
A. Simulation Results
To evaluate the effectiveness of the proposed blind detection
scheme, simulations were performed. The BLER, MDR, and
FAR have been measured on the additive white Gaussian noise
(AWGN) channel, with binary phase-shift keying (BPSK)
modulation, for varying code parameters. We
focused on polar codes with block lengths N = {256, 512},
since in [12] it has been shown that they constitute the most
critical cases in terms of speed. Four information lengths
K = {8, 16, 32, 57} have been considered, while the number
of ID bits has been set to 16. The 3GPP standardization
committee has decided that information bits in polar codes
must be assigned to the K most reliable bit-channels [17]:
thus, the ID bits have been assigned according to ID mode
1. The ID values assigned to the C1 candidates are randomly
selected over 16 bits. While different numbers of candidates
passed to the second phase have been considered in [12], we
Fig. 4: BLER curves with SCL when L = 8 (BLER versus SNR [dB],
for N = {256, 512} and K = {8, 16, 32, 57}).
Fig. 5: Missed detection rates after the second decoding phase
with L1 = 2, Lmax = 8, and C2 = 5 (MDR and BLER versus SNR [dB],
for K = {8, 16, 32, 57}). Transmissions include
C1/2 cases of N1 = 256 and C1/2 cases of N2 = 512.
have focused here on C2 = 5, for which a good tradeoff
between accuracy and latency is found. At the same time, we
set Lmax = 8 and L1 = 2: it is a representative case for which
Lmax guarantees good error-correction performance, and at
which SCL decoders can be implemented with reasonable
complexity.
Fig. 4 plots the BLER curves for all the considered code
lengths and rates. As expected, their error-correction perfor-
mance improves as the code length increases and the code
rate decreases. In Fig. 5, the first of the metrics specific to
the blind detection problem, the MDR, is depicted. The MDR
can be defined as the number of missed detections divided
by the number of transmissions in which the UE ID was
sent. The curves in Fig. 5 have been obtained considering
C1/2 candidates of length N1 = 256, and C1/2 candidates
of length N2 = 512 in each transmission, with K1 = K2
information bits. Together with the MDR, in Fig. 5 the BLER
curves relative to the aggregate transmissions are portrayed.
It can be seen that the MDR curve is always lower than the
relative BLER curve.
The FAR curves for the considered case study are portrayed
in Fig. 6. The system target FAR is equivalent to the FAR
obtained with a 16-bit CRC: in 5G, a CRC at least 16 bits
long is foreseen. Here, we evaluate the additional contribution
that the proposed blind detection scheme can bring in lowering
the FAR on top of the CRC. It can be seen that the FAR
is kept below the 10^{-4} threshold at SNR values for which
the BLER is still very high, and decreases as the channel
conditions improve. In the blind detection method presented in
[13], the FAR increases as the MDR decreases. On the other
hand, the proposed scheme allows decreasing both at the same
time, thus avoiding performance limitations that could make
it unappealing for 5G standard applications.
The impact of the devised early stopping criterion on the
average number of estimated bits is shown in Fig. 7, for
K = 32 and K = 57. These results consider each of the
C2 candidates separately, since the number of candidates of
length N1 and N2 in the second phase depends on the PMs
received from the first phase, and thus on channel SNR. The
solid curves have been obtained for cases in which the UE ID was
sent through the considered code, while the dashed curves refer
to cases in which it was not sent through the code.
• For N = 256 (curves with a circle marker), it is
possible to observe the same behavior noted in [12] for
N = 128 as well. In case the UE ID was sent, as the
channel conditions improve, the number of estimated bits
increases until stabilizing at a maximum average value.
This phenomenon can be explained by the fact that when
the SNR is low, it is more likely that the codeword
carrying the UE ID is not selected to be among the C2
candidates. Thus the decoders in the second phase easily
encounter ID bits different from the UE ID early in the
decoding process. As the channel conditions improve, the
codeword with the UE ID falls among the C2 candidates
with rising probability. Consequently, the decoder tasked
with its decoding does not interrupt the process, reaching
100% estimated bits, while the remaining C2−1 decoders
stop the decoding early, thus averaging the estimated bit
percentage at a stable value (67% for K = 32 and 61%
for K = 57). The dashed curves show instead a stable
value regardless of channel conditions: since among the
C2 candidates there is never one carrying the UE ID, all
second phase decoders tend to stop the decoding early,
at a percentage independent of the SNR, and mostly
influenced by the position of bits assigned to the ID.
• For N = 512 (curves with a cross marker) a similar
behavior to the N = 256 case can be observed when the
UE ID is not sent, with the average number of estimated
bits stable at all the considered SNR values. On the other
hand, when the UE ID is sent, the trend is different: at
low SNR values, the percentage of estimated bits is very
close to 100%. As the SNR value increases, the average
starts to decrease, until it settles on a stable value. This
Fig. 6: False alarm rates after the second decoding phase with
L1 = 2, Lmax = 8, and C2 = 5 (FAR versus SNR [dB], for
K = {8, 16, 32, 57}). Transmissions include C1/2
cases of N1 = 256 and C1/2 cases of N2 = 512.
Fig. 7: Average percentage of estimated bits during the second
decoding phase with early stopping when Lmax = 8 and C2 =
5 (average estimated bit % versus SNR [dB], for N = {256, 512}
and K = {32, 57}, with and without the UE ID being sent).
behavior is due to the fact that at low SNR, it is very
unlikely that a codeword with N = 512 is among the C2
second phase candidates if its ID does not match the UE ID: the
longer code length and lower rate contribute to a higher
decoding reliability during the first phase, which allows
screening out unlikely candidates better than in the N = 256
case.
IV. HARDWARE ARCHITECTURE
To evaluate the implementation cost of the devised blind
detection scheme, we designed a decoder architecture that
supports it, portrayed in Fig. 8. An array of flexible list size
SCL decoders handles both the first and second decoding
phase. A dedicated module selects the C2 candidates for the
second phase according to the criteria described in Section III.
Fig. 8: Polar codes blind detection system architecture (input,
controller, array of NSCLmax SCL decoders, PM sorting and
candidate selection, output).
A. Flexible list size SCL decoder
We based our SCL decoder architecture on that of [11], [18]:
the decoding process follows the one described in Section II-A
for a list size Lmax. Most of the datapath and memories are
instantiated Lmax times: multiple candidates are stored at the
same time, with the best candidate being selected at the end of
the decoding. While in [11], [18] the final candidate is selected
according to a CRC check, in the proposed architecture no
CRC is considered, and the validity of the final candidate is
based on the matching ID and PM value.
The SC decoding tree is descended by computing (2) and
(3) at each stage s, with priority being given to left branches.
These calculations are performed by Lmax parallel sets of P
processing elements (PEs), with P being a power of 2. In the
stages for which 2^s > 2P, the operations in (2) and (3) are
performed over 2^s/(2P) steps, while a single step is needed
otherwise. Internal memories store the updated LLR values
between stages.
PEs get two LLR values as input, and concurrently compute
both α^l and α^r according to (2) and (3), respectively. The
correct output is selected depending on the index of the
leaf node to be estimated. When a leaf node is reached, the
decoder controller module identifies the leaf node as either
an information bit or a frozen bit. If a frozen bit is found,
the paths are not split, and the bit is estimated only as 0,
and the L memories are updated with the same bit or LLR
values. Instead, in case of an information bit, both 0 and 1 are
considered, so that paths are split, and the PMs updated for
the 2L candidates according to (6). Afterwards, the PMs are
sorted, identifying the L surviving paths.
All memories in the decoder are registers, enabling the
internal LLR and β values to be read, updated by the PEs,
and written back in a single clock cycle. At the same time,
the paths are either updated or split and updated, and the new
PMs computed. In the following clock cycle, in case the paths
were split, the PMs are sorted and the surviving paths selected.
Codes with different code lengths can be decoded by storing
the appropriate memory offsets for every considered code in
a dedicated memory.
This baseline decoder has been modified to better fit the
needs of the proposed blind detection scheme. In order to
maximize resource sharing, the SCL decoder has been sized
for Lmax > L1, and the effective list size can be selected
through a dedicated input. The Lmax − L1 paths that are not
used in the first decoding phase are used to decode up to
⌊(Lmax − L1)/L1⌋ additional candidates at the same time.
In order to exploit the unused paths, additional functional
modules are necessary.
• The baseline decoder uses a single memory to store the
channel LLR values, sharing it among the different paths.
If different codewords have to be decoded at the same
time, the channel memory needs to be instantiated not
once, but ⌊Lmax/L1⌋ times.
• The decoder relies on sorting and selection logic that
identifies the Lmax surviving paths after paths are split. To
support the parallel decoding of ⌊Lmax/L1⌋ candidates,
as many sorting and selection modules targeting the
selection of L1 paths out of 2L1 are instantiated.
If L1 = 1 is selected, the path splitting and PM sorting steps
are bypassed, reverting decoders to the standard SC case. Since
a single set of SCL decoders can handle both decoding phases,
the total number of decoders is NSCLmax (see Fig. 8). However,
the effective number of decoders for the first decoding phase
is NSCL1 = NSCLmax × ⌊Lmax/L1⌋.
The early stopping technique described in Section III has
been also implemented. The decoder receives as input the
position of the ID bits and the value of the UE ID: every time
a bit in an ID position is estimated, the bit value is compared
to the expected UE ID bit. All paths whose estimated bit does
not match the UE ID bit are deactivated. This operation is
performed after the L surviving paths have been selected, in
order not to force the survival of unlikely paths and increase
the FAR. In case all paths have been deactivated, the decoding
is stopped. The early stopping logic can be activated and
deactivated by means of a dedicated control signal. Since
the same hardware is used for both decoding phases, early
stopping is enabled only during the second one.
B. PM sorting and candidate selection
Fig. 9 depicts the architecture of the PM sorting and
candidate selection block. It processes the output of the first
decoding phase to select the C2 candidates for the second
phase, and selects the overall system output based on the
results from the second phase. For each of the NSCL1 first
phase decoders, a PM and a flag signalling a UE ID match
are received. They are stored every time the respective Valid
signal is raised by the decoder. The Valid signal is also used
as an enable for the PM and UE ID match register address
counter, and for the counter keeping track of how many
codewords had a matching UE ID after the first phase. When
all the C1 candidates have gone through the first decoding
phase, a Valid signal is issued to the sorter module, that
receives as input all the stored PMs. The sorter module returns
the C2 minimum PMs in as many clock cycles: each PM
is compared to all the others, and a single clock cycle is
necessary to identify the minimum one, that is excluded from
the subsequent comparison. When the C2 minima have been
found, the selector module considers how many candidates
had a matching UE ID after the first phase, and selects the C2
candidates for the second phase among them and those with
the minimum PM values. The C2 candidates are sent to the
NSCLmax decoders by means of a dedicated counter. Returning
PMs and UE ID match flags are received and compared by
another selector: when all C2 candidates have been decoded,
the selected codeword, if any, is output.
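The behavior of the sorter module, which extracts one minimum per clock cycle and excludes it from later comparisons, can be modeled in software as follows. This is a behavioral sketch with hypothetical names, where each loop iteration stands for one clock cycle; it does not describe the actual comparator structure.

```python
def c2_minima(pms, c2):
    """Return the indices of the c2 smallest PMs, one per modeled clock cycle."""
    excluded = set()
    selected = []
    for _ in range(min(c2, len(pms))):
        # All remaining PMs are compared and the minimum is identified.
        best = min(
            (i for i in range(len(pms)) if i not in excluded),
            key=lambda i: pms[i],
        )
        excluded.add(best)  # excluded from the subsequent comparisons
        selected.append(best)
    return selected

# Five stored PMs, three minima requested (values are made up).
minima = c2_minima([4.0, 1.0, 3.0, 0.5, 2.0], c2=3)
```

The number of iterations equals C2, consistent with the statement that the C2 minimum PMs are returned in as many clock cycles.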
V. IMPLEMENTATION RESULTS
The architecture proposed in Section IV has been described
in VHDL and synthesized in TSMC 65 nm CMOS technology.
Table I reports the synthesis results for the architecture sized
for a maximum code length Nmax = 512, a maximum list size
Lmax = 8, C2 = 5, and a target frequency f = 1 GHz. Various
NSCLmax values have been considered, leading to different
latencies and area occupations. Since during the first decoding
phase L1 = 2, the effective number of decoders NSCL1
is equal to 4NSCLmax , even if only NSCLmax are physically
instantiated. Regarding the area, the NSCLmax SCL decoders
contribute to the majority of the complexity, ranging from
97.8% when NSCLmax = 1 to 99.7% when NSCLmax = 5. The
logic complexity of the PM sorting and candidate selection
module remains almost unchanged as NSCLmax varies,
being mainly affected by C1 and C2. Memories have been
synthesized with registers only, without the use of RAM, and
account for 36% of the total area occupation.
The worst case latency of the proposed blind detection
system can be found as
\[ T_{bd} = \left\lceil \frac{C_1}{N_{SCL_1}} \right\rceil \left( \frac{T^1_{SCL}}{2} + \frac{T^2_{SCL}}{2} \right) + T_{sort} + \left\lceil \frac{C_2}{N_{SCL_{max}}} \right\rceil \max\left( T^1_{SCL}, T^2_{SCL} \right), \tag{7} \]
where T^1_SCL and T^2_SCL are the SCL decoding latencies for
codes of length N1 and N2, respectively, while Tsort is the
number of time steps required to sort the PM of the first
decoding phase and obtain the C2 candidates out of the C1
candidate locations. Also, it is worth remembering that for the
proposed architecture, NSCL1 = ⌊Lmax/L1⌋ × NSCLmax. The
SCL decoding latency can be found as [16]
\[ T^x_{SCL} = 2N_x + K_x + 16 - 2, \]
for x ∈ {1, 2}. From the results presented in Table I, it is
possible to see that even when considering the relatively old
65 nm technology node, the 16µs worst case latency target can
be reached with a single SCL decoder running at a frequency
of 1 GHz, while NSCLmax = 5 guarantees a worst case latency
of 3.6µs, meeting the 4µs target as well.
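The worst case latency in (7) can be evaluated with a short sketch. The function and parameter names are hypothetical; K = 57 and a sorting time of 0 are assumptions made here to keep the example self-contained, since the value of Tsort is not given explicitly in the text.

```python
from math import ceil

def t_scl(n, k, id_bits=16):
    """SCL decoding latency in clock cycles: 2N + K + 16 - 2."""
    return 2 * n + k + id_bits - 2

def worst_case_latency(c1, c2, n_scl_max, l_max, l1, t1, t2, t_sort):
    """Worst case blind detection latency (7), in clock cycles."""
    n_scl1 = (l_max // l1) * n_scl_max  # effective first-phase decoders
    first_phase = ceil(c1 / n_scl1) * (t1 + t2) / 2
    second_phase = ceil(c2 / n_scl_max) * max(t1, t2)
    return first_phase + t_sort + second_phase

# N1 = 256, N2 = 512, K = 57 (assumed), C1 = 44, C2 = 5, Lmax = 8, L1 = 2.
t1, t2 = t_scl(256, 57), t_scl(512, 57)
cycles = worst_case_latency(44, 5, n_scl_max=5, l_max=8, l1=2,
                            t1=t1, t2=t2, t_sort=0)
microseconds = cycles / 1000.0  # at f = 1 GHz, 1000 cycles per microsecond
```

At 1 GHz one clock cycle lasts 1 ns, so dividing the cycle count by 1000 gives the latency in microseconds; the sorting time has to be added on top of the value computed here.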
However, considering only the worst case latency is indeed
an unrealistic scenario. To begin with, while there is no
guarantee on how the C2 candidates are distributed among
N1 and N2, simulation results have shown that we can expect
the C2 candidates either to favor the shorter code length, or
to be equally divided between N1 and N2 candidates. Thus,
the term \( \left\lceil C_2 / N_{SCL_{max}} \right\rceil \max\left( T^1_{SCL}, T^2_{SCL} \right) \)
Fig. 9: PM sorting and candidate selection architecture (PM and
UEID match registers, counters, sorter, and selectors).
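in (7), which represents the contribution of the second decoding
phase, could be better expressed as
\[ \left\lceil \frac{\lceil C_2/2 \rceil}{N_{SCL_{max}}} \right\rceil T^1_{SCL} + \left\lceil \frac{C_2 - \lceil C_2/2 \rceil}{N_{SCL_{max}}} \right\rceil T^2_{SCL}. \]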
Note that this is still a conservative assumption, since it
entails the C2 candidates equally divided among the two
code lengths. We can refine this assumption by taking into
account the effect of early stopping. We can approximate the
latency reduction with a multiplicative factor E^x associated with
T^x_SCL. Consequently, the average latency of the blind detection
system, for NSCLmax < C2, can be computed as
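\[ T_{bd} = \left\lceil \frac{C_1}{N_{SCL_1}} \right\rceil \left( \frac{T^1_{SCL}}{2} + \frac{T^2_{SCL}}{2} \right) + T_{sort} + \left\lceil \frac{\lceil C_2/2 \rceil}{N_{SCL_{max}}} \right\rceil T^1_{SCL} E^1 + \left\lceil \frac{C_2 - \lceil C_2/2 \rceil}{N_{SCL_{max}}} \right\rceil T^2_{SCL} E^2, \tag{8} \]
while for NSCLmax ≥ C2 it becomes
\[ T_{bd} = \left\lceil \frac{C_1}{N_{SCL_1}} \right\rceil \left( \frac{T^1_{SCL}}{2} + \frac{T^2_{SCL}}{2} \right) + T_{sort} + \max\left( T^1_{SCL} E^1, T^2_{SCL} E^2 \right). \tag{9} \]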
Considering the number of UEs connected to the shared
channel, blind detection is dominated by instances in which
a particular UE ID is not sent. Thus, we can set Ex as the
fraction of bits expressed by the dashed curves in Fig. 7. The
average latency results in Table I show substantial reduction
with respect to the worst case latency, within a more
realistic framework. Even within the 65 nm technology node,
with NSCLmax ≥ 4, the average latency is below 4µs. With
the latest technology nodes, a substantially higher frequency
will be easy to achieve, along with proportionally smaller area
occupation. It is consequently safe to assume that the 4µs
worst case latency target can be easily met for NSCLmax ≥ 3,
and the average latency with NSCLmax ≥ 2.
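
The frequency argument above can be checked against the cycle counts of Table I with a short Python sketch. The helper names are our own, and the sketch assumes that the cycle counts stay unchanged when the clock frequency is scaled, which is a simplification and not a claim made in the paper.

```python
# Cycle counts copied from Table I (synthesized at f = 1 GHz).
# Dictionary keys are the values of N_SCLmax.
WORST_CASE_CYCLES = {1: 14720, 2: 8330, 3: 5555, 4: 4710, 5: 3620}
AVERAGE_CYCLES = {1: 11843, 2: 6470, 3: 4541, 4: 3696, 5: 3451}

TARGET_US = 4.0  # latency target discussed in the text, in microseconds


def latency_us(cycles, f_ghz=1.0):
    """Latency in microseconds for a cycle count at frequency f_ghz."""
    return cycles / (f_ghz * 1e3)


def min_frequency_ghz(cycles, target_us=TARGET_US):
    """Lowest clock frequency (GHz) at which `cycles` fits in target_us."""
    return cycles / (target_us * 1e3)


def smallest_n_meeting_target(table, f_ghz=1.0, target_us=TARGET_US):
    """Smallest N_SCLmax in `table` whose latency is within target_us."""
    for n in sorted(table):
        if latency_us(table[n], f_ghz) <= target_us:
            return n
    return None


for n in sorted(WORST_CASE_CYCLES):
    print(n,
          round(min_frequency_ghz(WORST_CASE_CYCLES[n]), 3),
          round(min_frequency_ghz(AVERAGE_CYCLES[n]), 3))
```

Under this assumption, the printed values are the clock frequencies each configuration would need in order to fit its worst-case and average cycle counts within 4 µs.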

TABLE I: TSMC CMOS 65 nm blind detection scheme synthesis results for $L_{\max} = 8$, $P = 64$, $C_2 = 5$, and $f = 1$ GHz.

| $N_{\mathrm{SCL}_{\max}}$ | Area [mm²] | Worst case latency [clock cycles] | Worst case latency [µs] | Average latency [clock cycles] | Average latency [µs] |
|---|---|---|---|---|---|
| 1 | 1.555 | 14720 | 14.7 | 11843 | 11.8 |
| 2 | 3.086 | 8330 | 8.3 | 6470 | 6.5 |
| 3 | 4.596 | 5555 | 5.6 | 4541 | 4.5 |
| 4 | 6.117 | 4710 | 4.7 | 3696 | 3.7 |
| 5 | 7.654 | 3620 | 3.6 | 3451 | 3.5 |

VI. CONCLUSION
In this work, we propose a polar codes blind detection
scheme. The candidates go through a first, coarser decoding
phase, that helps to select a few of them for a second, finer
decoding phase. An early stopping criterion is proposed for
the second phase, to reduce average latency. We evaluate the
effectiveness of the blind detection scheme, and propose an
architecture to implement it. It is based on an SCL decoder with
tunable list size, that can be used for both decoding stages.
The architecture is synthesized and implementation results
are reported for various system parameters. The reported area
occupation and latency, obtained in 65 nm CMOS technology,
are able to meet 5G requirements, and are guaranteed to meet
them with even less resource usage in the latest technology
nodes.
REFERENCES
[1] 3rd Generation Partnership Project (3GPP), “Physical layer procedures,”
3GPP TS 36.213 V.8.2.0, 2008.
[2] R. Moosavi and E. G. Larsson, “A fast scheme for blind identification of
channel codes,” in 2011 IEEE Global Telecommunications Conference
- GLOBECOM 2011, Dec 2011, pp. 1–5.
[3] T. Xia and H. C. Wu, “Novel blind identification of LDPC codes using
average LLR of syndrome a posteriori probability,” IEEE Transactions
on Signal Processing, vol. 62, no. 3, pp. 632–640, Feb 2014.
[4] J. Zhou, Z. Huang, C. Liu, S. Su, and Y. Zhang, “Information-dispersion-
entropy-based blind recognition of binary BCH codes in soft decision
situations,” Entropy, vol. 15, pp. 1705–1725, 2013.
[5] E. Arıkan, “Channel polarization: A method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
[6] “Final report of 3GPP TSG RAN WG1 #87 v1.0.0,”
http://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_87/Report/Final_Minutes_report_RAN1%2387_v100.zip,
Reno, USA, November 2016.
[7] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Trans. Inf.
Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
[8] G. Sarkis, P. Giard, A. Vardy, C. Thibeault, and W. Gross, “Fast polar
decoders: Algorithm and implementation,” IEEE J. Sel. Areas Commun.,
vol. 32, no. 5, pp. 946–957, May 2014.
[9] S. A. Hashemi, C. Condo, and W. J. Gross, “Simplified successive-
cancellation list decoding of polar codes,” in IEEE Int. Symp. on Inform.
Theory, July 2016, pp. 815–819.
[10] C. Xiong, J. Lin, and Z. Yan, “Symbol-decision successive cancellation
list decoder for polar codes,” IEEE Trans. Signal Process., vol. 64, no. 3,
pp. 675–687, February 2016.
[11] S. A. Hashemi, C. Condo, and W. J. Gross, “Fast simplified successive-
cancellation list decoding of polar codes,” in IEEE Wireless Commun.
and Netw. Conf., March 2017, pp. 1–6.
[12] C. Condo, S. A. Hashemi, and W. J. Gross, “Blind detection with polar
codes,” IEEE Communications Letters, vol. PP, no. 99, pp. 1–1, 2017.
[13] P. Giard, A. Balatsoukas-Stimming, and A. Burg, “Blind Detection of
Polar Codes,” ArXiv e-prints, May 2017.
[14] C. Leroux, A. Raymond, G. Sarkis, and W. Gross, “A semi-parallel
successive-cancellation decoder for polar codes,” IEEE Trans. Signal
Process., vol. 61, no. 2, pp. 289–299, January 2013.
[15] A. Alamdar-Yazdi and F. R. Kschischang, “A simplified successive-
cancellation decoder for polar codes,” IEEE Commun. Lett., vol. 15,
no. 12, pp. 1378–1380, December 2011.
[16] A. Balatsoukas-Stimming, M. Bastani Parizi, and A. Burg, “LLR-based
successive cancellation list decoding of polar codes,” IEEE Trans. Signal
Process., vol. 63, no. 19, pp. 5165–5179, October 2015.
[17] “Draft report of 3GPP TSG RAN WG1 #AH NR2 v0.1.0,”
http://www.3gpp.org/ftp/tsg_ran/WG1_RL1/TSGR1_AH/NR_AH_1706/Report/Draft_Minutes_report_RAN1#AH_NR2_v010.zip,
Qingdao, China, June 2017.
[18] S. A. Hashemi, C. Condo, and W. J. Gross, “A fast polar code list
decoder architecture based on sphere decoding,” IEEE Trans. Circuits
Syst. I, vol. 63, no. 12, pp. 2368–2380, December 2016.
