A Complexity Reduction Method for Successive Cancellation List Decoding by Dizdar, Onur
arXiv:1812.09357v2 [cs.AR] 19 Aug 2019
A Complexity Reduction Method for Successive
Cancellation List Decoding
Onur Dizdar
Abstract—This brief introduces a hardware complexity reduction
method for successive cancellation list (SCL) decoders.
Specifically, we propose a sorting scheme in which the L
paths with the smallest path metrics are additionally sorted
according to their path indexes for path pruning. We prove that
such a sorting scheme reduces the number of multiplexer inputs
in any hardware implementation of SCL decoding from L to
(L/2 + 1) without any change in the decoding latency. We also
propose sorter architectures for the proposed sorting method.
Field programmable gate array (FPGA) implementations show
that the proposed method achieves a significant reduction in the
hardware consumption of SCL decoder implementations,
especially for large list sizes and block lengths.
Index Terms—Successive cancellation list decoding, Polar
codes, Reed-Muller codes, hardware complexity, sorting.
I. INTRODUCTION
Successive cancellation (SC) is the first decoding algorithm
proposed for polar codes, introduced by Arıkan in [1]. Being
a low-complexity algorithm, SC incurs a penalty in the
achievable error performance. In [2], successive cancellation
list (SCL) decoding was proposed to improve the error
performance, following similar ideas developed in [3] for
Reed-Muller (RM) codes.
A common problem of SCL decoder implementations is the
high hardware complexity, which is mostly due to memory
elements and large multiplexers in the designs. Memory ele-
ments in SCL decoder implementations store calculated log-
likelihood ratio (LLR) values, decoded bits, partial-sums and
pointers for each path.
Multiplexers are used to copy the memory contents between
decoding paths after path pruning stages. An L-to-1 multiplexer
is used for this purpose for each of the L paths, a structure
which is commonly referred to as a crossbar. Input widths of
these multiplexers are equal to the widths of the corresponding
memory elements. For example, in SCL decoder architectures
[4]-[15], L-to-1 multiplexers with input widths of O(N) bits
(e.g., N bits, N/2 bits, etc.) are used to copy the contents
of registers storing decoded bits and/or partial-sums between
paths. In the cases where random access memory (RAM)
blocks are used for storage, which is the conventional approach
for storing calculated LLR values, pointer memories are used
[5]. In such cases, L-to-1 multiplexers with input widths equal
to the width of the pointer registers are required for each path,
the widths being $O(\log_2 N \log_2 L)$ bits. Similar arguments
are valid for designs [16]-[21], where decoded bits and/or
partial-sums are stored in RAM blocks or partly in registers
and partly in RAM blocks.
In this work, we propose a method to reduce the hardware
complexity of SCL decoder implementations. We achieve this
reduction by limiting the number of possible interactions be-
tween decoding paths by applying a novel sorting mechanism
to determine the surviving paths at decision making stages
of SCL decoding. Specifically, L surviving paths, which are
chosen out of 2L candidate paths, are obtained as sorted with
respect to their path indexes and path copying operations are
performed according to the result of this sorting mechanism.
We prove that the proposed method reduces the input number
of memory copying multiplexers from L to (L/2+1). We also
describe sorter architectures to enable the proposed method in
an SCL decoder.
The rest of the paper is organized as follows. We give
background information on SCL decoding in Section II. The
proposed complexity reduction method and sorter architectures
are described in Section III. Section IV gives the implemen-
tation results. Section V concludes the paper.
II. SUCCESSIVE-CANCELLATION LIST DECODING
Vectors are denoted by bold lowercase letters. We use
$\mathbf{c}_i^k$ to denote the vector $(c_i, c_{i+1}, \ldots, c_k)$.
For any set $S \subseteq \{0, 1, \ldots, N-1\}$, $S^c$ denotes its
complement.
A high-level description of the SCL decoding algorithm is
given in Algorithm 1. The indexes of information bits in the
uncoded bit vector $\mathbf{u}$ of length N are chosen from
$\mathcal{A}$, which is the set of indexes of the K polarized
channels with the smallest Bhattacharyya parameters [1]. The
probability $W_N^{(i)}(\mathbf{y}, \mathbf{u}_0^{i-1}[k] \mid u)$
is defined as the decision probability for the i-th bit of the
k-th decoding path for the bit value $u \in \{0, 1\}$. SCL decoders
keep L decoded bit sequences during the decoding process in
order to enhance the error performance. Decoding paths are
formed during the decision making stages of SC decoding for
information bits. An existing path is split into two candidate
paths for the bit decision values of 0 and 1 for the decoded
information bit.
In order to limit the exponential growth of decoding paths,
path pruning is performed to limit the number of paths to
the maximum list size L. When the number of candidate paths
exceeds L, an SCL decoder chooses L paths as the surviving
paths according to their path metrics. Path metrics are
calculated from decision probabilities or LLR values depending
on the decoder implementation.
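The pruning step can be illustrated with a minimal software sketch. It assumes, as stated later in the paper, that smaller path metrics are better; the function name `prune_paths` and the example metric values are hypothetical and not part of the original work.

```python
import heapq


def prune_paths(candidate_metrics, list_size):
    """Return the 1-based indexes of the list_size candidates with the
    smallest path metrics, in order of increasing metric."""
    indexed = enumerate(candidate_metrics, start=1)
    survivors = heapq.nsmallest(list_size, indexed, key=lambda pair: pair[1])
    return [index for index, _ in survivors]


# Hypothetical metrics for 2L = 8 candidate paths with L = 4.
metrics = [0.9, 0.1, 2.3, 0.4, 1.7, 0.2, 3.0, 0.8]
print(prune_paths(metrics, 4))
```

Note that the survivors come out ordered by metric, not by index, which is the conventional behavior that the proposed method changes.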
After path pruning, new paths are formed from existing
paths. In software implementations of SCL decoding, the
existing decoding paths are continued, copied or killed if one,
both or none of their candidate paths survive, respectively.
In hardware implementations, memory copying operations are
required to perform this task, and these operations require
L-to-1 multiplexers for each memory element. In the next
section, we describe a sorting method to reduce the number of
inputs of such multiplexers down to L/2 + 1.
[Fig. 1: Generic architecture for SCL decoders (SC module with PE block, channel LLRs, Path 1 to Path L, control logic)]
III. COMPLEXITY REDUCTION BASED ON MULTIPLEXERS
IN SCL DECODERS
As mentioned in the previous section, an SCL decoder
forms candidate paths from existing paths for each information
bit. In this work, we assume that such operation is performed
by forming candidate paths with indexes 2l − 1 and 2l from
the existing path with index l, 1 ≤ l ≤ L. Path metrics of the
candidate paths can be calculated as in [4] or [22].
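The index convention above can be written as two small helper functions. This is only an illustrative sketch; the names `candidates_of` and `parent_of` are hypothetical, and the parent mapping is the one used later in the proof of the Proposition.

```python
def candidates_of(l):
    """Candidate path indexes (bit decisions 0 and 1) formed from existing path l."""
    return (2 * l - 1, 2 * l)


def parent_of(i):
    """Existing path from which candidate path i was formed: floor((i-1)/2) + 1."""
    return (i - 1) // 2 + 1


L = 4
for l in range(1, L + 1):
    print(l, candidates_of(l), [parent_of(i) for i in candidates_of(l)])
```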
A generic SCL decoder architecture is given in Fig. 1. In
conventional hardware implementations of the SCL algorithm,
decoding paths are assigned to dedicated hardware elements,
i.e., circuitry for calculations and memory elements. The
dedicated circuitry of a decoding path consists of processing
elements (represented by PE block in the SC module) and
partial-sum logic (represented by PS logic in the PS unit)
and performs SC decoding calculations for the specific path.
The dedicated memory elements of a decoding path store the
calculated LLR values, partial-sum bits, previously decoded
bits and/or pointers to specify the RAM blocks which the
particular path should access for decoding calculations.
After path pruning, the dedicated hardware of decoding
paths is assigned to new surviving paths to continue with
the decoding operations. The memory contents for each new
surviving path are copied from the corresponding dedicated
hardware of an existing path according to an ordering. The
ordering is determined by a sorter or a module which finds
the L candidate paths with smallest path metrics. Then, such
L candidate paths are chosen as surviving paths and they are
assigned to dedicated hardware in the specific order at the
output of the sorter. The memory contents of the existing paths,
which the surviving paths are formed from, are copied from
their respective dedicated memory elements to the new mem-
ory elements according to this ordering. Similar arguments are
valid for different types of modules that extract the L surviving
paths with smallest metrics without a specific ordering.
The copying operations explained above are performed by
crossbars consisting of L-to-1 multiplexers for each of the L
paths, as shown in Fig. 1. The input widths of such multiplexers
are required to be O(N) bits if they are employed to copy
partial-sum and decoded bit registers. Calculated LLR values
are conventionally stored in RAM blocks, as the number of
bits to be stored is higher than those of partial-sums and
decoded bits. In this case, pointer memories with widths
of $O(\log_2 N \log_2 L)$ bits are employed to map calculated LLR
RAM blocks to paths [5] and are copied by crossbars with
equal widths. Similarly, in architectures where partial-sums
are stored in RAM blocks or partly in registers and partly
in RAM blocks, crossbars are employed to copy the pointer
memories and the registers that store portions of partial-sums.
Algorithm 1: $\hat{\mathbf{u}} = \mathrm{SCL}(\mathbf{y}, \mathcal{A}, \mathbf{u}_{\mathcal{A}^c}, L)$
$N = \mathrm{length}(\mathbf{y})$, $\gamma = 1$
for $i = 0$ to $N - 1$ do
    if $i \notin \mathcal{A}$ then
        $\hat{u}_i[k] \leftarrow u_i$, $\forall k \in \{1, \ldots, L\}$
    else
        if $\gamma < L$ then
            for $k = 1$ to $\gamma$ do
                $\hat{\mathbf{u}}_0^{i-1}[\{2k-1, 2k\}] \leftarrow \hat{\mathbf{u}}_0^{i-1}[k]$
                $\hat{u}_i[2k-1] \leftarrow 0$
                $\hat{u}_i[2k] \leftarrow 1$
            end
            $\gamma \leftarrow 2\gamma$
        else
            $\Gamma \leftarrow \mathrm{SORT}\left(\left(\hat{\mathbf{u}}_0^{i-1}[k], u\right), W_N^{(i)}\left(\mathbf{y}_0^{N-1}, \mathbf{u}_0^{i-1}[k] \mid u\right)\right)$, $\forall k \in \{1, \ldots, L\}$, $\forall u \in \{0, 1\}$
            $\hat{\mathbf{u}}_0^{i}[k] \leftarrow \Gamma_k$, $\forall k \in \{1, \ldots, L\}$
        end
    end
end
$k' \leftarrow \arg\max_{k \in \{1, \ldots, \gamma\}} W_N^{(N-1)}\left(\mathbf{y}_0^{N-1}, \mathbf{u}_0^{N-2}[k] \mid u_{N-1}[k]\right)$
return $\hat{\mathbf{u}}[k']$
A. Proposed Method
We achieve a reduction in the number of inputs of the crossbar
multiplexers in an SCL decoder. The reduction is achieved
by ordering the indexes of surviving paths, so that the dedicated
memory elements of each decoding path can get memory
contents from a limited set of other dedicated paths. We prove
that the number of elements in such a set is (L/2 + 1), so that
the employed multiplexers are (L/2 + 1)-to-1 instead of L-to-1.
Proposition: In an SCL decoder implementation, the number
of paths whose memory contents can be copied to a
specific path after path pruning is (L/2 + 1) instead of L, if the
L surviving paths are ordered according to their indexes.
Proof: We denote the candidate paths by their indexes i,
$1 \le i \le 2L$. A candidate path i is formed from the existing
path $\lfloor (i-1)/2 \rfloor + 1$ according to our path splitting
definition. After path pruning, L candidate paths with
indexes $i_1, i_2, \ldots, i_L$ survive, where $1 \le i_k \le 2L$,
$\forall k \in \{1, 2, \ldots, L\}$. The surviving path $i_k$ is assigned
to path k to continue with the decoding operations and the memory
contents of the existing path $\lfloor (i_k-1)/2 \rfloor + 1$ are
copied to the memories of path k. When the surviving path indexes
are sorted, they satisfy the expression
$$i_1 < i_2 < \ldots < i_L. \qquad (1)$$
Since $1 \le i_k \le 2L$, expression (1) implies that surviving path
indexes are limited by specific minimum and maximum values.
For any $i_k$, the minimum and maximum values are found as
$$k \le i_k \le 2L - (L - k). \qquad (2)$$
In order to find all possible indexes of existing paths that the
k-th surviving path can originate from, we use (2) to write
$$\left\lfloor \frac{k-1}{2} \right\rfloor + 1 \le \left\lfloor \frac{i_k-1}{2} \right\rfloor + 1 \le \left\lfloor \frac{L+k-1}{2} \right\rfloor + 1. \qquad (3)$$
Expression (3) shows that the dedicated memories of the k-th
surviving path can get memory contents from the dedicated
memories of paths with indexes specified by the limits. For even L,
$\lfloor (L+k-1)/2 \rfloor = L/2 + \lfloor (k-1)/2 \rfloor$, so there
are (L/2 + 1) elements in the interval (3), which completes
the proof.
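The Proposition can also be checked numerically for small list sizes. The following brute-force sketch enumerates every possible set of L survivors out of 2L candidates, assigns them to paths in index order, and records which existing paths can feed each position. The function names are hypothetical, and the check is only an illustration for even L, not a replacement for the proof.

```python
from itertools import combinations


def parent_of(i):
    """Existing path from which candidate i was formed: floor((i-1)/2) + 1."""
    return (i - 1) // 2 + 1


def reachable_sources(L):
    """For each position k = 1..L, collect the existing paths that can be copied
    into it over all index-sorted survivor sets."""
    sources = [set() for _ in range(L)]
    # combinations() of an increasing range yields tuples in increasing order,
    # so survivors[k-1] is i_k with i_1 < i_2 < ... < i_L.
    for survivors in combinations(range(1, 2 * L + 1), L):
        for k, i_k in enumerate(survivors):
            sources[k].add(parent_of(i_k))
    return sources


for L in (2, 4, 6, 8):
    sizes = [len(s) for s in reachable_sources(L)]
    print(L, sizes, "expected", L // 2 + 1)
```

Without the index ordering, any survivor may land in any position, so every position would need all L existing paths as multiplexer inputs.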
The proposed sorting method does not affect the latency or
the error performance of SCL decoding. We demonstrate the
error performances of SCL decoding with conventional and
proposed sorting methods in Fig. 2.
[Fig. 2: SCL decoding performance (BLER versus Eb/N0 in dB) with conventional and proposed sorting, N = 1024, K = 512, L = 8.]
B. Sorter Design for the Proposed Method
In this section, we give sorter designs for the proposed
method. A generic sorter architecture for the method is given
in Fig. 3. The architecture takes the path metrics and path
indexes of the 2L candidate paths as inputs, denoted by m_in_k
and i_in_k, respectively. We use a two-stage sorter. The
first stage finds the surviving paths according to their path
metrics. The second stage sorts the surviving paths according
to their indexes. The architecture outputs the path metrics and
path indexes of the L surviving paths, denoted by m_out_k and
i_out_k, which are quantized by q and p bits, respectively.
We propose three different sorter designs. Our aim is to offer a
trade-off between hardware complexity and throughput with the
proposed designs. One of the sorter architectures we employ
in our designs is the radix-2L sorter [5], [23]. The other sorting
methods we consider are the bitonic sorter proposed in [24]
and the maximum values filter (MVF) proposed in [20]. MVF is a
bitonic sorter without the final sorting stages, so that it extracts
the L paths with the smallest path metrics out of 2L candidate
paths without any ordering. Table I summarizes the proposed
designs and their complexity and delay characteristics with
respect to each other.
[Fig. 3: Proposed sorter architecture. A metric sorter (2L inputs: m_in_1, ..., m_in_2L of q bits and i_in_1, ..., i_in_2L of p bits) followed by an index sorter (L inputs) with outputs m_out_1, ..., m_out_L and i_out_1, ..., i_out_L.]
TABLE I: Proposed Sorter Designs
Design   Metric Sorting   Index Sorting   Complexity   Delay
1        MVF              Bitonic         Low          High
2        MVF              Radix-L         Medium       Medium
3        Radix-2L         Radix-L         High         Low
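The two-stage structure can be modeled in software as follows. This is a behavioral sketch only: it uses Python's built-in sort in place of the MVF, bitonic and radix sorters named above, and the function names and example metrics are hypothetical. The signal names follow Fig. 3.

```python
def two_stage_sort(m_in, list_size):
    """Behavioral model of the two-stage sorter.
    m_in[i-1] is the metric of candidate i, for i = 1..2L.
    Returns (i_out, m_out): survivor indexes in increasing order and their metrics."""
    candidates = list(enumerate(m_in, start=1))
    # Stage 1 (metric sorter): keep the L candidates with the smallest metrics.
    survivors = sorted(candidates, key=lambda c: c[1])[:list_size]
    # Stage 2 (index sorter): order the survivors by candidate index.
    survivors.sort(key=lambda c: c[0])
    i_out = [i for i, _ in survivors]
    m_out = [m for _, m in survivors]
    return i_out, m_out


def mux_selects(i_out):
    """Existing path whose memory is copied into position k = 1..L."""
    return [(i - 1) // 2 + 1 for i in i_out]


m_in = [0.9, 0.1, 2.3, 0.4, 1.7, 0.2, 3.0, 0.8]  # hypothetical, 2L = 8
i_out, m_out = two_stage_sort(m_in, 4)
print(i_out, m_out, mux_selects(i_out))
```

Because `i_out` is increasing, each entry of `mux_selects` stays within the (L/2 + 1)-element window given by expression (3).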
IV. IMPLEMENTATION RESULTS
In this section, we verify the proposed sorting method by
field programmable gate array (FPGA) implementations. We
use Xilinx-XCZU9EG-FFVB1156-2-i in the implementations
and the methods in [25] for obtaining multiplexers with a
number of inputs different from $2^m$.
TABLE II: LUT Consumptions of Conventional Crossbars (Synthesis Results)
          N = 2048   N = 4096   N = 8192
L = 4     10140      18948      59113
L = 8     49600      79133      286330
L = 16    210572     426616     1437269
L = 32    852630     1962224    6610732
First, we investigate the hardware complexities of the cross-
bars. Table II presents the look-up table (LUT) numbers for
conventional crossbars for different list sizes and bit widths.
One can observe that significant numbers of LUTs are required
for a single crossbar when the bit width is large.
Table III gives the implementation results for state-of-the-art
sorters and the sorter designs explained in the previous section.
We consider the simplified bubble sorter (SBS) [26], radix-2L
sorter [5], MVF [20] and simplified odd-even sorter (OES)
[27] for comparison. From the results in Tables II and III,
one can directly observe that the crossbar complexities are
significantly higher than those of the sorters. Therefore, the
increase in overall decoder complexity due to the employed
sorter design is expected to be negligible compared to the gains
obtained from crossbars with the proposed method.
Secondly, one can observe that the proposed sorter designs
offer a trade-off between hardware complexity and throughput,
even though the complexity gain from crossbars is expected
to be more significant, as mentioned above.
TABLE III: Sorter Implementation Results (each entry: LUTs / maximum frequency in MHz)
          SBS             Radix-2L        MVF             OES            Design 1        Design 2        Design 3
L = 4     124 / 408.2     351 / 434.8     322 / 285.7     295 / 238.1    390 / 200.0     443 / 206.2     461 / 333.3
L = 8     589 / 161.3     1551 / 285.7    1073 / 149.3    526 / 128.2    1248 / 87.7     1671 / 107.5    2143 / 188.7
L = 16    2982 / 82.0     6852 / 172.4    4141 / 83.3     1983 / 74.6    4579 / 47.4     5392 / 62.5     9246 / 111.1
L = 32    13757 / 32.1    29541 / 113.6   11114 / 54.6    5949 / 44.2    17059 / 29.0    23375 / 37.3    43638 / 66.2
TABLE IV: Estimated LUT Gains from Crossbars
          N = 2048   N = 4096   N = 8192
L = 4     2535       4737       14778
L = 8     18600      29675      107374
L = 16    92125      186645     628805
L = 32    399670     919793     3098781
Finally, comparing the state-of-the-art and the proposed
sorters, one can observe that the proposed sorting method does
not necessarily imply higher sorter complexity or delay. For
example, the hardware consumption of the radix-2L sorter is
higher than those of Designs 1 and 2 for L > 4 and L > 8,
respectively. Design 3 achieves higher throughput than
SBS, MVF and OES for L > 4. Furthermore, there are large
variations in hardware consumption and operating frequency
among the state-of-the-art sorters as well. We can conclude that
the proposed sorting method can be implemented with lower
hardware complexity or higher throughput than state-of-the-art
sorting methods, depending on the design.
Table IV presents the estimated LUT gains from the cross-
bar implementations when the proposed sorting method is
used. Comparing Tables III and IV, one can observe that the
expected hardware consumption gains are much larger than
possible hardware consumption increases due to the sorter
designs in a decoder.
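The estimates in Table IV appear to be consistent with a simple scaling assumption: if crossbar LUT usage grows linearly with the number of multiplexer inputs, then reducing the inputs from L to L/2 + 1 removes a fraction (L/2 - 1)/L of the LUTs in Table II. This assumption is inferred from the tabulated numbers and is not stated explicitly in the text. The sketch below applies it to the Table II values; the function name is hypothetical.

```python
# LUT counts copied from Table II, columns N = 2048, 4096, 8192.
TABLE_II = {
    4: (10140, 18948, 59113),
    8: (49600, 79133, 286330),
    16: (210572, 426616, 1437269),
    32: (852630, 1962224, 6610732),
}

# Estimated gains copied from Table IV, same column order.
TABLE_IV = {
    4: (2535, 4737, 14778),
    8: (18600, 29675, 107374),
    16: (92125, 186645, 628805),
    32: (399670, 919793, 3098781),
}


def estimated_gain(luts, L):
    """LUTs saved under the linear-scaling assumption: luts * (L/2 - 1) / L,
    rounded to the nearest integer (halves rounded up) using integer arithmetic."""
    numerator = luts * (L // 2 - 1)
    return (2 * numerator + L) // (2 * L)


for L, row in TABLE_II.items():
    print(L, [estimated_gain(x, L) for x in row], TABLE_IV[L])
```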
TABLE V: Decoder Implementation Results
Decoder                   Conv. w/ OES         Simp. w/ Design 3
L                         4         8          4         8
(N,K)                     (4096, Any)
LUTs                      76229     198082     69058     151901
Registers                 27445     54371      27392     54371
RAM [Mbits]               0.74      1.46       0.74      1.46
fop [MHz]                 129.9     66.7       172.4     96.2
Latency [clock cycles]    12928
TP [Mbps]                 41.1      21.1       54.6      30.5
Sorter fop [MHz]          238.1     128.2      333.3     188.7
Sorter LUTs               295       526        461       2143
In order to validate the estimated gains in Table IV, we im-
plement SCL decoders with and without the proposed method.
We use the semi-parallel architecture in [28] with P = 32
processing elements and LLR-based metric calculation in [4].
Decoded bits are stored in registers of N bits. Calculated
LLRs are stored in RAM blocks and pointer registers of
(log2 N − 1) log2 L bits are used for pointer-based copying.
Partial-sums are calculated by the partial-sum network in
[29] and stored in registers of P (parallel part) and N/2
(serial part) bits. LLRs are represented in the conventional
sign-magnitude form; however, different LLR representation
methods, such as [30], can also be employed. All registers in
the implementations are copied by crossbars with the
corresponding input widths. The implemented decoders are flexible
in terms of code rate. Conventional decoders without the proposed
sorting method employ the simplified OES sorter, which is a
low-complexity state-of-the-art sorter as seen from Table III.
Implementation results are given in Table V.
The results show that the proposed method achieves a significant
reduction in the hardware consumption of SCL decoders. For the
considered decoder architecture and block length, we obtain
hardware consumption (LUT) gains of approximately 9.41% for
L = 4 and 23.3% for L = 8 with the proposed method.
The complexity gains can be verified from the crossbar
implementation results in Table IV for a 4096-bit crossbar to
copy decoded bit registers and a 2048-bit crossbar to copy
partial-sum registers.
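The quoted percentages follow directly from the LUT rows of Table V. The short sketch below reproduces the arithmetic; the dictionary and function names are hypothetical.

```python
# LUT counts copied from Table V, keyed by list size L.
LUTS_CONVENTIONAL = {4: 76229, 8: 198082}   # Conv. w/ OES
LUTS_PROPOSED = {4: 69058, 8: 151901}       # Simp. w/ Design 3


def lut_reduction_percent(L):
    """Relative LUT saving of the proposed decoder with respect to the conventional one."""
    saved = LUTS_CONVENTIONAL[L] - LUTS_PROPOSED[L]
    return 100.0 * saved / LUTS_CONVENTIONAL[L]


for L in (4, 8):
    print(L, LUTS_CONVENTIONAL[L] - LUTS_PROPOSED[L], lut_reduction_percent(L))
```

The absolute savings (7171 LUTs for L = 4 and 46181 LUTs for L = 8) are close to the sums of the N = 4096 and N = 2048 entries of Table IV for the same list sizes (7272 and 48275), minus the extra LUTs of the Design 3 sorter.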
TABLE VI: Comparison with State-of-the-Art Decoders
Decoder             Simp. w/ Design 3   [19]          [34]          [35]
L                   8                   4             4             32
(N,K)               (4096, 2048)        (1024, 512)   (1024, 512)   (4096, 2048)
LUTs/ALMs           153856              142961        101160        67211
Registers           54371               19795         13544         31247
RAM [Mbits]         1.49                4404          0             22440
fop [MHz]           96.0                42.66         -             107
Latency [cycles]    7297                371           4064          16019
TP [Mbps]           54                  115           -             27.35
As specified in Table V, the decoding latencies of the
implemented decoders are equal to the latency of a semi-parallel
SC decoder plus N additional clock cycles for path
pruning operations. The sorting method we propose does not
increase the decoding latency; thus, the throughput values of
the compared architectures scale in direct proportion to
the maximum achievable operating frequencies. The maximum
delay path of the decoder passes through the sorter, as pointed
out in [4]. We observe that the throughputs of the decoders
using the proposed sorting method are higher than those of the
conventional decoders, owing to the smaller delay of the Design 3
sorter with respect to that of the OES sorter. The results show
that decoders using the proposed sorting method with proper
sorter designs can achieve both significant gains in hardware
consumption and higher throughput values with respect to
those of decoders using conventional state-of-the-art sorters.
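The latency and throughput figures in Table V can be reproduced with a short calculation. The sketch assumes the commonly quoted latency expression for the semi-parallel SC decoder of [28], 2N + (N/P) log2(N/(4P)) clock cycles, and assumes that throughput is counted in coded bits (N bits per decoded frame); both assumptions are inferred from the tabulated values and are not stated in the text. Function names are hypothetical.

```python
from math import log2


def semi_parallel_sc_latency(N, P):
    """Assumed latency (clock cycles) of a semi-parallel SC decoder with P processing
    elements: 2N + (N/P) * log2(N / (4P)). N and P are powers of two."""
    return 2 * N + (N // P) * int(log2(N // (4 * P)))


def scl_latency(N, P):
    """Semi-parallel SC latency plus N clock cycles for path pruning."""
    return semi_parallel_sc_latency(N, P) + N


def throughput_mbps(N, f_mhz, latency_cycles):
    """Coded throughput in Mbps: N bits every latency_cycles cycles at f_mhz MHz."""
    return N * f_mhz / latency_cycles


N, P = 4096, 32
latency = scl_latency(N, P)
print("latency:", latency)
for f_mhz in (129.9, 66.7, 172.4, 96.2):  # fop values from Table V
    print(f_mhz, round(throughput_mbps(N, f_mhz, latency), 2))
```

With N = 4096 and P = 32 the latency evaluates to 12928 clock cycles, and the computed throughputs agree with the TP row of Table V to within about 0.1 Mbps.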
Finally, we compare a decoder implementation employing
the proposed sorting method with state-of-the-art SCL
decoders. Table VI gives the implementation results. We optimize
our decoder for code rate 0.5 for a fair comparison with the
presented decoders. For this purpose, we apply certain
simplifications of simplified successive cancellation list (SSCL)
decoding [31]. Specifically, we perform rate-0 and repetition
code simplifications for constituent codes of length up to 16.
For a polar code with N = 4096 and K = 2048, this
reduces the decoding latency to 7297 clock cycles with a similar
hardware complexity, as also pointed out in [32]. We note that
the throughput can be improved further by the optimizations
in [33] without any change in the sorter architecture.
Compared with the decoders in [19] and [34], the decoder
with the proposed sorting method has a significant
advantage in terms of hardware complexity. More specifically,
with similar hardware resources and throughput, the proposed
decoder supports twice the list size, which by itself
requires approximately four times the hardware resources
as observed from the results in Table II, for a block length
that is four times as large. The decoder in [35] achieves
hardware complexity reduction by completely eliminating decoded
bit and partial-sum crossbars at the expense of increased latency.
Therefore, the decoder in [35] has a higher latency even though it
benefits from multibit decoding (MBD) [14] and switches
to parallel decoding at low decoding stages to achieve
latency reduction. As seen from the implementation results,
the simplified decoder with the proposed sorting method can
achieve larger throughput than that of [35] with significantly
lower memory consumption. On the other hand, the decoder
in [35] can operate with larger list sizes owing to the lack
of crossbars with large input widths. The results show that
the proposed method offers a balanced decoder design with
reasonable hardware complexity and throughput, especially for
large block lengths.
We note that the presented sorters and decoder are example
implementations to verify the gains obtained by the proposed
sorting method. The method can be applied in different de-
coder architectures and using different sorter designs.
V. CONCLUSION
In this work, a hardware complexity reduction method for
SCL decoder implementations is proposed. The method com-
prises applying a sorting mechanism to candidate path metrics
so that surviving paths are sorted according to their path in-
dexes. With the proposed method, (L/2+1)-to-1 multiplexers
can be employed instead of L-to-1 multiplexers for memory
copying operations after path pruning. Implementation results
show that a significant reduction in hardware consumption is
achievable with the proposed method without any penalty in
decoding latency. Future work includes novel sorter designs
and ASIC implementations for the proposed method.
REFERENCES
[1] E. Arıkan, “Channel polarization: a method for constructing capacity-
achieving codes for symmetric binary-input memoryless channels,” IEEE
Trans. Inform. Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
[2] I. Tal and A. Vardy, “List decoding of polar codes,” in Proc. IEEE Int.
Symp. Inform. Theory (ISIT), July 2011, pp. 1–5.
[3] I. Dumer and K. Shabunov, “Soft-decision decoding of Reed-Muller
codes: recursive lists,” IEEE Trans. Inform. Theory, vol. 52, no. 3, pp.
1260–1266, Mar. 2006.
[4] A. Balatsoukas-Stimming, M. B. Parizi and A. Burg, “LLR-based
successive cancellation list decoding of polar codes,” IEEE Trans. Signal
Process., vol. 63, no. 19, pp. 5165–5179, Oct. 2015.
[5] A. Balatsoukas-Stimming, A. J. Raymond, W. J. Gross and A. Burg,
“Hardware architecture for list successive cancellation decoding of polar
codes,” IEEE Trans. Circuits and Syst. II, Exp. Briefs, vol. 61, no. 8,
pp. 609–613, Aug. 2014.
[6] B. Yuan and K. K. Parhi, “Low-latency successive-cancellation list
decoders for polar codes with multibit decision,” IEEE Trans. Very Large
Scale Integration (VLSI) Syst., vol. 23, no. 10, pp. 2268–2280, Oct. 2015.
[7] Y. Fan et al., “Low-latency list decoding of polar codes with double
thresholding,” in 2015 IEEE Intern. Conf. on Acoust., Speech and Signal
Process. (ICASSP), Brisbane, QLD, 2015, pp. 1042–1046.
[8] Y. Fan et al., “A low-latency list successive-cancellation decoding
implementation for polar codes,” IEEE Journ. on Sel. Areas in Commun.,
vol. 34, no. 2, pp. 303–317, Feb. 2016.
[9] S. A. Hashemi, A. Balatsoukas-Stimming, P. Giard, C. Thibeault and
W. J. Gross, “Partitioned successive-cancellation list decoding of polar
codes,” in 2016 IEEE Intern. Conf. on Acoust., Speech and Signal
Process. (ICASSP), Shanghai, 2016, pp. 957–960.
[10] S. A. Hashemi, C. Condo and W. J. Gross, “A fast polar code list decoder
architecture based on sphere decoding,” IEEE Trans. Circuits Syst. I,
Reg. Papers, vol. 63, no. 12, pp. 2368–2380, Dec. 2016.
[11] S. A. Hashemi, C. Condo and W. J. Gross, “Fast and Flexible Successive-
Cancellation List Decoders for Polar Codes,” IEEE Trans. Signal Pro-
cess., vol. 65, no. 21, pp. 5756–5769, Nov. 2017.
[12] A. Sural, “An FPGA implementation of successive cancellation list
decoding for polar codes,” Bilkent University masters thesis, Feb. 2016.
[13] W. Song, C. Zhang, S. Zhang and X. You, “Efficient adaptive successive
cancellation list decoders for polar codes,” in 2016 IEEE Int. Conf. on
Digital Signal Process. (DSP), Beijing, 2016, pp. 218–222.
[14] B. Yuan and K. K. Parhi, “LLR-based successive-cancellation list
decoder for polar codes with multibit decision,” IEEE Trans. Circuits
and Syst. II, Exp. Briefs, vol. 64, no. 1, pp. 21–25, Jan. 2017.
[15] C. Zhang, X. You and J. Sha, “Hardware architecture for list successive
cancellation polar decoder,” in 2014 IEEE Int. Symp. Circuits Syst.
(ISCAS), Melbourne VIC, 2014, pp. 209–212.
[16] J. Lin, C. Xiong and Z. Yan, “A high throughput list decoder architecture
for polar codes,” IEEE Trans. Very Large Scale Integration (VLSI) Syst.,
vol. 24, no. 6, pp. 2378–2391, June 2016.
[17] C. Xiong, J. Lin and Z. Yan, “A multimode area-efficient SCL polar
decoder,” IEEE Trans. on Very Large Scale Integration (VLSI) Syst.,
vol. 24, no. 12, pp. 3499–3512, Dec. 2016.
[18] C. Xia, Y. Fan, J. Chen and C. Tsui, “On path memory in list successive
cancellation decoder of polar codes,” in 2018 IEEE Int. Symp. Circuits
Syst. (ISCAS), Florence, Italy, 2018, pp. 1–5.
[19] C. Xiong, Y. Zhong, C. Zhang and Z. Yan, “An FPGA emulation
platform for polar codes,” 2016 IEEE Int. Workshop on Signal Process.
Syst. (SiPS), Dallas, TX, 2016, pp. 148–153.
[20] J. Lin and Z. Yan, “An efficient list decoder architecture for polar codes,”
IEEE Trans. on Very Large Scale Integration (VLSI) Syst., vol. 23,
no. 11, pp. 2508–2518, Nov. 2015.
[21] C. Xiong, J. Lin and Z. Yan, “Symbol-decision successive cancellation
list decoder for polar codes,” IEEE Trans. Signal Process., vol. 64, no. 3,
pp. 675–687, Feb. 2016.
[22] B. Yuan and K. K. Parhi, “Successive cancellation list polar decoder
using log-likelihood ratios,” in Proc. IEEE Asilomar Conf. Signals, Syst.,
Comput., 2014, pp. 548-552.
[23] L. G. Amaru, M. Martina and G. Masera, “High speed architectures for
finding the first two maximum/minimum values,” IEEE Trans. on Very
Large Scale Integration (VLSI) Systems, vol. 20, no. 12, pp. 2342–2346,
Dec. 2012.
[24] K. E. Batcher, “Sorting networks and their applications,” in Proc. Spring
Joint Comput. Conf., Atlantic City, NJ, USA, May 1968, pp. 307–314.
[25] K. Chapman, “Multiplexer design techniques for datapath performance
with minimized routing resources,” Oct. 31, 2014, XILINX.
[26] A. Balatsoukas-Stimming, M. Bastani Parizi and A. Burg, “On metric
sorting for successive cancellation list decoding of polar codes,” 2015
IEEE Int. Symp. Circuits Syst. (ISCAS), Lisbon, 2015, pp. 1993–1996.
[27] B. Yong Kong, H. Yoo and I. Park, “Efficient sorting architecture
for successive-cancellation-list decoding of polar codes,” IEEE Trans.
Circuits and Syst. II, Exp. Briefs, vol. 63, no. 7, pp. 673-677, July 2016.
[28] C. Leroux, A. Raymond, G. Sarkis, and W. Gross, “A semi-parallel
successive-cancellation decoder for polar codes,” IEEE Trans. Signal
Process., vol. 61, no. 2, pp. 289–299, Jan. 2013.
[29] Y. Fan and C.-Y. Tsui, “An efficient partial-sum network architecture for
semi-parallel polar codes decoder implementation,” IEEE Trans. Signal
Process., vol. 62, no. 12, pp. 3165–3179, June 2014.
[30] H. Yoon and T. Kim, “Efficient successive-cancellation polar decoder
based on redundant LLR representation,” IEEE Trans. Circ. Sys. II:
Exp. Briefs, vol. 65, no. 12, pp. 1944–1948, Dec. 2018.
[31] S. A. Hashemi, C. Condo and W. J. Gross, “Simplified successive-
cancellation list decoding of polar codes,” 2016 IEEE Int. Symp. Inf.
Theory (ISIT), Barcelona, 2016, pp. 815–819.
[32] C. Zhang and K. K. Parhi, “Latency analysis and architecture design of
simplified SC polar decoders,” IEEE Trans. Circ. Sys. II: Exp. Briefs,
vol. 61, no. 2, pp. 115–119, Feb. 2014.
[33] Y. Zhou, Z. Chen, J. Lin and Z. Wang, “A high-speed successive-
cancellation decoder for polar codes using approximate computing,”
IEEE Trans. Circ. Sys. II: Exp. Briefs, vol. 66, no. 2, pp. 227–231,
Feb. 2019.
[34] X. Liang, J. Yang, C. Zhang, W. Song, and X. You, “Hardware
efficient and low-latency CA-SCL decoder based on distributed sorting,”
2016 IEEE Glob. Commun. Conf. (GLOBECOM), Washington, DC,
2016, pp. 1–6.
[35] C. Xia et al., “An implementation of list successive cancellation decoder
with large list size for polar codes,” 2017 27th Int. Conf. on Field Prog.
Logic and App. (FPL), Ghent, 2017, pp. 1-4.
