Architecture Evaluation Tool for 3D CAMs
Yong-Xiao Chen and Jin-Fu Li
Advanced Reliable Systems (ARES) Lab.
Department of Electrical Engineering
National Central University
Taoyuan, Taiwan 320
Abstract—Three-dimensional (3D) integration using through-
silicon via (TSV) has been used for memory designs. Content
addressable memory (CAM) is an important component in digital
systems. In this paper, we propose an evaluation tool for 3D
CAMs that helps the designer explore the delay and power of
various partitioning strategies. Delay, power, and energy
models of 3D CAMs are also built for the different
architectures.
I. INTRODUCTION
Three-dimensional (3D) integration technology using
through-silicon vias (TSVs) has been used to optimize the
performance and power of DRAM [1]. Content addressable
memory (CAM) is another component that demands both high
performance and low power consumption, and a 3D CAM offers
a way to meet these requirements.
A typical CAM supports read, write, and compare (search)
operations. The compare operation is the most critical one for
power and performance. When a compare operation is executed,
the data on the searchlines (SLs) are compared with the data
stored in the CAM words, and each word reports its comparison
result on its matchline (ML). The comparison results of all
words are then evaluated by a hit signal generator and/or a
priority address encoder (PAE).
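The compare operation described above can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the function name `cam_search` is hypothetical, the CAM is treated as binary (no don't-care bits), and the PAE is assumed to give priority to the lowest matching address, which the text does not specify.

```python
from typing import List, Optional, Tuple


def cam_search(words: List[int], comparand: int) -> Tuple[bool, Optional[int]]:
    """Behavioral model of one compare (search) operation on a binary CAM."""
    # Each word drives its matchline (ML) high only if it equals the SL data.
    matchlines = [stored == comparand for stored in words]
    # Hit signal generator: asserted when at least one ML is high.
    hit = any(matchlines)
    # Priority address encoder (PAE): lowest matching address wins (assumption).
    address = matchlines.index(True) if hit else None
    return hit, address


words = [0x3A, 0x7F, 0x3A, 0x00]
print(cam_search(words, 0x3A))  # (True, 0)
print(cam_search(words, 0x55))  # (False, None)
```

The model ignores timing entirely; it only shows how the per-word ML results feed the hit signal and the encoded address.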
Cell-level and architecture-level 3D CAMs were reported
in [2], [3]. A cell-level 3D CAM needs a very large number of
TSVs. In this paper, therefore, we consider only architecture-
level 3D CAMs, which can be divided into searchline-partitioned
and matchline-partitioned CAMs. Fig. 1 shows the two 3D CAM
architectures for an 8-word CAM.
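The difference between the two partitioning styles can be sketched behaviorally. The code below is an illustration, not the circuit of [3]: it assumes that searchline partitioning distributes the words across layers (each layer has its own hit and address logic, and the layer hits are ORed), and that matchline partitioning splits the bits of every word across layers (the partial match results of a word are ANDed). The function names and the lowest-address priority are hypothetical choices.

```python
from typing import List, Optional, Tuple

SearchResult = Tuple[bool, Optional[int]]


def search_sl_partitioned(words: List[int], comparand: int,
                          n_layers: int) -> SearchResult:
    """Words are split across layers; assumes len(words) % n_layers == 0."""
    per_layer = len(words) // n_layers
    local = []
    for layer in range(n_layers):
        chunk = words[layer * per_layer:(layer + 1) * per_layer]
        matches = [w == comparand for w in chunk]
        hit = any(matches)
        # Per-layer hit signal and local hit address.
        local.append((hit, matches.index(True) if hit else None))
    # Global hit is the OR of the layer hits; the first hitting layer
    # supplies the address (assumed priority order).
    for layer, (hit, addr) in enumerate(local):
        if hit:
            return True, layer * per_layer + addr
    return False, None


def search_ml_partitioned(words: List[int], comparand: int,
                          word_bits: int, n_layers: int) -> SearchResult:
    """Bits of each word are split across layers; assumes word_bits % n_layers == 0."""
    seg_bits = word_bits // n_layers
    mask = (1 << seg_bits) - 1
    for addr, stored in enumerate(words):
        # One partial matchline result per layer, combined with AND.
        partial = [((stored >> (k * seg_bits)) & mask)
                   == ((comparand >> (k * seg_bits)) & mask)
                   for k in range(n_layers)]
        if all(partial):
            return True, addr
    return False, None


words = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]  # 8 words, 8 bits
print(search_sl_partitioned(words, 0xBC, n_layers=2))
print(search_ml_partitioned(words, 0xBC, word_bits=8, n_layers=2))
```

Both functions return the same result as an unpartitioned search; what partitioning changes is the physical length of the SLs or MLs in each layer, not the logical outcome.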
This paper presents an architecture evaluation tool for 3D
CAMs. The architecture evaluation tool can provide the power
and delay information with respect to different 3D CAM
architectures.
II. PROPOSED EVALUATION TOOL FOR 3D CAMS
Different 3D architectures affect the area, power, delay, and
number of required TSVs differently. Furthermore, either the
searchline-partitioned or the matchline-partitioned
architecture can be designed with multiple layers, and the
number of layers has a strong impact on the delay, power, and
area of the 3D CAM. Therefore, an evaluation tool is developed
for 3D CAMs to help the designer select an appropriate
architecture. Fig. 2 shows the conceptual diagram of the
evaluation tool for 3D CAMs.
Given the CAM cell structure and ML structure, the parameters
of the 3D CAM, the number of partitions of the MLs and SLs, the
electrical data of the CMOS technology node used, and the TSV
parameters, the evaluation tool reports the read/write delay,
read/write power, search delay, search power, and TSV count
of the 3D CAM.
Fig. 1. (a) A 2D CAM with 8 words. (b) Searchline-partitioned 3D CAM
with 2 layers and 8 words. (c) Matchline-partitioned 3D CAM with 2 layers
and 8 words [3].
[Fig. 2 diagram: 3D CAM Evaluation Tool. Inputs: CAM cell & ML
structure; CMOS technology node; TSV parameter; # of rows of a
CAM (N); # of sub-banks of a CAM (Nbank); word size (B); # of
cuts on SLs (Ny); # of cuts on MLs (Nx). Outputs: write/read
delay; write/read power; search delay; search power; TSV counts.]
Fig. 2. Conceptual diagram of the proposed architecture evaluation tool for
3D CAMs.
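The inputs listed in Fig. 2 suggest how a design-space sweep could be set up. The following is a minimal sketch, not the interface of the proposed tool: the class `CamConfig`, its field names, and the helper `design_space` are hypothetical, and the partition counts are assumed to divide the row count and word size evenly.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class CamConfig:
    """Architecture parameters of one candidate 3D CAM (hypothetical names)."""
    n_rows: int      # N: number of rows (words) of the CAM
    word_bits: int   # B: word size in bits
    n_banks: int     # Nbank: number of sub-banks
    sl_cuts: int     # Ny: number of SL segments
    ml_cuts: int     # Nx: number of ML segments

    def __post_init__(self) -> None:
        # Assumption: segments are of equal size.
        if self.n_rows % self.sl_cuts or self.word_bits % self.ml_cuts:
            raise ValueError("partition counts must divide N and B evenly")

    @property
    def rows_per_sl_segment(self) -> int:
        return self.n_rows // self.sl_cuts

    @property
    def bits_per_ml_segment(self) -> int:
        return self.word_bits // self.ml_cuts


def design_space(n_rows: int, word_bits: int,
                 cuts: Sequence[int] = (1, 2, 4, 8)) -> List[CamConfig]:
    """Enumerate every (Ny, Nx) combination for a single-bank CAM."""
    return [CamConfig(n_rows, word_bits, 1, ny, nx)
            for ny in cuts for nx in cuts]


for cfg in design_space(256, 128)[:3]:
    print(cfg, cfg.rows_per_sl_segment, cfg.bits_per_ml_segment)
```

Each configuration would then be passed to the delay and power models; technology and TSV parameters are left out of this sketch.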
1st Intl. Workshop on Emerging Memory Solutions, DATE Conference 2016, Dresden, Germany
2016, KLUEDO, Publication Server of University of Kaiserslautern
Delay and power models of 3D CAMs are developed. Here,
only the delay and power models of the search operation are
explained; those for the other operations can be developed
in a similar way. The search delay model is as follows:
$$T_{search} = T_{ML,pre} + T_{SL,drive} + T_{ML,eva} + T_{PAE}, \tag{1}$$
where $T_{ML,pre}$, $T_{SL,drive}$, $T_{ML,eva}$, and $T_{PAE}$ denote
the delays of precharging the ML, driving the SL, evaluating the
ML, and the PAE, respectively. Fig. 3 shows the modified RC
models of the SL and ML in a 3D CAM, from which the SL and ML
delays of a 3D CAM can be calculated. If the CAM is divided into
multiple active layers, the critical paths of the SL and ML in
each layer are shortened. The equivalent capacitance and
resistance of the ML and SL in each layer can be expressed as
follows:
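$$\begin{aligned}
C_{ML} &= (B/N_x) \times (2 \cdot C_d + C_{wire,w}) \\
R_{ML} &= (B/N_x) \times R_{wire,w} \\
C_{SL} &= (N/N_y) \times (C_g + C_{wire,h}) \\
R_{SL} &= (N/N_y) \times R_{wire,h}
\end{aligned} \tag{2}$$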
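Equations (1) and (2) translate directly into code. The sketch below assumes hypothetical function names and uses placeholder per-cell values in arbitrary units, not data from the paper or the ITRS; the per-cell quantities (C_d, C_g, and the wire parameters) are simply passed in, since the text does not define them further.

```python
from typing import Tuple


def ml_rc(word_bits: int, n_x: int, c_d: float,
          c_wire_w: float, r_wire_w: float) -> Tuple[float, float]:
    """Per-layer ML capacitance and resistance, Eq. (2)."""
    cells = word_bits / n_x           # B / Nx cells on one ML segment
    c_ml = cells * (2 * c_d + c_wire_w)
    r_ml = cells * r_wire_w
    return c_ml, r_ml


def sl_rc(n_rows: int, n_y: int, c_g: float,
          c_wire_h: float, r_wire_h: float) -> Tuple[float, float]:
    """Per-layer SL capacitance and resistance, Eq. (2)."""
    cells = n_rows / n_y              # N / Ny cells on one SL segment
    c_sl = cells * (c_g + c_wire_h)
    r_sl = cells * r_wire_h
    return c_sl, r_sl


def search_delay(t_ml_pre: float, t_sl_drive: float,
                 t_ml_eva: float, t_pae: float) -> float:
    """Total search delay as the sum of the four components, Eq. (1)."""
    return t_ml_pre + t_sl_drive + t_ml_eva + t_pae


# Placeholder per-cell values (arbitrary units, for illustration only).
for n_x in (1, 2, 4, 8):
    c_ml, r_ml = ml_rc(128, n_x, c_d=1.0, c_wire_w=0.5, r_wire_w=2.0)
    print(n_x, c_ml, r_ml, c_ml * r_ml)
```

Because both R and C of a segment scale with its length, doubling the number of segments halves each of them, so the RC product of the line itself drops by a factor of four. The TSV and additional logic shown in Fig. 3 add their own R and C, which this sketch does not include.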
The search power consumption of a 3D CAM comes mainly from
the MLs, the SLs, the additional logic for the 3D CAM, the
additional TSVs, and the PAE, and can be expressed as follows:
$$P_{search} = P_{ML} + P_{SL} + P_{logic} + P_{TSV} + P_{PAE}. \tag{3}$$
The power consumption of each component can be calculated
from its switching activity, equivalent capacitance, supply
voltage, and operating frequency.
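One common way to combine these four quantities is the standard dynamic-power expression P = a * C * V^2 * f. The paper does not state its exact per-component formula, so the sketch below is an assumption; the function names and the numeric values are hypothetical placeholders.

```python
from typing import Dict, Tuple


def dynamic_power(activity: float, capacitance_f: float,
                  vdd_v: float, freq_hz: float) -> float:
    """Standard dynamic power a*C*V^2*f in watts (assumed model)."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def search_power(components: Dict[str, Tuple[float, float]],
                 vdd_v: float, freq_hz: float) -> float:
    """Sum of the component powers, following the structure of Eq. (3).

    `components` maps a name (ML, SL, logic, TSV, PAE) to a pair of
    (switching activity, equivalent capacitance in farads).
    """
    return sum(dynamic_power(a, c, vdd_v, freq_hz)
               for a, c in components.values())


# Placeholder values for illustration only; not taken from the paper.
components = {
    "ML": (0.5, 2e-12),
    "SL": (0.5, 1e-12),
    "logic": (0.1, 1e-13),
    "TSV": (0.5, 4e-13),
    "PAE": (0.1, 5e-13),
}
print(search_power(components, vdd_v=1.0, freq_hz=1e9))
```

Adding layers enlarges the TSV and logic terms while shrinking the per-segment ML or SL capacitance, so the total depends on how these terms balance.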
[Fig. 3 schematics: (a) ML path with precharge device, TCAM cell
(W), Ron, RML, CML/2, RTSV, CTSV, Rlogic, Clogic, and an AND gate;
(b) SL path with SL driver, TCAM cell (H), Rdriver, Cdriver, RSL,
CSL/2, RTSV, CTSV.]
Fig. 3. RC models for (a) 3D ML search path and (b) 3D SL search path.
III. SIMULATION RESULTS
Here we use the 65nm parameters according to the ITRS re-
port [5] to evaluate different CAM architectures. Fig. 4 shows
the the delay and power per search operation of a searchline-
partitioned CAM (SPCAM) with respect to different numbers
of SL partitions. As the number of SL partitions increases,
we can see that the 256×128 CAM has significant reduction
in search delay. The search power is increased as the number
of SL partitions increases, because the total TSV counts and
additional logic are also increases.
[Fig. 4 plots: search power (mW) and search time (ns) versus
# of SL segments (Ny = 1, 2, 4, 8) for 128x128, 128x256, and
256x128 CAMs.]
Fig. 4. Delay and power per search operation with respect to different SL
partitions.
Fig. 5 shows the delay and power per search operation
of a matchline-partitioned CAM (MPCAM) with respect to
different numbers of ML partitions. As the number of ML
partitions increases, the 128×256 CAM shows a significant
reduction in search delay. The search power increases with
the number of ML partitions, for a reason similar to that of
the SPCAM.
[Fig. 5 plots: search power (mW) and search time (ns) versus
# of ML segments (Nx = 1, 2, 4, 8) for 128x128, 128x256, and
256x128 CAMs.]
Fig. 5. Delay and power per search operation with respect to different ML
partitions.
IV. CONCLUSIONS
In this paper, an architecture evaluation tool for 3D CAMs
has been proposed. It helps the designer explore different
ways of partitioning a CAM across multiple layers. Analysis
results show that 3D stacking reduces the search delay in
comparison with 2D technology. However, the search power
increases because of the additional capacitance of the TSVs.
In addition, SPCAM suits applications that demand a large
number of words, whereas MPCAM suits applications that
demand wide words.
ACKNOWLEDGMENT
This work was supported in part by the Ministry of Science
and Technology, Taiwan, R.O.C., under Contract NSC 102-
2221-E-008-108-MY3 and MOST 104-2220-E08-009.
REFERENCES
[1] P. Jacob, A. Zia, O. Erdogan, P. M. Belemjian, J.-W. Kim, M. Chu, R. P.
Kraft, J. F. McDonald, and K. Bernstein, “Mitigating memory wall effects
in high-clock-rate and multicore CMOS 3-D processor memory stacks,”
Proc. of the IEEE, vol. 97, no. 1, pp. 108–122, Jan. 2009.
[2] W. R. Davis, E. C. Oh, A. M. Sule, and P. D. Franzon, “Application
exploration for 3-D integrated circuits: TCAM, FIFO, and FFT case
studies,” IEEE Trans. on VLSI Systems, vol. 17, no. 4, pp. 496–506, Apr.
2009.
[3] Y.-J. Hu, J.-F. Li, and Y.-J. Huang, “3-D content addressable memory
architectures,” in Proc. IEEE Int’l Workshop on Memory Technology,
Design and Testing (MTDT), Hsinchu, Sept. 2009, pp. 59–64.
[4] B. Agrawal and T. Sherwood, “Ternary CAM power and delay model:
Extensions and uses,” IEEE Trans. on VLSI Systems, vol. 16, no. 5, pp.
554–564, May 2008.
[5] Semiconductor Industry Association, “International technology roadmap
for semiconductors (ITRS), 2009 edition,” Seoul, Korea, Dec. 2009.
