Histogram-based Auto Segmentation: A Novel Approach to Segmenting
  Integrated Circuit Structures from SEM Images by Wilson, Ronald et al.
Ronald Wilson, Navid Asadizanjani, Domenic Forte and Damon L. Woodard
Florida Institute for Cyber Security Research
University of Florida
Gainesville, FL, USA
Abstract—In the Reverse Engineering and Hardware Assur-
ance domain, a majority of the data acquisition is done through
electron microscopy techniques such as Scanning Electron Mi-
croscopy (SEM). However, unlike its counterparts in optical
imaging, only a limited number of techniques are available to
enhance and extract information from the raw SEM images. In
this paper, we introduce an algorithm to segment out Integrated
Circuit (IC) structures from the SEM image. Unlike existing
algorithms discussed in this paper, this algorithm is unsupervised,
parameter-free and does not require prior information on the
noise model or features in the target image, making it effective
in low-quality image acquisition scenarios as well. Furthermore,
the results from the application of the algorithm on various
structures and layers in the IC are reported and discussed.
Index Terms—Reverse Engineering, Hardware Assurance,
SEM, Segmentation
I. INTRODUCTION
Reverse engineering (RE) is the science of recovering the
constituent components of a product from the final product
itself. It is essentially the entire engineering workflow
performed in reverse. The technique can be applied to a variety
of products, ranging from aircraft to Integrated Circuits (ICs).
As intuition suggests, the more complex the product, the harder
it is to reverse engineer. This is especially true of present-day
ICs, which pack several billion transistors and interconnections
into a very small area of a silicon wafer.
Despite the obvious shortcoming of illegal duplication of
proprietary technology, RE has several benefits. One of them
is to have a better understanding of the structure of the final
product and the effect of the physical processes used in the
manufacturing of the product. For instance, RE methods were
used to analyze and debug an IC [1]. Similar applications can
be seen in Trojan detection [2], [20] and in settling legal
disputes between companies over intellectual property
infringement.
In the past, RE of ICs was mostly done with the help
of subject matter experts. Detailed optical images of the
IC die were taken and components were marked by hand
[3]. This was a very laborious process. With the
incorporation of image analysis algorithms into RE, the
process became easier and semi-automated. A simple method
is described in [4], where the image was processed using
a median filter followed by correlation matching. However, with
the introduction of higher node technologies, optical imaging
became obsolete and imaging modalities with higher
resolution, such as Scanning Electron Microscopy (SEM), were
required. The approaches developed for handling issues with
optical images cannot be reliably applied to SEM images. Some
of these issues, such as the lack of understanding of noise
models and the variation in size of relevant features, are
discussed in detail, along with their implications for hardware
trust and assurance, in [10].
II. THE IMAGING MODALITY
SEM images are produced by accelerating electrons towards
a region of interest and observing the interactions of the
electrons with the target materials. There are two main types
of interactions: Secondary Electrons (SE) and Back Scattered
Electrons (BSE). The quality of these two interactions depends
on the constituent materials in the imaged IC and several
parameters set by the operator such as:
• Excitation Voltage: The excitation voltage of the electrons
controls the depth of penetration into the sample. The
higher the voltage, the higher the penetration.
• Magnification: This parameter helps zoom into the image.
Small features can be easily imaged by adjusting this
parameter.
• Resolution: The parameter refers to the number of pixels
in the image. The higher the pixel count, the better the
quality of the image.
• Dwelling Time: This refers to the time the scanning beam
takes to measure a single pixel in the image. The higher
the time spent, the better the quality of the image.
The effect of tuning these parameters can be seen in Figure
1. In order to image an IC, the IC has to be depackaged and
delayered. Depackaging extracts the die from its case, and
delayering removes material from the die down to a set depth.
A detailed description of the depackaging and delayering
process is given in [4], [5]. After delayering, the IC is
imaged in a row raster fashion. The type of image selected for
further processing can be based on the criteria set forth in
earlier works [6]. In our case, we have chosen the SE images
for all our experiments. However, this algorithm can also be
applied to BSE images.
arXiv:2004.13874v1 [eess.IV] 28 Apr 2020
Fig. 1. Effect of tuning imaging parameters in SEM [7], [10]
Even though the SEM produces images of considerably
high resolution, it has inherent noise that introduces
artifacts in the image. All the methods proposed previously,
to the best of our knowledge, overcame these inherent flaws by
tuning the parameters discussed above. The effect of tuning the
parameters has been studied before, with higher quality images
taking over 30 days to process [7]. The time frame to process
images for a higher node technology would be infeasible. This
is one of the major obstacles to RE and one of the problems
this paper tackles.
In order to reduce the time requirements, it will be necessary
to use lower quality images. This implies using images with
the inherent noise corruption. There are several sources of
noise in the imaging modality. Some of them are topography
of the material, manufacturing defects, diffusion, damage from
deprocessing the IC, atmospheric exposure during deprocess-
ing, electron transmigration from prior usage and conductivity
issues [10]. The degree of contribution from each one of these
noise sources is hard to model. Furthermore, there might be
other sources of noise that have not been fully understood yet.
Segmentation methods currently employed on SEM images
in the field of RE follow two distinct approaches. The
first approach relies on spatial and frequency domain filtering,
typically followed by thresholding [4], [11]–[13]. The primary
concern with these methods is the assumptions they make
about the noise model corrupting the SEM images. Filters that
are effective on optical images are not guaranteed to carry
over to SEM images. The second approach relies on machine
learning, including deep learning [8], [14]–[17], [19], [20].
These methods rely on trained models, which require a lot of
labeled data while still producing faulty segmentation.
Moreover, the features used by these models are hand-picked
from image patches by the user. Due to the inherent nature of
ICs, the type and size of features are drastically different
at various layers [18]. Therefore, an exhaustive set of features
cannot be determined efficiently. Hence, most of these
algorithms prioritize the simple features in the metal and
contact layers.
III. OUR APPROACH
The proposed algorithm has only one optional parameter.
This is the size of the smallest feature that would be present
in the target image. For our application, this would be the
cross-section for the Vertical Interconnect Access (via) in our
imaging plane. However, the parameter can be set to a default
kernel size of 2x2. The idea is to use the largest kernel size
possible in order to speed up the segmentation. There are three
main steps in the algorithm.
A. Filtering image using simple merge and optimization
The initial step is to extract the actual histogram of the
image. In order to do that, the image needs to be filtered. We
slide a median filter of the fixed kernel size over the image
in a row raster fashion, using a stride equal to the width of
the kernel. This ensures no overlap between two consecutive
kernel patches. The median value of each patch is calculated
and the difference between the medians of consecutive kernel
patches is extracted. Once a row is complete, a frequency
distribution of the differences between consecutive medians is
taken. The differences are stored in ‘α’ and their
corresponding frequencies in ‘β’.
\[ \tau = \min \left[ \beta(i) - \beta(i) * \alpha(i) \right] \tag{1} \]
Next, an optimization is performed using Equation 1 and
a merge threshold ‘τ’ for the current row is calculated. All
kernel blocks with a median difference less than ‘τ’ are merged
together and their combined median is extracted. All pixels
in these kernels have their intensity values replaced by
the combined median value. This process is performed on
all rows. Once completed, the frequency distribution of pixel
intensity values is extracted from the filtered image. This is
the estimated histogram for the image.
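The filtering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the difference between consecutive medians is taken as an absolute value, the merged value is the median of the block medians rather than of all member pixels, and the merge threshold `tau` is supplied by the caller because the optimization in Equation 1 is not reproduced here.

```python
import numpy as np

def block_medians(image, k):
    """Median of each non-overlapping k x k block (stride equals kernel width)."""
    h, w = image.shape
    h, w = h - h % k, w - w % k  # assumption: ragged border rows/columns are dropped
    blocks = image[:h, :w].reshape(h // k, k, w // k, k).swapaxes(1, 2)
    return np.median(blocks.reshape(h // k, w // k, k * k), axis=2)

def difference_distribution(row_medians):
    """alpha: distinct differences between consecutive medians; beta: their counts."""
    diffs = np.abs(np.diff(row_medians))  # assumption: absolute difference
    alpha, beta = np.unique(diffs, return_counts=True)
    return alpha, beta

def merge_row(row_medians, tau):
    """Merge runs of consecutive blocks whose median difference is below tau."""
    out = np.asarray(row_medians, dtype=float).copy()
    start = 0
    for j in range(1, len(out) + 1):
        # Close the current run at the end of the row or at a jump of at least tau.
        if j == len(out) or abs(row_medians[j] - row_medians[j - 1]) >= tau:
            out[start:j] = np.median(row_medians[start:j])
            start = j
    return out
```

Applying `merge_row` to every row of `block_medians(image, k)` and taking the frequency distribution of the result gives a histogram in the spirit of the estimated histogram described above.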
B. Finding local peaks in the histogram
In this step, we will extract the local maxima from the
estimated histogram for the image. This is performed using
an accumulator generated by Algorithm 1. The algorithm
highlights the most likely local peaks in the histogram. These
peaks are taken to be the different materials present in the
image.
Algorithm 1 To find significant peaks in histogram
1: Sort intensity using corresponding frequency in decreasing order and save as I
2: Set I ← I - I[max(frequency)]
3: Extract positive values from I and save as F
4: Initialize currentIntensity to 1
5: Initialize an accumulator array of size(F)
6: while currentIntensity < max(F) do
7:   Find index of currentIntensity in F and save as f
8:   Assign all elements from F[0] to F[f] into G
9:   Initialize flag to True
10:   while flag is True do
11:     if currentIntensity + 1 in G then
12:       currentIntensity ← currentIntensity + 1
13:     else
14:       currentIntensity ← currentIntensity + 1
15:       Set flag as False
16:     end if
17:   end while
18:   for any e ∈ G > currentIntensity do
19:     accumulator[e] ← accumulator[e] + 1
20:   end for
21: end while
22: Element-wise multiply accumulator with corresponding indices in unsorted intensity
23: Threshold accumulator using its mean value
24: Repeat Steps from 3 to 23 for |negative values| in I
25: Join left and right accumulators preserving order
C. Finding the decision boundaries
This is the final stage of the algorithm. In this step, we
utilize the local peaks obtained from the previous step to
decide the boundary intensities of the different materials
present in the image. We have two approaches to deciding
the decision boundaries:
• Distance-based: The Euclidean distance of the candidate
intensity from the detected peaks.
• Histogram-based: The intensity of minimum frequency
between two detected peaks.
In our results, the histogram-based method is used.
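The two boundary rules can be illustrated with a short sketch. The function names are hypothetical, and the sketch assumes the histogram is indexed by intensity and the peaks are given as intensity values. For a one-dimensional intensity, assigning each value to its nearest peak by Euclidean distance is equivalent to splitting at the midpoint between adjacent peaks.

```python
import numpy as np

def histogram_boundaries(hist, peaks):
    """Histogram-based: intensity of minimum frequency between adjacent peaks."""
    peaks = sorted(peaks)
    return [lo + int(np.argmin(hist[lo:hi + 1])) for lo, hi in zip(peaks, peaks[1:])]

def distance_boundaries(peaks):
    """Distance-based: nearest-peak rule, i.e. the midpoint between adjacent peaks."""
    peaks = sorted(peaks)
    return [(lo + hi) / 2 for lo, hi in zip(peaks, peaks[1:])]

def segment(image, boundaries):
    """Label each pixel with the index of the intensity interval it falls in."""
    return np.digitize(image, boundaries)
```

With `n` peaks both rules return `n - 1` boundaries, so `segment` yields labels `0` to `n - 1`, one per material.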
IV. DISCUSSION
The vias, shown in Figure 6, are the smallest feature in
our samples and their size was found to be bounded by
a kernel of size 4x4. Hence, we set the kernel size parameter
to 3x3.
The primary problem the paper is trying to address can
also be seen in Figure 6(c). The noise in the image causes
the polysilicon regions to merge. Unlike common applications
of unsupervised segmentation, in which region boundaries
do not have to be kept precise, a merge of polysilicon
regions during the segmentation process can affect the entire
functionality of the final RE product. Hence, it is better
to slightly over-segment the polysilicon regions than to
under-segment them. In this way, we can avoid incorrectly
merged regions.
The histogram of the raw polysilicon layer from Figure
6(a) is shown in Figure 2(a). It can be reasoned from the
histogram that there are only two possible peaks. However,
it can be seen from Figure 6(a) that there are three different
materials in the image: the polysilicon structures, the vias and
the silicon substrate. This ambiguity prompted the need to
pre-process the image histogram using the filtering step
mentioned in Section III-A. The corrected histogram can be seen
in Figure 2(b). There are three distinct peaks in the estimated
histogram, thereby resolving the ambiguity.
(a) Raw Image Histogram
(b) Extracted Image Histogram
Fig. 2. Histogram Correction
Once the histogram is estimated, the next step is to calculate
the decision boundaries between the different materials. This
is, however, difficult due to the noise in the histogram. A
simple smoothing would be effective, but at the risk of
smoothing away smaller peaks. Hence, Algorithm 1 from Section
III-B was used to extract the peaks.
The accumulator described in the algorithm (Algorithm 1
steps 1-21) can be seen in Figure 3(a). The basic idea behind
the accumulator is that peaks that are farther away from
the global maximum get more votes. This is a reasonable
assumption. However, the number of votes may be higher than
the actual peak itself warrants. Hence, the accumulator is
multiplied with the corresponding frequencies (Algorithm 1
step 22). This scales the accumulator so that only points with
high frequency remain and the rest are suppressed. The result
of this step is shown in Figure 3(b). The suppression is
enhanced further by thresholding the accumulator using the
mean of all its values (Algorithm 1 step 23).
Thresholding the accumulator yields the graph shown in
Figure 4. The process results in distinct peaks with
discontinuities between them. Peaks that are in contact are
merged into one. The peaks that remain are highlighted using
red dots in the plot. Once the peaks are obtained, the histogram
is divided between the peaks as described in Section III-C.
The member intensities of each peak can be decided using
a simple distance metric such as the Euclidean distance or by
using the lowest frequency point between the peaks in the
histogram itself. The variation in the segmentation from these
two methods is very small. It stands to reason that the better
the histogram represents the image, the better the results
using the latter method.
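The accumulator post-processing discussed above (scaling by frequency, thresholding at the mean, and merging peaks that are in contact) can be sketched as below. The voting stage of Algorithm 1 is not reproduced; the accumulator is taken as an input. The function name is hypothetical, `accumulator[i]` and `freq[i]` are assumed to refer to the same intensity `i`, and the choice of the strongest scaled value as the representative of each merged run is an assumption, since the paper does not state how the representative is picked.

```python
import numpy as np

def select_peaks(accumulator, freq):
    """Scale votes by frequency, threshold at the mean, keep one peak per contiguous run."""
    acc = np.asarray(accumulator, dtype=float) * np.asarray(freq, dtype=float)
    keep = acc > acc.mean()
    peaks, start = [], None
    # A trailing False acts as a sentinel that closes a run ending at the last index.
    for i, flag in enumerate(np.append(keep, False)):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            # Assumption: the run is represented by its strongest scaled value.
            peaks.append(start + int(np.argmax(acc[start:i])))
            start = None
    return peaks
```

The returned indices are the intensities that would be passed on to the decision-boundary step of Section III-C.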
TABLE I
SEGMENTED RESULTS FROM APPLYING VARIOUS FILTERS TO ALL LAYERS (THRESHOLD = 108 SELECTED USING GROUND TRUTH)
Columns: Layer | Raw image | Anisotropic Diffusion | Curvature filter | Gaussian filter | Median filter | HAS (Distance) | HAS (Histogram) | Ground Truth
Rows: Doped | Polysilicon | Metal (each cell is a segmented image)
(a) Raw accumulator
(b) Accumulator multiplied with frequency
Fig. 3. Accumulator [To the right direction only]
Fig. 4. Merged and thresholded accumulator showing selected peaks
V. RESULTS
The algorithm was applied to three main types of layers
found in IC: the doped region, polysilicon region and the
metal interconnect layer. The source images were taken from
a SmartCard IC with the parameters: 150 µm, 10 µs/pixel,
4096x4096 at 5 kV for the magnification, dwelling time,
resolution and excitation voltage respectively.
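As a rough worked example of how these settings translate into acquisition time, the beam-on time of a single frame is the pixel count multiplied by the dwelling time. The sketch below assumes that dwelling time dominates and ignores stage movement, focusing and any other overhead; the function name is hypothetical.

```python
def frame_time_seconds(width_px, height_px, dwell_us):
    """Beam-on time for one frame: pixel count times dwelling time per pixel."""
    return width_px * height_px * dwell_us * 1e-6

t = frame_time_seconds(4096, 4096, 10)
print(f"{t:.1f} s per frame ({t / 60:.1f} min)")
```

At these settings a single 4096x4096 frame needs roughly 168 s of beam time, and since the time scales linearly with the dwelling time, shorter dwelling times cut it proportionally at the cost of noisier images.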
The doped region and its segmented result are depicted in
Figure 5. Doped regions are usually easy to segment. However,
due to the inherent noise in the image, as described earlier,
the image can have variation in its intensity. This can be
seen as bright regions on the lower left and right of the
image. This forces the algorithm to assign them to different
peaks. However, the shape of the structure is still conserved.
The polysilicon region is shown in Figure 6(a). This is the
hardest to segment. The noise in this region is much higher and
any imperfections in the segmentation would cause a major
setback in the RE workflow. The shape of the structures
determines their functionality in the completed circuit. Even
though the segmentation depicted in Figure 6(b) is still a bit
noisy, it extracts the shape of the structures along with the
vias. The noise is mostly concentrated on the silicon substrate,
which is inconsequential to the RE process. The metal layer and
its segmented result are shown in Figure 7. Due to the nature of
the materials used, the effect of the inherent noise sources
on these types of images is minimal. Hence, they are easy to
segment.
Fig. 5. Doped Region [Doping(red/orange), Silicon substrate(Green)]
Fig. 6. Polysilicon layer [Vias(red), Polysilicon(green), silicon substrate(blue)] [10]
Fig. 7. Metal Layer [Metal(green), silicon substrate(blue)]
We have also performed a comprehensive comparison of
existing image analysis techniques for hardware assurance
against our algorithm on all three layers. The results are shown
in Figure 8 and Table I. The ground truth for the pixels in the
images was labelled manually. Each pixel was labelled as
foreground or background and the corresponding distributions
were obtained. For a good quality image segmentation,
the foreground and background distributions will not overlap
and will, therefore, produce a large distance between
them. We have used this distance as a means of comparison
between different techniques. The distance for the raw image
distribution is given as a baseline. For simplicity, we have
used the Manhattan distance in our comparison. It can be seen
that our method consistently works better than all existing
methods currently used in the hardware assurance domain.
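The comparison metric can be sketched as follows. This is an illustration under assumptions rather than the exact evaluation code: the function name is hypothetical, the image is assumed to be 8-bit grayscale, and both distributions are normalized to sum to one before the Manhattan distance is taken, which the paper does not specify.

```python
import numpy as np

def separation_score(image, truth_mask, bins=256):
    """Manhattan (L1) distance between foreground and background intensity distributions."""
    image = np.asarray(image)
    truth_mask = np.asarray(truth_mask, dtype=bool)
    fg, _ = np.histogram(image[truth_mask], bins=bins, range=(0, 256))
    bg, _ = np.histogram(image[~truth_mask], bins=bins, range=(0, 256))
    # Assumption: normalize so the score does not depend on region size.
    fg = fg / max(fg.sum(), 1)
    bg = bg / max(bg.sum(), 1)
    return float(np.abs(fg - bg).sum())
```

Under this normalization the score ranges from 0 for identical distributions to 2 for distributions that do not overlap at all, so a higher value indicates better separation.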
The parameters for the filters were taken from earlier works
[4], [11]. The results obtained from the performance analysis
are also consistent with the results reported in these works. It
can be observed from Table I that the Curvature filter performs
the worst, with the majority of the required features being
corrupted. The Gaussian and Median filters perform comparably,
with noisy under/over-segmented but usable results. The
Anisotropic Diffusion (AD) filter outperforms all of them. This
can be attributed to the fact that AD requires an estimate of
the noise in the image before processing. This prior information
enables it to perform better. However, in the case of samples
with low noise like the metal layer, AD causes the metal
features to merge, making it unsuitable for RE. With the initial
pre-processing in our approach, noise is considerably suppressed
and a better estimate of the image histogram is obtained. This
ensures consistent and effective segmentation of IC structures
irrespective of the materials present in the SEM image.
Fig. 8. Performance of various filtering algorithms on SEM images. Higher
value indicates better performance. [Raw: Raw image, AD: Anisotropic
Diffusion, CURV: Curvature filters (Gaussian Curvature), GAUS: Gaussian
smoothing, MED: Median filter, HAS-D/H: Distance/Histogram]
VI. CONCLUSION
In this paper, we were able to address and resolve a
major problem in the area of IC RE, the segmentation of IC
structures from poor quality SEM images. This was accom-
plished by extracting the histogram of the image, correcting
it and segmenting the histogram based on the number of
peaks in it. Even though this is the basic idea followed by
most segmentation algorithms, our method has some unique
advantages.
The algorithm works strictly on the histogram of the image.
Hence, the size of the image does not affect the final
segmentation. It does not try to model the noise sources or
the features. Hence, expensive data collection sessions can be
avoided. In addition, since the features are not modeled, partial
occlusions of the features due to image stitching would not
affect the segmentation process. The segmentation relies only
on the working principles of the imaging modality to provide
contrast between different constituent materials [9].
Unlike some off-the-shelf segmentation algorithms, our
method does not depend on the type of underlying distribution
of the pixels belonging to each of the materials. Hence, all
visible peaks are extracted from the histogram by the algorithm
even in the presence of noise.
The algorithm does not require parameter fine-tuning. The
only parameter used in the algorithm is the size of the smallest
feature that needs to be extracted. This can be easily calculated
from the raw image or set to the minimum size possible
without affecting the final result. Finally, being unsupervised,
the algorithm can be made completely automated with no
human interaction.
The application of our algorithm on SEM images of ICs
indirectly reduces the time and labor required in imaging
the die by enabling the use of quickly acquired lower quality
images. Considering the fact that imaging takes up most of
the time in the RE process [7], the algorithm would help in
the complete RE of ICs, using both legacy and higher
node technologies, in a shorter time than previously possible.
REFERENCES
[1] Harriott, L. R., A. Wagner, and F. Fritz. "Integrated circuit repair using
focused ion beam milling." Journal of Vacuum Science & Technology
B: Microelectronics Processing and Phenomena 4.1 (1986): 181-184.
[2] Courbon, F., Loubet-Moundi, P., Fournier, J.J. and Tria, A., 2015, March.
A high efficiency hardware trojan detection technique based on fast
SEM imaging. In Proceedings of the 2015 Design, Automation & Test
in Europe Conference & Exhibition (pp. 788-793). EDA Consortium
[3] Torrance, Randy, and Dick James. "The state-of-the-art in IC reverse
engineering." Cryptographic Hardware and Embedded Systems-CHES
2009. Springer, Berlin, Heidelberg, 2009. 363-381.
[4] Masalskis, G., 2008. Reverse engineering of CMOS integrated circuits.
Elektronika ir elektrotechnika, 88(8), pp.25-28.
[5] Principe, E.L., Asadizanjani, N., Forte, D., Tehranipoor, M., Chivas,
R., DiBattista, M., Silverman, S., Marsh, M., Piche, N. and Mastovich,
J., 2017. Steps Toward Automated Deprocessing of Integrated Circuits.
ISTFA.
[6] Blythe, S., Fraboni, B., Lall, S., Ahmed, H. and de Riu, U., 1993.
Layout reconstruction of complex silicon chips. IEEE journal of solid-
state circuits, 28(2), pp.138-145.
[7] N. Vashistha, M T Rahman, H. Shen, D L Woodard, N Asadizanjani
and M. Tehranipoor, Detecting Hardware Trojans Inserted by Untrusted
Foundry using Physical Inspection and Advanced Image Processing
Techniques, Publication pending in Spring Journal of Hardware System
and Security (HaSS), Tentatively December 1, 2018.
[8] Raúl Quijada, Roger Durà, Jofre Pallarès, Xavier Formatjé, Salvador
Hidalgo, Francisco Serra-Graells, "Large-Area Automated Layout
Extraction Methodology for Full-IC Reverse Engineering", Publication
pending, 2018.
[9] Reimer, L., 2013. Scanning electron microscopy: physics of image
formation and microanalysis (Vol. 45). Springer.
[10] Botero UJ, Wilson R, Lu H, Rahman MT, Mallaiyan MA, Ganji F,
Asadizanjani N, Tehranipoor MM, Woodard DL, Forte D. Hardware
Trust and Assurance through Reverse Engineering: A Survey and
Outlook from Image Analysis and Machine Learning Perspectives. arXiv
preprint arXiv:2002.04210. 2020 Feb 11.
[11] B. M. Trindade, E. Ukwatta, M. Spence, and C. Pawlowicz, Segmentation
of integrated circuit layouts from scan electron microscopy images, in
2018 IEEE Canadian Conference on Electrical & Computer Engineering
(CCECE), pp. 1-4, IEEE, 2018.
[12] A. Doudkin, A. Inyutin, and M. Vatkin, Objects identification on the
color layout images of the integrated circuit layers, in 2005 IEEE
Intelligent Data Acquisition and Advanced Computing Systems: Technology
and Applications, pp. 610-614, IEEE, 2005.
[13] D. Lagunovsky, S. Ablameyko, and M. Kutas, Recognition of integrated
circuit images in reverse engineering, in Proceedings. Fourteenth
International Conference on Pattern Recognition (Cat. No. 98EX170), vol.
2, pp. 1640-1642, IEEE, 1998.
[14] R. Nakagaki, Y. Takagi, and K. Nakamae, Automatic recognition of
circuit patterns on semiconductor wafers from multiple scanning electron
microscope images, Measurement Science and Technology, vol. 21, no.
8, p. 085501, 2010.
[15] D. Cheng, Y. Shi, T. Lin, B.-H. Gwee, and K.-A. Toh, Global template
projection and matching method for training-free analysis of delayered
IC images, in 2019 IEEE International Symposium on Circuits and
Systems (ISCAS), pp. 1-5, IEEE, 2019.
[16] D. Cheng, Y. Shi, T. Lin, B.-H. Gwee, and K.-A. Toh, Hybrid k-means
clustering and support vector machine method for via and metal line
detections in delayered IC images, IEEE Transactions on Circuits and
Systems II: Express Briefs, vol. 65, no. 12, pp. 1849-1853, 2018.
[17] D. Cheng, Y. Shi, B.-H. Gwee, K.-A. Toh, and T. Lin, A hierarchical
multiclassifier system for automated analysis of delayered IC
images, IEEE Intelligent Systems, vol. 34, no. 2, pp. 36-43, 2018.
[18] Wilson R, Acharya RY, Forte D, Asadizanjani N, Woodard D. A
Novel Approach to Unsupervised Automated Extraction of Standard Cell
Library for Reverse Engineering and Hardware Assurance. In ISTFA
2019: Proceedings of the 45th International Symposium for Testing and
Failure Analysis 2019 Dec 1 (p. 249). ASM International.
[19] Lippmann B, Werner M, Unverricht N, Singla A, Egger P, Dbotzky
A, Gieser H, Rasche M, Kellermann O, Graeb H. Integrated flow for
reverse engineering of nanoscale technologies. In Proceedings of the
24th Asia and South Pacific Design Automation Conference 2019 Jan
21 (pp. 82-89).
[20] Hong X, Cheng D, Shi Y, Lin T, Gwee BH. Deep Learning for Automatic
IC Image Analysis. In 2018 IEEE 23rd International Conference on
Digital Signal Processing (DSP) 2018 Nov 19 (pp. 1-5). IEEE.
[21] Shi Q, Vashistha N, Lu H, Shen H, Tehranipoor B, Woodard DL,
Asadizanjani N. Golden gates: a new hybrid approach for rapid hardware
trojan detection using testing and imaging. In 2019 IEEE International
Symposium on Hardware Oriented Security and Trust (HOST) 2019
May 1 (pp. 61-71).
