Parallel semiconductor device simulation: from power to 'atomistic' devices by Asenov, A. et al.
 
 
 
 
 
 
Asenov, A. and Brown, A.R. and Roy, S. (1998) Parallel semiconductor 
device simulation: from power to 'atomistic' devices. In, International 
Workshop on Computational Electronics, 19-21 October 1998, 
pp. 58-61, Osaka, Japan.
 
 
 
 
 
 
 
 
http://eprints.gla.ac.uk/3031/ 
 
 
 
 
Glasgow ePrints Service 
http://eprints.gla.ac.uk 
invited 
Parallel Semiconductor Device Simulation: from Power to ‘Atomistic’ Devices 
A. Asenov, A.R. Brown and S. Roy 
Device Modelling Group 
Department of Electronics and Electrical Engineering 
University of Glasgow, Glasgow G12 8LT, UK 
Tel: +44 141 330 5233, Fax: +44 141 330 4907 
E-mail: A.Asenov@elec.gla.ac.uk 
This paper discusses various aspects of the parallel simulation of semiconductor devices on 
mesh connected MIMD platforms with distributed memory and a message passing 
programming paradigm. We describe the spatial domain decomposition approach adopted in 
the simulation of various devices, the generation of structured topologically rectangular 2D and 
3D finite element grids and the optimisation of their partitioning using simulated annealing 
techniques. The development of efficient and scalable parallel solvers is a central issue of 
parallel simulations and the design of parallel SOR, conjugate gradient and multigrid solvers is 
discussed. The domain decomposition approach is illustrated in examples ranging from 
‘atomistic’ simulation of decanano MOSFETs to simulation of power IGBTs rated for 1000V. 
1. Introduction 
Computer-aided numerical modelling and simulation 
has become an indispensable tool in the understanding, 
design and optimisation of various semiconductor devices. 
The complex architecture of modern devices requires in 
many cases 3D simulation. The use of parallel processing 
systems is a widely accepted approach to attain the 
computational power and memory requirement inherent in 
3D simulation [1-4]. However, considerable attention must 
be paid to the underlying architecture of the parallel system 
to ensure maximum efficiency, scalability and portability of 
the code. 
We focus our discussion on the design of semiconductor 
device simulation algorithms for mesh connected MIMD 
platforms with distributed memory and four way 
connectivity. Our approach is based on finite difference or 
structured topologically rectangular finite element 3D grids 
[5]. The nature of such grids makes them amenable to 
partitioning over mesh connected arrays of processors using 
domain decomposition techniques. The relative simplicity of 
the corresponding parallel code design reduces the time-to- 
answer when new models, simulation techniques and devices 
are investigated. 
We briefly describe the domain decomposition approach 
and the optimisation of the partitioning in the next section. 
Basic methods for generation of structured topologically 
rectangular 2D and 3D finite element grids are discussed in 
Section 3. Several aspects of the design of SOR, conjugate 
gradient and multigrid parallel solvers are discussed in 
Section 4. Finally, in Section 5 we give examples of large 
scale parallel semiconductor device simulation. 
0-7803-4369-7/98 $10.00 © 1998 IEEE 
2. Domain decomposition 
The basic idea of decomposing a 3D semiconductor 
device solution domain over a 2D array of NxM processors is 
illustrated in Fig.1 for a quarter of an IGBT cell. 
Fig.1: Partitioning of a 3D semiconductor device solution domain 
over a 2D array of 2x4 processors. 
The device is partitioned into 2x4 subdomains along 
two of the spatial dimensions and each of the subdomains 
includes the whole third dimension. In the above partition 
each processor is assigned a column of elements partitioned 
in one spatial plane and including all the elements in the 
third direction. The partitioning must ensure that the edges of 
grid subdomains overlap only on neighbouring processors. 
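The rectilinear partition described above can be sketched as follows. This is only an illustration under stated assumptions: the grid sizes are made up, and the helper names `split_1d` and `partition_columns` are hypothetical rather than taken from the authors' code.

```python
def split_1d(n_nodes, n_parts):
    """Split range(n_nodes) into n_parts contiguous, near-equal index ranges."""
    base, extra = divmod(n_nodes, n_parts)
    ranges, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def partition_columns(nx, ny, nz, n_proc_x, n_proc_y):
    """Map each processor (p, q) to its (x-range, y-range, z-range) block.

    Only x and y are partitioned; every block keeps the full z extent,
    so each processor owns a column of the 3D grid.
    """
    x_ranges = split_1d(nx, n_proc_x)
    y_ranges = split_1d(ny, n_proc_y)
    return {
        (p, q): (x_ranges[p], y_ranges[q], (0, nz))
        for p in range(n_proc_x)
        for q in range(n_proc_y)
    }


# Assumed example sizes: a 20 x 41 x 30 grid on a 2 x 4 processor array.
blocks = partition_columns(nx=20, ny=41, nz=30, n_proc_x=2, n_proc_y=4)
for proc, (xr, yr, zr) in sorted(blocks.items()):
    print(proc, "x:", xr, "y:", yr, "z:", zr)
```

Because the blocks are contiguous in both partitioned directions, each subdomain touches at most four others, which matches the four-way connectivity of the processor array.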
For many iterative linear solvers [6], highest parallel 
efficiency is obtained when the largest subdomain 
volume and the largest subdomain surface are at a 
minimum. Ideal first order load balancing only occurs when 
the number of grid nodes is exactly divisible by the number 
of processors in each dimension of the processor array. 
Otherwise deep oscillations in speedup and efficiency occur 
(Fig.2). To improve speedup for an arbitrary grid size, an 
alternate partitioning can be found using simulated annealing 
[7] which preserves the 4-way connectivity of grid 
subdomains and smoothes the performance oscillations. 
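The origin of these oscillations can be seen from a first-order estimate. The sketch below assumes, purely for illustration, that run time is proportional to the largest subdomain and that communication is free; it does not reproduce the measured curves of Fig.2, and `rectilinear_efficiency` is a hypothetical helper.

```python
import math


def rectilinear_efficiency(n, procs_per_side):
    """First-order efficiency for an n x n plane of node columns split
    rectilinearly over a procs_per_side x procs_per_side array.

    Assumption: run time is set by the most heavily loaded processor
    and communication cost is ignored.
    """
    largest_side = math.ceil(n / procs_per_side)
    largest_load = largest_side ** 2
    ideal_load = n * n / procs_per_side ** 2
    return ideal_load / largest_load


for n in (16, 17, 20, 24, 25, 32):
    eff = rectilinear_efficiency(n, 8)
    print(f"n = {n:2d}: efficiency = {eff:.2f}, speed-up = {64 * eff:.1f}")
```

Under these assumptions the estimate is 1 whenever n is a multiple of 8 and falls to about one half as soon as n exceeds such a multiple by one node, because a single extra row and column is enough to enlarge the biggest subdomain.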
Fig.2: Speed-up with and without optimisation for an 8x8 processor 
array. (Plot of speed-up against problem size for rectilinear and annealed partitioning.) 
Fig.3: Triangulation of a circle with a nonuniform topologically 
rectangular grid. 
Fig.4: Triangulation of an etched quantum dot with a 3D 
topologically rectangular FE grid. 
3. Topologically rectangular grids 
To simplify the domain decomposition and the 
corresponding code design we use structured 2D and 3D 
topologically rectangular grids. Such grids allow two or 
three index ordering preserving the number of grid nodes in 
each one of the index directions. Nodes with neighbouring 
indices are physically adjacent in the grid. Most finite 
difference grids are inherently topologically rectangular; 
however, it is also possible to construct topologically 
rectangular finite element (FE) grids. Although such 
requirements restrict to some extent the flexibility of the FE 
approximation we have found that devices with rather 
complicated shapes may be triangulated with topologically 
rectangular grids. 
Fig.3 illustrates the basic concepts of the topologically 
rectangular grid in a 2D example of finite element 
triangulation of a circle. Although such grids do not have 
the full flexibility of unstructured FE grids they allow for 
precise approximation of the region boundaries and local 
density refinement. The concept can be extended to 3D and 
Fig.4 illustrates the 3D FE triangulation of an etched 
quantum dot. 
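One simple way to see how a two-index grid can cover a circle is sketched below. It is not the authors' grid generator: the radial square-to-disc mapping, the uniform spacing and the names `square_to_disc` and `disc_grid` are assumptions chosen for brevity, whereas the grid of Fig.3 is nonuniform.

```python
import math


def square_to_disc(u, v):
    """Map a point of the square [-1, 1]^2 radially onto the unit disc."""
    r = math.hypot(u, v)
    if r == 0.0:
        return (0.0, 0.0)
    scale = max(abs(u), abs(v)) / r
    return (u * scale, v * scale)


def disc_grid(n):
    """Return nodes[i][j] for an (n+1) x (n+1) index grid covering the disc,
    and triangles as triples of (i, j) pairs (two per quadrilateral cell)."""
    nodes = [
        [square_to_disc(-1.0 + 2.0 * i / n, -1.0 + 2.0 * j / n)
         for j in range(n + 1)]
        for i in range(n + 1)
    ]
    triangles = []
    for i in range(n):
        for j in range(n):
            a, b, c, d = (i, j), (i + 1, j), (i + 1, j + 1), (i, j + 1)
            triangles.append((a, b, c))
            triangles.append((a, c, d))
    return nodes, triangles


def signed_area(nodes, tri):
    """Signed area of one triangle; positive for counter-clockwise vertices."""
    (x0, y0), (x1, y1), (x2, y2) = (nodes[i][j] for i, j in tri)
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))


nodes, triangles = disc_grid(8)
boundary = [nodes[i][j] for i in range(9) for j in range(9)
            if i in (0, 8) or j in (0, 8)]
radius_error = max(abs(math.hypot(x, y) - 1.0) for x, y in boundary)
print(len(triangles), "triangles; max boundary radius error:", radius_error)
```

Nodes keep their (i, j) indices, so neighbouring indices remain physically adjacent and the grid can be partitioned exactly like a finite difference grid, while the outermost index lines follow the circular boundary.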
With some care much more complex devices such as 
IGBTs (Fig.5) can be triangulated in 3D simulations using 
topologically rectangular grids. 
Fig.5: Schematic view of an IGBT. 
As can be seen from Fig.6, the grid conforms not 
only to the cellular shape of the device but also to the 
metallurgical pn junctions inside. Fig.7 illustrates the quality 
of the approximation of the complex shape of the pn-junction 
deformed by implantation in the inter-cell space. 
Fig.6: Triangulation of 1/4 of an IGBT cell with a topologically 
rectangular grid. 
Fig.7: Detail of the grid enclosed by the pn-junction for a stopper 
implanted IGBT. 
4. Parallel solvers 
The design of parallel linear solvers is an open area of 
research. The efficient parallelisation of sparse LU 
decomposition is extremely difficult to achieve and good 
scalability is even harder. In the case of 3D problems, 
however, iterative linear solvers are in many cases the 
preferred choice due to the enormous memory requirement 
of the direct ones. In the case of mesh connected processors 
acceptable scalability can be achieved for a large class of 
iterative methods including SOR, Newton-SOR [8] and 
multigrid techniques [9]. 
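To illustrate why SOR maps well onto a mesh of processors, the following serial sketch uses red-black ordering on a 2D model problem. The ordering, the 17x17 grid, the relaxation factor and the name `red_black_sor` are assumptions made for this example; the text does not state which ordering the authors' solver uses.

```python
def red_black_sor(u, f, h, omega, sweeps):
    """In-place red-black SOR for -laplace(u) = f on a 2D grid, with
    Dirichlet values held in the boundary entries of u."""
    n_x, n_y = len(u), len(u[0])
    for _ in range(sweeps):
        for colour in (0, 1):
            # Nodes of one colour depend only on nodes of the other colour,
            # so in a parallel code each subdomain could update this colour
            # independently after one exchange of boundary values with its
            # four neighbours.
            for i in range(1, n_x - 1):
                for j in range(1, n_y - 1):
                    if (i + j) % 2 != colour:
                        continue
                    gauss_seidel = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                           + u[i][j - 1] + u[i][j + 1]
                                           + h * h * f[i][j])
                    u[i][j] += omega * (gauss_seidel - u[i][j])
    return u


n = 17
h = 1.0 / (n - 1)
f = [[0.0] * n for _ in range(n)]
u = [[0.0] * n for _ in range(n)]
for i in range(n):
    u[i][n - 1] = 1.0  # one edge held at 1, the other three at 0
red_black_sor(u, f, h, omega=1.7, sweeps=200)
centre = u[n // 2][n // 2]
print("centre value:", round(centre, 6))
```

By symmetry the converged centre value of this model problem is one quarter of the applied boundary value, which gives a quick check that the sweep is implemented consistently.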
Conjugate gradient (CG) type solvers are also amenable 
to parallelisation but the implementation of efficient and 
scalable preconditioning is still an issue. The incomplete 
Cholesky LU decomposition, which is the preferable choice 
for single processor preconditioning of BiCGSTAB solvers, 
is not inherently parallel. An alternative choice is to use 
polynomial preconditioning [10], which has a much higher 
degree of parallelism as it only requires the calculation of 
matrix-vector products. In Figs. 8 and 9 we illustrate the 
effect of various degrees of polynomial preconditioning on 
the performance of a BiCGSTAB solver for the systems of 
equations arising from the discretisation of the Poisson and 
current continuity equations respectively. 
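The statement that polynomial preconditioning needs only matrix-vector products can be illustrated with a truncated Neumann series, the simplest polynomial choice. This is an assumption made for the sketch: the polynomial actually used for Figs. 8 and 9 is not specified in the text, and the 1D model matrix and the names `matvec`, `neumann_preconditioner` and `residual_factor` are hypothetical.

```python
import math


def matvec(x):
    """y = A x for A = tridiag(-1, 2, -1), a 1D Poisson model matrix."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]


def neumann_preconditioner(r, degree, diag=2.0):
    """Apply z = M r with M = sum_{j=0..degree} (I - D^-1 A)^j D^-1.

    Only matrix-vector products and vector updates are used, so the
    operation parallelises in the same way as the solver itself.
    """
    z0 = [ri / diag for ri in r]
    z = list(z0)
    for _ in range(degree):
        az = matvec(z)
        z = [z0_i + z_i - az_i / diag for z0_i, z_i, az_i in zip(z0, z, az)]
    return z


def residual_factor(degree, n=20):
    """Return |(I - M A) v| / |v| for the smoothest eigenvector v of A."""
    v = [math.sin(math.pi * (i + 1) / (n + 1)) for i in range(n)]
    mav = neumann_preconditioner(matvec(v), degree)
    err = [vi - mi for vi, mi in zip(v, mav)]
    return math.sqrt(sum(e * e for e in err) / sum(vi * vi for vi in v))


for k in range(4):
    print(f"degree {k}: error factor for smoothest mode = {residual_factor(k):.4f}")
```

Degree 0 reduces to diagonal scaling, and each additional degree costs one more matrix-vector product while bringing M A closer to the identity. Whether that extra work pays off in iteration count depends on the system being solved.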
Fig.8: Convergence property of BiCGSTAB solver with polynomial 
preconditioning solving the system arising from the discretisation 
of the Poisson equation in a power diode simulation. (Plot against 
number of iterations for no preconditioning and for 1st, 2nd and 3rd 
order preconditioning.) 
Fig.9: Convergence property of BiCGSTAB solver with polynomial 
preconditioning solving the system arising from the discretisation 
of the electron current continuity equation in a power diode 
simulation. (Plot against number of iterations for no preconditioning 
and for 1st, 2nd and 3rd order polynomial preconditioning.) 
Due to the stable positive definite structure of the matrix 
arising from the discretisation of the Poisson equation, the 
convergence of the BiCGSTAB solver in Fig.8 is much faster and 
smoother. The ill-conditioning and large dynamic range of 
the variables in the current continuity case slow down the 
convergence. The ripples in Fig.9 are most probably 
associated with truncation errors in calculating the direction 
of descent. 
Fig.10: Potential distribution in three 30 nm MOSFETs with different microscopic arrangements of the dopants. 
5. Examples 
‘Atomistic’ device simulation 
The discrete stochastic distribution of dopants in sub 
100nm MOSFETs results in 3D potential and current 
distributions. Study of the corresponding fluctuation effects 
requires 3D simulations with fine grain discretisation. 
Statistically significant samples of microscopically different 
devices have to be simulated in order to understand the 
trends in the variation of the parameters and to build up 
reliable statistics on which the IC design and optimisation 
should be based. This is a computationally demanding task 
and a good candidate for parallel simulations. Fig.10 
illustrates the distribution of the potential at threshold 
voltage for three macroscopically identical 30nm MOSFETs 
with different microscopic arrangements of the dopants and 
completely different threshold voltages. 
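A sample of microscopically different devices could be generated along the following lines. Everything specific here is an assumption for illustration: the Poisson-distributed dopant count, the uniform placement, the doping level of 5e18 cm^-3, the 30 x 30 x 20 nm region and the helper names are not taken from the paper.

```python
import math
import random


def sample_poisson(mean, rng):
    """Draw a Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def random_dopants(doping_cm3, size_nm, rng):
    """Place discrete dopants uniformly at random in a box of size_nm (nm)."""
    lx, ly, lz = size_nm
    volume_cm3 = lx * ly * lz * 1e-21  # 1 nm^3 = 1e-21 cm^3
    n_dopants = sample_poisson(doping_cm3 * volume_cm3, rng)
    return [(rng.uniform(0, lx), rng.uniform(0, ly), rng.uniform(0, lz))
            for _ in range(n_dopants)]


rng = random.Random(1998)
# Hypothetical values: 5e18 cm^-3 acceptors in a 30 x 30 x 20 nm region.
devices = [random_dopants(5e18, (30.0, 30.0, 20.0), rng) for _ in range(200)]
counts = [len(d) for d in devices]
mean_count = sum(counts) / len(counts)
print("sample mean:", round(mean_count, 1),
      "min:", min(counts), "max:", max(counts))
```

Each entry of `devices` would define one 3D simulation, and the simulations are independent of one another, which is part of what makes the statistical study a natural parallel workload.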
Cellular IGBT simulation 
The distribution of electrons and holes in a cellular 
IGBT in the on-state is shown in Fig.11(a) and (b) respectively. 
In Fig.11(a) the MOSFET channel is clearly seen. In both 
figures significant ambipolar injection leading to 
conductivity modulation is seen in the low doped drift region 
of the device. 
Fig.11: Distribution of electrons (a) and holes (b) in a cellular 
IGBT. 
6. Conclusions 
In this paper parallel approaches based on mesh 
connected arrays of processors for the purpose of 
semiconductor and nanostructure device simulation have 
been presented. The specific features of the parallel platform 
are accounted for in the design process, which ensures the 
scalability and portability of the codes. 
References 
1. R.W. Dutton, K.H. Law, P.M. Pinsky, N.R. Aluru and 
B.P. Herndon: Proc. NASA Semiconductor Device 
Modeling Workshop (1996) 15. 
2. V.K. Naik, K. Eswar, M.K. Ieong: Proc. NASA 
Semiconductor Device Modeling Workshop (1996) 
77. 
3. O. Schenk, K. Gartner, W. Fichtner: Swiss Federal 
Institute of Technology Zurich, Technical Report No. 
97/19. 
4. U.A. Ranawake, C. Huster, P.M. Lenders and S.M. 
Goodnick: IEEE Trans. Computer-Aided Design of 
Integrated Circuits and Systems 13 (1994) 712. 
5. A. Asenov, A. Brown and J.R. Barker: VLSI Design 6 
10. O.G. Johnson, C.A. Micchelli and G. Paul: SIAM J. 
Numer. Anal. 20 (1983) 362. 
