Protection Engineering Construction and Capability Analysis of FDS Parallel Computing Environment  by Zhe, ZHAO et al.
Available online at www.sciencedirect.com
The 5th Conference on Performance-based Fire and Fire Protection Engineering
Protection Engineering Construction and Capability Analysis of 
FDS Parallel Computing Environment 
ZHAO Zhe (a, b), YAO Hao-wei (a, b), LIANG Dong (a, b, ∗), HU Zhi-jian (a, b)
(a) Safety Engineering Research Center, Department of Engineering, Sun Yat-sen University, Guangzhou 510006, China
(b) Guangdong Provincial Key Laboratory of Fire Science and Technology, Guangzhou 510006, China
Abstract 
The fire field model FDS can provide detailed distributions of physical quantities and their evolution over time, so FDS has been widely used in performance-based design. To improve the computing speed of the software, a parallel FDS computing program has been compiled with VC++ based on MPI (Message Passing Interface). The flow field is divided into multiple regions, and each sub-domain is allocated to an appropriate node for calculation. The calculations show that computing time decreases significantly as the number of processors increases. The procedure and method established in this paper are feasible and could be further extended to large-scale parallel computing and engineering applications.
© 2011 Published by Elsevier Ltd.  
Keywords: FDS; parallel computing; MPI
∗  Corresponding author. Tel.: +86-20-3933-2230; fax: +86-20-3933-2927. 
E-mail address: Just4ujust4u@126.com 
1877–7058 © 2011 Published by Elsevier Ltd.
doi:10.1016/j.proeng.2011.04.719
Procedia Engineering 11 (2011) 723–729
Open access under CC BY-NC-ND license.
724  ZHAO Zhe et al. / Procedia Engineering 11 (2011) 723–729
1. Introduction 
When conducting performance-based fire protection design and assessment, a fire model is usually used to simulate the possible fire scenarios in a building. Among fire models, field models are the most widely used: a computer solves for the spatial distribution of the state parameters during the fire and their change over time, giving a detailed picture of how the fire develops.
The Fire Dynamics Simulator (FDS), a fire field simulation program developed by NIST, is a CFD (computational fluid dynamics) model whose main object is the fluid motion driven by a fire. For FDS calculations, a single computer often runs out of memory and computes slowly, so it is necessary to introduce parallel computing to accelerate the calculation.
Parallel computing means processing more than one task, multiple instructions, or multiple data items simultaneously. A computer system that carries out such processing is called a parallel processing system, in which multiple processors are connected through a network in an orderly, organized manner.
At present, the main forms of CPU-based parallel technology include asymmetric multi-processing (AMP) technology, COMA technology, cluster technology, NUMA (non-uniform memory access) technology, symmetric multi-processing (SMP) technology and massively parallel processing (MPP) technology. Of these, symmetric multiprocessing (SMP) [1] is the most commonly used in servers.
The parallel FDS computation in this work uses symmetric multi-processing technology. The most crucial point in setting up an SMP system is matching the CPUs: processors of the same product model and the same core type should be used, so that the performance of the platform can be maximized. The FDS parallel solver is used for large-scale multi-processor computing problems. Its principle is to partition the computational domain and compute the sub-domains in parallel; the basic idea is shown in Figure 1.
[Figure: serial computation in each sub-domain]
Fig.1 Flow chart of parallel computing
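The partitioning idea in Figure 1 can be illustrated with a minimal sketch, assuming a structured grid that is split into contiguous blocks along one axis so that block sizes differ by at most one cell. The function name `partition_cells` is hypothetical; in FDS the sub-domains are the meshes defined in the input file, and this code is not part of FDS.

```python
def partition_cells(n_cells: int, n_parts: int) -> list[range]:
    """Split n_cells grid cells into n_parts contiguous blocks whose sizes
    differ by at most one cell (a simple 1-D block decomposition)."""
    if n_parts <= 0 or n_cells < n_parts:
        raise ValueError("need 1 <= n_parts <= n_cells")
    base, extra = divmod(n_cells, n_parts)
    blocks = []
    start = 0
    for rank in range(n_parts):
        # The first `extra` blocks receive one additional cell each.
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    # Hypothetical example: a 100-cell row of the grid shared by 6 processors.
    for rank, block in enumerate(partition_cells(100, 6)):
        print(f"rank {rank}: cells {block.start}..{block.stop - 1} ({len(block)} cells)")
```

Equal-sized blocks matter because, in a synchronised time step, every processor waits for the one holding the largest sub-domain.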
2. Design and construction of the hardware platform
For a given physical model and numerical algorithm, the computational speed depends on the number of CPUs, CPU performance, memory, memory access bandwidth, the bandwidth of the node interconnect, mesh quality and partition quality. Each specific problem on each particular machine has an optimal number of partitions. If there are too many partitions, the communication volume between CPUs increases, and beyond a certain point adding partitions actually reduces the computing speed; if there are too few partitions, the available CPUs are not fully used, which also limits the calculation speed.
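The trade-off above can be made concrete with a toy cost model, which is an assumption and not taken from the paper: the time per step on n processors is T(n) = T_comp/n + c(n - 1), where T_comp is the single-processor compute time and c is a hypothetical communication cost per added partition. Under this model the optimum lies near n = sqrt(T_comp/c).

```python
def step_time(n: int, compute_time: float, comm_cost: float) -> float:
    """Toy model of one time step on n processors: the serial compute time is
    divided evenly, while communication grows linearly with the number of
    extra partitions (an assumed, illustrative cost law)."""
    return compute_time / n + comm_cost * (n - 1)


def best_partition_count(compute_time: float, comm_cost: float, max_n: int) -> int:
    """Return the processor count in 1..max_n that minimises the toy step time."""
    return min(range(1, max_n + 1), key=lambda n: step_time(n, compute_time, comm_cost))


if __name__ == "__main__":
    # Hypothetical numbers: 100 time units of serial work per step and
    # 2 time units of extra communication per added partition.
    for n in (1, 4, 7, 10, 16):
        print(f"n={n:2d}: T(n) = {step_time(n, 100.0, 2.0):6.2f}")
    print("best n:", best_partition_count(100.0, 2.0, 16))
```

With these invented constants the minimum falls at n = 7; real constants would have to be measured on the platform itself.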
Under the symmetric multi-processor structure, the speed at which multiple CPUs read shared memory is limited by the mainboard bandwidth, and the bus speed for communication between nodes is also limited. The network card used here adopts the new-generation high-speed interface "PCI-E Gigabit Ethernet", which belongs to the third generation of I/O bus technology. The switch is a D-Link DES-1016D (1000 MHz). The RSH protocol is used for communication between the master and slave servers.
The machines involved in the calculation use Intel Core™2 E7400 CPUs with a clock frequency of 2.8 GHz. The master server has 8 GB of memory and each slave server has 4 GB, which basically satisfies the calculation demand.
The following points should also be noted when building the platform:
• The memory of the master server should be large enough (larger than the amount of data generated in a parallel calculation);
• The performance of the compute nodes should be uniform, to avoid a bottleneck formed by the slowest node.
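Why a single slow node becomes a bottleneck can be shown with a minimal sketch, assuming each node receives an equal share of cells and all nodes synchronise at the end of every time step. The throughput figures and the function name are hypothetical.

```python
def synchronous_step_time(cells_per_node: list[int], cells_per_second: list[float]) -> float:
    """Wall-clock time of one synchronised time step.

    Every node must finish its own sub-domain before boundary data can be
    exchanged, so the step lasts as long as the slowest node takes.
    """
    if len(cells_per_node) != len(cells_per_second):
        raise ValueError("need one work size and one speed per node")
    return max(work / speed for work, speed in zip(cells_per_node, cells_per_second))


if __name__ == "__main__":
    work = [1_000_000] * 4                        # equal share of cells per node
    uniform = [1.0e6, 1.0e6, 1.0e6, 1.0e6]        # hypothetical cells/s per node
    one_slow = [1.0e6, 1.0e6, 1.0e6, 0.5e6]       # one node at half speed
    print(synchronous_step_time(work, uniform))   # 1.0
    print(synchronous_step_time(work, one_slow))  # 2.0
```

One node at half speed doubles the step time for the whole platform, even though the other three nodes are idle half of the time.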
Figure 2 shows a schematic of the parallel computing platform: the CPUs have the same performance and communicate with each other through the switch.
[Figure: master server and compute nodes CPU1, CPU2, CPU3, CPU4, … connected through a switch]
Fig.2 Schematic diagram of the parallel computing platform
3. Software Platform 
This work uses the 64-bit Windows XP system, which allows applications to pre-load large amounts of data into physical memory so that the processor can access the data quickly. This reduces the loading of data into virtual memory and the time spent reading and writing data on the low-speed hard disk, so the application runs faster and more efficiently.
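A quick address-space calculation shows why the 64-bit system matters for the master server's 8 GB of memory. A single 32-bit address space can reference at most

$$2^{32}\ \text{bytes} = 4\ \text{GiB} < 8\ \text{GB},$$

so a 32-bit system cannot address all of that memory, whereas a 64-bit address space ($2^{64}$ bytes) is far larger than any installed memory.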
3.1. MPI and MPICH 
MPI (Message-Passing Interface) is the standard specification of the message-passing parallel programming model and a platform- and language-independent programming standard. The most widely used versions at present are the MPI v1.2 and MPI v2.0 standards. An MPI parallel programming platform consists of standard message-passing functions and related auxiliary functions, and multiple processes communicate by calling these functions. Several copies of a program are started at the same time, forming multiple independent processes that run on different processors, each with its own memory space; inter-process communication is achieved by calling MPI functions.
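The message-passing model described above can be sketched without an MPI installation. The example is a simulation, not real MPI code: each sub-domain lives in its own list (standing in for a separate process memory space), and boundary ("ghost") values move only through explicit mailboxes, the role played by point-to-point calls such as MPI_Send and MPI_Recv in a real program. The 1-D diffusion update and all function names are illustrative assumptions, not FDS internals.

```python
def serial_step(u: list[float], r: float) -> list[float]:
    """One explicit 1-D diffusion step on the whole grid; the end cells are held fixed."""
    new = u[:]
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1])
    return new


def parallel_step(blocks: list[list[float]], r: float) -> list[list[float]]:
    """The same step on sub-domains owned by separate simulated processes.

    No rank reads another rank's list directly: edge values are first copied
    into per-rank mailboxes (standing in for MPI send/receive), then each rank
    updates its cells from its own data plus the received ghost values.
    """
    p = len(blocks)
    mailbox = [{} for _ in range(p)]  # mailbox[dest][source] = edge value
    # Communication phase: post each rank's edge cells to its neighbours.
    for rank, local in enumerate(blocks):
        if rank > 0:
            mailbox[rank - 1][rank] = local[0]
        if rank < p - 1:
            mailbox[rank + 1][rank] = local[-1]
    # Computation phase: serial update inside each sub-domain.
    new_blocks = []
    for rank, local in enumerate(blocks):
        left = mailbox[rank].get(rank - 1)   # None at the global left boundary
        right = mailbox[rank].get(rank + 1)  # None at the global right boundary
        padded = [left] + local + [right]
        new = local[:]
        for j in range(len(local)):
            lo, mid, hi = padded[j], padded[j + 1], padded[j + 2]
            if lo is None or hi is None:
                continue  # global end cell: held fixed, as in serial_step
            new[j] = mid + r * (lo - 2 * mid + hi)
        new_blocks.append(new)
    return new_blocks


if __name__ == "__main__":
    grid = [100.0] + [0.0] * 10 + [50.0]
    subdomains = [grid[0:4], grid[4:8], grid[8:12]]  # three simulated processes
    for _ in range(5):
        grid = serial_step(grid, 0.25)
        subdomains = parallel_step(subdomains, 0.25)
    merged = [x for block in subdomains for x in block]
    print("parallel result matches serial:", merged == grid)
```

Because every rank sends its edge values before any rank updates, the partitioned calculation reproduces the serial result exactly; the price is one round of messages per time step, which grows with the number of sub-domains.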
MPICH is the most influential implementation of MPI and is widely used on various systems to support parallel and distributed programming. It has been developed alongside the MPI specification, and MPICH2 is its implementation of MPI v2.0. In this paper, data transfer between nodes was carried out with MPICH2, which can substantially increase parallel efficiency.
3.2. Results of the parallel computing platform
The main concerns are the speedup, parallel efficiency and scalability of parallel computing. Speedup measures how much parallel computing accelerates the whole calculation; parallel efficiency is the average utilization of each processor; scalability describes how well the performance of parallel computing improves in proportion as the number of processors increases. Under efficiency-based measures such as the isoefficiency law, if the parallel efficiency can be maintained with only a small increase in problem size as the number of processors increases, the parallel algorithm has good scalability.
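The isoefficiency idea can be illustrated under an assumed toy cost law, which is not a measurement from this paper: T_n = W/n + c(n - 1), where W is the single-processor time (so T_1 = W) and c is a hypothetical communication cost per added partition. The sketch computes how large the problem must become to hold efficiency fixed as n grows.

```python
def efficiency(n: int, work: float, comm_cost: float) -> float:
    """E_n = T_1 / (n * T_n) under the assumed toy law
    T_n = work / n + comm_cost * (n - 1), with T_1 = work."""
    t_n = work / n + comm_cost * (n - 1)
    return work / (n * t_n)


def work_for_efficiency(n: int, target: float, comm_cost: float) -> float:
    """Problem size (single-processor time) needed on n processors to hold
    efficiency at `target`, from solving E = W / (W + c * n * (n - 1)) for W."""
    if not 0.0 < target < 1.0:
        raise ValueError("target efficiency must lie strictly between 0 and 1")
    return target / (1.0 - target) * comm_cost * n * (n - 1)


if __name__ == "__main__":
    for n in (2, 4, 6):
        w = work_for_efficiency(n, 0.8, 1.0)
        print(f"n={n}: W={w:.0f} keeps E={efficiency(n, w, 1.0):.2f}")
```

Under this assumed law the required work grows roughly as n squared; an algorithm whose efficiency holds with slower growth in problem size would, by the definition above, scale better.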
Speedup:
$$S_n = \frac{T_1}{T_n} \tag{1}$$
Parallel efficiency:
$$E_n = \frac{S_n}{n} \times 100\% \tag{2}$$
where T_1 is the computing time on a single processor, T_n is the computing time on n processors, S_n is the speedup, and E_n is the parallel efficiency.
For example, FDS was used to calculate a single-room fire [4]. Two sets of experiments were performed, with computational grids of 3,000,000 and 6,000,000 cells respectively. Each experiment was run 3 times, and the averaged results are shown in Table 1 and Table 2.
Table 1 Parallel computation test record for a single-room fire (6 million cells)
Nodes   Single-step iteration time (s)   Speedup Sn   Parallel efficiency En (%)
1       22.36                            1            100
2       11.75                            1.90         95
3       9.18                             2.43         81
4       7.12                             3.14         78.5
5       5.63                             3.97         79.4
6       4.81                             4.65         77.5
Table 2 Parallel computation test record for a single-room fire (3 million cells)
Nodes   Single-step iteration time (s)   Speedup Sn   Parallel efficiency En (%)
1       12.63                            1            100
2       6.58                             1.92         96
3       5.13                             2.46         82
4       3.77                             3.35         83.7
5       3.11                             4.04         80.8
6       2.65                             4.76         79.3
In the tables, the single-step iteration time (s) [5] is the average computing time per step over the first 100 steps.
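Equations (1) and (2) can be applied directly to the measured single-step times. The short sketch below (the function name is hypothetical) recomputes the speedup and efficiency columns from the Table 1 times.

```python
def speedup_and_efficiency(times: dict[int, float]) -> dict[int, tuple[float, float]]:
    """Apply Eq. (1), S_n = T_1 / T_n, and Eq. (2), E_n = S_n / n * 100 %,
    to single-step times keyed by processor count n (n = 1 must be present)."""
    t1 = times[1]
    return {n: (t1 / tn, t1 / tn / n * 100.0) for n, tn in sorted(times.items())}


if __name__ == "__main__":
    # Single-step iteration times (s) for the 6-million-cell case, Table 1.
    table1 = {1: 22.36, 2: 11.75, 3: 9.18, 4: 7.12, 5: 5.63, 6: 4.81}
    for n, (s, e) in speedup_and_efficiency(table1).items():
        print(f"n={n}: S_n={s:.2f}, E_n={e:.1f}%")
```

The recomputed values agree with Table 1 to within rounding in the last digit.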
Fig.3 Relationship between single-step iteration time and number of processors
Single-step iteration time (td) in Figure 3: for a fixed calculation size, td decreases as the number of processors increases, which means the required computation time is significantly reduced. With 6 nodes, the single-step iteration time is much lower than on a single node (4.81 s versus 22.36 s for the 6 million cell grid in Table 1).
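To put the per-step times in wall-clock terms, the sketch below multiplies them by a run length. The 10,000-step run is a hypothetical figure, and the calculation assumes that the single-step time measured over the first 100 steps stays constant for the whole run.

```python
def run_hours(step_time_s: float, n_steps: int) -> float:
    """Wall-clock hours for n_steps at a constant single-step iteration time."""
    return step_time_s * n_steps / 3600.0


if __name__ == "__main__":
    # Single-step times for the 6-million-cell case (Table 1); the run length
    # of 10,000 steps is hypothetical and chosen only for illustration.
    for nodes, td in ((1, 22.36), (6, 4.81)):
        print(f"{nodes} node(s): {run_hours(td, 10_000):.1f} h")
```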
Fig.4 Relationship between speedup and number of processors
Speedup (Sn) in Figure 4: the speedup increases with the number of processors. Comparing the two grid sizes, as Tables 1 and 2 also show, the speedup of the smaller (3 million cell) calculation is slightly larger than that of the larger (6 million cell) calculation.
Fig.5 Relationship between parallel efficiency and number of processors
Parallel efficiency (E) in Figure 5: comparing the parallel efficiency of the 3,000,000 and 6,000,000 cell grids shows that the parallel efficiency basically does not change as the grid size increases, while for the same grid the parallel efficiency decreases as the number of processors increases. This is because, as the number of processors increases, data exchange between sub-domains increases and communication traffic becomes relatively larger, so the parallel efficiency declines. It can therefore be predicted that the parallel efficiency will be low if the number of nodes is too large.
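This prediction can be quantified with two standard tools that the paper itself does not use: the Karp-Flatt metric, which estimates an effective serial fraction f from a measured speedup, and Amdahl's law, which projects the speedup for larger n. Treating f as constant beyond 6 nodes is an assumption; since communication overhead grows with n, the real efficiency may fall below this projection.

```python
def karp_flatt(speedup: float, n: int) -> float:
    """Experimentally determined serial fraction e = (1/S - 1/n) / (1 - 1/n)."""
    return (1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n)


def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Amdahl's law: S(n) = 1 / (f + (1 - f) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


if __name__ == "__main__":
    # Measured 6-processor speedups from Tables 1 and 2.
    for label, s6 in (("6 million cells", 4.65), ("3 million cells", 4.76)):
        f = karp_flatt(s6, 6)
        # Hypothetical extrapolation: assumes f stays constant beyond 6 nodes.
        for n in (6, 12, 24):
            s = amdahl_speedup(f, n)
            print(f"{label}: f={f:.3f}, n={n}: S={s:.2f}, E={s / n:.0%}")
```

Because the Karp-Flatt metric is Amdahl's law solved for f, the projection reproduces the measured speedup at n = 6 and then shows efficiency falling as nodes are added.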
Judging from the trend of the speedup curves, the parallel computer performs better when the number of processors is small, and its performance decreases as processors are added. This is because the bus bandwidth of a shared-memory machine is limited: the more CPUs there are, the more intense the memory access conflicts become, which reduces the speedup. In addition, when the calculation size and the number of nodes are expanded together, the parallel efficiency hardly changes, which shows that the server has good scalability.
4. Conclusion 
A cost-effective parallel computing platform capable of large-scale FDS calculations can be built by integrating commercial hardware and software.
For the symmetric multi-processing platform, when the number of nodes increases modestly (from 3 to 6), the iteration time becomes much smaller while the parallel efficiency changes little.
The platform scales well: the parallel efficiency basically does not change with the calculation scale.
When the number of processors increases, the data exchange between sub-domains grows and communication traffic becomes relatively larger, which lowers the parallel efficiency. For a given number of grid cells, there is an optimal range for the number of processors.
With 6 processors, the parallel machine computes more than 4.5 times as fast as a single processor (speedups of 4.65 and 4.76 in Tables 1 and 2) while keeping good parallel efficiency. Therefore, for large-scale computing, 6 processors offer very good value for money.
Acknowledgements 
This work was supported by the Guangdong Provincial Key Laboratory of Fire Science and Technology (2010A060801010).
References 
[1] Pang Wen-jiang, Wu Jian-lin. CFD platform for parallel computing and performance analysis [J]. Chongqing Science and Technology Institute, 2009, (6): 158-161.
[2] Maximilian Emans. Performance of parallel AMG-preconditioners in CFD-codes for weakly compressible flows [J]. Parallel Computing, 2010, (36): 326-338.
[3] Nan Jiang. Altix 50 NS layer in three-dimensional steady flow numerical simulation of the parallel efficiency [J]. Aeronautical Computing Technique, 2010, (40): 46-48.
[4] Yao Haowei, Liang Dong, Hu Zhijian. Reconstruction of a residential building fire based on large eddy simulation [C]. International Symposium on Fire Science and Fire Protection Engineering, 2009.
[5] Zheng Qiu-Ya, Liu Sanyang. Multi-block structured grid CFD and load balancing of parallel computation [J]. Engineering Mathematics, 2010, (2): 219-224.
