A new parameterized architectural design for SENSE reconstruction by Siddiqui, M.F. et al.
            
A New Parameterized Architectural Design for SENSE Reconstruction

Muhammad Faisal Siddiqui¹, Ahmed Wasif Reza¹, Jeevan Kanesan¹, and Hammad Omer²
 
 
1 Department of Electrical Engineering, Faculty of Engineering, University of Malaya, 50603 Kuala Lumpur, Malaysia 
2 Department of Electrical Engineering, COMSATS Institute of Information Technology, Islamabad, Pakistan 
Corresponding author: wasif@um.edu.my
 
 
 
Abstract: Reconfigurable hardware architectures that can provide
good-quality image reconstruction for parallel Magnetic Resonance
Imaging (MRI) in a very short computation time are in high demand.
Hardware platforms tailored to specific reconstruction algorithms
can dramatically increase power efficiency and decrease execution
time. This research proposes a new parameterized architecture
design for Sensitivity Encoding (SENSE) reconstruction. The
architecture is also synthesized for a Field Programmable Gate
Array (FPGA). Complex multiplier, divider and complex matrix
multiplier modules are designed to implement the algorithm.
Furthermore, variable data bus widths are used in the data path of
the architecture, which reduces the hardware cost and silicon area.
Experimental results and comparisons demonstrate the efficiency of
the architecture. In terms of computation time, the results show
that the proposed design is about 1000 times faster than the
conventional MATLAB reconstruction (5.24 ms versus 5400 ms), while
maintaining the quality of the reconstructed image. These results
indicate that the architectural design can be a significant tool
for SENSE reconstruction in MRI scanners.
 
Keywords: MRI, Parallel MRI, SENSE, FPGA, HDL.  
1. Introduction 
In recent years, magnetic resonance imaging (MRI) has been widely
used by medical practitioners as an advanced imaging tool to
identify pathological conditions in patients. MRI has proven
itself over the years as a low-risk, powerful and flexible
assessment technique for medical examination because of features
such as better soft-tissue differentiation, high contrast and high
spatial resolution. Furthermore, MRI can detect certain diseases
much earlier than other medical imaging techniques [1]. One major
limitation of current MRI is its long acquisition time, which
limits the use of MRI for some applications and also increases
hospital resource usage and power consumption. Parallel imaging
(PI) has been one of the most notable innovations in the magnetic
resonance (MR) field; it speeds up MRI scans by acquiring data in
parallel. The use of multiple receiver coils in PI reduces the
image acquisition time significantly. Parallel MRI reconstruction
techniques have been a central focus of recent research. Many
different solutions have been proposed [2], which can be broadly
categorized into ‘image-domain’ approaches (e.g., SENSE) and
‘k-space’ approaches (e.g., GRAPPA). In this paper, the
Sensitivity Encoding (SENSE) algorithm [3] is used to design the
hardware for PI reconstruction.
The rapid development of MRI reconstruction algorithms also
demands efficient hardware implementations. These reconstruction
algorithms are computationally intensive by nature; they take a
long time to run and demand considerable power. To serve such
computation-hungry applications effectively, different platforms
are used. These platforms may consist of computation cores,
general-purpose central processing units (CPUs), general-purpose
graphics processing units (GPUs), field programmable gate arrays
(FPGAs), or combinations of these [4-11]. Each technology has its
advantages and limitations.
In this paper, a parameterized architectural design of the SENSE
algorithm for two receiver coils is presented. A parameterized
design allows designers to reconfigure the widths of the hardware
data buses. The proposed architecture is developed in the Verilog
hardware description language (HDL). The synthesizable code can be
implemented on an FPGA. Application-specific hardware designs for
FPGAs provide greater speed than a software implementation on a
general-purpose CPU and also dissipate less power, at the cost of
design effort [7]. Furthermore, architectural designs on
reconfigurable hardware can exploit the parallelism in the
algorithm, which yields a significant reduction in computation
time. To validate the proposed design, a MATLAB
Simulink/ModelSim co-simulation platform is used.
The rest of the paper is organized as follows. Section 2 
presents an overview of SENSE reconstruction. Section 3 
introduces the proposed architecture design for SENSE 
reconstruction algorithm. Experimental results and 
comparisons are discussed in Section 4. Finally, conclusions 
are drawn in Section 5. 
2. SENSE Reconstruction Algorithm 
SENSE is an image-domain reconstruction algorithm for 
parallel MRI. In this technique, sensitivity maps of the 
receiver coil elements are used to calculate the aliased signal 
component at each pixel and then these signals are allocated 
at actual pixel positions in the unwrapped image [2]. 
Mathematically, in matrix notation, the SENSE algorithm can be
defined as:
$S = CM$            (1)
where $C$ is the sensitivity matrix (also called the encoding
matrix), which contains the coil sensitivity information of each
coil, $M$ is the image to be reconstructed and $S$ is the aliased
image captured by the MRI scanner. This becomes a matrix inversion
problem, and the unfolded image (the desired MR image) $M$ is
given by:
$M = C^{-1} S$          (2)

The 3rd International Conference on Computer Engineering and Mathematical Sciences (ICCEMS 2014)

3. Hardware Implementation of SENSE 
The importance of speed in medical instruments creates a need for
application-specific hardware for real-time image processing.
Dedicated architecture designs achieve better performance by
reducing the computation time of the scan and decreasing the power
consumption of the resources. Different hardware platforms, or
combinations of platforms, are used to implement such
reconstruction algorithms. Table 1 compares some of these
platforms, namely CPUs, GPUs and FPGAs.
 
Table 1. Comparison of CPU, GPU and FPGA
Platform          Peak GFlops (32/64 bit)   Power (W)   Design effort
CPU (Core i7)     70/70                     144         Easy
GPU (RV870)       560/2800                  150         Medium
FPGA (Virtex-7)   160/628                   7           Hard
Designs based on reconfigurable hardware (such as FPGAs) improve
the performance-to-space ratio at the cost of design effort [7].
Furthermore, instead of incurring heavy costs to build a dedicated
solution for each application, a parameterized architecture design
for FPGA gives researchers a platform to validate the design,
enables rapid prototyping of complex algorithms and allows
debugging.
A parameterized architectural design of the SENSE algorithm for
two receiver coils with an acceleration factor of 2 is implemented
in Verilog HDL. Fixed-point arithmetic is used to represent
fractional numbers in binary. Complex numbers are stored with a
32-bit real part and a 32-bit imaginary part. However, the data
widths can be changed, because the architecture is generalised
through parameterization.
To implement SENSE reconstruction for two coils with an
acceleration factor of 2, the following equations have to be
solved in hardware for each aliased pixel in the undersampled data
(folded image) of size $128 \times 256 \times 2$:
$S_1(x, y) = C_1(x, y)\,M(x, y) + C_1(x, y+128)\,M(x, y+128)$    (3)
$S_2(x, y) = C_2(x, y)\,M(x, y) + C_2(x, y+128)\,M(x, y+128)$    (4)
This linear system can be solved by taking the inverse of the
matrix $C$ and multiplying it by $S$, which gives the unfolded
image $M$. To do this in hardware, a complex multiplier is needed
to multiply two complex numbers. For the inverse of the
$2 \times 2$ matrix, the position and sign changing method is
used: the entries $C(1,1)$ and $C(2,2)$ exchange positions, the
signs of $C(1,2)$ and $C(2,1)$ are changed, and the result is
divided by the determinant of $C$. Because division is more costly
than multiplication in hardware, an inverse-of-a-number module
first computes the multiplicative inverse of the determinant,
which is then multiplied by the matrix; this reduces the number of
divisions in the architecture. Finally, a complex matrix
multiplier module is designed to multiply two complex matrices.
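
The per-pixel computation described above can be sketched in software as follows. This is only an illustrative Python model, not the authors' Verilog: the function name unfold_pixel, the argument names and the test values are hypothetical, and Python's built-in complex type stands in for the fixed-point complex data path.

```python
# Illustrative sketch only: unfold one aliased pixel pair (2 coils, R = 2).
# All names and values here are hypothetical, not taken from the paper's HDL.

def unfold_pixel(c11, c12, c21, c22, s1, s2):
    """Solve [s1, s2]^T = C [m_a, m_b]^T for the two overlapping pixels.

    c11 = C1(x, y), c12 = C1(x, y+128), c21 = C2(x, y), c22 = C2(x, y+128).
    """
    det = c11 * c22 - c12 * c21            # determinant of the 2x2 matrix C
    if det == 0:
        raise ZeroDivisionError("sensitivity matrix is singular at this pixel")
    inv_det = 1 / det                      # the only division in the routine
    # Position and sign changing: swap the diagonal, negate the off-diagonal.
    m_a = inv_det * (c22 * s1 - c12 * s2)    # M(x, y)
    m_b = inv_det * (-c21 * s1 + c11 * s2)   # M(x, y+128)
    return m_a, m_b


# Round-trip check with made-up values: fold two known pixels, then unfold.
c11, c12 = 1 + 1j, 0.5 - 0.25j
c21, c22 = 0.25 + 0.5j, 1 - 1j
m_true = (3 + 2j, -1 + 4j)
s1 = c11 * m_true[0] + c12 * m_true[1]     # equation (3)
s2 = c21 * m_true[0] + c22 * m_true[1]     # equation (4)
m_a, m_b = unfold_pixel(c11, c12, c21, c22, s1, s2)
```

Computing the reciprocal of the determinant once and reusing it for both outputs mirrors the motivation given in the text: one division per pixel pair, with everything else done by multipliers and adders.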
A restoring divider is used to implement the inverse-of-a-number
module. It is a sequential divider and takes longer than the other
modules of the system: it is iterative and takes n clock cycles
for n-bit data. The remaining modules, such as the adder,
subtractor, complex multiplier and complex matrix multiplier, are
single-cycle modules.
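
The iterative behaviour of a restoring divider can be illustrated with a short software model. The sketch below is a generic textbook restoring division for unsigned integers, written in Python with hypothetical names; it is not the authors' module, and it counts one loop iteration as one clock cycle purely for illustration.

```python
# Generic restoring division model (unsigned). One loop pass stands in for
# one clock cycle, so an n-bit operand takes n cycles. Names are hypothetical.

def restoring_divide(dividend, divisor, n_bits):
    """Return (quotient, remainder, cycles) for an n_bits-wide dividend."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    if not 0 <= dividend < (1 << n_bits):
        raise ValueError("dividend does not fit in n_bits")
    remainder = 0
    quotient = 0
    cycles = 0
    for i in range(n_bits - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor        # trial subtraction
        if trial >= 0:
            remainder = trial              # keep the result, quotient bit = 1
            quotient |= 1 << i
        # Otherwise the old remainder is "restored" (left unchanged), bit = 0.
        cycles += 1
    return quotient, remainder, cycles


q, r, cycles = restoring_divide(200, 7, 16)   # 200 = 7 * 28 + 4
```

With a 16-bit operand the loop runs 16 times regardless of the operand values, which is the fixed n-cycle latency that makes the divider the slowest module in the data path.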
The complex multiplier is implemented with 3 multipliers and 4
add/sub modules instead of the conventional arrangement, which
uses 4 multipliers and 2 add/sub units. This approach makes the
design more efficient because a multiplier is more costly than an
add/sub operation in hardware.
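
One well-known way to trade a multiplication for extra additions is Gauss's three-multiplication scheme, sketched below in Python. The paper does not state which arrangement its multiplier uses, so this is an assumption for illustration; in particular, this textbook form needs five additions/subtractions, whereas the paper reports four add/sub modules, so the authors' circuit presumably differs in detail. The function name complex_mul_3 is hypothetical.

```python
# Gauss's trick: (a + jb)(c + jd) with three real multiplications.
# Assumed variant for illustration; the paper's exact arrangement is not given.

def complex_mul_3(a, b, c, d):
    """Return (real, imag) of (a + jb) * (c + jd)."""
    k1 = c * (a + b)        # multiplication 1
    k2 = a * (d - c)        # multiplication 2
    k3 = b * (c + d)        # multiplication 3
    real = k1 - k3          # = ac - bd
    imag = k1 + k2          # = ad + bc
    return real, imag


real, imag = complex_mul_3(3, 2, -1, 4)   # (3 + 2j) * (-1 + 4j)
```

For the sample operands the result is -11 + 10j, the same as the conventional four-multiplication form ac - bd and ad + bc.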
The complex matrix multiplier comprises 4 complex multipliers and
4 add/sub modules. Moreover, variable data widths are used in the
intermediate connections of the architecture. Using variable
bit-widths in the data path makes efficient use of the silicon
area and reduces the power consumption.
The controller unit controls the writing of data into specific
registers and handles the control signals according to the status
inputs. It also generates the row and column indices of the input
image pixels to be fed to the system.
The MATLAB Simulink/ModelSim co-simulation design of the proposed
architecture is depicted in Figure 1. It consists of an HDL
co-simulation block (ModelSim simulator), a MATLAB fcn (Function)
block, a number of converter blocks, a delay block and a Simout
workspace block.
The HDL co-simulation block communicates with the ModelSim
simulator and passes inputs and outputs between MATLAB and
ModelSim. The overall hardware design of the SENSE reconstruction
algorithm is inside this block, and Simulink provides its test
bench.
The MATLAB fcn (Function) block reads the data set, which includes
the coil sensitivity matrix and the aliased image matrix. These
values are fed to the HDL co-simulation block as inputs, according
to the row and column indices of the desired aliased image pixels
provided by the architecture. The coil sensitivity matrix
variables are named “S1” to “S4” and the folded image values are
named “IM” in the MATLAB fcn block.
The convert blocks perform type casting between MATLAB (double)
variables and binary fixed point, in both directions. The delay
block is used to reset the system initially for 10 ns. The Simout
workspace block takes the data from the converter at the output of
the HDL co-simulation block and saves it in the MATLAB workspace.
By accessing this workspace variable, the output image is
displayed through the “imshow” function of MATLAB.

Figure 1. MATLAB Simulink/ModelSim co-simulation of the proposed design

4. Results and Discussion 
The proposed architecture is validated by MATLAB
Simulink/ModelSim co-simulation. The synthesizable code of the
proposed hardware design is also tested in Xilinx ISE 13.2 to find
the maximum operating frequency of the architecture. Table 2 shows
the maximum operating frequency of the proposed design when the
data width of the divider is set to 64 bits, and the computation
time of SENSE reconstruction for a $256 \times 256$ image at that
frequency.
The dataset is acquired from a 1.5 Tesla MRI scanner. The
dimensions of the receiver coil sensitivity maps, the aliased
(undersampled) image and the original (fully sampled) image are
$256 \times 256 \times 2$, $128 \times 256 \times 2$ and
$256 \times 256$, respectively.
Table 2. Maximum operating frequency and reconstruction time on FPGA (Xilinx Virtex-7)
Maximum frequency (MHz)   Reconstruction time (ms)
400                       5.24
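
The reconstruction time in Table 2 is consistent with a simple back-of-the-envelope estimate. The Python sketch below assumes, which the paper does not state explicitly, that each aliased pixel position occupies the sequential divider for one full 64-cycle pass and that the single-cycle modules add negligible time; the variable names are hypothetical.

```python
# Rough timing estimate under the stated assumption: the 64-bit restoring
# divider (64 cycles per pass) dominates, with one pass per aliased pixel.

f_max_hz = 400e6                      # maximum frequency from Table 2
aliased_rows, aliased_cols = 128, 256 # one aliased pixel per folded position
divider_bits = 64                     # n-bit divider takes n clock cycles

cycles = aliased_rows * aliased_cols * divider_bits
time_ms = cycles / f_max_hz * 1e3     # about 5.24 ms

matlab_ms = 5400                      # MATLAB time reported in Table 3
speedup = matlab_ms / time_ms         # roughly a thousandfold
```

Under this assumption the estimate comes to 2,097,152 cycles, or about 5.24 ms at 400 MHz, which matches the reported figure and suggests that the divider width is the main lever on reconstruction time.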
Table 3 compares the proposed design with the MATLAB
reconstruction. The parameters include artefact power, mean
g-factor, median g-factor, standard deviation of the g-factor, and
computation time.
The computation time of the proposed architectural design is about
a thousand times shorter than that of the MATLAB reconstruction
(5.24 ms versus 5400 ms), at the cost of design effort. This is
because the proposed design is hardware based, whereas MATLAB is a
resource-demanding software tool. The mean, median and standard
deviation of the g-factor are the same in both reconstructions.
However, the artefact power of the proposed design
($2.5 \times 10^{-3}$) is much higher than that of the MATLAB
reconstruction ($5.7964 \times 10^{-29}$), although it is still in
the acceptable range. The artefact power increases because of
truncation error in the fixed-point number representation in
hardware. For simplicity, the input provided to the proposed
design is 16 bits wide in 9.7 fixed-point format. Because of this
format, some input values are truncated, which increases the
artefact power. This error can be reduced by increasing the
bit-width at the input. However, the current data width offers a
good trade-off and gives satisfactory results with an acceptable
artefact power.
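
The effect of input truncation can be illustrated with a small Python sketch. It assumes that "9.7" means a signed 16-bit word with 9 integer bits (including the sign) and 7 fractional bits, and that out-of-range values saturate; the paper specifies neither the signedness nor the overflow behaviour, and the function names are hypothetical.

```python
import math

# Assumed interpretation of the 9.7 format: 16-bit signed, 7 fractional bits.
FRAC_BITS = 7
TOTAL_BITS = 16


def to_fixed_9_7(value):
    """Truncate a real value to the assumed 9.7 fixed-point integer code."""
    raw = math.floor(value * (1 << FRAC_BITS))   # drop bits below 2**-7
    lo = -(1 << (TOTAL_BITS - 1))
    hi = (1 << (TOTAL_BITS - 1)) - 1
    return max(lo, min(hi, raw))                 # saturation is an assumption


def from_fixed_9_7(raw):
    """Convert the integer code back to a real value."""
    return raw / (1 << FRAC_BITS)


x = 0.3                       # made-up input sample
raw = to_fixed_9_7(x)         # integer code stored in hardware
x_q = from_fixed_9_7(raw)     # value the hardware actually sees
error = x - x_q               # truncation error, below 2**-7
```

With 7 fractional bits the resolution is 2**-7 = 0.0078125, so every in-range input can lose up to that amount before any arithmetic is done; adding fractional bits at the input shrinks this bound, in line with the remedy suggested in the text.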
Table 3. Comparison between MATLAB and proposed architecture reconstruction
Parameter               MATLAB reconstruction   Proposed architecture reconstruction
Computation time (ms)   5400                    5.24
Mean g-factor           18.2616                 18.2616
Median g-factor         8.9766                  8.9766
Std. dev. g-factor      19.8518                 19.8518
Artefact power          5.7964e-29              2.5e-3
Figure 2 shows the original image and the image reconstructed by
the proposed hardware design. The resulting grayscale image has
dimensions $256 \times 256$.

Figure 2. (a) Original image (b) Proposed design reconstructed image
The resultant image demonstrates the competency of the proposed
design, as the reconstructed image is almost identical to the
original image. Figure 3 shows magnified sections of the
reconstructed and original images to aid the comparison. The
clarity of the reconstructed image relative to the original shows
the effectiveness of the proposed hardware.
 
Figure 3. Magnified section of (a) Original image (b) Proposed design reconstructed image
 
From the above results and comparisons, it is clear that the
proposed system is significantly more efficient than the MATLAB
reconstruction. Its most notable feature is its short computation
time. Low power consumption is another main feature of the
proposed work, as FPGA designs consume much less power than a CPU.
5.  Conclusion 
In this study, a high-speed and low-power parameterized
architecture is proposed for SENSE reconstruction with two
receiver coil elements. The system achieves an acceptable artefact
power when reconstructing the undersampled parallel MR image.
According to the experimental results, the proposed approach
yields better performance in terms of execution time. Moreover, a
generalized version of this architecture could readily be used in
modern MR scanners.
In future, this work will be extended to different variants of
SENSE and to any number of coils and acceleration factors.
Furthermore, the computation time could be reduced further by
using advanced parallel processing techniques.
References 
[1] S. Bauer, R. Wiest, L.-P. Nolte, and M. Reyes, "A survey of 
MRI-based medical image analysis for brain tumor studies," 
Physics in Medicine and Biology, vol. 58, p. R97, 2013. 
[2] D. J. Larkman and R. G. Nunes, "Parallel magnetic resonance
imaging," Physics in Medicine and Biology, vol. 52, p. R15,
2007.
[3] K. P. Pruessmann, M. Weiger, M. B. Scheidegger, and P.
Boesiger, "SENSE: sensitivity encoding for fast MRI,"
Magnetic Resonance in Medicine, vol. 42, pp. 952-962, 1999.
[4] I. Chiuchişan and M. Cerlincă, "Implementation of Real-Time 
System for Medical Image Processing using Verilog Hardware 
Description Language," in Proceedings of the 9th 
International Conference on Cellular and Molecular Biology, 
Biophysics and Bioengineering (BIO’13), Chania, 2013, pp. 
66-69. 
[5] I. L. Dalal and F. L. Fontaine, "A Reconfigurable Real-time 
Reconstruction Engine for Parallel MRI," 2007. 
[6] S. S. Stone, J. P. Haldar, S. C. Tsao, W.-m. Hwu, B. P. Sutton, 
and Z.-P. Liang, "Accelerating advanced MRI reconstructions 
on GPUs," Journal of Parallel and Distributed Computing, 
vol. 68, pp. 1307-1318, 2008. 
[7] Y. Wang, Y. He, Y. Shan, T. Wu, D. Wu, and H. Yang, 
"Hardware computing for brain network analysis," in Quality 
Electronic Design (ASQED), 2010 2nd Asia Symposium on, 
2010, pp. 219-222. 
[8] J. Cong, V. Sarkar, G. Reinman, and A. Bui, "Customizable 
domain-specific computing," IEEE Design and Test of 
Computers, vol. 28, pp. 6-15, 2011. 
[9] B. Wang, T. Wu, F. Yan, R. Li, N. Xu, and Y. Wang, 
"RankBoost Acceleration on both NVIDIA CUDA and ATI 
Stream platforms," in Parallel and Distributed Systems 
(ICPADS), 2009 15th International Conference on, Shenzhen, 
2009, pp. 284-291. 
[10] N.-Y. Xu, X.-F. Cai, R. Gao, L. Zhang, and F.-H. Hsu,
"FPGA-based accelerator design for RankBoost in Web search
engines," in International Conference on Field-Programmable
Technology (ICFPT 2007), Kitakyushu, 2007, pp. 33-40.
[11] L. Zhuo and V. K. Prasanna, "Sparse matrix-vector 
multiplication on FPGAs," in Proceedings of the 2005 
ACM/SIGDA 13th international symposium on Field-
programmable gate arrays, California, 2005, pp. 63-74. 
 
