FPGA-based quantum circuit emulation: A case study on Quantum Fourier transform by Lee, Y. H. et al.
FPGA-Based Quantum Circuit Emulation:
A Case Study on Quantum Fourier Transform
Y. H. Lee ∗ M. Khalil-Hani † M. N. Marsono ‡
VeCAD Research Laboratory, Faculty of Electrical Engineering,
Universiti Teknologi Malaysia, 81310 UTM Skudai, Johor Bahru, Malaysia
E-mail: yee.hui@fkegraduate.utm.my∗, khalil@fke.utm.my†, nadzir@fke.utm.my‡
Abstract—Hardware emulation on a field-programmable gate array
(FPGA) platform is vital for harnessing the power of quantum
parallelism. Because the resource requirement grows exponentially
when a quantum system is modeled classically, an optimum hardware
architecture is crucial for emulating practical quantum circuits.
The quantum Fourier transform (QFT) finds application in several
critical quantum algorithms. In this work, experiments based on
the QFT are conducted to identify a suitable qubit representation
and hardware design technique. Experimental results show that a
24-bit fixed-point representation and a serial architecture achieve
optimal computation accuracy and resource utilization in QFT
circuit emulation.
Index Terms—quantum circuit; hardware emulation; quantum
Fourier transform; field-programmable gate array
I. INTRODUCTION
As the physical realization of a quantum computer has proven to
be extremely challenging [1], the implementation of a viable
large-scale quantum computer is still ongoing. Various technologies,
namely ion traps [2], nuclear magnetic resonance [3], and
superconductors [4], have been attempted for the construction
of quantum computers. Nevertheless, only a few small-scale
quantum computations have been implemented successfully.
Instead of focusing on the realization of quantum gates, a
different approach known as quantum annealing, which solves
optimization problems by finding the minimum point, is used
in the 128-qubit D-Wave One and 512-qubit D-Wave Two
systems [5]. Yet D-Wave systems are currently too costly to
be prevalent, and a more affordable platform such as a
field-programmable gate array (FPGA) is preferable.
As the strength of quantum computation relies on the
parallelism provided by quantum superposition, simulating
quantum algorithms on a classical computer with sequential
behaviour is inadequate. Since the state space of a quantum system
grows exponentially with the number of quantum bits (qubits),
simulation of a modestly sized quantum circuit might take hours
or even several days [6]. To mimic the parallel nature
of quantum operations, hardware emulation on an FPGA
platform is vital.
As the key challenge in simulating and emulating quantum
computations is the exponential growth of resource utilization,
an optimum hardware architecture is crucial for emulating
practical quantum circuits. Although precision error
due to the limitations of a classical platform in expressing a qubit
is inevitable, it can be minimized either by applying an error
correction model or by selecting an appropriate qubit representation
format [7]. In this work, the efficiencies of different
hardware architectures and of qubit representations with varying
precisions are studied for the purpose of quantum hardware
emulation.
This paper presents a case study of FPGA-based quantum
circuit emulation on quantum Fourier transform (QFT). QFT
finds application in several critical quantum algorithms such as
phase estimation, order finding, integer factorization, discrete
logarithm and hidden subgroup problem which offer signif-
icant speed-up over the classical approaches [8]. Although
classical Fourier transform involves complex computations, its
quantum counterpart QFT is straightforward and can be easily
mapped into simple quantum circuit. Hence, QFT is suitable
to be used as an entry-level case study for quantum circuit
emulation.
The rest of the paper is organized as follows. Theoretical
background is given in Section II. Section III explains the
theory and application of QFT algorithm. Related works are
discussed in Section IV. Section V presents the proposed
quantum circuit emulator, followed by experimental results
and analysis in Section VI. Lastly, conclusion is in Section VII.
II. THEORETICAL BACKGROUND
A. Quantum Bit (Qubit)
A quantum bit, or qubit, is a unit of information describing
a two-dimensional quantum system. In the quantum world, a qubit
can be in a superposition of both state |0〉 and state |1〉. A
two-by-one matrix with complex entries can be used to
represent a qubit; the basis states |0〉 and |1〉 are written as in (1).

$$|0\rangle = \begin{bmatrix} 1 \\ 0 \end{bmatrix}, \qquad |1\rangle = \begin{bmatrix} 0 \\ 1 \end{bmatrix} \tag{1}$$

A generic qubit can be written as in (2), where
$|c_0|^2 + |c_1|^2 = 1$. $|c_0|^2$ is interpreted as the probability
of the qubit being found in state |0〉 after measurement, whereas
$|c_1|^2$ is interpreted as the probability of the qubit being
found in state |1〉. Whenever a measurement is performed on a
qubit, it collapses to a classical bit.

$$|\psi\rangle = c_0|0\rangle + c_1|1\rangle \tag{2}$$
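The qubit description in (2) can be illustrated with a short Python/NumPy sketch. It is not part of the emulator in this paper; the helper names `make_qubit` and `measurement_probabilities` are hypothetical and are used only to show the normalization condition and the measurement probabilities.

```python
import numpy as np

# Basis states |0> and |1> as two-by-one complex vectors.
ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)


def make_qubit(c0, c1):
    """Return the normalized state c0|0> + c1|1> (hypothetical helper)."""
    state = c0 * ket0 + c1 * ket1
    norm = np.linalg.norm(state)
    if norm == 0:
        raise ValueError("amplitudes must not both be zero")
    return state / norm


def measurement_probabilities(state):
    """Return (|c0|^2, |c1|^2), the probabilities of observing |0> and |1>."""
    return np.abs(state) ** 2


# Equal superposition with a relative phase of i between the amplitudes.
psi = make_qubit(1, 1j)
p0, p1 = measurement_probabilities(psi)
```

The relative phase does not change the measurement probabilities here: both outcomes remain equally likely, and the two probabilities sum to one.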
978-1-4799-4833-8/14/$31.00 ©2014 IEEE
Authorized licensed use limited to: UNIVERSITY TEKNOLOGI MALAYSIA. Downloaded on August 22,2021 at 08:47:06 UTC from IEEE Xplore.  Restrictions apply. 
B. Quantum Circuit Model
The quantum circuit model proposed in [9] is one of the most
mature frameworks for representing the evolution of a quantum
system. Quantum dynamics (transformations) are mapped to
quantum gate operations, which can be represented by unitary
matrices. The quantum circuit model for QFT consists of
Hadamard gates, controlled phase-shift gates and SWAP gates.
1) Hadamard Gate: The Hadamard gate, H, is one of the most
useful single-qubit gates; it maps computational basis
states into superpositions of basis states. The operation of
the Hadamard gate can be expressed in matrix form as shown
in (3). A Hadamard gate operation results in a superposition of
basis states with equal probability and enables parallel computation.

$$H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \tag{3}$$

2) Controlled Phase-Shift Gate: The controlled phase-shift gate
is a 2-qubit gate made up of one control qubit and one
input qubit. If the control qubit is true, a phase-shift operation
is performed on the input qubit; otherwise no operation
is executed. The matrix representation of the controlled phase-shift
gate is shown in (4), where $\theta$ is the phase-shift angle.

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & e^{i\theta} \end{bmatrix} \tag{4}$$

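3) SWAP Gate: The quantum SWAP gate performs a simple
operation that switches the amplitudes of a quantum state.
Its matrix representation is given in (5).

$$\mathrm{SWAP} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{5}$$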
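As a quick check of the three gate definitions, the following Python/NumPy sketch builds the Hadamard, controlled phase-shift and SWAP matrices and confirms that each is unitary. The function names `controlled_phase` and `is_unitary` are illustrative assumptions, as is writing the phase-shift angle as `theta`.

```python
import numpy as np

# Hadamard gate (single qubit).
H = np.array([[1, 1],
              [1, -1]], dtype=complex) / np.sqrt(2)


def controlled_phase(theta):
    """Controlled phase-shift gate: only the |11> amplitude picks up e^{i*theta}."""
    return np.diag([1, 1, 1, np.exp(1j * theta)])


# SWAP gate: exchanges the |01> and |10> amplitudes.
SWAP = np.array([[1, 0, 0, 0],
                 [0, 0, 1, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1]], dtype=complex)


def is_unitary(U):
    """True if U^dagger U equals the identity (within floating-point tolerance)."""
    return np.allclose(U.conj().T @ U, np.eye(U.shape[0]))


# H maps |0> to an equal superposition of |0> and |1>.
plus = H @ np.array([1, 0], dtype=complex)

# SWAP maps |01> (index 1) to |10> (index 2).
swapped = SWAP @ np.array([0, 1, 0, 0], dtype=complex)
```

Because every gate is unitary, any product of these gates is unitary as well, which is what allows a circuit built from them to represent a valid quantum transformation.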
C. Tensor Product (Kronecker Product)
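The tensor product, or Kronecker product, is a method for
combining one quantum system with another [1]. It is an important
operation that allows multi-qubit gates to act on a
quantum system in a superposition of basis states. An example
for two single-qubit state vectors is shown in (6).

$$\begin{bmatrix} a_0 \\ a_1 \end{bmatrix} \otimes \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} = \begin{bmatrix} a_0 b_0 \\ a_0 b_1 \\ a_1 b_0 \\ a_1 b_1 \end{bmatrix} \tag{6}$$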
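The tensor product can be tried out directly with NumPy's `np.kron`. The sketch below is only an illustration under the assumption that the first qubit is the most significant one; it combines two single-qubit states and embeds a single-qubit gate into a two-qubit system.

```python
import numpy as np

ket0 = np.array([1, 0], dtype=complex)
ket1 = np.array([0, 1], dtype=complex)
H = np.array([[1, 1],
              [1, -1]], dtype=complex) / np.sqrt(2)
I2 = np.eye(2, dtype=complex)

# Combine two single-qubit states into the two-qubit state |01>.
ket01 = np.kron(ket0, ket1)

# Embed H on the first qubit of a two-qubit system: H tensor I.
H_on_first = np.kron(H, I2)

# Applying it to |01> gives (|01> + |11>) / sqrt(2).
result = H_on_first @ ket01
```

The state vector doubles in length with every added qubit, and a gate matrix quadruples in size, which is the exponential resource growth that the emulator has to cope with.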



















III. QUANTUM FOURIER TRANSFORM (QFT)
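The discrete Fourier transform (DFT) is a linear transformation
that can be defined in matrix form as described in (7), where
$\omega$ is the $2^n$-th root of unity, i.e. $\omega = e^{2\pi i/2^n}$. Coincidentally,
the DFT matrix is a unitary matrix and can therefore be
implemented as a quantum operation.

$$\mathrm{DFT} = \frac{1}{\sqrt{2^n}} \begin{bmatrix} 1 & 1 & 1 & \cdots & 1 \\ 1 & \omega^{1} & \omega^{2} & \cdots & \omega^{2^n-1} \\ 1 & \omega^{2} & \omega^{4} & \cdots & \omega^{2(2^n-1)} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & \omega^{2^n-1} & \omega^{2(2^n-1)} & \cdots & \omega^{(2^n-1)(2^n-1)} \end{bmatrix} \tag{7}$$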
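The matrix in (7) can be generated numerically to confirm that it is unitary. The following Python/NumPy sketch assumes the $1/\sqrt{2^n}$ normalization and the positive exponent $\omega = e^{2\pi i/2^n}$; the function name `dft_matrix` is hypothetical. With this sign convention the matrix corresponds to NumPy's inverse FFT with orthonormal scaling rather than to `np.fft.fft`.

```python
import numpy as np


def dft_matrix(n):
    """Return the 2^n x 2^n matrix with entries omega^(j*k) / sqrt(2^n)."""
    size = 2 ** n
    omega = np.exp(2j * np.pi / size)
    j, k = np.meshgrid(np.arange(size), np.arange(size), indexing="ij")
    return omega ** (j * k) / np.sqrt(size)


# 2-qubit case: a 4 x 4 matrix with omega = i.
F = dft_matrix(2)

# Unitarity check: F^dagger F should be the identity.
unitary = np.allclose(F.conj().T @ F, np.eye(4))

# Same convention as NumPy's inverse FFT with orthonormal scaling.
x = np.array([1, 2, 3, 4], dtype=complex)
matches_ifft = np.allclose(F @ x, np.fft.ifft(x, norm="ortho"))
```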
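For the Fourier transform in the quantum domain, discrete signal
samples are encoded as the amplitude sequence of a quantum
superposition of basis states [10]. For example, in Shor's
factoring algorithm [11], QFT is applied to the superposition
output of the preceding step to extract its periodicity. The
QFT operation, which transforms an arbitrary superposition of
basis states as in (8), is given in (9).

$$|\psi\rangle = \sum_{j=0}^{2^n-1} f(j\Delta t)\,|j\rangle \tag{8}$$

$$\mathrm{QFT}\,|\psi\rangle = \sum_{k=0}^{2^n-1} \left( \frac{1}{\sqrt{2^n}} \sum_{j=0}^{2^n-1} f(j\Delta t)\,\omega^{jk} \right) |k\rangle \tag{9}$$
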
As a requirement for a valid quantum state, |ψ〉 must be
normalized such that $\sum_{j=0}^{2^n-1} |f(j\Delta t)|^2 = 1$. If the signal inputs
do not fulfil this requirement naturally, the amplitudes can be
normalized by dividing each of them by $\sqrt{\sum_{j=0}^{2^n-1} |f(j\Delta t)|^2}$.
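The normalization step can be sketched in Python/NumPy as follows. The helper names `encode_samples` and `qft` are hypothetical, and the transform is computed with `np.fft.ifft(..., norm="ortho")` on the assumption that the QFT uses the positive exponent and the $1/\sqrt{2^n}$ scaling of (7); this is a software reference, not the hardware datapath.

```python
import numpy as np


def encode_samples(samples):
    """Scale signal samples so that the squared magnitudes sum to one."""
    amplitudes = np.asarray(samples, dtype=complex)
    norm = np.sqrt(np.sum(np.abs(amplitudes) ** 2))
    if norm == 0:
        raise ValueError("at least one sample must be non-zero")
    return amplitudes / norm


def qft(state):
    """Reference QFT of a state vector (positive exponent, 1/sqrt(N) scaling)."""
    return np.fft.ifft(state, norm="ortho")


# Four samples fit into the amplitudes of a 2-qubit state.
psi = encode_samples([1, 2, 3, 4])
transformed = qft(psi)
```

Since the transform is unitary, the output is again a valid quantum state: its squared magnitudes still sum to one.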
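With some algebraic manipulation, the QFT operation in (9) can be
rewritten in the product form (10), where $j_1 j_2 \ldots j_n$ is the
binary representation of $j$ and
$0.j_l j_{l+1} \ldots j_n = j_l/2 + j_{l+1}/4 + \cdots + j_n/2^{n-l+1}$
denotes a binary fraction. Based on (10), the QFT algorithm
can be effectively mapped to the quantum circuit model as
depicted in Fig. 1.

$$|j_1 j_2 \ldots j_n\rangle \rightarrow \frac{1}{2^{n/2}} \left(|0\rangle + e^{2\pi i\,0.j_n}|1\rangle\right)\left(|0\rangle + e^{2\pi i\,0.j_{n-1} j_n}|1\rangle\right) \cdots \left(|0\rangle + e^{2\pi i\,0.j_1 j_2 \ldots j_n}|1\rangle\right) \tag{10}$$

Fig. 1: QFT circuit model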
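The mapping from the product form to a gate-level circuit can be checked in software. The sketch below assembles an n-qubit QFT from Hadamard, controlled phase-shift and SWAP operations in the textbook gate order and compares the result with the matrix in (7). It assumes that qubit 0 is the most significant qubit ($j_1$) and that the controlled phase-shift angles are $2\pi/2^k$. All function names are hypothetical, and the code is not the SystemVerilog design evaluated in this paper.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)


def bit(x, q, n):
    """Value of qubit q (0 = most significant) in basis index x."""
    return (x >> (n - 1 - q)) & 1


def hadamard_op(target, n):
    """Embed H on qubit `target` into the 2^n-dimensional space."""
    full = np.eye(1, dtype=complex)
    for q in range(n):
        full = np.kron(full, H if q == target else np.eye(2))
    return full


def controlled_phase_op(control, target, theta, n):
    """Diagonal operator: phase e^{i*theta} where both qubits are 1."""
    diag = np.ones(2 ** n, dtype=complex)
    for x in range(2 ** n):
        if bit(x, control, n) and bit(x, target, n):
            diag[x] = np.exp(1j * theta)
    return np.diag(diag)


def swap_op(a, b, n):
    """Permutation operator that exchanges qubits a and b."""
    size = 2 ** n
    perm = np.zeros((size, size), dtype=complex)
    for x in range(size):
        y = x
        if bit(x, a, n) != bit(x, b, n):
            y = x ^ (1 << (n - 1 - a)) ^ (1 << (n - 1 - b))
        perm[y, x] = 1
    return perm


def qft_circuit(n):
    """Multiply the gate operators in circuit order to get the QFT unitary."""
    U = np.eye(2 ** n, dtype=complex)
    for i in range(n):
        U = hadamard_op(i, n) @ U
        for k in range(2, n - i + 1):
            U = controlled_phase_op(i + k - 1, i, 2 * np.pi / 2 ** k, n) @ U
    for i in range(n // 2):  # final SWAPs reverse the qubit order
        U = swap_op(i, n - 1 - i, n) @ U
    return U


def dft_matrix(n):
    """Reference matrix with entries omega^(j*k) / sqrt(2^n)."""
    size = 2 ** n
    idx = np.arange(size)
    return np.exp(2j * np.pi * np.outer(idx, idx) / size) / np.sqrt(size)


agrees = all(np.allclose(qft_circuit(n), dft_matrix(n)) for n in (1, 2, 3, 4))
```

A comparison of this kind is the software analogue of checking the emulated unitary transformation against the Fourier transform matrix.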
IV. RELATED WORK
In 2004, a quantum circuit emulator based on a pipeline
architecture was proposed in [7]. An expander circuit, a
quantum error model, and a probabilistic measurement module
were designed as parts of the emulator. The hardware
emulator was implemented on an FPGA and tested on the QFT and
Grover's search algorithms. Although the pipeline design provides
high throughput, it consumes as many resources as a parallel
implementation, plus additional pipeline registers. This
highly restricts the size of the quantum circuit that can be
supported by FPGA-based hardware emulation. Furthermore,
modeling quantum error and probabilistic computation
outputs in hardware emulation greatly increases
the design complexity.
2014 International Symposium on Integrated Circuits (ISIC) 513
On the other hand, [12] proposes to emulate quantum gates
based on their arithmetic operations. Basic quantum gates are
categorized into the HRC group (Hadamard, phase-shift, and
controlled phase-shift gates) and the XYZC group (X, Y, Z and
CNOT gates). QFT and several classical logic circuits are used as
test cases for the designed emulator. However, the proposed
technique is limited by the available quantum gates and can
hardly be extended to the emulation of practical quantum
circuits.
Similar to [7], [13] emulates the QFT circuit based on a pipeline
design technique. The QFT circuits are constructed from
individual quantum gates, and case studies of 2-qubit up to
16-qubit QFT are conducted. However, to fully exploit
the advantage of QFT, the input states should be expanded into
superpositions of basis states instead of being restricted to
computational basis states as in [13].
V. QUANTUM CIRCUIT EMULATOR
As defined in (2), a qubit state vector can be represented
by two complex floating-point numbers, one for state |0〉
and one for state |1〉. For emulation purposes, the floating-point
numbers are replaced by fixed-point representations (as
depicted in Fig. 2) to ensure effective resource utilization.
Precision error can be reduced by increasing the number of
mantissa (fractional) bits, at the cost of additional logic resources.
Fig. 2: Fixed point representation
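The effect of the word length on precision can be illustrated with a small Python/NumPy sketch of a fixed-point format with 1 sign bit, 1 integer bit and N fractional bits, as listed in the header of Table I. Round-to-nearest and saturation are assumptions here, since the rounding mode is not stated, and the function names `to_fixed` and `quantize_complex` are hypothetical. The sketch does not attempt to reproduce the precision error metric reported in Table I.

```python
import numpy as np


def to_fixed(value, total_bits):
    """Quantize a real value to 1 sign + 1 integer + (total_bits - 2) fractional bits.

    Assumes round-to-nearest and saturation at the ends of the range.
    """
    frac_bits = total_bits - 2
    scale = 1 << frac_bits
    raw = np.round(np.asarray(value, dtype=float) * scale)
    raw = np.clip(raw, -2 * scale, 2 * scale - 1)  # two's complement range [-2, 2)
    return raw / scale


def quantize_complex(z, total_bits):
    """Quantize the real and imaginary parts of an amplitude separately."""
    z = np.asarray(z, dtype=complex)
    return to_fixed(z.real, total_bits) + 1j * to_fixed(z.imag, total_bits)


# The amplitude 1/sqrt(2) produced by a Hadamard gate is not exactly representable.
exact = 1 / np.sqrt(2)
err16 = abs(to_fixed(exact, 16) - exact)
err24 = abs(to_fixed(exact, 24) - exact)
```

With round-to-nearest, the quantization error of a single value is bounded by half of the least significant bit, so each additional fractional bit halves the worst-case error of one stored amplitude.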
Efficient hardware architecture for quantum circuit emula-
tion is crucial. By using QFT as case study, the efficiencies
of different architectures for FPGA emulation are studied.
Fig. 3 illustrates the concurrent design of a 2-qubit QFT.
Concurrent technique allows parallelism and effective resource
















Fig. 3: Concurrent architecture for 2-qubit QFT
On the other hand, the pipeline design (as shown in Fig. 4)
is capable of producing high throughput with low critical path
delay (CPD) by inserting pipeline registers after each stage of
unitary transformation. However, the number of registers required
grows exponentially with the number of qubits. The quantum systems
that can be emulated by a pipelined circuit emulator are therefore
strictly constrained by the available resources of the targeted FPGA
platform. Furthermore, high throughput is not critical for
















Fig. 4: Pipeline architecture for 2-qubit QFT
An alternative hardware design technique is the serial
architecture. Although a serial design may require multiple
iterations to complete one computation, it opens up the opportunity
for resource sharing. Typically, a serial architecture is selected
when resource utilization is the critical design consideration. With
this design approach, resources such as registers
and hardware multipliers can be shared while maintaining
a reasonable CPD.
As classical modeling of a quantum system suffers from
huge resource requirements, the serial design is particularly
suitable for quantum hardware emulation. Fig. 5 shows the
design of a serial 2-qubit QFT circuit that consists of a
control unit (CU) and a datapath unit (DU). To the best
of our knowledge, this is the first proposal in literature to



































Fig. 5: Serial architecture for 2-qubit QFT
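The idea of resource sharing in a serial design can be sketched in software. In the Python/NumPy example below, a single 2x2 multiply routine (`butterfly`) plays the role of a shared datapath, and the loop in `qft_serial` acts as a simple schedule that applies one gate at a time to the state vector in place. This is only an analogy under assumed names and an assumed gate order; it is not the CU/DU design of Fig. 5.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)


def phase_gate(theta):
    """Single-qubit phase shift, used as the target part of a controlled gate."""
    return np.array([[1, 0], [0, np.exp(1j * theta)]], dtype=complex)


def butterfly(gate, a0, a1):
    """Shared datapath: one 2x2 complex matrix-vector product."""
    return gate[0, 0] * a0 + gate[0, 1] * a1, gate[1, 0] * a0 + gate[1, 1] * a1


def apply_gate_serial(state, gate, target, n, control=None):
    """Apply a (optionally controlled) single-qubit gate, one amplitude pair per step."""
    t_mask = 1 << (n - 1 - target)
    c_mask = 0 if control is None else 1 << (n - 1 - control)
    for x in range(2 ** n):
        if x & t_mask:
            continue  # each pair is visited once, via its target-bit-0 index
        if c_mask and not (x & c_mask):
            continue  # controlled gate: skip pairs whose control bit is 0
        y = x | t_mask
        state[x], state[y] = butterfly(gate, state[x], state[y])


def swap_serial(state, a, b, n):
    """Exchange qubits a and b by swapping the affected amplitudes."""
    a_mask, b_mask = 1 << (n - 1 - a), 1 << (n - 1 - b)
    for x in range(2 ** n):
        if (x & a_mask) and not (x & b_mask):
            y = x ^ a_mask ^ b_mask
            state[x], state[y] = state[y], state[x]


def qft_serial(state):
    """Run the QFT gate sequence one gate at a time on a copy of `state`."""
    state = np.array(state, dtype=complex)
    n = len(state).bit_length() - 1
    if len(state) != 1 << n:
        raise ValueError("state length must be a power of two")
    for i in range(n):
        apply_gate_serial(state, H, i, n)
        for k in range(2, n - i + 1):
            apply_gate_serial(state, phase_gate(2 * np.pi / 2 ** k), i, n,
                              control=i + k - 1)
    for i in range(n // 2):
        swap_serial(state, i, n - 1 - i, n)
    return state


x = np.arange(1, 9, dtype=complex)
x /= np.linalg.norm(x)
serial_ok = np.allclose(qft_serial(x), np.fft.ifft(x, norm="ortho"))
```

Only one state vector and one multiply routine are needed regardless of the number of gates, at the price of more iterations per computation, which mirrors the trade-off between resource utilization and latency described above.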
VI. RESULTS AND ANALYSIS
A. Experimental Setup
The quantum circuits discussed in this paper are designed
in the SystemVerilog hardware description language and
targeted at the Altera Stratix IV EP4SGX530KF43C4 FPGA. The
designed QFT circuits are tested using testbenches and
verified against a golden reference model implemented in C. In
addition, the QFT unitary transformations
are compared with the corresponding Fourier transform matrix
(as described in (7)) to further confirm the correctness of the
QFT emulation model.
To identify an optimum architecture
that is suitable for practical quantum circuit emulation, two
main experiments are carried out in this work. The first
identifies the fixed-point representation with tolerable precision
error and reasonable resource utilization. The second
investigates the efficiencies of quantum circuit
emulations (based on the QFT case study) using concurrent,
pipeline and serial hardware design techniques.

TABLE I: Resource utilization and precision error of different fixed point representations for concurrent QFT circuits
(Total bits = 1 sign bit + 1 decimal bit + N mantissa bits)

2-qubit QFT
Total bits                     16      18      20      22      24      26
Combinational ALUTs           288     322     356     390     424     458
Dedicated logic registers     256     288     320     352     384     416
DSP block 18-bit elements      32      64      64      64      64      64
Precision error (%)         0.012   0.005   0.001   0.000   0.000   0.000

5-qubit QFT
Total bits                     16      18      20      22      24      26
Combinational ALUTs         16448   39824   81504   96128  106384  128080
Dedicated logic registers    2048    2304    2560    2816    3072    3328
DSP block 18-bit elements     736    1024    1024    1024    1024    1024
Precision error (%)         0.383   0.089   0.030   0.007   0.003   0.001

Fig. 6: Resource utilization and operating frequency of QFT
circuit emulations (24-bit fixed point representation) based on
different hardware architectures. Panels include (b) dedicated
logic registers and (d) maximum operating frequency.
B. Result and Analysis
In comparison with the concurrent and pipeline designs, it can
be observed from Fig. 6 that the serial design balances
resource utilization and operating frequency. The use of
dedicated logic registers in the serial architecture is notably lower,
yet a reasonable operating frequency is maintained. The usage
of DSP blocks and logic elements can be further reduced by
reusing the hardware resources in the serial design.
Experimental results in Table I show that the 16-bit fixed-point
representation incurs the largest precision error for both the
2-qubit (0.012%) and 5-qubit (0.383%) QFT emulations. By expanding
the number of mantissa bits to 22 (24 total bits), the precision error of
the 2-qubit QFT is brought down to 0.000% at the reported precision,
and that of the 5-qubit QFT falls to 0.003%. In
terms of resource utilization, increasing the number of bits
causes growth in resource utilization; for the 5-qubit QFT,
combinational ALUTs rise from 16448 at 16 bits to 106384 at 24 bits.
Based on the conducted experiments, it can be concluded that the
24-bit fixed-point representation provides sufficient computation
accuracy for both 2-qubit and 5-qubit QFT emulations.
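The figures in Table I can be examined with a few lines of Python. The sketch below uses only values copied from the table. It observes that the dedicated logic register counts of the concurrent design equal 4 x 2^n x B for n qubits and B total bits, and it looks up the narrowest word length that meets a given error bound. Reading the factor 4 as two register banks, each holding a real and an imaginary part per amplitude, is an assumption rather than something stated in the text, and the 0.005% threshold is a hypothetical example.

```python
# Values copied from Table I (concurrent QFT circuits).
TOTAL_BITS = [16, 18, 20, 22, 24, 26]
REGISTERS = {
    2: [256, 288, 320, 352, 384, 416],
    5: [2048, 2304, 2560, 2816, 3072, 3328],
}
PRECISION_ERROR = {
    2: [0.012, 0.005, 0.001, 0.000, 0.000, 0.000],
    5: [0.383, 0.089, 0.030, 0.007, 0.003, 0.001],
}


def register_estimate(n_qubits, total_bits):
    """Assumed model: 2 banks x 2^n amplitudes x (real, imaginary) x word length."""
    return 2 * (2 ** n_qubits) * 2 * total_bits


def narrowest_width(n_qubits, max_error_percent):
    """Smallest tabulated word length whose precision error meets the bound."""
    for bits, err in zip(TOTAL_BITS, PRECISION_ERROR[n_qubits]):
        if err <= max_error_percent:
            return bits
    return None


# Check the assumed register model against every table entry.
model_matches = all(
    register_estimate(n, bits) == count
    for n, counts in REGISTERS.items()
    for bits, count in zip(TOTAL_BITS, counts)
)

# Hypothetical accuracy target of 0.005 % precision error.
width_2q = narrowest_width(2, 0.005)
width_5q = narrowest_width(5, 0.005)
```

Under this example threshold the 5-qubit circuit is the limiting case and requires 24 bits, which agrees with the word length selected above.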
VII. CONCLUSIONS AND FUTURE WORK
An optimum hardware architecture is critical to enable hardware
emulation of practical quantum applications. QFT is presented
in this paper as a case study to evaluate the efficiencies
of different qubit representations and hardware architectures.
Based on the experimental results, a 24-bit fixed-point
representation and a serial architecture achieve optimal computation
accuracy and resource utilization in QFT circuit emulation.
As future work, the presented quantum circuit emulator will
be further tested on the emulation of real-world quantum
applications such as quantum computational intelligence.
ACKNOWLEDGEMENT
This work is supported by the Ministry of Higher Education
(MOHE) and Universiti Teknologi Malaysia (UTM) under
Fundamental Research Grant Scheme (FRGS) Vote No. 4F327.
REFERENCES
[1] N. S. Yanofsky and M. A. Mannucci, Quantum computing for computer
scientists. Cambridge University Press Cambridge, 2008, vol. 20.
[2] C. Monroe, D. Meekhof, B. King, W. Itano, and D. Wineland, “Demon-
stration of a fundamental quantum logic gate,” Physical Review Letters,
vol. 75, no. 25, p. 4714, 1995.
[3] N. A. Gershenfeld and I. L. Chuang, “Bulk spin-resonance quantum
computation,” Science, vol. 275, no. 5298, pp. 350–356, 1997.
[4] J. Mooij, T. Orlando, L. Levitov, L. Tian, C. H. Van der Wal, and
S. Lloyd, “Josephson persistent-current qubit,” Science, vol. 285, no.
5430, pp. 1036–1039, 1999.
[5] M. H. Amin, N. G. Dickson, and P. Smith, “Adiabatic quantum
optimization with qudits,” Quantum Information Processing, vol. 12, no. 4,
pp. 1819–1829, 2013.
[6] M. A. Perkowski, “Multiple-valued quantum circuits and research chal-
lenges for logic design and computational intelligence communities,”
IEEE Connections, vol. 3, no. 4, pp. 6–12, 2005.
[7] A. U. Khalid, Z. Zilic, and K. Radecka, “FPGA emulation of quantum
circuits,” in IEEE International Conference on Computer Design: VLSI
in Computers and Processors. ICCD 2004. IEEE, 2004, pp. 310–315.
[8] M. A. Nielsen and I. L. Chuang, Quantum computation and quantum
information. Cambridge University Press, 2010.
[9] A. Barenco, D. Deutsch, A. Ekert, and R. Jozsa, “Conditional quantum
dynamics and logic gates,” Physical Review Letters, vol. 74, no. 20, p.
4083, 1995.
[10] C. P. Williams and S. H. Clearwater, Explorations in quantum comput-
ing. Springer, 1998.
[11] P. W. Shor, “Algorithms for quantum computation: discrete logarithms
and factoring,” in 35th Annual Symposium on Foundations of Computer
Science, 1994 Proceedings. IEEE, 1994, pp. 124–134.
[12] M. Aminian, M. Saeedi, M. S. Zamani, and M. Sedighi, “FPGA-based
circuit model emulation of quantum algorithms,” in IEEE Computer
Society Annual Symposium on VLSI. ISVLSI’08. IEEE, 2008, pp. 399–
404.
[13] J. F. Rivera-Miranda, A. Caicedo-Beltrán, J. D. Valencia-Payán, J. M.
Espinosa-Duran, and J. Velasco-Medina, “Hardware emulation of
quantum Fourier transform,” in IEEE Second Latin American Symposium on
Circuits and Systems (LASCAS), 2011. IEEE, 2011, pp. 1–4.
