On Actual Preparation of Dicke State on a Quantum Computer by Mukherjee, Chandra Sekhar et al.
On Actual Preparation of Dicke State on a Quantum Computer
Chandra Sekhar Mukherjee ∗‡, Subhamoy Maitra ∗§, Vineet Gaurav †¶ and Dibyendu Roy ∗‖
∗ Indian Statistical Institute, Kolkata, † Indian Institute of Science Education and Research, Mohali
Abstract: The exact number of CNOT and single-qubit gates needed to implement a quantum algorithm on a given architecture is one of the central problems of quantum computation. In this work we study the importance of concise realizations of partially defined unitary transformations for better circuit construction, using Dicke state preparation as a case study. The Dicke states ($|D^n_k\rangle$) are an important class of entangled states with uses in many branches of quantum information. In this regard we provide the most efficient deterministic Dicke state preparation circuit, in terms of CNOT and single-qubit gate counts, in comparison to the existing literature. We further observe that our improvements also reduce the architectural constraints of the circuits. We implement the circuit for preparing $|D^4_2\rangle$ on the “ibmqx2” machine of the IBM QX service and observe that the error induced by noise in the system is lower than for the existing circuit descriptions. We conclude by describing the CNOT map of the generic $|D^n_k\rangle$ preparation circuit and by analyzing different ways of distributing the CNOT gates in the circuit and their effect on the induced error.
Index Terms: Quantum Computing, Quantum Circuit, Dicke States, IBMQ, CNOT, Noisy Computation.
I. INTRODUCTION
Quantum computation is one of the most fundamental applications of quantum mechanics. Quantum computers enable quantum algorithms that achieve speed-ups over the best known classical algorithms that are exponential, and in some cases even super-exponential, in time. Any quantum algorithm can be described as a series of unitary transformations and implemented as a quantum circuit. A set of gates whose combinations can express any unitary transformation to any desired accuracy is called a universal set of gates. We know from the fundamental work of Barenco et al. [1] that single-qubit gates together with the controlled NOT (CNOT) gate form a universal set of gates. We call these gates elementary gates.
Quantum state preparation is a topic within quantum computation that has garnered interest over the past two decades owing to the applications of special quantum states in several fields of quantum information theory. An $n$-qubit quantum state $|\psi_n\rangle$ can be expressed as a superposition of $2^n$ orthonormal basis states. In this work we view $n$-qubit states as superpositions of the computational basis states $|x_1x_2\ldots x_n\rangle$, $x_i \in \{0,1\}$, $1 \le i \le n$. The basis states with non-zero amplitude in the expression of $|\psi_n\rangle$ are called the active basis states. Starting from the state $|0\rangle^{\otimes n}$, any quantum state can be prepared using $O(2^n)$ elementary gates, although for many $n$-qubit states there exist preparation circuits with a polynomial (in $n$) number of elementary gates.

‡ chandrasekhar.mukherjee07@gmail.com § subho@isical.ac.in ¶ vineet.gaurav1@gmail.com ‖ roydibyendu.rd@gmail.com

The family of Dicke states $|D^n_k\rangle$ is one such example. $|D^n_k\rangle$ is the $n$-qubit state that is the equal superposition of all $\binom{n}{k}$ basis states of weight $k$. For example, $|D^3_1\rangle = \frac{1}{\sqrt{3}}(|001\rangle + |010\rangle + |100\rangle)$. Dicke states are an interesting family because they have $\binom{n}{k}$ active basis states, which is exponential in $n$ when, for instance, $k = \lfloor n/2 \rfloor$, yet they need only a polynomial number of elementary gates to prepare. Dicke states also have applications in quantum game theory and quantum networking, among other areas. One can refer to [2] for a more in-depth view of these applications.
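As a concrete check of the definition, the following minimal Python sketch builds the $2^n$ amplitudes of $|D^n_k\rangle$. It uses only the standard library, and `dicke_state` is a hypothetical helper name, not code from [2]. It assumes the ordering $|q_1q_2\ldots q_n\rangle$ with $q_1$ as the most significant bit, matching the index convention of Section II.

```python
import math
from itertools import combinations


def dicke_state(n, k):
    """Return the 2**n real amplitudes of |D^n_k> (q1 is the most significant bit)."""
    amplitude = 1 / math.sqrt(math.comb(n, k))
    state = [0.0] * (2 ** n)
    # Each choice of k positions for the 1s is one weight-k basis state.
    for ones in combinations(range(n), k):
        index = sum(2 ** (n - 1 - pos) for pos in ones)
        state[index] = amplitude
    return state


d31 = dicke_state(3, 1)
active = [format(i, "03b") for i, amp in enumerate(d31) if amp != 0]
print(active)  # ['001', '010', '100'], each with amplitude 1/sqrt(3)
```

The number of non-zero entries is $\binom{n}{k}$, for example 20 for $|D^6_3\rangle$.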
Several probabilistic and deterministic Dicke state preparation algorithms have been designed in the last two decades [3], [4], [8]. In this paper we focus on the deterministic algorithm described by Bärtschi et al. [2], which uses $O(kn)$ CNOT gates and has $O(n)$ depth to prepare the state $|D^n_k\rangle$. To the best of our knowledge, this circuit description has the best gate count among the deterministic algorithms. It is important to note here that the paper by Cruz et al. [5] describes two algorithms for preparing the $|D^n_1\rangle$ states, also known as $W_n$ states. Both algorithms have a better gate count than the description by Bärtschi et al. [2], and one of them has logarithmic depth. However, their work is restricted to $|D^n_1\rangle$ and has no implication for the circuits for $|D^n_k\rangle$, $2 \le k \le n-2$. We further observe in Section IV that the circuit we obtain for $|D^n_1\rangle$ after our improvements is the same as the linear $W_n$ circuit described in [5].
Because of the noisy behavior of current-generation quantum computers, the exact number of elementary gates needed and the distribution of those gates over the circuit become crucial issues that must be optimized in order to prepare a state with high fidelity. A very recent example of work in this area is [7], which reduces the gate count of an AES implementation. In this regard we discuss the following important problems in the domain of quantum circuit design.
A unitary transformation acting on $n$ qubits can be expressed as a $2^n \times 2^n$ unitary matrix and can be decomposed into elementary gates in several ways. Finding the decomposition that needs the fewest elementary gates is therefore a fundamental problem; [6] and [9] are examples of work done in this area. Minimizing the number of gates in the decomposition of a unitary matrix is crucial because every gate introduces some error into the result. Reducing the number of CNOT gates is especially important, since CNOT gates are well known to introduce more error than single-qubit gates.
arXiv:2007.01681v1 [quant-ph] 3 Jul 2020
In this work we first describe a fundamental problem posed by the decomposition of a matrix into a universal set of gates. Suppose a unitary transformation is to be performed on a system of $n$ qubits. This task can be represented as a unitary matrix $U_n$ acting on the Hilbert space $\mathcal{H}_n$ of dimension $2^n$. If we know the intended transformation for all the states of some orthonormal basis of $\mathcal{H}_n$, the unitary matrix $U_n$ is completely defined. Consider such a transformation for $n = 1$. If the transformation is defined for both computational basis states $|0\rangle$ and $|1\rangle$, the corresponding unitary matrix is completely defined. If the transformation is defined as $|0\rangle \rightarrow \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ and $|1\rangle \rightarrow \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$, the corresponding matrix is the Hadamard matrix
$$H = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{bmatrix}.$$
However, if the transformation is defined only for one state, $|0\rangle \rightarrow \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$, and not for $|1\rangle$, then there are uncountably many unitary matrices that can perform it. Specifically, any matrix of the form
$$\begin{bmatrix} \frac{1}{\sqrt{2}} & \alpha \\ \frac{1}{\sqrt{2}} & -\alpha \end{bmatrix}, \qquad \alpha \in \mathbb{C},\ |\alpha|^2 = \frac{1}{2},$$
can perform this task.
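The following standard-library Python sketch uses helper names of our own, `candidate` and `is_unitary`. It checks numerically that every matrix of this form is unitary and sends $|0\rangle$ to $\frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$. The single defined column therefore does not pin down the matrix: the phase of $\alpha$ is free, and $\alpha = -\frac{1}{\sqrt{2}}$ recovers the Hadamard matrix.

```python
import cmath
import math


def candidate(alpha):
    """Candidate matrix [[1/sqrt2, alpha], [1/sqrt2, -alpha]] for |0> -> (|0> + |1>)/sqrt2."""
    r = 1 / math.sqrt(2)
    return [[r, alpha], [r, -alpha]]


def is_unitary(m, tol=1e-12):
    """Check M^dagger M = I entrywise."""
    size = len(m)
    for i in range(size):
        for j in range(size):
            entry = sum(m[k][i].conjugate() * m[k][j] for k in range(size))
            if abs(entry - (1 if i == j else 0)) > tol:
                return False
    return True


for phase in [0.0, 0.7, math.pi / 2, math.pi]:
    alpha = cmath.rect(1 / math.sqrt(2), phase)  # |alpha|^2 = 1/2, arbitrary phase
    u = candidate(alpha)
    assert is_unitary(u)
    # The first column (the image of |0>) is identical for every candidate.
    assert abs(u[0][0] - u[1][0]) < 1e-12
```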
There exist many quantum algorithms in which, at some step, a transformation on $n$ qubits is defined only for a subset of the states of an orthonormal basis. This opens the possibility of uncountably many unitary matrices being capable of performing the transformation. The algorithm described in [2] contains such transformations that are not defined for all basis states. We call such a transformation a partially defined unitary transformation on $n$ qubits. Since multiple unitary matrices can then perform the transformation, it becomes an important problem to find out which candidate unitary matrix can be decomposed using the minimal number of elementary gates.
Furthermore, the number of elementary gates needed to implement a well-defined quantum circuit also varies with the architecture of the actual quantum computer. The architectures of current-generation quantum computers do not allow CNOT gates to be implemented between arbitrary pairs of qubits. This CNOT constraint may further increase the total number of CNOT and single-qubit gates needed to implement a quantum circuit on a specific quantum architecture. Against this backdrop, we now outline the organization of the rest of the paper along with our contributions.
A. Organization and Contribution
In Section II we first describe the preliminaries needed to support our work. We define the concept of a maximally partial unitary transformation and then describe the circuit in [2] for preparing Dicke states. We denote the circuit described in [2] for preparing $|D^n_k\rangle$ by $C_{n,k}$.
We start Section III by showing that a transformation implemented in $C_{n,k}$ is in fact partially defined. We then show that the unitary matrix used to represent the transformation is not optimal in terms of the number of elementary gates needed to decompose it. We propose a different construction that requires fewer elementary gates, and we argue its optimality with respect to the universal gate set.
In Section IV we use this construction to improve the gate count of the circuit $C_{n,k}$ in a general manner. We remove redundant gates from the circuit and analyze the different partially defined transformations implemented in it to further reduce its gate counts. We denote the improved circuit for preparing a Dicke state $|D^n_k\rangle$ by $\hat{C}_{n,k}$. To the best of our knowledge, this is the most efficient implementation of a deterministic Dicke state preparation circuit for $|D^n_k\rangle$, $2 \le k \le n-2$.
Next, in Section V we discuss the architectural constraints posed by the current-generation quantum computers that are available for public use through different cloud services. We discuss the restrictions on implementing CNOT gates between two qubits in an architecture and how they increase the number of CNOT gates needed to implement a circuit. In this regard we show that the improvements described in Section IV not only reduce gate counts but also reduce architectural constraints.
We implement the circuits $C_{4,2}$ and $\hat{C}_{4,2}$ on the IBM QX machine “ibmqx2” [11] and, in each case, calculate the deviation from the ideal measurement statistics using a simple error measure. Next we show how two circuits with the same number of CNOT gates and the same architectural restrictions can lead to different expected errors because of different distributions of CNOT gates across the qubits. We analyze this by proposing modifications to the circuit $\hat{C}_{4,2}$, made possible by the partial nature of certain transformations, and by showing how these modifications reduce the expected number of erroneously functioning CNOT gates under a fairly general error model. We finish this section by drawing the general CNOT map of $\hat{C}_{n,k}$, shown as the graph $G_{n,k}$, and observing that there in fact exist $n-k-1$ independent modifications, each leading to a different CNOT distribution.
We conclude the paper in Section VI by describing future directions of work in this domain and noting open problems that we feel will improve our understanding of both partially defined transformations and architectural constraints.
II. PRELIMINARIES
We first fix some notation that we use frequently, before moving on to the definitions and preliminaries.
A. Notations
1) $|v_2\rangle$: For a system of $n$ qubits, the $2^n$ orthogonal states of the computational basis can be written as $|b_1b_2\ldots b_n\rangle$, $b_i \in \{0,1\}$, $1 \le i \le n$. We treat the state $|b_1b_2\ldots b_n\rangle$ as a binary string and express it as $|v_2\rangle$, where $v = \sum_{i=1}^{n} b_i 2^{n-i}$.
2) $R_y(\theta)$: The $R_y$ gate is the single-qubit gate $R_y(\theta) \equiv e^{-i\theta Y/2} = \begin{bmatrix} \cos(\frac{\theta}{2}) & -\sin(\frac{\theta}{2}) \\ \sin(\frac{\theta}{2}) & \cos(\frac{\theta}{2}) \end{bmatrix}$.
3) $X$: This is the single-qubit gate $X = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}$.
4) $CU^i_j$: When implementing a controlled unitary on a two-qubit subsystem of an $n$-qubit system, $CU^i_j$ denotes the two-qubit controlled unitary operation with the $i$-th qubit as the control qubit and the $j$-th qubit as the target qubit.
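The notation above can be mirrored in a few lines of Python. This is a minimal sketch with hypothetical helper names (`basis_index`, `ry`, `controlled`) using plain nested lists rather than any quantum-computing library. Like item 1), it treats $q_1$ as the most significant bit, and it numbers qubits from 1.

```python
import math


def basis_index(bits):
    """Map |b1 b2 ... bn> to v = sum_i b_i * 2**(n - i)."""
    n = len(bits)
    return sum(b * 2 ** (n - i) for i, b in enumerate(bits, start=1))


def ry(theta):
    """R_y(theta) = exp(-i theta Y / 2) as a real 2x2 matrix."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


X = [[0, 1], [1, 0]]


def controlled(u, control, target, n):
    """Matrix of CU^control_target on n qubits (qubits are 1-indexed)."""
    dim = 2 ** n
    m = [[0.0] * dim for _ in range(dim)]
    for col in range(dim):
        bits = [(col >> (n - i)) & 1 for i in range(1, n + 1)]
        if bits[control - 1] == 0:
            m[col][col] = 1.0  # control is |0>: basis state unchanged
            continue
        t = bits[target - 1]
        for new_t in (0, 1):  # control is |1>: apply u to the target qubit
            out = list(bits)
            out[target - 1] = new_t
            m[basis_index(out)][col] += u[new_t][t]
    return m


# CNOT^2_1 on two qubits exchanges the basis states |01> and |11>.
print(controlled(X, 2, 1, 2))
```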
B. Maximally Partial Unitary Transformation
Consider a unitary transformation acting on $n$ qubits. To perform this transformation we have to construct a corresponding unitary matrix. If the transformation is defined for all $2^n$ states of some orthonormal basis, the unitary matrix is completely defined. On the other hand, if the transformation is defined for only a single state of the computational basis, only a single column of the corresponding $2^n \times 2^n$ matrix is fixed. The remaining columns can be filled in conveniently, provided the matrix remains unitary. In this regard we call a unitary transformation on $n$ qubits maximally partial if it is defined for $2^n - 1$ states of some orthonormal basis. This implies that only one column of the matrix (written in that basis) is undefined; by unitarity, this column is determined up to a phase. In this paper we observe how a maximally partial unitary transformation can correspond to multiple unitary matrices, and how the minimal number of elementary gates needed to implement these matrices may vary.
We end this section by describing the structure of Dicke states and a circuit designed for their preparation.
C. The Dicke State Preparation Circuit $C_{n,k}$
The circuit $C_{n,k}$ described in [2] works on the $n$-qubit system $|q_1q_2\ldots q_n\rangle$. It is broken into $n-1$ blocks of the form $SCS^x_y$: the first $n-k$ blocks are of the form $SCS^{n-t}_k$, $n-t > k$, and they are followed by $k-1$ blocks of the form $SCS^i_{i-1}$, $k \ge i \ge 2$.
A block $SCS^n_k$ consists of one two-qubit transformation and $k-1$ three-qubit transformations. The two-qubit transformation works on the $(n-1)$-th and $n$-th qubits, and we denote it by $\mu_n$. We describe the overall structure of the circuit again in Section V. The three-qubit transformations are of the form $\mathcal{M}^l_n$, $n-k+1 \le l \le n-1$, where $\mathcal{M}^l_n$ works on the qubits $l-1$, $l$ and $n$. This construction is interesting because the transformations $\mu$ and $\mathcal{M}$ are partially defined. This raises different implementation choices, possibly with different numbers of gates needed for the decomposition into elementary gates. We now describe these two transformations for reference. We denote by $|ab\rangle_x$ the qubits in the $(x-1)$-th and $x$-th positions of the system.
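$$
\begin{aligned}
\mu_n:\ & |00\rangle_n \rightarrow |00\rangle_n\\
& |11\rangle_n \rightarrow |11\rangle_n\\
& |01\rangle_n \rightarrow \sqrt{\tfrac{1}{n}}\,|01\rangle_n + \sqrt{\tfrac{n-1}{n}}\,|10\rangle_n
\end{aligned}
$$
Fig. 1: Implementation of $\mu_n$: a CNOT gate, a controlled $R_y(2\cos^{-1}\sqrt{1/n})$ gate and a second CNOT gate.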
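$$
\begin{aligned}
\mathcal{M}^l_n:\ & |00\rangle_l|0\rangle_n \rightarrow |00\rangle_l|0\rangle_n\\
& |01\rangle_l|0\rangle_n \rightarrow |01\rangle_l|0\rangle_n\\
& |00\rangle_l|1\rangle_n \rightarrow |00\rangle_l|1\rangle_n\\
& |11\rangle_l|1\rangle_n \rightarrow |11\rangle_l|1\rangle_n\\
& |01\rangle_l|1\rangle_n \rightarrow \sqrt{\tfrac{n-l+1}{n}}\,|01\rangle_l|1\rangle_n + \sqrt{\tfrac{l-1}{n}}\,|11\rangle_l|0\rangle_n
\end{aligned}
$$
Fig. 2: Implementation of $\mathcal{M}^l_n$: a CNOT gate, a doubly controlled $R_y(2\cos^{-1}\sqrt{(n-l+1)/n})$ gate and a second CNOT gate.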
The implementations of these transformations in [2] are shown in Figures 1 and 2, respectively. The first transformation, $\mu_n$, is in fact a maximally partial unitary transformation. Because the transformations are only partially defined, the CRy and CCRy gates are not fed all possible inputs. Instead, the input to the CRy gate comes only from the subspace spanned by the computational basis states $|00\rangle$, $|10\rangle$ and $|01\rangle$. Similarly, the input to the CCRy gate comes only from the subspace spanned by the states $|000\rangle$, $|010\rangle$, $|001\rangle$, $|011\rangle$ and $|110\rangle$.
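To see the restricted input set concretely, the sketch below simulates $\mu_n$ as two CNOT gates around a controlled $R_y(\theta)$ with $\theta = 2\cos^{-1}\sqrt{1/n}$. The gate placement is our assumption: both CNOTs take $q_{n-1}$ as control and $q_n$ as target, and the $R_y$ acts on $q_{n-1}$ controlled by $q_n$. We chose this placement because it reproduces the definition of $\mu_n$. The function name `mu_circuit` is hypothetical, and only the Python standard library is used.

```python
import math


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def mu_circuit(n, a, b, cry_inputs=None):
    """Apply CNOT, controlled-R_y, CNOT to |a b> = |q_{n-1} q_n>.

    Returns {(q_{n-1}, q_n): amplitude}. If cry_inputs is a set, the basis
    states (q_{n-1}, q_n) fed to the controlled-R_y are recorded in it.
    """
    r = ry(2 * math.acos(math.sqrt(1 / n)))

    def cnot(state):  # control q_{n-1}, target q_n
        return {(x, y ^ x): amp for (x, y), amp in state.items()}

    def cry(state):  # control q_n, target q_{n-1}
        out = {}
        for (x, y), amp in state.items():
            if cry_inputs is not None:
                cry_inputs.add((x, y))
            if y == 0:
                out[(x, y)] = out.get((x, y), 0.0) + amp
            else:
                for new_x in (0, 1):
                    out[(new_x, y)] = out.get((new_x, y), 0.0) + r[new_x][x] * amp
        return out

    final = cnot(cry(cnot({(a, b): 1.0})))
    return {key: amp for key, amp in final.items() if abs(amp) > 1e-12}


seen = set()
for bits in [(0, 0), (1, 1), (0, 1)]:  # the inputs on which mu_n is defined
    print(bits, mu_circuit(4, *bits, cry_inputs=seen))
print(sorted(seen))
```

The recorded inputs to the controlled rotation are $(q_{n-1}, q_n) \in \{(0,0), (0,1), (1,0)\}$, so the state $|11\rangle$ never reaches it.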
In Section III we look at how partially defined transformations can be implemented more efficiently, and we argue the optimality of this improvement with respect to this particular building block. Then, in Section IV, we reduce the gate count of the circuit $C_{n,k}$ by removing redundancies and by analyzing how, in specific cases, the $\mu$ and $\mathcal{M}$ transformations act only on a subset of their defined computational basis states.
III. EXAMPLE OF OPTIMALITY FOR A MAXIMALLY PARTIAL UNITARY TRANSFORMATION
We have described the two partially defined unitary transformations used in the circuit $C_{n,k}$. The first transformation, $\mu_n$, is implemented using a controlled $R_y(\theta)$ gate with $\theta = 2\cos^{-1}(\sqrt{1/n})$ and two CNOT gates. This CRy gate only acts on the states $|00\rangle$, $|10\rangle$, $|01\rangle$ and their superpositions; it never acts on the state $|11\rangle$. We denote the transformation implemented by the $CR_y(\theta)$ gate on the defined basis states by $T_1(\theta)$. Here the first qubit is the target and the second qubit is the control:
$$
\begin{aligned}
T_1(\theta):\ & |00\rangle \rightarrow |00\rangle\\
& |10\rangle \rightarrow |10\rangle\\
& |01\rangle \rightarrow \left(\cos(\tfrac{\theta}{2})|0\rangle + \sin(\tfrac{\theta}{2})|1\rangle\right)|1\rangle
\end{aligned} \tag{1}
$$
This is in fact a maximally partial unitary transformation. While the gate $CR_y(\theta)$ can perform this transformation, it needs at least 4 elementary gates to implement. We first prove this requirement using an important result from [6, Theorem B], which we note down for reference.
Theorem 1. [6]
1) For a controlled gate $CU$, if $\mathrm{tr}(UX) = 0$, $\mathrm{tr}(U) \neq 0$, $\det U = 1$ and $U \neq \pm I$, then the minimal number of elementary gates needed to implement $CU$ is 4.
2) For a controlled gate $CU$, if $\mathrm{tr}(U) = 0$, $\det U = -1$ and $U \neq \pm X$, then the minimal number of elementary gates needed to implement $CU$ is 3.
3) For a controlled gate $CU$, the minimal number of elementary gates needed to implement $CU$ is less than 3 if and only if $U \in \{e^{i\phi}I, e^{i\phi}X, e^{i\phi}Z\}$, $0 \le \phi \le 2\pi$.
Our lemma follows immediately.
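Lemma 1. For $0 < \theta < \pi$ (which covers $\theta = 2\cos^{-1}(\sqrt{1/n})$ for every $n \ge 2$), it takes at least 4 elementary gates to implement the $CR_y(\theta)$ gate.
Proof. We calculate $\det R_y(\theta)$, $\mathrm{tr}(R_y(\theta)X)$ and $\mathrm{tr}(R_y(\theta))$ to confirm the minimal number of gates needed to decompose $CR_y(\theta)$.
$$\det R_y(\theta) = \cos^2(\tfrac{\theta}{2}) + \sin^2(\tfrac{\theta}{2}) = 1,$$
$$R_y(\theta)X = \begin{bmatrix} -\sin(\frac{\theta}{2}) & \cos(\frac{\theta}{2}) \\ \cos(\frac{\theta}{2}) & \sin(\frac{\theta}{2}) \end{bmatrix} \implies \mathrm{tr}(R_y(\theta)X) = 0.$$
Moreover, $\mathrm{tr}(R_y(\theta)) = 2\cos(\frac{\theta}{2})$ lies strictly between 0 and 2 for $0 < \theta < \pi$, so $\mathrm{tr}(R_y(\theta)) \neq 0$ and $R_y(\theta) \neq \pm I$. Result (1) of Theorem 1 concludes the proof.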
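The quantities used in this proof can be checked numerically for the angles that actually occur in $\mu_n$, namely $\theta = 2\cos^{-1}\sqrt{1/n}$. The following is a standard-library Python sketch with hypothetical helper names; it is not code from [2] or [6].

```python
import math


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def lemma1_quantities(theta):
    """Return det R_y(theta), tr(R_y(theta) X) and tr(R_y(theta))."""
    r = ry(theta)
    det = r[0][0] * r[1][1] - r[0][1] * r[1][0]
    # Right-multiplying by X swaps the two columns of R_y(theta).
    rx = [[r[0][1], r[0][0]], [r[1][1], r[1][0]]]
    return det, rx[0][0] + rx[1][1], r[0][0] + r[1][1]


for n in range(2, 10):
    det, tr_rx, tr_r = lemma1_quantities(2 * math.acos(math.sqrt(1 / n)))
    # Result (1) of Theorem 1: det U = 1, tr(UX) = 0, tr(U) != 0 and U != +-I
    assert abs(det - 1) < 1e-12 and abs(tr_rx) < 1e-12
    assert 1e-9 < tr_r < 2 - 1e-9  # nonzero trace, and not the trace of +-I
```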
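However, the transformation $T_1(\theta)$ can in fact be implemented using three elementary gates as follows:
$$T_1(\theta) \equiv \left(R_y(-\tfrac{\alpha}{2}) \otimes I_2\right) CNOT^2_1 \left(R_y(\tfrac{\alpha}{2}) \otimes I_2\right), \qquad \frac{\alpha}{2} = \frac{\pi}{2} - \frac{\theta}{2}.$$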
This decomposition has also been used by Cruz et al. [5] in the $W_n$ ($|D^n_1\rangle$) state preparation algorithm. However, there the corresponding transformation is defined only for the states $|00\rangle$ and $|01\rangle$, and no insight into the optimality of the implementation is given.
We first derive the underlying $4 \times 4$ unitary matrix $U_0(\alpha)$ that describes this three-gate transformation. Next we prove that $U_0(\alpha)$ needs at least three gates to be implemented by verifying the conditions of result (2) of Theorem 1. We end this section by showing that the transformation $T_1(\theta)$ needs at least three elementary gates (including one CNOT) to be implemented, which proves the optimality of the $U_0(\alpha)$ implementation.
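The three-gate decomposition can be checked numerically. This standard-library Python sketch multiplies out $(R_y(-\frac{\alpha}{2}) \otimes I_2)\,CNOT^2_1\,(R_y(\frac{\alpha}{2}) \otimes I_2)$ with $\alpha = \pi - \theta$ and compares its columns for $|00\rangle$, $|01\rangle$ and $|10\rangle$ with $T_1(\theta)$. Helper names are hypothetical. We assume the basis order $|00\rangle, |01\rangle, |10\rangle, |11\rangle$, with $q_1$ as the first (most significant) qubit.

```python
import math


def ry(theta):
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]


def kron(a, b):
    """Kronecker product of two 2x2 matrices (the first factor acts on q1)."""
    return [[a[i // 2][j // 2] * b[i % 2][j % 2] for j in range(4)] for i in range(4)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


I2 = [[1, 0], [0, 1]]
# CNOT^2_1: control q2, target q1 (exchanges |01> and |11>)
CNOT21 = [[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0], [0, 1, 0, 0]]


def u0(alpha):
    """(R_y(-alpha/2) x I) CNOT^2_1 (R_y(alpha/2) x I); the rightmost factor acts first."""
    return matmul(kron(ry(-alpha / 2), I2), matmul(CNOT21, kron(ry(alpha / 2), I2)))


def implements_t1(theta, tol=1e-12):
    """Compare the columns of U0(pi - theta) for |00>, |01>, |10> with T1(theta)."""
    u = u0(math.pi - theta)
    expected = {
        0: [1, 0, 0, 0],                                      # |00> -> |00>
        1: [0, math.cos(theta / 2), 0, math.sin(theta / 2)],  # |01> -> (cos|0> + sin|1>)|1>
        2: [0, 0, 1, 0],                                      # |10> -> |10>
    }
    return all(abs(u[row][col] - value) < tol
               for col, column in expected.items()
               for row, value in enumerate(column))


print(all(implements_t1(2 * math.acos(math.sqrt(1 / n))) for n in range(2, 10)))  # True
```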
Theorem 2. The gate $U_0(\alpha)$ with $\alpha = \pi - \theta$ performs the partially defined unitary transformation $T_1(\theta)$ and needs at least three elementary gates to be implemented.
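Fig. 3: Implementation of $CR_y(\theta)$ with two CNOT gates and the rotations $R_y(-\frac{\theta}{2})$ and $R_y(\frac{\theta}{2})$ on the target qubit.
Fig. 4: Implementation of $U_0(\alpha)$: $R_y(\frac{\alpha}{2})$ on the target qubit, one CNOT gate, then $R_y(-\frac{\alpha}{2})$ on the target qubit.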
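Proof. We first study the transformation carried out by $U_0$ on the subspace on which $T_1$ is defined. Recall that $R_y(\phi)|0\rangle = \cos(\frac{\phi}{2})|0\rangle + \sin(\frac{\phi}{2})|1\rangle$ and $R_y(\phi)|1\rangle = -\sin(\frac{\phi}{2})|0\rangle + \cos(\frac{\phi}{2})|1\rangle$, and that $CNOT^2_1$ flips the first qubit when the second qubit is $|1\rangle$.
$$
\begin{aligned}
|00\rangle &\xrightarrow{R_y(\frac{\alpha}{2})\text{ on } q_1} \left(\cos(\tfrac{\alpha}{4})|0\rangle + \sin(\tfrac{\alpha}{4})|1\rangle\right)|0\rangle
\xrightarrow{CNOT^2_1} \left(\cos(\tfrac{\alpha}{4})|0\rangle + \sin(\tfrac{\alpha}{4})|1\rangle\right)|0\rangle\\
&\xrightarrow{R_y(-\frac{\alpha}{2})\text{ on } q_1} \left(\cos(\tfrac{\alpha}{4})\left(\cos(\tfrac{\alpha}{4})|0\rangle - \sin(\tfrac{\alpha}{4})|1\rangle\right) + \sin(\tfrac{\alpha}{4})\left(\sin(\tfrac{\alpha}{4})|0\rangle + \cos(\tfrac{\alpha}{4})|1\rangle\right)\right)|0\rangle\\
&= \left(\cos^2(\tfrac{\alpha}{4}) + \sin^2(\tfrac{\alpha}{4})\right)|00\rangle = |00\rangle
\end{aligned}
$$
$$
\begin{aligned}
|10\rangle &\xrightarrow{R_y(\frac{\alpha}{2})\text{ on } q_1} \left(-\sin(\tfrac{\alpha}{4})|0\rangle + \cos(\tfrac{\alpha}{4})|1\rangle\right)|0\rangle
\xrightarrow{CNOT^2_1} \left(-\sin(\tfrac{\alpha}{4})|0\rangle + \cos(\tfrac{\alpha}{4})|1\rangle\right)|0\rangle\\
&\xrightarrow{R_y(-\frac{\alpha}{2})\text{ on } q_1} \left(-\sin(\tfrac{\alpha}{4})\left(\cos(\tfrac{\alpha}{4})|0\rangle - \sin(\tfrac{\alpha}{4})|1\rangle\right) + \cos(\tfrac{\alpha}{4})\left(\sin(\tfrac{\alpha}{4})|0\rangle + \cos(\tfrac{\alpha}{4})|1\rangle\right)\right)|0\rangle\\
&= \left(\cos^2(\tfrac{\alpha}{4}) + \sin^2(\tfrac{\alpha}{4})\right)|10\rangle = |10\rangle
\end{aligned}
$$
$$
\begin{aligned}
|01\rangle &\xrightarrow{R_y(\frac{\alpha}{2})\text{ on } q_1} \left(\cos(\tfrac{\alpha}{4})|0\rangle + \sin(\tfrac{\alpha}{4})|1\rangle\right)|1\rangle
\xrightarrow{CNOT^2_1} \left(\sin(\tfrac{\alpha}{4})|0\rangle + \cos(\tfrac{\alpha}{4})|1\rangle\right)|1\rangle\\
&\xrightarrow{R_y(-\frac{\alpha}{2})\text{ on } q_1} \left(\sin(\tfrac{\alpha}{4})\left(\cos(\tfrac{\alpha}{4})|0\rangle - \sin(\tfrac{\alpha}{4})|1\rangle\right) + \cos(\tfrac{\alpha}{4})\left(\sin(\tfrac{\alpha}{4})|0\rangle + \cos(\tfrac{\alpha}{4})|1\rangle\right)\right)|1\rangle\\
&= \left(2\cos(\tfrac{\alpha}{4})\sin(\tfrac{\alpha}{4})|0\rangle + \left(\cos^2(\tfrac{\alpha}{4}) - \sin^2(\tfrac{\alpha}{4})\right)|1\rangle\right)|1\rangle
= \left(\sin(\tfrac{\alpha}{2})|0\rangle + \cos(\tfrac{\alpha}{2})|1\rangle\right)|1\rangle
\end{aligned}
$$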
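Setting $\alpha = \pi - \theta$ gives $\sin(\frac{\alpha}{2}) = \cos(\frac{\theta}{2})$ and $\cos(\frac{\alpha}{2}) = \sin(\frac{\theta}{2})$, which is exactly the transformation $T_1(\theta)$.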
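We now completely define the gate $U_0$ by studying its action on the state $|11\rangle$.
$$
\begin{aligned}
|11\rangle &\xrightarrow{R_y(\frac{\alpha}{2})\text{ on } q_1} \left(-\sin(\tfrac{\alpha}{4})|0\rangle + \cos(\tfrac{\alpha}{4})|1\rangle\right)|1\rangle
\xrightarrow{CNOT^2_1} \left(\cos(\tfrac{\alpha}{4})|0\rangle - \sin(\tfrac{\alpha}{4})|1\rangle\right)|1\rangle\\
&\xrightarrow{R_y(-\frac{\alpha}{2})\text{ on } q_1} \left(\cos(\tfrac{\alpha}{4})\left(\cos(\tfrac{\alpha}{4})|0\rangle - \sin(\tfrac{\alpha}{4})|1\rangle\right) - \sin(\tfrac{\alpha}{4})\left(\sin(\tfrac{\alpha}{4})|0\rangle + \cos(\tfrac{\alpha}{4})|1\rangle\right)\right)|1\rangle\\
&= \left(\left(\cos^2(\tfrac{\alpha}{4}) - \sin^2(\tfrac{\alpha}{4})\right)|0\rangle - 2\cos(\tfrac{\alpha}{4})\sin(\tfrac{\alpha}{4})|1\rangle\right)|1\rangle
= \left(\cos(\tfrac{\alpha}{2})|0\rangle - \sin(\tfrac{\alpha}{2})|1\rangle\right)|1\rangle
\end{aligned}
$$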
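So the overall transformation provided by $U_0(\alpha)$ is:
$$
\begin{aligned}
|00\rangle &\rightarrow |00\rangle\\
|10\rangle &\rightarrow |10\rangle\\
|01\rangle &\rightarrow \left(\sin(\tfrac{\alpha}{2})|0\rangle + \cos(\tfrac{\alpha}{2})|1\rangle\right)|1\rangle\\
|11\rangle &\rightarrow \left(\cos(\tfrac{\alpha}{2})|0\rangle - \sin(\tfrac{\alpha}{2})|1\rangle\right)|1\rangle
\end{aligned}
$$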
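Therefore $U_0(\alpha)$ is a two-qubit gate that can be expressed as a controlled gate $CU^2_1(\alpha)$, where
$$U(\alpha) = \begin{bmatrix} \sin(\frac{\alpha}{2}) & \cos(\frac{\alpha}{2}) \\ \cos(\frac{\alpha}{2}) & -\sin(\frac{\alpha}{2}) \end{bmatrix}.$$
Now $\mathrm{tr}\,U(\alpha) = 0$ and $\det U(\alpha) = -1$ for all $\alpha$. Furthermore, $U(\alpha) \neq \pm X$ whenever $\sin(\frac{\alpha}{2}) \neq 0$, which holds for $\alpha = \pi - \theta$ with $0 < \theta < \pi$. Therefore we can conclude from result (2) of Theorem 1 that this gate requires at least three gates to be implemented.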
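The conditions of result (2) can also be checked numerically. This standard-library Python sketch, with hypothetical function names, evaluates $\mathrm{tr}\,U(\alpha)$ and $\det U(\alpha)$, and tests whether $U(\alpha) = \pm X$. It does so for the angles $\alpha = \pi - \theta$ with $\theta = 2\cos^{-1}\sqrt{1/n}$ that arise in $\mu_n$. The boundary case $\alpha = 0$ shows why the condition $U(\alpha) \neq \pm X$ has to be checked separately.

```python
import math


def u_alpha(alpha):
    """Target-qubit block U(alpha) of U0(alpha) = CU^2_1(alpha)."""
    s, c = math.sin(alpha / 2), math.cos(alpha / 2)
    return [[s, c], [c, -s]]


def result2_conditions(alpha, tol=1e-12):
    """Check tr U = 0, det U = -1 and U != +-X (result (2) of Theorem 1)."""
    u = u_alpha(alpha)
    trace = u[0][0] + u[1][1]
    det = u[0][0] * u[1][1] - u[0][1] * u[1][0]
    not_pm_x = abs(u[0][0]) > tol  # +-X has a zero diagonal
    return abs(trace) < tol and abs(det + 1) < tol and not_pm_x


for n in range(2, 10):
    theta = 2 * math.acos(math.sqrt(1 / n))
    assert result2_conditions(math.pi - theta)
print(result2_conditions(0.0))  # alpha = 0 gives U = X, i.e. a plain CNOT
```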
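We finally show the optimality of this implementation for the two-qubit transformation $T_1(\theta)$.
Lemma 2. For $0 < \theta < \pi$, the transformation $T_1(\theta)$ needs at least one CNOT and two single-qubit gates to be implemented.
Proof. The transformation $T_1(\theta)$ is only defined for the basis states $|00\rangle$, $|01\rangle$ and $|10\rangle$. Any matrix $M(\theta)$ that carries out the transformation is of the form
$$M(\theta) = \begin{bmatrix} 1 & 0 & 0 & a \\ 0 & \cos(\frac{\theta}{2}) & 0 & b \\ 0 & 0 & 1 & c \\ 0 & \sin(\frac{\theta}{2}) & 0 & d \end{bmatrix},$$
where $a, b, c, d$ are complex unknowns. Since $M(\theta)$ is unitary, $M(\theta)M^\dagger(\theta) = I$. Therefore $1 + aa^* = 1 \implies a = 0$ and $1 + cc^* = 1 \implies c = 0$. That is, $M(\theta)$ is a controlled unitary $CM_1(\theta)$ with
$$M_1(\theta) = \begin{bmatrix} \cos(\frac{\theta}{2}) & b \\ \sin(\frac{\theta}{2}) & d \end{bmatrix}.$$
For $0 < \theta < \pi$, both $\cos(\frac{\theta}{2})$ and $\sin(\frac{\theta}{2})$ are non-zero, so $M_1(\theta)$ is neither diagonal nor anti-diagonal and hence is not of the form $e^{i\phi}I$, $e^{i\phi}X$ or $e^{i\phi}Z$. By result (3) of Theorem 1, $M(\theta)$ cannot be implemented with fewer than three elementary gates. Moreover, since $M_1(\theta)$ is not a multiple of the identity, $CM_1(\theta)$ is not a product of single-qubit gates, so at least one of these gates must be a CNOT.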
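The unitarity argument can be reproduced numerically. Because $T_1(\theta)$ is maximally partial, the undefined fourth column must be a unit vector orthogonal to the three defined columns, which fixes it up to a phase. The Python sketch below uses only the standard library; `complete_last_column` is a hypothetical helper implementing Gram-Schmidt orthogonalization. It computes one such column and confirms that its first and third entries vanish, i.e. $a = c = 0$.

```python
import math


def complete_last_column(cols, tol=1e-9):
    """Return a unit vector orthogonal to the given orthonormal columns (Gram-Schmidt)."""
    dim = len(cols[0])
    for e in range(dim):
        v = [1.0 if i == e else 0.0 for i in range(dim)]
        for col in cols:  # subtract the projection onto each known column
            overlap = sum(ci.conjugate() * vi for ci, vi in zip(col, v))
            v = [vi - overlap * ci for vi, ci in zip(v, col)]
        norm = math.sqrt(sum(abs(x) ** 2 for x in v))
        if norm > tol:
            return [x / norm for x in v]
    raise ValueError("the columns already span the whole space")


theta = 1.0
cos_h, sin_h = math.cos(theta / 2), math.sin(theta / 2)
defined = [
    [1.0, 0.0, 0.0, 0.0],      # image of |00>
    [0.0, cos_h, 0.0, sin_h],  # image of |01>
    [0.0, 0.0, 1.0, 0.0],      # image of |10>
]
a, b, c, d = complete_last_column(defined)
print(a, b, c, d)  # a and c vanish, so M(theta) is a controlled gate
```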
We now use these observations to improve the gate count of the circuit $C_{n,k}$.
IV. IMPROVED GATE COUNTS FOR CIRCUITS OF $|D^n_k\rangle$
We first count the number of CNOT and single-qubit gates in $C_{n,k}$ by reviewing the circuit. The circuit is composed of $n-1$ blocks of gates called SCS: there are $n-k$ blocks of the form $SCS^t_k$, $k < t \le n$, and $k-1$ blocks of the form $SCS^{i+1}_i$, $1 \le i \le k-1$.
Each block $SCS^t_k$ consists of one two-qubit transformation $\mu_t$ and $k-1$ three-qubit transformations of the type $\mathcal{M}^l_t$, $t-k+1 \le l \le t-1$. Here $\mu_t$ is implemented on the $(t-1)$-th and $t$-th qubits, and $\mathcal{M}^l_t$ is implemented on the $(l-1)$-th, $l$-th and $t$-th qubits, as described in Section II. Each transformation of type $\mu$ is decomposed into two CNOT gates and a $T_1(\theta)$ transformation, which is implemented as a CRy gate by adjusting the value of $\theta$. We have shown in Lemma 1 that a CRy transformation needs at least 4 gates to implement. In fact it needs at least two CNOT gates, and the implementation in Figure 3 uses two CNOT and two single-qubit gates. Therefore each $\mu$ transformation needs four CNOT and two single-qubit gates. The number of transformations of type $\mathcal{M}$ is
$$(n-k)(k-1) + \sum_{i=1}^{k-2} i = nk - n + k - k^2 + \frac{(k-1)(k-2)}{2} = nk - \frac{k(k+1)}{2} - n + 1.$$
Each $\mathcal{M}$ transformation is shown to require six CNOT and four single-qubit gates. However, one CNOT gate of each $\mathcal{M}$ transformation can be canceled by rearranging the first two CNOT gates of the next transformation, so effectively five CNOT gates are counted per $\mathcal{M}$ transformation.
The total numbers of CNOT and single-qubit gates used to prepare the state $|D^n_k\rangle$ are shown in Table I.

CNOT gates: $5\left(nk - \frac{k(k+1)}{2} - n + 1\right) + 4(n-1)$
Single-qubit gates: $4\left(nk - \frac{k(k+1)}{2} - n + 1\right) + 2(n-1)$
TABLE I: Gates needed to prepare $|D^n_k\rangle$ as in [2]
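The closed form for the number of $\mathcal{M}$ transformations can be cross-checked by enumerating the blocks directly. In this Python sketch (function names hypothetical), we assume, by analogy with $SCS^n_k$, that each block $SCS^{i+1}_i$ contains one $\mu$ and $i-1$ $\mathcal{M}$ transformations. We use the per-transformation costs stated above: 4 CNOT and 2 single-qubit gates per $\mu$, and 5 CNOT and 4 single-qubit gates per $\mathcal{M}$.

```python
def count_transformations(n, k):
    """Count the mu and M transformations in C_{n,k} block by block."""
    mu = n - 1  # one mu per SCS block
    m = (n - k) * (k - 1)  # blocks SCS^t_k, k < t <= n, each with k - 1 M's
    m += sum(i - 1 for i in range(1, k))  # blocks SCS^{i+1}_i, 1 <= i <= k - 1
    return mu, m


def original_gate_counts(n, k):
    """Table I: (CNOT, single-qubit) counts of C_{n,k}."""
    mu, m = count_transformations(n, k)
    return 4 * mu + 5 * m, 2 * mu + 4 * m


for n in range(3, 15):
    for k in range(1, n):
        assert count_transformations(n, k)[1] == n * k - k * (k + 1) // 2 - n + 1

print(original_gate_counts(4, 2), original_gate_counts(6, 3))  # (22, 14) (55, 38)
```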
Figure 5 shows the circuit $C_{6,3}$ in terms of CNOT, CRy and CCRy gates; the circuit $C_{4,2}$ for $|D^4_2\rangle$ is formed by the same construction.
We now improve the gate counts of the circuit $C_{n,k}$ in the following ways.
Replacing CRy with CU
We first use the CU gate from Section III to implement the transformation $T_1$ in each $\mu$ transformation; it needs one CNOT and two single-qubit gates. Therefore each $\mu$ transformation needs three CNOT and two single-qubit gates. Since there are $n-1$ $\mu$ transformations, this reduces the number of CNOT gates by $n-1$ for any $|D^n_k\rangle$.
Next we observe that some of the $\mu$ and $\mathcal{M}$ transformations act as the identity, and we count them as a function of $k$ for any $|D^n_k\rangle$.
[Circuit diagram of $C_{6,3}$ not recoverable from the text: six qubits initialised to $|0\rangle$ with X gates on the last three, followed by CNOT, CRy and CCRy gates.]
Fig. 5: Description of the circuit $C_{6,3}$
The $\mu$ and $\mathcal{M}$ transformations that act as the identity
Let there be an $n$-qubit system in some state $|\phi\rangle$. This state can be uniquely represented as a superposition of all $2^n$ computational basis states. The amplitude of a particular basis state may or may not be zero depending on $|\phi\rangle$; we call a basis state with non-zero amplitude an active basis state. The effect of a unitary transformation $T$ on this state is completely described by how it transforms the active basis states of $|\phi\rangle$. Suppose the $j$-th qubit is in the state $|0\rangle$ (respectively $|1\rangle$) in all the active basis states, and the transformation $T$ does not act on the $j$-th qubit in a non-trivial way on any of those basis states. Then the $j$-th qubit will also be in the state $|0\rangle$ (respectively $|1\rangle$) in all the active basis states of $T|\phi\rangle$. Using this simple fact, we prove the following theorem by induction.
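Theorem 3. For $0 \le t \le k-2$, if the $n$-qubit system is expressed as a superposition of computational basis states after the block $SCS^{n-t}_k$ has acted, then it can be expressed as
$$\sum_{a=0}^{2^{t+1}-1}\sum_{b=0}^{2^{t+1}-1}\alpha_{a,b}\left(|0\rangle^{\otimes n-k-1-t}\left(\bigotimes_{i=1}^{t+1}|a^{bin}_i\rangle\right)|1\rangle^{\otimes k-1-t}\left(\bigotimes_{j=1}^{t+1}|b^{bin}_j\rangle\right)\right),$$
where $a^{bin}_i$ and $b^{bin}_j$ denote the $i$-th bit of $a$ and the $j$-th bit of $b$ in binary.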
Proof. The statement says that, in all active basis states of the $n$-qubit system after the block $SCS^{n-t}_k$ has acted on it, the first $n-k-t-1$ qubits are in the state $|0\rangle$. It also says that the $k-1-t$ qubits starting from the $(n-k+1)$-th qubit are in the state $|1\rangle$.
We first prove the statement for $t = 0$. The $n$-qubit system is initially in the state $|\psi_0\rangle = |0\rangle^{\otimes(n-k)}|1\rangle^{\otimes k}$, and the block to be applied is $SCS^n_k$. This block consists of the transformations $\mu_n$ and $\mathcal{M}^i_n$, $n-k+1 \le i \le n-1$. The transformation $\mu_n$ affects the $(n-1)$-th and $n$-th qubits, and the transformation $\mathcal{M}^l_n$ affects the $(l-1)$-th and $n$-th qubits. Additionally, $\mu_n$ acts as the identity on a basis state if the $(n-1)$-th qubit of the basis state is in the state $|1\rangle$. Similarly, $\mathcal{M}^l_n$ acts as the identity on a basis state if the $(l-1)$-th qubit is in the state $|1\rangle$.
The last $k$ qubits of $|\psi_0\rangle$ are in the state $|1\rangle$, and therefore $\mu_n$ and $\mathcal{M}^{n-1}_n, \mathcal{M}^{n-2}_n, \ldots, \mathcal{M}^{n-k+2}_n$ act as identity transformations. Finally the transformation $\mathcal{M}^{n-k+1}_n$ is applied. Its first qubit, the $(n-k)$-th qubit, is in the state $|0\rangle$, so this transformation may lead to basis states with either $|0\rangle$ or $|1\rangle$ in the $(n-k)$-th and $n$-th positions. Therefore the resulting state can be written as
$$\sum_{a_1,a_2 \in \{0,1\}} \alpha_{a_1a_2}|0\rangle^{\otimes n-k-1}|a_1\rangle|1\rangle^{\otimes k-1}|a_2\rangle.$$
Thus, in all active basis states of the system, the first $n-k-1$ qubits are in the state $|0\rangle$, and the $k-1$ qubits starting from the $(n-k+1)$-th qubit are in the state $|1\rangle$. This concludes the base case.
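Now, assuming that the statement holds for $t-1$, where $1 \le t \le k-2$, we show that it also holds for $t$. The block $SCS^{n-t}_k$ is composed of the transformations $\mu_{n-t}$ and $\mathcal{M}^i_{n-t}$, $n-t-k+1 \le i \le n-t-1$. By the induction hypothesis, the $n$-qubit system is in the state
$$|\psi_{t-1}\rangle = \sum_{a=0}^{2^t-1}\sum_{b=0}^{2^t-1}\alpha_{a,b}\left(|0\rangle^{\otimes n-k-t}\left(\bigotimes_{i=1}^{t}|a^{bin}_i\rangle\right)|1\rangle^{\otimes k-t}\left(\bigotimes_{j=1}^{t}|b^{bin}_j\rangle\right)\right).$$
That is, in all active basis states the first $n-k-t$ qubits are in the state $|0\rangle$, and the $(n-k+1)$-th qubit and the next $k-t-1$ qubits are in the state $|1\rangle$. These qubits are the first qubits of the transformations $\mu_{n-t}$ and $\mathcal{M}^i_{n-t}$, $n-k+2 \le i \le n-t-1$. This implies that the $\mu$ transformation and these $k-2-t$ $\mathcal{M}$ transformations act as identity transformations on all active basis states.
The next $t$ $\mathcal{M}$ transformations may receive the state $|0\rangle$ as their first qubit. Therefore, before the last $\mathcal{M}$ transformation is applied, the $n$-qubit system is in the state
$$|\psi'_{t-1}\rangle = \sum_{a=0}^{2^t-1}\sum_{b=0}^{2^{t+1}-1}\alpha_{a,b}\left(|0\rangle^{\otimes n-k-t}\left(\bigotimes_{i=1}^{t}|a^{bin}_i\rangle\right)|1\rangle^{\otimes k-1-t}\left(\bigotimes_{j=1}^{t+1}|b^{bin}_j\rangle\right)\right).$$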
Finally, the last three-qubit transformation of the block $SCS^{n-t}_k$, namely $\mathcal{M}^{n-t-k+1}_{n-t}$, acts on the system. Since the $(n-t-k)$-th qubit is in the state $|0\rangle$ in all active basis states, this $\mathcal{M}$ transformation may act non-trivially on it and on the $(n-t)$-th qubit. This results in the state
$$|\psi_t\rangle = \sum_{a=0}^{2^{t+1}-1}\sum_{b=0}^{2^{t+1}-1}\alpha_{a,b}\left(|0\rangle^{\otimes n-k-t-1}\left(\bigotimes_{i=1}^{t+1}|a^{bin}_i\rangle\right)|1\rangle^{\otimes k-1-t}\left(\bigotimes_{j=1}^{t+1}|b^{bin}_j\rangle\right)\right).$$
This completes the proof. It is important to note that many basis states in the expression of $|\psi_t\rangle$ may have zero amplitude. However, our focus is on the qubits that are definitely in the state $|0\rangle$, or definitely in the state $|1\rangle$, in all active basis states.
This proof also shows that, in each block $SCS^{n-t}_k$ with $t < k-1$, the $\mu$ transformation and $k-2-t$ of the $\mathcal{M}$ transformations act as identity transformations. They can therefore be removed from the circuit, which is the second improvement. The number of $\mu$ transformations omitted is therefore $k-1$, and the number of $\mathcal{M}$ transformations omitted is $\frac{(k-2)(k-1)}{2}$. This removes $3(k-1) + \frac{5(k-2)(k-1)}{2}$ CNOT and $2(k-1) + \frac{4(k-2)(k-1)}{2}$ single-qubit gates.
The first non-identity $\mathcal{M}$ transformation in $SCS^{n-t}_k$
Having removed these $\mu$ and $\mathcal{M}$ transformations, we now look at the first remaining transformation of the block $SCS^{n-t}_k$, $t < k-1$, which is $\mathcal{M}^{n-k+1}_{n-t}$. This transformation depends on the state of the $(n-k)$-th, $(n-k+1)$-th and $(n-t)$-th qubits and affects the state of the $(n-k)$-th and $(n-t)$-th qubits. At this stage the $n$-qubit system is in the state
$$\sum_{a=0}^{2^t-1}\sum_{b=0}^{2^t-1}\alpha_{a,b}|0\rangle^{\otimes n-k-t}\left(\bigotimes_{i=1}^{t}|a^{bin}_i\rangle\right)|1\rangle^{\otimes k-t}\left(\bigotimes_{j=1}^{t}|b^{bin}_j\rangle\right).$$
Therefore, in all the active basis states, both the $(n-k+1)$-th and the $(n-t)$-th qubits are in the state $|1\rangle$. Hence the three-qubit transformation applied by $\mathcal{M}^{n-k+1}_{n-t}$ reduces to the following, with $l = n-k+1$:
$$
\begin{aligned}
|11\rangle_l|1\rangle_{n-t} &\rightarrow |11\rangle_l|1\rangle_{n-t}\\
|01\rangle_l|1\rangle_{n-t} &\rightarrow \sqrt{\tfrac{n-t-l+1}{n-t}}\,|01\rangle_l|1\rangle_{n-t} + \sqrt{\tfrac{l-1}{n-t}}\,|11\rangle_l|0\rangle_{n-t}
\end{aligned}
$$
Since the $(n-k+1)$-th qubit is in the state $|1\rangle$ in all the active basis states, this can in fact be implemented as a two-qubit transformation of the type $\mu$.
The transformation acts on the $(n-k)$-th and $(n-t)$-th qubits as $\mathcal{M}^{n-k+1}_{n-t} \equiv (CNOT^{n-k}_{n-t})(CU^{n-t}_{n-k}(\theta))(CNOT^{n-k}_{n-t})$, where $\theta = 2\cos^{-1}\left(\sqrt{\frac{n-t-l+1}{n-t}}\right) = 2\cos^{-1}\left(\sqrt{\frac{k-t}{n-t}}\right)$ and $CU^{n-t}_{n-k}(\theta)$ is the gate of Section III that implements $T_1(\theta)$. We know that this CU gate requires one CNOT and two $R_y$ gates, so $\mathcal{M}^{n-k+1}_{n-t}$ requires only three CNOT and two $R_y$ gates. This improvement applies to every block $SCS^{n-t}_k$ with $n-t \ge n-k+2$, that is, for $0 \le t \le k-2$. Therefore it further reduces the number of CNOT gates in the circuit by $2(k-1)$, and the number of single-qubit gates by $2(k-1)$ as well.
Additionally, for $|D^n_k\rangle$, $k > 1$, the $n$-qubit system is in the state $|0\rangle^{\otimes n-k}|1\rangle^{\otimes k}$ when $SCS^n_k$ is applied. Therefore the transformation $\mathcal{M}^{n-k+1}_n$ acts only on the basis state $|011\rangle$. The corresponding transformation is
$$|01\rangle_{n-k+1}|1\rangle_n \rightarrow \sqrt{\tfrac{k}{n}}\,|01\rangle_{n-k+1}|1\rangle_n + \sqrt{\tfrac{n-k}{n}}\,|11\rangle_{n-k+1}|0\rangle_n.$$
This can be implemented using $R_y\left(2\cos^{-1}\sqrt{\frac{k}{n}}\right)$ on the $(n-k)$-th qubit followed by the gate $CNOT^{n-k}_n$, which removes a further two CNOT gates and one single-qubit gate.
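The two-gate shortcut is easy to verify on the single input $|011\rangle$. Under the convention $R_y(\theta)|0\rangle = \cos(\frac{\theta}{2})|0\rangle + \sin(\frac{\theta}{2})|1\rangle$, the rotation angle must be $2\cos^{-1}\sqrt{k/n}$ for the amplitude of $|01\rangle_{n-k+1}|1\rangle_n$ to equal $\sqrt{k/n}$. The following minimal standard-library Python sketch (hypothetical function name) tracks only the three qubits involved:

```python
import math


def first_block_shortcut(n, k):
    """Apply R_y(2 acos(sqrt(k/n))) to q_{n-k}, then CNOT with control q_{n-k}, target q_n.

    The input is |0>_{n-k} |1>_{n-k+1} |1>_n; keys are (q_{n-k}, q_{n-k+1}, q_n).
    """
    theta = 2 * math.acos(math.sqrt(k / n))
    # R_y(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1> on q_{n-k}
    state = {(0, 1, 1): math.cos(theta / 2), (1, 1, 1): math.sin(theta / 2)}
    # CNOT^{n-k}_n flips q_n whenever q_{n-k} is |1>
    return {(x, y, z ^ x): amp for (x, y, z), amp in state.items()}


out = first_block_shortcut(5, 2)
print(out)
```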
We denote this circuit by $\hat{C}_{n,k}$; Figure 6 shows the structure of $\hat{C}_{6,3}$. Combining the results above, we now calculate the total improvement in the CNOT and single-qubit gate counts for the $|D^n_k\rangle$, $k > 1$, preparation circuit $\hat{C}_{n,k}$.
• The total number of CNOT gates removed is $n - 1 + 3(k-1) + \frac{5(k-2)(k-1)}{2} + 2(k-1) + 2$.
• The total number of single-qubit gates removed is $2(k-1) + \frac{4(k-2)(k-1)}{2} + 2(k-1) + 1$.
Therefore the total number of CNOT gates present in the circuit is (the first term, $5\left(nk - \frac{k(k+1)}{2}\right) - n + 1$, equals the CNOT count of Table I)
$$5\left(nk - \frac{k(k+1)}{2}\right) - n + 1 - \left(n - 1 + 3(k-1) + \frac{5(k-2)(k-1)}{2} + 2(k-1) + 2\right) = 5nk - 5k^2 - 2n.$$
The number of single-qubit gates present in the circuit is
$$4\left(nk - \frac{k(k+1)}{2} - n + 1\right) + 2(n-1) - \left(2(k-1) + \frac{4(k-2)(k-1)}{2} + 2(k-1) + 1\right) = 4nk - 4k^2 - 2n + 1.$$
For $k = 1$ we get $3n-3$ CNOT gates (from the $n-1$ $\mu$ transformations) and $2n-2$ single-qubit gates. However, one more CNOT gate can be removed from each $\mu$ transformation, since the only active basis states in the input to the $\mu$ transformations are $|00\rangle$ and $|01\rangle$. The resulting circuit is identical to the linear $W_n$ preparation circuit in [5] and contains $2n-2$ CNOT and $2n-2$ single-qubit gates, so we do not elaborate on it further.
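We know that the state $|D^n_k\rangle$ can be prepared by first forming the state $|D^n_{n-k}\rangle$ and then applying an X gate to each qubit. On that note, it is interesting to observe that after the improvements the circuits for $|D^n_k\rangle$ and $|D^n_{n-k}\rangle$, $2 \le k \le n-2$, require the same number of CNOT gates, since $5nk - 5k^2 - 2n = 5k(n-k) - 2n$ is invariant under $k \mapsto n-k$.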
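k \ n | 4     | 5     | 6     | 7     | 8
2     | 22,12 | 31,20 | 40,28 | 49,36 | 58,44
3     | 27,7  | 41,20 | 55,33 | 69,46 | 83,59
4     | -     | 46,10 | 65,28 | 84,46 | 103,64
5     | -     | -     | 70,13 | 94,36 | 118,59
6     | -     | -     | -     | 99,16 | 128,44
7     | -     | -     | -     | -     | 133,19
TABLE II: CNOT gate count of the pair ($C_{n,k}$, $\hat{C}_{n,k}$)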
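k \ n | 4     | 5     | 6     | 7     | 8
2     | 14,9  | 20,15 | 26,21 | 32,27 | 38,33
3     | 18,5  | 28,15 | 38,25 | 48,35 | 58,45
4     | -     | 32,7  | 46,21 | 60,35 | 74,49
5     | -     | -     | 50,9  | 68,27 | 86,45
6     | -     | -     | -     | 72,11 | 94,33
7     | -     | -     | -     | -     | 98,13
TABLE III: Single qubit gate count of the pair ($C_{n,k}$, $\hat{C}_{n,k}$)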
[Circuit diagram: six qubits initialised to |0⟩, X gates on the last three qubits, rotations labelled with the amplitudes √(3/4), √(2/3), √(1/2), √(3/5), √(2/4), √(1/3), √(3/6), √(2/5), √(1/4).]
Fig. 6: Description of the Circuit $\hat{C}_{6,3}$
Tables II and III list, respectively, the number of CNOT and single qubit gates needed to prepare the states $|D^n_k\rangle$ for $4 \le n \le 8$, $2 \le k \le n-1$, using $C_{n,k}$ and $\hat{C}_{n,k}$.
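As a sanity check, both tables can be regenerated from the closed forms, and the symmetry under $k \mapsto n-k$ for $2 \le k \le n-2$ can be confirmed. The sketch below does this in plain Python; `cnot_pair`, `single_pair` and `print_table` are illustrative names, not part of the original construction.

```python
def cnot_pair(n, k):
    """(C_{n,k}, C-hat_{n,k}) CNOT counts from the closed forms."""
    return 5 * (n * k - k * (k + 1) // 2) - n + 1, 5 * n * k - 5 * k * k - 2 * n


def single_pair(n, k):
    """(C_{n,k}, C-hat_{n,k}) single qubit gate counts from the closed forms."""
    return (4 * (n * k - k * (k + 1) // 2 - n + 1) + 2 * (n - 1),
            4 * n * k - 4 * k * k - 2 * n + 1)


def print_table(pair):
    """Rows k = 2..7, columns n = 4..8; '-' where k >= n."""
    print("k\\n " + "".join(f"{n:>8}" for n in range(4, 9)))
    for k in range(2, 8):
        cells = ["%d,%d" % pair(n, k) if k < n else "-" for n in range(4, 9)]
        print(f"{k:<4}" + "".join(f"{c:>8}" for c in cells))


print_table(cnot_pair)    # Table II
print_table(single_pair)  # Table III

# The C-hat counts are unchanged by k -> n - k whenever 2 <= k <= n - 2.
for n in range(4, 30):
    for k in range(2, n - 1):
        assert cnot_pair(n, k)[1] == cnot_pair(n, n - k)[1]
        assert single_pair(n, k)[1] == single_pair(n, n - k)[1]
```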
In the next section we show that our observations not only reduce the gate counts of the circuit but also reduce its architectural constraints.
V. ACTUAL IMPLEMENTATION AND ARCHITECTURAL CONSTRAINTS
A. Architectural Constraints
We are at the stage where quantum circuits can be implemented on actual quantum computers through cloud services such as the IBM Quantum Experience, also known as IQX [11]. However, the architecture of each individual back-end quantum machine restricts how a particular circuit can be implemented. The most prominent constraint concerns CNOT implementation; in this regard we use the terms architectural constraint and CNOT constraint interchangeably. Every quantum system $Q$ with $n$ (physical) qubits has a CNOT map, which we express as $G^Q_A(V^Q, E^Q)$, where $V^Q = \{q_1, q_2, \ldots, q_n\}$. In this graph the nodes represent the qubits and the edges represent CNOT implementability: a directed edge $q_i \rightarrow q_j$ implies that a CNOT can be implemented with $q_i$ as control and $q_j$ as target in the system $Q$. The edges in the CNOT maps of all the publicly available IQX machines are bidirectional.
Let there be a circuit $C$ on $n$ qubits, and denote its (logical) qubits by $c_1, c_2, \ldots, c_n$. We also have a CNOT map corresponding to the circuit, which describes the CNOT gates used in it. We describe this as the directed graph $G_C(V^C, E^C)$, where $V^C = \{c_1, c_2, \ldots, c_n\}$ and there is a directed edge $c_i \rightarrow c_j$ if there are one or more CNOT gates with $c_i$ as control and $c_j$ as target.
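Concretely, $G_C$ is the set of distinct (control, target) pairs occurring in the circuit, and a layout of logical onto physical qubits is valid when every such pair lands on a directed edge of $G^Q_A$. A minimal Python sketch of this check follows; the gate list and the 3-qubit line architecture are hypothetical examples, and `cnot_map` and `fits` are illustrative names.

```python
def cnot_map(gates):
    """Directed CNOT map G_C: edge (c, t) iff some CNOT has control c and target t."""
    return {(control, target) for control, target in gates}


def fits(gates, arch_edges, layout):
    """True if every edge of G_C lands on a directed edge of the architecture map
    under `layout` (dict: logical qubit -> physical qubit)."""
    return all((layout[c], layout[t]) in arch_edges for c, t in cnot_map(gates))


# Hypothetical circuit: three CNOT gates on logical qubits 0, 1, 2 ...
gates = [(0, 1), (1, 2), (0, 1)]
# ... and a hypothetical 3-qubit line 0 - 1 - 2 with bidirectional edges.
line = {(0, 1), (1, 0), (1, 2), (2, 1)}

print(sorted(cnot_map(gates)))                # [(0, 1), (1, 2)]
print(fits(gates, line, {0: 0, 1: 1, 2: 2}))  # True
print(fits(gates, line, {0: 0, 1: 2, 2: 1}))  # False: (0, 1) would need edge (0, 2)
```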
Therefore, if the graph $G_C$ can be shown to be a subgraph of $G^Q_A$ under some mapping of the logical qubits to the physical qubits, then the circuit $C$ can be implemented on the architecture with the same number of CNOT gates. If no such map exists, there may be many ways of implementing the circuit on that architecture, either by using SWAP gates, which require additional CNOT gates, or by changing the construction of the circuit. There are mapping solutions, such as the one applied by IQX, which dynamically change the structure of a circuit so that it can be implemented on an architecture that does not meet the circuit's CNOT constraints. Similarly, Zulehner et al. [10] have proposed an efficient mapping solution. However, it is not always possible to avoid an increase in the number of CNOT gates. Given a circuit, it is therefore crucial to find its minimal architectural needs in terms of the CNOT map without increasing the CNOT gate count. For any input circuit, the IBM-Q mapping solution outputs the modified circuit as the transpiled circuit, although its solutions are not always optimal. In this paper we first consider the system “ibmqx2” (Q1) of IQX, whose CNOT map is shown in Figure 7.
[Figure: CNOT map of Q1 on the physical qubits 0 to 4.]
Fig. 7: The CNOT map of Q1 represented as $G^{Q_1}_A$
Against this backdrop we examine the CNOT constraints of the circuit $C_{4,2}$ that prepares $|D^4_2\rangle$. We then implement the circuit $\hat{C}_{4,2}$, which results from the improvements shown in Section IV, and observe that these improvements not only reduce gate counts but also reduce CNOT constraints. We implement both circuits on the system Q1 and compare their measurement statistics by measuring the deviation from the ideal measurement statistics; the results of $\hat{C}_{4,2}$ are much more closely aligned with the ideal results. We end this section by showing how some changes to the circuit $\hat{C}_{4,2}$, made possible by partially defined transformations, can lower the expected error due to CNOT gates without reducing the number of CNOT gates or changing the CNOT constraints.
B. Implementation and Improvement for $|D^4_2\rangle$
We start by constructing the circuit $C_{4,2}$. Since the $CR_y$ gate needs at least 4 gates to be implemented, we implement every $CR_y$ gate using two CNOT gates and two $R_y$ gates, and every three qubit $M^l_n$ transformation using five CNOT and four $R_y$ gates (as given in the description of [2]). The resulting circuit $C_{4,2}$, shown in Figure 8, contains 22 CNOT gates. We use the notation $\theta^x_y$ to denote the angle $2\cos^{-1}\left(\sqrt{x/y}\right)$. The CNOT map of the circuit is shown in Figure 9.
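For reference, the sketch below evaluates the angles $\theta^x_y$ that label the rotations of $C_{4,2}$ in Figure 8 and checks that $\cos^2(\theta^x_y/2) = x/y$, i.e. that $R_y(\theta^x_y)|0\rangle$ has amplitude $\sqrt{x/y}$ on $|0\rangle$. It assumes the standard convention $R_y(\theta) = e^{-i\theta Y/2}$; `theta` is an illustrative helper name.

```python
import math


def theta(x, y):
    """theta^x_y = 2 * arccos(sqrt(x / y))."""
    return 2 * math.acos(math.sqrt(x / y))


# With R_y(t) = exp(-i t Y / 2), R_y(theta^x_y)|0> = sqrt(x/y)|0> + sqrt(1 - x/y)|1>,
# so cos^2(theta^x_y / 2) recovers the ratio x / y.
for x, y in [(1, 2), (1, 3), (2, 3), (1, 4), (2, 4)]:
    t = theta(x, y)
    print(f"theta^{x}_{y} = {t:.4f} rad, cos^2(t/2) = {math.cos(t / 2) ** 2:.4f}")
```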
[Figure: CNOT map on the logical qubits q0, q1, q2, q3.]
Fig. 9: The CNOT map of $C_{4,2}$
We then obtain $\hat{C}_{4,2}$ by making the following changes to $C_{4,2}$.
1) Implement the CUo gate instead of the $CR_y$ gates.
2) Remove the redundant $\mu$ and $M$ transformations.
3) Reduce the gate count in the implementation of the $M^{n-k+1}_{n-t}$ type transformations.
This brings the total number of CNOT gates in the circuit down to 12. These steps not only reduce the number of CNOT gates in the circuit but also reduce its CNOT constraints. Figure 10 shows the resulting circuit $\hat{C}_{4,2}$, and Figure 11 shows its reduced CNOT map $G_{\hat{C}_{4,2}}$.
[Figure: CNOT map on the logical qubits q0, q1, q2, q3.]
Fig. 11: The CNOT map $G_{\hat{C}_{4,2}}$ corresponding to the circuit $\hat{C}_{4,2}$
In fact, the graph $G_{\hat{C}_{4,2}}$ can be shown to be a subgraph of $G^{Q_1}_A$ under several mappings of the qubits. Therefore this circuit can be implemented on the “ibmqx2” (Q1) machine with 12 CNOT gates. However, the CNOT constraints of the circuits for even $|D^5_k\rangle$, $k > 1$, cannot be met by any IBM-Q architecture at this stage. We now compare the results of the circuit $C_{4,2}$, which is due to [2], with those of $\hat{C}_{4,2}$, which we obtained after the reductions and modifications.
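The subgraph claim can be checked by brute force over all injective layouts of the four logical qubits onto the five physical ones. The Python sketch below assumes that the coupling graph of “ibmqx2” is the five-qubit bow-tie with edges 0-1, 0-2, 1-2, 2-3, 2-4, 3-4 (an assumption about the device, consistent with the map $M_1$ used below). The four edges of $G_{\hat{C}_{4,2}}$ are read off the expected-error expression for $\hat{C}_{4,2}$ in Section V-C under its identity map; `embeddings` is an illustrative name.

```python
from itertools import permutations

# Undirected CNOT map of C-hat_{4,2} on logical qubits q0..q3.
C_HAT_42 = [(0, 1), (0, 2), (1, 2), (1, 3)]

# ASSUMPTION: ibmqx2 coupling graph taken to be the 5-qubit "bow-tie" below.
IBMQX2 = {frozenset(e) for e in [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]}


def embeddings(circuit_edges, arch_edges, n_logical, n_physical):
    """Yield every injective layout (tuple: logical q -> physical qubit) under which
    each circuit edge is an edge of the bidirectional architecture map."""
    for layout in permutations(range(n_physical), n_logical):
        if all(frozenset((layout[a], layout[b])) in arch_edges for a, b in circuit_edges):
            yield layout


layouts = list(embeddings(C_HAT_42, IBMQX2, 4, 5))
print(len(layouts))             # 8
print((3, 2, 4, 0) in layouts)  # True: the map M1 (q0->3, q1->2, q2->4, q3->0)
```

Under this assumed coupling graph, every valid layout sends $q_1$ to physical qubit 2, the only qubit of degree at least three.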
Comparison of Measurement Statistics of $C_{4,2}$ and $\hat{C}_{4,2}$
An ideal quantum computer running a correct $|D^n_w\rangle$ preparation circuit would produce the state
$$\frac{1}{\sqrt{\binom{n}{w}}}\sum_{\mathrm{wt}(i)=w} |i_2\rangle,$$
where $i_2$ denotes the $n$-bit binary representation of $i$ and $\mathrm{wt}(i)$ its Hamming weight. We first verify that the state vectors of both circuits are ideally
$$\frac{1}{\sqrt{6}}\left(|0011\rangle + |0101\rangle + |0110\rangle + |1100\rangle + |1010\rangle + |1001\rangle\right),$$
and then use a primary error measure, based on measurement in the computational basis, to estimate how close the states formed by the two circuits are to the ideal state $|D^4_2\rangle$.
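The ideal state can be written down directly: amplitude $1/\sqrt{\binom{n}{w}}$ on every $n$-bit basis state of Hamming weight $w$. The Python/NumPy sketch below builds this vector (the name `dicke_state` is illustrative) and lists the six basis states that appear for $|D^4_2\rangle$.

```python
from math import comb

import numpy as np


def dicke_state(n, w):
    """State vector of |D^n_w>: amplitude 1/sqrt(C(n, w)) on each weight-w basis state."""
    amp = 1 / np.sqrt(comb(n, w))
    return np.array([amp if bin(i).count("1") == w else 0.0 for i in range(2 ** n)])


psi = dicke_state(4, 2)
print([format(i, "04b") for i in np.flatnonzero(psi)])
# ['0011', '0101', '0110', '1001', '1010', '1100']
print(np.isclose(np.linalg.norm(psi), 1.0))  # True
```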
We run both circuits for the maximum possible number of shots (8192) and use the measurement statistics to estimate closeness to the desired state via the following error measure. We define the error measure $EM_{n,w}$ for the Dicke state $|D^n_w\rangle$ as follows. Let $p_i$ be the fraction of shots in which measuring the circuit $C$ yields the result $i_2$. Then
$$EM_{n,w}(C) = \frac{1}{2}\left(\sum_{j:\,\mathrm{wt}(j)=w}\left|p_j - \frac{1}{\binom{n}{w}}\right| + \sum_{j:\,\mathrm{wt}(j)\neq w} p_j\right),$$
which is the total variation distance between the observed distribution and the ideal one.
An EM value of 0 signifies that the measurement statistics are exactly aligned with the expected ideal results, while the EM value is at most 1. We have calculated the EM values of the results for different mappings of the circuit $C_{4,2}$.
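Because the ideal distribution is uniform over the weight-$w$ strings and zero elsewhere, $EM_{n,w}$ can be computed directly from a dictionary of measured counts. The Python sketch below assumes counts keyed by $n$-bit strings (as returned, for example, by Qiskit, which prints qubit 0 as the rightmost bit). The bit ordering does not affect EM, since reversing a bit string preserves its weight and the ideal distribution is uniform. The function name `em` is illustrative.

```python
from math import comb


def em(counts, n, w):
    """EM_{n,w} from a dict mapping measured n-bit strings to shot counts.

    Equals the total variation distance between the empirical distribution and the
    ideal one, which is uniform (1 / C(n, w)) over the weight-w strings.
    """
    shots = sum(counts.values())
    ideal = 1 / comb(n, w)
    total = 0.0
    for i in range(2 ** n):
        key = format(i, f"0{n}b")
        p = counts.get(key, 0) / shots
        total += abs(p - ideal) if key.count("1") == w else p
    return total / 2


# Worst case: every shot lands outside the weight-2 subspace (EM close to 1).
print(em({"0000": 8192}, 4, 2))
# Half the shots on each of two valid strings (EM close to 2/3).
print(em({"0011": 4096, "0101": 4096}, 4, 2))
```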
Interestingly, the IQX mapping solution produced different transpiled circuits for different user-specified mappings of logical qubits to physical qubits in Q1. Although $C_{4,2}$ contains 22 CNOT gates, its transpiled circuits contained between 25 and 31 CNOT gates. Figure 12 shows the measurement statistics of the transpiled circuit with the minimum EM value, 0.4088; this circuit contains 25 CNOT gates, the fewest among all the transpiled circuits.
Next we look at the measurement statistics of the circuit $\hat{C}_{4,2}$. In this case there are many mappings between logical and physical qubits for which the CNOT constraint of the circuit is met. Let such a map be $M: \{q_0, q_1, q_2, q_3\} \rightarrow \{0, 1, 2, 3, 4\}$; then whenever there is a CNOT between $q_i$ and $q_j$, there is an edge $M(q_i) \leftrightarrow M(q_j)$ in the graph $G^{Q_1}_A$. For such mappings the IQX mapping solution did not modify the transpiled circuit, as expected.
Here we present the result for the following map $M_1$:
$$M_1: q_0 \rightarrow 3,\; q_1 \rightarrow 2,\; q_2 \rightarrow 4,\; q_3 \rightarrow 0.$$
Figure 13 shows the measurement statistics for this mapping; the resulting EM value is 0.282103. These results show that the circuit $C_{4,2}$ needs more than its specified number of CNOT gates when implemented on “ibmqx2”, and that the measurement statistics of $\hat{C}_{4,2}$ are much more closely aligned with the ideal measurement statistics than those of $C_{4,2}$.
C. Modifications leading to different CNOT error distributions
We now discuss how partially defined transformations can be used to further fine-tune the circuit $\hat{C}_{4,2}$ depending on the specifications of the architecture. We consider a four qubit architecture $A_4$ that has the same CNOT connectivity as $G_{\hat{C}_{4,2}}$ and differs only in its CNOT error distribution. We then observe how further modifying the circuit $\hat{C}_{4,2}$ can lower the expected CNOT error for some error distributions of $A_4$. We assume every edge in the CNOT map of $A_4$ is bidirectional, as is the case with all currently publicly available IBM-Q machines.
The CNOT error rate when applying a CNOT between qubits $i$ and $j$ (such that the edge $i \leftrightarrow j$ is present in $G_A$) is denoted by $e_{ij}$. Figure 15 shows the CNOT map of the architecture.
[Circuit diagram: four qubits initialised to |0⟩, X gates on the last two qubits; rotation angles $\pm\theta^2_3/4$, $\pm\theta^2_4/4$, $\pm\theta^1_2/2$, $\pm\theta^1_3/2$ and $\pm\theta^1_4/2$.]
Fig. 8: The circuit $C_{4,2}$ due to [2]
[Circuit diagram: four qubits initialised to |0⟩, X gates on the last two qubits; rotation angles $\pm\theta^2_3/4$, $\theta^2_4$, $\pm(\pi/2 - \theta^1_2/2)$ and $\pm(\pi/2 - \theta^1_3/2)$.]
Fig. 10: The Circuit $\hat{C}_{4,2}$
Fig. 12: Measurement statistics for $C_{4,2}$ with 25 CNOT gates in the transpiled circuit
Fig. 13: Measurement statistics for $\hat{C}_{4,2}$ corresponding to the map $M_1$
CNOT error model: We now define an error model to calculate the expected CNOT error of a circuit implemented on the architecture $A_4$. The probability that a CNOT placed between qubits $i$ and $j$ acts erroneously in a circuit depends on the error rate of the corresponding CNOT coupling in the architecture; we call this the CNOT error. We denote this probability by $f_e(e_{ij})$, where $f_e: [0,1] \rightarrow [0,1]$. We do not assume the exact form of $f_e$, only that it is a strictly increasing function of the error rate, which holds by definition.
Next we define the following Bernoulli random variables to calculate the expected number of CNOT gates acting erroneously when a circuit is applied on this architecture. We define a variable $x_k$ for each CNOT used in a circuit; it takes the value 0 if the $k$-th CNOT is applied correctly while executing the circuit, and 1 otherwise. Suppose the $k$-th CNOT is applied between qubits $i$ and $j$. Then $\Pr(x_k = 1) = f_e(e_{ij})$, and the expected error while applying this CNOT is $E(x_k) = f_e(e_{ij})$. By linearity of expectation, the expected CNOT error when implementing a circuit $C$ on the architecture is
$$E(C) = \sum_k E(x_k) = \sum_{i \leftrightarrow j} v_{ij}\, f_e(e_{ij}),$$
where $v_{ij}$ is the number of CNOT gates applied between qubits $i$ and $j$.
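The model can be evaluated directly, and a short Monte Carlo simulation of the Bernoulli variables $x_k$ illustrates the linearity-of-expectation step. In the Python sketch below, the CNOT counts are those of $\hat{C}_{4,2}$ under its identity map (as in the expression for $E(\hat{C}_{4,2})$ given later in this section), while the error rates and the choice $f_e(e) = e$ are hypothetical. Independence of the $x_k$ is an extra assumption used only by the simulation; the expectation itself does not need it. Function names are illustrative.

```python
import random


def expected_cnot_error(v, e, fe=lambda rate: rate):
    """E(C) = sum over couplings (i, j) of v_ij * f_e(e_ij).

    v:  dict coupling -> number of CNOT gates on that coupling (v_ij)
    e:  dict coupling -> CNOT error rate of that coupling (e_ij)
    fe: increasing map from error rate to error probability (identity assumed here)
    """
    return sum(count * fe(e[c]) for c, count in v.items())


def simulate(v, e, fe=lambda rate: rate, trials=20000, seed=1):
    """Monte Carlo mean of sum_k x_k with independent Bernoulli(f_e(e_ij)) variables x_k."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        for c, count in v.items():
            total += sum(rng.random() < fe(e[c]) for _ in range(count))
    return total / trials


v = {(0, 1): 5, (0, 2): 3, (1, 2): 3, (1, 3): 1}              # CNOT counts v_ij
e = {(0, 1): 0.02, (0, 2): 0.01, (1, 2): 0.015, (1, 3): 0.03}  # hypothetical e_ij
print(expected_cnot_error(v, e))  # about 0.205
print(simulate(v, e))             # close to the exact value
```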
Having described the error model, we look at the CNOT distribution of the circuit $\hat{C}_{4,2}$ as a weighted graph $G_f$. The vertices and edges of this graph are the same as those of $G_{\hat{C}_{4,2}}$, and the weight of an edge $q_i \leftrightarrow q_j$ is the number of CNOT gates applied between the two qubits in the circuit. The graph $G_f$ is shown in Figure 14.
Now we implement the circuit $\hat{C}_{4,2}$ on the architecture $A_4$ so that all CNOT constraints are met. Only $q_1$ has degree 3 in $G_f$, and only the physical qubit 1 has degree 3 in the CNOT map of $A_4$; therefore $q_1$ must be mapped to the physical qubit 1. This gives the following maps, which satisfy all the CNOT constraints.
1) $q_1 \rightarrow 1, q_0 \rightarrow 0, q_2 \rightarrow 2, q_3 \rightarrow 3$.
2) $q_1 \rightarrow 1, q_0 \rightarrow 2, q_2 \rightarrow 0, q_3 \rightarrow 3$.
Under the first map, the expected CNOT error of the circuit $\hat{C}_{4,2}$ on the architecture $A_4$ is
$$E(\hat{C}_{4,2}) = 5f_e(e_{01}) + 3f_e(e_{02}) + 3f_e(e_{12}) + f_e(e_{13}).$$
We now show that the circuit $\hat{C}_{4,2}$ (described in Figure 10) can be further modified using partially defined transformations so that its expected CNOT error on this architecture decreases under some error distribution conditions.
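[Figure: weighted graph on q0, q1, q2, q3 with edge weights 5, 3, 3 and 1.]
Fig. 14: Graph $G_f$ corresponding to $G_{\hat{C}_{4,2}}$.
[Figure: physical qubits 0, 1, 2, 3 with couplings labelled $e_{01}$, $e_{02}$, $e_{12}$, $e_{13}$.]
Fig. 15: CNOT map of the architecture $A_4$.
[Figure: weighted graph on q0, q1, q2, q3 with edge weights 5, 3, 3 and 1.]
Fig. 16: Graph $G'_f$ corresponding to $\hat{C}'_{4,2}$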
The first $R_y$ gate acting on the second qubit of $\hat{C}_{4,2}$ is followed by the gates $\mathrm{CNOT}^2_3$ and $\mathrm{CNOT}^2_4$. The combined transformation $T_4$ of these two CNOT gates is defined for only two basis states on 4 qubits: $|0011\rangle \rightarrow |0011\rangle$ and $|0111\rangle \rightarrow |0100\rangle$.
We use the partial nature of this transformation to modify the circuit as follows. Note that $T_4$ is the first transformation that acts on $q_3$. Therefore, if we start the circuit from the state $|0010\rangle$ instead of $|0011\rangle$, we can define the transformation $T^1_4$ as
$$T^1_4 \equiv \mathrm{CNOT}^3_4 \cdot \mathrm{CNOT}^2_3 \implies T^1_4|0010\rangle = |0011\rangle,\quad T^1_4|0110\rangle = |0100\rangle,$$
where $\mathrm{CNOT}^2_3$ is applied first. This produces the same output states as $T_4$ for all the computational basis states on which $T_4$ is defined. It is important to note that this implementation would not have been possible if the transformation were defined for all 8 basis states of the second, third and fourth qubits.
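The claim can be checked on classical bit strings, since $T_4$ and $T^1_4$ only permute computational basis states. The Python sketch below reads $\mathrm{CNOT}^c_t$ as control $c$, target $t$ (qubits numbered 1 to 4 from the left, as in the kets above); the function names are illustrative.

```python
def cnot(bits, control, target):
    """Apply CNOT^{control}_{target} to a basis state given as a bit string
    (qubits numbered 1..4 from the left)."""
    b = list(bits)
    if b[control - 1] == "1":
        b[target - 1] = "0" if b[target - 1] == "1" else "1"
    return "".join(b)


def t4(bits):
    """T_4: CNOT^2_3 followed by CNOT^2_4."""
    return cnot(cnot(bits, 2, 3), 2, 4)


def t4_modified(bits):
    """T^1_4: CNOT^2_3 followed by CNOT^3_4."""
    return cnot(cnot(bits, 2, 3), 3, 4)


# Same outputs on the only inputs each version actually receives ...
print(t4("0011"), t4_modified("0010"))  # 0011 0011
print(t4("0111"), t4_modified("0110"))  # 0100 0100
# ... although the two are different unitaries, e.g. on |0011>:
print(t4("0011"), t4_modified("0011"))  # 0011 0010
```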
We denote this circuit by $\hat{C}'_{4,2}$; it is drawn in Figure 17. Its weighted CNOT map $G'_f$ is shown in Figure 16.
In $G'_f$ the qubit $q_2$ has degree three, and therefore any map that meets all the CNOT constraints must have $q_2 \rightarrow 1$. This gives the following maps, which satisfy all the CNOT constraints.
1) $q_2 \rightarrow 1, q_0 \rightarrow 0, q_1 \rightarrow 2, q_3 \rightarrow 3$.
2) $q_2 \rightarrow 1, q_0 \rightarrow 2, q_1 \rightarrow 0, q_3 \rightarrow 3$.
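The expected CNOT error for both maps is
$$E(\hat{C}'_{4,2}) = 5f_e(e_{02}) + 3f_e(e_{01}) + 3f_e(e_{12}) + f_e(e_{13}).$$
We now determine when $E(\hat{C}'_{4,2})$ is less than $E(\hat{C}_{4,2})$, with $\hat{C}_{4,2}$ placed by its first map:
$$\begin{aligned} E(\hat{C}'_{4,2}) < E(\hat{C}_{4,2}) &\implies 5f_e(e_{02}) + 3f_e(e_{01}) + 3f_e(e_{12}) + f_e(e_{13}) < 5f_e(e_{01}) + 3f_e(e_{02}) + 3f_e(e_{12}) + f_e(e_{13}) \\ &\implies f_e(e_{02}) < f_e(e_{01}) \\ &\implies e_{02} < e_{01}. \end{aligned}$$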
This gives an insight into how different CNOT distributions in a circuit may lead to better results without reducing the number of CNOT gates or the architectural constraints. We conclude this section by describing the architectural constraints of the circuit $\hat{C}_{n,k}$.
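To make the comparison concrete, the sketch below evaluates the expected CNOT error of both circuits for one hypothetical set of coupling error rates on $A_4$ (with $e_{02} < e_{01}$) and one hypothetical strictly increasing $f_e$, here $f_e(e) = 1 - e^{-5e}$. The weighted maps are read off the expressions for $E(\hat{C}_{4,2})$ and $E(\hat{C}'_{4,2})$ above. For $\hat{C}_{4,2}$ the value depends on which of its two maps is used (the second gives $5f_e(e_{12}) + 3f_e(e_{02}) + 3f_e(e_{01}) + f_e(e_{13})$), so the sketch prints both placements. Names are illustrative.

```python
import math

# Weighted CNOT maps: logical edge (qa, qb) -> number of CNOT gates on it.
G_F = {(0, 1): 5, (0, 2): 3, (1, 2): 3, (1, 3): 1}        # C-hat_{4,2}
G_F_PRIME = {(0, 1): 5, (0, 2): 3, (1, 2): 3, (2, 3): 1}  # C-hat'_{4,2}


def expected_error(weights, layout, rates, fe):
    """Sum of v_ij * f_e(e_ij) after placing logical qubits on A_4 via `layout`."""
    total = 0.0
    for (a, b), v in weights.items():
        coupling = tuple(sorted((layout[a], layout[b])))  # couplings are bidirectional
        total += v * fe(rates[coupling])
    return total


# Hypothetical coupling error rates of A_4 (note e_02 < e_01).
rates = {(0, 1): 0.03, (0, 2): 0.01, (1, 2): 0.02, (1, 3): 0.015}


def fe(rate):
    """Hypothetical strictly increasing error-rate-to-probability map."""
    return 1 - math.exp(-5 * rate)


e_hat = expected_error(G_F, {1: 1, 0: 0, 2: 2, 3: 3}, rates, fe)     # first map
e_hat_2 = expected_error(G_F, {1: 1, 0: 2, 2: 0, 3: 3}, rates, fe)   # second map
e_prime = expected_error(G_F_PRIME, {2: 1, 0: 0, 1: 2, 3: 3}, rates, fe)

print(f"E(C-hat), map 1: {e_hat:.4f}")
print(f"E(C-hat), map 2: {e_hat_2:.4f}")
print(f"E(C-hat'):       {e_prime:.4f}")
print(e_prime < e_hat)  # True, as predicted by e_02 < e_01
```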
D. The CNOT map of $\hat{C}_{n,k}$
The CNOT gates in the circuit $\hat{C}_{n,k}$ come from the implementations of the $\mu$ and $M$ transformations of the different $SCS^n_k$ blocks. $\mu_n$ forms an edge of the form $(n-1) \leftrightarrow n$ in the CNOT map, whereas $M^l_n$ forms the edges $(l-1) \leftrightarrow n$ and $l \rightarrow (l-1)$. However, in the circuit $\hat{C}_{n,k}$ the transformations $M^{n-k+1}_t$ do not have a CNOT between the neighboring qubits $n-k$ and $n-k+1$.
We divide the edges into two groups: one corresponding to CNOT gates between neighboring qubits, and one where the positions of the qubits differ by at least two. We calculate the edges of each type.
• The neighboring qubits with CNOT connections are the qubits $(n-k+1-i)$ and $(n-k-i)$, where $i$ varies from $0$ to $n-k-1$. The other neighboring qubits do not have CNOT connections due to the removal of identity transformations from those qubits. This results in $n-k$ edges.
[Circuit diagram: four qubits initialised to |0⟩, X gate on the third qubit only; rotation angles $\pm\theta^2_3/4$, $\theta^2_4$, $\pm(\pi/2 - \theta^1_2/2)$ and $\pm(\pi/2 - \theta^1_3/2)$.]
Fig. 17: Circuit description of $\hat{C}'_{4,2}$
• Now we consider the second kind of connections. These are formed between the $(l-1)$-th and the $t$-th qubit for any $M^l_t$ transformation. There are $n-k$ blocks $SCS^{n-t}_k$, each originally with $k-1$ $M$ transformations in $C_{n,k}$, which form the edges
$$(n-t) \leftrightarrow (n-t-2-i),\quad 0 \le i \le k-2,\ 0 \le t \le n-k-1.$$
Then there are $k-1$ blocks $SCS^{i+1}_i$, each with $i-1$ $M$ transformations, which form the edges
$$(k-t) \leftrightarrow (k-t-2-i),\quad 0 \le i \le k-t-3,\ 0 \le t \le k-2.$$
However, in $\hat{C}_{n,k}$ there are no $M$ transformations of the type $M^{n-k+1+x}_y$, $x > 0$. Removing the corresponding edges $(n-k+x) \leftrightarrow y$ gives the complete description of the CNOT map of $\hat{C}_{n,k}$, which we denote by $G_{n,k}$. Additionally, in the transformation $M^{n-k+1}_n$ the edge $n \rightarrow (n-k)$ is not present.
Figures 18 and 19 show the CNOT maps $G_{6,2}$ and $G_{6,3}$, respectively.
[Figure: CNOT map on the qubits 1 to 6.]
Fig. 18: The CNOT map $G_{6,2}$
We now count the number of edges in $G_{n,k}$. The number of edges due to $M$ transformations is
$$nk - \frac{k(k+1)}{2} - n + 1 - \frac{(k-1)(k-2)}{2} = nk - n - k^2 + k.$$
There are a further $n-k$ edges due to the $\mu$ transformations, which brings the total number of edges in $G_{n,k}$ to $nk - k^2$. It is important to note that although $G_{n,k}$ and $G_{n,n-k}$ have the same number of edges, they are not isomorphic. Moreover, the graph $G_{n,i}$ is not a subgraph of $G_{n,i+1}$.
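The construction of $G_{n,k}$ can be sketched in Python by generating one edge $(l-1) \leftrightarrow y$ per $M^l_y$ transformation, dropping those with $l \ge n-k+2$, and adding the $n-k$ neighbouring edges. The sketch reads the upper limit of $i$ for the second group of blocks as $k-t-3$, since block $SCS^{k-t}_{k-t-1}$ contains $k-t-2$ $M$ transformations. It checks the count $nk-k^2$, reproduces $G_{4,2}$ (which matches $G_{\hat{C}_{4,2}}$ after relabelling $q_i$ as qubit $i+1$), and shows that $G_{6,2}$ and $G_{6,4}$ have different degree sequences, hence are not isomorphic. Function names are illustrative.

```python
def g_nk(n, k):
    """Undirected edge set of G_{n,k} (qubits numbered 1..n), built from the rules above."""
    edges = set()
    # Neighbouring-qubit edges (n-k+1-i) <-> (n-k-i), 0 <= i <= n-k-1.
    for i in range(n - k):
        edges.add(frozenset((n - k + 1 - i, n - k - i)))
    # Long edges (l-1) <-> y, one per M^l_y transformation of C_{n,k}.
    long_edges = [(n - t, n - t - 2 - i) for t in range(n - k) for i in range(k - 1)]
    long_edges += [(k - t, k - t - 2 - i) for t in range(k - 1) for i in range(k - t - 2)]
    for y, low in long_edges:
        # M^l_y with l = low + 1 >= n-k+2 is absent from C-hat_{n,k}.
        if low + 1 < n - k + 2:
            edges.add(frozenset((y, low)))
    return edges


def degree_sequence(edges):
    """Degrees of all vertices that touch an edge, largest first."""
    deg = {}
    for e in edges:
        for q in e:
            deg[q] = deg.get(q, 0) + 1
    return sorted(deg.values(), reverse=True)


for n in range(3, 15):
    for k in range(2, n):
        assert len(g_nk(n, k)) == n * k - k * k

print(sorted(tuple(sorted(e)) for e in g_nk(4, 2)))  # [(1, 2), (1, 3), (2, 3), (2, 4)]
print(degree_sequence(g_nk(6, 2)), degree_sequence(g_nk(6, 4)))
# [4, 4, 3, 2, 2, 1] [5, 4, 2, 2, 2, 1]
```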
Finally, we observe that the circuit $\hat{C}_{n,k}$ can be modified so that, in certain cases, the number of CNOT gates between particular qubits changes while the total number of CNOT gates and the overall CNOT map do not. We call these different instances different CNOT distributions of $\hat{C}_{n,k}$.
[Figure: CNOT map on the qubits 1 to 6.]
Fig. 19: The CNOT map $G_{6,3}$
Different CNOT distributions for $\hat{C}_{n,k}$
We know from the description of $C_{n,k}$ in [2] that the number of CNOT gates in the three qubit transformation $M$ is reduced from 6 to 5 by cancelling the last CNOT of every transformation, which is done by rearranging the first two CNOT gates of the next transformation. Figure 20 shows the original layout as per the algorithm, and Figure 21 shows the reduction due to [2].
Now consider the last ($k$-th) transformation of each block $SCS^{n-i}_k$, $0 < i < n-k$, namely $M^{n-i-k+1}_{n-i}$. This transformation acts on the qubits $n-i-k$, $n-i-k+1$ and $n-i$. It is in fact the first transformation that affects the qubit $n-i-k$, so this qubit is in the state $|0\rangle$. If we do not cancel the last CNOT of the preceding $M$ transformation ($\mathrm{CNOT}^{n-i-k+1}_{n-i}$), we can instead remove the CNOT gate $\mathrm{CNOT}^{n-i-k}_{n-i}$, which changes the CNOT distribution of the circuit without changing its CNOT map or its number of CNOT gates. This leads to the implementation shown in Figure 22. Since there are $n-k-1$ such transformations, this gives a total of $2^{n-k-1}$ different CNOT distributions. These modifications do not affect the CNOT map, because other CNOT gates are applied between the same qubits, as is evident from the circuit description. As we observed in Section V-C, such different CNOT distributions may lead to different expected numbers of erroneous CNOT gates and thus affect the overall error induced in the circuit.
[Circuit fragment on the qubits n−k−i, n−k−i+1 and n−i.]
Fig. 20: Initial Implementation
[Circuit fragment on the qubits n−k−i, n−k−i+1 and n−i.]
Fig. 21: Modification in $C_{n,k}$
[Circuit fragment on the qubits n−k−i, n−k−i+1 and n−i.]
Fig. 22: Alternate Implementation
VI. CONCLUSION
In this paper we have explored optimal circuit implementation in terms of CNOT and single qubit gates. In this regard, we have concisely realized partially defined unitary transformations to improve the gate count of the most efficient known deterministic Dicke state ($|D^n_k\rangle$) preparation circuit ($C_{n,k}$). We have improved the implementation of one such transformation and have also proven the optimality of our implementation. We have further improved the Dicke state preparation circuit by removing redundant gates and by modifying the implementations of certain partially defined unitary transformations depending on the active basis states that act as input to these transformations. Using the case of $|D^4_2\rangle$, we have then shown that these improvements not only reduce the number of CNOT and single qubit gates but also reduce the architectural constraints of the circuit. To the best of our knowledge, the resulting circuit is the deterministic Dicke state ($|D^n_k\rangle$, $2 \le k \le n-1$) preparation circuit with the fewest elementary gates. We have implemented the circuit $C_{4,2}$ and the improved circuit $\hat{C}_{4,2}$ on the IBM-Q machine “ibmqx2” and observed that the deviation from the ideal measurement statistics is significantly smaller for $\hat{C}_{4,2}$. Furthermore, by comparing the expected CNOT error of two CNOT distributions under a fairly general error model, we have shown how different CNOT distributions can help a circuit without changing its number of gates or its architectural constraints. We have concluded by describing the CNOT map of the circuit $\hat{C}_{n,k}$ and observing the exponential number of different CNOT distributions that can be derived by modifying the circuit, completing our generalization.
We observed that even the circuits for $|D^5_2\rangle$ could not be implemented on the IBM back-end machines without adding further CNOT gates to our description, because of the incompatibility between the architecture and circuit CNOT maps. It is therefore all the more important to construct the circuit for an algorithm as concisely as possible. Against this backdrop, we have shown how optimally realizing partially defined unitary transformations can lead to better implementation results. In conclusion, we note the following optimization problems, whose solutions would help implement algorithms more efficiently in the current scenario.
1) Given a maximally partial unitary transformation, what is the corresponding unitary matrix that can be decomposed using the fewest elementary gates?
2) Given two circuits corresponding to an algorithm with isomorphic CNOT maps and the same number of CNOT gates, but different CNOT distributions across the qubits, which circuit will produce the less erroneous outcome?
REFERENCES
[1] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus, P. W. Shor, T. Sleator, J. A. Smolin, and H. Weinfurter. 1995. Elementary gates for quantum computation. Phys. Rev. A, 52:3457-3467.
[2] A. Bärtschi and S. Eidenbenz. 2019. Deterministic Preparation of Dicke States. Fundamentals of Computation Theory, 126-139.
[3] K. Chakraborty, B. Choi, A. Maitra and S. Maitra. 2014. Efficient quantum algorithms to construct arbitrary Dicke states. Quantum Inf. Process. 13, 2049-2069.
[4] A. M. Childs, E. Farhi, J. Goldstone, and S. Gutmann. 2002. Finding cliques by quantum adiabatic evolution. Quantum Information & Computation, 2(3):181-191, Apr 2002.
[5] D. Cruz, R. Fournier, F. Gremion, A. Jeannerot, K. Komagata, T. Tosic, J. Thiesbrummel, C. L. Chan, N. Macris, M.-A. Dupertuis and C. Javerzac-Galy. 2019. Efficient Quantum Algorithms for GHZ and W States, and Implementation on the IBM Quantum Computer. Adv. Quantum Technol., 2: 1900015.
[6] G. Song and A. Klappenecker. 2003. Optimal realizations of controlled unitary gates. Quantum Info. Comput. 3, 2 (March 2003), 139-156.
[7] B. Langenberg, H. Pham and R. Steinwandt. 2020. Reducing the Cost of Implementing the Advanced Encryption Standard as a Quantum Circuit. IEEE Transactions on Quantum Engineering, vol. 1, pp. 1-12, 2020, 2500112.
[8] M. Mosca and P. Kaye. 2001. Quantum Networks for Generating Arbitrary Quantum States. Optical Fiber Communication Conference and International Conference on Quantum Information ICQI, page PB28, Jun 2001.
[9] M. Möttönen, J. J. Vartiainen, V. Bergholm, and M. M. Salomaa. 2004. Quantum Circuits for General Multiqubit Gates. Phys. Rev. Lett. 93, 130502.
[10] A. Zulehner, A. Paler, and R. Wille. 2019. An Efficient Methodology for Mapping Quantum Circuits to the IBM QX Architectures. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 38, 1226-1236.
[11] IBM Q Experience Website, https://quantum-computing.ibm.com
