Data-stationary Architecture to Execute Quantum Algorithms Classically by Burger, J. R.
Burger Quantum Emulator Architecture 
 
 
DATA-STATIONARY ARCHITECTURE TO EXECUTE 
QUANTUM ALGORITHMS CLASSICALLY 
J. R. Burger 
Department of ECE 
California State University Northridge 
December 9, 2004 
 
Abstract – This paper presents a data-stationary architecture in which each word has an attached address field.  Address fields are updated massively in parallel to record data interchanges.  Words do not move until memory is read for post-processing.  A sea of such cells can be used to test large-scale quantum algorithms, although other kinds of programming are possible.
1. INTRODUCTION 
   Quantum algorithms outperform classical algorithms for certain applications, for example, function identification [1, 2].  Quantum algorithms generally have three parts: pre-processing, logic implementation, and post-processing.  Pre-processing usually forms a non-sparse state vector whose entries are to be processed in parallel.  Logic can be interpreted to mean data interchanges within the state vector, as specified by the steps in a ‘wiring’ diagram [3, 4].  Output processing can use various (real-time) digital filtering methods, including the Hadamard or Fourier transform.
 
   A quantum algorithm takes a state vector through a sequence of unitary transformations.  Circuits are greatly simplified by exploiting the fact that a classical system need not be unitary.  That is, a state vector need not be normalized in a classical calculation to achieve a desired result.  Another major simplification results from avoiding complex numbers within the state vector: simply initialize to real integers and perform only real transformations prior to post-processing.
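To illustrate why normalization can be deferred, here is a minimal Python sketch. It assumes the integer (unnormalized) Walsh-Hadamard transform as the real transformation, and `fwht` is an illustrative name that does not come from the paper. Applying the transform twice returns the input scaled by $2^n$ for n lines. Intermediate values therefore remain integers, and the common factor can be removed once, at the end.

```python
def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform of a list of integers.

    The length must be a power of two.  Only additions and subtractions are
    used, so integer inputs give integer outputs (no 1/sqrt(2) factors).
    """
    v = list(v)
    n = len(v)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    return v


state = [3, 0, -1, 2, 0, 0, 5, 1]   # arbitrary real integer state on 3 lines
twice = fwht(fwht(state))
# Two applications scale every entry by 2**3 = 8; dividing once at the end
# recovers the original vector, so normalization can be postponed.
assert twice == [8 * x for x in state]
```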
 
   To create a system as large as possible in a given technology, it was decided to pre-process the state vector prior to loading it into core memory, which represents the state vector.  After the programmed interchanges within the core, post-processing is performed outside the core, where it is easier to do.  So core memory does nothing but controlled real-integer interchanges as specified by an algorithm, as discussed in Section 2 below.  Section 3 introduces a concept for practical implementation, and Section 4 contains a basic example algorithm.  A block diagram is given in Figure 1.
[Figure 1 block diagram: User Inputs → Pre Processing → Core Memory → Post Processing → Results Display]
Figure 1.  Quantum-inspired Architecture
 
 
 
 
 
 
 2.   CORE MEMORY DESIGN
   To show what is needed, an example ‘wiring’ diagram of a certain quantum algorithm appears in Figure 2.

[Figure 2: wiring diagram on lines A3, A2, A1, A0 with steps (1), (2), (3)]
Figure 2.   Wiring Diagram Example

   Unconditional NOTs, single controlled NOTs, or double controlled NOTs (steps 1, 2 and 3 of Figure 2) can be analyzed as a sequence of interchanges of data within the state vector, as in Table 1.

Table 1  Addresses of Interchanges.  Example: 0011 means M[0011]; UNC = Unconditional; SCN = Single Controlled NOT; DCN = Double Controlled NOT

Orig. Address | UNC NOT | SCN Changes Only | SCN Changes Only | DCN Changes Only
A3A2A1A0      | TO=A0   | FM=A2, TO=A0     | FM=A2, TO=A3     | FM=A2, FM=A1, TO=A0
0000          | 0001    |                  |                  |
0001          | 0000    |                  |                  |
0010          | 0011    |                  |                  |
0011          | 0010    |                  |                  |
0100          | 0101    | 0101             | 1100             |
0101          | 0100    | 0100             | 1101             |
0110          | 0111    | 0111             | 1110             | 0111
0111          | 0110    | 0110             | 1111             | 0110
1000          | 1001    |                  |                  |
1001          | 1000    |                  |                  |
1010          | 1011    |                  |                  |
1011          | 1010    |                  |                  |
1100          | 1101    | 1101             | 0100             |
1101          | 1100    | 1100             | 0101             |
1110          | 1111    | 1111             | 0110             | 1111
1111          | 1110    | 1110             | 0111             | 1110
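When $A_k$ is a target, the addresses of the swapped pairs are spaced by $2^k$.  For example, with TO=A3, address 0100 is exchanged with 1100, a spacing of $2^3 = 8$.  This also holds for double controlled NOTs or any transformation ‘aimed’ at $A_k$.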
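The address interchanges in Table 1 can be regenerated with a short Python sketch. As in the table, it assumes that each column applies one gate to the original address. The helper `gate_map` and its `target`/`controls` parameters are hypothetical names. The sketch also checks the spacing rule: when $A_k$ is the target, every swapped pair of addresses differs by exactly $2^k$.

```python
def gate_map(address, target, controls=()):
    """Return the new 4-bit address after a (controlled) NOT.

    target   -- index k of the line A_k that is flipped (TO)
    controls -- indices of lines that must all be 1 (FM); () means UNC
    """
    if all((address >> c) & 1 for c in controls):
        return address ^ (1 << target)
    return address


# Columns of Table 1: UNC TO=A0, SCN FM=A2 TO=A0, SCN FM=A2 TO=A3,
# DCN FM=A2, FM=A1 TO=A0.
columns = {
    "UNC TO=A0": (0, ()),
    "SCN FM=A2,TO=A0": (0, (2,)),
    "SCN FM=A2,TO=A3": (3, (2,)),
    "DCN FM=A2,FM=A1,TO=A0": (0, (2, 1)),
}

for name, (target, controls) in columns.items():
    # Keep only the addresses that change ("Changes Only").
    changed = {a: gate_map(a, target, controls) for a in range(16)
               if gate_map(a, target, controls) != a}
    # Every interchange is a swap of two addresses spaced by 2**target.
    assert all(abs(a - b) == 2 ** target for a, b in changed.items())
    assert all(changed[b] == a for a, b in changed.items())
    print(name, ", ".join(f"{a:04b}->{b:04b}" for a, b in changed.items()))
```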
 
 
 
 
 
 
 
3. PRACTICAL CONSIDERATIONS
   Data need not physically move.  Instead, address fields, attached to each datum in the state vector, can be updated.  During output processing it is a simple matter to sort the data into an ordered form, if desired.  Each word may be structured as in Figure 3.  Data needs only M = 2 bits (+1, -1 or 0) after the common normalization factor is taken out.  The address field is given only N = 32 bits as an example, although technology supports much larger N.  The circuit plan is given in Figure 4.

[Figure 3: one word = N address bits followed by M data bits]
Figure 3.  Word Structure

[Figure 4: L word rows, from A00 … A0n, D00 … D0m through AL0 … ALn, DL0 … DLm; an instruction decoder on the left drives TO, FM1 and FM2 buses for each address bit; pass transistors (T) at the bus crossings; multiplexed I/O ports]
Figure 4.  Method of Address Parallel Processing L Words (L = $2^N$)

   The vertical buses can be connected to the horizontal buses at the dots using pass transistors.  These fire according to the ID (instruction decoder) on the left.  For example, consider word 0 address bit 0, that is, A00.  If TO(A00) = 1 while FM1(A01) = 1 and also FM2(A02) = 1, then the bit A00 will be flipped (this implements DCN, Step 3 in Figure 2).  This process happens simultaneously for all words, $2^N$ in number, depending on the FM1 and FM2 bits in each individual word.  SCN is easily implemented by having FM2 read a hardwired TRUE, leaving FM1 to read the controlling bit of interest.
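A behavioral model of this data-stationary core can be sketched in Python. The sketch assumes a software loop in place of the parallel hardware of Figure 4. Each word is an (address, data) pair, and the data never moves. A gate flips the TO bit of every address whose FM1 and FM2 bits read 1. Following the text, SCN is modeled by letting FM2 read a hardwired TRUE (`fm2=None` here). `Word`, `apply_gate` and `read_out` are hypothetical names. The usage lines load the state vector of the example in Section 4 and apply two SCNs.

```python
from dataclasses import dataclass


@dataclass
class Word:
    address: int   # N-bit address field, updated in place
    data: int      # M-bit signed datum (+1, -1 or 0); never moves


def apply_gate(words, to, fm1=None, fm2=None):
    """Flip address bit `to` in every word whose FM1 and FM2 bits read 1.

    fm1/fm2 are bit indices; None models a hardwired TRUE, so
    fm1=fm2=None is UNC, fm2=None is SCN, and both given is DCN.
    In hardware all words update at once; here we simply loop.
    """
    for w in words:
        c1 = True if fm1 is None else bool((w.address >> fm1) & 1)
        c2 = True if fm2 is None else bool((w.address >> fm2) & 1)
        if c1 and c2:
            w.address ^= 1 << to


def read_out(words):
    """Post-processing: sort words by address to get the ordered state vector.

    Gates only permute addresses, so the sorted addresses are 0 .. L-1.
    """
    return [w.data for w in sorted(words, key=lambda w: w.address)]


# Three lines, state vector (1 -1 1 -1, 1 -1 1 -1) loaded with address = index.
core = [Word(i, (-1) ** (i & 1)) for i in range(8)]
apply_gate(core, to=0, fm1=1)          # SCN: FM=A1, TO=A0
apply_gate(core, to=0, fm1=2)          # SCN: FM=A2, TO=A0
print(read_out(core))                  # -> [1, -1, -1, 1, -1, 1, 1, -1]
```

Sorting at read-out is the only step in which data positions change. This matches the statement that words do not move until memory is read for post-processing.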
   Size and speed of the processing depend on technology whose performance has been 
increasing exponentially for many decades now.  The following gives a snapshot of 
today’s established technology to see what is possible [5]. 
 
 
 
 
 
Size -- Each word in the above plan takes 2 transistors for data and 32 cells for addressing, each of which takes 4 transistors; the bus system for each word uses 3 x 32, or 96, transistors; the total is 226 transistors per word.

   To estimate the number of words, assume a 20 cm radius wafer, and assume that a transistor requires 0.01 $\mu\mathrm{m}^2$ (1 $\mu\mathrm{m} = 10^{-6}$ m).  The wafer area is $\pi (0.2\,\mathrm{m})^2 \approx 1.26 \times 10^{11}\,\mu\mathrm{m}^2$, so about $12{,}500 \times 10^9$ transistors are available in a single wafer.  The resulting number of words is over 51 GW (giga words, 1G = $2^{30}$).  This corresponds to more than 35 bits of address space, leaving room for decoders at the boundary.
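The size estimate can be checked with a few lines of Python. As one reading of Figure 4, this sketch assumes that the 3 x 32 bus transistors are one each for the TO, FM1 and FM2 lines of every address bit. It computes the wafer area exactly instead of using the rounded figure of $12{,}500 \times 10^9$ transistors. For that reason it reports slightly more than 51 GW.

```python
import math

# Per-word transistor count from the Size paragraph (N = 32 address bits).
N = 32
data_transistors = 2
address_transistors = N * 4        # 32 cells x 4 transistors each
bus_transistors = 3 * N            # assumed: TO, FM1, FM2 per address bit
per_word = data_transistors + address_transistors + bus_transistors
print(per_word)                    # -> 226

# Wafer capacity: 20 cm radius, 0.01 square micrometres per transistor.
radius_um = 20e4                   # 20 cm = 2e5 micrometres
area_um2 = math.pi * radius_um ** 2
transistors = area_um2 / 0.01      # about 1.26e13, i.e. roughly 12,500 x 10^9
words = transistors / per_word
giga_words = words / 2 ** 30       # 1G = 2^30
address_bits = math.log2(words)
print(f"{transistors:.3e} transistors, {giga_words:.1f} GW, "
      f"{address_bits:.1f} address bits")
# -> 1.257e+13 transistors, 51.8 GW, 35.7 address bits
```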
Speed -- Signal distance is assumed to be 40 cm, so the average delay is roughly 320 ns before all buses activate.  Because of the parallel processing, any quantum gate for any number of lines in the above architecture can execute in a similar amount of time.  Post-processing time is not included.
 4. EXAMPLE ALGORITHM
   Three lines can be used to identify a binary function f(x) of a two-bit input x (n = 2).  Figure 5 illustrates a certain function.  The three lines A2, A1, A0 translate to eight states in a state vector called y.  Lines A2 and A1 correspond to 4 binary-like codes for the domain of the function.  The A0 line ‘records’ the value of the function.  It is possible to answer the following questions about a binary function with only one call to the function.
1) Is the function constant?  Constant means it gives either all ones or all zeros as its input counts from 0 to $2^n - 1$.
2) Is the function symmetric (or anti-symmetric)?  Symmetric means the existence of symmetry in the truth table about the center point.  For example, 0110|0110 is symmetric; 0110|1001 is anti-symmetric.
3) Is the function balanced?  Balanced means equal numbers of ones and zeros as the input counts through its full range 0 to $2^n - 1$.
A classical computer might require up to $2^n$ calls to a function to answer such questions, so the amount of work grows exponentially.  In contrast, a quantum computer (or the above architecture) requires only one call to the function.  The following example shows how the algorithm works.
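For comparison with the one-call approach, here is a naive classical test in Python. It assumes f is given as a black-box callable on the integers 0 to $2^n - 1$. `CountingOracle` and `classify` are hypothetical helpers. The test answers the constant and balanced questions by evaluating the whole truth table, so it always makes $2^n$ calls, the upper bound mentioned above. Cleverer strategies can stop early for some functions.

```python
class CountingOracle:
    """Wraps a Boolean function f and counts how many times it is called."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.f(x)


def classify(oracle, n):
    """Classically test 'constant' and 'balanced' by calling f on every input.

    Returns (is_constant, is_balanced); the number of calls is 2**n.
    """
    values = [oracle(x) for x in range(2 ** n)]
    ones = sum(values)
    return ones in (0, 2 ** n), 2 * ones == 2 ** n


xor = CountingOracle(lambda x: (x >> 1) ^ (x & 1))   # truth table 01,10
print(classify(xor, 2), xor.calls)                    # -> (False, True) 4
```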
 
Pre Processing -- Assume initialization to (0 1 0 0, 0 0 0 0), that is, a state vector whose shorthand notation is $|001\rangle$.  After the Hadamard transform, the state vector, minus the normalization, reads (1 -1 1 -1, 1 -1 1 -1).
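The pre-processing step can be checked directly. This Python sketch computes the unnormalized Hadamard transform from its definition, out[k] = Σ_j (-1)^(j·k) v[j], where j·k is the parity of the bitwise AND of j and k. The direct O(4^n) sum is chosen for clarity and is only practical for a few lines. `hadamard` is an illustrative name.

```python
def hadamard(v):
    """Unnormalized n-line Hadamard transform: out[k] = sum_j (-1)^(j.k) v[j].

    j.k is the bitwise dot product, i.e. the parity of popcount(j & k).
    """
    return [sum(x if bin(j & k).count("1") % 2 == 0 else -x
                for j, x in enumerate(v))
            for k in range(len(v))]


psi = [0, 1, 0, 0, 0, 0, 0, 0]     # |001>: a single 1 at address 001
print(hadamard(psi))               # -> [1, -1, 1, -1, 1, -1, 1, -1]
```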
    
Core Processing -- A certain quantum function involving lines A1 and A2 is implemented, as shown in Figure 5.

[Figure 5: wiring diagram on lines A2, A1, A0]
Figure 5.  Example Quantum Algorithm
 
 
 
 
 
 
 
 
 
The first SCN transforms the state vector to (1 -1 -1 1, 1 -1 -1 1), as can be seen when the addresses are re-ordered as in Table 2.  The second SCN further transforms the state vector to (1 -1 -1 1, -1 1 1 -1), as seen when the addresses are placed in order (right side).
 
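Table 2  XOR Example

Orig. Adr | y  | After 1st SCN: | Re-ordered | After 2nd SCN: | Re-ordered
A2A1A0    |    | Adr / Data     | y          | Adr / Data     | y
000       |  1 | 000 /  1       |  1         | 000 /  1       |  1
001       | -1 | 001 / -1       | -1         | 001 / -1       | -1
010       |  1 | 011 /  1       | -1         | 011 /  1       | -1
011       | -1 | 010 / -1       |  1         | 010 / -1       |  1
100       |  1 | 100 /  1       |  1         | 101 /  1       | -1
101       | -1 | 101 / -1       | -1         | 100 / -1       |  1
110       |  1 | 111 /  1       | -1         | 110 /  1       |  1
111       | -1 | 110 / -1       |  1         | 111 / -1       | -1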
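Table 2 can be regenerated by applying the two SCNs to the address fields only. This Python sketch assumes that the first SCN has A1 as its control and the second has A2 as its control, both targeting A0. That assignment is read off from the address columns of Table 2. `scn` and `reorder` are hypothetical helpers. Each printed row gives the original address and datum, followed by the address, datum and re-ordered y after each SCN.

```python
def scn(addresses, control, target):
    """Single controlled NOT on the address fields only (data stays put)."""
    return [a ^ (1 << target) if (a >> control) & 1 else a for a in addresses]


def reorder(addresses, data):
    """State vector y read in address order: y[addr] = datum stored with addr."""
    y = [0] * len(data)
    for a, d in zip(addresses, data):
        y[a] = d
    return y


data = [1, -1, 1, -1, 1, -1, 1, -1]      # y after the Hadamard pre-processing
adr0 = list(range(8))                     # original addresses A2A1A0
adr1 = scn(adr0, control=1, target=0)     # first SCN (A1 controls A0)
adr2 = scn(adr1, control=2, target=0)     # second SCN (A2 controls A0)
y1, y2 = reorder(adr1, data), reorder(adr2, data)

for i in range(8):
    print(f"{i:03b} {data[i]:2d} | {adr1[i]:03b} {data[i]:2d} {y1[i]:2d} | "
          f"{adr2[i]:03b} {data[i]:2d} {y2[i]:2d}")
```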
 
Post Processing -- Output processing may decode the obvious anti-symmetry in the state vector, or one may apply Hadamard transforms to show a basis vector $8|111\rangle$, that is, a state vector (0 0 0 0, 0 0 0 8).  The 8 implies the state vector is not normalized.

1) It can be reasoned that if the measured output is $\pm 8|001\rangle$ then the function is indeed constant (zero for all or one for all input values; the sign tells which).  Another function, for example the AND function, may have components in $|001\rangle$, but the coefficient will not be 8.  A weakness of a physical quantum computer is that in one calculation it is not certain that the output vector is always $|001\rangle$.  That is, it does not calculate the coefficient 8 as a classical calculation does.  Consequently, many runs might be required to provide sufficient confidence in the result.

2) It can be reasoned that if the output is a basis vector other than $8|001\rangle$ then the function is either symmetric or anti-symmetric (refer to the above definitions).  In this example the result is $8|111\rangle$.  The function truth table (implied by the wiring diagram) is 01,10; obviously it is symmetric about its center (the comma).

3) Balanced means equal numbers of ones and zeros as the input counts through its full range 0 to $2^n - 1$.  Note that the function 01,10 is balanced.  Many functions can be balanced, yet neither symmetric nor anti-symmetric.  For example, 0111,0001 is balanced, but is NEITHER symmetric nor anti-symmetric.  A balanced function is distinguished by the fact that it is guaranteed to not have any component in the state $|001\rangle$ (refer to the Deutsch-Jozsa algorithm [1]).  Because all $2^N$ words are updated in parallel, the above system computes in polynomial time what otherwise requires exponential time, as predicted for special cases [6].
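The post-processing conclusions can be checked end to end with a Python sketch. It assumes that an arbitrary two-bit function f(A2, A1), given as a 4-entry truth table, is applied by flipping A0 in every word whose A2A1 bits give f = 1. This generalizes the two SCNs of the XOR example and is an assumption of the sketch, not a circuit from the paper. `hadamard` and `run` are illustrative names.

```python
def hadamard(v):
    """Unnormalized Hadamard on all lines: out[k] = sum_j (-1)^popcount(j & k) v[j]."""
    return [sum(x if bin(j & k).count("1") % 2 == 0 else -x
                for j, x in enumerate(v))
            for k in range(len(v))]


def run(truth_table):
    """Emulate the example for f(A2, A1) given as a 4-entry truth table.

    Pre-processing: Hadamard of |001>.  Core (assumed generalization): flip
    address bit A0 of every word whose A2A1 bits give f = 1; the data stay
    put.  Read-out: place data by address, then apply the Hadamard again.
    """
    data = hadamard([0, 1, 0, 0, 0, 0, 0, 0])
    addresses = [a ^ 1 if truth_table[a >> 1] else a for a in range(8)]
    y = [0] * 8
    for a, d in zip(addresses, data):
        y[a] = d
    return hadamard(y)


print(run([0, 1, 1, 0]))   # XOR 01,10   -> [0, 0, 0, 0, 0, 0, 0, 8]
print(run([0, 0, 0, 0]))   # all zeros   -> [0, 8, 0, 0, 0, 0, 0, 0]
print(run([1, 1, 1, 1]))   # all ones    -> [0, -8, 0, 0, 0, 0, 0, 0]
print(run([0, 0, 0, 1]))   # AND         -> [0, 4, 0, 4, 0, 4, 0, -4]
```

Under these assumptions, XOR gives $8|111\rangle$ and the constant functions give $\pm 8|001\rangle$. AND has a component of 4, not 8, in $|001\rangle$, and a balanced function such as 01,01 has no component there at all.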
5. CONCLUSIONS
   The above architecture is mainly a scientific curiosity.  Its main purpose is to test larger quantum algorithms using existing technology.  Research into quantum algorithms is ongoing [7, 8].
   The design presented above is characterized by a very large number of very small registers, each holding a real signed integer.  Each has attached to it a relatively large address field.  All addresses can be modified in parallel, conditioned on a logical combination of bits in the address.  Once an algorithm finishes, solutions to a problem are established by reading core memory and by processing to discover symmetries of interest.
   This concept was inspired by plans for a quantum computer.  Although its applications are currently limited, it could prove useful someday.
 
REFERENCES 
[1]  M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information, Cambridge, 2000.
[2]  J. R. Burger, ‘Novel Quantum Computer Emulator Chip,’ Proc. of the International Conference on VLSI, VLSI’04, June 21-24, 2004, Las Vegas, Nevada, pp. 309-315.
[3]  Kazuo Iwama, Yahiko Kambayashi, and Shigeru Yamashita, ‘Transformation rules for designing CNOT-based quantum circuits,’ DAC 2002, June 10-14, 2002, New Orleans, Louisiana, USA.
[4]  Vivek Shende, Aditya Prasad, Igor Markov, and John Hayes, ‘Reversible logic circuit synthesis,’ DAC 2002, June 10-14, 2002, New Orleans, Louisiana, USA.
[5]  D. A. Hodges, H. G. Jackson, and R. A. Saleh, Analysis and Design of Digital Integrated Circuits in Deep Submicron Technology, McGraw-Hill, 2004, Ch. 10.
[6]  Leslie Valiant, ‘Quantum computers that can be simulated classically in polynomial time,’ STOC 2001, July 6-8, Hersonissos, Crete, Greece.
[7]  Christof Zalka, ‘Fast Versions of Shor’s Quantum Factoring Approach,’ http://arXiv.org/abs/quant-ph/9806084, 24 Jun 1998.
[8]  Christof Zalka, ‘Could Grover’s quantum algorithm help in searching an actual database,’ http://arXiv.org/abs/quant-ph/9901068, 26 Jun 1999.
 
 
 
