The Pauli frame mechanism allows Pauli gates to be tracked in classical electronics and can relax the timing constraints for error syndrome measurement and error decoding. When building a quantum computer, such a mechanism may be beneficial, and the goal of this paper is not only to study the working principles of a Pauli frame but also to quantify its potential effect on the logical error rate. To this purpose, we implemented and simulated a Pauli frame module which, in principle, can be directly mapped into a hardware implementation. Simulation of a Surface Code 17 logical qubit has shown that a Pauli frame can reduce the error rate of a logical qubit by up to 70% compared to the same logical qubit without a Pauli frame when the decoding time equals the error correction time, which is the point at which maximum parallelism can be obtained.
INTRODUCTION
Quantum computing is an emerging technology that promises to solve problems which are intractable by classical computers.
Quantum computers exploit quantum phenomena for computational purposes using qubits. Various implementations of qubits and small quantum systems already exist, and they share one property: qubit states are fragile.
Qubits interact with the environment, and information stored in the qubits tends to get corrupted, which is known as decoherence. As a result, qubits cannot reliably store information for a long time, and quantum operations are error prone.

DAC '17, Austin, TX, USA. © 2017 ACM. 978-1-4503-4927-7/17/06. DOI: http://dx.doi.org/10.1145/3061639.3062300
To enable quantum computing using quantum systems with high error rates, Quantum Error Correction (QEC) was introduced [15]. QEC allows quantum states to be encoded in logical qubits and errors to be detected based on error syndromes that are obtained by executing Error Syndrome Measurement (ESM) circuits. The error syndromes are decoded using classical algorithms that identify the most likely errors in the system. By using QEC, we can satisfy the demand of quantum algorithms for qubits with low error rates. Besides its benefits, QEC introduces overhead, which creates new challenges. ESM and decoding should be performed in as short a time as possible to reduce the overhead of QEC. The requirement of fast error decoding places high demands on classical algorithms and computational devices.
The concept of Pauli frames was proposed in [12] to loosen the timing constraints on ESM and decoding. A Pauli frame allows detected errors to be tracked in classical electronics, making it unnecessary to apply corrections on the qubits. When using a Pauli frame, fewer gates need to be applied, and ESM and decoding can be executed in parallel instead of sequentially. Hence, the timing constraints on ESM and decoding are relaxed, making it easier to implement fully functional QEC. As the overhead of QEC is reduced, the error rate of logical qubits can, in principle, also be reduced.
The contributions of this paper are as follows: (i) we implement and simulate the working principles of a Pauli frame for a Surface Code 17 (SC17) logical qubit, and (ii) we quantify under what conditions the Pauli Frame Unit (PFU) reduces the logical error rate, by up to 70%.
Our paper is organized as follows: Section 2 provides background and introduces the relevant quantum concepts used throughout this paper. Section 3 presents the working principles and applications of Pauli frames. Section 4 introduces the heterogeneous Quantum Computer Architecture (QCA) as proposed by [6]. Our simulation software and setup are explained in Section 5, while the simulation results are presented in Section 6. We conclude the paper in Section 7.
BACKGROUND
While classical bits can only be in a 0 or 1 state at any point in time, qubits can be in a superposition of both.
Qubits can be in a linear combination of the two basis states, $|0\rangle$ and $|1\rangle$, and are therefore represented as $|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where $\alpha, \beta \in \mathbb{C}$ are complex probability amplitudes. The probabilities within a system sum to 1, therefore $|\alpha|^2 + |\beta|^2 = 1$. By defining $|0\rangle = (1\ 0)^T$ and $|1\rangle = (0\ 1)^T$ as the computational basis, we can represent a qubit as a vector $|\psi\rangle = (\alpha\ \beta)^T$. When a qubit is measured in the computational basis, it is projected onto the $|0\rangle$ or $|1\rangle$ state with probability $|\alpha|^2$ or $|\beta|^2$, respectively. The second feature of qubits that extends the capabilities of classical bits is entanglement.
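As a small illustration of the state-vector notation above, the sketch below stores a single qubit as the amplitude pair $(\alpha, \beta)$, checks the normalization condition, and samples a computational-basis measurement. The function names are illustrative only and are not part of any tool used in this paper.

```python
import math
import random


def measurement_probabilities(alpha: complex, beta: complex) -> tuple:
    """Return (P(0), P(1)) for |psi> = alpha|0> + beta|1> measured in the computational basis."""
    p0, p1 = abs(alpha) ** 2, abs(beta) ** 2
    if not math.isclose(p0 + p1, 1.0, abs_tol=1e-9):
        raise ValueError("state is not normalized: |alpha|^2 + |beta|^2 != 1")
    return p0, p1


def measure(alpha: complex, beta: complex, rng: random.Random) -> int:
    """Sample one measurement outcome; the qubit is projected onto |0> or |1>."""
    p0, _ = measurement_probabilities(alpha, beta)
    return 0 if rng.random() < p0 else 1


# (|0> + i|1>)/sqrt(2): both outcomes occur with probability 1/2.
alpha, beta = 1 / math.sqrt(2), 1j / math.sqrt(2)
print(measurement_probabilities(alpha, beta))
```

Only the magnitudes of the amplitudes matter for the outcome statistics; the relative phase $i$ in this example has no effect on a computational-basis measurement.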
Qubits can be entangled with each other, which means that the superposition state of the entangled qubits cannot be represented as a tensor product of individual qubit states.
To manipulate a qubit state, we use quantum gates which can be expressed as unitary matrices.
Quantum gates are represented as $2^n \times 2^n$ unitary matrices, where $n$ equals the number of qubits the gate acts on. A few common single-qubit gates with their corresponding matrices are shown in Equation (1). Examples of common two-qubit gates are the CNOT and the CZ gate.
Quantum gates, together with initialization and measurement operations, can be combined into a quantum circuit to perform quantum computations.
We consider three groups of quantum gates [19]: Pauli gates, Clifford gates, and non-Clifford gates. The Pauli gates are a basic group of single-qubit gates that includes gates such as the X and Z gate.
The Clifford group is finite (up to global phases) and is defined as the normalizer of the Pauli group, which means that for every Clifford gate $C$ and Pauli gate $P$ there exists a Pauli gate $P'$ such that $CP = P'C$. Examples of Clifford gates are the H and CNOT gates. All quantum gates that are not in the Clifford group, such as the T and T† gates, are known as non-Clifford gates. Both Clifford and non-Clifford gates are required for universal quantum computing, as explained in [14].
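The normalizer property can be checked numerically. The sketch below, written with plain nested lists so that it has no dependencies, computes $P' = C P C^\dagger$ for every Pauli operator $P$ (up to phase) and reports whether $P'$ is again a Pauli operator. The gate set (H, CNOT, T) and all helper names are illustrative choices, not part of the paper's tooling; CNOT uses the first qubit as control.

```python
import cmath
import itertools


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(a):
    """Conjugate transpose."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def kron(a, b):
    """Kronecker product; the first factor is the most significant qubit."""
    na, nb = len(a), len(b)
    return [[a[i // nb][j // nb] * b[i % nb][j % nb] for j in range(na * nb)] for i in range(na * nb)]


def equal_up_to_phase(a, b, tol=1e-9):
    fa = [x for row in a for x in row]
    fb = [x for row in b for x in row]
    ref = next(i for i, x in enumerate(fb) if abs(x) > tol)
    phase = fa[ref] / fb[ref]
    return abs(abs(phase) - 1) < tol and all(abs(x - phase * y) < tol for x, y in zip(fa, fb))


s = 1 / 2 ** 0.5
X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
H = [[s, s], [s, -s]]
T = [[1, 0], [0, cmath.exp(1j * cmath.pi / 4)]]
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
PAULIS_1Q = {"I": [[1, 0], [0, 1]], "X": X, "Z": Z, "XZ": matmul(X, Z)}


def pauli_group(n):
    """All n-qubit Pauli operators up to phase, keyed by a tuple of single-qubit labels."""
    group = {}
    for labels in itertools.product(PAULIS_1Q, repeat=n):
        m = [[1]]
        for label in labels:
            m = kron(m, PAULIS_1Q[label])
        group[labels] = m
    return group


def conjugate_paulis(c, n):
    """Map each Pauli P to the Pauli P' with C P = P' C (P' = C P C^dagger), or None if none exists."""
    group = pauli_group(n)
    result = {}
    for label, p in group.items():
        image = matmul(matmul(c, p), dagger(c))
        result[label] = next((k for k, q in group.items() if equal_up_to_phase(image, q)), None)
    return result


def is_clifford(c, n):
    return all(v is not None for v in conjugate_paulis(c, n).values())


print(conjugate_paulis(H, 1))  # H exchanges X and Z
print(is_clifford(T, 1))       # False: T X T^dagger is not a Pauli operator
```

For CNOT, the same routine shows that $X \otimes I$ maps to $X \otimes X$ and $I \otimes Z$ maps to $Z \otimes Z$, which are exactly the propagation rules a Pauli frame needs when it updates its records.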
Quantum error correction
There are different ways to implement qubits physically, and all of them have one factor in common: physical qubits suffer from decoherence. A qubit loses its state within a short period, which makes it hard to maintain a quantum state for a long time. In addition, the operations executed on physical qubits are imperfect and can introduce errors. To enable meaningful quantum computation with high fidelity, QEC was introduced [15]. In QEC, a quantum state is encoded redundantly by entangling multiple physical qubits, which together form a logical qubit. This logical qubit may have a lower error rate than its underlying physical qubits. These error rates are referred to as the Logical Error Rate (LER) and the Physical Error Rate (PER), respectively. A popular QEC code is the surface code [5], which is derived from Kitaev's toric code [11]. In this paper, we focus on the surface code with 17 qubits encoding a single logical qubit, which we refer to as Surface Code 17 (SC17). Figure 1 shows a schematic overview of an SC17 logical qubit consisting of 9 data qubits (blue) and 8 ancilla qubits (X/Z ancilla qubits in red/green), where the lines between qubits indicate the allowed two-qubit interactions. The nine data qubits encode the logical qubit state, and the eight ancilla qubits are used to detect possible errors. As described in [5, 20], we can use the eight ancilla qubits to measure the parities among the data qubits, resulting in 8 bits of parity data, where X/Z ancilla qubits yield information about Z/X errors. This process, which contains only Clifford gates and initialization/measurement operations, is referred to as an ESM, and the 8 bits of data obtained after measuring the ancilla qubits are known as an 8-bit error syndrome. Error syndromes can be processed by a decoder, which runs a classical graph algorithm and returns the errors that most likely occurred on the data qubits.
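To make the parity measurement concrete, the sketch below computes an 8-bit syndrome from a set of data-qubit errors. It assumes one common labeling of a distance-3 rotated surface code (data qubits 0 to 8 on a 3x3 grid, row-major), with two weight-4 and two weight-2 stabilizers per type; this layout is a hypothetical stand-in for Figure 1, which may label the qubits differently. Each syndrome bit is the parity of the errors that anticommute with that stabilizer.

```python
# Hypothetical SC17 labeling (Figure 1 may differ). Data qubits on a 3x3 grid:
#   0 1 2
#   3 4 5
#   6 7 8
X_STABILIZERS = [{0, 1, 3, 4}, {4, 5, 7, 8}, {1, 2}, {6, 7}]  # X ancillas: detect Z errors
Z_STABILIZERS = [{1, 2, 4, 5}, {3, 4, 6, 7}, {0, 3}, {5, 8}]  # Z ancillas: detect X errors

# All stabilizers must commute: every X/Z pair overlaps on an even number of data qubits.
assert all(len(sx & sz) % 2 == 0 for sx in X_STABILIZERS for sz in Z_STABILIZERS)


def syndrome(x_errors: set, z_errors: set) -> list:
    """Return the 8-bit syndrome: four X-ancilla bits followed by four Z-ancilla bits.

    x_errors / z_errors are the data qubits carrying an X / Z error (a Y error is in both).
    """
    x_bits = [len(z_errors & s) % 2 for s in X_STABILIZERS]
    z_bits = [len(x_errors & s) % 2 for s in Z_STABILIZERS]
    return x_bits + z_bits


print(syndrome({4}, set()))  # [0, 0, 0, 0, 1, 1, 0, 0]: an X error on the center qubit
```

Under this assumed layout, an X error on each of the qubits 0, 3, and 6 produces an all-zero syndrome even though the state has changed; such an error is a logical error that the ESM cannot detect, which is why the LER is never zero.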
The identified errors, which are always combinations of X and Z errors, can be corrected by applying Pauli gates on the data qubits. Figure 2 presents the different steps of the QEC process for an SC17 logical qubit; the variables are described in Table 1. In the presented schedule, we first perform a logical operation, which is followed by r rounds of ESM. The obtained error syndromes are given to a decoder that outputs the set of errors that most likely occurred. The errors are corrected, and the cycle is repeated.
The presented schedule performs decoding at run time, which enables the execution of logical non-Clifford gates as required for universal quantum computing. Based on Figure 2, the cycle time is $t_{cycle} = t_{lop} + t_{ec} + t_d + t_c$. For current superconducting qubits, $t_{ec}$ and $t_d$ dominate $t_{cycle}$ [18]. To maximize efficiency, $t_{ec}$ and $t_d$ should be as short as possible, which puts major time constraints on error correction and decoding. In the next section, we discuss Pauli frames, a technique that can ease the time constraints on error correction and decoding.

PAULI FRAMES
A Pauli frame is a classical mechanism that keeps track of Pauli gates instead of physically applying them to the qubits. A single Pauli record $R_q$ tracks all the Pauli gates that are applied on qubit $q$, and the Pauli records of all qubits in a quantum system together form a Pauli frame. This idea was first proposed in [12] and has also been discussed in [2, 4, 9, 13, 19]. Previous research on Pauli frames mainly discusses their theoretical working principles but does not take their implementation into account. In this section, we present the basic mechanism of a Pauli frame and study its impact in the context of a heterogeneous QCA as proposed by [6]. Due to the mathematical properties of Pauli gates, every sequence of tracked Pauli gates can be reduced, up to a global phase, to one of the elements of the set $\{I, X, Z, XZ\}$. Hence, every Pauli record $R_q \in \{I, X, Z, XZ\}$ requires two bits of memory, and a system with $n$ qubits requires $2n$ bits of memory for the Pauli frame.
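The two-bit claim can be illustrated directly. In the sketch below, a Pauli record is stored as an (x, z) bit pair; discarding global phases, multiplying Pauli operators then reduces to XOR-ing these bits, so any sequence of tracked gates collapses to one of the four elements. The encoding and function names are illustrative assumptions, not the PFU's actual memory layout.

```python
# Two-bit encoding of a Pauli record (x bit, z bit); global phases are discarded.
ENCODE = {"I": (0, 0), "X": (1, 0), "Z": (0, 1), "XZ": (1, 1)}
DECODE = {bits: name for name, bits in ENCODE.items()}


def reduce_paulis(gates):
    """Reduce a sequence of tracked Pauli gates to one element of {I, X, Z, XZ}.

    Up to a global phase, multiplying Paulis amounts to XOR-ing their (x, z) bits,
    so the order of the gates does not matter. Y is accepted as XZ since Y = iXZ.
    """
    x = z = 0
    for gate in gates:
        gx, gz = ENCODE["XZ" if gate == "Y" else gate]
        x ^= gx
        z ^= gz
    return DECODE[(x, z)]


def frame_memory_bits(n_qubits):
    """Classical memory needed by a Pauli frame: two bits per qubit."""
    return 2 * n_qubits


print(reduce_paulis(["X", "Z", "X"]))  # Z
print(frame_memory_bits(17))           # 34 bits for the 17 qubits of an SC17 logical qubit
```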
To be able to update Pauli records at run time, the Pauli frame needs to be aware of all operations applied on the qubits. Therefore, the Pauli frame can be seen as a quantum operation filter: all qubit operations and returned measurement results pass through the filter and can be modified by the Pauli frame. To make the Pauli frame suitable for universal quantum computing, it should be able to handle five types of quantum operations. (i) Initialization of a qubit $q$ clears the Pauli record of the corresponding qubit, $R_q = I$. (ii) A measurement operation on qubit $q$ passes the Pauli frame unchanged, but the returned measurement result $m_q$ can be corrected based on the current state of its Pauli record. For instance, if $m_q = +1$ and $R_q = X$, then $-m_q$ is returned. (iii) Pauli gates are directly stored in the Pauli frame and do not need to be physically applied on the qubits. (iv) As mentioned in Section 2, the Clifford group is defined as the normalizer of the Pauli group, which means that Clifford gates map Pauli records to new valid Pauli records. After the Pauli records are mapped, the Clifford gate is still applied on its target qubits. (v) The execution of a non-Clifford gate requires the Pauli records of its target qubits to be flushed (i.e., the Pauli gates stored in $R_q$ are physically applied on qubit $q$ and the record is cleared, $R_q = I$) before the non-Clifford gate can be executed. The execution steps for the different qubit operations are summarized in Table 2.
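The five cases can be sketched as a small filter class. This is a minimal illustration, assuming the (x, z) bit encoding of records and only H and CNOT as Clifford gates (other Cliffords such as S would need their own mapping rules); the class and method names are hypothetical and do not describe the PFU's hardware interface.

```python
class PauliFrame:
    """Pauli records for n qubits, each stored as two bits (x, z).

    I = (0, 0), X = (1, 0), Z = (0, 1), XZ = (1, 1); global phases are ignored.
    """

    def __init__(self, n_qubits):
        self.x = [0] * n_qubits
        self.z = [0] * n_qubits

    def initialize(self, q):
        """(i) Initialization clears the record: R_q = I."""
        self.x[q] = self.z[q] = 0

    def measure(self, q, m):
        """(ii) Correct a Z-basis outcome m in {+1, -1}; an X component flips it."""
        return -m if self.x[q] else m

    def pauli(self, gate, q):
        """(iii) Absorb a Pauli gate into the record instead of executing it."""
        if gate not in ("I", "X", "Z", "XZ"):
            raise ValueError(f"not a tracked Pauli gate: {gate}")
        self.x[q] ^= int("X" in gate)
        self.z[q] ^= int("Z" in gate)

    def clifford(self, gate, *qubits):
        """(iv) Map the records through a Clifford gate; the gate itself still runs on hardware."""
        if gate == "H":  # H X H = Z and H Z H = X
            (q,) = qubits
            self.x[q], self.z[q] = self.z[q], self.x[q]
        elif gate == "CNOT":  # X on control spreads to target, Z on target spreads to control
            c, t = qubits
            self.x[t] ^= self.x[c]
            self.z[c] ^= self.z[t]
        else:
            raise NotImplementedError(f"no mapping rule for {gate}")

    def flush(self, q):
        """(v) Before a non-Clifford gate: return the pending Paulis to apply and clear R_q."""
        pending = [name for name, bit in (("X", self.x[q]), ("Z", self.z[q])) if bit]
        self.initialize(q)
        return pending


pf = PauliFrame(2)
pf.pauli("X", 0)           # X on qubit 0 is only recorded
pf.clifford("CNOT", 0, 1)  # record becomes X on qubit 0 and X on qubit 1
print(pf.measure(1, +1))   # -1: the outcome of qubit 1 is flipped in software
print(pf.flush(0))         # ['X']: applied to qubit 0 before, e.g., a T gate
```

Only the X bit affects a computational-basis measurement, because a pending Z merely changes the phase of the basis states.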
A QUANTUM COMPUTER ARCHITECTURE WITH PAULI FRAME
Multiple papers [2, 4, 9, 12, 13, 19] have covered the topic of Pauli frames, and [3, 16, 17, 21] have discussed various architectures for quantum computer software and hardware, but no practical implementation of a Pauli frame for future quantum computers has been presented so far. In this section, we provide a high-level description of how a Pauli frame can be implemented as part of a QCA, and we discuss the expected benefits, which directly relate to the logical error rate.
Benefits
The most interesting application of a Pauli frame is to use it for physical qubits in combination with QEC. In such a structure, correction gates, which are all Pauli gates, can be stored directly in the Pauli frame, reducing the number of gates applied on the qubits. Also, logical Pauli gates can be stored directly in the Pauli frame and do not need to be applied on the physical qubits. To quantify the potential impact of such a mechanism, we analyzed some benchmarks provided with the ScaffCC compiler [8] and found that the resulting quantum circuits contain up to 6% Pauli gates. We also found that compiled quantum programs contain 20 to 50% non-Clifford T and T† gates, which require flushing of the involved Pauli records. Flushing can be avoided by applying T and T† gates using particular ancilla states and Clifford circuits, as discussed in [5, 14]. So the first benefit is that Pauli gates can be processed faster and with a fidelity of 100%, which can potentially reduce the error rate of a logical qubit.
The second benefit is closely related to the first but concerns QEC: correction gates for detected errors are always Pauli gates, and ESM circuits contain only Clifford gates. Hence, we can track correction gates without the need to flush while performing ESM. As a result, the QEC system does not have to wait for the decoder to generate corrections and apply them before execution can continue. By eliminating this dependency, we can create a new execution schedule, which is shown in Figure 3. The new schedule effectively removes the time reserved for applying corrections and allows error correction and decoding to be executed in parallel. For the new schedule with Pauli frame, $t_{cycle}^{PF} = \max(t_{ec} + t_{lop}, t_d)$. As a consequence of the more efficient schedule, we can perform the same number of cycles in less time than the system without Pauli frame, potentially reducing the LER. On top of that, the new schedule also eases the timing constraints on $t_{ec}$ and $t_d$.
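The two cycle-time expressions can be compared directly. The sketch below simply encodes $t_{cycle} = t_{lop} + t_{ec} + t_d + t_c$ and $t_{cycle}^{PF} = \max(t_{ec} + t_{lop}, t_d)$ and evaluates them with the parameters used later in the simulations ($t_{lop} = 0$, $t_{ec} = 16$, $t_c = 1$ time slots); the function names are illustrative.

```python
def t_cycle_without_pf(t_lop, t_ec, t_d, t_c):
    """Sequential schedule (Figure 2): logical operation, r ESM rounds, decoding, correction."""
    return t_lop + t_ec + t_d + t_c


def t_cycle_with_pf(t_lop, t_ec, t_d):
    """Pauli frame schedule (Figure 3): decoding overlaps ESM, and no correction slot is needed."""
    return max(t_ec + t_lop, t_d)


for t_d in (0, 8, 16, 24):
    print(t_d, t_cycle_without_pf(0, 16, t_d, 1), t_cycle_with_pf(0, 16, t_d))
```

This prints cycle times of 17, 25, 33, and 41 time slots without a Pauli frame versus 16, 16, 16, and 24 with one: as long as $t_d \le t_{ec}$, decoding is completely hidden behind the next round of error correction.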
Implementation
A heterogeneous QCA proposed in [6] supports the execution of QEC and logical operations for a single SC17 logical qubit implemented with transmon qubits. Figure 4 shows a simplified version of the proposed architecture, focusing on the Quantum Control Unit (QCU) part of the QCA. The QCU decodes the instructions belonging to the Quantum Instruction Set Architecture (QISA), inserts QEC routines, and manages feedback control.
The QCU can communicate with the host CPU, where classical computations are executed. The QCU outputs a sequence of timed quantum operations, which are forwarded to the Physical Execution Layer (PEL).
The PFU consists of a Pauli frame (PF data) and mapping logic (PF logic) and works closely together with the Pauli arbiter. Figure 5 shows a detailed schematic of the PFU. All quantum operations pass through the Pauli arbiter, which decides whether an operation is forwarded to the PFU, the PEL, or both. We distinguish the following situations. (i) Pauli gates are only forwarded to the PFU, where the PF logic module stores them in the Pauli record of the target qubit.
(ii) Clifford gates are forwarded to both the PFU and the PEL. The PF logic module maps the Pauli records of the target qubits to new valid records based on the type of Clifford gate. (iii) Initialization operations are also forwarded to both the PFU and the PEL, where the PF logic module resets the Pauli record of the target qubit. (iv) Non-Clifford gates are directly forwarded to the PFU, which flushes the Pauli records of the target qubits and forwards the pending Pauli gates to the Pauli arbiter.
The Pauli arbiter then forwards the received Pauli gates and the non-Clifford gate to the PEL. (v) Measurement operations received by the Pauli arbiter are only forwarded to the PEL. After the PEL has performed the measurement, the outcome is forwarded to the PF logic block, which reads the Pauli record of the corresponding qubit and corrects the measurement outcome if required. The corrected measurement result is forwarded to other parts of the QCU.
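The routing decisions of the Pauli arbiter in situations (i) to (v) can be summarized as a lookup table. The sketch below is a hypothetical software model of that decision only (the operation categories and destination names "PFU" and "PEL" follow the text; everything else is illustrative); it does not model the record updates themselves or any timing.

```python
from enum import Enum


class OpKind(Enum):
    PAULI = "pauli"
    CLIFFORD = "clifford"
    INIT = "initialization"
    NON_CLIFFORD = "non-clifford"
    MEASURE = "measurement"


# Destinations chosen by the Pauli arbiter for an incoming operation, per situations (i)-(v).
ROUTES = {
    OpKind.PAULI: {"PFU"},            # (i) stored in the Pauli record only
    OpKind.CLIFFORD: {"PFU", "PEL"},  # (ii) record mapped and gate executed
    OpKind.INIT: {"PFU", "PEL"},      # (iii) record reset and qubit initialized
    OpKind.NON_CLIFFORD: {"PFU"},     # (iv) PFU flushes first; the arbiter then feeds the PEL
    OpKind.MEASURE: {"PEL"},          # (v) outcome is corrected by PF logic on the way back
}


def route(kind):
    return ROUTES[kind]


def pel_sequence_for_non_clifford(gate, qubit, pending_paulis):
    """Operations the arbiter sends to the PEL after a flush: pending Paulis, then the gate."""
    return [(p, qubit) for p in pending_paulis] + [(gate, qubit)]


print(route(OpKind.PAULI))                                  # {'PFU'}
print(pel_sequence_for_non_clifford("T", 3, ["X", "Z"]))    # [('X', 3), ('Z', 3), ('T', 3)]
```

Keeping the routing as a static table mirrors the intent of the hardware design: the arbiter only classifies operations, while all record manipulation stays inside the PF logic.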
SIMULATION SETUP
To quantify the impact of the Pauli frame mechanism, we developed the Quantum Platform Development framewOrk (QPDO), which allows us to simulate an SC17 logical qubit with and without a Pauli frame. QPDO is a software package that can simulate quantum execution platforms and has a layered structure in which each layer can implement different functionality. Layers can be combined to create various control stacks, allowing the simulation of different platforms. Simulations are performed by supplying a stream of operations to a control stack. Quantum simulations are not conducted by QPDO itself but by external tools that are connected to QPDO as a back-end simulation layer. QPDO connects to the universal QX Simulator [10], which allows simulation of arbitrary quantum circuits. The second simulation back-end is the CHP stabilizer simulator [1], which allows efficient simulation of Clifford circuits based on the Gottesman-Knill theorem [7].
QPDO represents a quantum circuit as a set of qubit operations divided into discrete time slots, where we assume that every operation takes a single time slot to execute. Operations in a single time slot are executed in parallel, and each qubit can be assigned to only one operation per time slot. During our simulations, the ESM circuits for the X and Z ancilla qubits are performed in parallel as shown in [20]. The result is an ESM circuit containing a total of 48 operations divided over eight time slots. Hence, $t_{ESM} = 8$ time slots. Table 3 summarizes which operations each time slot contains.
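The operation count of 48 can be cross-checked with a short tally over the schedule in Table 3. The sketch assumes the standard SC17 stabilizer structure in which each type (X and Z) consists of two weight-4 and two weight-2 stabilizers, so the number of CNOTs equals the sum of the stabilizer weights; the variable names are illustrative.

```python
# Assumed SC17 stabilizer weights (data qubits per ancilla): two weight-4 and two weight-2 per type.
X_ANCILLA_WEIGHTS = [4, 4, 2, 2]
Z_ANCILLA_WEIGHTS = [4, 4, 2, 2]
n_x, n_z = len(X_ANCILLA_WEIGHTS), len(Z_ANCILLA_WEIGHTS)
n_cnot = sum(X_ANCILLA_WEIGHTS) + sum(Z_ANCILLA_WEIGHTS)

# (first slot, last slot, number of operations, description), following Table 3.
ESM_SCHEDULE = [
    (1, 1, n_x, "initialize X ancilla qubits"),
    (2, 2, n_z + n_x, "initialize Z ancillas, H on X ancillas"),
    (3, 6, n_cnot, "CNOTs between data and ancilla qubits"),
    (7, 7, n_x, "H on X ancillas"),
    (8, 8, n_x + n_z, "measure all ancillas"),
]

# A weight-4 ancilla performs one CNOT per slot, so it needs all four CNOT slots (3-6).
assert max(X_ANCILLA_WEIGHTS + Z_ANCILLA_WEIGHTS) <= 6 - 3 + 1

t_esm = max(last for _, last, _, _ in ESM_SCHEDULE)
total_ops = sum(count for _, _, count, _ in ESM_SCHEDULE)
print(t_esm, total_ops)  # 8 48
```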
To introduce errors in our quantum system, we developed a QPDO layer that implements the symmetric depolarizing error model as presented in [14, 20]. In this model, the PER $p$ is the probability of an error occurring while executing a single operation on a physical qubit, where idling is also considered an operation. The depolarizing model assumes that errors are independent and that the error probability is the same for each quantum operation.

Table 3
Time slot   Description
1           Initialize X ancilla qubits.
2           Initialize Z ancilla qubits and apply H gates on X ancilla qubits.
3-6         Apply CNOT gates between data and ancilla qubits.
7           Apply H gates on X ancilla qubits.
8           Measure all ancilla qubits.
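A depolarizing error layer of this kind can be sketched in a few lines. The sketch assumes the usual reading of the symmetric depolarizing model: after an affected single-qubit operation (including idling), one of X, Y, Z is applied with probability $p/3$ each, and after a two-qubit gate one of the 15 non-identity two-qubit Pauli operators is applied with probability $p/15$ each. Initialization and measurement errors are left out, and the function name is hypothetical rather than QPDO's actual layer interface.

```python
import random

SINGLE_QUBIT_PAULIS = ["X", "Y", "Z"]
# All 15 non-identity two-qubit Paulis.
TWO_QUBIT_PAULIS = [(a, b) for a in "IXYZ" for b in "IXYZ" if (a, b) != ("I", "I")]


def depolarize(op_qubits, p, rng):
    """Return the Pauli error injected after an operation on op_qubits, or None.

    With probability p an error occurs; it is drawn uniformly from the non-identity
    Paulis on the operation's qubits (assumed model, see lead-in).
    """
    if rng.random() >= p:
        return None
    if len(op_qubits) == 1:
        return (rng.choice(SINGLE_QUBIT_PAULIS),)
    if len(op_qubits) == 2:
        return rng.choice(TWO_QUBIT_PAULIS)
    raise ValueError("only one- and two-qubit operations are modeled")


rng = random.Random(0)
errors = [depolarize((q,), 0.1, rng) for q in range(20)]  # e.g. 20 idling qubits in one slot
print(sum(e is not None for e in errors), "errors injected")
```

Because every operation draws independently with the same $p$, the model matches the two assumptions stated above: independent errors and an identical error probability per operation.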
Decoding of the error syndromes is done using the Rule-Based Look-Up Table (RBLUT) decoder presented in [20]. The RBLUT decoder is specifically designed for the SC17 and uses a sliding window of three ESM rounds to detect errors. Every window reuses one ESM result from the previous window, as shown in Figure 6, which means that every cycle contains two new rounds of ESM. Hence, $r = 2$ and $t_{ec} = r \cdot t_{ESM} = 16$ time slots. All corrections can always be applied in a single time slot; therefore, $t_c = 1$ time slot.
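The relation between the three-round window and $r = 2$ can be made explicit with a small indexing sketch. The 0-based round numbering is a hypothetical convention; the only property taken from the text is that consecutive windows share exactly one ESM round.

```python
T_ESM = 8  # time slots per ESM round (Table 3)


def window_rounds(k):
    """0-based ESM rounds seen by decoding window k (hypothetical indexing).

    Each window spans three rounds and reuses the last round of window k - 1.
    """
    return [2 * k, 2 * k + 1, 2 * k + 2]


for k in range(3):
    print(k, window_rounds(k))  # window 0: rounds 0-2, window 1: rounds 2-4, window 2: rounds 4-6

r = len(set(window_rounds(1)) - set(window_rounds(0)))  # new ESM rounds per cycle
t_ec = r * T_ESM
print(r, t_ec)  # 2 16
```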
The Pauli frame is implemented as a layer, which allows us to easily add a Pauli frame to any platform. The control stack used for our simulations consists of a CHP simulation back-end with an error layer on top. The simulations that use a Pauli frame additionally place a Pauli frame layer on top of the error layer. The full control stack used for our simulations is shown in Figure 7.
Logical error rate calculation
By simulation, we can find the LER $P_L$ of an SC17 logical qubit for different values of the PER $p$. In [20], $P_L$ is defined as the probability of a logical error happening within a single cycle in which no logical operations are performed, i.e., $t_{lop} = 0$ time slots. During a simulation, an SC17 logical qubit is initialized to an error-free state, after which we repeatedly execute cycles. After every cycle, we check whether a logical error occurred. This procedure is repeated while counting the number of executed cycles $C$ and the total number of detected logical errors $m$, until $m$ reaches a predefined maximum value. $P_L$ corresponding to a given $p$ can then be written as:

$$P_L = \frac{m}{C}$$
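The counting procedure can be sketched as a loop that stops after a fixed number of logical errors. In the sketch, `run_cycle` is a hypothetical callback standing in for one full QEC cycle followed by the logical-error check; the toy stand-in below simply fails each cycle with a fixed probability so that the estimator can be exercised without a quantum simulator.

```python
import random


def estimate_ler(run_cycle, max_errors):
    """Execute cycles until max_errors logical errors are seen; return P_L = m / C.

    run_cycle is a hypothetical callback for one QEC cycle; it returns True when a
    logical error is detected after the cycle.
    """
    cycles = errors = 0
    while errors < max_errors:
        cycles += 1
        if run_cycle():
            errors += 1
    return errors / cycles


# Toy stand-in: every cycle fails independently with probability 0.01.
rng = random.Random(42)
samples = [estimate_ler(lambda: rng.random() < 0.01, 20) for _ in range(50)]
print(sum(samples) / len(samples))  # close to 0.01
```

Fixing $m$ rather than $C$ keeps the relative statistical error of each sample roughly constant (about $1/\sqrt{m}$) across very different PER values, which is convenient when sweeping $p$ over two orders of magnitude.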
RESULTS
We performed separate LER simulations for $X_L$ and $Z_L$ errors, with a maximum of 20 logical errors per simulation, using the test setup shown in Figure 7. Simulations were performed for a PER ranging from $1.0 \times 10^{-4}$ to $1.0 \times 10^{-2}$ with a step size of $1.0 \times 10^{-4}$. For every PER, we take 50 samples with and without Pauli frame, and the average of the resulting LER per sample yields our final result. We found that the combination of 50 samples and a maximum of 20 logical errors per simulation yields good precision in a reasonable simulation time. We performed simulations with decoding time $t_d \in \{0, 8, 16, 24\}$ time slots, equivalent to $\{0, 0.5, 1, 1.5\} \cdot t_{ec}$, and the resulting graph for $X_L$ errors is shown in Figure 8. The graph for $Z_L$ errors is not shown since it is equivalent to the graph for $X_L$ errors, which is expected when using a symmetric depolarizing error model. Results for the test setup without Pauli frame are plotted with squares, while the results with Pauli frame are plotted with circles. The results for the test setup with Pauli frame and $t_d \in \{0, 8, 16\}$ time slots are equivalent, because $t_{cycle}^{PF} = \max(t_{ec}, t_d) = t_{ec}$ whenever $t_d \le t_{ec}$, and therefore only one of them is plotted. We also added the line $P_L = p$ and vertical dashed lines that indicate the intersection between the linearly interpolated results and the line $P_L = p$, also known as the pseudo-threshold. For a PER lower than the pseudo-threshold, the LER is lower than the PER, which means that we benefit from using QEC.
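The pseudo-threshold is the point where the linearly interpolated LER curve crosses the line $P_L = p$. The sketch below finds that crossing from sampled (PER, LER) pairs. The example data are made up for illustration (they follow $P_L = 100p^2$, whose exact crossing is $p = 0.01$) and are not results from this paper.

```python
def pseudo_threshold(pers, lers):
    """PER at which the piecewise-linear LER curve first meets the line P_L = p, or None.

    pers must be sorted in increasing order; lers holds the matching LER values.
    """
    diffs = [l - p for p, l in zip(pers, lers)]  # signed distance to the line P_L = p
    for i in range(len(diffs) - 1):
        d0, d1 = diffs[i], diffs[i + 1]
        if d0 == 0:
            return pers[i]
        if d0 * d1 < 0:  # sign change inside this segment: interpolate linearly
            return pers[i] + (pers[i + 1] - pers[i]) * d0 / (d0 - d1)
    if diffs and diffs[-1] == 0:
        return pers[-1]
    return None


# Illustrative (made-up) samples of P_L = 100 p^2.
pers = [0.002, 0.006, 0.012, 0.016]
lers = [0.0004, 0.0036, 0.0144, 0.0256]
print(pseudo_threshold(pers, lers))  # about 0.009
```

On this coarse, convex example the interpolated crossing (about 0.009) lies below the exact value of 0.01; with the fine PER step of $1.0 \times 10^{-4}$ used in the simulations, this interpolation error becomes small.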
From Figure 8, we can see that for every value of $t_d$ the system with Pauli frame has a lower LER than the system without Pauli frame. For the system without Pauli frame, the LER increases as soon as $t_d$ increases, while for the system with Pauli frame the LER only increases when $t_d > t_{ec}$, where in our simulations $t_{ec} = 16$ time slots. The Pauli frame effectively reduces the cycle time $t_{cycle}$ by allowing ESM and decoding to be performed in parallel, which results in a reduced LER.
To see whether the reduction in $t_{cycle}$ obtained by using a Pauli frame is proportional to the observed reduction in the LER $P_L$, we plotted the relative reduction $R_{PF}(\cdot)$ obtained by using a Pauli frame for both quantities over the range $t_d \in [0, 40]$ time slots, where the relative reductions are defined as:

$$R_{PF}(t_{cycle}) = \frac{t_{cycle} - t_{cycle}^{PF}}{t_{cycle}}, \qquad R_{PF}(P_L) = \frac{P_L - P_L^{PF}}{P_L}$$

Figure 9 shows the resulting graphs, where for the relative reduction in LER (red squares) only data for $t_d \in \{0, 8, 16, 24\}$ time slots is available. From Figure 9, we can see that the reduction in LER appears to be proportional to the reduction in $t_{cycle}$, with a proportionality constant of approximately 1.4. The maximum relative reduction obtained by using a Pauli frame is found at $t_d = 16$ time slots, which is the point where $t_d = t_{ec}$ and maximum parallelism can be obtained.
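The location of the maximum follows from the cycle-time formulas alone. The sketch below sweeps $t_d$ over $[0, 40]$ time slots with the simulation parameters ($t_{lop} = 0$, $t_{ec} = 16$, $t_c = 1$) and computes the relative cycle-time reduction $(x - x^{PF})/x$; it does not reproduce the simulated LER values.

```python
T_LOP, T_EC, T_C = 0, 16, 1  # simulation parameters: no logical operation, r = 2, t_c = 1


def relative_reduction(without_pf, with_pf):
    """R_PF(x) = (x - x^PF) / x."""
    return (without_pf - with_pf) / without_pf


def cycle_time_reduction(t_d):
    without_pf = T_LOP + T_EC + t_d + T_C
    with_pf = max(T_EC + T_LOP, t_d)
    return relative_reduction(without_pf, with_pf)


best_t_d = max(range(0, 41), key=cycle_time_reduction)
print(best_t_d, round(cycle_time_reduction(best_t_d), 3))  # 16 0.515
```

For $t_d \le 16$ the reduction equals $(1 + t_d)/(17 + t_d)$, which grows with $t_d$; for $t_d \ge 16$ it equals $17/(17 + t_d)$, which shrinks, so the peak sits exactly at $t_d = t_{ec}$. Multiplying the peak value of about 0.515 by the observed proportionality constant of about 1.4 gives roughly 0.72, in line with the reported LER reduction of up to 70%.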
CONCLUSIONS
We presented an implementation of a PFU and quantified through simulation the potential benefits of using it. Based on our analysis of the application of a Pauli frame in systems using QEC, we conclude that a Pauli frame enables us to perform ESM and decoding in parallel. Hence, we can create a more efficient execution schedule for an SC17 logical qubit that reduces the cycle time $t_{cycle}$. In addition, the new execution schedule relaxes the timing constraints on the ESM and the decoder. Simulation has shown that, as a result of the reduced $t_{cycle}$, the LER of an SC17 logical qubit can be reduced by up to 70% when the time required for error correction equals the decoding time, $t_{ec} = t_d$. On the basis of these results, including a PFU in a QCA such as the one proposed in [6] pays off in terms of LER reduction. This payoff is maximal when the decoding time equals the time needed for error correction, as neither operation then induces idle time. The effect diminishes when these times are no longer equal. Future work will involve verifying whether these observations also hold for SC49 and embedding a Pauli frame in a larger architectural simulation platform.
