1 The PFA gate decomposes into two V gates, a V+ gate and six CNOT gates. Each V gate and V+ gate has a T-count of 3. [29] [33] [18]
2 The MIG gate decomposes into two V gates, a V+ gate and four CNOT gates. [33]
3 The NFT gate decomposes into two V gates, a V+ gate and seven CNOT gates. [34]
4 The DCZ or double controlled Z gate has a T-count of seven. [35]
5 The RPA gate decomposes into a V gate, a V+ gate and three CNOT gates. [31]
6 The Peres gate has a T-count of seven. [18]
7 The Fredkin gate has a T-count of seven. [18]
Abstract: Quantum circuits of arithmetic operations such as addition are needed to implement quantum algorithms in hardware. Quantum circuits based on Clifford+T gates are used as they can be made tolerant to noise. The tradeoff of gaining fault tolerance from using Clifford+T gates and error correcting codes is the high implementation overhead of the T gate. As a result, the T-count performance measure has become important in quantum circuit design. Due to noise, the risk of errors in a quantum circuit computation increases as the number of gate layers (or depth) in the circuit increases. As a result, low depth circuits such as quantum carry lookahead adders (QCLAs) have caught the attention of researchers. This work presents two QCLA designs, optimized with emphasis on T-count and qubit cost respectively. In-place and out-of-place versions of each design are shown. The proposed QCLAs are compared against the existing works in terms of T-count. The proposed QCLAs for out-of-place addition achieve average T gate savings of 54.34% and 37.21%, respectively. The proposed QCLAs for in-place addition achieve average T gate savings of 72.11% and 35.87%, respectively.
I. INTRODUCTION
Quantum circuits of arithmetic operations are needed to design quantum hardware for implementing quantum algorithms such as integer factoring, searching and quantum mechanical simulation. Quantum circuits for addition are fundamental building blocks crucial to implementing these quantum algorithms [1] [2] [3] [4] [5]. Thus, researchers have invested considerable effort in designing quantum adders such as ripple carry adders or carry lookahead adders (QCLAs) [6] [7] [8] [9] [10] [11].
Quantum circuits possess properties that make them distinct from circuits for other technologies. For example, there is a one-to-one relationship between the inputs and outputs in a quantum circuit. As a result, the quantum circuit designer will face additional sources of circuit overhead such as ancillae and garbage output. Ancillae are constant inputs to the quantum circuit. Garbage output is any circuit output that is neither a circuit input nor a needed output. To make full use of the qubit resources of the quantum machine, the garbage output will need to be cleared. This process adds to the overall qubit cost and gate cost of a quantum circuit and is discussed in [12].
Physical quantum computers are prone to noise errors [13] [14] [15]. Quantum circuits based on Clifford+T gates have caught the attention of researchers because they can be made tolerant to noise [15] [16]. This gate set is universal and has been used to realize basic reversible logic gates and larger functional blocks [17] [18] [16] [19] [20]. However, fault tolerance comes with increased implementation overhead, especially the overhead associated with implementing the T gate [16] [21] [22]. The high cost of implementing the T gate has made T-count and T-depth important measures for evaluating quantum circuit cost [17] [14] [23] [9].
QCLAs have caught the interest of researchers because they perform addition with a circuit depth of order O(log(n)), while the ripple carry adder has a depth of O(n). Low-depth circuits such as QCLAs have use in quantum hardware applications where longer computation time increases the risk of errors [24] [25] [26]. Several QCLA designs have been proposed in the literature such as [27] or [8]. Designs that return the sum on ancillae (out-of-place QCLAs) and designs that replace one of the primary inputs with the sum (in-place QCLAs) have been proposed. Table I and Table II summarize the existing work. Table I presents existing out-of-place QCLAs and Table II presents in-place QCLAs. Further discussion on the existing work is in Section VII.
To overcome shortcomings in existing works, this work proposes a family of designs for QCLAs. The first design (FT-QCLA1) is optimized with an emphasis on T-count. The second design (FT-QCLA2) is optimized with an emphasis on number of qubits. All proposed designs enjoy reduced T gate cost and qubit cost compared to the existing works. In-place QCLA implementations (In-FT-QCLA1 and In-FT-QCLA2) and out-of-place QCLA implementations (Out-FT-QCLA1 and Out-FT-QCLA2) for each design are shown. All proposed QCLAs can be made fault tolerant with error correcting codes. The proposed QCLA designs are based on the NOT gate, CNOT gate, Toffoli gate, logical AND gate and uncomputation gate. The logical AND gate and uncomputation gate are presented in [19] and [23]. The proposed QCLAs are compared against existing works and shown to be superior in terms of qubit cost and T-count.
This work is organized as follows: Section II provides background on the Clifford+T quantum gate set, introduces the logical AND gate and the uncomputation gate. Section III presents the proposed out-of-place QCLAs. Each design (Out-FT-QCLA1 and Out-FT-QCLA2) is addressed in its own subsection within Section III. Section IV illustrates the comparison of the proposed out-of-place QCLAs against existing out-of-place QCLAs. Section V presents the proposed in-place QCLAs. Each design (In-FT-QCLA1 and In-FT-QCLA2) is addressed in its own subsection within Section V. Section VI illustrates the comparison of the proposed in-place QCLAs against existing in-place QCLAs. Section VII provides a review of the existing work in quantum carry lookahead addition.
II. BACKGROUND

A. Quantum Gates
The proposed QCLA circuits in this work are based on the Clifford+T quantum gate set shown in Figure 1. The Clifford+T gates have caught the interest of researchers because the gate set can be made tolerant to noise errors [13] [14] [15]. The Clifford+T gate set is universal in nature, permitting the fault tolerant quantum realization of any function of interest [39] [17]. However, the fault tolerance comes with the tradeoff of the high implementation cost of the T gate relative to the Clifford gates. Details on the fault tolerant implementation of the T gate (and its associated costs) are in [21] [22]. The T gate is required because the Clifford gates by themselves are not universal [17] [14] [39] [16]. The high implementation cost of the T gate has made T-count and T-depth metrics of interest to evaluate quantum circuit performance. Existing quantum computers (such as those shown in [25]) have a small number of qubits available. As a result, the number of qubits in a quantum circuit is an important cost measure. We now define the T-count, T-depth and qubit cost resource measures.
• T-count: T-count is the total number of T gates used in the quantum circuit.
• T-depth: T-depth is the number of T gate layers in the quantum circuit, where a layer consists of operations that can be performed simultaneously.
• Qubit cost: Qubit cost is the total number of qubits required by the quantum circuit.
The Toffoli gate implementation in [18] is used for this work (see Figure 3). Figure 2 shows the temporary logical-AND gate and the uncomputation gates used in this work. The Clifford+T implementation and the graphical image are shown for each gate. The design of the logical-AND gate used in this work is presented in [23]. The T-counts for these logical gates are as follows: the Toffoli gate implementation designed in [18] has a T-count of 7, the temporary logical-AND gate has a T-count of 4 and the uncomputation gate has a T-count of 0.
B. Carry Look Ahead Addition
Carry lookahead addition (CLA) has caught the attention of researchers because it can perform addition in O(log(n)) time, while alternative adders such as the ripple carry adder require O(n) time. Thus a CLA circuit completes the addition operation with a critical path depth of order O(log(n)), whereas the critical path for ripple carry addition has a depth of order O(n). The decrease in computation depth comes at the cost of added circuitry.
Given two n bit inputs a and b, the CLA circuit generates the sum of a and b by executing the CLA algorithm illustrated in Figure 4. The CLA algorithm is an established technique to quickly perform addition [40] [41]. The CLA algorithm can be divided into two basic steps: (i) implement generate ($g_i$) and propagate ($p_i$) bits for each bit of the inputs a and b and (ii) compute the sum s by computing
Algorithm 1: Carry lookahead Addition
Function CLA(a, b)
Requirements:
// Takes two n bit values a and b as input.
// Returns the sum as an n + 1 bit number s.
For i = 0 to n − 1
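The generate/propagate formulation described above can be sketched classically. The following is a minimal illustration, assuming the standard textbook definitions $g_i = a_i \wedge b_i$, $p_i = a_i \oplus b_i$ and $s_i = p_i \oplus c_i$, with the carries combined in a logarithmic number of rounds (a Kogge-Stone style prefix computation). It is not a transcription of Algorithm 1, and the function names `cla_add`, `to_bits` and `from_bits` are hypothetical helpers introduced only for this sketch.

```python
def to_bits(x, n):
    """Little-endian list of the n low bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Integer value of a little-endian bit list."""
    return sum(b << i for i, b in enumerate(bits))


def cla_add(a_bits, b_bits):
    """Add two n-bit little-endian bit lists; return n + 1 sum bits."""
    n = len(a_bits)
    # Per-bit generate and propagate signals (textbook definitions).
    g = [a_bits[i] & b_bits[i] for i in range(n)]
    p = [a_bits[i] ^ b_bits[i] for i in range(n)]

    # Prefix rounds: after the round with distance d, G[i] is the carry
    # generated by the block of up to 2*d bits ending at position i.
    G, P = g[:], p[:]
    d = 1
    while d < n:
        G_next, P_next = G[:], P[:]
        for i in range(d, n):
            G_next[i] = G[i] | (P[i] & G[i - d])
            P_next[i] = P[i] & P[i - d]
        G, P = G_next, P_next
        d *= 2

    # G[i] is now the carry out of bit i, so the carry into bit i is c[i].
    c = [0] + G
    return [p[i] ^ c[i] for i in range(n)] + [c[n]]
```

For example, `from_bits(cla_add(to_bits(5, 4), to_bits(11, 4)))` evaluates to 16. The number of prefix rounds grows as the logarithm of n, which is the source of the O(log(n)) depth discussed above.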
III. PROPOSED DESIGNS OF OUT-OF-PLACE QCLA CIRCUITS
We now show our proposed out-of-place QCLA circuits. The proposed QCLA circuits have lower T-count and qubit cost than existing works. The QCLA designs save T gates by using the temporary logical-AND gate, the existing uncomputation gate or proposed uncomputation gate where possible. The out of place FT-QCLA1 (or Out-FT-QCLA1) is optimized for T-count. We also propose an out of place FT-QCLA2 (or Out-FT-QCLA2) which is optimized for qubit cost. The design methodologies of the proposed QCLAs are generic and each can be used to implement a QCLA circuit of any size.
The proposed out-of-place QCLA circuits operate as follows: The inputs are two values a and b (each n bits wide) stored in quantum registers A and B, together with n + 1 ancillae stored in register X. $X_0$ is set to 0 and the remaining locations are set to $|A\rangle$, where $|A\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$. Lastly, Out-FT-QCLA2 requires a register Z with n − w(n) − ⌊log(n)⌋ elements, all set to $|A\rangle$, where $w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^{y} \rfloor$ is the number of ones in the binary expansion of n. Out-FT-QCLA1 requires a register Z with 3·n − 2·w(n) − 2⌊log(n)⌋ ancillae set to $|A\rangle$. At the end of computation, registers A and B are restored to their initial values and X will contain the sum of the addition of a and b. Lastly, Z is transformed into a register of classical states. These qubits can be restored to computational basis values for reuse as ancillae.
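The quantity w(n) and the register sizes quoted above are easy to evaluate numerically. The sketch below assumes that log denotes the base-2 logarithm (the text does not state the base explicitly); the function names `w_from_sum`, `floor_log2` and `z_register_size` are hypothetical and exist only for this illustration.

```python
def w_from_sum(n):
    """w(n) = n - sum_{y>=1} floor(n / 2**y), evaluated term by term."""
    total = 0
    y = 1
    while (n >> y) > 0:  # terms beyond this point are all zero
        total += n >> y
        y += 1
    return n - total


def floor_log2(n):
    """floor(log2(n)) for a positive integer n (base 2 is an assumption)."""
    return n.bit_length() - 1


def z_register_size(n, design):
    """Ancillae in register Z, using the expressions quoted in the text."""
    w, lg = w_from_sum(n), floor_log2(n)
    if design == "Out-FT-QCLA1":
        return 3 * n - 2 * w - 2 * lg
    if design == "Out-FT-QCLA2":
        return n - w - lg
    raise ValueError("unknown design: " + design)
```

For n = 8, `w_from_sum(8)` is 1, and the two register sizes evaluate to 16 (Out-FT-QCLA1) and 4 (Out-FT-QCLA2). The sum form of w(n) agrees with a direct count of the ones in the binary expansion of n, as stated in the text.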
This Section is organized as follows: The proposed design of Out-FT-QCLA1 is shown in Section III-A. The proposed design of Out-FT-QCLA2 is shown in Section III-B.
A. Proposed Out-of-Place FT-QCLA1 Design (Out-FT-QCLA1)
The out-of-place FT-QCLA1 (Out-FT-QCLA1) is optimized for T-count. The proposed QCLA is based on the quantum NOT gate, CNOT gate, logical AND gate and uncomputation gate. The steps of the proposed design methodology to realize Out-FT-QCLA1 are shown along with an illustrative example of the QCLA circuit in Figure 5.
[...] are unchanged and the ancillae will have the result of computation. The ancillae will be renamed p[j, k]. The equations for the indexes are $j = 2^{t} \cdot m$, $k = 2^{t} \cdot m + 2^{t}$, and $l = 2^{t} \cdot m + 2^{t-1}$, respectively.
• Step 4 (G-rounds): We use a nested loop in this step. For t = 1 to ⌊log(n)⌋ and for m = 0 to $\lfloor n/2^{t} \rfloor - 1$: At locations g[j, l], p[l, k], and g[l, k] apply a logical AND gate and uncomputation gate pair such that locations g[j, l] and p[l, k] are unchanged and location g[l, k] will have the result of computation. Location g[l, k] is renamed to g[j, k]. The equations for the indexes are $j = 2^{t} \cdot m$, $k = 2^{t} \cdot m + 2^{t}$, and $l = 2^{t} \cdot m + 2^{t-1}$, respectively.
[...] will be restored to its original value. The equations for the indexes are $j = 2^{t} \cdot m$, $k = 2^{t} \cdot m + 2^{t}$, and $l = 2^{t} \cdot m + 2^{t-1}$.
• Step 7: This step has two sub-steps:
-Sub-step 1: 
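The nested loop of Step 4 (G-rounds) can be enumerated directly from the index equations given above. This is a sketch of the index bookkeeping only, not of the quantum gates themselves; it assumes log is base 2, and the function names `floor_log2` and `g_round_indices` are hypothetical.

```python
def floor_log2(n):
    """floor(log2(n)) for a positive integer n (base 2 is an assumption)."""
    return n.bit_length() - 1


def g_round_indices(n):
    """List (t, m, j, l, k) for every gate pair applied in the G-rounds."""
    rounds = []
    for t in range(1, floor_log2(n) + 1):
        for m in range(n // 2**t):  # m = 0 .. floor(n / 2^t) - 1
            j = 2**t * m
            k = 2**t * m + 2**t
            l = 2**t * m + 2**(t - 1)
            # The pair acts on g[j, l], p[l, k] and g[l, k];
            # g[l, k] is then renamed g[j, k].
            rounds.append((t, m, j, l, k))
    return rounds
```

For n = 8 the loop produces 4 + 2 + 1 = 7 entries, the first being `(1, 0, 0, 1, 2)`. In general the number of entries is the sum of ⌊n/2^t⌋ over t, which equals n − w(n), matching the gate-pair count the T-count analysis gives for Step 4.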
Out-FT-QCLA2 is optimized for qubit cost. A trade-off for the reduced qubit count is an increase in the T gate cost of Out-FT-QCLA2. We reduce the qubit cost by replacing logical AND gate and uncomputation gate pairs with alternative Toffoli gate implementations, such as the design in [18], where appropriate. To retain a low T-count, Out-FT-QCLA2 still incorporates logical AND gate and uncomputation gate pairs. We save qubits because a logical AND gate and uncomputation gate pair has a qubit cost of 4 while implementations of the Toffoli gate (like the design in [18]) have a qubit cost of 3. However, alternative Toffoli gate implementations have higher T-counts (such as the design in [18] with a T-count of 7). The design methodology for Out-FT-QCLA2 is identical to the methodology for Out-FT-QCLA1 except for Step 4 and Step 5. In Step 4 and Step 5, Toffoli gates based on the design in [18] are used. An illustrative example of Out-FT-QCLA2 is in Figure 6.
The T-count of Out-FT-QCLA-1 is shown for each Step of the proposed design methodology. This design uses logical AND gate and uncomputation gate pairs to implement the Toffoli gate as shown in [19] and [23]. The pair has a T-count of 4. The total T-count is determined by summing the T-counts of the individual Steps, which are as follows.
• Step 1 uses n logical AND gates. The T-count for this step is 4 · n.
• Step 2 does not need T gates.
• Step 3 uses n − w(n) − ⌊log(n)⌋ logical AND gates. The T-count for this step is 4 · (n − w(n) − ⌊log(n)⌋).
• Step 4 uses n − w(n) logical AND gate and uncomputation gate pairs. The T-count for this step is 4 · (n − w(n)).
• Step 5 uses n − ⌊log(n)⌋ − 1 logical AND gate and uncomputation gate pairs. The T-count for this step is 4 · (n − ⌊log(n)⌋ − 1).
• Steps 6 through 8 do not need T gates.
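As an arithmetic check, the four nonzero per-step T-counts listed above for Out-FT-QCLA-1 can be added term by term. The result below is obtained only by summing those listed values; it is not quoted from the source text.

```latex
\begin{align*}
T_{\text{Out-FT-QCLA1}}
  &= 4n + 4\bigl(n - w(n) - \lfloor \log(n) \rfloor\bigr)
     + 4\bigl(n - w(n)\bigr)
     + 4\bigl(n - \lfloor \log(n) \rfloor - 1\bigr) \\
  &= 16n - 8\,w(n) - 8\lfloor \log(n) \rfloor - 4 .
\end{align*}
```

For example, with n = 8 (so w(n) = 1 and, assuming base-2 logarithms, ⌊log(n)⌋ = 3) the four steps contribute 32 + 16 + 28 + 16 = 92 T gates, in agreement with 16 · 8 − 8 − 24 − 4 = 92.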
B. T-count analysis of Out-FT-QCLA-2
The T-count of Out-FT-QCLA-2 is shown for each Step of the proposed design methodology. Total T-count is determined by summing the T-count for each Step of the proposed design methodology. This design uses the Toffoli gate implementation in [18], which has a T-count of 7. The total T-count is 22 · n − 11 · w(n) − 11 · ⌊log(n)⌋ − 7 where $w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^{y} \rfloor$.
• Step 1 uses n logical AND gates. The T-count for this step is 4 · n.
• Step 2 does not need T gates.
• Step 3 uses n − w(n) − ⌊log(n)⌋ logical AND gates. The T-count for this step is 4 · (n − w(n) − ⌊log(n)⌋).
• Step 4 uses n − w(n) Toffoli gates. The T-count for this step is 7 · (n − w(n)).
• Step 5 uses n − ⌊log(n)⌋ − 1 Toffoli gates. The T-count for this step is 7 · (n − ⌊log(n)⌋ − 1).
• Steps 6 through 8 do not need T gates.
Table III [...] 1 26 · n 6 · n + 1
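The closed-form T-count stated for Out-FT-QCLA-2 can be checked against the per-step counts listed above. The sketch below assumes log is base 2 and uses the fact that w(n) is the number of ones in the binary expansion of n; the function names are hypothetical and introduced only for this check.

```python
def popcount(n):
    """w(n): number of ones in the binary expansion of n."""
    return bin(n).count("1")


def floor_log2(n):
    """floor(log2(n)) for a positive integer n (base 2 is an assumption)."""
    return n.bit_length() - 1


def out_ft_qcla2_step_counts(n):
    """Per-step T-counts as listed in the text (steps without T gates omitted)."""
    w, lg = popcount(n), floor_log2(n)
    return {
        "step1": 4 * n,             # n logical AND gates, 4 T gates each
        "step3": 4 * (n - w - lg),  # logical AND gates
        "step4": 7 * (n - w),       # Toffoli gates, 7 T gates each
        "step5": 7 * (n - lg - 1),  # Toffoli gates
    }


def out_ft_qcla2_closed_form(n):
    """Total T-count formula stated in the text."""
    return 22 * n - 11 * popcount(n) - 11 * floor_log2(n) - 7
```

For n = 8 the per-step counts are 32, 16, 49 and 28, which sum to 125, the same value the closed form gives.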
C. Cost Comparison of Proposed Out-of-place QCLAs
1) Cost Comparison in Terms of T-count:
1 Circuits modified to remove garbage output. We use the methodology in [12] to remove the garbage output.
2 Out-FT-QCLA1 is optimized emphasizing T-count.
3 Out-FT-QCLA2 is optimized emphasizing qubit cost.
Table III indicates that all proposed out-of-place QCLAs have qubit costs of order O(n). The existing works also have a T-count of order O(n). Out-FT-QCLA1 requires 2 · n + w(n) + ⌊log(n)⌋ − 1 additional ancillae compared to Out-FT-QCLA2. The added qubit cost of Out-FT-QCLA1 illustrates the trade-off between qubits and T gates that occurs when logical AND gate and uncomputation gate pairs are replaced by Toffoli gates in the QCLA implementation.
2) Cost Comparison in Terms of Qubits:
Out-FT-QCLA2 requires w(n)+ ⌊log(n)⌋ fewer qubits than the design by Thapliyal 
V. PROPOSED IN-PLACE QCLA CIRCUITS
We now show our proposed in-place quantum carry lookahead (QCLA) circuits. The proposed QCLA circuits have lower T-count and qubit costs than existing works. The proposed in-place FT-QCLA1 (or In-FT-QCLA1) and the proposed in-place FT-QCLA2 (or In-FT-QCLA2) save T gates by using the temporary logical-AND gate and uncomputation gate where possible. In-FT-QCLA2 is optimized for qubit cost and the In-FT-QCLA1 is optimized for T gates. The methodologies of the proposed in-place QCLAs are generic and each can be used to implement a QCLA circuit of any size.
The proposed QCLA circuits operate as follows: The inputs are two values a and b (each n bits wide) stored in quantum registers A and B, and an n bit register Z of ancillae set to $|A\rangle$, where $|A\rangle = \frac{1}{\sqrt{2}}\left(|0\rangle + e^{i\pi/4}|1\rangle\right)$. Lastly, In-FT-QCLA2 requires a register X of n − w(n) − ⌊log(n)⌋ ancillae set to $|A\rangle$, where $w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^{y} \rfloor$ is the number of ones in the binary expansion of n. In contrast, register X of In-FT-QCLA1 has 3·n − 2·w(n) − 2⌊log(n)⌋ ancillae set to $|A\rangle$. At the end of computation, A is restored to its initial value and B will contain sum bits 0 through n − 1 of the addition of a and b. For each QCLA, at the end of computation, Z will contain the sum bit $s_n$ and the remaining locations in Z are transformed into a register of classical states. These qubits can be restored to initial values for reuse as ancillae. In-FT-QCLA1 and In-FT-QCLA2 will also transform the locations in X to a register of classical states, which can likewise be restored to initial values for reuse as ancillae.
This Section is organized as follows: In-FT-QCLA1 is shown in Section V-A. In-FT-QCLA2 is shown in Section V-B.
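The ancilla state used above, (1/√2)(|0⟩ + e^{iπ/4}|1⟩), is the state obtained by applying a Hadamard gate followed by a T gate to |0⟩. This is a standard identity rather than a statement taken from the text, and the small numerical check below is only an illustration; the helper `apply_gate` and the variable names are hypothetical.

```python
import cmath
import math


def apply_gate(matrix, state):
    """Multiply a 2x2 gate matrix by a single-qubit state vector."""
    return [sum(matrix[r][c] * state[c] for c in range(2)) for r in range(2)]


INV_SQRT2 = 1 / math.sqrt(2)
H_GATE = [[INV_SQRT2, INV_SQRT2], [INV_SQRT2, -INV_SQRT2]]
T_GATE = [[1, 0], [0, cmath.exp(1j * math.pi / 4)]]

ket_zero = [1, 0]
# Apply H first, then T, to the |0> state.
prepared = apply_gate(T_GATE, apply_gate(H_GATE, ket_zero))
# The ancilla state as written in the text.
target = [INV_SQRT2, INV_SQRT2 * cmath.exp(1j * math.pi / 4)]
max_error = max(abs(prepared[i] - target[i]) for i in range(2))
```

Here `max_error` is zero up to floating-point rounding, so preparing each such ancilla in this way uses a single T gate.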
A. Methodology of the Proposed In-Place FT-QCLA1 (In-FT-QCLA1)
The T gate optimized fault tolerant QCLA is targeted for quantum hardware that can support the resource requirements needed for fault tolerant quantum computation. Because of the high implementation cost of the fault tolerant T gate, this QCLA is optimized for T-count. The proposed QCLA is based on the quantum NOT gate, CNOT gate, logical AND gate and the uncomputation gate presented in [23]. The steps of the methodology to implement the proposed In-FT-QCLA1 are shown along with an illustrative example of the QCLA circuit in Figure 7.
[...] and will be renamed to the value g[j, k]. The equations for the indexes are $j = 2^{t} \cdot m$, $k = 2^{t} \cdot m + 2^{t}$, and $l = 2^{t} \cdot m + 2^{t-1}$.
[...] n] has the sum bit $s_n$. For i = 0 to n − 2: At location p[i, i + 1] apply a NOT gate. Location p[i, i + 1] will have the sum bit $s_i$.
B. Proposed In-Place FT-QCLA2 Design (In-FT-QCLA2)
In-FT-QCLA1 is optimized for T-count. A trade-off for the reduced T-count of In-FT-QCLA1 is an increase in its qubit cost. We can reduce the qubit cost by replacing logical AND gate and uncomputation gate pairs with alternative Toffoli gate implementations (such as the design in [18]) where appropriate. We save qubits because a logical AND gate and uncomputation gate pair has a qubit cost of 4 while implementations like the design in [18] have a qubit cost of 3. However, alternative Toffoli gate implementations have higher T-counts (such as the design in [18] with a T-count of 7).
The steps of the methodology to implement In-FT-QCLA2 are identical to the methodology to implement In-FT-QCLA1, except that the logical AND gate and uncomputation gate pairs are replaced with Toffoli gates in Step 4, Step 5, Step 11 and Step 12. An illustrative example of the In-FT-QCLA2 circuit is in Figure 8.
A. T-count analysis of In-FT-QCLA1
The T-count of In-FT-QCLA1 is shown for each Step of the proposed design methodology. Total T-count is determined by summing the T-count for each Step of the proposed design methodology. The total T-count is 20 · n − 8 · w(n) − 4 · ⌊log(n)⌋ − 8 · w(n − 1) − 4 · ⌊log(n − 1)⌋ − 8 where $w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^{y} \rfloor$.
• Step 1 uses n logical AND gates. The T-count for this step is 4 · n.
• Step 2 does not need T gates.
• Step 3 uses n − w(n) − ⌊log(n)⌋ logical AND gates. The T-count for this step is 4 · (n − w(n) − ⌊log(n)⌋).
• Step 4 uses n − w(n) logical AND gate and uncomputation gate pairs. The T-count for this step is 4 · (n − w(n)).
• Step 5 uses n − ⌊log(n)⌋ − 1 logical AND gate and uncomputation gate pairs. The T-count for this step is 4 · (n − ⌊log(n)⌋ − 1).
• Steps 6 through 9 do not need T gates.
• Step 10 uses n − 1 − w(n − 1) − ⌊log(n − 1)⌋ logical AND gates. The T-count for this step is 4 · (n − 1 − w(n − 1) − ⌊log(n − 1)⌋).
• Step 11 uses n − ⌊log(n − 1)⌋ − 2 logical AND gate and uncomputation gate pairs. The T-count for this step is 4 · (n − ⌊log(n − 1)⌋ − 2).
• Step 12 uses n − 1 − w(n − 1) logical AND gate and uncomputation gate pairs. The T-count for this step is 4 · (n − 1 − w(n − 1)).
• Steps 13 through 16 do not need T gates.
B. T-count analysis of In-FT-QCLA2
The T-count of In-FT-QCLA2 is shown for each Step of the proposed design methodology. Total T-count is determined by summing the T-count for each Step of the proposed design methodology. The total T-count is 40 · n − 11 · w(n) − 11 · ⌊log(n)⌋ − 11 · w(n − 1) − 11 · ⌊log(n − 1)⌋ − 32 where $w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^{y} \rfloor$.
• Step 1 uses n logical AND gates. The T-count for this step is 4 · n.
• Step 2 does not need T gates.
• Step 3 uses n − w(n) − ⌊log(n)⌋ logical AND gates. The T-count for this step is 4 · (n − w(n) − ⌊log(n)⌋).
• Step 4 uses n − w(n) Toffoli gates. The T-count for this step is 7 · (n − w(n)).
• Step 5 uses n − ⌊log(n)⌋ − 1 Toffoli gates. The T-count for this step is 7 · (n − ⌊log(n)⌋ − 1).
• Steps 6 through 9 do not need T gates.
• Step 10 uses n − 1 − w(n − 1) − ⌊log(n − 1)⌋ logical AND gates. The T-count for this step is 4 · (n − 1 − w(n − 1) − ⌊log(n − 1)⌋).
• Step 11 uses n − ⌊log(n − 1)⌋ − 2 Toffoli gates. The T-count for this step is 7 · (n − ⌊log(n − 1)⌋ − 2).
• Step 12 uses n − 1 − w(n − 1) Toffoli gates. The T-count for this step is 7 · (n − 1 − w(n − 1)).
• Steps 13 through 16 do not need T gates.
Table IV indicates that the proposed in-place QCLAs have qubit costs of order O(n). The existing works also have a T-count of order O(n). In-FT-QCLA2 requires w(n) + ⌊log(n)⌋ fewer qubits than the design by Thapliyal et al. and has the same qubit cost as the designs by Draper et al. and Trisetyarso et al. Further, In-FT-QCLA2 requires n + w(n) + ⌊log(n)⌋ − 1 fewer qubits than the designs by Takahashi et al. in [8] and [36]. In-FT-QCLA1 achieves its T gate savings with only an order O(1) increase in qubits compared to the existing work.
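The closed-form T-count stated for In-FT-QCLA2 follows from adding the seven nonzero per-step counts listed for that design. The worked sum below is an arithmetic check on those listed values, with w and w′ abbreviating w(n) and w(n − 1), and ℓ and ℓ′ abbreviating ⌊log(n)⌋ and ⌊log(n − 1)⌋ (these abbreviations are introduced here only for brevity).

```latex
\begin{align*}
T_{\text{In-FT-QCLA2}}
  &= \underbrace{4n}_{\text{Step 1}}
   + \underbrace{4(n - w - \ell)}_{\text{Step 3}}
   + \underbrace{7(n - w)}_{\text{Step 4}}
   + \underbrace{7(n - \ell - 1)}_{\text{Step 5}} \\
  &\quad
   + \underbrace{4(n - 1 - w' - \ell')}_{\text{Step 10}}
   + \underbrace{7(n - \ell' - 2)}_{\text{Step 11}}
   + \underbrace{7(n - 1 - w')}_{\text{Step 12}} \\
  &= 40n - 11w - 11\ell - 11w' - 11\ell' - 32 .
\end{align*}
```

The coefficient of n is 4 + 4 + 7 + 7 + 4 + 7 + 7 = 40 and the constant is −7 − 4 − 14 − 7 = −32, which agrees with the total given in the text.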
C. Cost Comparison of Proposed In-Place QCLAs
1) Cost Comparison in Terms of T-count:
Table IV
Design | T-count | Qubits
[...] | ≈ 84 · n − 56 | 3 · n − 1
Mogenson (Design 2) 2 ([37]) | ≈ 84 · n − 56 | 3 · n − log(n) − 1
In-FT-QCLA1 3 | 20n − 8w(n) − 8w(n − 1) − 4⌊log(n)⌋ − 4⌊log(n − 1)⌋ − 8 | 6n − 2w(n) − 2⌊log(n)⌋
In-FT-QCLA2 4 | 40n − 11w(n) − 11⌊log(n)⌋ − 11w(n − 1) − 11⌊log(n − 1)⌋ − 32 | 4n − w(n) − ⌊log(n)⌋ + 1
$w(n) = n - \sum_{y=1}^{\infty} \lfloor n/2^{y} \rfloor$
VII. EXISTING WORK
Quantum carry lookahead adders (QCLAs) have caught the attention of researchers, who have contributed many designs to the literature. Several designs such as [29] [31] target reversible computing and, therefore, produce significant garbage output. Other designs such as [42] are promising but not generic, which prevents scaling them to other input qubit lengths. Designs that can be implemented on quantum hardware include [29] and [28].
In-place and out-of-place QCLAs have been proposed. We discuss the existing work for in-place and out-of-place QCLAs in separate sections.
A. Out-of-Place QCLAs
Existing out-of-place QCLAs that can be implemented on quantum hardware include [29] [31] [28] [30] [32]. Table I summarizes important performance measures for these QCLAs. T-count, qubit cost, gates used and whether the design produces garbage output are shown. The designs in [29] and [31] produce garbage output and must be made garbageless before use in quantum algorithms. As a result, the T-count cost is doubled and the qubit cost is increased by at least n + 1. The reported T-count and qubit cost reflect the added cost from removing garbage outputs. The designs in [28] [30] and [32] have no garbage outputs and can be used as is. The design in [31] offers the lowest T-count in exchange for an O(1) increase in qubit cost compared to existing works. The designs in [28] [30] and [32] achieve the lowest qubit cost with only a modest O(1) increase in T-count compared to more T-count efficient works such as [31]. The design in [29] has the highest resource costs of the existing works. These are all interesting works that offer options with reduced qubit or T gate costs. However, with advances such as the recent T gate efficient Toffoli gate implementations shown in [23], we have designed QCLAs that have reduced T-count compared to these works. Further, we can design QCLAs that offer T-count savings yet maintain comparable qubit costs.
B. In-Place QCLAs
Existing in-place QCLAs that can be implemented on quantum hardware include [28] [8] [30] [32] [27] and [37]. Table II summarizes important performance measures for these QCLAs. T-count, qubit cost, gates used and whether the design produces garbage output are shown. All the designs have no garbage outputs and can be used as is. The in-place QCLA in [27] is based on CNOT gates, Toffoli gates and multiple control Toffoli gates. To decompose a multiple control Toffoli gate into quantum gates, it must first be decomposed into Toffoli gates. One multiple control Toffoli gate translates into 2 · n − 3 Toffoli gates, which has a corresponding T-count of 7 · (2 · n − 3) = 14 · n − 21 (using the Toffoli gate implementation in [18]). As a consequence, the design in [27] has a T-count of order O(n^3), which becomes prohibitively costly for large n. The other existing works have more reasonable T gate costs of order O(n). Of these works, the designs by Mogenson have the lowest T-count and can perform their operation with only approximately 3 · n qubits. Among the works with T gate costs of order O(n), the design in [8] is the most costly in terms of T-count and qubit cost. Prior works such as [30] [32] and [37] present QCLAs that offer low T-count, low qubit cost and no garbage outputs. However, with the recent T gate efficient Toffoli gate implementations shown in [23], we have designed QCLAs that have reduced T-count compared to these works. Further, we have designed QCLAs that offer T-count reduction with only a modest O(1) qubit cost increase compared to existing work.
VIII. CONCLUSION
In this work, we propose quantum circuits for carry lookahead addition. We present proposed designs for in-place QCLAs and out-of-place QCLAs. We present three designs for in-place QCLAs and three designs for the out-of-place QCLAs. The in-place FT-QCLA1 (In-FT-QCLA1) and out-of-place FT-QCLA1 (Out-FT-QCLA1) are optimized for T-count. The in-place FT-QCLA2 (In-FT-QCLA2) and out-of-place FT-QCLA2 (Out-FT-QCLA2) are optimized for qubit cost while providing low T gate cost. The proposed QCLAs are based on NOT gates, CNOT gates, Toffoli gates, logical AND gates and uncomputation gates, as well as a proposed uncomputation gate for near term quantum hardware. These designs are compared against the existing work and shown to have reduced T gate and qubit costs. We conclude that the proposed in-place QCLAs and out-of-place QCLAs can be used in larger quantum data-path circuits where gate count and/or qubit cost is of concern. We also conclude that the proposed QCLAs can be used to increase the amount of computation possible on quantum hardware with limited coherence times.
