Abstract. Optimizing arithmetic primitives such as quantum-dot cellular automata (QCA) adders is important for investigating high-performance QCA computers in this emerging nanotechnology paradigm. In this paper, we demonstrate that QCA ripple carry adder and bit-serial adder designs actually outperform carry-lookahead and carry-select adder designs, because the latter require more interconnects. Simulation results obtained with the QCADesigner tool for the proposed adder designs are also presented.
Introduction
Scaling of the predominant silicon complementary metal-oxide semiconductor (CMOS) technology is finally approaching its limit after decades of exponential growth. This has prompted the development of many nano-scale molecular devices [1]. Quantum-dot cellular automata (QCA) is one promising emerging nanotechnology paradigm. Initially developed by Prof. Craig Lent at the University of Notre Dame, QCA performs computation not with electron flow but with Coulombic interactions of electrons trapped in quantum dots [2]. Its extremely high density, low power, and potentially high processing speed make QCA one of the most attractive alternatives to CMOS. At the IEEE ISCAS 2004 conference, QCA was identified as a promising research area for the CAS and VLSI societies.
Although a scalable physical realization of QCA for high-speed computing has not emerged, some experimental devices and circuits have been created as a proof of concept [1] [2] [3]. QCA devices have been fabricated and tested based on metal dots on an oxidized silicon substrate [1] [2] [3]. The operating temperatures for these devices were approximately 80 mK. Also, molecular QCA devices (1 to 10 nm) are being explored, and are predicted to operate at room temperature. Circuits using molecular QCA can be built at an estimated density of up to 10^13 devices per cm^2, and operated as fast as 2.5 THz with a theoretical maximum of 25 THz.
QCA is currently being studied at both the device and circuit levels. This paper focuses on QCA circuit design and analysis, which is one of the fundamental parts of QCA research. As with CMOS, QCA circuit optimization targets high-speed and area-efficient designs. Since QCA devices such as wires and gates are built from QCA cells rather than CMOS transistors, QCA circuits require different design methodologies from CMOS design.
In this paper, we first summarize the design techniques of QCA clocking and interconnect. Recent theoretical work has suggested clocked control of QCA circuitry by periodically modulating interdot barriers, in which case the design speed is determined completely by the clocking. We also found that the QCA coplanar crossover is not reliable, and therefore suggest that a multi-layer QCA interconnect structure be used. The multi-layer interconnect can reduce the total area required to implement a circuit. This study is important for many QCA circuit designs. Using the clocking and interconnect techniques, we then carry out a study to obtain efficient QCA adders. Unlike standard CMOS technologies, which often benefit from parallel architectures, QCA technology is differentiated by its fine pipelining and multiple clocking zones, which benefit more from classical serial techniques.
QCA Circuit Architectures
QCA Basics. The operation of a QCA cell has been demonstrated experimentally in a four-dot system, which is based on the interaction of bi-stable QCA cells constructed from four quantum dots. The cell is charged with two electrons, which are free to tunnel between adjacent dots. These electrons tend to occupy antipodal sites as a result of their mutual electrostatic repulsion. Thus, there exist two equivalent energetically minimal arrangements of the two electrons in the QCA cell, as shown in Fig. 1(a). These two arrangements can represent logic '1' and '0' respectively, so that binary information can be encoded.
The basic logic gate in QCA is the majority gate. It can be realized with 5 QCA cells, as shown in Fig. 1(b). Assuming the inputs are A, B and C, the logic function of a majority gate is
$M(A, B, C) = AB + AC + BC$ (1)
Logic AND and OR functions can then be implemented from a majority gate by setting one input permanently to '0' or '1', respectively. Along with the inverter, the majority gate forms a universal logic set and can be used to implement any logic function.
Figure 1. QCA cell and majority gate: (a) QCA cell; (b) QCA majority gate.
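The majority-gate logic above can be checked with a short truth-table sketch. This is only a logic-level illustration in Python; the function names `majority`, `and_gate` and `or_gate` are chosen here for illustration, and nothing about cell layout or clocking is modeled.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote, M(A, B, C) = AB + AC + BC."""
    return (a & b) | (a & c) | (b & c)


def and_gate(a: int, b: int) -> int:
    """AND obtained by fixing one majority-gate input to '0'."""
    return majority(a, b, 0)


def or_gate(a: int, b: int) -> int:
    """OR obtained by fixing one majority-gate input to '1'."""
    return majority(a, b, 1)


# Print the full truth table of the derived two-input gates.
for a, b in product((0, 1), repeat=2):
    print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
```

Fixing the third input to a constant is what turns the single physical gate into either two-input function, which is why the majority gate plus an inverter suffices as a universal set.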
QCA Clocking. QCA clocking controls information flow around the circuit and also enables power gain in QCA devices. Moreover, clocking avoids the problem that a QCA array may not settle into its energetically minimal state when the inputs are switched abruptly. A physical schematic of a QCA clocking system is shown in Fig. 2. The QCA surface, containing a line of QCA cells, lies along the x axis, and a series of buried wires runs in the z direction (perpendicular to the page) underneath it. Voltages applied to the wires produce an electric field at the QCA surface. The electric field in the ŷ direction influences the cell's activity state (latched, relaxed, or switching) but not its polarization ('1' or '0'), which is determined by the cell's neighbors. The voltage signals on the wires are periodic, and adjacent wires have a π/2 phase shift between them, so that every fourth wire carries the same applied signal. Fig. 3 shows the voltage signals on four adjacent wires, which are referred to as the four-phase clocking signal (clocking zones). The signal on the buried wires induces a roughly sinusoidal clocking electric field that propagates across the QCA surface and thereby controls the information flow over the QCA cells.
QCA D-Latches. When the clock signal is low, the cell is latched. When the clock signal is high, the cell is relaxed and has no polarization. In between these states the cell is switching, and its polarization is determined entirely by its neighbors.
Since the cells in one clocking zone become latched and remain latched until the next group of cells is latched, they can be considered as comprising a D-latch. For example, a clocked QCA wire is shown in Fig. 4. If C0, C1, C2 and C3 (different shades of gray) denote the four clocking zones of a clock cycle, then cells D0, D1, D2 and D3 can be considered a series of D-latches.
QCA Latency. If a section of the wire is replaced with a majority gate in the same clocking zone, as shown in Fig. 5, the majority gate does not increase the delay, or clocking latency, of the wire. In QCA, the design speed is determined completely by the largest number of clocking zones between input and output. We therefore refer to delay in terms of clocking latency, since the number of clocking zones between two signal points dominates the delay. Minimizing the number of clocking zones leads to a better design. Although majority gates themselves do not directly increase the clocking latency, the interconnects needed to connect the majority gates together do increase the overall latency. Therefore, a design that consists of fewer gates will, in most cases, require fewer interconnects and thus have lower overall latency.
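The latency rule above can be sketched as simple bookkeeping over clocking zones. This is a minimal sketch that assumes every clocking zone on a path contributes one quarter of a clock cycle; the helper names and the example path lengths are hypothetical and only illustrate the counting.

```python
ZONES_PER_CLOCK_CYCLE = 4  # four-phase clocking: zones C0, C1, C2, C3


def zone_label(position: int) -> str:
    """Clocking zone of the position-th zone along a wire (0-based)."""
    return f"C{position % ZONES_PER_CLOCK_CYCLE}"


def latency_in_cycles(num_zones: int) -> float:
    """Clocking latency of a path, expressed in clock cycles."""
    return num_zones / ZONES_PER_CLOCK_CYCLE


def critical_latency(paths: dict[str, int]) -> int:
    """Design latency: the largest zone count over all input-to-output paths."""
    return max(paths.values())


# A wire spanning four zones behaves like four D-latches in series.
wire = [zone_label(i) for i in range(4)]
print(wire, latency_in_cycles(len(wire)))

# Hypothetical zone counts for two input-to-output paths of one design.
paths = {"a_to_out": 3, "b_to_out": 5}
print(critical_latency(paths), latency_in_cycles(critical_latency(paths)))
```

Because a majority gate placed in an existing zone does not add a zone, it leaves these counts unchanged; only extra interconnect zones raise the result.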
Nanoscience and Technology
QCA Multi-layer Interconnect. As shown in Fig. 6, two QCA wires, one comprised of regular cells (90° cells in the horizontal wire) and the other comprised of rotated cells (45° cells in the vertical wire), were believed to be able to cross over each other without interfering. Recently, however, it has been found that QCA coplanar wire crossings do not perform as previously supposed. Thus, a multi-layer QCA interconnect structure should be used. The Coulombic interaction between regular cells (90° cells) can be described by the kink energy $E_{kink}$, the energetic cost of two cells having opposite polarization. $E_{kink}$ can be expressed as
$E_{kink} \propto \frac{1}{r^5}$ (2)
where r is the distance between cells. From Eq. 2, we can estimate that $E_{kink}$ between cells 2 and 3 in Fig. 6 is about 1/32 that of adjacent cells ($r_{cell2,3} = 2 r_{adjacent\_cells}$). Thus, due to the weak coupling (Coulombic interaction), the information in the horizontal wire will fail to propagate across the gap between cells 2 and 3 when the effect of neighboring cells is not ignored.
Previous work has examined the possibility of multi-layer QCA. Using multi-layer QCA cells, we can effectively cross signals over on another layer. To do this, we require a vertical interconnect. By stacking cells one on top of another, we can transmit the signal to another layer, where the signal is again transmitted horizontally. The multi-layer crossover is shown in Fig. 7. The vertical separation between cells can be tuned to match the $E_{kink}$ of the horizontal cells. Unlike present integrated circuits, where metal layers are used to connect discontinuous sections of a circuit and cannot perform any intelligent functions, the extra layers of QCA can be used to create circuits. Thus, the multi-layer interconnect can reduce the total area required to implement a circuit.
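The 1/32 coupling estimate for the coplanar crossing can be reproduced with a one-line calculation. The sketch below assumes that the kink energy falls off as the fifth power of the cell-to-cell distance, which is the scaling that yields the 1/32 figure quoted above when the distance doubles; the function name and the `exponent` parameter are illustrative only.

```python
def relative_kink_energy(distance_ratio: float, exponent: int = 5) -> float:
    """Kink energy relative to adjacent cells, assuming E_kink ~ 1 / r**exponent.

    distance_ratio is the cell separation divided by the adjacent-cell
    separation. The default exponent of 5 is an assumption of this sketch.
    """
    return distance_ratio ** -exponent


# Cells 2 and 3 sit twice as far apart as adjacent cells.
ratio = relative_kink_energy(2)
print(ratio)
```

A coupling this much weaker than the nearest-neighbor interaction is why the gap in the horizontal wire is treated as unreliable and the signal is routed through a second layer instead.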
QCA Addition
In this section, we first review the existing 1-bit QCA full adder designs. Then, for the n-bit case, several QCA bit-parallel and bit-serial adders are studied. It should be pointed out that the newly proposed layouts differ from those in previous publications [14] [17] [18] [19], because they are implemented using the new multi-layer feature of the QCADesigner tool, a design and simulation tool developed at the University of Calgary. A 1-bit QCA full adder was first proposed by P. D. Tougaw and his co-workers in 1994 [14]. Since then, the QCA full adder design has been improved. The 1-bit QCA full adder in [3] reduces the hardware count by two majority gates and one inverter. The expressions for the sum and carry outputs of this adder are
$C_{out} = M(A, B, C_{in})$
$Sum = M(\overline{C_{out}}, C_{in}, M(A, B, \overline{C_{in}}))$
n-bit QCA Bit-parallel Adders. Based on the 1-bit full adder, an n-bit QCA ripple carry adder can be constructed. A QCA carry-lookahead adder and a carry-select adder can be constructed using the optimization method introduced in [3]. The architectures of these adders are the same as those using CMOS.
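The reduced full adder and the ripple carry adder built from it can be checked at the logic level. The sketch below assumes the common three-majority-gate, two-inverter form (carry as the majority of the three inputs, sum as a majority of the inverted carry, the carry input, and an inner majority of A, B and the inverted carry input), which matches the gate counts stated in this section. All function names are illustrative, and bit lists are least-significant bit first.

```python
def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote."""
    return (a & b) | (a & c) | (b & c)


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """1-bit full adder from 3 majority gates and 2 inverters (assumed form)."""
    carry = majority(a, b, cin)               # majority gate 1
    inner = majority(a, b, 1 - cin)           # majority gate 2, inverted carry-in
    total = majority(1 - carry, cin, inner)   # majority gate 3, inverted carry
    return total, carry


def ripple_carry_add(a_bits: list[int], b_bits: list[int], cin: int = 0):
    """Cascade full adders; the carry ripples through one gate per bit."""
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value: int, width: int) -> list[int]:
    """Little-endian bit list of a non-negative integer."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


sum_bits, carry_out = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(sum_bits), carry_out)
```

Only the first majority gate lies on the carry chain, which is the property the latency analysis of the ripple carry adder relies on.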
The QCA ripple carry adder is made by cascading n 1-bit QCA full adders in series. The overall latency can be obtained by considering the critical path from the carry input of the 1st bit to the sum output of the nth bit. Since the carry output of a QCA full adder requires only 1 majority gate, the carry-propagation chain to the nth bit consists of (n-1) majority gates, and the critical path consists of (n+1) majority gates and 1 inverter (2 majority gates and 1 inverter for the sum output of the nth bit adder). If each majority gate occupies a clocking zone (an inverter does not require an independent clocking zone), the overall latency is (n+1) clocking zones.
A ripple carry adder is considered the slowest bit-parallel adder design in CMOS circuits. However, the QCA ripple carry adder performs better in both area and speed than the QCA carry-lookahead and carry-select adder designs. This is because the carry-lookahead adder and carry-select adder contain more majority gates and more clocking zones in their critical paths, as shown in Table 1. Each majority gate or inverter has a clocking zone. Thus, the total number of clock cycles for a QCA adder depends on the number of majority gates and inverters in its critical path. For example, since a 4-bit QCA ripple carry adder consists of 4 QCA 1-bit full adders in series with a critical path of 5 majority gates, the latency for this adder is 1 1/4 clock cycles (5 clocking zones in total). From Table 2, we can see that the 4-bit QCA ripple carry adder takes up less area and has a lower calculation time than the other two adders. We have also used these 4-bit adders as building blocks for several larger word-length adders, such as a 32-bit adder, and have verified this result. Thus, the ripple carry adder is the fastest and also the most area-efficient of the three bit-parallel adder designs.
QCA Bit-serial Adder. Based on the 1-bit QCA full adder, a 1-bit QCA bit-serial adder was proposed by A. Fijany [19] and can be used to implement n-bit addition. As shown in Fig. 9, it is constructed with 3 majority gates and 2 inverters, with the feedback controlled by the D-latch (represented by a clocking zone). The QCA bit-serial adder is very small, simple and relatively fast. The calculation of n sum bits for this adder takes 4n clocking zones (n clock cycles). Thus, for the n-bit case, the QCA bit-serial adder is more area efficient but slower than the QCA ripple carry adder. It is our opinion that the QCA ripple carry adder and the QCA bit-serial adder will be the basic arithmetic structures for QCA multiple-bit addition in the future, acting as a high-speed version and an area-efficient version respectively.
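The latency figures for the two adders can be compared with a small calculation. This is a sketch that takes the counts stated above at face value, namely (n+1) clocking zones for the n-bit ripple carry adder and 4n clocking zones for the bit-serial adder, with four clocking zones per clock cycle; the function names are illustrative.

```python
ZONES_PER_CLOCK_CYCLE = 4  # four-phase clocking


def ripple_carry_latency_zones(n_bits: int) -> int:
    """Critical path of the n-bit ripple carry adder: n + 1 clocking zones."""
    return n_bits + 1


def bit_serial_latency_zones(n_bits: int) -> int:
    """Bit-serial adder: one full clock cycle (4 zones) per sum bit."""
    return ZONES_PER_CLOCK_CYCLE * n_bits


def zones_to_cycles(zones: int) -> float:
    """Convert a clocking-zone count to clock cycles."""
    return zones / ZONES_PER_CLOCK_CYCLE


for n in (4, 32):
    rca = ripple_carry_latency_zones(n)
    serial = bit_serial_latency_zones(n)
    print(f"n={n}: ripple carry {rca} zones ({zones_to_cycles(rca)} cycles), "
          f"bit-serial {serial} zones ({zones_to_cycles(serial)} cycles)")
```

For n = 4 this gives 5 zones (1 1/4 clock cycles) for the ripple carry adder, matching the example in the text, while the bit-serial adder needs one clock cycle per bit and therefore trades speed for its much smaller area.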
Conclusion
In this paper, a multi-layer interconnect technique is proposed for QCA. We also show that QCA ripple carry adder and bit-serial adder designs outperform carry-lookahead and carry-select adder designs in terms of speed and area requirements.
