Quantum-dot fabrication and characterization is a well-established technology, which is used in photonics, quantum optics, and nanoelectronics. Four quantum dots placed at the corners of a square form a unit cell, which can hold a bit of information and serve as a basis for quantum-dot cellular automata (QCA) nanoelectronic circuits. Although several basic QCA circuits have been designed, fabricated, and tested, proving that quantum dots can form functional, fast, and low-power nanoelectronic circuits, QCA nanoelectronics still remains in its infancy. One of the reasons for this is the lack of design automation tools that would facilitate the systematic design of the large QCA circuits that contemporary applications demand. Here we present novel, programmable QCA circuits, which are based on a crossbar architecture. These circuits can be programmed to implement any Boolean function, in analogy to CMOS field-programmable gate arrays, and open the road that will lead to full design automation of QCA nanoelectronic circuits. Using this architecture we designed and simulated QCA circuits that proved to be area efficient, stable, and reliable.
I. INTRODUCTION
QUANTUM-DOT cellular automata (QCA) is a promising nanoelectronic technology, in which information is stored as configurations of electron pairs in coupled quantum dot arrays. In QCA circuits these arrays are used to implement Boolean logic functions [1]. More specifically, by taking advantage of quantum mechanical effects, QCA significantly reduce the size of digital circuits and operate at high speeds at very low power levels. QCA cells change their states due to interactions with neighboring cells via electrostatic or magnetic fields. Consequently, instead of using ranges of voltages or currents to represent binary values, QCA use electron localization in quantum dots. QCA integrated circuits have been implemented in densities up to $10^{12}$ cells/cm$^2$ and the circuit switching frequency can be close to a terahertz [2], [3].
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCAD.2016.2618869
Since 1993, when QCA were introduced by Lent et al. [4], several logic gates, circuits, and design rules [5]-[7] have been proposed, such as the binary wire [8], the majority gate, AND, OR, NOT, and XOR gates [9], bit-serial adder, full adder [10]-[12], multiplier [13], multiplexer [14]-[16], flip-flop [17]-[19], arithmetic logic units [20], [21], and serial or parallel memories [22]-[24]. QCA circuits are generally stable and very fast, and they consume very small amounts of energy. However, the lack of design automation tools that would enable the design and simulation of large circuits, and the lack of a scalable and modular architecture that would facilitate circuit fabrication, hinder the full development of this promising nanoelectronic technology. Furthermore, programmable prefabricated QCA circuits, analogous to microelectronic field-programmable gate array (FPGA) circuits, are expected to boost the use and applications of the QCA circuits published in the literature.
In this paper, we propose a novel design method for implementing Boolean functions using programmable QCA crossbar circuits. Crossings of horizontal and vertical nanowire lines form a crossbar, which is considered one of the most promising solutions for nanoelectronic circuit architectures [25], because of its fabrication simplicity and its inherent redundancy, which supports defect tolerance [26]-[30]. Its favorable properties include a periodic geometry, straightforward fabrication procedures, and a very compact definition of devices and interconnections, facilitating large-scale fabrication and ultrahigh device density [25]. In the architecture proposed here, a QCA logic gate is formed at each cross-point of the crossbar. The logic gate can be programmed to operate as an OR, AND, or NOT gate. These programmable gates form a universal Boolean set, so any Boolean logic function can be implemented using this architecture, leading to QCA circuits that can execute any computation task.
Furthermore, in order to provide designers with as much flexibility as possible, a detailed methodology is introduced to enable the robust and efficient design of the corresponding QCA circuits. The proposed method takes into account the inputs/outputs as well as the programming lines of the crossbar architecture to implement Boolean functions and standard QCA circuits with the help of the QCADesigner simulation tool [31]. The timing issues of QCA cells and gates are handled in every case in a cascadable, easy-to-follow way. Moreover, while the limiting factors of existing QCA clock schemes have already been studied in [32], this paper proposes a complete QCA design methodology that also addresses, among other issues, the routing of QCA circuits. In addition, in some cases the resulting QCA circuits use fewer unit cells and occupy less area than state-of-the-art QCA circuits. The structure of this paper is as follows. In Section II, the necessary background for QCA circuits is provided. In Section III, the proposed programmable basic logic QCA gates, i.e., AND, OR, and NOT, that are formed at line crossings are presented. In Section IV, the design method of programmable crossbar QCA circuits is described analytically. In Section V, the proposed method is applied and evaluated by implementing several Boolean functions. The corresponding results are discussed and compared with well-known QCA circuits published in the literature. Finally, the conclusions and future perspectives are drawn in Section VI.
0278-0070 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
II. QCA PRELIMINARIES
The basic building block of QCA devices is the QCA cell. It consists of four quantum dots in a square array coupled by tunnel barriers. The physical mechanisms for interactions between dots are the Coulomb interactions and quantum-mechanical tunneling. Electrons are able to tunnel between the dots, but cannot leave the cell. If two excess electrons are placed in the cell, in the ground state and in absence of external electrostatic influence, Coulomb repulsion will force the electrons to dots on opposite corners [4] .
For an isolated cell, the two polarization states are energetically equivalent. However, when a neighboring QCA cell is near, the equivalence breaks and only one of the two states becomes the cell ground state [4]. The polarization $P$ measures the extent to which the charge distribution is aligned along one of the diagonal axes. If we label the four dots from 1 to 4 anti-clockwise starting from the upper right dot of the cell, and denote by $\rho_i$ the electron density of the $i$th dot, $P$ is defined as

$$P = \frac{(\rho_1 + \rho_3) - (\rho_2 + \rho_4)}{\rho_1 + \rho_2 + \rho_3 + \rho_4}.$$
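As an illustration of the polarization measure, the following minimal sketch evaluates $P$ from the four dot densities. It assumes the standard definition of [4], in which the diagonal through dots 1 and 3 corresponds to $P = +1$; the function name and the sample densities are hypothetical and chosen only for illustration.

```python
def polarization(rho):
    """Cell polarization from the four dot electron densities.

    rho = (rho1, rho2, rho3, rho4), with dots numbered anti-clockwise
    starting from the upper-right dot, as in the text.
    Assumption: dots 1 and 3 define the P = +1 diagonal.
    """
    rho1, rho2, rho3, rho4 = rho
    total = rho1 + rho2 + rho3 + rho4
    if total == 0:
        raise ValueError("cell holds no charge")
    return ((rho1 + rho3) - (rho2 + rho4)) / total


# Illustrative charge configurations (hypothetical values).
p_plus = polarization((1.0, 0.0, 1.0, 0.0))   # electrons on dots 1 and 3
p_minus = polarization((0.0, 1.0, 0.0, 1.0))  # electrons on dots 2 and 4
p_zero = polarization((0.5, 0.5, 0.5, 0.5))   # charge spread evenly
```

The two fully localized configurations give the two binary states, while an evenly spread charge corresponds to an unpolarized cell.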
The stability of a QCA circuit is based on the assumption that the system falls to the ground state every time it is excited by the inputs. However, this is not always guaranteed, and the system may instead settle in a metastable state, affecting the functionality of the design. This problem can be solved by adiabatic switching [1]. In adiabatic switching the system is always kept in its instantaneous ground state by using a clocking scheme of four periodic phases. QCA cells receive the clock signal through an electric field, which can raise or lower the tunneling barrier between dots inside the cell. When the barrier is low, the electrons can move from one dot to another according to the overall external electrostatic influence. When the barriers are high, the electrons are locked inside the dots, so the external fields cannot change the state of the cell. The adiabatic switching clocking scheme consists of four phases: 1) Switch; 2) Hold; 3) Release; and 4) Relax. It is implemented by applying four clock signals to the QCA circuit in order to control the clocking phase of each QCA cell in the circuit. To use this scheme, the QCA circuit must be partitioned into clocking zones in such a manner that all cells in a clocking zone are controlled by the same clock. The clocking zone partitioning is a crucial factor of the QCA design, because the order of appearance of the four clocking zones in the circuit controls the flow direction of the signals inside the QCA circuit.
At the beginning of the Switch phase, the QCA cells in the zone are unpolarized since the cell tunneling barriers are low. During the Switch phase the barriers are raised and the QCA cells become polarized according to the state of the cells that drive the zone. The driver cells must belong to a different clocking zone, specifically one that is in the Hold phase (90° phase difference, leading). The Switch phase is the clock phase in which the actual computation (or switching) occurs. At the end of the Switch phase, the barriers are high enough to block any electron tunneling, and this locks the state of the cell. The next phase is the Hold phase. During this phase the barriers remain high so the zone outputs can drive the inputs of the next clocking zone subcircuit. The next zone (which is driven by our reference zone) must be in the Switch phase (90° phase difference, leading). In the Release phase, barriers are gradually lowered and finally cells are allowed to relax to an unpolarized state. Finally, during the fourth clock phase, the Relax phase, cell barriers remain lowered and cells remain in an unpolarized state. Fig. 1 presents an example of the adiabatic clocking scheme applied to a binary wire. In the example, the state of the two upper QCA cells that constitute zone 0 is propagated gradually to the bottom cells of zone 3, according to the clocking mechanism presented above. The diagram in the figure shows the clocking phases of each zone for the duration of the signal propagation.
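The phase relationship between consecutive clocking zones can be sketched as a simple lookup. This is only an idealized model under stated assumptions: time is discretized into quarter periods, zone 0 is taken to enter the Switch phase at step 0, and the names `zone_phase` and `schedule` are hypothetical.

```python
PHASES = ("Switch", "Hold", "Release", "Relax")


def zone_phase(zone, step):
    """Phase of clocking zone `zone` at quarter-period `step`.

    Assumption: zone 0 is in Switch at step 0 and each following zone
    lags the previous one by one quarter period (90 degrees).
    """
    return PHASES[(step - zone) % 4]


def schedule(n_zones=4, n_steps=8):
    """One row per time step, one column per clocking zone."""
    return [[zone_phase(z, t) for z in range(n_zones)] for t in range(n_steps)]


if __name__ == "__main__":
    for t, row in enumerate(schedule()):
        print(t, row)
```

In this model, whenever a zone is in the Switch phase, the zone that drives it is in the Hold phase, which is the condition described in the text for signal propagation along a binary wire.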
A row of QCA cells acts like a wire, usually called a binary wire [8]. In QCA circuit designs, binary wires, inverters, and three-input majority gates are the fundamental parts. The inverters are constructed with a fork structure and the majority gates with a cross structure [9]. Coplanar binary wire crossovers can also be implemented in QCA designs, which is a very useful technique for designing more realizable circuits [9].
Recently, significant advancements have been made in QCA technology. More specifically, papers such as [33] demonstrate new methods for the construction of QCA circuits. Wolkow et al. [33] demonstrated that their method can produce semiconductor QCA circuits that are functional at room temperature. In fact, they stated that "the material stability . . . is comparable to conventional electronics." The proposed design methodology is developed with the intention of being applicable to semiconductor QCA technologies. Adapting the proposed methodology to other QCA technologies is feasible, and future research is expected to focus on this.
III. PROGRAMMABLE LOGIC GATES FOR QCA CROSSBAR CIRCUITS
Crossings of "horizontal" and "vertical" QCA wires form programmable majority gates, which have been fabricated and tested [3]. A QCA majority gate is formed at such a crossing. The gate comprises five cells in each direction. The three inputs to this gate are the top, left, and bottom cells, A, B, and C, and the output is the right cell. The output is equal to the majority of the input states. The central cell, which is the one that performs the calculations, is called the "device cell" and in every case is in the same state as the output. One of the three inputs of the majority gate can be used as a "program line." Let us assume that the program line comes in from the top input cell, namely B (of course, any one of the input cells can serve as a program line due to symmetry), and that the remaining two inputs are free to be set to any state. In such a case, if the program line state is set to one, the majority gate performs the OR Boolean function, while if the program line is set to zero, the majority gate becomes an AND gate. Therefore, the majority gate can be programmed to function as an OR or an AND gate by setting the state of any one of its inputs to logic one or zero and keeping it constant during the operation of the gate. Although QCA wire crossings can be programmed to function as AND or OR gates, they cannot be programmed to operate as NOT gates by setting any two inputs to any logic value.
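The programming of the majority gate described above can be checked exhaustively at the logic level. The sketch below is purely Boolean and ignores cell layout and clocking; `majority` and `programmed_gate` are hypothetical helper names.

```python
from itertools import product


def majority(a, b, c):
    """Three-input majority vote on bits 0/1."""
    return 1 if (a + b + c) >= 2 else 0


def programmed_gate(program_bit):
    """Fix one majority input to a constant, as the program line does."""
    def gate(x, y):
        return majority(x, y, program_bit)
    return gate


or_gate = programmed_gate(1)   # program line held at logic one
and_gate = programmed_gate(0)  # program line held at logic zero

if __name__ == "__main__":
    for x, y in product((0, 1), repeat=2):
        print(x, y, "OR:", or_gate(x, y), "AND:", and_gate(x, y))
```

Because the majority function is symmetric in its three arguments, fixing any one of them gives the same result, which matches the remark that any input can serve as the program line. The same symmetry is why no constant assignment of two inputs yields inversion of the third.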
The frequently used QCA NOT gate [Fig. 2(a)] is based on the fact that the input signal comes from the left of a QCA wire and splits into two parallel offset wires. The signal is inverted at the right end of the circuit, thus forming a NOT gate. As can be easily observed, such a gate cannot be integrated in the crossbar architecture, because its structure is different from the one the crossbar demands.
Consequently, in the context of this paper, a new implementation of a QCA NOT gate is proposed. As presented in Fig. 2(b), the proposed NOT gate is suitable for the design of QCA circuits in crossbar architectures, because it can be implemented at a cross-point of the crossbar. In particular, the proposed NOT gate has the same cell topology as the aforementioned majority logic QCA gate, namely it comprises ... we find that the energy (A) is always larger than the energy (B), regardless of the value of the indifferent input signal. This can be explained by considering the operation of the adiabatic clocking scheme as follows. In the proposed QCA NOT implementation, the input cell is clocked by clk3, cells 1-4 are clocked by clk0, and cells 5-9 are clocked by clk2. So when the output cells (cells 3 and 4 in Fig. 3) are in the Switch phase (the phase in which the cell's state is evaluated), the effect of the input cell value on them is strong because the input cell is in the Hold phase. At the same time, the effect of the value of cells 5-9 (Fig. 3) on output cells 3 and 4 is weak because they are in the Release phase, which means that, according to the adiabatic clocking scheme, cells 5-9 tend toward the Relax phase as time advances. A QCA cell in the Relax phase cannot propagate energy to its neighboring cells. Summing up, the output cell value depends on the left and bottom binary wire (cells 1-4). Also, calculating the polarization of these four cells [using (2)], we find that cells 1 and 2 have the same polarization, and the same goes for cells 3 and 4. The inversion takes place between cells 2 and 3.
This theoretical estimation is confirmed by two of the most widespread simulation tools, QCADesigner [31] and QCAPro [34] . The probabilistic error model used by these tools is presented in [35] . This model estimates the QCA circuit output error by calculating the ground state polarization probabilities of the QCA cell circuits. A brief description of the model is given below.
The steady state polarization of a QCA cell can be derived from the Hamiltonian of the cell using the Hartree approximation [34]. The Hamiltonian is expressed as follows:

$$H = \begin{bmatrix} -\frac{1}{2}\sum_i E_k P_i f_i & -\gamma \\ -\gamma & \frac{1}{2}\sum_i E_k P_i f_i \end{bmatrix}$$

where the sums are over the cells in the local neighborhood, $E_k$ is the "kink energy," i.e., the energy cost of two neighboring cells having opposite polarizations, $f_i$ is the geometric factor capturing the electrostatic fall-off with distance between cells, $P_i$ is the polarization of the $i$th cell, and $\gamma$ is the tunneling energy between the two cell states, which is controlled by the clocking mechanism. The notation can be further simplified by using $\bar{P}$ to denote the weighted sum of the neighborhood polarizations, $\bar{P} = \sum_i P_i f_i$. Using this Hamiltonian, the steady state polarization is given by

$$P^{ss} = \rho^{ss}_{11} - \rho^{ss}_{00} = \frac{E_k \bar{P}}{\sqrt{E_k^2 \bar{P}^2 + 4\gamma^2}} \tanh\left(\frac{\sqrt{E_k^2 \bar{P}^2 + 4\gamma^2}}{2kT}\right). \tag{3}$$

Equation (3) can be written as

$$P^{ss} = \frac{E}{\Omega} \tanh(\Delta) \tag{4}$$

where $E = 0.5 \sum_i E_k P_i f_i$ is the total kink energy, $\Omega = \sqrt{E_k^2 \bar{P}^2/4 + \gamma^2}$ is the Rabi frequency, and $\Delta = \Omega/kT$ is the thermal ratio. We use the above equation to arrive at the probabilities of observing (upon making a measurement) the system in each of the two states. Specifically, $P(X = 1) = \rho^{ss}_{11} = 0.5(1 + P^{ss})$ and $P(X = 0) = \rho^{ss}_{00} = 0.5(1 - P^{ss})$, where we made use of the fact that $\rho^{ss}_{00} + \rho^{ss}_{11} = 1$. Here $\rho^{ss}_{11}$ ($\rho^{ss}_{00}$) is the probability of observing the system (in particular, a QCA cell) in state 1 (state 0).
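The quantities defined above can be evaluated numerically with a short sketch. It assumes the compact form $P^{ss} = (E/\Omega)\tanh(\Delta)$ with $E = 0.5 E_k \bar{P}$, $\Omega = \sqrt{E_k^2 \bar{P}^2/4 + \gamma^2}$, and $\Delta = \Omega/kT$; the function name and the numerical values (arbitrary energy units) are hypothetical and are not taken from the paper or from QCAPro.

```python
import math


def steady_state(e_k, p_bar, gamma, kT):
    """Return (P_ss, P(X=1), P(X=0)) for one cell.

    e_k   : kink energy
    p_bar : weighted sum of neighbor polarizations, sum_i P_i * f_i
    gamma : tunneling energy (set by the clock)
    kT    : thermal energy, in the same units as e_k and gamma
    Assumed form: P_ss = (E / Omega) * tanh(Omega / kT).
    """
    E = 0.5 * e_k * p_bar
    omega = math.sqrt((e_k * p_bar) ** 2 / 4.0 + gamma ** 2)
    if omega == 0.0:
        return 0.0, 0.5, 0.5  # no drive and no tunneling: unbiased cell
    p_ss = (E / omega) * math.tanh(omega / kT)
    return p_ss, 0.5 * (1.0 + p_ss), 0.5 * (1.0 - p_ss)


# Hypothetical parameters in arbitrary energy units.
p_ss, prob_one, prob_zero = steady_state(e_k=1.0, p_bar=1.0, gamma=0.1, kT=0.05)
```

Lowering `gamma` (high barriers) or lowering `kT` pushes `p_ss` toward the sign of `p_bar`, whereas a large `gamma` (low barriers) drives the cell toward the unpolarized value.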
The power dissipation of the proposed QCA NOT circuit is also calculated. The nonadiabatic power dissipation model presented in [36] and [37] has been used for this purpose. According to this model, the instantaneous power is given by

$$P_{\text{total}} = \frac{dE}{dt} = \frac{\hbar}{2}\left[\frac{d\vec{\Gamma}}{dt} \cdot \vec{\lambda}\right] + \frac{\hbar}{2}\left[\vec{\Gamma} \cdot \frac{d\vec{\lambda}}{dt}\right]$$

where $\vec{\lambda}$ is the coherence vector and $\vec{\Gamma}$ is the 3-D energy vector. The first term captures the power in and out of the clock and the cell-to-cell power flow. We are concerned with the second term ($P_{\text{diss}}$)

$$P_{\text{diss}} = \frac{\hbar}{2}\left[\vec{\Gamma} \cdot \frac{d\vec{\lambda}}{dt}\right]$$

which represents the instantaneous power dissipated. The energy dissipated during switching can thus be calculated by integrating $P_{\text{diss}}$ over time.
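The final integration step can be illustrated with a minimal numerical sketch. It assumes that the instantaneous dissipated power is already available as samples over time (for example, exported from a simulator) and uses the trapezoidal rule; the function name and the sample data are hypothetical.

```python
def dissipated_energy(times, p_diss):
    """Integrate sampled dissipated power over time (trapezoidal rule).

    times  : increasing sample instants
    p_diss : dissipated power at each instant
    Returns the energy dissipated between times[0] and times[-1].
    """
    if len(times) != len(p_diss):
        raise ValueError("times and p_diss must have the same length")
    energy = 0.0
    for i in range(1, len(times)):
        dt = times[i] - times[i - 1]
        if dt < 0:
            raise ValueError("times must be non-decreasing")
        energy += 0.5 * (p_diss[i] + p_diss[i - 1]) * dt
    return energy


# Hypothetical triangular dissipation pulse during one switching event.
pulse_energy = dissipated_energy([0.0, 1.0, 2.0], [0.0, 2.0, 0.0])
```

For the triangular pulse above, the result equals the area of the triangle (base 2, height 2), so non-uniform sample spacing is handled as long as the samples are ordered in time.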
The comparison of the proposed QCA NOT gate with the standard QCA NOT gate proves its functionality. More specifically, Fig. 4 shows the polarization drop maps of the two gates. The polarization of the output cells is the opposite of the polarization of the input cells for both circuits, while the comparison of the darkness of the two output cells shows that the proposed QCA NOT gate is almost as stable as the standard QCA NOT gate. The polarization drop data of the two gates confirm the above. More specifically, the polarization drop data for the input and the output cells of both gates are presented in Table I. These data, as in the case of Figs. 4 and 5, are provided by the QCAPro simulation tool [34]. QCAPro makes use of the nonadiabatic model presented in [37] to estimate switching power losses in a QCA circuit. This model was derived from the quasi-adiabatic model presented in [36]. It should be noted that QCAPro provides the worst case (upper bound) estimation of polarization drop and power dissipation, namely QCAPro derives the pessimistic estimation of the polarization probability of an individual QCA cell. When this probability approaches 1 the polarization is +1, and when it approaches 0 the polarization is −1. The data presented in Table I are these probabilities.
Finally, Fig. 5 shows the power dissipation maps for the two gates. Fig. 6 illustrates two simple circuits, each of which contains only one of the aforementioned QCA NOT gates, with the same input applied to both of them. The results of both implementations of the QCA NOT gate are shown in Fig. 7. These results show that the proposed QCA NOT gate is able to invert both states with high reliability and stability.
Moreover, the functionality of the proposed NOT gate is analyzed for all possible input combinations combined with all possible values of the indifferent signal I. The results are shown in Fig. 8. These results confirm the correct functionality of the proposed NOT gate because they match the theoretical values in the NOT gate's logic table presented in Table II. For all values of signal I, the input signal InA is always inverted in the output signal OutA, which is located at the lower cell of the crossbar design. The successful inversion of input InA in all possible cases, regardless of the value of input signal I as presented in Fig. 8, corroborates the high reliability and stability of the inversion achieved by the proposed NOT gate. We would like to emphasize that the only condition for this inversion is the proper timing of the input-output related cells, shown in Fig. 2 in green, which are triggered by the same clock, while the remaining five cells (in blue) of signal I and the indifferent rightmost cells are triggered by another clock. The two clocks do not overlap, and this timing asymmetry compensates for the spatial asymmetry of our NOT gate, resulting in the inversion of both "1" and "0" with the same reliability.
IV. DESIGN METHODOLOGY AND PROGRAMMING OF CROSSBAR QCA CIRCUITS
In this section, we describe the method for designing and programming crossbar QCA circuits that implement any given Boolean function. The programming of crossing points to form the basic logic gates described in the previous section must follow certain rules that are described later on. The crossbar architecture demands, on the one hand, an appropriate location for each gate of the circuit, corresponding to a suitable connection between the gates of the circuit, and, on the other hand, proper timing, meaning that each cell should be triggered by a specific clock wave at the correct time phase. We want to develop a design methodology for programmable QCA crossbar circuits in analogy to CMOS FPGA circuits. The combination of QCA circuits with the crossbar architecture is not only a novel task but also a quite promising one, because of the resulting stability in conjunction with adaptability and regularity. As Fig. 9 shows, the input signals should be applied to one side (left), while the output signal should be extracted from the opposite side (right). Program cells are located at the top and the bottom of the crossbar and their polarization is determined by the designer during the design process. This means that the same circuit can implement a different Boolean function by changing the polarization of one of the program cells.
The following design rules should be followed.
A. Inputs/Outputs
As shown in the crossbar architecture of Fig. 9, each input signal should be applied to one side of the crossbar, while the desired output should be extracted from the opposite side. This way there will be no conflict between the inputs and outputs of the circuit, since the incoming and outgoing signals will be on different sides. In the cases treated in this paper, without loss of generality, the input cells are the ones on the left side of the crossbar while the output ones are located on the right side. For the sake of brevity, the left side of the crossbar will be called the input side, and the right one the output side.
B. Program Cells
The cells located at the top and bottom of the crossbar are the cells used to program the circuit. These cells are "anchored," i.e., their polarization is constant and is determined (by the designer) during the design process. This gives the designer the ability to program the circuit so that it can execute any Boolean function. The program lines change the executed Boolean function by changing one input of the majority gate used by the designer. Thus the same majority gate operates like an OR gate when the designer applies logic "1" as the third input from the program lines, and like an AND gate when logic "0" is applied. If there is no need for a specific value, the value of these program cells is indifferent and does not affect the function of the rest of the circuit. A characteristic example of this indifferent value is demonstrated in Fig. 2, where the proposed QCA NOT gate functions correctly regardless of the value of the cell called Indifferent. Otherwise, the desired value enters the circuit through these cells. It should be noted that the value inserted from the top or bottom of the crossbar remains constant during the circuit operation and, therefore, is not handled as a separate input. Thus, the top and bottom cells will be referred to as program lines.
C. Branches
The circuit should not have any branches, given that the crossbar architecture does not support them.
D. Basic Gates
Each circuit should contain only the basic gates already analyzed in Section III. This means that every other gate of the circuit, e.g., an XOR gate, should be constructed using the basic gates, which are considered to be the basic modules of the circuit.
E. Gate Placement
The cross-point at which each gate is located in the crossbar is of great importance. The operation of each OR and AND gate stems from the majority logic gate, which, as a short reminder, requires three input signals. In particular, the programming input signal differs between the two gates and is essential for them to operate appropriately. However, this signal cannot be considered an input. Hence, it should be inserted into the main circuit through the program line of the crossbar, as described earlier in this section. This implies that these gates can only be implemented next to the program lines, namely next to the bottom/top side. The NOT gate can be placed anywhere in the circuit, since it does not require any extra input signal.
After dealing with the aforementioned parameters that the designer should take into account, the next step of the proposed design method refers to the dimensions of the resulting QCA circuit in conjunction with crossbar architecture. These dimensions can be defined according to the following design rules.
1) The number of crossbar horizontal lines, $N_{\text{lines}}$, should be at least equal to the number of circuit inputs, $N_{\text{inputs}}$, and less than or equal to the sum of the number of circuit inputs and the number of circuit NOT gates, $N_{\text{NOTgates}}$, as follows:

$$N_{\text{inputs}} \le N_{\text{lines}} \le N_{\text{inputs}} + N_{\text{NOTgates}}.$$

This inequality expresses the facts that each input needs at least one line in order to be inserted into the circuit and that each NOT gate used demands two lines to be implemented in the crossbar.
2) The number of crossbar vertical columns, $N_{\text{columns}}$, corresponding to the program lines, is directly associated with the number of circuit stages, $N_{\text{stages}}$. The term "stage" refers to the minimum number of columns needed in order for all the gates of the circuit to be placed. In every circuit there are routes from every input to the output. Specifically, $N_{\text{stages}}$ is determined by the number of gates placed along the longest route, given that it indicates the minimum $N_{\text{columns}}$ needed so that all signals exit the circuit. It is worth mentioning that the NOT gates through which a signal passes are not taken into account, since a NOT gate can, when space is available, share a column with another gate, so that a signal is able to pass through two different gates in a single column. However, as already mentioned, there are cases where this incorporation of gates in one column is not possible, e.g., the incorporation of three AND gates in one column is not feasible, given that they demand program cells that, according to the aforementioned rule, are located at the top/bottom of the crossbar. Consequently, in worst case scenarios where no incorporation is achievable, each gate occupies one column and therefore the maximum number of columns needed equals the number of circuit gates, $N_{\text{gates}}$. In that way, $N_{\text{columns}}$ is described by the following inequality:

$$N_{\text{stages}} \le N_{\text{columns}} \le N_{\text{gates}}.$$

Fig. 10 depicts the application of these two basic design rules to a Boolean circuit. For example, let us assume a simple circuit with three inputs and four gates, two of which are NOT gates. $N_{\text{lines}}$ of the example circuit, i.e., 4, is greater than $N_{\text{inputs}}$, i.e., 3, but less than the sum of $N_{\text{inputs}}$ and $N_{\text{NOTgates}}$, i.e., 5. This follows from the fact that each NOT gate occupies two lines (please refer to Section III). However, in the case of the first (top) NOT gate there is an already available line, while in the case of the second one there is not. On the other hand, $N_{\text{columns}}$ is less than $N_{\text{gates}}$, i.e., 4, and in particular is equal to $N_{\text{stages}}$, i.e., 2, since both input signals In1 and In2 pass through one AND and one OR gate. Hence, the first stage contains the AND gate and one NOT gate, while the second NOT gate is incorporated in the same column as the OR gate, constituting the second stage.

The next step of the design procedure refers to the timing of each cell. The timing is determined by and linked to the clock function. The clock signal for each single cell is crucial, given that the timing of the signal should be accurate in order for the circuit to operate properly.
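The two dimension rules stated earlier in this section can be sketched as a small feasibility check, applied here to the Fig. 10 example (three inputs, four gates of which two are NOT gates, two stages, four lines, two columns). The function names are hypothetical, and the check only tests the bounds; it does not perform gate placement.

```python
def line_bounds(n_inputs, n_not_gates):
    """Rule 1: N_inputs <= N_lines <= N_inputs + N_NOTgates."""
    return n_inputs, n_inputs + n_not_gates


def column_bounds(n_stages, n_gates):
    """Rule 2: N_stages <= N_columns <= N_gates."""
    return n_stages, n_gates


def dimensions_ok(n_lines, n_columns, n_inputs, n_not_gates, n_stages, n_gates):
    """True if a crossbar of the given size satisfies both rules."""
    lo_l, hi_l = line_bounds(n_inputs, n_not_gates)
    lo_c, hi_c = column_bounds(n_stages, n_gates)
    return lo_l <= n_lines <= hi_l and lo_c <= n_columns <= hi_c


# Example circuit of Fig. 10 as described in the text.
fig10_ok = dimensions_ok(n_lines=4, n_columns=2,
                         n_inputs=3, n_not_gates=2,
                         n_stages=2, n_gates=4)
```

A crossbar with fewer lines than inputs, or fewer columns than stages, is rejected by the same check.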
The input cells of the input side are activated first, triggered by the first clock, namely Clock0. Then, all other cells connected to these input cells are triggered by the same clock (Clock0), creating in that way wires of cells functioning under the same clock. Thus, the input signal is able to propagate from the input cell through the main circuit.
Next, the timing of the gates follows. The timing of both OR and AND gates, which are implemented by a majority gate, is essential, and all cells of a gate must operate under the same clock. This clock should be the one immediately following the clock that operates the input cells of the gate. On the other hand, the NOT gate does not need a different clock, since its output (inverted) signal can function under the same clock as its input signal. In other words, the cells of OR and AND gates change their operating clock, but the cells of the NOT gate do not.
The design and timing of a gate is followed by the next step, in which the operating clocks of the cells are defined. As expected, when a signal passes through a few cells, it can be transferred without any alteration. However, this does not hold when the signal passes through more than 6-7 cells in a row. So, in order to achieve better circuit reliability, the designer has to keep the number of cells in the same clock zone under this limit. In order for a gate to operate properly, all its input cells should be triggered by the same clock. This means that in cases where the signals arrive at a gate unsynchronized, the gate cannot generate the expected output. Thus, the wires of cells leading to the gate inputs should change their operating clock earlier, namely before they reach the gate, which may result in a change of clock before the given limit of 6-7 cells.
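The gate-timing and wire-length rules above can be sketched as follows. This is an illustrative model only: clocks are represented by the integers 0-3 (Clock0 to Clock3), the limit of seven cells per clock zone is taken from the upper end of the 6-7 cell range mentioned in the text, and the function names are hypothetical.

```python
def gate_clock(input_clock):
    """AND/OR (majority) gate cells use the clock following their inputs."""
    return (input_clock + 1) % 4


def not_gate_clock(input_clock):
    """NOT gate cells keep the clock of their input signal."""
    return input_clock


def wire_clocks(n_cells, start_clock, max_run=7):
    """Assign a clock to each cell of a wire, advancing every max_run cells.

    Assumption: max_run=7 reflects the 6-7 cell limit stated in the text.
    """
    if max_run < 1:
        raise ValueError("max_run must be at least 1")
    return [(start_clock + i // max_run) % 4 for i in range(n_cells)]


# A ten-cell wire leaving a gate clocked by Clock1.
example_wire = wire_clocks(10, start_clock=gate_clock(0))
```

With these assumptions, the example wire runs under Clock1 for seven cells and then switches to Clock2. A designer may need to advance the clock earlier than `max_run` on some wires so that all inputs of a gate arrive under the same clock.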
Apart from the aforementioned wire length, another important factor in defining the operating clocks is the interaction of adjacent cells. In particular, an indifferent or no longer needed signal of a cell should not influence its adjacent cells. This is feasible through the proper timing of that cell, i.e., when the operating clock of that particular cell is set to be a following clock, or to be a clock that is unrelated to the clocking of the adjacent cells.
Fig. 11. Timing of cells and gates. Clock0 is represented by green color, Clock1 by purple, Clock2 by blue, and Clock3 by white color, respectively. An AND gate is depicted and its result drives another part of the circuit, while it is also an output of the circuit.
If, for example, a desired signal is propagated under Clock1 and meets an indifferent signal at a cross-point, then the clock of the indifferent signal should be set to Clock3, given that these two clocks do not overlap.
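The choice of a non-overlapping clock can be sketched with an idealized four-phase model. The sketch assumes that clocks are numbered 0-3, that each clock lags the previous one by a quarter period, and that the clock two positions away is the one used to isolate an indifferent signal; `phase` and `isolating_clock` are hypothetical names.

```python
PHASES = ("Switch", "Hold", "Release", "Relax")


def phase(clock, step):
    """Phase of the zone driven by `clock` (0-3) at quarter-period `step`."""
    return PHASES[(step - clock) % 4]


def isolating_clock(signal_clock):
    """Clock assigned to an indifferent signal crossing a desired one.

    Assumption: the clock two positions away is chosen, e.g. Clock3 for a
    desired signal under Clock1.
    """
    return (signal_clock + 2) % 4


if __name__ == "__main__":
    desired = 1
    indifferent = isolating_clock(desired)
    for t in range(4):
        print(t, phase(desired, t), phase(indifferent, t))
```

In this model, whenever the cells of the desired signal are in the Switch phase, the cells of the indifferent signal are in the Release phase, which agrees with the timing argument given for the proposed NOT gate in Section III.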
The circuit produced by the proposed methodology is presented in Fig. 10(b). For visual comparison, the same circuit without the extra unit cells of the crossbar, i.e., consisting only of the smallest number of QCA cells needed for the circuit's functionality, is depicted in Fig. 10(a). Fig. 11 illustrates a characteristic example of the timing process described earlier. The input cells of the AND gate are all triggered at the same time by the first clock, Clock0, while the gate operates under the next clock, namely Clock1. The wire that carries the desired signal operates under Clock1 for seven cells and then its clock is set to Clock2. The first program line from the right in Fig. 11 carries a signal that is indifferent for the rest of the circuit, and thus its cells function under Clock3, which is the second clock in line after Clock1.
It should be noted that the designer should repeat the above steps that refer to the timing of the cells and the gates as many times as needed in order to accomplish the desired design.
In conclusion, the proposed design method of crossbar QCA circuits introduced in this paper, as depicted in diagram form in Fig. 12 , can be summed up as follows.
1) Provide a universal set of programmable QCA Boolean logic gates.
2) Implement the corresponding circuit without branches with the help of the previous set.

Circuits with branches should also be examined. Owing to the universality of the provided Boolean set, a circuit with branches can be redesigned as one without branches. This is accomplished as described in detail in [38], where a transformation of many cases using the majority logic gate is presented. Moreover, the proposed design method is based on a crossbar architecture where all the input signals enter the crossbar from the input side. However, given this structure, the design of a circuit in which more than two AND or OR gates have to be evaluated simultaneously at the same stage is not feasible. In more detail, at the input side it is possible to have as many NOT gates as desired but only two AND/OR gates, since these gates require a constant input from the program line. This limitation is lifted if not only the inputs but also the constant values needed for the operation of the AND/OR gates are inserted from the input side of the crossbar.
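The role of the constant input can be illustrated at the logic level. The sketch below assumes the usual majority-gate behavior, where fixing one of the three inputs to 0 yields AND and fixing it to 1 yields OR; the name `programmable_gate` is hypothetical and the cell-level layout of the gates in the paper is not modeled.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    """Three-input majority vote on bits."""
    return (a & b) | (a & c) | (b & c)


def programmable_gate(a: int, b: int, program: int) -> int:
    """Logic-level model of a programmable gate: the constant `program`
    bit selects AND (0) or OR (1)."""
    return majority(a, b, program)


for a, b in product((0, 1), repeat=2):
    print(a, b, programmable_gate(a, b, 0), programmable_gate(a, b, 1))
```

Whether the constant arrives from a program line or from the input side does not change this logic; it only changes how many such gates can be placed in one stage.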
V. CIRCUIT SIMULATIONS
To evaluate the efficiency of the proposed method, several design simulations were conducted. In particular, eight different Boolean functions of various complexity, which were successfully implemented as crossbar QCA circuits following the above method, are presented in this section. The first function was selected to be simple so that the proposed method can be applied and described in detail. The next circuit is the 4-to-1 multiplexer. Then, to show that the proposed method can be applied to more complex circuits, two ISCAS benchmark circuits were selected (benchmark c17 from the ISCAS'85 collection and benchmark s27 from the ISCAS'89 collection) to be designed and simulated according to the proposed approach. Finally, the remaining two designs are the half adder and the full adder, which were selected to indicate the efficiency of the method introduced in this paper and to compare the circuits designed using this method with other QCA implementations found in the literature. All circuits were designed and simulated using QCADesigner, a design and simulation tool [31].
The first Boolean function has three input signals and one output signal, as shown in Fig. 13. Thus, according to (7), $N_{\text{inputs}} \le N_{\text{lines}} \le N_{\text{inputs}} + N_{\text{NOTgate}}$ and so $3 \le N_{\text{lines}} \le 4$. The number of columns is determined from (8) as follows: $N_{\text{stages}} \le N_{\text{columns}} \le N_{\text{gates}}$ and so $3 \le N_{\text{columns}} \le 4$. Fig. 14 illustrates the corresponding QCA circuit. It can be easily observed that $N_{\text{lines}} = 3$ and $N_{\text{columns}} = 3$. The numbers of lines and columns are the minimum ones according to (7) and (8), respectively, since the inverted signal of the NOT gate can be transferred via the second, no-longer-used line of the crossbar architecture and the gate itself can be incorporated in the same stage/column as the OR gate.
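The bounds in (7) and (8) are simple enough to evaluate mechanically. The following sketch assumes the gate counts quoted for the first function (three inputs, one NOT gate, three stages, four gates); the function name and argument names are illustrative only.

```python
def crossbar_bounds(n_inputs: int, n_not_gates: int,
                    n_stages: int, n_gates: int):
    """Return ((min_lines, max_lines), (min_columns, max_columns))
    following inequalities (7) and (8)."""
    lines = (n_inputs, n_inputs + n_not_gates)
    columns = (n_stages, n_gates)
    return lines, columns


# First Boolean function: two OR gates, one AND gate, one NOT gate.
lines, columns = crossbar_bounds(n_inputs=3, n_not_gates=1,
                                 n_stages=3, n_gates=4)
print(lines, columns)  # (3, 4) (3, 4)
```

The circuit of Fig. 14 sits at the lower end of both ranges, with three lines and three columns.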
Regarding the timing of the circuit, the input cells at the input side are triggered by the first clock, i.e., Clock0, which is distinguished by its green color. The first OR gate from the left functions under Clock1, which is the clock following the one that triggers its input cells. The output signal of the first OR gate is fed to both a NOT gate and an AND gate. This output signal is triggered by Clock1. Thus, as far as the AND gate is concerned, all its input signals should function under the same clock, namely Clock1, and therefore the gate itself is triggered by the next clock, i.e., Clock2. Clock1 and Clock2 are shown in purple and blue-turquoise, respectively. Finally, the second OR gate functions under Clock3, which is represented by white. This clock is also the next in line after the one under which its input signals function. This automatically entails that the output signal of the NOT gate changes its clock to Clock2. The clocking of the cells that are considered indifferent follows the rules described earlier, meaning that the operating clock of these cells is set to a following one or to a clock that does not interfere with the clock of their adjacent useful cells. The input (In1, In2, and In3) and output (Out) signals of the QCA circuit illustrated in Fig. 14(a), as well as the clock under which Out operates, are shown in Fig. 14(b). It should be noted that although the three inputs and the output are triggered by the same clock, i.e., Clock0, the output is delayed by four clock phases with respect to the inputs. The obtained results are in accordance with the truth table of the logic circuit, which supports the effectiveness of the proposed methodology.
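The timing just described can be traced along the longest path of the circuit. The sketch below is only a bookkeeping aid, assuming each element on the path runs one clock after the previous one; the zone names are labels chosen here for illustration.

```python
NUM_CLOCKS = 4


def clock_schedule(zones, input_clock=0):
    """Assign consecutive clocks to the elements along one signal path."""
    return {zone: (input_clock + i) % NUM_CLOCKS
            for i, zone in enumerate(zones)}


# Longest path of the first circuit: inputs -> OR -> AND -> OR -> Out.
path = ["inputs", "first OR", "AND", "second OR", "Out"]
schedule = clock_schedule(path)
delay_phases = len(path) - 1

print(schedule)
print(delay_phases, "clock phases")  # 4 clock phases
```

Inputs and output both land on Clock0, yet the output lags the inputs by four phases, i.e., one full clock cycle.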
The next digital circuit designed and simulated according to the proposed methodology is the 4-to-1 multiplexer. Its design requires a more generalized form of the proposed method. As already mentioned in the previous section, in order for a circuit that contains more than two AND gates or more than two OR gates at the same stage to be designed in the crossbar architecture, the design rule regarding the program cells must be slightly modified: in this case the constant values needed for the operation of the AND or OR gates may be inserted into the circuit from the input side and not only from the program lines. Based on this relaxation of the design rules, any complex digital circuit can be designed in a crossbar by simply following the proposed design steps. The logical QCA implementation of the 4-to-1 multiplexer is shown in Fig. 15. The simulation results of this multiplexer are depicted in Fig. 22(a).

The next application of the proposed methodology comprises the design and simulation of two ISCAS benchmark circuits [39]: 1) the combinational ISCAS'85 c17 and 2) the sequential ISCAS'89 s27 circuits [40]. Both circuits contain branches and therefore their design presupposes, according to the proposed method, the transformation of each logic circuit into one without any branches; an efficient way to achieve this is simply the multiple insertion of one or more input signals into the circuit. As in the case of the 4-to-1 multiplexer QCA design, both circuits require the implementation of more than one AND or OR gate in a single stage. Hence, following the same logic as before, in both QCA implementations of the ISCAS circuits (shown in Figs. 16 and 17) an appropriate number of constant values is inserted into the circuit from the input side.
The QCADesigner simulation results, shown in Fig. 22(b) and (c), respectively, indicate the applicability of the proposed method to large and complex circuits and show stable operation in both cases.
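The branch-removal idea can be checked at the logic level on c17. The sketch below uses the standard six-NAND netlist of c17 and models each NAND as an AND followed by a NOT; the branch-free form, obtained by re-inserting inputs wherever a signal would otherwise fan out, is one possible transformation and is not necessarily the one drawn in Fig. 16.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    """NAND decomposed into the basic set: AND followed by NOT."""
    return 1 - (a & b)


def c17(n1, n2, n3, n6, n7):
    """ISCAS'85 c17 with shared internal nodes (branches)."""
    n10 = nand(n1, n3)
    n11 = nand(n3, n6)   # fans out to n16 and n19
    n16 = nand(n2, n11)  # fans out to n22 and n23
    n19 = nand(n11, n7)
    return nand(n10, n16), nand(n16, n19)


def c17_branch_free(n1, n2, n3, n6, n7):
    """Same function as a tree: every reuse becomes a fresh insertion."""
    n22 = nand(nand(n1, n3), nand(n2, nand(n3, n6)))
    n23 = nand(nand(n2, nand(n3, n6)), nand(nand(n3, n6), n7))
    return n22, n23


equivalent = all(c17(*v) == c17_branch_free(*v)
                 for v in product((0, 1), repeat=5))
print(equivalent)  # True
```

The tree form trades extra input insertions and duplicated gates for the absence of fan-out.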
The proposed methodology is next applied to the design of a half adder. The half-adder circuit consists of an AND gate and an XOR gate. In the method introduced in this paper, each circuit under construction should contain only the basic gates of the programmable Boolean set, namely AND, OR, and NOT. Therefore, as described in the previous section, all other gates are decomposed into these basic gates without loss of generality so that the method is applicable. Thus, we use an implementation of the half adder based on the provided Boolean set, as shown in Fig. 18. Fig. 19 illustrates the QCA implementation of the half-adder circuit. Following the same procedure as before, the QCA circuit consists of $N_{\text{lines}} = 3$ and $N_{\text{columns}} = 2$, both corresponding, according to the proposed method, to the minimum available numbers of lines and columns. As far as the timing of the useful and the indifferent cells of the QCA circuit is concerned, the construction procedure does not differ from the one already presented for the previous designs.
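As a logic-level illustration of this decomposition, the sketch below rewrites the XOR of the half adder using only AND, OR, and NOT. The particular decomposition chosen here is an assumption; the exact gate arrangement of Fig. 18 is not reproduced in the text and may differ.

```python
from itertools import product


def half_adder(a: int, b: int):
    """Half adder using only AND, OR, and NOT (one possible decomposition)."""
    carry = a & b
    not_carry = 1 - carry            # NOT gate
    total = (a | b) & not_carry      # XOR as (a OR b) AND NOT(a AND b)
    return total, carry


for a, b in product((0, 1), repeat=2):
    print(a, b, half_adder(a, b))
```

This form needs a single NOT gate, whereas the sum-of-products form of XOR would need two.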
The logic circuit of a full adder using majority gates is shown in Fig. 20. We chose this logic circuit in order to demonstrate the flexibility of the method. In more detail, the circuit contains three majority gates and two inverters, and all three input signals of the 1-bit full adder are inserted into the circuit twice, as presented.
The resulting QCA circuit using the crossbar architecture is depicted in Fig. 21. The designed crossbar has six lines and only two columns. At this point, it should be mentioned that the input side of the circuit contains all three input signals twice, since there is no need to use more lines for the operation of the NOT gates. The number of columns of the QCA circuit is equal to the number of stages of the circuit: the first column/stage contains two majority gates and one NOT gate, whereas the second one contains the third majority gate and the other NOT gate. The clocking strategy followed to trigger each cell is the one described in this and the previous sections. The results shown in Fig. 22(d) demonstrate the proper and stable operation of the proposed 1-bit QCA full-adder design.
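A full adder with three majority gates and two inverters can be verified at the logic level as follows. The sketch assumes the widely used formulation in which the carry is the majority of the three inputs and the sum is a majority of the inverted carry, the carry-in, and a second majority term; the text does not spell out the equations of Fig. 20, so this is an assumed instance that matches the stated gate counts and the two-stage grouping.

```python
from itertools import product


def majority(a: int, b: int, c: int) -> int:
    return (a & b) | (a & c) | (b & c)


def full_adder(a: int, b: int, cin: int):
    # Stage 1: two majority gates and one NOT gate.
    cout = majority(a, b, cin)
    m2 = majority(a, b, 1 - cin)
    # Stage 2: the second NOT gate and the third majority gate.
    total = majority(1 - cout, cin, m2)
    return total, cout


ok = all(s + 2 * c == a + b + cin
         for a, b, cin in product((0, 1), repeat=3)
         for s, c in [full_adder(a, b, cin)])
print(ok)  # True
```

In this form A and B each feed two majority gates and Cin is used three times, which is why the inputs are inserted more than once in a branch-free crossbar.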
Using QCADesigner, the complexity, delay, and area consumption of QCA circuits can easily be obtained [31]. Table III lists the corresponding implementation details (approximated area, number of cells, and clock phases) of all eight QCA circuits designed using the proposed methodology. The approximated area is computed using the default QCA cell size, so the area occupied by a cell is 18 × 18 nm², and the space between two QCA cells is 2 nm. For example, in the case of the 1-bit full adder, the resulting QCA crossbar is implemented using 92 QCA cells, takes only three clock phases (0.75 clock cycles) to produce the desired output signals, and occupies approximately 0.087 μm². Table III also shows that the circuits produced by the proposed methodology are comparable in size to other implementations found in the literature. Therefore, the increase in area cost of our methodology is not significant. Furthermore, the proposed QCA implementation of the 1-bit full adder is compared with other QCA 1-bit full adders found in [9], [12], [13], and [42]-[47]. The effectiveness and superiority of the proposed QCA 1-bit full-adder implementation in terms of power delay, area, and number of cells are shown in Table IV. Note that the designs in [12], [13], and [42]-[44] rely on multilayer crossovers. There are still serious questions about how such multilayer crossover designs can be realized in practice, since they require two overlapping active layers with via connections [45].
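The delay and area figures quoted above follow from simple arithmetic. The sketch below assumes four clock phases per cycle and approximates the area as the bounding box of a rectangular grid of 18 nm cells separated by 2 nm gaps; the grid dimensions in the example are hypothetical, since the text does not give the dimensions of any circuit, and the helper names are not taken from QCADesigner.

```python
CELL_NM = 18   # default cell side length
GAP_NM = 2     # spacing between neighboring cells
PHASES_PER_CYCLE = 4


def phases_to_cycles(phases: int) -> float:
    """Convert a delay in clock phases to clock cycles."""
    return phases / PHASES_PER_CYCLE


def bounding_box_area_nm2(cells_wide: int, cells_high: int) -> int:
    """Area of the rectangle enclosing a cells_wide x cells_high grid."""
    width = cells_wide * CELL_NM + (cells_wide - 1) * GAP_NM
    height = cells_high * CELL_NM + (cells_high - 1) * GAP_NM
    return width * height


print(phases_to_cycles(3))                    # 0.75
# Hypothetical 10 x 5 grid, for illustration only:
print(bounding_box_area_nm2(10, 5) / 1e6, "um^2")
```

Because the bounding box also covers empty grid positions, the approximated area of a circuit is larger than its cell count multiplied by the area of one cell.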
VI. CONCLUSION
Although QCA is a nanoelectronic technology recognized as one of the top emerging technologies, its design issues have not been adequately addressed and therefore there is no consistent or standard framework for designing QCA circuits. Thus, in this paper, a novel design methodology for implementing Boolean logic functions using programmable crossbar QCA circuits is presented. To the best of our knowledge, a similar QCA design methodology does not exist in the literature. As an additional contribution, a novel implementation of the QCA NOT gate is introduced, facilitating the combination of QCA nanotechnology with the crossbar architecture. This means that the programmable AND and OR QCA gates, in conjunction with the proposed QCA NOT gate, can be implemented at a cross-point of the crossbar and hence form a universal Boolean set suitable for designing any Boolean logic QCA circuit and implementing general computation.
In particular, we provide the QCA circuit designer with a unified methodology based on specific rules, organized in design steps associated with: 1) the proper dimensions of the crossbar QCA circuit; 2) the appropriate timing of cells; and 3) the exact programming and activation of the gates. The timing and programming issues of QCA cells and gates are handled in every case in a cascadable, easy-to-follow way. Following the proposed strategy, the QCA design of any Boolean logic circuit in a crossbar can be achieved.
The proposed approach was applied to several Boolean logic circuits in order to evaluate its effectiveness; in this paper, six of them were presented: one simple circuit, the 4-to-1 multiplexer, the ISCAS'85 c17 and ISCAS'89 s27 benchmark circuits, the QCA half adder and, finally, the 1-bit QCA full adder. The corresponding QCA crossbar circuits were constructed according to the proposed methodology and were designed and simulated using the standard QCA computer-aided tool QCADesigner [31]. The obtained results reveal the effectiveness, stability, and reliability of our method. Even though the QCA circuits produced by the proposed methodology are larger than other implementations, their size is comparable, so the area cost of the methodology is not prohibitive. The circuits remain compact because the proposed NOT gate is smaller and the crossbar makes efficient use of the area. In cases where only a few cells are indifferent, as in the full adder, the circuits may be even smaller than others published in the literature.
In conclusion, the proposed methodology produces size- and delay-efficient circuits. These aspects of the proposed methodology, along with its normalization, generality, and programmability, are expected to make it very attractive to designers.
