Abstract-In the post-CMOS scenario, NanoMagnets Logic (NML) has attracted considerable attention due to its characteristic features. The ability to combine logic and memory in the same device, together with a potentially low power consumption, allows NML to overcome some of the intrinsic limitations of CMOS. However, when realistic circuit implementations are considered, taking both theoretical and technological constraints into account, performance falls short of expectations. The reason is that a huge area is wasted on interconnection wires.
I. INTRODUCTION
NanoMagnets Logic (NML) is one of the two main implementations of the more general Quantum-dot Cellular Automata (QCA) [1] principle. It uses single-domain rectangular nanomagnets to represent the digital values '0' and '1' [2] (Figure 1.A). The other main implementation is Molecular QCA [3] [4], where the base cell is a complex molecule. Circuits are built by placing and arranging magnets on a plane (Figure 1.B) in specific orders. Information propagates through the circuit thanks to the magnetostatic interaction among neighboring elements [2].
To switch magnets from one state to the other, a RESET mechanism must be used: magnets are forced into an unstable state by applying an external magnetic field, and when the field is removed they align themselves following the input element (Figure 1.B). This mechanism is called clock [5]. To avoid errors due to thermal noise during switching, only a limited number of magnets can be cascaded [6]. To overcome this limitation, a multiphase clock system is applied. For example, as shown in [7], circuits are divided into small areas called clock zones, each made of a limited number of magnets (typically 5 or 6). Each clock zone is driven by one of three clock signals (Figure 1.C). Thanks to this mechanism, at every time step, while the magnets within a clock zone are switching, the magnets in the neighboring clock zones are either in a stable state, acting as inputs, or in the RESET state, with no influence on signal propagation. This ensures correct propagation of information at room temperature. However, it also gives the circuit a characteristic pipelined behavior: for every group of three consecutive clock zones, signals acquire a delay of 1 clock cycle. As a consequence, special solutions must be adopted to synchronize signals in complex circuits [8].
NML circuits have very interesting features, such as the possibility to mix logic and memory in the same device and an expected very low power consumption [9]. Unfortunately, while NML is very efficient for very simple circuits [10] [11], this is not true for complex and realistic layouts. We have conducted extensive investigations of complex NML circuits, such as microprocessors [12] [8], decoders for wireless communication [13] and systolic arrays for biosequence analysis [14] [15]. In all these designs, circuit efficiency was severely reduced by the huge area wasted on interconnection wires.
More than 99% of the circuit area is due to magnetic interconnections. The reason is twofold: first, NML technology does not currently allow multilayer structures; second, technological and theoretical limitations severely constrain the placement of magnets.
It is important to mention that, in NML technology, more area means higher circuit latency and power consumption. To solve this problem and enhance circuit performance, in this work we propose a new kind of magnetic logic, called Domain Magnet Logic (DML). In this logic we use NML for logic computation and horizontal interconnections, while domain walls [16] are used as vertical interconnections. DML is compatible with the technological constraints related to the fabrication of clock wires, it allows a great reduction in circuit area and power consumption, and higher clock frequencies are also expected.
Proceedings of the 14th IEEE International Conference on Nanotechnology Toronto, Canada, August 18-21, 2014
II. BACKGROUND
The clock in NML technology has important consequences on circuit layout and performance. Magnets must be forced into an intermediate state by external means. This can be a magnetic field [2], the STT coupling of a current flowing through the magnets [17], or the mechanical deformation of magnets induced by a piezoelectric material [9]. The only one experimentally demonstrated so far is the magnetic field clock [18]. The magnetic field is normally generated by a pulsed on-chip current flowing through a copper wire buried under the magnet plane. The wire is normally surrounded by a ferrite yoke to confine the magnetic flux lines.
As stated in Section I, correct information propagation requires a multiphase clock system. With a three-phase clock, three distinct clock signals must be generated and applied to the circuit, so multiple clock wires are required. Figure 2.A shows the circuit layout for a three-phase clock. Clock zones are made of parallel stripes, which correspond to the clock wires buried under the magnets. According to [6], at most 5 magnets can be cascaded in a clock zone; this is the maximum number that ensures correct signal propagation in the presence of noise. This clock zone layout was chosen because it has several advantages: it is compatible with up-to-date fabrication processes [2] and it automatically solves all the issues related to signal synchronization [19]. However, the technological constraints on the clock wires and the theoretical constraints on the number of magnets per clock zone have a serious impact on circuit layout. In particular, when signals need to propagate in the vertical direction, wires assume a characteristic stair-like shape (Figure 2.A). When complex circuits are designed, vertical interconnections drastically increase circuit area, as demonstrated in [13]. Improved fabrication processes, allowing more freedom in the clock zone layout, will help reduce the wasted area, but more radical solutions are required to obtain effective results.
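The clocking rules of Sections I and II (three phases, at most 5 magnets per clock zone, one clock cycle of delay every three zones) can be captured in a small timing model. The sketch below is an idealized abstraction, not a magnetic simulation; the names `Phase`, `zone_phase`, `clock_zones_for_chain` and `chain_latency_cycles` are hypothetical, and the mapping phase = (step - zone) mod 3 is our assumption about how the three clock signals are offset.

```python
import math
from enum import Enum

MAX_MAGNETS_PER_ZONE = 5  # limit reported in [6] for correct propagation with noise
PHASES = 3                # three-phase clock: three time steps per clock cycle


class Phase(Enum):
    SWITCH = 0  # field removed: magnets follow the previous (holding) zone
    HOLD = 1    # stable state: acts as input for the next zone
    RESET = 2   # field applied: no influence on its neighbors


def zone_phase(zone: int, step: int) -> Phase:
    """Phase of clock zone `zone` at time step `step` (assumed offset of one step per zone)."""
    return Phase((step - zone) % PHASES)


def clock_zones_for_chain(n_magnets: int, max_per_zone: int = MAX_MAGNETS_PER_ZONE) -> int:
    """Minimum number of clock zones needed to host a chain of n_magnets magnets."""
    if n_magnets <= 0:
        raise ValueError("n_magnets must be positive")
    return math.ceil(n_magnets / max_per_zone)


def chain_latency_cycles(n_magnets: int) -> float:
    """A signal crosses one zone per time step, so every three zones add one clock cycle."""
    return clock_zones_for_chain(n_magnets) / PHASES


if __name__ == "__main__":
    for step in range(PHASES):
        print(step, [zone_phase(z, step).name for z in range(6)])
    # A 20-magnet wire needs 4 zones, i.e. 4/3 clock cycles of latency.
    print(clock_zones_for_chain(20), chain_latency_cycles(20))
```

In this model every extra magnet that a stair-like vertical wire requires translates into extra clock zones, and therefore into extra latency and clock-wire area, which is the overhead the rest of the paper targets.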
To reduce the area wasted on interconnections, we started to explore other magnetic technologies. From the standpoint of magnetism theory, domain walls are particularly promising structures for integration in NML circuits. A domain wall is a mobile interface that separates different magnetic domains, namely regions with uniform magnetization (Figure 2.B). Within a domain, the magnetic moments of the atoms have the same orientation. A domain wall is therefore a transition region between zones whose atoms have different magnetic moments, across which the magnetic moments gradually rotate (Figure 2). Domain wall logic was proposed by Russell P. Cowburn and co-workers [16]. Its wires normally consist of a long stripe of magnetic material uniformly magnetized in one direction. When one of the stripe tips is forced into the opposite state, a domain wall is created. The domain wall then starts to propagate until it reaches the far end of the line, thereby propagating the information. Using domain walls, it is possible to design all kinds of logic gates, such as NOT and AND, as well as fan-out and cross-over junctions. With these logic gates, complex circuits can also be designed [20].
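The propagation mechanism just described can be illustrated with a deliberately simplified one-dimensional model: the stripe is split into discrete cells, each magnetized either +1 or -1, and the wall advances one cell per step. This is a toy sketch for intuition only, with no micromagnetic physics (wall width, velocity and Walker breakdown are all ignored); the function name `domain_wall_snapshots` is hypothetical.

```python
from typing import List


def domain_wall_snapshots(n_cells: int) -> List[List[int]]:
    """Toy model of a magnetic stripe: returns the line state after each step.

    The stripe starts uniformly magnetized (+1). Forcing the first tip into the
    opposite state (-1) creates a domain wall, which then sweeps one cell per
    step until the far end is reached and the whole line reads -1.
    """
    if n_cells < 1:
        raise ValueError("n_cells must be at least 1")
    line = [+1] * n_cells
    line[0] = -1                    # tip forced into the opposite state
    snapshots = [line.copy()]
    for wall in range(1, n_cells):  # `wall` = first cell still in the old state
        line[wall] = -1             # the wall moves past this cell
        snapshots.append(line.copy())
    return snapshots


if __name__ == "__main__":
    for state in domain_wall_snapshots(6):
        print("".join("<" if m < 0 else ">" for m in state))
```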
III. DOMAIN MAGNET LOGIC
From our architectural analysis [19], it is clear that logic gates in NML technology are very compact, while interconnections lead to a huge wasted area. Domain walls, on the contrary, can be used to build logic circuits, but their logic gates are not as compact as NML gates. On the other hand, domain walls appear to be very efficient as interconnections, since they are essentially long magnetic wires. As part of a continuous effort to enhance magnetic circuits, we propose to merge NML and domain walls. The main idea is to exploit each technology for the function it is best suited for: NML for logic computation and domain walls for interconnections and signal propagation. We have therefore created a new kind of magnetic technology that we have named Domain Magnet Logic. The DML basic structure is shown in Figure 3: one nanomagnet is used as input for the line and another nanomagnet is used to read the line state, while the line, in which the domain wall propagates, acts as the magnetic interconnection.
The proposed structure (Figure 3) is composed of two 60x90x20 nm³ nanomagnets and a line with the same thickness and width as the magnets but variable length. In Figure 3 the line is 1 µm long, but it can be shorter or longer. The magnets are made of Cobalt-Iron, while the line is made of Permalloy. This choice is motivated by the fact that Cobalt-Iron magnets require a lower magnetic field to be forced into the RESET state. However, the structure also works when the same material is used for both line and magnets. The structure was simulated and validated with a finite element simulator, NMAG [21]. Figure 3 highlights the main simulation phases. Starting from an initial state with all the elements magnetized in the same direction (Figure 3.A), the line and the output magnet are forced into an unstable (RESET) state by an external magnetic field, as normally happens in NML circuits. This magnetic field is perpendicular to the longer side of the line. Then the input magnet is switched to the opposite direction and the magnetic field is removed from the line (Figure 3.B). Two domain walls are created near the input and the output regions of the line, and they then propagate initially toward the beginning of the line (Figure 3.C-G).
Figure 3: Simulation of a DML structure. A) Starting from a generic state, for example with magnets and line magnetized in the same direction, B) the input magnet is switched to the opposite state, while a magnetic field perpendicular to the long side of the magnets is applied to the line and to the output magnet. When this magnetic field is removed, a domain wall (two in this case) is created. C)-G) Domain walls propagate through the line until it is uniformly magnetized. H) The magnetic field is removed from the output magnet, which switches correctly to the new state.
The layout of DML circuits is similar to classic NML, where clock zones are parallel stripes of aligned magnets, but long vertical interconnections are replaced by domain walls (Figure 4.A). Each clock zone is then associated with one of three clock signals (Figure 4.B), as in standard NML technology. There is, however, a small difference in the shape of the clock signals, as can be seen by comparing Figure 1.C and Figure 4.B. DML circuits require the magnetic line to be clocked independently of the input and output magnets. This can be obtained by placing the input and output magnets in clock zones different from that of the line, in which case the clock waveform is identical to that of Figure 1.C. We have, however, chosen a different solution, because it ensures higher flexibility in circuit design. The input magnet is indeed placed in a different clock zone, but the output magnets are placed in the same clock zone as the magnetic line. This is possible because the magnetic field required to reset the line is twice the field required to reset the magnets. Applying a clock signal with two different amplitudes (Figure 4.B) therefore allows both the line and the output magnets to be placed in the same clock zone.
When the full-amplitude magnetic field is applied, both the line and the output magnets are forced into the RESET state. Following the clock waveform, the field is then applied at half amplitude. As a consequence, the magnets remain in the RESET state, but a domain wall is generated inside the line, because the field is no longer strong enough to keep the line in the RESET state. By the time the field is reduced to zero, the domain wall has propagated through the line, so the output magnet switches correctly. This solution creates two virtual clock zones within one, emulating the behavior shown in Figure 3. The need for a higher magnetic field might appear to be a disadvantage, because it leads to higher power consumption. However, as the performance analysis in Section IV shows, this is not the case.
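The two-amplitude waveform for the clock zone shared by the line and its output magnet can be sketched as a simple threshold model, in which an element stays in RESET while the applied field is at least its reset threshold. The threshold values used below (45 kA/m for the magnets, 95 kA/m for a 60 nm wide line) come from the characterization in Section IV-B; the idealized step behavior, the names `Thresholds` and `zone_state`, and the exact waveform levels are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    h_magnet: float  # kA/m: field that keeps the nanomagnets in RESET
    h_line: float    # kA/m: field that keeps the domain-wall line in RESET


def zone_state(h_applied: float, th: Thresholds) -> str:
    """State of the clock zone shared by the line and its output magnet."""
    line_reset = h_applied >= th.h_line
    magnet_reset = h_applied >= th.h_magnet
    if line_reset and magnet_reset:
        return "RESET: line and output magnet held"
    if magnet_reset:
        return "LINE RELEASED: domain wall forms and propagates, output magnet held"
    if line_reset:
        return "INVALID: the line must need a stronger field than the magnets"
    return "OUTPUT RELEASED: output magnet switches following the line"


if __name__ == "__main__":
    th = Thresholds(h_magnet=45.0, h_line=95.0)
    # Hypothetical waveform: full amplitude, half amplitude, zero.
    for h in (95.0, 47.5, 0.0):
        print(f"{h:5.1f} kA/m -> {zone_state(h, th)}")
```

In this model the intermediate level only has to lie between the two thresholds; "half amplitude" works here because the line threshold is about twice the magnet threshold.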
IV. PERFORMANCE
To validate this new kind of magnetic logic, we performed a full characterization in terms of area, power requirements and signal propagation speed.
A. Area
To demonstrate the impact of DML on circuit area, we designed and analyzed two circuits: a simple full adder and a 32-bit adder similar to the one implemented in the Pentium 4. Figure 5.A shows an example of a full adder implemented in pure NML logic. This layout was created following both theoretical [6] and technological [2] constraints. It is composed of 11 gates (7 AND and 4 OR [22]) and 7 cross-wires [2]. Moreover, 4 NOT functions are required, but they can be obtained by adding 1 nanomagnet in the corresponding clock zones. Figure 5.B instead shows a full adder implemented in DML, with the same logic functions as before but with the area reduced by 20%. The area gain is relatively small because the full adder is a simple and quite compact circuit, where the interconnection overhead is limited. For a much more complex circuit, the area gain increases substantially. As an example, we considered the 32-bit Sparse Tree Adder presented in [23], whose pure NML implementation is shown in Figure 6.A. This adder is similar in structure to the adder used in the Pentium 4, and it is one of the most complex NML circuits ever presented in the literature. It is based on two substructures: a carry generator network and an adder block composed of 8 4-bit ripple carry adders. This circuit was selected because in CMOS it is one of the most effective adders. Figure 6.B shows the same adder implemented in DML. Interestingly, the area gain increases with circuit complexity, because the interconnection overhead grows as well. Comparing the full 32-bit adder based on NML with the same adder based on DML, the area gain is around 50%, which is a remarkable result. It is important to underline that, in NML, lower area means lower latency: a 50% area reduction therefore corresponds, to a first approximation, to a 50% latency reduction.
B. Power
Power consumption in NML circuits depends on both the circuit area and the magnetic field intensity. An increase in circuit area causes a proportional increase in power consumption. An increase in the magnetic field increases power consumption quadratically, because clock losses depend on the square of the current used to generate the field, and the field grows linearly with that current. We analyzed how the required magnetic field changes with the line length and width; results are reported in Figure 7. Three working regions can be identified. If the applied magnetic field is lower than Hmin, the domain wall is not created and the circuit therefore does not work. If the magnetic field lies between Hmin and Hmax, the domain wall is created and the final line magnetization is equal to the magnetization of the input magnet. This range is quite narrow. If the applied field is larger than Hmax, the domain wall is still created, but the final line magnetization is opposite to the magnetization of the input magnet. Since a complex circuit contains many lines of different lengths, Figure 7.A shows how the magnetic field varies with the line length, keeping the line width constant at 60 nm. Both Hmin and Hmax are nearly constant over the entire range from 300 nm to 1200 nm; only Hmin decreases slightly with the length. For lengths below 300 nm no domain wall is created, so 300 nm is the smallest usable line length. DML structures also work with lengths much greater than 1200 nm, but the limitations of our simulation environment prevent a complete characterization of larger structures. The minimum required magnetic field is around 95-99 kA/m, nearly double the field required to switch the magnets alone (45 kA/m).
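The three working regions, together with the 300 nm minimum length observed for 60 nm wide lines, can be expressed as a small classifier. This is a sketch: Hmin follows the text (about 95 kA/m for 60 nm wide lines), whereas the Hmax value in the example is a hypothetical placeholder, because the exact curves of Figure 7 are not reproduced here; treating both boundaries of the correct region as inclusive is also an assumption.

```python
from enum import Enum

MIN_LINE_LENGTH_NM = 300.0  # shortest line that formed a domain wall (60 nm width)


class Region(Enum):
    NO_WALL = "no domain wall: the circuit does not work"
    CORRECT = "line ends with the same magnetization as the input magnet"
    INVERTED = "line ends opposite to the input magnet"


def operating_region(h_applied: float, h_min: float, h_max: float,
                     length_nm: float) -> Region:
    """Classify a DML line by applied clock field (kA/m) and length (nm)."""
    if length_nm < MIN_LINE_LENGTH_NM or h_applied < h_min:
        return Region.NO_WALL
    if h_applied <= h_max:
        return Region.CORRECT
    return Region.INVERTED


if __name__ == "__main__":
    H_MIN, H_MAX = 95.0, 105.0  # H_MAX is a hypothetical placeholder
    for h in (90.0, 100.0, 110.0):
        print(h, operating_region(h, H_MIN, H_MAX, length_nm=530.0).name)
```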
Because the required field is nearly double that needed by the magnets alone, a DML circuit with 60 nm wide lines dissipates about 4 times more than an NML circuit of the same area, since clock losses grow with the square of the field. This, however, holds only if the line width is kept constant. We evaluated how Hmin and Hmax change when the line length is kept constant at 530 nm and the width is increased from 60 nm to 110 nm. Results can be observed in Figure 7.B. Increasing the line width greatly reduces the required magnetic field: with a width of 110 nm, Hmin is around 57 kA/m, and a further increase of the width reduces the required field below 45 kA/m. For the circuit structure shown in Figure 4.A, the maximum line width must therefore be chosen so that the field required to reset the line remains at least slightly larger than the field applied to the magnets. For the same circuit area, this solution leads to a slightly higher power consumption than a pure NML circuit. However, DML circuits are much smaller than their NML counterparts, so power consumption is greatly reduced in any case.
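The first-order power argument of this subsection (power proportional to area and to the square of the clock field) can be written as P_DML / P_NML = (A_DML / A_NML) x (H_DML / H_NML)^2. The sketch below evaluates it; as a simplification it assumes the whole circuit is clocked at the DML field, which overestimates the cost of DML, and the function names are hypothetical.

```python
def relative_clock_power(area_ratio: float, h_dml: float, h_nml: float = 45.0) -> float:
    """P_DML / P_NML under the model P proportional to area * H**2 (fields in kA/m)."""
    return area_ratio * (h_dml / h_nml) ** 2


def break_even_area_ratio(h_dml: float, h_nml: float = 45.0) -> float:
    """Largest A_DML / A_NML for which DML dissipates no more than NML."""
    return (h_nml / h_dml) ** 2


if __name__ == "__main__":
    print(relative_clock_power(1.0, 90.0))             # -> 4.0 (same area, doubled field)
    print(round(relative_clock_power(0.5, 57.0), 2))   # -> 0.8 (110 nm lines, ~50% area)
    print(round(break_even_area_ratio(57.0), 2))       # -> 0.62
```

With a field exactly double that of the magnets and equal area, the model reproduces the factor of 4 quoted above; with 110 nm lines (about 57 kA/m), the roughly 50% area reduction of the 32-bit adder gives about 0.8 of the NML clock power, and DML stays below NML as long as its area is under about 62% of the NML area.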
C. Speed
Domain walls are also known for their relatively high signal propagation speed. As a consequence, they can potentially enable higher clock frequencies. We analyzed the signal propagation speed inside a DML structure. The speed is evaluated from the time elapsed between the generation of the domain wall and the moment the line reaches its final state. This is only a rough estimate, due to the limitations of our simulation environment. Table I shows the speed range obtained at the minimum and maximum magnetic field for different line lengths. Notably, a stronger magnetic field yields a higher propagation speed. The magnetic field can be increased further, into the third operating region mentioned in Section IV-B. This generally causes a further increase in speed at the cost of higher power consumption. Since magnetic technologies are also studied for their potentially low power consumption, it is better to keep the magnetic field as low as possible. As can be observed in Table I, the propagation speed greatly decreases with the line length. Moreover, for each length, the speed varies differently with the magnetic field. This phenomenon is due to the fact that, above a critical magnetic field named the "Walker field", the domain wall structure changes [24], and different types of domain walls have different propagation speeds. Table II instead shows the propagation speed with the line length fixed at 530 nm and a varying width. The speed greatly increases with the width, but decreases at higher magnetic fields. As a consequence, keeping the line as wide as possible and the magnetic field as low as possible both minimizes power consumption and maximizes signal speed. Finally, DML can be compared with a pure NML circuit.
Considering magnets of 60x90x20 nm³, with a maximum of 5 chained magnets per clock zone [6] and a clock frequency of 100 MHz, the propagation speed of a horizontal wire is 132 m/s, while for a vertical wire it is 32 m/s. Comparing these values with the results of Table I and Table II, DML speed is 3 times higher in the worst case and 165 times higher in the best case. A higher speed can be exploited in two ways: either by increasing the clock frequency, or by keeping the clock frequency constant and increasing the line length in each clock zone. In both cases, DML greatly outperforms NML circuits.
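Two pieces of arithmetic underlie this comparison. First, the DML speed is estimated from simulation as line length divided by the time between wall generation and the line reaching its final state; second, in NML a signal advances one clock zone per clock phase, i.e. three zones per cycle, so v = 3 x d_zone x f. The sketch below implements both under stated assumptions: the magnetization trace is a generic list of (time, average normalized magnetization) samples rather than any specific NMAG output format, the ±0.99 detection thresholds are our choice, and all function names are hypothetical.

```python
from typing import Sequence

PHASES_PER_CYCLE = 3  # three-phase clock


def wall_speed_m_s(times_ns: Sequence[float], m_avg: Sequence[float], length_nm: float,
                   start: float = 0.99, end: float = -0.99) -> float:
    """Rough domain wall speed from a line magnetization trace.

    Assumes the line starts near +1 and ends near -1 (normalized average
    magnetization). The wall is taken as generated at the first sample below
    `start`, and the line as settled at the first sample at or below `end`.
    nm/ns is numerically equal to m/s.
    """
    t0 = next((t for t, m in zip(times_ns, m_avg) if m < start), None)
    t1 = next((t for t, m in zip(times_ns, m_avg) if m <= end), None)
    if t0 is None or t1 is None or t1 <= t0:
        raise ValueError("trace does not show a complete reversal")
    return length_nm / (t1 - t0)


def distance_per_phase_m(speed_m_s: float, f_clock_hz: float) -> float:
    """Distance a signal covers in one clock phase, which lasts 1 / (3 f)."""
    return speed_m_s / (PHASES_PER_CYCLE * f_clock_hz)


def max_clock_hz(speed_m_s: float, length_m: float) -> float:
    """Highest clock frequency at which a line of this length settles within one phase."""
    return speed_m_s / (PHASES_PER_CYCLE * length_m)


if __name__ == "__main__":
    # NML figures quoted above, at 100 MHz: distance covered per clock phase.
    print(distance_per_phase_m(132.0, 100e6))  # horizontal wire
    print(distance_per_phase_m(32.0, 100e6))   # vertical (stair-like) wire
    # Synthetic trace for a hypothetical 500 nm line (illustrative numbers only).
    t = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    m = [1.0, 1.0, 0.5, 0.0, -0.5, -1.0]
    print(wall_speed_m_s(t, m, 500.0))
```

Under this idealized model, the 132 m/s and 32 m/s figures correspond to about 440 nm and 107 nm of signal advance per clock phase at 100 MHz; for a DML line, `max_clock_hz` gives an upper bound that ignores the rise and fall time of the clock field.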
V. CONCLUSION
We have proposed and studied a new kind of magnetic technology, Domain Magnet Logic (DML). This technology uses nanomagnets for logic computation and domain walls for interconnections, combining the advantages of both technologies. We have simulated and validated this solution through low-level simulations. To demonstrate the superiority of DML over classic NML circuits, we designed a complex 32-bit adder, similar to the one employed in the Pentium 4 processor. The performance analysis shows that DML greatly outperforms pure NML circuits in all aspects: reduced circuit area, power consumption and latency, and greatly increased signal propagation speed.
This implementation represents an initial study and can be extended to further innovative solutions. We are now working on the analysis and characterization of different structures and materials. We are also studying further structures in which domain walls are used for horizontal interconnections.
