Abstract-In recent years, magnetic-based technologies, like nanomagnet logic (NML), have been gaining increasing interest as possible substitutes for CMOS transistors. The possibility to mix logic and memory in the same device, coupled with a potentially low power consumption, opens up completely new ways of developing circuits. The major issue of this technology is the necessity to use an external magnetic field as a clock signal to drive the information through the circuit. The power losses due to the magnetic field generation potentially wipe out any advantages of NML. To solve this problem, new clock mechanisms were developed, based on spin transfer torque current and on voltage-controlled multiferroic structures that use magnetoelastic properties of magnetic materials, i.e., exploiting the possibility of influencing magnetization dynamics by means of the elastic tensor. In particular, the latter shows an extremely low power consumption. In this paper, we propose an innovative voltage-controlled magnetoelastic clock system aware of the technological constraints raised by modern fabrication processes. We show how circuits can be fabricated taking into account technological limitations, and we evaluate the performance of the proposed system. Results show that the proposed solution promises remarkable improvements over other NML approaches, even though state-of-the-art ideal multiferroic logic has in theory better performance. Moreover, since the proposed approach is technology-friendly, it gives a substantial contribution toward the fabrication of a full magnetic circuit and represents an optimal tradeoff between performance and feasibility.
I. INTRODUCTION
THE continuous scaling of transistors is the reason behind the incredible development that CMOS technology has undergone in the last decades. While this scaling process is reaching its physical limits, new technologies are studied as possible CMOS substitutes. Particularly, magnetic-based technologies are of increasing interest due to the very low power consumption expected and the possibility to combine memory and logic in the same device. Among these technologies, nanomagnet logic (NML) was one of the first studied and demonstrated at experimental level [1]-[3]. Single-domain nanomagnets, which have only two stable states thanks to magnetic anisotropy, are used to represent the logic values "0" and "1" [see Fig. 1(a)]. Circuits are built by placing magnets in close proximity to each other: To reach the minimum energy state, horizontally coupled magnets align themselves antiferromagnetically, while vertically coupled magnets align themselves ferromagnetically [4], as can be seen from Fig. 1(a). We also demonstrated that even a multidomain element may behave in a similar way, in the so-called quasi-single domain logic, helping to reach interesting results with low-resolution cost-effective lithographic capabilities [5]. In this way, information propagates through the circuit, and logic gates can be built. In particular, by changing the shape of the magnets, it is possible to build AND/OR gates [6]. Moreover, exploiting the antiferromagnetic coupling of horizontally placed magnets, it is possible to implement an inverter by placing an odd number of magnets in a row [4]. NML circuits also show a good tolerance to process variations [7] due to errors in the fabrication processes [8], [9].
Unfortunately, an external magnetic field is necessary to help the magnets to switch from one stable state to the other [10]. In the most classical approach, this magnetic field is generated by a current (I) flowing through a wire placed under the magnet plane [see Fig. 1(a)]. The generated magnetic field is therefore parallel to the short side of the magnets, so when it is applied, magnets are forced into an intermediate unstable state with the magnetization vector rotated along the short side. When the magnetic field is removed, magnets realign themselves following the input magnet. This mechanism is called "clock" [4]. Several solutions have been proposed in the literature, as discussed in Section II, based on current induced magnetic field [4], [11], on STT-current induced clocking [12], and on multiferroic structures [13]. In this paper, we propose an alternative solution [14], where the basic element is a simple magnet and not a multiferroic structure. Magnets are deposited on a piezoelectric layer (PZT) driven by two parallel electrodes buried inside or deposited on top of the PZT itself. After a background description in Section II, the basic idea is described in Section III, while in Section IV the circuit layout is shown. In Section V, the performance of the proposed magnetoelastic clock is shown and compared to the other NML technologies. The work presented in this paper provides three important contributions to the NML circuits theory: 1) It demonstrates that the performance in terms of speed and power consumption is much better than that of other NML systems. 2) We show that, while from a purely theoretical point of view this solution has lower performance than a pure multiferroic structure, it is instead feasible with current technological processes and therefore represents a good tradeoff between performance and technological feasibility. Finally, 3) we reach the results through accurate simulations of realistic structures constrained by technological procedures currently available and ready to be experimentally demonstrated.
II. BACKGROUND ON NML CLOCKING
Even the magnetic field induced by a magnetic force microscope tip, used to investigate the static magnetization configuration of the system, may be used to switch a single element, as already shown in [15]. However, to propagate the information through the circuit, a further mechanism is required. During the removal of the magnetic field, when magnets are switching, there can be errors due to the influence of external factors like thermal noise [16]. This remains true even with the so-called adiabatic switching, i.e., slow rise and fall times for the magnetic field. The problem arises when too many elements are cascaded; when the number is limited, the information safely propagates with a small error probability.
To solve this problem, a multiphase clock system must be applied to the circuit [see Fig. 1(d)] [11]. Three clock signals with a phase difference of 120° are applied to different areas of the circuit, called clock zones. These areas are composed of a limited number of magnets. As shown in Fig. 1(d), at every time instant when magnets of a clock zone are switching (SWITCH phase), magnets on their left are in the HOLD phase: they are in a stable state and act like an input for the switching magnets. Elements of the clock zone on the right are in the RESET state and have no influence on the switching magnets. At the next time step [see Fig. 1(d)], the situation is repeated but the switching clock zone is the next in the sequence; therefore, information propagates through the circuit avoiding errors. Ideally, every clock zone should be exactly one magnet wide. With this solution, the error probability would be reduced to the minimum possible value, and at the same time, it would be possible to reach the maximum clock frequency [17]. The downside of this solution is that a precise spatial control is required to influence only one element and not its neighbors. Moreover, the pipeline level of the circuit greatly increases and this can reduce the throughput in sequential circuits [18]. As a consequence, the number of elements for each clock zone must be carefully chosen considering speed and reliability constraints, but also technological and circuit architecture issues [19].
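As an illustration of the three-phase scheme described above, the following Python sketch assigns a phase to each clock zone at each time step. The zone indexing, the `zone_phase` and `schedule` helpers, and the discrete time steps are assumptions introduced for illustration only; they are not part of the original clock design.

```python
from enum import Enum


class Phase(Enum):
    RESET = "RESET"
    SWITCH = "SWITCH"
    HOLD = "HOLD"


def zone_phase(zone: int, step: int) -> Phase:
    """Phase of clock zone `zone` at time step `step`.

    Zones are numbered from left to right and the signal travels to the right.
    """
    offset = (zone - step) % 3
    if offset == 0:
        return Phase.SWITCH  # this zone is currently switching
    if offset == 2:
        return Phase.HOLD    # left neighbor of a switching zone: stable, acts as input
    return Phase.RESET       # right neighbor of a switching zone: no influence


def schedule(n_zones: int, n_steps: int) -> list[list[Phase]]:
    """Phase of every zone (columns) at every time step (rows)."""
    return [[zone_phase(z, t) for z in range(n_zones)] for t in range(n_steps)]


if __name__ == "__main__":
    for t, row in enumerate(schedule(6, 4)):
        print(t, " ".join(f"{p.value:<6}" for p in row))
```

Printing the schedule row by row shows the SWITCH zone moving one position to the right at every step, with a HOLD zone on its left and a RESET zone on its right. Each individual zone cycles through RESET, SWITCH, and HOLD in that order.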
The obtainable clock frequencies are in the range of 50-500 MHz [20], [21], depending on the clocking technology chosen, so they are lower than the frequencies obtainable with CMOS [22] or with emerging technologies based on molecular structures [23], [24]. However, the main interest behind NML is the expected very low power consumption, lower than the expected power consumption of ultimate scaled CMOS transistors [25], [26]. The power consumption is lower than CMOS if only the energy required to switch the magnets is considered. Nonetheless, if the losses in the clock generation system are considered as well, this is no longer true and most of the advantages of this technology are wiped out [19]. In [4], a current of 545 mA in a copper wire of 1-μm width is considered necessary to switch all the magnets, leading to a very high power consumption due to Joule losses. Moreover, using this approach, the local control of a clock zone is difficult to achieve, because the magnetic field of one clock zone also influences the neighboring clock zones [7]. To solve this problem, new clocking technologies were studied. An STT-current induced clock was proposed as a suitable way to reset the magnets [see Fig. 1(b)] [12], [27]. In this NML implementation, magnetic tunnel junctions (MTJs) are used as basic cells. MTJs are multilayer structures composed of an insulator layer sandwiched between two magnetic layers. This is the same structure used in magnetic RAM and allows us to reset every element with a current flowing "through" each element. The advantages of this approach are many: much lower power consumption, a built-in read/write system, perfect local control of each element, and the possibility to use the well-developed MRAM technology. Another solution recently proposed uses multiferroic [13] structures as base elements [see Fig. 1(c)] [17], [21]. The basic dots are composed of 40 nm of piezoelectric material (PZT, lead zirconate titanate [28]) and a 10-nm magnetic layer. 
Every element is then controlled by applying a voltage of a few millivolts (mV). When the voltage is applied, the strain of the magnetic layer, induced by the coupled piezoelectric material, makes the magnetization vector rotate toward the short side of the magnet, working as a reset mechanism. This system allows us to reach the highest possible frequency with the lowest possible power consumption and, at the same time, to use a voltage instead of a current to control the circuit.
While this approach is a very good solution that might allow the full potential of NML to be exploited in the future, it presents two major problems that make the fabrication of the circuit quite difficult. The aspect ratio of every element is very low, with a difference of two nanometers between the two sides. Since most of the properties of nanomagnets depend on their aspect ratio, changing it drastically implies a change in how the corresponding circuit works. With such a small difference between the shorter and the longer side of the magnets, unavoidable process variations can easily alter the magnets' behavior and cause malfunctions. Moreover, a very precise local control of the magnets is required, making the application of the electric field and the fabrication of the related electrodes quite complex, and currently almost unfeasible. The solution discussed here aims at a feasible structure, which unavoidably constrains the results, but still offers remarkable performance and potential for improvement.
III. MAGNETOELASTIC CLOCK SYSTEM

A. Structure Description
The basic idea is shown in Fig. 2(a): a magnetic thin film is deposited above a piezoelectric substrate and patterned through lithography. When an electric field is applied to the substrate, the piezoelectric material increases its length. If the piezoelectric figure of merit is such that the resulting strain is large enough to induce a stress on the ferromagnetic layer which is above the film mechanical stiffness, a strain is produced in the nanomagnet too. Of course, this depends on both the choice of materials (properties) and the device geometry, as shown in detail further in this article. The induced stress anisotropy causes the magnetization vector to rotate along the direction of the applied strain [see Fig. 2(b)]. This is the direct mapping of the clock principle that drives NML.
It is a rather simple idea that was already demonstrated in a simplified form in [29]. In [29], an electric field was applied using two parallel electrodes placed on top and on the bottom of a piezoelectric (PZT, lead zirconate titanate) substrate [see Polyvinylidenfluoride (PVDF). PZT is clearly the best choice for this kind of application, as discussed in the following. When applying the same concept of [29] to NML, some issues arise. Electrodes placed on top of the PZT substrate are difficult to contact, because the surface of the PZT must be patterned with nanomagnets. Moreover, with this configuration, the electric field is applied perpendicularly, while the strain is parallel to the PZT surface. In this way, the strain and the electric field are coupled through the $d_{31}$ coefficient ($d$ coefficients, normally expressed in pm/V, describe the coupling between strain and electric field in the stress-charge tensor). The PZT $d_{31}$ is much lower than the $d_{33}$ coefficient, which applies when the voltage and the strain lie along the same direction. The solution that we propose comprises electrodes placed under the PZT (see Fig. 2). As a consequence, electric field and strain lie along the same direction and they are therefore coupled through the $d_{33}$ coefficient. The remarkable consequence is that a lower voltage is required to generate the same strain and the power consumption is reduced. Moreover, with this configuration, electrodes can be contacted from the bottom, without interfering with the nanomagnets that are placed on top of the PZT layer, which results in a great process simplification. Further details on the structure are given in Section IV.
B. Choice of Magnetic Material and Magnet Sizes
In order to choose the proper magnetic material and the nanomagnet geometry, the maximum and minimum stress that can be applied must be accurately evaluated. To evaluate the maximum stress, first of all the maximum strain due to dielectric rigidity must be considered as

$$\xi_{MAX\,RIG} = d \cdot Ef_{MAX} \qquad (1)$$
where $Ef_{MAX}$ = 20 MV/m is the maximum electric field that the PZT layer can tolerate without electrical breakdown, and $d = d_{33}$ = 150 pm/V is the longitudinal piezoelectric coefficient that relates the induced strain to the applied field. The previous value ($\xi_{MAX\,RIG}$) must be compared with the maximum strain achievable in the PZT due to structural limitations ($\xi_{MAX\,STRUCT}$), as

$$\xi_{MAX} = \min\left(\xi_{MAX\,RIG},\ \xi_{MAX\,STRUCT}\right) \qquad (2)$$
where $\xi_{MAX\,STRUCT} = 500 \times 10^{-6}$ [30]. Between these two components, the more constraining in the PZT is the maximum strain due to the dielectric rigidity. Once the maximum strain ($\xi_{MAX}$) is known, it is possible to evaluate the maximum stress applicable to the magnets ($\sigma_{MAX\,PIEZO}$), under the assumption that the magnets are thin enough for the PZT strain to be totally transferred to them:

$$\sigma_{MAX\,PIEZO} = Y_{Magnet}\,\xi_{MAX} \qquad (3)$$
where $Y_{Magnet}$ is the Young modulus of the chosen magnetic material. We also need to consider the fracture stress of the magnets, which depends on the selected material. Consequently, the maximum stress that can be transferred to the magnets is

$$\sigma_{MAX} = \min\left(\sigma_{MAX\,PIEZO},\ \sigma_{MAX\,STRUCT}\right) \qquad (4)$$
where $\sigma_{MAX\,STRUCT}$ is the maximum mechanical stress that can be applied to the magnets. The minimum stress is related to the height of the energy barrier between the two stable states, which depends on magnetic shape anisotropy. Shape anisotropy is related to the shape of the magnets: If magnets have an aspect ratio different from 1, at equilibrium the magnetization will lie along the longer side of the magnets. In this case, the height of the energy barrier between the two stable states depends on the aspect ratio of the magnets. The minimum applicable stress is, therefore, the stress that generates a stress anisotropy at least equal to the shape anisotropy [17]:

$$\frac{3}{2}\,\lambda_s\,\sigma\,V = \frac{\mu_0}{2}\,M_s^2\,N_d\,V \qquad (5)$$
where $N_d$ is the demagnetization factor [31], $M_s$ is the saturation magnetization, $V$ is the volume, and $\lambda_s$ is the magnetostrictive coefficient. The minimum applicable stress is therefore

$$\sigma_{MIN} = \frac{\mu_0\,M_s^2\,N_d}{3\,\lambda_s} \qquad (6)$$
NML requires the use of single-domain nanomagnets, which means sides shorter than 100 nm for typical soft ferromagnets. In the literature, magnets are normally 50 × 100 nm² [3] or 60 × 90 nm² [4]. We therefore choose a shorter side of 50 nm and a thickness of 10 nm. The aspect ratio of the magnets determines the value of the shape anisotropy, i.e., the height of the energy barrier. To have a reasonably small value of error probability ($p < e^{-30} \approx 10^{-13}$), the energy barrier at room temperature must be at least

$$\Delta E = 30\,K_b T \qquad (7)$$
This means that the value of the shape anisotropy must be at least equal to $\Delta E$

$$\frac{\mu_0}{2}\,M_s^2\,N_d\,V = \Delta E = 30\,K_b T \qquad (8)$$
From this equation, it is possible to evaluate the value of $N_d$ and, therefore, the minimum value of the aspect ratio. The minimum aspect ratio is 1.06, which means minimum sizes for the magnets of 50 × 53 × 10 nm³. Smaller magnets will have an energy barrier lower than 30 $K_bT$, and therefore, the error probability will be too high. To choose a suitable magnetic material, we have evaluated the minimum stress necessary to reset the magnets for aspect ratios from 1.06 to 2, comparing this value to the maximum applicable stress. Table II shows the main characteristics of some magnetic materials: $M_s$ is the saturation magnetization, $\lambda_{100}$ and $\lambda_{111}$ are the magnetostrictive coefficients, $Y$ is the Young modulus, and $\sigma$ (abbreviated form of $\sigma_{MAX\,STRUCT}$) is the fracture stress. Results of the analysis are shown in Fig. 3. For most classical magnetic materials, like Iron or Cobalt, there is no range in which the circuit can work properly. Fig. 3(a) shows the results obtained for Iron; the minimum required stress, evaluated from (6), is always bigger than the maximum applicable stress. This is because Iron is a material with negligible magnetostriction. The same thing happens for Cobalt, as shown in Fig. 3(b). Cobalt has higher magnetostriction than Iron, but its saturation magnetization is much higher, as shown in Table II. As a consequence, Cobalt, like Iron, cannot be used for this application. Fig. 3(c) shows the results for Nickel and Fig. 3(d) those for Terfenol. For Terfenol, the working range increases a lot, from an aspect ratio of 1.06 to 1.57 (53-78.5 nm). Moreover, the required stress is lower than the one required for Nickel (100 MPa for Nickel, 28 MPa for Terfenol).
Although both Nickel and Terfenol can be suitable targets for this technology, the limited operative range of Nickel can be a problem if process variations are considered. For example, considering a process variation of ±10%, different results are obtained, as shown in Fig. 3(e). The central curve represents the minimum stress in nominal conditions, evaluated from (6). The lower and upper curves represent the minimum required stress, evaluated from (6), considering a variation of −10% (lower curve) and +10% (upper curve) of the shorter magnet side. The value of stress to be applied to the circuit must be chosen according to the central curve, which represents how the minimum stress varies with the aspect ratio in nominal conditions. If a random process variation causes a random variation in the aspect ratio of one or more magnets, the operating points will shift up or down. The consequence is that another value of stress, different from the design value, will be required. If the working point shifts outside the limits, magnets will not be reset properly. As a consequence, the aspect ratio must be chosen so that, in case of random shifting due to process variations, the working point still falls in the acceptable range (between 0 and the maximum applicable stress). Fig. 3(e) shows the working range of Nickel considering process variations of ±10%. There is only one point that lies in the operative range, corresponding to an aspect ratio of 1.16. A negative variation due to the process increases the minimum aspect ratio. This can be understood from (8): a negative variation reduces the magnet volume. Since the value of ΔE is constant, the demagnetization factor $N_d$ increases, and so does the corresponding value of the minimum aspect ratio [31]. This means that Nickel is very sensitive to process variations: it tolerates only variations lower than 10%. Fig. 3(f) shows the working range for Terfenol instead. The minimum value for the aspect ratio becomes 1.16, while the maximum becomes 1.42. This means that Terfenol has a very good working range and can tolerate process variations even near ±20%. We can conclude from these analyses that high magnetostriction materials, like Terfenol, are the best candidates for this application.
Consequently, in this work, we choose to use nanomagnets made of Terfenol, with sizes of 50 × 65 × 10 nm³. Compared with the geometry proposed in [17], the difference between the shorter and the longer side of the magnet is larger (15 nm instead of 2 nm), and magnets are simple single-layer structures. This means that they are easier to fabricate and also more tolerant to process variations.
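The figures used in this section can be cross-checked with a few lines of Python. This is a minimal sketch under stated assumptions: the room temperature of 300 K and the helper names are not taken from the paper, and the error probability is modeled simply as exp(−ΔE / K_bT), which is the relation implied by the bound p < e^−30.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
T_ROOM = 300.0      # assumed room temperature, K


def error_probability(barrier_in_kbt: float) -> float:
    """Error probability for an energy barrier expressed in units of K_b*T."""
    return math.exp(-barrier_in_kbt)


def barrier_joules(barrier_in_kbt: float, temperature: float = T_ROOM) -> float:
    """Energy barrier in joules for a barrier given in units of K_b*T."""
    return barrier_in_kbt * K_B * temperature


def long_side_nm(short_side_nm: float, aspect_ratio: float) -> float:
    """Longer side of a magnet for a given shorter side and aspect ratio."""
    return short_side_nm * aspect_ratio


if __name__ == "__main__":
    print(f"p(30 KbT)  = {error_probability(30):.2e}")
    print(f"30 KbT     = {barrier_joules(30):.3e} J at {T_ROOM:.0f} K")
    # Aspect ratios quoted in the text: minimum, chosen geometry, Terfenol maximum
    for ratio in (1.06, 1.30, 1.57):
        print(f"ratio {ratio:.2f} -> 50 x {long_side_nm(50, ratio):.1f} nm")
```

The sketch returns an error probability slightly below 10^−13 for a 30 K_bT barrier, and longer sides of 53 nm, 65 nm, and 78.5 nm for the three aspect ratios, in agreement with the sizes quoted above.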
IV. CIRCUIT LAYOUT
The layout of the circuit must take into account two important problems: Signal propagation and fabrication processes.
A. Process
The solution that we propose is shown in Fig. 4(a). Parallel electrodes are buried under a PZT layer, and nanomagnets are deposited directly on top of it. This solution is technology-friendly, because it is compatible with CMOS planar technology and, assuming the availability of a high-resolution lithographic system, can be fabricated. After the deposition of metal to create the electrodes, the PZT is deposited on top of them either by means of a sol-gel process or by means of sputtering. The processes used for PZT deposition create a layer with a very small roughness (less than 3 nm). Electrodes can be fabricated with platinum or copper; however, in the case of copper, a seed layer of Titanium Oxide (TiO₂) must be used. Nanomagnets can be fabricated by depositing a thin film of magnetic material on top of the PZT and then patterning the film using lithography. The small roughness of the PZT substrate has no influence on the magnets, neither in the magnetic material deposition and the following lithographic phase, nor on the magnetic properties of the deposited material. The fabrication process is relatively simple, but the problem arising is how the electric field will be distributed in the PZT. Fig. 4(e) shows a COMSOL Multiphysics [32] simulation of the structure, which highlights the distribution of the electric field. Electrodes are 50 nm wide, while the distance between them is 250 nm. According to the ITRS roadmap, the Metal 1 pitch, i.e., the center-to-center distance between two neighboring metal lines in the lowest interconnection level, is 54 nm for the year 2013. This value is compatible with the requirements of this clock solution and also leaves space for further scaling. The applied voltage is 1 V and an electric field of 3-4 MV/m is generated almost uniformly between the two electrodes. In correspondence of the electrodes, the electric field abruptly decreases and reaches a value of about 2 MV/m near the borders. 
The strain of the PZT is proportional to the electric field, so it is clear that the strain will be smaller near the areas corresponding to the electrodes. However, due to mechanical continuity, the higher strain of the central area will also induce a strain in the area exactly above the electrodes, where the electric field has a very low value. This issue could be mitigated by reducing the distance between the electrodes and the PZT surface. However, from the technological point of view, such a structure is more complex to fabricate. From the results of Fig. 4(c), the strain can be approximated as uniformly applied in the area between the two electrodes. The consequence is that, to obtain working circuits, magnets must not be placed in the area corresponding to the electrodes. An alternative structure is shown in Fig. 4(b), where electrodes are placed on top of the PZT. The distribution of the electric field is similar to the previous case. The drawback of the first solution [see Fig. 4(a)] is that the PZT is fabricated on top of the electrodes (made of Copper). PZT requires high-temperature (600 °C) processes, which can oxidize the Copper. Moreover, a seed layer is required to attach the PZT to the electrodes. Platinum can be used instead of Copper but is expensive. This second solution has the advantage that the PZT is fabricated before the electrodes. The circuit structure remains the same, because even in this case magnets cannot be placed in the area of the electrodes. Another advantage of placing electrodes on top of the PZT layer is that they can be contacted from above, making the fabrication of wires for the clock distribution network easier. Additional layers can be used to route clock wires, similarly to what happens in CMOS chips.
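A rough cross-check of the field values quoted above can be obtained by treating the region between two electrodes as a uniform-field gap. This is only a first-order sketch: it ignores fringing and the buried-electrode geometry captured by the finite element simulation, and the function names are illustrative assumptions. The d33 value is the one given in Section III.

```python
D33 = 150e-12  # longitudinal piezoelectric coefficient of PZT, m/V (150 pm/V)


def uniform_field(voltage_v: float, gap_m: float) -> float:
    """Uniform-field estimate E = V / gap, in V/m (fringing neglected)."""
    if gap_m <= 0:
        raise ValueError("gap must be positive")
    return voltage_v / gap_m


def longitudinal_strain(field_v_per_m: float, d33: float = D33) -> float:
    """Strain along the field direction, proportional to the field through d33."""
    return d33 * field_v_per_m


if __name__ == "__main__":
    field = uniform_field(1.0, 250e-9)  # 1 V across the 250 nm electrode spacing
    print(f"E      ~ {field / 1e6:.1f} MV/m")
    print(f"strain ~ {longitudinal_strain(field):.1e}")
```

With 1 V across 250 nm the estimate is 4 MV/m, at the upper end of the simulated 3-4 MV/m range.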
B. Logic Gate Organization
We therefore base our design on two-input AND/OR gates [6], as shown in Fig. 4(a) and (b). AND/OR gates are made of three magnets: the shape of the central magnet is changed to obtain the desired logic function, with a corner cut so that the magnet gets a preferred direction for the magnetization. The advantage of this solution is that inputs come from vertical directions (up or down), where there are no electrodes. Another point is that in NML, the horizontal coupling is antiferromagnetic, i.e., every magnet has the inverted value of its predecessor. So, if the number of magnets in the clock zone (the zone between two electrodes) is odd, the signal is inverted. Placing, therefore, an AND/OR gate in a clock zone with a width equal to an odd number of elements generates a universal NAND/NOR gate that can be used as a basic block to build any circuit. Ideally, the width of the clock zone should be equal to one magnet to obtain the maximum possible clock frequency, as shown in [17]. However, this approach has two disadvantages: It increases the latency of the circuit and it makes the fabrication of the structure and the signal propagation almost impossible. Increasing the latency of the circuit reduces the throughput in the presence of sequential circuits [33]. Moreover, the distance between the electrodes will be smaller and the whole structure is more difficult to fabricate. Also, since magnets cannot be placed over the area of the electrodes, with a width of one magnet, there is not enough space to propagate the output signal of the logic gate. We, therefore, choose a gate width of three or five magnets, as shown in Fig. 4(c).
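At the Boolean level, the way an AND/OR gate placed in a clock zone of odd width behaves as a NAND/NOR gate can be sketched as follows. This is a purely logical illustration, not a magnetic simulation: the `clock_zone_gate` helper, the 0/1 encoding, and the assumption that the signal is inverted once per magnet across the zone width are introduced here for illustration.

```python
from itertools import product


def clock_zone_gate(a: int, b: int, width: int, kind: str = "AND") -> int:
    """Output of an AND/OR gate placed in a clock zone `width` magnets wide.

    Each horizontally coupled magnet inverts its predecessor, so an odd
    width inverts the gate output and an even width leaves it unchanged.
    """
    if kind not in ("AND", "OR"):
        raise ValueError("kind must be 'AND' or 'OR'")
    core = (a & b) if kind == "AND" else (a | b)
    inverted = width % 2 == 1
    return core ^ int(inverted)


def truth_table(width: int, kind: str) -> dict[tuple[int, int], int]:
    """Truth table of the gate for all input combinations."""
    return {(a, b): clock_zone_gate(a, b, width, kind)
            for a, b in product((0, 1), repeat=2)}


if __name__ == "__main__":
    for kind in ("AND", "OR"):
        for width in (3, 5):
            print(kind, "core, width", width, "->", truth_table(width, kind))
```

For widths of three and five magnets, the AND core yields the NAND truth table and the OR core yields the NOR truth table, which is the property exploited in the layout.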
C. Signal Propagation in Gates
Inputs come from the upper-left and lower-left corners, and the output of the AND/OR gate is propagated to the upper-right and lower-right corners. In this way, signals can propagate to the other parts of the circuit avoiding the area of the electrodes. Helper/shielding blocks [34] are used to help the signal propagation and to reduce the error probability. With a width of five magnets, the critical path (the maximum number of magnets between input and output) is longer: seven magnets instead of the five magnets obtained with a width of three elements. Since the clock frequency depends on the critical path, with a width of five magnets the clock frequency will be lower, but the structure is bigger and easier to fabricate. Sizes bigger than these are not possible, because not only would the clock frequency be much lower, but the critical path would also be too long, increasing the error probability during magnet switching.
D. Complete Layout
A circuit example, a 2-to-1 multiplexer, is shown in Fig. 4(d). Clock zones are made of mechanically isolated cells of 3 × 5 or 3 × 3 magnets. Every cell is an independently actuated clock zone, where logic gates or interconnection wires can be placed. To create this layout, it is possible to pattern the PZT substrate, removing the PZT (see Fig. 5) [35], [36]. It is possible to dig through the PZT down to the bottom, or to remove only a part of the PZT to mechanically isolate the areas. In both solutions, a perfect mechanical isolation is obtained, but the complete removal of the PZT will probably reduce parasitic parameters. Clearly, the resolution of the optical lithography must be quite high to remove only a small area of the PZT. Theoretically, it would be sufficient to remove a few nanometers between the clock zones, but it is quite difficult to obtain this result using currently available lithographic processes.
E. Overall Signal Propagation
Signal propagation happens through the corner of each clock zone, to avoid the area of the electrodes. To allow this, each row of clock zones must be shifted, as can be seen from Fig. 4(d). With this layout, the width of the clock zone must therefore be chosen according to the size of the electrodes. With three-magnet clock zones, electrodes must have an ideal width of 30-40 nm, while with five-magnet clock zones, electrodes can be approximately 70-100 nm wide, a size that can be reached in scaled CMOS technology. Since this approach is based on universal NAND/NOR gates, in principle, every kind of circuit can be implemented; moreover, the circuit layout is quite regular, which helps both the technological fabrication and the physical design of the circuit.
F. Technology Scaling
One of the advantages of this clock solution is that it is feasible with available technological processes. The structure sizes reflect this choice. However, like in CMOS, scaling can reduce the circuit area and improve power consumption, as discussed in Section V. The minimum size of the NAND gate, however, cannot be changed. Every NAND gate must be at least 3 × 3 magnets, but it is possible to reduce the magnet sizes. The energy barrier must be higher than 30 $K_bT$ to have a reasonably small value of error probability. This poses a limit to the minimum magnet sizes. As shown in [7], magnets can be as small as 15 × 30 × 5 nm³ and still have an energy barrier of 30 $K_bT$. Thus, provided that lithographic processes with high enough resolution are available, it is possible to further scale the magnets, reducing their size to half the current value and consequently reducing the circuit area by a factor of four.
V. PERFORMANCE ANALYSIS
To verify the effectiveness of the solution proposed in this study, we have accurately estimated its performance in terms of both timing and power consumption. Fig. 6 shows the timing characteristics obtained through Magpar [37] simulations. Magpar is a finite element simulator that allows us to evaluate the magnetoelastic effect applied to the dynamics of a magnetic circuit.
A. Timing
In Fig. 6(a), the time required to reset the magnets is indicated. About 1 ns is necessary to completely reset the magnets. Fig. 6(b) shows that the switching time ($T_{SWITCH}$) of every magnet is also near 1 ns. The clock frequency can, therefore, be estimated starting from these data. The clock period must last long enough to allow the reset of the magnets and their successive realignment. So, as a first approximation, the minimum clock period can be calculated as
where N is the number of magnets in the critical path (5 considering a 3 × 3 NAND, 7 considering a 3 × 5 NAND). However, the situation is more complex, because in a chain of magnets one element starts to switch before its neighbor has reached a stable state. So, the clock period is not directly the sum of N switching times. As a consequence, the maximum clock frequency obtainable is around 200 MHz for 3 × 3 NAND/NOR gates and 150 MHz for 3 × 5 NAND/NOR gates. The frequency is lower than the one obtained in [17], but this is due to the higher number of elements in the critical path. 
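The first-order estimate can be sketched numerically. The expression used below, reset time plus N switching times, is an assumption about the form of the approximation (the text only states that the period must cover the reset of the magnets and their successive realignment); the function names are hypothetical.

```python
T_RESET = 1e-9   # time to reset the magnets, s (about 1 ns, from Fig. 6(a))
T_SWITCH = 1e-9  # switching time of one magnet, s (about 1 ns, from Fig. 6(b))


def min_clock_period(n_critical: int,
                     t_reset: float = T_RESET,
                     t_switch: float = T_SWITCH) -> float:
    """Assumed first-order period: one reset followed by N sequential switches."""
    if n_critical < 1:
        raise ValueError("the critical path must contain at least one magnet")
    return t_reset + n_critical * t_switch


def max_clock_frequency(n_critical: int) -> float:
    """Clock frequency in Hz corresponding to the assumed first-order period."""
    return 1.0 / min_clock_period(n_critical)


if __name__ == "__main__":
    for n in (5, 7):
        print(f"N = {n}: T >= {min_clock_period(n) * 1e9:.0f} ns, "
              f"f <= {max_clock_frequency(n) / 1e6:.0f} MHz")
```

With 1 ns for both times, this conservative estimate gives about 167 MHz for N = 5 and 125 MHz for N = 7. These values are below the roughly 200 MHz and 150 MHz quoted above, which is consistent with neighboring magnets overlapping their switching.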
B. Power Consumption
It is worth remarking, however, that speed is not the major advantage of NML. This technology is particularly interesting for its expected low power consumption. There are two main sources of power consumption: the energy required to force the magnets in the RESET state, and the losses in the clock generation system. As explained in Section III, the energy required to RESET a magnet is about 180 $K_bT$, which corresponds to 0.85 aJ. This value is due to the abrupt switching applied to achieve the maximum circuit speed. Using an adiabatic switching (i.e., very slow rise and fall times for the clock signals, in the order of many nanoseconds), this energy can be reduced to 30 $K_bT$, at the cost of greatly reducing the obtainable circuit speed.
In NML circuits, the major source of power consumption is the losses in the clock generation system. In the magnetoelastic clock case, power consumption depends mainly on the energy required to charge the parasitic PZT capacitance. For this purpose, the equivalent circuit of a NAND/NOR gate is shown in Fig. 7(a).
The capacitor ($C_{pzt}$) represents the parasitic capacitance of the PZT substrate. Since the PZT is an insulator, it has a considerable parasitic resistance ($R_{pzt}$), which is used to evaluate the leakage current between the two electrodes of the capacitor. The resistance value is around $10^{18}$ Ω, so the leakage current can be assumed equal to 0. The capacitor is connected to the voltage source through resistances that represent the on-chip interconnections. The exact evaluation of these resistances is not possible at this development stage, since we do not know the complete on-chip layout of the interconnection wires. However, it can be assumed that the most important contribution to the resistance is due to the VIAs used for the direct connection with the electrodes. This assumption is based on the fact that VIAs have the smallest cross section and therefore the highest resistance, while global interconnections normally have a much wider cross section and, therefore, a much smaller resistance. A possible example of interconnection made using an array of VIAs is shown in Fig. 7(b). Assuming a VIA made of copper with a width (W) of 40 nm, a length (L) of 40 nm, and a height (H) of 1 μm, the obtained resistance is about 10 Ω. If four VIAs are connected in parallel, the total resistance is divided by 4, leading to a value of about 2.5 Ω.
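The VIA resistance quoted above can be rechecked with the standard relation R = ρH/(WL). The sketch below assumes the bulk resistivity of copper (1.68 × 10⁻⁸ Ω·m) and neglects nanoscale size effects; the function names are hypothetical.

```python
# Resistance of a single copper VIA and of several VIAs in parallel.
# Assumption: bulk copper resistivity, no size-dependent scattering effects.
RHO_CU = 1.68e-8  # ohm * m


def via_resistance(width_m: float, length_m: float, height_m: float,
                   rho: float = RHO_CU) -> float:
    """R = rho * H / (W * L): current flows along the VIA height."""
    return rho * height_m / (width_m * length_m)


def parallel_resistance(single_ohm: float, count: int) -> float:
    """Equivalent resistance of `count` identical resistors in parallel."""
    return single_ohm / count


if __name__ == "__main__":
    r_via = via_resistance(40e-9, 40e-9, 1e-6)
    print(f"single VIA: {r_via:.1f} ohm")
    print(f"four VIAs in parallel: {parallel_resistance(r_via, 4):.2f} ohm")
```

With these inputs the single VIA evaluates to about 10.5 Ω and four in parallel to about 2.6 Ω, in line with the rounded 10 Ω and 2.5 Ω used in the text.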
When a capacitor is charged, part of the energy is dissipated on the parasitic resistance due to the Joule effect. This energy can be evaluated as
$$E = \int_{t_1}^{t_2} \frac{V^2}{R}\, e^{-\frac{2t}{R C_{pzt}}}\, dt \tag{10}$$
where $t_1$ and $t_2$ are, respectively, the beginning and the end of the time period considered, V is the applied voltage, $C_{pzt}$ is the capacitance, and R is the resistance of the interconnection wires. The voltage applied across a PZT layer of thickness $t_{PZT}$ can be evaluated as
$$V = \frac{\sigma \, t_{PZT}}{Y \, d_{33}}$$
where σ is the applied stress, Y is the Young modulus of the magnetic material, and $d_{33}$ is the piezoelectric coefficient of the PZT. The capacitance can be approximately evaluated as
$$C_{pzt} = \frac{\varepsilon_0 \, \varepsilon_r \, h_{NAND} \, w_{NAND}}{t_{PZT}}$$
where $\varepsilon_0$ is the vacuum permittivity, $\varepsilon_r$ is the relative dielectric constant of the PZT, $t_{PZT}$ is the thickness of the PZT, and $h_{NAND}$ and $w_{NAND}$ are the height and the width of the NAND gate. Considering a NAND/NOR gate with a size of 3 × 3 Terfenol magnets and a PZT thickness of 40 nm, the voltage is 0.368 V, while the capacitance is 0.678 fF. It is worth noting that the $\varepsilon_r$ of the PZT is quite high; thus, further advantages are expected by studying other materials with similar properties but a smaller dielectric constant. The circuit time constant τ, given by the product $R C_{pzt}$, rules the dynamic behavior of the circuit. The resulting value of τ is around a few fs ($10^{-15}$ s), so each NAND/NOR gate can theoretically work at THz frequencies. However, the magnet dynamics are much slower, in the order of nanoseconds (see Fig. 6), so they limit the overall circuit speed.
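As a quick check of the orders of magnitude involved, the time constant can be computed from the values quoted in the text. The sketch assumes the 2.5 Ω resistance of four parallel VIAs and the 0.678 fF gate capacitance; the 1 ns magnet switching time is taken from the timing analysis.

```python
# Electrical time constant of the clock network versus the magnet dynamics.
R_INTERCONNECT_OHM = 2.5   # four VIAs in parallel (value from the text)
C_PZT_F = 0.678e-15        # parasitic PZT capacitance of one NAND/NOR gate
T_SWITCH_S = 1e-9          # magnet switching time, about 1 ns (Fig. 6)


def time_constant_s(resistance_ohm: float, capacitance_f: float) -> float:
    """tau = R * C of the charging network."""
    return resistance_ohm * capacitance_f


if __name__ == "__main__":
    tau = time_constant_s(R_INTERCONNECT_OHM, C_PZT_F)
    print(f"tau = {tau * 1e15:.3f} fs")
    print(f"magnet switching is about {T_SWITCH_S / tau:.1e} times slower")
```

The electrical transient is nearly six orders of magnitude faster than the magnetization dynamics, so the speed of the circuit is set by the magnets rather than by the clock network.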
When the time constant value is much smaller than the integration period ($t_2 - t_1$), (10) becomes the well-known equation
$$E = \frac{1}{2} C_{pzt} V^2$$
which tells us that, in the charging process, half of the energy supplied is dissipated on the resistance. Similarly to the CMOS case, the other half of the energy is dissipated on the resistance in the discharging process, and the total energy dissipation is then given by
$$E_{TOT} = C_{pzt} V^2$$
Two important facts must be observed. First, this is an energy, so it is independent of the clock frequency used; second, the energy value is totally independent of the resistance value, so it is not necessary to evaluate the parasitic resistance of the interconnections. To be fair, properly setting up an RLC resonant circuit could help reduce the energy consumption. However, here the aim is to analyze the worst-case scenario, so we assume that all the energy is dissipated on the resistance. In this case, the energy dissipated in the parasitic resistance is 91 aJ, while 6 aJ are required to RESET the magnets, leading to a total energy consumption of 97 aJ for a NAND/NOR gate.
Clock solutions at a glance. Finally, a comparison between the different clock systems is mandatory. Table III shows the total energy consumption and the obtainable frequencies for a NAND gate in the different NML implementations. In the case of magnetic field clocked NML, adiabatic switching is considered; therefore, the energy required to reset the magnets is taken equal to 30 $k_B T$ for each magnet. The energy losses due to the Joule effect were also estimated. The wire has a cross section of about 400 × 400 nm and a length of about 200 nm; it is made of copper, and the current value is 2 mA (extrapolated from [4]). This leads to an energy consumption of 62 fJ for a NAND gate (see "a" in Table III). The frequency achievable, due to the use of adiabatic switching, is in the range of 50-100 MHz [20]. For the STT-current induced clock, data are obtained from Das and coworkers [12]. An energy of 1.6 fJ is necessary to reset each magnet, which gives a total of 11 fJ for a NAND gate (see "b" in Table III). This system is much better than the magnetic-field based clock. The obtainable frequencies are among the highest, in the range of 100-200 MHz [27].
Considering instead multiferroic logic, data shown in [17] indicate a total energy required to operate a NAND gate of about 4 aJ (see "c" in Table III), at least three orders of magnitude lower than the current-based approaches. The frequency is also relatively high, at about 500 MHz. Finally, in our magnetoelastic case, we obtain an energy consumption of 97 aJ and a maximum frequency of 200 MHz (see "d" in Table III). Although these values are far better than those of the current-based approaches, a pure multiferroic logic still appears to show better performance, at least in ideal conditions. The reason is easy to understand, as the NAND gate in this case is made of seven magnets, while a pure multiferroic approach requires just three magnets. However, one important fact must be underlined: the technique we propose represents an easier and more feasible technological approach, and circuits could already be fabricated with high-resolution techniques. In this paper, we consider a realistic and feasible layout and a realistic process, unlike the previous multiferroic study.
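The 97 aJ figure of the magnetoelastic clock can be rechecked from the values given in this section. The sketch below is an illustration under stated assumptions: the charging transient is modeled with the textbook closed form of the Joule loss in an RC circuit, the charging window is taken as 1 ns, and the gate parameters (0.368 V, 0.678 fF, 0.85 aJ per magnet, seven magnets) are those quoted in the text. The function names are hypothetical.

```python
import math

# Gate parameters quoted in the text for a 3 x 3 NAND/NOR gate.
V_APPLIED = 0.368              # volts
C_PZT_F = 0.678e-15            # farads
E_RESET_PER_MAGNET_J = 0.85e-18
N_MAGNETS = 7


def charge_loss_j(v: float, c: float, r: float, duration_s: float) -> float:
    """Joule loss in R while charging C from 0 to v over `duration_s`.

    Closed form of the integral of (v^2 / r) * exp(-2 t / (r c)) from 0 to
    duration_s; it tends to c * v^2 / 2 when duration_s >> r * c.
    """
    return 0.5 * c * v * v * (1.0 - math.exp(-2.0 * duration_s / (r * c)))


def gate_energy_j(r_ohm: float, duration_s: float = 1e-9) -> float:
    """Charge loss + discharge loss + energy needed to RESET all magnets."""
    clock = 2.0 * charge_loss_j(V_APPLIED, C_PZT_F, r_ohm, duration_s)
    return clock + N_MAGNETS * E_RESET_PER_MAGNET_J


if __name__ == "__main__":
    for r in (2.5, 10.0):  # the result does not depend on the resistance
        print(f"R = {r:4.1f} ohm: E = {gate_energy_j(r) * 1e18:.1f} aJ")
```

Both resistance values give about 97.8 aJ (91.8 aJ of clock losses plus 5.95 aJ of RESET energy), matching the rounded 97 aJ and illustrating that the result is independent of the interconnection resistance as long as the time constant is much shorter than the charging window.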
A final interesting outcome can be observed from rows e and f of Table III, where the energy consumption of a 28-nm CMOS NAND gate is shown for the same frequency used for the magnetic case (200 MHz) and for a typical frequency in 28-nm-based circuits (1 GHz). Data in the CMOS case are obtained using an industrial technology. The CMOS NAND has an input capacitance of around 0.55 fF, which is slightly smaller than our capacitance, but the supply voltage is much higher, around 0.9 V. As a consequence, the dynamic power consumption is 12 nW at 200 MHz, i.e., an energy of 117 aJ. In CMOS technology, the impact of leakage currents must be considered as well: in the NAND case, the energy due to leakage assumes a value around 740 aJ. If adiabatic techniques are used with CMOS transistors, the dynamic energy consumption can be reduced, while the leakage energy remains large. Further, transistor scaling does not help because, while the dynamic energy consumption is reduced, the leakage energy consumption is predicted to increase. It is then clear that NML circuits, in particular considering the magnetoelastic clock, hold a considerable advantage in terms of power consumption, which is one order of magnitude lower.
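The comparison can be summarized by normalizing each NAND-gate energy to the magnetoelastic clock. The sketch below uses only the figures quoted in the text; adding the CMOS dynamic and leakage energies into a single 200 MHz figure is an assumption made here for illustration, and the dictionary labels are hypothetical.

```python
# NAND-gate energies quoted in the text, in joules.
ENERGY_J = {
    "magnetic field clock": 62e-15,
    "STT clock": 11e-15,
    "multiferroic logic": 4e-18,
    "magnetoelastic clock": 97e-18,
    # Assumption: dynamic (117 aJ) and leakage (740 aJ) energies are summed.
    "28-nm CMOS at 200 MHz": 117e-18 + 740e-18,
}
REFERENCE = "magnetoelastic clock"


def ratios_to_reference(energies: dict, reference: str) -> dict:
    """Energy of each solution divided by the energy of the reference one."""
    ref = energies[reference]
    return {name: value / ref for name, value in energies.items()}


if __name__ == "__main__":
    for name, ratio in ratios_to_reference(ENERGY_J, REFERENCE).items():
        print(f"{name:24s} {ratio:10.3f} x")
```

Under these assumptions the STT clock requires roughly 113 times and the magnetic field clock roughly 640 times the energy of the magnetoelastic clock, the CMOS gate roughly 9 times, and ideal multiferroic logic about 0.04 times.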
VI. CONCLUSION
We have proposed a magnetoelastic clock system for NML, which uses a PZT layer to strain the magnets and change their magnetization. We demonstrated that its power consumption is about 100 times smaller than that of current-based clock systems and that the obtainable clock frequency is also higher. This clock solution also has a lower power consumption than scaled CMOS transistors. Most importantly, this solution was designed keeping in mind the technological processes and their current limitations. Our solution does not reach the performance of a full multiferroic system in theoretical conditions, but it shows remarkable improvements while remaining feasible with current technological limitations, which marks a difference with respect to previously proposed solutions. Moreover, the structure shows a good tolerance to process variations, which is always an important point for the fabrication of a real working circuit.
This solution soundly enhances the knowledge on NML by addressing its main issues: the high power consumption of the clock generation system and the necessity to have local control on a limited circuit area. At the same time, it allows us to make a huge step toward the fabrication of complex magnetic circuits. Our efforts are now directed to the experimental demonstration of the results shown here.
