ABSTRACT All-spin logic (ASL) is a spin-based candidate for implementing logic in next-generation designs. The energy and the delay of ASL circuits are both inherently related to the geometric parameters of ASL gates, and careful selection of the gate dimensions is required to achieve optimal performance. In this paper, the tradeoff between energy and delay is explored to size the magnets and channels in an ASL gate so that they provide the best balance under various delay and energy demands. Results on optimizing interconnects and benchmark circuits are presented.
I. INTRODUCTION

SPIN-BASED computing is a post-CMOS technology candidate that has recently seen an increased research focus. For spin-based logic, nonlocal spin transfer devices are very promising, particularly all-spin logic (ASL) [1]-[7].
In this paper, we study methods for improving the performance of ASL circuits through the careful selection of the dimensions of circuit elements, resulting in energy-delay tradeoffs. To motivate the problem, consider an ASL structure with three magnets connected by two separated channels in Fig. 1(a). We fix the dimensions of the input/output magnets and channels and examine the energy and delay impacts of changing the length, l m,2 , of the middle magnet, temporarily assuming that this value can be varied continuously. Increasing l m,2 increases energy at both the input and output sides. However, the delay impact is nonmonotonic: the time required to switch the middle magnet increases, because a larger magnet requires more spin torque, but the switching time of the output magnet decreases, since a larger middle magnet can deliver more spin torque. Thus, there is an overall energy/delay tradeoff relation, as shown in Fig. 1(b). Furthermore, the choice of channel length also affects the switching speed in such nonlocal spin valve structures [8]-[10], implying that buffer insertion in a long interconnect can help in reducing wire delays.
The major contribution of this paper is in developing and assembling a modeling and optimization framework for performance optimization of general ASL circuits through magnet sizing and buffer insertion, and the demonstration of the energy/delay tradeoff relation during the optimization. We introduce the energy and delay models for ASL circuits (Section II) and show the impact of geometric parameters on performance (Section III). An optimization problem formulation is proposed (Section IV) to obtain energy-delay tradeoffs. We show the results on a long interconnect line and large benchmarks under multiple technologies (Section V) and conclude in Section VI.
II. ASL PERFORMANCE MODELING
A. STRUCTURE OF A BASIC ASL GATE
A basic ASL gate [1] consists of three major components, as shown in Fig. 2: an input magnet at the left that polarizes the charge current and injects spin current into the channel, a channel that transfers the spin current from an input magnet to an output magnet, and an output magnet that sets its state based on the incoming spin torque. A metal contact, connected to the supply voltage, lies above each magnet, and a ground connection is placed beneath the input end of the channel. To allow a magnet to serve both as an output to its previous magnet and an input to its following magnet, an isolation feature is placed under it. This feature separates the part of the channel beneath the magnet into two segments, an input side and an output side, thus ensuring that the input and output spin currents interact minimally. Since this is a drawn feature, its size is constrained by lithography and corresponds to the minimum feature size. For the ASL inverter in Fig. 2, at the input side, a charge current (solid arrow) flows from V dd to ground. The polarizing action of the input magnet results in a spin accumulation, opposite to the magnet spin, at the input end, and this diffuses toward the output (dotted arrow), creating a spin torque at the output end that sets the output magnet state. A buffer is similar in structure, except that the roles of V dd and ground are interchanged: this ensures that the input magnet introduces a spin current of the same polarity into the channel.
B. ANALYTICAL MODEL FOR SWITCHING DELAY IN ASL CIRCUITS
For the gate in Fig. 2 , annotated with its geometrical parameters, we consider each contributor to switching: spin current generation at the input, nonlocal spin transport through the channel, and spin-torque-based switching at the output.
1) CHARGE CURRENT AT THE INPUT MAGNET
The injected charge current is converted into spin current at the input end of the channel. For the structure in Fig. 2 , the positioning of the ground terminal on the input side, along with the presence of the isolation feature, introduces an asymmetry that causes charge current, I c , to be injected to the input side, given by
where R s , R m , R n , and R g indicate the resistance of the contact to supply voltage, magnet, channel, and ground connection, respectively, on the input side. The parasitics of both the supply and ground connections are included in R s and R g , ensuring that the ohmic losses associated with power and ground distributions are incorporated in our models. The other two quantities, R m and R n , can be calculated as
where A F,1 = w m,1 l m,1 is the interface area between the magnet and the contact, with width w m,1 and length l m,1 . The factor of 2 indicates that only half of the magnet is effectively available for injecting charge/spin current; the other half receives spin current from the gate that drives this magnet. The area A N = w n · l n between the magnet and the channel is used for calculating the channel resistance. The parameters ρ F and ρ N are the resistivity of the magnet and the channel, and t m and t n are the thicknesses of the magnet and the channel, respectively.
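The resistances described above can be combined in a short numerical sketch. It assumes, as the list of resistances suggests but this text does not state explicitly, that R s , R m , R n , and R g act in series between V dd and ground, that R m = ρ F t m /(A F,1 /2), and that R n = ρ N t n /A N . The function name `charge_current` and its argument names are hypothetical.

```python
def charge_current(v_dd, r_s, r_g, rho_f, t_m, w_m, l_m, rho_n, t_n, w_n, l_n):
    """Injected charge current under an ASSUMED series-resistance model.

    All resistances are in ohms, lengths in meters, resistivities in ohm*m.
    The series form and the expressions for R_m and R_n are assumptions
    inferred from the surrounding description, not quoted equations.
    """
    a_f = w_m * l_m                    # magnet/contact interface area A_F
    a_n = w_n * l_n                    # magnet/channel interface area A_N
    r_m = rho_f * t_m / (a_f / 2.0)    # factor of 2: only half the magnet injects
    r_n = rho_n * t_n / a_n            # channel resistance through thickness t_n
    return v_dd / (r_s + r_m + r_n + r_g)
```

Under this assumed model, lengthening the input magnet lowers R m and therefore raises I c , which matches the qualitative trend discussed in Section III.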
2) SPIN TRANSFER THROUGH THE CHANNEL
The charge current at the input magnet is transformed into a spin current at the source end, which drifts down toward an output magnet through a lossy interconnect medium. We capture these factors and arrive at an expression for the input-output delay of an ASL gate. For a single-fan-out structure, i.e., a channel without branches, such as an ASL inverter or buffer, the spin current can be calculated by an analytical expression for the spin injection efficiency, while in more complicated structures with multiple fan-outs, the spin current at each output can be evaluated using numerical computations [6] . The spin injection efficiency, η, is the ratio of the spin current, I s , at the end of the channel to the injected charge current, I c . In a single-fan-out structure, η is given by [2] , [11] 
where L is the length along the channel from the point of injection of spin current at the input magnet to the channel region below the output magnet, and λ N is the spin diffusion length of the channel. The terms x 1 and x 2 are defined as
VOLUME 2, 2016
where P 1 and P 2 are the polarization factors for the input and output magnets, respectively, R 1 and R 2 are the spin accumulation resistances for the input and output magnets, respectively, and R N is the spin accumulation resistance of the channel. These terms are given by
with λ F and λ N standing for the spin diffusion lengths, and ρ F and ρ N being the resistivities, with subscripts F and N for the ferromagnet and the channel, respectively.
C. SWITCHING THE OUTPUT MAGNET
The Landau-Lifshitz-Gilbert (LLG) [12] equation describes the magnet switching dynamics due to a spin current
Vector m indicates the normalized magnetization and changes from 1 to −1 or the opposite during switching over a time variable t, γ is the gyromagnetic ratio, α is the Gilbert damping coefficient, q is the electron charge, and N s is the net number of Bohr magnetons of the magnet to be switched. A complete analysis of the LLG equation is computationally intensive, especially within the inner loop of an optimizer. However, the equation can be used to infer information about the switching time t sw under a spin torque switching current in a computationally inexpensive way. From (3), writing the spin current at the end of the channel as I s = ηI c , the switching time of the gate is given by [2]
$$t_{sw} = \frac{2 f_{sw}\, q N_s}{\eta I_c} \tag{8}$$
where I c is given by (1). The factor f sw captures the fact that the spin current is partly responsible for switching, and the switching event also includes the contributions from other related fields. In [2], f sw was considered over a single magnet size, but our optimizer requires f sw over a range of magnet sizes. In Section III-C, we will show that f sw is well approximated as a constant over a wide range of magnet sizes.
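As a minimal sketch of how (8) might be evaluated inside an optimizer loop, the function below computes t sw from f sw , N s , η, and I c . The function name `switching_time` is hypothetical, and any numerical inputs a caller supplies are illustrative rather than values from this paper.

```python
Q_E = 1.602176634e-19  # electron charge in coulombs


def switching_time(f_sw, n_s, eta, i_c):
    """Switching time from (8): t_sw = 2 * f_sw * q * N_s / (eta * I_c).

    f_sw : dimensionless fitting factor
    n_s  : net number of Bohr magnetons in the output magnet
    eta  : spin injection efficiency (I_s / I_c)
    i_c  : injected charge current in amperes
    """
    if eta <= 0 or i_c <= 0:
        raise ValueError("eta and i_c must be positive")
    return 2.0 * f_sw * Q_E * n_s / (eta * i_c)
```

Because t sw is inversely proportional to ηI c and proportional to N s , doubling the charge current halves the switching time, while doubling the output magnet volume doubles it.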
1) DELAY IN MULTIFAN-IN/MULTIFAN-OUT STRUCTURES
General ASL gates are based on majority logic and involve more complex structures than that in Fig. 2. For example, Fig. 3(a) shows an ASL NAND gate with two fan-outs, and the channel has multifan-in and multifan-out substructures. For such structures, there is no known simple analytical form for the spin current, analogous to (3), at the output magnet(s). However, the spin current at each output magnet can be calculated numerically by dividing the channel into wire segments [6]. Specifically, each component in the circuit (the input and output magnets as well as the channel segments) can be described as a π-network of conductance matrices. By considering each logic stage separately, we divide this into two substructures, and based on the π structures for each stage, shown in Fig. 3(b), we form a modified nodal analysis (MNA) matrix for the system and solve the resulting set of equations to obtain the charge and spin currents at any node. The currents injected into the output magnets are then used to compute the spin injection efficiency, replacing the closed form in (3), and the remainder of the process of computing t sw is identical to the single-fan-out case. A complete description of this interconnect model, along with a comparative evaluation against the analytical model, is provided in the Supplementary Material.
2) FROM GATE DELAYS TO CIRCUIT DELAYS
Computing circuit delays from gate delays is a relatively straightforward process. As in the static timing analysis for CMOS circuits, once the delays of each logic stage (i.e., a gate and its fan-out interconnect) are computed using the techniques described earlier in this section, a topological traversal from the primary inputs to the primary outputs can be used to find the delay of the circuit.
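The topological traversal can be sketched as a longest-path computation over a directed acyclic graph of logic stages. This is a generic illustration, not the implementation used in this work: the data layout (`stage_delay`, `fanout`) and the function name `circuit_delay` are hypothetical.

```python
from collections import deque


def circuit_delay(stage_delay, fanout):
    """Longest arrival time over a DAG of logic stages.

    stage_delay : dict mapping each node to the delay of its logic stage
    fanout      : dict mapping a node to the list of nodes it drives
    """
    indegree = {node: 0 for node in stage_delay}
    for succs in fanout.values():
        for succ in succs:
            indegree[succ] += 1

    # Primary inputs start with their own stage delay as arrival time.
    arrival = dict(stage_delay)
    ready = deque(node for node, deg in indegree.items() if deg == 0)
    visited = 0
    while ready:
        node = ready.popleft()
        visited += 1
        for succ in fanout.get(node, []):
            # Latest-arriving input determines when the successor switches.
            arrival[succ] = max(arrival[succ], arrival[node] + stage_delay[succ])
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)

    if visited != len(stage_delay):
        raise ValueError("netlist contains a cycle")
    return max(arrival.values())
```

Each node is visited once after all of its fan-ins, so the cost is linear in the number of stages and connections.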
D. MODELING ASL SWITCHING ENERGY
For any single-fan-out or multifan-out structure, the energy that is supplied comes from the V dd source. Over a switching period, T, the total energy E for the gate is given by [2] as E = V dd I c T. Note that the energy dissipation can be attributed to the charge current, and the spin diffusion current and the spin torque at the output are a consequence of the charge current. Therefore, for a logic circuit consisting of an interconnection of gates, the energy can be computed as
where I c,i is the charge current injected into the magnet i.
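Assuming every magnet draws its charge current from the same supply V dd over the same period T, the single-gate expression E = V dd I c T extends to a sum over magnets. One consistent way to write this, offered as a sketch under that assumption rather than as the exact published form, is:

```latex
E = \sum_{i} V_{dd}\, I_{c,i}\, T
```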
III. IMPACT OF ASL GEOMETRIES
From the analysis of the energy and delay models in Section II, it can be seen that the dimensions of the magnets enter into several expressions. We now analyze the impact of geometry choices on circuit performance, specifically focusing on optimizable layout parameters: the magnet length and the channel length. We assume that the technology-specific parameters, such as the magnet thickness or channel thickness, are fixed. We consider each component of switching one by one. For illustration, we will primarily consider the ASL inverter in Fig. 2 : the quantities associated with the input and output magnets are represented with subscripts 1 and 2, respectively.
A. INFLUENCE ON CHARGE CURRENT INJECTION
The dependence of the injected charge current, I c,1 , on the geometry can be shown by combining (1) and (2)
where r 1 and r 2 are constants that absorb terms other than the optimizable layout parameters listed above. The value of I c,1 is directly related to the system energy, as indicated by (9), and as we will see soon, also the delay.
B. INFLUENCE ON NONLOCAL SPIN TRANSFER
The charge current creates spin current that is transported across the channel to the output magnet. For the single-fanout ASL inverter, an analytical expression for the spin current at the output magnet can be derived based on spin injection efficiency η and charge current at the input magnet I c , as I s = ηI c . From (3), (4), and (10), the dependence of I s on the magnet lengths and the channel lengths is given by
where k 1 , k 2 , and k 2 are constants that absorb all fixed geometry parameters, which depend on the technology-specific parameters, as well as material and physical constants. This expression can be analyzed to understand how the spin current changes with the magnet and channel geometries in the ASL inverter. We focus on the optimizable layout parameters: the lengths of the input and output magnets, l m,1 and l m,2 , and the length of the channel, L. It can be seen that the following holds.
1) Increases in l m,1 and l m,2 will result in a larger spin current at the output magnets. The increase with l m,1 occurs, because a larger input magnet has a smaller resistance and injects more charge current, resulting in a larger spin current at the output magnet. A larger output magnet as the result of longer l m,2 absorbs more spin current from the channel, improving η.
2) A longer channel length, L, results in weakened spin current at the output magnet, i.e., spin diffusion becomes more inefficient with the increasing channel length.
For the multifan-in/multifan-out case, these closed-form expressions cannot be used, but the impact of changing these parameters broadly follows the same trend as described above.
C. INFLUENCE ON THE SWITCHING OF THE OUTPUT MAGNET
The spin current at the end of the channel switches the output magnet, as governed by the LLG equation, with an input-to-output switching delay as expressed in (8), based on an integration of the LLG equation over time. We assume the magnet to be a single domain, since macrospin simulation is a good approximation to reflect the switching time trends, as influenced by various factors [13].
This integration involves two geometry-dependent terms. The first is that the net number of Bohr magnetons, N s , of the output magnet is proportional to its volume through N s = M s V /µ B , with µ B as the unit Bohr magneton. This factor appears and affects t sw through (8) . The second is the demagnetizing field H d , an internal field related to the saturated magnetization, M s , and demagnetizing factor, N d , through the relation
The demagnetizing factor N d of a magnet is a function of its dimensions and shape. We follow the equation in [14] to calculate the demagnetizing field along all three axes for a rectangular prism in our LLG simulation. The effective anisotropy constant is calculated as $K = (N_{xx} - N_{yy}) M_s^2 / 2$, with $N_{xx}$ and $N_{yy}$ being the demagnetizing factors along the minor and major axes, respectively. Based on our geometric and physical parameter settings, we find that the minimum thermal stability for the magnet sizes we consider is $29.5\,k_B T$, corresponding to a retention time of $6.7 \times 10^3$ s, which is adequate for the circuit switching frequencies considered in this paper. The impact of H d is incorporated in factor f sw in (8).
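The link between thermal stability and retention time can be checked with the standard Néel-Arrhenius estimate, τ = τ 0 exp(Δ), where Δ is the thermal stability in units of k B T. The attempt time τ 0 = 1 ns used below is a common convention and an assumption here, since this text does not state the value used; the function name `retention_time` is hypothetical.

```python
import math


def retention_time(delta, attempt_time=1e-9):
    """Neel-Arrhenius retention estimate: tau = tau0 * exp(delta).

    delta        : thermal stability factor in units of k_B * T
    attempt_time : tau0 in seconds (ASSUMED to be 1 ns; not given in the text)
    """
    return attempt_time * math.exp(delta)


if __name__ == "__main__":
    print(f"{retention_time(29.5):.2e} s")
```

With Δ = 29.5 and the assumed 1-ns attempt time, the estimate is roughly 6.5 × 10³ s, in line with the quoted retention time; the small difference is consistent with rounding of Δ.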
In order to precharacterize the factor f sw and determine how it varies with ASL geometries, we design a series of simulations to examine the influence of magnet geometries on the relation between switching time t sw and spin current I s . We choose a discrete set of magnet lengths in the range from 30 to 100 nm. The parameters we used in the simulations are the same as those given later in Table 1 in Section V-A, with the damping factor α = 0.007 [2].
As shown in Fig. 4, the switching time t sw under a series of spin currents I s for various magnet lengths is obtained through the LLG simulations and denoted by square markers. A data fitting procedure was then performed based on (8), and the best fit, shown by the continuous curves in Fig. 4, is seen to match the data points well at each magnet size. For the specific parameters used in this experiment, we obtained f sw = 4.7, and Fig. 4 shows that f sw does not change significantly with geometry, i.e., the geometric impact through H d is minimal. Therefore, from (8) and (11)
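One simple way to carry out such a fit is ordinary least squares, since (8) is linear in f sw once it is written as t sw = f sw · (2qN s /I s ). The sketch below is illustrative only: the fitting procedure actually used is not specified in this text, and the function name `fit_f_sw` is hypothetical.

```python
Q_E = 1.602176634e-19  # electron charge in coulombs


def fit_f_sw(spin_currents, switching_times, n_s):
    """Least-squares estimate of f_sw in t_sw = 2 * f_sw * q * N_s / I_s.

    spin_currents   : iterable of I_s values in amperes
    switching_times : iterable of simulated t_sw values in seconds
    n_s             : net number of Bohr magnetons of the switched magnet
    """
    # Regressor x_k = 2 q N_s / I_s,k, so that t_k is approximately f_sw * x_k.
    xs = [2.0 * Q_E * n_s / i_s for i_s in spin_currents]
    numerator = sum(x * t for x, t in zip(xs, switching_times))
    denominator = sum(x * x for x in xs)
    return numerator / denominator
```

Repeating the fit for each magnet length and comparing the resulting values is one way to test whether f sw can be treated as a constant.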
where k′ 1 modifies k 1 to capture the constants in 2f sw qN s .
IV. OPTIMIZATION
The net conclusion of our analysis in (12) is that the switching time t sw of an ASL inverter stage reduces sublinearly with l m,1 , increases linearly with l m,2 , and increases with L through an exponential dependence.
Therefore, the switching delay can be improved by adjusting the sizes of the magnets and reducing the length of the channel. For a global interconnect of fixed length, the insertion of buffers/inverters can reduce the switching time by reducing the channel lengths between buffers, with overheads due to the intrinsic delays of individual buffers. We now develop optimization formulations for an ASL buffer chain and a general circuit.
A. OPTIMIZATION OF AN ASL BUFFER CHAIN 1) PROBLEM FORMULATION
We now present an optimization formulation that optimizes the energy and the delay of a long wire, driven by an ASL buffer and feeding an ASL load, through buffer insertion and sizing. We keep the width of each magnet constant, setting it to the width of the channel for better spin injection into the channel, and optimize the lengths of the magnets. The insertion of n buffers divides the wire of length L into n + 1 stages of length L i , 1 ≤ i ≤ n + 1. In the ith stage, we denote the input magnet length by l m,i and the output magnet length by l m,i+1 ; note that the output magnet for the ith channel also serves as the (i + 1)th input magnet.
Denoting the delay from the ith to the (i + 1)th buffer in the buffer chain as T i (l m,i , l m,i+1 , L i ), the total delay is
and the total energy over a clock period of P clc is
The optimization problem can be formulated as minimizing the energy over a delay constraint related to P clc , as
2) BUFFER OPTIMIZATION AS A POSYNOMIAL PROGRAMMING PROBLEM
In this section, we consider a simpler and more practical version of the optimization problem in (15), using equal channel lengths, and then optimizing the magnet lengths. We show that for the buffer chain, the total delay and the energy consumption of the ASL circuit are both posynomial functions, which implies that the optimization problem is a posynomial program [15] that can be solved to find the length of each magnet as well as the interconnect length in each stage. These problems can be efficiently solved with concrete guarantees of optimality since, unlike general nonlinear optimization problems, posynomial programs possess the property that any local minimum is a global minimum.
In Section V, we will use a posynomial program solver, gpposy, from the geometric programming optimizer GGPLAB [16], to optimize these ASL circuits. To the best of our knowledge, this is the first time that this problem has been formulated as a posynomial program. For a buffer chain with n magnets inserted between the input and output magnets, we denote the length of the ith magnet by l m,i and assume that the channel length between any two neighboring magnets is equal, i.e., L i = L/(n + 1), and the magnet width is constant and set to the minimum value. The total delay for the buffer chain can be obtained from (12) and (13) as
Assuming the buffer chain is run at its fastest speed, with P clc = T tot , then the total energy E tot for the buffer chain is derived from (10), (12) , and (14) as
In (16) and (17), if we take T tot and E tot as the functions of l m,i , the coefficients for all terms that include l m,i are always positive. Therefore, both functions are posynomial. It can be shown that even when the L i values are not uniform, these are posynomial functions in l m,i and L i .
For a more specific case where all magnets are assumed to have the same length, i.e., l m,i is the same for all i, it is possible to find a closed form minimum for the delay of the buffer chain. Using (16) , the delay for the optimal l m can be shown as
Note that in the above formulation, all the coefficients of l m are positive, and therefore, it is a polynomial of l m , leading to a closed-form solution of l m for a minimum delay.
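To illustrate why a single-variable function with positive coefficients admits a closed-form minimizer, consider a toy two-term posynomial a/l m + b·l m with a, b > 0. This is a stand-in chosen for illustration and is not the delay expression of this paper; the function names and coefficients are hypothetical. Its minimizer is l m = sqrt(a/b), and a local search in log(l m ), where posynomials are convex, reaches the same point.

```python
import math


def toy_delay(l_m, a, b):
    """Toy posynomial a / l_m + b * l_m with a, b > 0 (illustrative only)."""
    return a / l_m + b * l_m


def closed_form_minimizer(a, b):
    """Stationary point of the toy posynomial: d/dl (a/l + b*l) = 0."""
    return math.sqrt(a / b)


def local_search_minimizer(a, b, lo, hi, iters=200):
    """Ternary search over u = log(l_m); the toy delay is convex in u."""
    u_lo, u_hi = math.log(lo), math.log(hi)
    for _ in range(iters):
        u1 = u_lo + (u_hi - u_lo) / 3.0
        u2 = u_hi - (u_hi - u_lo) / 3.0
        if toy_delay(math.exp(u1), a, b) < toy_delay(math.exp(u2), a, b):
            u_hi = u2
        else:
            u_lo = u1
    return math.exp(0.5 * (u_lo + u_hi))
```

The agreement between the local search and the closed form reflects the property noted earlier: for a posynomial program, a local minimum is also the global minimum.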
B. FORMULATION FOR A GENERAL CIRCUIT
We now consider the sizing problem without buffer insertion for a user-specified clock period, P clc . The energy consumed by an ASL circuit over the clock period is the summation of contributions over all gates in the circuit
The optimization problem of geometries for an ASL circuit to give minimum energy under a given delay requirement is given by (20), where T tot is the delay of the critical path. In order to explore the maximum amount of delay reduction that can be achieved through the optimization, we propose an optimization algorithm for general circuits, and its pseudocode is shown in Algorithm 1. It solves the above formulation using a variant of the TILOS algorithm [17].

Algorithm 1 Geometric Optimization for ASL Circuit
5: for each magnet j on critical path do
6:     if l j × α < l upper-bound then
7:         Calculate the sensitivity ∂Delay j /∂Power j from sizing magnet j.
8:     end if
9: end for
10: Identify the magnet k with the most negative sensitivity.
11: l k ← l k × α.
12: Compute corresponding circuit delay T i and new critical path.
13: T min ← T i .
Line 1 calculates the initial delay of the circuit based on the netlist and ASL gate and interconnect delays (Section II-B) and identifies the critical path. Initial assignment for the minimum circuit delay is performed in lines 2 and 3. Next, lines 5-9 compute the sensitivity, ∂Delay j /∂Power j , for each magnet in the gates on the critical path, provided that its size will not exceed the upper bound on magnet size, l upper-bound , after being sized up. This sensitivity is obtained numerically by upsizing one magnet at a time by a geometric factor α and calculating the delay reduction and the power increase caused by changing this single magnet. In this way, the delay of the circuit is reduced with the smallest power penalty. Then, line 10 identifies the magnet with the largest impact on circuit delay, and line 11 sizes it up by a factor α to get the largest delay improvement for the smallest overhead. The circuit delay in iteration i is updated as T i (lines 12 and 13), and the process continues until the stopping criterion is met, i.e., when no more delay improvement can be made (lines 4, 14, and 15). This provides the tradeoff curve of interest.
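The sensitivity-driven loop described above can be sketched on a toy buffer chain. Everything specific in this example is an assumption made for illustration: the delay and power models (`chain_delay`, `chain_power`), their coefficients, and the simplification that the whole chain is the critical path. Only the loop structure (bound check, one-at-a-time upsizing by α, selection of the most negative sensitivity, stop when no delay improvement remains) follows the description.

```python
def chain_delay(lengths, drive=900.0, load=1.0):
    """Toy delay: a stage is faster with a longer driver, slower with a longer load."""
    return sum(drive / lengths[i] + load * lengths[i + 1]
               for i in range(len(lengths) - 1))


def chain_power(lengths):
    """Toy power proxy that grows with total magnet length."""
    return float(sum(lengths))


def greedy_size(lengths, alpha=1.1, upper_bound=100.0):
    """TILOS-like greedy sizing; returns final lengths and the tradeoff curve."""
    lengths = list(lengths)
    t_min = chain_delay(lengths)
    curve = [(t_min, chain_power(lengths))]
    while True:
        base_power = chain_power(lengths)
        best_j, best_sens = None, 0.0
        for j in range(len(lengths)):
            if lengths[j] * alpha < upper_bound:
                # Upsize only magnet j and measure delay and power changes.
                trial = lengths[:j] + [lengths[j] * alpha] + lengths[j + 1:]
                sens = (chain_delay(trial) - t_min) / (chain_power(trial) - base_power)
                if sens < best_sens:
                    best_j, best_sens = j, sens
        if best_j is None:
            break  # no candidate reduces the delay any further
        lengths[best_j] *= alpha
        t_min = chain_delay(lengths)
        curve.append((t_min, chain_power(lengths)))
    return lengths, curve
```

Each recorded pair in `curve` is one point on a delay-power tradeoff curve. In a real flow, the critical path would be recomputed after every sizing step, as in line 12 of Algorithm 1.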
V. RESULTS
A. SIMULATION PARAMETERS
We present some material and geometric parameters used in our simulations in Table 1 . These parameters, chosen in consultation with technologists, are intended to be representative and indicative of current and future technologies.
To realistically estimate the ohmic loss of the power delivery network in (1), we evaluated a standard set of power grid benchmarks [18] and determined that the effective resistance from each pin to the supply node is on the order of 0.25 Ω. Since these benchmarks evaluate the top few layers of a power grid (a typical number is five layers), we multiply this number by 2 to model the impact of lower metal layers. Therefore, we use an effective resistance of 0.5 Ω each for the supply and the ground line. This effective resistance corresponds to a wire of 140 nm × 140 nm × 1400 nm in width, thickness, and length, respectively, where the cross-sectional dimensions are based on [19]. We note that for an efficient ASL implementation, it is essential for the power grid resistance to be around this value, which is lower than the corresponding value for CMOS technologies. This is because R m + R n ≈ 7 Ω, and if R s + R g is much larger, then a large fraction of power will be wastefully dissipated in the power grid resistors.
For the parameters that most closely affect performance metrics, recognizing that the technology is rapidly evolving today, we explore a range of values in our experiments that reflect current-day and projected future technologies. In our experiments, the value of λ F is chosen in the range of 5-50 nm [4], [20], and the polarization factor P from 0.5 to 0.7 [11]. The channel spin diffusion length, λ N , can take values in a large range, since various materials could be considered [21]. Given this background and the strong material research in this area, we choose two possible values of λ N of 450 and 1000 nm, which could represent the spin diffusion lengths of bulk copper at room temperature and low temperature from various experimental measurements [21], [22]. However, as pointed out in [22] and [23], the spin diffusion length will degrade significantly in small geometries. Therefore, two more sets of simulations with λ N of 180 and 400 nm are added, corresponding to a degradation to 40% of the bulk values, estimated under the channel dimensions in this paper through the results shown in [22]. The supply voltage is chosen in the range of 10-30 mV [2]. It is unrealistic to show the results for all cross products of these choices, and we focus on two parameter sets with bulk and degraded spin diffusion lengths in Table 2: from parameter set 1, a nearer-term technology, to set 2 for projected technologies and with higher V dd . We calculate the switching time and energy, and perform static timing analysis and optimizations using MATLAB and C++ on a 2.53-GHz Intel Core i3 with 4-GB RAM.
B. OPTIMIZING A BUFFERED WIRE
We provide a simple example of an ASL buffer chain to illustrate the use of the posynomial formulation to individually optimize the size of each magnet. A total interconnect length of 1800 nm is considered with nine equally spaced buffers inserted between the input and output magnets. We consider the spin diffusion length of the magnet λ F = 14.5 nm [21], channel width w n = 20 nm, and thickness t n = 30 nm. The lengths of both the input magnet and the output magnet are set to 30 nm. The posynomial formulation is fed to the GGPLAB solver [16], which optimizes the length of each magnet to minimize the delay of the entire buffer chain. These optimized lengths (chosen to be multiples of the feature size, 10 nm) are shown in Table 3 for the case when nine buffers are inserted. Next, we repeat these posynomial programming optimizations for a set of buffer chains with a varying number of buffers under the above technology parameters based on optimization (15). For a specified number of equally spaced buffers (n), we provide the delay and corresponding energy under three cases in Fig. 5.
1) Optimized Delay: The length of each magnet is sized individually for the optimal delay.
2) Closed-Form Delay: All the inserted magnets are assumed to have the same length, i.e., l m,i = l m,i+1 = l m , except for the first and last magnets in the buffer chain, whose lengths are fixed. In this case, the delay is very similar to the situation described in (18), and a closed-form solution of l m can still be found for the minimum delay.
3) Unoptimized Delay: The lengths of all inserted magnets are of the minimum length of 30 nm, i.e., no optimization is performed.
The nature of these curves is similar for each value of the spin diffusion length in Table 2, and we choose a representative value of 400 nm to illustrate the results. The zero buffer case is not considered, as it cannot supply the critical spin current, I cr s , required for switching the output [2]. As shown in Fig. 5(a), the minimum delay occurs when four magnets are inserted, corresponding to a delay of 37.6 ns for the case where each inserted magnet is sized individually. As a comparison, the delay with the same number of unsized inserted magnets is 44.9 ns, implying that the optimization provides a 16.3% improvement with only a small energy overhead, as shown in Fig. 5(b). It is also observed that when all the magnets are identically sized, the delay curve virtually coincides with that for the individually sized case. Therefore, the closed form is a fast predictor for the optimal delay.
It is noteworthy that these optimizations employ the analytical method described in Section II-B. An alternative to analytical modeling is the MNA-based modeling method described in Section II-C1. Although the results obtained by these two modeling methods are close to each other only under certain specific conditions, the analytical modeling method shows a good fidelity in finding a minimum delay and is, therefore, very useful for delay optimization. Further details about the comparison of the two modeling methods and the notion of fidelity are provided in the Supplementary Material.
C. OPTIMIZATION OF BENCHMARK CIRCUITS
In order to demonstrate the feasibility and the benefits of our optimization methods on general ASL circuits, we tested Algorithm 1 on ISCAS85 benchmark circuits. The benchmarks are placed using the CAPO placement tool [24], using an estimation of the ASL cell area, with magnet lengths changed at a granularity of 10-nm steps.
A buffer insertion step is performed for any interconnect longer than L 0 to strengthen the signal before the circuit is optimized through Algorithm 1. In the optimization for ISCAS benchmarks, we choose L 0 to be equal to λ N , because in the π model for the channel mentioned in Section II-C1, the spin signal loss will be close to saturation when the ratio L/λ N exceeds 1. The row utilization leaves sufficient space for inserting buffers and sizing these cells.
Two sets of parameters, representing two technologies, as shown in Table 2, are used. The delay before optimization, after optimization, improvement in percentage, and the runtime for each benchmark under the two technology parameter sets with bulk and degraded spin diffusion lengths are shown in Table 4. Although the degradation of spin diffusion length will inevitably induce higher delay, the optimization through sizing still improves the delay for all circuit benchmarks, indicating the effectiveness and robustness of our algorithm across various technologies. Various techniques have been applied to enhance the efficiency of Algorithm 1, including the use of a precharacterized lookup table for the intrinsic delay of ASL gates and incremental timing analysis after a change in the TILOS-like optimization algorithm. It can be seen from Table 4 that the more advanced technologies have shorter delays and larger delay improvements with a reasonable runtime on the ISCAS85 benchmark circuits.
Detailed results are presented for the C6288 benchmark under the two technology parameters with bulk and degraded spin diffusion lengths of Table 2 to demonstrate the effectiveness of our optimization algorithm. The delay-power tradeoff curve under parameter set 1 is shown in Fig. 6(a). The optimization begins at the highest delay, at the right of the curve. As the delay reduces through the optimization, the power increases as a penalty. The delay reduction and the energy for the C6288 benchmark are shown in Fig. 6(b), and clearly through each iteration, the delay of the circuit keeps decreasing. The energy, however, behaves differently. At the beginning of the optimization, it decreases together with delay, since sizing the gates helps overcome gross inefficiencies in the interconnect bottleneck. The reduction of delay dominates the power increase in the power-delay product at this moment. As the benefit of delay reduction becomes smaller as the optimization proceeds, the increase in power finally dominates and the energy starts to increase. Similar trends are seen under three other sets of results. The trend of power-delay curves indicates that at the beginning of the optimization, power is relatively insensitive to upsizing of the magnets, yet as the magnets on the critical path become larger and are still sized up for smaller delay, power becomes more sensitive to sizing.
VI. CONCLUSION
This paper has explored the energy/delay tradeoff relation and presented a systematic approach to optimizing ASL circuits. We have presented a posynomial programming approach for buffered lines and a numerical optimization scheme for general circuits. Under realistic parameters that include factors, such as degradation in the spin diffusion length due to scaling, our results demonstrate the utility of sizing ASL circuits to reduce delay by about 30%. This framework can enable technology-circuit codesign by allowing the evaluation of technology parameters on circuit performance.
