This paper presents several optimization problems occurring in VLSI interconnect, Networks on Chip (NoC) design and 3D VLSI integration, all possessing closed-form solutions obtained by well-solvable Quadratic Assignment Problems (QAP). The first type of problem deals with the optimal ordering of signals in a bus bundle such that the switching power, delay and noise interference are minimized. We extend a known solution for ordering the signals in a bus bundle, which minimizes the impact of the first-order wire-to-wire parasitic capacitance occurring between adjacent wires, into a model that also accounts for secondary components of the wire-to-wire parasitic capacitance. The second type of problem arises in the mapping of computation tasks onto an array of processors sharing a common bus, such as those found in NoC. We show a QAP closed-form solution to the optimal mapping problem which simultaneously minimizes the switching power and the average delay of the bus. The third problem deals with the optimization of 3D VLSI, which vertically stacks ordinary ICs. Some of the above problems involve a k-salesmen Traveling Salesman Problem (TSP), where costs are evaluated for elements located k-distance apart along the tour. We show a simple proof that these are well-solvable problems and obtain their solution. This is then generalized to well-solvable QAPs obtained by superposition of such TSPs. A simple proof shows that if k-distance TSPs are well-solvable, so is the QAP obtained by their sum, where the solution of the 1-distance TSP dominates all the others.
Introduction
In the following, we present three types of problems occurring in VLSI circuit and system design, and their relation to the Traveling Salesman Problem (TSP) and the Quadratic Assignment Problem (QAP). TSP and QAP are well-known intractable problems, but there are special instances where TSP [9, 4, 6] and QAP [21, 3, 7, 17] are well-solvable. In the following, we show that those VLSI problems are mapped into well-solvable TSPs and QAPs.
Minimizing delay and dynamic power in VLSI interconnect bus bundle
Fig. 1a illustrates a commonly used n-wire bus bundle. There, logic gates called drivers drive signals propagating along interconnecting wires. These signals stimulate other logic circuits, called receivers, connected at the opposite end of the wires. The bus is shielded by wires connected to ground. Cross-coupling parasitic capacitance (cross-cap for short), which is a predominant cause of signal propagation delay, dynamic (switching) power consumption and crosstalk noise interference, occurs between any two wires of the bus [1]. The primary component of the cross-cap occurs between adjacent wires, where we say that the wires are at 1-distance of each other. Secondary cross-cap components exist in the bus, as shown in Fig. 1b. Most interconnect optimization algorithms account for only the 1-distance cross-cap, which is claimed to reach 90% of the total cross-cap. A secondary component of about 6% is due to wires at 2-distance, and the rest is due to higher distances. The impact of the secondary cross-cap on power dissipation and on the accuracy of power estimation was studied in [15]. The question of how to order the wires in the bus to yield the best performance (dynamic power consumption, delay and noise immunity) has been studied for the 1-distance cross-cap, and it was shown to be a well-solvable TSP [20]. It is shown in the sequel that accounting for secondary cross-cap components results in a well-solvable QAP, which generalizes the former result.
A well-known VLSI optimization problem is the setting of wire widths and spaces in a bus bundle whose total width (the distance between the shields in Fig. 1a) is constrained [19]. The question of how to order the wires in the bus such that the above optimization will yield the best bus performance was discussed in several works [13, 12, 18, 10, 8, 22], where each paper focused on a single objective, and all considered only the 1-distance cross-cap. We show subsequently that the consideration of the secondary cross-cap does not change the optimal order of wires in the bus. Accounting for a k-distance component alone, $1 \le k \le n - 1$, implies a sort of TSP with k salesmen. Considering all distances simultaneously yields a special QAP, which is a sum of all k-salesmen TSPs. We show in Section 3 that in the combined problem the 1-distance solution, corresponding to the ordinary TSP (one salesman), dominates all the others.
Bus modeling associates with every wire a parameter r whose meaning depends on the optimization problem of interest. For delays, r is the resistance of the signal's driver. For dynamic power consumption and noise interference, r represents the signal's relative switching probability, called the activity factor in VLSI jargon [11]. Given an arbitrary order of the wires in the bus, let n real nonnegative parameters $r_1, \ldots, r_n$ be associated with the wires. It was shown in [13, 12, 18, 10, 8, 22] that, up to a multiplicative factor which is independent of the problem's setting, once the widths and the spaces are set to minimize delay, dynamic power or noise interference, the objective function satisfies the following expression:
(1.1)
In [18] the minimization of noise interference between signals was considered. In [13] average delay of a signal was the minimization objective and in [12, 10, 8, 22] 
and more general functions, where the square root is just a special case. In (1.2), successive parameters $r_i$ and $r_{i+1}$ correspond to wires residing at 1-distance of each other, and $n + 1 \triangleq 1$. Assume w.l.o.g. that $r_1 < r_2 < \cdots < r_n$. Let $\Pi$ denote the set of all permutations $\pi : \{1, \ldots, n\} \to \{1, \ldots, n\}$, and let the sequence $\langle i_1, \ldots, i_n \rangle$, also called a tour, be obtained by $\pi(j) = i_j$. The works in [13, 12, 18, 10, 8, 22] showed that the permutation yielding the sequence $\langle 1, 3, \ldots, n, \ldots, 4, 2 \rangle$ minimizes the expression:
This permutation was studied thoroughly in combinatorial optimization and is called the Symmetric Pyramidal Tour Permutation (SPTP) [9]. It was shown in [13] that a 10%-15% delay reduction can be achieved by the SPTP order of wires compared to the order used for a real microprocessor designed in a 65 nm process technology. The same amount of power reduction was reported in [12] for the same microprocessor design. Similar numbers were reported in [10, 8, 22]. SPTP is a well-known solution of a special case of TSP [4, 6] whose cost matrix satisfies the so-called "four-point" conditions [6]. The work in [20] showed that all the above-mentioned VLSI problems satisfy the Supnick conditions for TSP [16], for which SPTP is indeed optimal. It also showed that SPTP is optimal for a more general form of (1.3), where $f(x, y)$ is a symmetric real function defined for $x \ge 0$ and $y \ge 0$, twice differentiable, and satisfying $\partial^2 f(x, y) / \partial x \partial y < 0$.
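As a numerical sanity check of the SPTP claim, the following Python sketch compares the SPTP order with an exhaustive search on a small instance. The function $f(x, y) = \sqrt{x + y}$, the sample values in `r`, and the helper names `sptp`, `tour_cost` and `brute_force_min` are illustrative assumptions and are not taken from the cited works; f is chosen only because it is symmetric and has a negative mixed second derivative for positive arguments.

```python
import itertools
import math


def sptp(n):
    """Return the SPTP order <1, 3, 5, ..., 6, 4, 2> as 0-based indices."""
    return list(range(0, n, 2)) + list(range(1, n, 2))[::-1]


def tour_cost(r, order, f):
    """Cyclic 1-distance cost: sum of f over adjacent elements of the tour."""
    n = len(order)
    return sum(f(r[order[i]], r[order[(i + 1) % n]]) for i in range(n))


def brute_force_min(r, f):
    """Minimum tour cost by exhaustive search. The first element is fixed,
    since cyclic shifts of a tour do not change its cost."""
    rest = range(1, len(r))
    return min(tour_cost(r, (0,) + p, f) for p in itertools.permutations(rest))


def f(x, y):
    # Assumed example: symmetric, with d2f/dxdy = -(x + y) ** -1.5 / 4 < 0.
    return math.sqrt(x + y)


# Hypothetical wire parameters, already sorted in increasing order.
r = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70, 0.90]

sptp_cost = tour_cost(r, sptp(len(r)), f)
best_cost = brute_force_min(r, f)
print(sptp(len(r)), sptp_cost, best_cost)
```

For seven wires the search visits only 720 tours, so the comparison is immediate; the exhaustive search is meant as a check on small instances, not as a solution method.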
The above works explored the optimal order of wires in the bus in the presence of only the primary cross-cap. An interesting question is whether accounting for secondary cross-cap components may change the SPTP optimal order. The following discussion takes the secondary cross-cap into account and maps the optimization problem into a special QAP, as a motivation for the study in Sections 2 and 3. The main result there is that the addition of the approximated secondary cross-cap preserves the SPTP optimal solution. One can expect slight changes in the optimal wire widths and spacing resulting from an optimization such as the one applied in [8, 19]. This is, however, a classical convex optimization which is beyond the scope of this paper.
Given a wire located at position j in the bundle, we account for its cross-cap to wires positioned at $j + k$, $1 \le k \le n - 1$, where wire indices are numbered cyclically. Let $\alpha_k$, satisfying $\sum_{k=1}^{n-1} \alpha_k = 1$, be the k-distance cross-cap coefficient, e.g., $\alpha_1 = 0.9$, $\alpha_2 = 0.06$, etc. The minimum power and delay when cross-cap components of distance higher than 1 are considered cannot be expressed as a linear sum of expressions such as those in (1.3). Still, simulations show that the subsequent weighted sum, where the coefficients are monotonically decreasing in wire distance, yields a fair approximation [15]. We are therefore interested in finding the permutation minimizing the expression
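The weighted sum described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the cost is taken to be $\sum_k \alpha_k \sum_i f(r_{\pi(i)}, r_{\pi(i+k)})$ with cyclic indices, $f(x, y) = \sqrt{x + y}$ is an assumed example function, the third coefficient 0.04 is invented so that the coefficients sum to 1 (the text gives only the first two), and the function names are hypothetical.

```python
import math


def k_distance_cost(r, order, f, k):
    """Sum of f over elements located k positions apart along the cyclic tour."""
    n = len(order)
    return sum(f(r[order[i]], r[order[(i + k) % n]]) for i in range(n))


def weighted_cost(r, order, f, alpha):
    """Weighted sum of k-distance costs; alpha[k - 1] weights distance k.
    Distances beyond len(alpha) are treated as having zero weight."""
    return sum(a * k_distance_cost(r, order, f, k)
               for k, a in enumerate(alpha, start=1))


def f(x, y):
    # Assumed example function (symmetric, negative mixed second derivative).
    return math.sqrt(x + y)


r = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70, 0.90]   # hypothetical, sorted
alpha = [0.90, 0.06, 0.04]                        # 0.04 is an assumption
sptp_order = [0, 2, 4, 6, 5, 3, 1]                # <1, 3, 5, 7, 6, 4, 2>
sorted_order = list(range(len(r)))                # <1, 2, ..., 7>

print("SPTP order  :", weighted_cost(r, sptp_order, f, alpha))
print("sorted order:", weighted_cost(r, sorted_order, f, alpha))
```

One property that is easy to confirm with this sketch: when all $n - 1$ coefficients are equal, every pair of wires is weighted identically, so the cost no longer depends on the order.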
The consideration of higher distances turns the problem of finding the optimal permutation into a special QAP case, for which SPTP closed-form solutions exist. A variety of engineering optimization problems yielding an SPTP-optimal QAP solution are presented in [21]. Papers [3, 7] discuss well-solvable QAPs and provide good references on this topic. Given an n × n real cost matrix $C = (c_{ij})$ and an n × n real distance matrix $D = (d_{ij})$, QAP aims at finding a permutation π minimizing the expression
and $d_{ij} = 0$ otherwise. Section 3 shows that when a QAP is obtained by a sum of k-distance well-solvable TSPs, the optimal permutation is the SPTP of the 1-distance TSP, which dominates all the others.
Minimizing the dynamic power and latency in a bus shared by an array of processors
Fig. 2a illustrates the architecture of an array of identical processors sharing a common segmented bus, through which each pair of processors can communicate. Such an array is used to execute several computation tasks in parallel, where tasks communicate with each other through the bus. The goal in allocating tasks to processors is to minimize both the power consumed on the bus and the average data transfer latency. Such problems have arisen recently in the area of Networks on Chip (NoC), and [2] provides many references to works done in this area. The results of those works were obtained by experiments and simulations, without an analysis of the problem's combinatorial properties, which is done below. We assume that at any time only two processors can communicate through the bus. Since transferred data is switching between 0 and 1, the dynamic power consumed is dictated by the active portion of the bus. The bus is therefore automatically configured to minimize the capacitive load and avoid switching of any unnecessary portion. This is achieved by switches connecting the segment between processors $P_i$ and $P_j$ when they are communicating, and disconnecting the segments extending from $P_1$ to $P_i$ and from $P_j$ to $P_n$. A switch S is shown in Fig. 2b, comprising three internal switches: $S_P$ connects the bus and processor P, while $S_L$ and $S_R$ establish leftward and rightward connections, respectively. The switches are controlled by signals to yield the desired configuration.
A computation task is associated with a probability $p_i$, corresponding to the fraction of the application's entire run time during which the task is active on the bus. The probability $p_i$ is known in advance, e.g. from simulations. Since only a single pair of processors can communicate at any time, it follows that
We assume that the bus capacitive load incurred between $P_i$ and $P_j$ is proportional to their distance, given by $\gamma |i - j|$, where γ is some factor independent of the allocation. This holds since capacitance is proportional to the area of the wires, which is proportional to $|i - j|$, the length of the wire connecting $P_i$ with $P_j$.
Let π denote the order of tasks allocated to processors. The total dynamic power consumed along the entire run time of an application is given by: 
and $d_{ij} = 0$ otherwise, expression (1.6) turns into (1.5). The task allocation problem is reminiscent of the average access time minimization problem [17], and it is an Anti-Monge-Toeplitz well-solvable QAP solved by SPTP [3].
Consider now the minimization of the average data transfer latency along the bus. The travel time of data between $P_i$ and $P_j$ is proportional to the RC delay of the bus segment connecting $P_i$ with $P_j$, given by $\delta (j - i)^2$, where δ is some factor independent of the allocation. Notice that minimizing the average latency is equivalent to minimizing its total, given by:
(1.7)
Eq. (1.7) can be represented as a QAP by defining the costs as in (1.6) and the distances by $d_{ij} = (j - i)^2$, $1 \le i < j \le n$, and $d_{ij} = 0$ otherwise, turning (1.7) into (1.5). This problem is also an Anti-Monge-Toeplitz well-solvable QAP solved by SPTP [3].
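The two allocation objectives above can be checked on a small instance with the following Python sketch. It rests on an assumption that is not confirmed by the surviving text: the cost of a pair of tasks is taken to be the product $p_i p_j$, so the objective is $\sum_{i<j} p_{\pi(i)} p_{\pi(j)} d(j - i)$ with $d(j - i) = |i - j|$ for power and $d(j - i) = (j - i)^2$ for latency (the factors γ and δ are set to 1). The probabilities and all function names are hypothetical.

```python
import itertools


def sptp(n):
    """SPTP order <1, 3, 5, ..., 6, 4, 2> as 0-based task indices."""
    return list(range(0, n, 2)) + list(range(1, n, 2))[::-1]


def qap_cost(p, order, dist):
    """Sum over processor pairs i < j of p[order[i]] * p[order[j]] * dist(j - i),
    where order[i] is the task allocated to processor i."""
    n = len(order)
    return sum(p[order[i]] * p[order[j]] * dist(j - i)
               for i in range(n) for j in range(i + 1, n))


def brute_force_min(p, dist):
    """Minimum cost over all allocations, by exhaustive search."""
    return min(qap_cost(p, perm, dist)
               for perm in itertools.permutations(range(len(p))))


def linear(d):
    return d          # capacitive load proportional to |i - j|


def square(d):
    return d * d      # RC delay proportional to (j - i)^2


# Hypothetical task probabilities, sorted in increasing order.
p = [0.05, 0.10, 0.15, 0.20, 0.22, 0.28]

for name, dist in (("power", linear), ("latency", square)):
    print(name, qap_cost(p, sptp(len(p)), dist), brute_force_min(p, dist))
```

Under this assumption the SPTP allocation places the most active tasks on the central processors, which is the arrangement the exhaustive search also returns for both distance functions.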
Notice that a ring bus may share the same results as those obtained for the linear bus in Fig. 2, provided that the ring connectivity topology is implemented in such a way that the distance between each pair of connected processors is approximately the same along the entire ring. In that case there is no difference between the linear and the ring bus with equal distances between processors. However, if the distances between processors are arbitrary, the problem for both buses becomes an ordinary intractable QAP.
Optimizing 3D VLSI physical integration
The steady progress of VLSI CMOS technology, which has already lasted for five decades and is known as Moore's law, doubling the transistor count on a silicon area every two years, will probably come to an end within a decade or so due to technology limitations. On the other hand, the ever-growing demand for computational power is driving the Integrated Circuit (IC) industry to explore and develop new integration technologies. 3D IC integration is a novel technology of growing importance, offering significant performance and functional benefits [14, 5]. As shown in Fig. 3, 3D ICs incorporate individual, vertically stacked ordinary planar ICs, where the interconnections between ICs are implemented by the so-called Through-Silicon Via (TSV). While today's 3D VLSI technology is capable of stacking up to eight ICs, it is expected that within a few years this number will grow to a few dozen.
3D integration extends most of the performance-affecting factors, such as wire length, the amount of data traffic, and a few more, from planar into 3D models, for which the order of vertical stacking is critical. Moreover, cumulative and bulk factors occurring in the individual planar ICs, such as the number of interconnects, the amount of switching occurring during IC operation, heat dissipation, peak current and the total capacitance of the IC, have a primary impact on the 3D IC stacking order [14]. Those factors affect not only the individual IC, but also the ICs above and below it. The interrelations between ICs and their impact on the performance, reliability and cost of the entire 3D product lead to expressions similar to those mentioned in the former examples, emphasizing the importance of TSP and QAP for those technologies and design methods.
Consider for instance the Through-Silicon Vias (TSVs), which are the critical resource for 3D integration due to their limited number. Their size (pitch) is far larger than that of the ordinary planar 2D vias used in the individual ICs, making their number very limited [14]. Since the number of inter-IC signals may reach tens of thousands, we would like to stack the ICs such that the total required number of TSVs is minimized. Considering n vertically stacked ICs, let $p_{ij}$ be the number of signals connecting IC i with IC j, $1 \le i, j \le n$. Since an interconnect between IC i and IC j must pass through all the ICs in between, the number of TSVs required to implement those inter-IC connections is $p_{ij} \times |i - j|$. The total required number of TSVs is therefore
(1.8)
Though (1.8) is an ordinary intractable QAP, $p_{ij}$ may in many cases have the special form of a sum or product matrix, resulting in a well-solvable QAP.
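The TSV count for a given stacking order can be illustrated with a short Python sketch. The signal matrix `p` below is invented for illustration, each unordered pair of ICs is counted once (an assumption about how the sum is indexed), and the names `tsv_count` and `best_stack` are hypothetical. The exhaustive search is practical only for small stacks, in line with the general problem being intractable.

```python
import itertools


def tsv_count(p, stack):
    """Total TSVs for a stacking order, where stack[level] is the IC placed at
    that level. A signal between two ICs needs one TSV per level it crosses."""
    level = {ic: lv for lv, ic in enumerate(stack)}
    n = len(stack)
    return sum(p[a][b] * abs(level[a] - level[b])
               for a in range(n) for b in range(a + 1, n))


def best_stack(p):
    """Stacking order with the fewest TSVs, found by exhaustive search."""
    return min(itertools.permutations(range(len(p))),
               key=lambda stack: tsv_count(p, stack))


# Hypothetical symmetric matrix: p[a][b] signals connect IC a with IC b.
p = [[0, 5, 1],
     [5, 0, 2],
     [1, 2, 0]]

print(tsv_count(p, (0, 1, 2)))   # 5*1 + 1*2 + 2*1 = 9
print(tsv_count(p, (0, 2, 1)))   # 5*2 + 1*1 + 2*1 = 13
print(best_stack(p))
```

In this example the heavily connected pair (IC 0 and IC 1) ends up on adjacent levels, which is what keeps the count low.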
In the above we described several VLSI problems, all represented as QAPs, where their costs imply types of Monge matrices and their distances imply types of Toeplitz matrices. All problems can therefore be represented as sums of weighted TSPs where costs are taken between elements at k-distance along the tour. It is therefore worthwhile to further explore the relation between QAP and sums of TSP problems. In the rest of the paper we return to the types of QAP as in (1.4) and prove that those can be treated as a superposition of well-solvable TSPs, a fact that enables a simple proof showing that the combined problem is also a well-solvable QAP. First, a k-distance, $1 \le k \le n - 1$, TSP special case is discussed and a closed-form optimal permutation is shown, obtained by an arbitrary even interleave of k SPTPs of independent problems with n/k elements each. We then derive a closed-form optimal permutation for the special QAP obtained by summing k-distance TSPs, $1 \le k \le n - 1$. Though every k-distance TSP yields a different optimal permutation, their sum is always optimally solved by the SPTP of the 1-distance problem (ordinary TSP). We conclude with a few directions for further research.
The minimization of k-distance special TSP
The following lemma was proved in [20] and is used in the subsequent discussion to derive solutions for the TSP and QAP special cases. 
In the following we will use the term k-distance SPTP to denote the optimal permutation when the cost is measured at k-distance along the tour. In this terminology the ordinary SPTP corresponds to 1-distance SPTP. In order to derive the permutation which minimizes the TSP cost taken at k-distance, we first consider the case of 2-distance.
Lemma 2. Let f satisfy the conditions of Lemma 1, let $0 \le r_1 < r_2 < \cdots < r_{n-1} < r_n$ be real numbers and let n be even. An optimal permutation $\pi^* \in \Pi$ minimizing
satisfies the following:
Fig. 4a. The set of elements ordered in increasing order $r_1 < r_2 < \cdots < r_{n-1} < r_n$ and its division into two subsets, each ordered in SPTP.
Fig. 4b. An optimal permutation is obtained by any even interleave of the two SPTPs obtained in Fig. 4a.
Proof. We first prove that the elements must be partitioned into two independent, equal-size subsets, each of which must be SPTP ordered, since any partition other than $(B_1, B_2)$ necessarily results in a higher TSP cost. Since n is even and the cost is calculated for elements at 2-distance of each other, it follows that any tour must divide the elements into two sets $A_1$ and $A_2$ satisfying $|A_1| = |A_2| = n/2$. Moreover, if $A_1$ is ordered by a permutation $\pi_1$ and $A_2$ by $\pi_2$, any combined permutation π must be the result of an even interleave of $\pi_1$ and $\pi_2$ (Fig. 4b)
$\pi_1$ by a, b, c and d, and in $\pi_2$ u, v, w and x, as shown in Fig. 5a. It holds that max{a, u} < min{b, v} and min{c, w} > max{d, x}. The order obtained after the swap (up to a cyclic shift, which does not change the cost) is shown in Fig. 5b. Recall that the cost between 1-distance elements is taken cyclically, hence the extreme elements are also at 1-distance of each other. Comparison of the costs of π and π′ yields:
Inequality (2.3) follows from expression (1), which is positive due to max{a, u} < min{b, v}, so the left-hand side of inequality (2.1) applies. Expression (2) is also positive since min{c, w} > max{d, x} and Lemma 1 applies again. Inequality (2.3) means that π is not optimal. This is the outcome of the assumption that $A_1 \ne B_1$, hence equality must hold and the lemma follows.
A few comments are in order. It follows from Lemma 2 that 2n equivalent permutations minimizing $F(\pi)$ defined in (2.2) can be constructed. Those are obtained by the n/2 relative shifts of the evenly interleaved elements of $B_1$ with respect to those of $B_2$, and by mirroring (reversing) $\pi_1^*$ and $\pi_2^*$. The partition into $B_1$ and $B_2$, and their implied permutations, are called sub-tours in some papers. In the case of odd n there is no separation into sub-tours and the optimal permutation is uniquely defined to yield an SPTP 2-distance cyclic traversal, thus yielding $\langle 1, n - 1, 3, n - 3, \ldots, n - 2, 2, n \rangle$. As can be seen, a 2-distance cyclic traversal of this permutation indeed yields SPTP.
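The odd-n case above can be reproduced with a small Python sketch: the SPTP sequence is laid out on positions 0, 2, 4, ... taken modulo n, so that reading the result at stride 2 gives back SPTP. The function $f(x, y) = \sqrt{x + y}$, the values in `r` and the helper names are assumptions made for illustration only.

```python
import itertools
import math


def sptp(n):
    """SPTP order <1, 3, 5, ..., 6, 4, 2> as 0-based indices."""
    return list(range(0, n, 2)) + list(range(1, n, 2))[::-1]


def two_distance_order(n):
    """For odd n, place the t-th SPTP element at position 2t (mod n)."""
    order = [None] * n
    for t, element in enumerate(sptp(n)):
        order[(2 * t) % n] = element
    return order


def k_distance_cost(r, order, f, k):
    """Sum of f over elements located k positions apart along the cyclic tour."""
    n = len(order)
    return sum(f(r[order[i]], r[order[(i + k) % n]]) for i in range(n))


def f(x, y):
    # Assumed example function (symmetric, negative mixed second derivative).
    return math.sqrt(x + y)


r = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70, 0.90]   # hypothetical, sorted
n = len(r)
order = two_distance_order(n)
print([e + 1 for e in order])                     # [1, 6, 3, 4, 5, 2, 7]

# Exhaustive check of the 2-distance cost (first element fixed, since
# cyclic shifts do not change the cost).
best = min(k_distance_cost(r, (0,) + q, f, 2)
           for q in itertools.permutations(range(1, n)))
print(k_distance_cost(r, order, f, 2), best)
```

For n = 7 the printed order matches the pattern $\langle 1, n - 1, 3, n - 3, \ldots, n - 2, 2, n \rangle$ given in the text.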
We next generalize Lemma 2 for any k-distance cost TSP.
Theorem 1.
Let f satisfy the conditions of Lemma 1, let $0 \le r_1 < r_2 < \cdots < r_{n-1} < r_n$ be real numbers and let n, m and k be positive integers satisfying n = mk. An optimal permutation $\pi^* \in \Pi$ minimizing
satisfies the following:
Proof. Since n = mk and the cost is calculated for elements at k-distance of each other, every k contiguous elements in the tour must belong to distinct sets, and k sub-tours result. It follows therefore that any tour must divide the elements into k sub-tours comprising sets $A_1, \ldots, A_k$. Let $A_i = \{r_{j_{i1}}, \ldots, r_{j_{im}}\}$ be a set of m elements in a sub-tour ordered by $\pi_i$. Since the elements of distinct sets $A_i$ do not interact with each other, it follows that $F(\pi)$ is the sum of the k independent sub-tour costs. Each $\pi_i$ must therefore be a 1-distance SPTP, as otherwise $F(\pi)$ would not be minimal; hence 2 follows.
To prove that $A_i = B_i$, $1 \le i \le k$, assume to the contrary that $A_l \ne B_l$ for some $1 \le l < k$, and let l be the smallest index where such an inequality holds. By its very definition, $A_l \cap \bigcup_{j=1}^{l-1} B_j = \emptyset$ and $A_l \cap \bigcup_{j=l+1}^{k} B_j \ne \emptyset$. It follows from 2 that $A_l$ is ordered such that $\pi_l$ is a 1-distance SPTP. The elements of $A_l \cap \bigcup_{j=l+1}^{k} B_j$ are therefore evenly centered in $\pi_l$, while those of $A_l \cap B_l$ are evenly tailed in $\pi_l$. This is illustrated in Fig. 6a, where the red elements belong to $\bigcup_{j=l+1}^{k} B_j$ and the blue ones to $B_l$.
We now modify A l and A q into A 
> 0. 
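The interleaving construction for n = mk can be sketched in Python as follows. The sketch assumes that the sets $B_1, \ldots, B_k$ are the k consecutive blocks of m elements in increasing order; this reading is suggested by Fig. 4a and by the proof above but is not stated explicitly in the surviving text. The function $f(x, y) = \sqrt{x + y}$, the values in `r` and the helper names are likewise illustrative assumptions.

```python
import itertools
import math


def sptp(n):
    """SPTP order <1, 3, 5, ..., 6, 4, 2> as 0-based indices."""
    return list(range(0, n, 2)) + list(range(1, n, 2))[::-1]


def interleaved_sptp(n, k):
    """Split ranks 0..n-1 into k consecutive blocks of m = n // k elements
    (assumed to be B_1..B_k), order each block by SPTP, and interleave evenly."""
    if n % k != 0:
        raise ValueError("n must be a multiple of k")
    m = n // k
    blocks = [[b * m + i for i in sptp(m)] for b in range(k)]
    return [blocks[pos % k][pos // k] for pos in range(n)]


def k_distance_cost(r, order, f, k):
    """Sum of f over elements located k positions apart along the cyclic tour."""
    n = len(order)
    return sum(f(r[order[i]], r[order[(i + k) % n]]) for i in range(n))


def brute_force_min(r, f, k):
    """Minimum k-distance cost; the first element is fixed because cyclic
    shifts do not change the cost."""
    rest = range(1, len(r))
    return min(k_distance_cost(r, (0,) + q, f, k)
               for q in itertools.permutations(rest))


def f(x, y):
    # Assumed example function (symmetric, negative mixed second derivative).
    return math.sqrt(x + y)


r = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical, sorted
for k in (2, 3):
    order = interleaved_sptp(len(r), k)
    print(k, [e + 1 for e in order],
          k_distance_cost(r, order, f, k), brute_force_min(r, f, k))
```

With six elements the sub-tours are short (three elements for k = 2, two for k = 3), so this instance mainly exercises the partition into blocks rather than the SPTP order within each block.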
The QAP as a sum of j-distance TSPs
In this section we consider a weighted sum of j-distance TSPs. Let $\alpha_j \ge 0$, $1 \le j \le k < n$, be real numbers and consider the cost
which generalizes (1.4). The expression in (3.1) is obtained by a QAP comprising an n × n cost matrix $C = (c_{ij})$ defined by f, and an n × n distance matrix $D = (d_{ij})$ defined by the weights $\alpha_j$. In this setting D is a Toeplitz matrix satisfying
We will show subsequently that (3.1) implies a well-solvable QAP. Theorem 1 showed that when the cost matrix C is derived from a function $f(x, y)$ satisfying the conditions of Lemma 1, a k-distance SPTP solves the k-distance TSP. It is not intuitively obvious what permutation solves the special QAP given by a weighted sum of j-distance TSPs, $1 \le j \le k$. In the following we will show that the 1-distance TSP dominates all the other j-distance TSPs, and that the optimal solution for their sum is always that of the 1-distance SPTP. As a first step we consider the special QAP obtained by summing j-distance TSPs for $\alpha_j = 1$, $1 \le j \le k$, where the indices are considered cyclically
We will then discuss the case of $\alpha_j \ge 0$, $1 \le j < n$. Though the case $\alpha_j = 1$ can be derived as a special case of [3, 7], its proof of being well-solvable by 1-distance SPTP is far simpler and more intuitive.
Proof. Let k be a fixed number. The proof follows by induction on n. For the basis we use n = k. Since the maximal distance between two elements in a problem having k elements is k − 1, the cost consists of the function evaluated for all possible pairs of elements. Since all the off-diagonal elements of D are equal to 1, all the permutations yield the same cost, and the 1-distance SPTP in particular is optimal.
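The unweighted case ($\alpha_j = 1$) can be examined numerically with the following Python sketch, which sums the j-distance costs for $1 \le j \le k$ and compares the 1-distance SPTP with an exhaustive search. The function $f(x, y) = \sqrt{x + y}$, the values in `r` and the helper names are assumptions made for illustration; the check covers one small instance and is not a substitute for the proof.

```python
import itertools
import math


def sptp(n):
    """SPTP order <1, 3, 5, ..., 6, 4, 2> as 0-based indices."""
    return list(range(0, n, 2)) + list(range(1, n, 2))[::-1]


def summed_cost(r, order, f, k):
    """Sum of the j-distance cyclic costs for j = 1..k, all with weight 1."""
    n = len(order)
    return sum(f(r[order[i]], r[order[(i + j) % n]])
               for j in range(1, k + 1) for i in range(n))


def brute_force_min(r, f, k):
    """Minimum summed cost; the first element is fixed because cyclic
    shifts do not change the cost."""
    rest = range(1, len(r))
    return min(summed_cost(r, (0,) + q, f, k)
               for q in itertools.permutations(rest))


def f(x, y):
    # Assumed example function (symmetric, negative mixed second derivative).
    return math.sqrt(x + y)


r = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70, 0.90]   # hypothetical, sorted
n = len(r)
for k in (1, 2, 3):
    print(k, summed_cost(r, sptp(n), f, k), brute_force_min(r, f, k))

# When the distances 1..k cover every pair of elements, all orders cost the same.
print(summed_cost(r, sptp(n), f, n - 1), summed_cost(r, list(range(n)), f, n - 1))
```

The last line mirrors the basis of the induction: once every pair of elements is covered with equal weight, the cost no longer depends on the permutation.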
Assume by induction that the 1-distance SPTP minimizes the cost of an m-size QAP, $k \le m$, given as a sum of j-distance TSPs, $1 \le j \le k$, and consider a problem of m + 1 elements. Assume w.l.o.g. that $r_1 < r_2 < \cdots < r_{m+1}$. Let π be a permutation of $\{1, \ldots, m + 1\}$ resulting in the sequence $\langle j_1, \ldots, j_{m+1} \rangle$. The cost of π is given by:
Let $\pi^*$ be the 1-distance SPTP. We will show subsequently that the cost of $\pi^*$ is a lower bound of (3.3). Let $l = \pi(1)$. The proof follows by decomposing (3.3) into two terms, $F(\pi) = G(\pi) + H(\pi)$, and then showing that $\pi^*$ attains a lower bound of both G and H.
The term $G(\pi)$ is obtained by excluding $r_1$ ($\triangleq r_{\pi^{-1}(l)} \triangleq r_{j_l}$), so π is restricted to $\{2, \ldots, m + 1\}$. The term $H(\pi)$ compensates $G(\pi)$ to satisfy the equality in (3.3). Since we consider distances from 1 to k, $H(\pi)$ involves exactly the 2k + 1 elements $r_{j_{l-k}}, r_{j_{l-k+1}}, \ldots, r_{j_{l-1}}, r_1, r_{j_{l+1}}, \ldots, r_{j_{l+k-1}}, r_{j_{l+k}}$. We similarly decompose the cost of $\pi^*$ into $F(\pi^*) = G(\pi^*) + H(\pi^*)$. The smallest element $r_1$ is centered in the valley of $\pi^*$; hence its exclusion leaves π
the selection of those 2k elements does not affect $G(\pi)$ since its evaluation involves all the m elements $\{r_2, \ldots, r_{m+1}\}$ and its minimization is therefore only a matter of π. We claim that the 2k elements must be selected to be the smallest ones of $\{r_2, \ldots, r_{m+1}\}$, which are $\{r_2, \ldots, r_{2k+1}\}$. This follows from H being monotonically increasing in each of its variables, as shown in the following. Take any element $r_{\pi^{-1}(l+r)}$, $-k \le r \le k$, $r \ne 0$, and let it get any two values r
