American University in Cairo
AUC Knowledge Fountain

Theses and Dissertations

2-1-2015

# A framework for fine-grain synthesis optimization of operational amplifiers 

Taher Essam


## Recommended Citation

## APA Citation

Essam, T. (2015). A framework for fine-grain synthesis optimization of operational amplifiers [Master's thesis, the American University in Cairo]. AUC Knowledge Fountain.
https://fount.aucegypt.edu/etds/45

## MLA Citation

Essam, Taher. A framework for fine-grain synthesis optimization of operational amplifiers. 2015. American University in Cairo, Master's thesis. AUC Knowledge Fountain.
https://fount.aucegypt.edu/etds/45

This Thesis is brought to you for free and open access by AUC Knowledge Fountain. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of AUC Knowledge Fountain. For more information, please contact mark.muehlhaeusler@aucegypt.edu.

# The American University in Cairo 

School of Science and Engineering

# A Framework for Fine-grain Synthesis Optimization of Operational Amplifiers 

A Thesis Submitted to the Electronics and Communication Engineering Department

In partial fulfillment of the requirements for the degree of Master of Science

By Taher Essam Ali Kourany

Under the supervision of:
Prof. Yehea Ismail, Dr. Emad Hegazi

January/2015
Cairo, Egypt

The American University in Cairo

School of Science and Engineering (SSE)

## A Framework for Fine-grain Synthesis Optimization of Operational Amplifiers

A thesis submitted by Taher Essam Ali Kourany to the Department of Electronics, January/2015, in partial fulfillment of the requirements for the degree of Master of Science, has been approved by

Thesis Supervisor
Affiliation:
Date $\qquad$

Thesis first Reader
Affiliation:
Date $\qquad$

Thesis Second Reader
Affiliation:
Date $\qquad$

Department Chair
Date $\qquad$

Dean of SSE
Date $\qquad$

To my Parents and Egypt,
You Mean the World to me.

A man ought to read just as inclination leads him; for what he reads as a task will do him little good.

SAMUEL JOHNSON

## Acknowledgment

First of all, I have to thank God for giving me the power and patience to finish my master's thesis.

I'd like to show my gratitude to my supervisor Prof. Yehea Ismail for the great opportunity he gave me. I'd like to thank him for his support and guidance and for the great effort he exerted during my master's studies. He provided me with all the facilities I needed in my research.

I'd like to show my gratitude to my dear brother Ali Kotb and my beloved friends Hoda Ahmed, Nehal Hussein, Hazem Medhat, and Dalia Ahmed for their support and help. They helped me a lot in solving the problems I faced. I'd like to thank them for their advice and good spirit. We spent unforgettable times together.

I'd like to make a special acknowledgment to Eng. Soha Hamed for her guidance throughout the difficulties I experienced while working on my thesis.

I am thankful to all those who discouraged me; they taught me perseverance. It is because of them that I did it myself.

## Abstract of the Thesis of

Taher Essam Ali Kourany
for Master of Science
Major: Electronics and Communication Engineering
The American University in Cairo

Title: A Framework for Fine-grain Synthesis Optimization of Operational Amplifiers
Supervisor: Prof. Yehea Ismail, Dr. Emad Hegazi
This thesis presents a cell-level framework for Operational Amplifier Synthesis (OASYN) that couples circuit design and layout. For circuit design, the tool applies a corner-driven optimization that accounts for on-chip performance variations: by exploring the process, voltage, and temperature variation space, it extracts the worst-case design solution. The tool performs sensitivity analysis along with Pareto optimization to achieve the required specifications. For the layout phase, OASYN generates a DRC-proven automated layout from a sized circuit-level description. Murata et al. (1996) introduced an elegant representation of block placement for general floorplans called the sequence pair (SP). Like TCG and BSG, but unlike O-tree, $\mathrm{B}^{*}$-tree, and CBL, SP is P-admissible. Unlike SP, TCG supports incremental update during operation and keeps the information of the boundary modules as well as their relative positions in the representation. Block placement algorithms based on SP use heuristic optimization algorithms, e.g., simulated annealing, in which a large number of sequence pairs must be generated. Therefore, a fast algorithm is needed to generate sequence pairs after each solution perturbation. This thesis presents a new, simple, and efficient $O(n)$-runtime algorithm for fast incremental update during cost evaluation. The algorithm integrates the advantages of the sequence pair and the transitive closure graph into TCG-S*, a superior topology update scheme that facilitates the search for the desired optimum floorplan. Experiments show that TCG-S* outperforms existing works in terms of area utilization and convergence speed. Routing-aware placement is implemented in OASYN, handling symmetry constraints, e.g., interdigitation and common centroid, along with congestion elimination and the enhancement of placement routability.

## Table of Contents

1. Introduction
1.1. Literature Review
2. Circuit level synthesis
2.1. Folded Cascode OTA
2.1.1. Introduction
2.1.2. Basic Operation
2.1.3. Common Mode Feedback
2.1.4. Bias Circuit
2.1.5. Advantages
2.1.6. Disadvantages
2.2. Sensitivity Analysis
2.2.1. Classification of sensitivity analysis
2.2.2. Local Sensitivity analysis
2.2.3. Global Sensitivity analysis
2.3. Overview of OASYN framework
2.4. Circuit sizing tool
2.4.1. The Sobol' Sensitivity Analysis
2.4.2. Computation of Sobol' Indices by Monte-Carlo Sampling
2.4.3. Circuit Sizing Algorithm
3. Layout Floorplan
3.1. Comments on TCG-S Representation
3.1.1. Update of Constraints graph
3.1.2. Packing Sequence ($\Gamma$) Update
3.2. TCG-S* Perturbing Algorithm
3.2.1. TCG Topology Update
3.2.2. Packing Sequence Update
3.2.3. Equivalence of TCG and SP
3.3. Floor Planning Algorithm
3.3.1. Slack Computation
4. Placement and Routing
4.1. Constraints-based Placement
4.1.1. Overview of Analog Placement Methods
4.1.2. A Review on Simulated Annealing Optimization Algorithm
4.1.3. Inter-digitated matching style
4.1.4. Common-centroid matching style
4.2. Optimization-Based Router
5. Experimental Results
Conclusion
Future Works
References

## List of Figures

Figure 1.1 Overview of IDAC system chart
Figure 1.2. Topology selection and translation process in OASYS
Figure 1.3. Layout optimization process
Figure 1.4. Layout-driven circuit sizing flow chart
Figure 1.5. Encoding of 8-node O-tree
Figure 1.6. (a) Placement of four uncompacted blocks. (b) The corresponding horizontal and vertical transitive closure graphs $C_{h}$ and $C_{v}$
Figure 1.7. (a)-(f) Process to extract a packing sequence $\Gamma$ from a block placement. (g) Resulting TCG-S
Figure 1.8. A block placement with packing sequence $\Gamma = \langle a, b, c\rangle$
Figure 2.1. Folded cascode OTA circuit diagram
Figure 2.2. Common-Mode Feedback Circuit
Figure 2.3. Folded Cascode OpAmp Bias circuit
Figure 2.4 Parametric bootstrap version of uncertainty and sensitivity analysis
Figure 2.5. Overview of the OASYN framework
Figure 3.1. Three types of perturbations. (a) The initial TCG ($C_{h}$ and $C_{v}$) and the placement. Dimensions for the six blocks are: $a$ ($6 \times 4$), $b$ ($4 \times 6$), $c$ ($7 \times 4$), $d$ ($6 \times 3$), $e$ ($3 \times 2$), and $f$ ($3 \times 3$). (b) The resulting TCG and placement after rotating module $d$ based on TCG-S. (c) The resulting TCG and placement after reversing nodes $n_{c}$ and $n_{e}$ based on TCG-S. (d) The resulting TCG and placement after swapping nodes $n_{c}$ and $n_{d}$ based on TCG-S
Figure 3.2. Three types of perturbations. (a) The resulting TCG and placement after rotating a module. (b) The resulting TCG and placement after reversing nodes $n_{c}$ and $n_{e}$. (c) The resulting TCG and placement after swapping nodes $n_{c}$ and $n_{d}$
Figure 3.3. Slack computation. (a) Floorplan evaluation in left-to-right and bottom-to-top mode. (b) Floorplan evaluation in right-to-left and top-to-bottom mode
Figure 4.1 Constraint-driven analog layout generation flow
Figure 4.2 An example of inter-digitated array
Figure 4.3 An example of common centroid array
Figure 5.1. Generated Folded Cascode OpAmp Layout with the Common-Mode Feedback Circuit for Simultaneous Area and Matching Constraints Optimization. Area $= 29.665 \times 102.065\ \mu\mathrm{m}^{2}$
Figure 5.2 Automated Placement and Routing Solution (Area $= 146 \times 47\ \mu\mathrm{m}^{2}$)
Figure 5.3 Calibre DRC Message of the placement solution
Figure 5.4 Calibre LVS Message of the layout solution

## List of Tables

Table 1. MCNC Benchmark circuits
Table 2. Area and Runtime Comparisons among SP (On Sun Sparc Ultra60), O-Tree (On Sun Sparc Ultra60), $\mathrm{B}^{*}$-Tree (On Sun Sparc Ultra60), Enhanced O-Tree (On Sun Sparc Ultra60), CBL (On Sun Sparc 20), TCG (On Sun Sparc Ultra60), TCG-S (On Sun Sparc Ultra60), and TCG-S* (On Intel Core-i3) for Area Optimization
Table 3. Folded Cascode OpAmp Synthesis Results
Table 4. Folded Cascode OpAmp Synthesis Results
Table 5. Folded Cascode OpAmp Synthesis Results
Table 6. Folded Cascode OpAmp Synthesis Results on Process, Voltage, and Temperature Corners
Table 7. Folded Cascode OpAmp Synthesis Results on Process, Voltage, and Temperature Corners

## 1. Introduction

An analog system is typically characterized by a set of performance parameters used to quantify the properties of the circuit. Given a fixed topology, circuit synthesis is the process of determining numerical values for all components in the circuit such that the circuit conforms to a set of performance constraints. The pervasive trend in recent years has been the integration of whole systems onto a single chip. Analog circuitry is widely used in system applications such as telecommunications and robotics, where analog interfaces to the external environment are coupled with digital signal processing systems. The demand for high-performance CMOS analog circuits has increased dramatically in recent years, especially for digital-analog interface circuits, due to the emergence of system-on-chip (SoC) designs. Although analog circuits take up only a minor part of most ASICs, their design time and cost are significant: most of the knowledge, effort, and time is spent on designing the analog blocks of the chip, since their design is largely dominated by heuristics and by the experience needed to achieve the required specifications.

Given a set of specifications/requirements that describe the system to be realized, the selection of the optimal implementation comes mainly from experience. Many digital parts of such chips can nowadays be synthesized rapidly and reliably using CAD tools developed for semicustom design methods such as gate arrays, standard cells, and macro cells. Analog subsystems, on the other hand, still need to be handcrafted entirely by a specialist, due to the high degree of nonlinearity and interdependence among design variables. Therefore, the design time and cost associated with dedicated analog interface components often constitute a bottleneck in the semicustom design of mixed analog/digital systems. The growing scale of the industry and the rapid advancement of integrated circuit technology have led to a dramatic increase in physical design complexity. The need to tackle this complexity and meet time-to-market constraints has encouraged the wide use of hierarchical design and IP modules for faster convergence to the optimum design in terms of area and speed. Some analog components have been replaced by their digital counterparts, largely successfully. However, not all analog blocks can be replaced, and those that remain are considered intellectually challenging. The success of digital design ideas and tools, and their domination over the majority of the industry thanks to sophisticated, accurate tools that shorten time-to-market, exposed the lack of comparable analog semicustom tools.

In a top-down, knowledge-based approach, the analog synthesis problem can be decomposed into two parts: first, the synthesis of sized circuits from behavioral specifications, and second, IC layout generation from these circuits. Design automation ideas from digital IC design have only recently begun to migrate into analog circuit design, which in part reflects the inherent complexity of the analog design process. Beyond conventional analog/digital systems, there has recently been great interest in the design of parallel analog VLSI signal processing architectures. Hence, CAD tools must be developed to cope both with the complexity of large-scale analog circuit designs and with the requirement for rapid design times. In the digital domain, structured abstractions and hierarchy are commonplace and are relied upon to make seemingly large synthesis tasks tractable by breaking them into smaller steps. Such abstractions and hierarchy do not currently play a central role in analog design. Some ideas from digital design methodologies, such as standard cell libraries and module generators, have recently been applied to analog design tasks. However, such techniques usually have several drawbacks; e.g., libraries allow the designer to make only crude tradeoffs among performance specifications, and they become obsolete rapidly in the face of technological evolution. The numerical circuit simulator SPICE is often used as a benchmark to determine the relative accuracy of alternative schemes for evaluating the performance of analog circuits.

### 1.1. Literature Review

Synthesis comprises two steps: topology selection and sizing. Topology selection means selecting the appropriate circuit topology from a library of topologies. Sizing consists of choosing appropriate transistor dimensions and bias voltages to satisfy a given set of performance specifications. Topology selection has proven very difficult to automate due to its knowledge-intensive nature, and many attempts have been made to capture the designer's expertise and knowledge in automation tools. Two approaches exist in analog circuit synthesis: knowledge-based approaches and optimization-based approaches [20]. In the knowledge-based approach, the designer extracts design equations and integrates them into the tool to be reused for the same topology. In the optimization-based approach, the optimizer searches the design space for a circuit that satisfies certain constraints and minimizes certain objectives. The optimization-based approach is further divided into equation-based optimization and simulation-based optimization. In equation-based optimization, the circuit is evaluated through pre-derived equations for the performance specifications, initially extracted by the designer or by symbolic analysis. In simulation-based optimization, the specifications are measured directly from the output waveforms of a simulator. The simulation-based approach has two major advantages over the equation-based approach:

- Accurate simulation models are used instead of approximate equations.
- No long preparatory effort is needed to extract all the describing equations; in practice, performance extraction can rely fully on the simulator's capabilities.

Analog standard cell libraries can be used to reduce this design effort. However, since the circuits are then not tailored to their application, an optimum solution with respect to power dissipation and area is not obtained. Furthermore, such libraries, which typically require more than 20 man-years of design effort, rapidly become obsolete due to technology evolution. Stochastic combinatorial optimization methods such as simulated annealing and genetic algorithms (GAs) require the computation of performance parameters for a large number of circuit sizing alternatives. It is, therefore, beneficial to reduce the time associated with generating performance estimates.
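To illustrate why evaluation time dominates stochastic optimizers, the following is a minimal simulated-annealing sketch in Python. It assumes a generic scalar cost function and neighbor move, both hypothetical stand-ins for a circuit performance estimate and a sizing perturbation; the Boltzmann acceptance rule and geometric cooling are textbook choices, not the method of any tool cited here. The function counts cost evaluations: every iteration calls the cost once.

```python
import math
import random
from typing import Callable, Tuple


def simulated_annealing(cost: Callable[[float], float],
                        x0: float,
                        neighbor: Callable[[float, random.Random], float],
                        t0: float = 1.0,
                        alpha: float = 0.95,
                        iters: int = 500,
                        seed: int = 0) -> Tuple[float, float, int]:
    """Minimize `cost` with a basic simulated-annealing loop.

    Returns (best_x, best_cost, number_of_cost_evaluations). In circuit sizing,
    each call to `cost` would stand in for a full performance estimate.
    """
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    evaluations = 1
    t = t0
    for _ in range(iters):
        cand = neighbor(x, rng)
        fc = cost(cand)
        evaluations += 1
        # Always accept improvements; accept uphill moves with probability exp(-delta / t).
        if fc <= fx or rng.random() < math.exp((fx - fc) / t):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx
        t *= alpha  # geometric cooling schedule
    return best, fbest, evaluations
```

For example, `simulated_annealing(lambda x: (x - 3.0) ** 2, 0.0, lambda x, rng: x + rng.uniform(-0.5, 0.5))` should return a point close to 3 after 501 cost evaluations; with a simulator in the loop, those 501 calls are where nearly all the runtime goes.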

Several synthesis tools adopting equation-based strategies have been implemented. IDAC, an Interactive Design Tool for Analog CMOS Circuits [1], was one of the earliest tools developed in analog design automation; in IDAC, the designer specifies the technology and the desired building-block specifications, and selects from the different topologies existing in the database. Other tools [2], [3], [6], [22], [23], [24], [25], [26], and [27] adopted the same approach.


Figure 1.1 Overview of IDAC system chart
IDAC adopts a knowledge-based rather than an optimization-based algorithm: it relies on equation-based strategies and acquires related circuit parameters, e.g., the minimum and maximum values of the electrical parameters of MOS transistors, poly and well resistors, and layout rules, for computing circuit parasitics. To extract the worst-case design solution, bias currents and mobility are computed from predictive equations, which are not as accurate as the models used nowadays in front-end simulators. These equations model the deterioration of chip performance under extremely high and low temperatures. The IDAC system flow chart is shown in Fig. 1.1.

IDAC, KANSYS [4], and OPASYN [3] employed equation-based algorithms that are efficient in terms of synthesis time and complexity; they generate rough designs quickly, creating an opportunity to explore the design space. However, as technology advances, it becomes much harder to derive simple design equations that meet even rough specifications. OASYS [2] employed numerical optimization tools along with the circuit simulator to fine-tune device sizes in order to achieve the required performance.

OASYS [2], [28] adopted a hierarchical design strategy, in which analog circuit topologies are represented as a hierarchy of templates of abstract functional blocks. The OASYS framework is based on three main ideas. First, circuit topologies are selected from among fixed alternatives: a particular topology is chosen as the best candidate expected to meet the specifications. Second, the fixed alternatives for circuit topologies are organized hierarchically: a high-level module is defined as an interconnection of sub-blocks. Finally, system-level specifications can then be translated into sub-goals, i.e., specifications for the sub-blocks of a topology. The original motivation for using separate selection and translation steps was to avoid having to design the interconnection and the electrical characteristics of sub-blocks simultaneously; the hierarchical representation of topologies vastly simplifies the translation task since it tends to reduce the number of sub-blocks and simplify their connections. Hence, the main contribution of OASYS to automated analog synthesis is the demonstration that the analog behavior-to-structure synthesis problem can be recast in a highly structured form with hierarchy as the key organizing principle. Translation requires knowledge of how the performance specifications of a high-level block can be transformed into specifications for each sub-block; these new specifications are then used to design the transistors within each sub-block. The topology selection and translation process is shown in Fig. 1.2.

Each topology in OASYS has a design plan composed of plan steps, in which three activities are performed. First, heuristics, i.e., knowledge-based decisions, advance the design state by introducing estimates based on the expertise of analog designers. Next comes computation, where quantities such as currents and biasing are computed from equations wherever sufficient information is available. These two activities mainly assign each sub-block the specifications it must achieve. Finally, a refinement step receives these new specifications, initiates the sub-block design, and retrieves the actual parameters that indicate the real performance of the circuit after synthesis. If the simulated performance does not meet the required specifications, the topology is rejected and the search is narrowed to the remaining topologies.


Figure 1.2. Topology selection and translation process in OASYS

In the selection phase, the algorithm can correct itself by returning to a previous successful node to try an alternative topology style if one of the plan steps fails. Based on expert designers' observations, the OASYS selection strategies comply with certain structural constraints, such as choosing between differential and single-pair input nodes, which are entirely user-defined. Predicting the performance limitations of circuits is called heuristic discrimination, which is based on expert designers' mature assessments of each topology. It is the hardest type of discrimination since it relies on qualitative decisions, which are hard to codify.

The last type of discrimination follows a generate-and-test style, which may seem naive, but it is a natural way to compare hand-crafted designs and gain insight into which one will work better. The major innovation behind OASYS [2] is an alternative to flat representations that casts the design problem in a more structured, hierarchical form. However, optimizing sub-block performance and capturing how choices made in one sub-circuit affect other sub-circuits remain hard problems.

KANSYS (Kanpur Analog Synthesis), from the Indian Institute of Technology, overcomes the drawbacks of hierarchical design by allowing expertise to be transferred among the translation algorithms of different sub-circuits, which makes topology translation more efficient. When a specification fails in one sub-block, the analytical equations are modified, affecting all the sub-blocks. In addition, a search algorithm that traverses the design space in a hierarchy-aware fashion, accounting for multi-objective optimization and process variations, is adopted in [28] using GP [29] and an age-layered population structure [30]. However, quantifying the dependencies among circuit parameters and higher-order terms remains a hard problem. The authors of [21] proposed an approach that reduces the number of independent variables and speeds up design runtime by computing a correction factor (S-factor) from transistor-level simulations; multiplying the linearized circuit equations by this factor yields an accurate design.
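The paragraph does not give the exact formulation of [21], so the sketch below assumes the simplest reading of an S-factor: the ratio of a simulated quantity to its equation-based estimate at a reference operating point, reused to scale the equation at nearby points. A first-order square-law transconductance equation stands in for the linearized circuit equations, and the process constant, sizes, currents, and the "simulated" value are all made-up illustrative numbers, not data from any cited work.

```python
import math


def square_law_gm(kp: float, w_over_l: float, i_d: float) -> float:
    """First-order (square-law) MOSFET transconductance estimate in siemens:
    gm = sqrt(2 * k' * (W/L) * I_D)."""
    return math.sqrt(2.0 * kp * w_over_l * i_d)


def s_factor(simulated: float, estimated: float) -> float:
    """Hypothetical correction factor: simulated value over equation-based value
    at the same operating point."""
    return simulated / estimated


def corrected_estimate(estimate_fn, s: float, *args: float) -> float:
    """Scale an equation-based estimate by a previously computed S-factor."""
    return s * estimate_fn(*args)


# Illustrative numbers only: k' = 200 uA/V^2, W/L = 10, I_D = 100 uA,
# and a made-up "simulated" gm of 0.5 mS at that operating point.
gm_eq = square_law_gm(200e-6, 10.0, 100e-6)   # about 0.632 mS from the equation
s = s_factor(5.0e-4, gm_eq)                   # about 0.79
gm_20 = corrected_estimate(square_law_gm, s, 200e-6, 20.0, 100e-6)  # corrected at W/L = 20, about 0.707 mS
```

The appeal of such a factor is that one simulation calibrates many subsequent equation evaluations; how far from the reference point the factor stays valid is exactly the accuracy question the approach has to manage.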

Other CAD tools adopted a design-to-layout approach [31-33] that accounts for post-layout performance deterioration. AIDA [31] integrates GENOM-POF for circuit synthesis and LAYGEN II for automated layout generation. GENOM-POF performs circuit synthesis using a multi-objective optimization approach, accounting for the worst-case solution by exploring process, voltage, and temperature variations in the design space. LAYGEN II generates a DRC-proven layout based on the sized circuit description generated by GENOM-POF and on high-level layout guidelines. In circuit synthesis, the designer specifies the design objectives, the number of optimization variables, the size of the design space, and the number of independent variables. Circuit parameters are optimized to obtain a set of Pareto-optimal solutions that fulfill all the constraints and show different tradeoffs between circuit specifications. LAYGEN II uses the hierarchical template description, the sized devices, and the technology kit to perform placement and routing, followed by a validation step. The router uses the placed modules, connectivity, symmetry, and sensitivity constraints in the optimization process. However, routing-aware placement, which ensures better routability and reliability, is not considered in the placement process.
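A Pareto-optimal set is the set of non-dominated candidates. The Python sketch below assumes every objective is to be minimized (a gain to be maximized is simply negated) and filters a candidate list down to its non-dominated subset with a plain $O(n^{2})$ pairwise check; it is a generic illustration, not GENOM-POF's algorithm, and the (power, gain) pairs are made up.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` Pareto-dominates `b`: no worse in every objective and
    strictly better in at least one (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Keep the points that no other point dominates (O(n^2) pairwise check)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidates as (power in mW, -gain in dB); gain is negated so both are minimized.
candidates = [(1.0, -60.0), (2.0, -70.0), (2.5, -65.0), (0.8, -50.0)]
front = pareto_front(candidates)  # (2.5, -65.0) is dropped: (2.0, -70.0) beats it on both
```

The three surviving points are the kind of tradeoff set the designer then chooses from: lower power buys lower gain and vice versa.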


Figure 1.3. Layout optimization process
Dessouky et al. [32] proposed a layout-oriented circuit synthesis approach that passes layout information at the beginning of the design phase. The approach guarantees that the sized circuit satisfies the required specifications in the presence of layout parasitics. Habal et al. [33] proposed automated synthesis of the circuit layout by investigating every feasible layout of each device and selecting the layout with the best geometry. The layout optimization process is driven by design, placement, and routing constraints, as shown in Fig. 1.3. Layout parasitics are extracted using an integral-equation field solver without modeling. The first stage of the circuit synthesis process formulates a scalar minimization sub-problem based on a linearized objective function $\mathbf{f}$; the sub-problem is then solved using the generalized boundary curve (GBC) algorithm. The layout-driven circuit sizing flow chart is shown in Fig. 1.4.


Figure 1.4. Layout-driven circuit sizing flow chart
A new layout is synthesized in every iteration, where $\mathbf{f}$ has to be calculated for each new value of the circuit parameter vector $X_{d}$ by simulating the generated layout netlist. The finite forward difference technique is used to calculate the gradient of the performance $\mathbf{f}$ with respect to $X_{d}$. $k^{(i)}$ represents the design parameter vector at the $i$th iteration, at which the performance is evaluated to determine the next step of GBC.
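Written out, the forward-difference estimate of the $j$th gradient component is $\partial \mathbf{f}/\partial x_{j} \approx \left(\mathbf{f}(X_{d} + h\,e_{j}) - \mathbf{f}(X_{d})\right)/h$, where $e_{j}$ is the $j$th unit vector and $h$ a small step. The sketch below implements it for a scalar performance function; the default step size and the test function are illustrative assumptions. In the flow described above, each call to `f` would mean synthesizing a layout and simulating its netlist, so one gradient costs $n+1$ such evaluations for $n$ parameters.

```python
from typing import Callable, List, Sequence


def forward_difference_gradient(f: Callable[[Sequence[float]], float],
                                x: Sequence[float],
                                h: float = 1e-6) -> List[float]:
    """Approximate the gradient of f at x with forward differences:
    df/dx_j ~= (f(x + h * e_j) - f(x)) / h.
    Uses len(x) + 1 evaluations of f: the base point plus one per parameter."""
    f0 = f(x)
    grad = []
    for j in range(len(x)):
        xp = list(x)
        xp[j] += h  # perturb only the j-th parameter
        grad.append((f(xp) - f0) / h)
    return grad


# Example: f(v) = v0^2 + 3 * v1 has gradient (2 * v0, 3), i.e. about (2, 3) at (1, 2).
g = forward_difference_gradient(lambda v: v[0] ** 2 + 3.0 * v[1], [1.0, 2.0])
```

The forward scheme is chosen over central differences in such loops because it reuses the base evaluation, halving the number of expensive simulations at the cost of an $O(h)$ truncation error.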

Most previous work in analog circuit synthesis has adopted a hierarchical flow to optimize performance at the cell level, which requires knowledge and expertise to be encoded in the tool for inter-process optimization. However, even where applicable, the generated sized circuits are outperformed by manual designs in terms of area and performance. Other tools adopted optimization algorithms, e.g., simulated annealing and genetic algorithms, which, if not implemented with enough design knowledge, may take a very long runtime and may fail to achieve high performance. Numerical optimization can be adopted in circuit synthesis since it always gives an output, i.e., if the specifications are not met, one has quantitative information about how far away the target is. It is also easier than with other engines to introduce new specifications and schematics. However, such optimization is computationally expensive and hides the design tradeoffs between circuit parameters. Furthermore, the achieved specifications depend heavily on the initial solution. Fast and intelligent circuit synthesis remains a challenging problem despite the high quality of previous work.

Floorplanning and building-block placement are becoming more crucial in physical design as circuit sizes grow rapidly and hierarchical design with IP blocks is widely used to reduce design complexity. In VLSI design, floorplanning and block placement are considered critical to the performance of the design. Classical floorplanning optimizes the area and wirelength of the chip blocks and generates a compacted, overlap-free placement of blocks. Floorplan representations are classified into two types: slicing and non-slicing representations. A slicing representation repeatedly subdivides the floorplan area horizontally and vertically into a finite number of non-overlapping structures. Slicing brings faster packing and higher convergence speed than non-slicing representations: the number of blocks per slicing structure, and hence the number of cost evaluations, is significantly reduced, since each structure is considered a separate solution space. However, the optimal solution may not be reachable within the solution space of slicing structures. The slicing tree [9] and the normalized Polish expression [10] are popular slicing representations.
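A normalized Polish expression is a postfix expression over block names and two cut operators in which no two consecutive operators are identical. The sketch below assumes fixed block dimensions and names the operators `V` (operands side by side) and `H` (operands stacked); those names, the helper functions, and the example sizes are hypothetical. It evaluates the bounding box of the slicing floorplan with a stack, which is why slicing packing is so cheap.

```python
from typing import Dict, List, Tuple

Size = Tuple[float, float]  # (width, height)


def is_normalized(expr: List[str]) -> bool:
    """Normalized: no two consecutive tokens are the same cut operator."""
    return all(not (a == b and a in ("H", "V")) for a, b in zip(expr, expr[1:]))


def evaluate_polish(expr: List[str], sizes: Dict[str, Size]) -> Size:
    """Bounding box (width, height) of a slicing floorplan given in postfix form.
    'V' (vertical cut) puts the two operands side by side; 'H' (horizontal cut) stacks them."""
    stack: List[Size] = []
    for tok in expr:
        if tok in ("H", "V"):
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if tok == "V":
                stack.append((w1 + w2, max(h1, h2)))
            else:
                stack.append((max(w1, w2), h1 + h2))
        else:
            stack.append(sizes[tok])
    if len(stack) != 1:
        raise ValueError("malformed Polish expression")
    return stack[0]
```

With sizes a = (6, 4), b = (4, 6), and c = (7, 4), the expression `a b V c H` places a and b side by side (10 × 6) and stacks c on them, giving a 10 × 10 bounding box.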

For moderately sized solution spaces, non-slicing floorplans can yield the optimal solution, i.e., minimum area, interconnect delay, and critical path, in a reasonable convergence time. SP [11], TCG [12], O-tree [13], and the corner block list [14] are widely used non-slicing representations. Murata et al. [15] defined the P-admissible solution space to distinguish non-slicing floorplan representations by the following four requirements:

1) Solution space is finite
2) Every solution is feasible
3) Packing and cost evaluation can be performed in polynomial time, and
4) Best evaluated packing in solution space corresponds to an optimal placement.
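To make requirement 3 concrete for SP, the sketch below packs a sequence pair $(\Gamma_{+}, \Gamma_{-})$ in $O(n^{2})$ time using the standard relations: block $b$ is left of $c$ if $b$ precedes $c$ in both sequences, and $b$ is below $c$ if $b$ follows $c$ in $\Gamma_{+}$ but precedes it in $\Gamma_{-}$. Every constraint edge points forward in $\Gamma_{-}$, so processing blocks in $\Gamma_{-}$ order gives longest-path coordinates in one pass. The function name and sample blocks are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def pack_sequence_pair(g_plus: List[str], g_minus: List[str],
                       sizes: Dict[str, Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Lower-left (x, y) of every block for the sequence pair (g_plus, g_minus).

    b is left of c if b precedes c in both sequences;
    b is below c   if b follows c in g_plus but precedes c in g_minus.
    g_minus order is a topological order of both constraint graphs: O(n^2).
    """
    rank_plus = {b: i for i, b in enumerate(g_plus)}
    x: Dict[str, float] = {}
    y: Dict[str, float] = {}
    for k, c in enumerate(g_minus):
        xc = yc = 0.0
        for b in g_minus[:k]:                # every b that precedes c in g_minus
            w, h = sizes[b]
            if rank_plus[b] < rank_plus[c]:  # precedes in both: b is left of c
                xc = max(xc, x[b] + w)
            else:                            # follows in g_plus: b is below c
                yc = max(yc, y[b] + h)
        x[c], y[c] = xc, yc
    return {b: (x[b], y[b]) for b in g_minus}
```

For blocks a (2 × 3) and b (4 × 1), the pair (⟨a, b⟩, ⟨a, b⟩) places b to the right of a at (2, 0), while (⟨b, a⟩, ⟨a, b⟩) places b on top of a at (0, 3).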

According to this classification, SP, TCG, and BSG [16] are P-admissible, while the slicing tree, the normalized Polish expression, O-tree, $\mathrm{B}^{*}$-tree [17], and CBL are not. Since the slicing tree and the normalized Polish expression do not generate optimal packing structures, they violate the conditions and thus are not P-admissible representations.

Guo et al. [13] proposed the ordered tree (O-tree) representation for a left- and bottom-compacted placement with $O(n \log n)$ runtime complexity. An admissible placement is a compacted one in which blocks can move neither down nor left. In this representation, each rectangular block $i$ is defined by its tuple $\left\{h_{i}, w_{i}\right\}$, where $h_{i}$ and $w_{i}$ are the height and width of the block, respectively. The constraint graph of the placement is $G=(V, E)$, where each block is represented by a node in $V$, and $E$ represents the geometric constraints between blocks, each drawn as an edge from the boundary of one block to another. Given the 8-node tree shown in Fig. 1.5, the placement can be encoded as $(001010110100110, \mathrm{ABCDEFG})$. Starting from the root, node A is visited first and a bit '0' is recorded. Then node B is visited and a bit '0' is recorded. On the way back to the root, two bits '11' are recorded. The total number of possible configurations of an $n$-node tree is $O\left(n!\,2^{2n-2}/n^{1.5}\right)$. The placement obtained after packing may not be compacted, resulting in a mismatch between the O-tree representation and its placement after a series of compaction operations. As with O-tree, $\mathrm{B}^{*}$-tree solutions may not be feasible, and thus neither is a P-admissible representation.
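The bit-string encoding described above is a depth-first traversal: a '0' is written when descending to a child and a '1' when returning to its parent, and the child labels are collected in visit order. The Python sketch below assumes the tree is given as an ordered child list per node, a hypothetical input format; a tree with $n$ non-root nodes always produces a $2n$-bit string.

```python
from typing import Dict, List, Tuple


def encode_otree(children: Dict[str, List[str]], root: str = "root") -> Tuple[str, str]:
    """Encode an ordered tree as (bit string, label permutation).

    DFS from the root: write '0' when descending to a child (and record the
    child's label), write '1' when returning from that child to its parent.
    """
    bits: List[str] = []
    labels: List[str] = []

    def dfs(node: str) -> None:
        for child in children.get(node, []):
            bits.append("0")
            labels.append(child)
            dfs(child)
            bits.append("1")

    dfs(root)
    return "".join(bits), "".join(labels)


# root has children A and D (in that order); A has children B and C.
code = encode_otree({"root": ["A", "D"], "A": ["B", "C"]})  # ("00101101", "ABCD")
```

Because the encoding is a plain string pair, perturbations such as moving a subtree reduce to string edits, which is what makes tree-based floorplanners convenient inside simulated annealing.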


Figure 1.5. Encoding of 8-node O-tree
Nakatake et al. [16] proposed a module packing method based on the bounded-sliceline grid (BSG) structure. BSG is a meta-grid without physical dimensions; rather, it is a topological grid composed of orthogonal lines called BSG-units. BSG divides the plane into rooms whose binary relations encode the geometric relations between modules, such that any two rooms are uniquely in either a horizontal or a vertical relation.

Modules are assigned to BSG rooms, from which they inherit the geometric relations between their rooms and the other rooms in the meta-grid. Module packing runs in $O(n^{2})$ time.

Hong et al. [14] proposed the corner block list (CBL), an efficient and effective topological representation of non-slicing floorplans that takes only linear time to derive the module placement from the representation. Unlike the O-tree representation, the corner block list defines the floorplan structure; thus CBL is more flexible for floorplan optimization in terms of area and wirelength with modules of different widths and heights. A corner block list takes only $n(3+\lceil \lg n \rceil)$ bits to describe, where $\lceil \lg n \rceil$ is the minimum integer not less than $\lg n$, which implies that a corner block list needs fewer bits than SP and BSG. CBL recursively detects the corner block at the top-right corner to describe the block placement. When the detection ends, the block names and their orientations are concatenated in reverse order. The orientation of the corner block is defined by the joint of its left and right segments and the T-junction containing the joint. If the T-junction is rotated by 90 degrees, the block is considered vertically oriented, and its orientation is denoted by 0. The number of 1's in the T-junction list denotes the number of T-junctions attached to the block, and each string of 1's in the T-junction list is terminated by a 0 to separate it from the next block detection. The advantage of the CBL representation is that it can represent not only slicing structures but also non-slicing floorplans. The time complexity of floorplan realization is $O(n)$, which is better than that of SP, TCG, BSG, and TCG-S.
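As a worked example of the size bound, for $n = 100$ modules $\lg 100 \approx 6.64$, so $\lceil \lg 100 \rceil = 7$ and:

$$
B_{\mathrm{CBL}}(n) = n\left(3 + \lceil \lg n \rceil\right), \qquad B_{\mathrm{CBL}}(100) = 100\,(3 + 7) = 1000 \text{ bits}.
$$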

Lin et al. [12] proposed the transitive closure graph (TCG) representation for general non-slicing floorplans. TCG describes the geometric relations between the modules of a placement with two graphs: a horizontal transitive closure graph $C_{h}$ and a vertical one $C_{v}$. Lin et al. extended the concept of a P-admissible representation to a $\mathrm{P}^{*}$-admissible one by adding a fifth condition: the representation defines both the horizontal and the vertical geometric relations between modules. This fifth condition makes it easier to handle floorplan design problems with further requirements. Examples are module sizing and placement constraints such as boundary, symmetry, and proximity constraints. Hence, the representation corresponds to the packing whenever the $\mathrm{P}^{*}$-admissible conditions are satisfied.

Consider the uncompacted placement in Fig. 1.6. O-tree is not a P-admissible representation, so it is not flexible in handling uncompacted floorplan structures. Geometric relations between modules cannot be derived directly from the O-tree and $\mathrm{B}^{*}$-tree representations unless the placement is packed. TCG, owing to its $\mathrm{P}^{*}$-admissibility, exposes these relations directly. Because O-tree and $\mathrm{B}^{*}$-tree representations cannot provide some geometric information, they are harder to use in floorplan design. They are also less likely to produce good results in area and wirelength optimization problems. Furthermore, because of their compaction operation, perturbing a placement solution in O-tree or $\mathrm{B}^{*}$-tree may result in an unpacked solution. After packing, the placement then no longer corresponds to the representation. This harms the solution structure and thus the search for the optimum solution.


Figure 1.6. (a) Placement of four uncompacted blocks. (b) The corresponding horizontal and vertical transitive closure graphs $C_{h}$ and $C_{v}$.

TCG does not require any additional constraint graph for evaluation. Unlike SP, TCG supports incremental updates after each solution perturbation. It also keeps the positions of boundary modules and their geometric relations. In SP, the geometric relations among the modules of a placement are not clear before packing. The SP constraint graphs must therefore be regenerated from scratch for packing evaluation after each operation. CBL has a smaller feasible solution space and a faster packing scheme. However, CBL is not P-admissible, because it cannot represent a general uncompacted placement. Given a TCG, the corresponding placement can be derived in $\mathrm{O}\left(n^{2}\right)$ time by applying a longest-path algorithm, which is covered later in Chapter 4.
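The longest-path derivation can be sketched as follows. This is a minimal Python illustration and not the Chapter 4 implementation. The function names and the edge-set input format are assumptions.

Each module's x (y) coordinate is the length of the longest path reaching it in $C_{h}$ ($C_{v}$). The weight of each edge is the width (height) of its source module. A TCG stores $n(n-1)/2$ edges in total, so one pass over a topological order costs $\mathrm{O}(n^{2})$.

```python
from graphlib import TopologicalSorter


def longest_path_coords(sizes, edges):
    """Longest-path coordinates in one constraint graph.

    sizes: module -> width (for C_h) or height (for C_v)
    edges: iterable of (a, b) pairs, meaning a directed edge a -> b
    """
    preds = {m: set() for m in sizes}
    for a, b in edges:
        preds[b].add(a)
    coord = {}
    # static_order() yields every node after all of its predecessors
    # and raises graphlib.CycleError if the graph is cyclic.
    for m in TopologicalSorter(preds).static_order():
        coord[m] = max((coord[p] + sizes[p] for p in preds[m]), default=0)
    return coord


def pack_tcg(widths, heights, ch_edges, cv_edges):
    """Return module -> (x, y) bottom-left coordinates for a TCG.

    ch_edges: (a, b) in C_h means a is left of b.
    cv_edges: (a, b) in C_v means a is below b.
    """
    xs = longest_path_coords(widths, ch_edges)
    ys = longest_path_coords(heights, cv_edges)
    return {m: (xs[m], ys[m]) for m in widths}


if __name__ == "__main__":
    widths = {"a": 2, "b": 1, "c": 3}
    heights = {"a": 1, "b": 2, "c": 1}
    ch = {("a", "b")}                 # a is left of b
    cv = {("a", "c"), ("b", "c")}     # a and b are below c
    print(pack_tcg(widths, heights, ch, cv))
    # {'a': (0, 0), 'b': (2, 0), 'c': (0, 2)}
```

The module `c` lands at $y=2$ because the longest vertical path reaching it passes through `b`, whose height is 2.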

A TCG is characterized by three main properties.

First, $C_{h}$ and $C_{v}$ are acyclic. Each node denotes a module, and for each pair of nodes a directed edge is constructed in $C_{h}$ or $C_{v}$ according to the geometric relation between the two modules. A pair of modules cannot be both left and right of (below and above) one another. The resulting $C_{h}$ and $C_{v}$ graphs must therefore be acyclic.

Second, each pair of nodes must be connected by an edge in exactly one of the two transitive closure graphs. This property ensures that modules do not overlap, since every pair of modules is related either horizontally or vertically, but never both. Hence, the number of edges encoding the geometric relations between modules in a placement is $m(m-1) / 2$, where $m$ is the number of modules.

Third, the transitive closure of $C_{h}$ ($C_{v}$) is equal to $C_{h}$ ($C_{v}$) itself.
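The three properties can be checked mechanically. The sketch below is a hypothetical validator; the function names and the edge-set format are assumptions. It relies on Python's standard `graphlib` module (Python 3.9+) and performs three checks:

- It tests both graphs for cycles.
- It checks that every unordered pair of modules is joined by exactly one edge across $C_{h}$ and $C_{v}$.
- It verifies closure with a set-based Warshall transitive closure.

```python
from itertools import combinations
from graphlib import TopologicalSorter, CycleError


def is_acyclic(nodes, edges):
    preds = {n: set() for n in nodes}
    for a, b in edges:
        preds[b].add(a)
    try:
        # Consuming the iterator forces full cycle detection.
        list(TopologicalSorter(preds).static_order())
    except CycleError:
        return False
    return True


def transitive_closure(nodes, edges):
    """Warshall's algorithm on adjacency sets; returns the closed edge set."""
    reach = {n: set() for n in nodes}
    for a, b in edges:
        reach[a].add(b)
    for k in nodes:
        for i in nodes:
            if k in reach[i]:
                reach[i] |= reach[k]
    return {(a, b) for a in nodes for b in reach[a]}


def is_valid_tcg(nodes, ch, cv):
    ch, cv = set(ch), set(cv)
    # Property 1: both graphs are acyclic.
    if not (is_acyclic(nodes, ch) and is_acyclic(nodes, cv)):
        return False
    # Property 2: each pair is related by exactly one edge in C_h or C_v.
    for a, b in combinations(nodes, 2):
        hits = sum(e in g for g in (ch, cv) for e in ((a, b), (b, a)))
        if hits != 1:
            return False
    # Property 3: each graph equals its own transitive closure.
    return transitive_closure(nodes, ch) == ch and transitive_closure(nodes, cv) == cv
```

The check is quadratic in the number of modules for property 2 and cubic for the closure test. That is acceptable for debugging a perturbation routine but not for use inside an annealing loop.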

TCG-S [18] is a general floorplan representation that integrates the properties of TCG and SP. It achieves a faster, $\mathrm{O}(n \log n)$-time packing scheme by using a balanced binary search tree [19]. TCG and TCG-S adopt the same perturbation algorithm; only the packing scheme of TCG-S is faster. The sequence $\Gamma_{-}$ is a topological order of the $C_{h}$ and $C_{v}$ closure graphs and can therefore be determined from $C_{h}$ and $C_{v}$. TCG-S inherits from TCG the transparency of the geometric relations between modules and the fast incremental update for cost evaluation. It also shares the same feasibility properties as TCG.

Given a floorplan, $\Gamma_{-}$ can be derived by recursively extracting the module at the bottom-left corner of the placement, as shown in Fig. 1.7. The run time of this extraction process is not given in [18]. Conversely, $C_{h}$ ($C_{v}$) can be constructed from $\Gamma_{-}$ as follows: for each module $b_{i}$ that precedes $b_{j}$ in $\Gamma_{-}$, add a directed edge from $b_{i}$ to $b_{j}$ if $b_{i} \vdash b_{j}$ ($b_{i} \perp b_{j}$).
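The step that builds $C_{h}$ and $C_{v}$ from $\Gamma_{-}$ can be illustrated as follows. This is a hypothetical sketch; the function name and the `(x, y, w, h)` rectangle format are assumptions. It reads $b_{i} \vdash b_{j}$ as "the right boundary of $b_{i}$ is at or to the left of the left boundary of $b_{j}$". It reads $b_{i} \perp b_{j}$ as the analogous vertical condition, and it assumes a non-overlapping placement.

The text does not say which graph receives the edge when a module is both left of and below another. The sketch arbitrarily prefers $C_{h}$, which can break the transitive-closure property. Its output should therefore be validated against the three TCG properties before use.

```python
def tcg_from_sequence(gamma, rects):
    """Build C_h and C_v edge sets from Gamma_- and a non-overlapping placement.

    gamma: list of module names in Gamma_- order.
    rects: module -> (x, y, w, h), bottom-left corner plus size.
    """
    ch, cv = set(), set()
    for idx, bi in enumerate(gamma):
        xi, yi, wi, hi = rects[bi]
        for bj in gamma[idx + 1:]:
            xj, yj, _, _ = rects[bj]
            # The horizontal test runs first, so a pair that is both left of
            # and below goes to C_h (an arbitrary choice in this sketch).
            if xi + wi <= xj:          # b_i is left of b_j: edge in C_h
                ch.add((bi, bj))
            elif yi + hi <= yj:        # b_i is below b_j: edge in C_v
                cv.add((bi, bj))
            else:
                raise ValueError(f"{bi} is neither left of nor below {bj}")
    return ch, cv


if __name__ == "__main__":
    rects = {"a": (0, 0, 2, 1), "b": (2, 0, 1, 2), "c": (0, 2, 3, 1)}
    print(tcg_from_sequence(["a", "b", "c"], rects))
```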


Figure 1.7. (a)-(f) Process to extract a $\Gamma_{-}$from block placement. (g) Resulting TCG-S.


Figure 1.8. A block placement with sequence $\Gamma_{-}=\langle a, b, c\rangle$


Figure 1.9. Packing scheme for the TCG-S of Fig. 1.8. Each step shows the red-black trees $T_{h}$ and $T_{v}$ corresponding to $R_{h}$ and $R_{v}$ right after the module insertion. $T_{h}^{\prime}$ ($T_{v}^{\prime}$) is the red-black tree obtained after removing the modules no longer in $R_{h}$ ($R_{v}$) and performing rotations to balance the tree. Note that, as a fundamental property of binary search trees, the search-tree (in-order traversal) order is maintained after tree rotations.

Lin et al. [18] proposed an $O(n \log n)$-time packing scheme that uses $\Gamma_{-}$ and the horizontal and vertical contours $R_{h}$ and $R_{v}$, where $n$ is the number of modules in a placement. Modules are processed in the order defined by $\Gamma_{-}$. Each module is packed into a corner formed by two previously placed modules in $R_{h}$ or $R_{v}$, as determined by the geometric relations defined by $C_{h}$ or $C_{v}$.

Definition: The horizontal contour $R_{h}$ (vertical contour $R_{v}$) is the list of modules $b_{i}$ for which there exists no module $b_{j}$ with $y_{j} \geq y_{i}^{\prime}$ and $x_{j}^{\prime} \geq x_{i}^{\prime}$ ($x_{j} \geq x_{i}^{\prime}$ and $y_{j}^{\prime} \geq y_{i}^{\prime}$). Here $\left(x_{i}, y_{i}\right)$ and $\left(x_{i}^{\prime}, y_{i}^{\prime}\right)$ denote the bottom-left and top-right coordinates of $b_{i}$.

The coordinates of the right and top boundaries of the modules in $R_{h}$ and $R_{v}$ are kept sorted in the red-black search trees [19] $T_{h}$ and $T_{v}$, respectively. To pack module $b_{j}$, the scheme searches for the last module $b_{p}$ with $b_{p} \vdash b_{j}$ (or $b_{p} \perp b_{j}$). It then computes the x-coordinate (or y-coordinate) of $b_{j}$ from the geometric relation between $b_{p}$ and $b_{j}$.

The search starts at the root. It moves from a node $b_{k}$ to its right child if $b_{k} \vdash b_{j}$ ($b_{k} \perp b_{j}$), i.e., if the right (top) boundary of $b_{j}$ is larger than that of $b_{k}$, because $b_{j}$ then belongs in the right sub-tree of $b_{k}$. It moves to the left child of $b_{k}$ if $b_{k} \perp b_{j}$ ($b_{k} \vdash b_{j}$). The process continues until a leaf position is reached, where $b_{j}$ is inserted as a leaf node. Fig. 1.9 shows an example of the TCG-S packing scheme for the placement of Fig. 1.8 with sequence $\Gamma_{-}=\langle a, b, c\rangle$.

For placement, [34], [35], and [37] used a symmetric-feasible sequence pair representation to develop symmetry-constraint-driven placement tools. To explore symmetric-feasible sequence pairs, one might be tempted to make only a minor change to the search-space exploration. If the current encoding is consistent with the symmetry constraints, the cost of the placement configuration is evaluated and the annealing algorithm operates normally. Otherwise, the current encoding is infeasible from the symmetry point of view and is discarded. Unfortunately, such a simple solution is not effective. Without symmetry constraints, the search space contains $(n!)^{2}$ sequence pairs, and the feasible subspace is significantly smaller if the placement configuration must contain a symmetry group. A better strategy is to explore only those sequence pairs that comply with the symmetry constraints: recognize such sequence pairs and efficiently restrict the annealer's exploration to their subspace.

In [38], SP was used to tackle the placement problem with boundary constraints. The work in [36] adopts a new constraint-driven placement approach based on constraint extraction through topology and signal-flow analysis. Constraints are classified according to their criticality and flexibility, and the least flexible constraints have the highest priority in the optimization process.

In high-performance circuits, groups of devices must be placed symmetrically with respect to each other. This matches the layout-induced parasitics and the mechanical stresses of the fabrication process within each symmetry group. Failing to meet matching constraints leads to different parasitic resistances and capacitances at the differential output nodes. Such parasitic mismatch leads to a higher offset voltage at the input differential pair and hence to a lower gain and common-mode rejection ratio. Balasa et al. [34] proposed a method to realize and handle symmetry constraints in the block placement problem using the sequence pair representation. Only symmetric-feasible sequence pairs are explored and passed to the annealer for area optimization.

Dong et al. [36] proposed a constraint-driven placement technique for analog integrated circuits in which constraints are prioritized according to their criticality. This classification facilitates the search for a better placement solution by reducing device mismatch and the parasitics on the critical paths indicated by the extracted constraints. Circuit constraints are extracted from topology and signal-flow analysis, combined with heuristic knowledge of analog design. Symmetry and matching constraints are extracted through graph isomorphism, by recognizing primitive cells during signal-flow analysis. Each constraint is assigned a priority value indicating its critical effect on the performance of the analog circuit; e.g., a differential pair has a higher priority than other symmetry constraints. The objective function covers area, wirelength, and critical-path minimization, and it is optimized with the less-flexibility-first (LFF) algorithm.

Previous work [39-45] has handled the placement congestion problem by employing routing-aware algorithms within the placement problem. The goal is to guarantee the reliability and routability of the optimal placement solution. Constraint-driven placement optimization is greedy and results in a compact placement solution whose feasibility is questionable in terms of the reliability and routability of the placed modules. To make the solution feasible, devices with high net congestion should be separated to create free space for successful routing. One approach [44] is to expand devices with high net congestion during placement and then release them to create routing channels. A probabilistic model determines which devices need to expand and their corresponding expansion levels.

The operational amplifier is one of the most fundamental components in analog integrated circuit design. An essential task is to provide a high-performance opamp with high gain, high bandwidth, and fast settling time. High-speed opamps use only one stage to reduce device parasitics and thereby achieve a higher bandwidth. Telescopic and folded cascode opamps are commonly used for this purpose.

The aim of this research is to present an optimized framework for operational amplifiers that couples circuit design, accounting for process variation, with layout. The automated layout process includes three steps:

- floorplan design for area minimization,
- device placement that accounts for symmetry constraints, and
- optimization-based transistor-level routing.

The framework thus helps introduce the concept of an optimized standard cell, which is well established in the digital flow, into analog circuit design.

## 2. Circuit level synthesis

### 2.1. Folded Cascode OTA

### 2.1.1. Introduction

The folded cascode operational transconductance amplifier (OTA) is one of the most widely used topologies in analog circuits. It is a single-stage amplifier, since its only high-impedance node is the output. It is considered self-compensated because its high output impedance places the dominant pole at the output node. The load capacitance usually provides the compensation. A larger load capacitance therefore makes the amplifier more stable at the expense of a lower bandwidth. The folded cascode OTA provides a higher output swing than the telescopic OTA because the input differential pair sits in a separate branch. As a result, the output swing is limited by the overdrive voltages of only four transistors instead of five, as in the telescopic OTA.

### 2.1.2. Basic Operation

The idea behind the folded cascode amplifier is to cascode the input differential pair with transistors of the complementary device type. The input pair converts the applied input voltages into currents, and these currents are fed into a common-gate stage. Fig. 2.1 shows the schematic of the folded cascode OTA. The static current consumption is given by:
$I_{\text {Tot. }}=2 * I_{3}+I_{\text {bias }}+I_{\text {CMFB }}$
The resistance at the output node can be calculated by:
$R_{\text {out }}=R_{\text {down }} / / R_{\text {up }} \approx\left(\mathrm{gm}_{5} \mathrm{ro}_{5}\left(\mathrm{ro}_{3} / / \mathrm{ro}_{1}\right)\right) / /\left(\mathrm{gm}_{7} \mathrm{ro}_{7} \mathrm{ro}_{9}\right)$
Therefore, the DC gain of the amplifier is given by:
$A_{v}=G_{m} R_{\text {out }}=-g m_{1}\left(\left(g m_{5} r o_{5}\left(r o_{3} / / r o_{1}\right)\right) / /\left(g m_{7} r o_{7} r o_{9}\right)\right)$
The output swing, which is the difference between $V_{out, \max}$ and $V_{out, \min}$, is calculated as follows:
$V_{out, \max}=\min \left(V_{b, 1}+V_{t h, 7},\; V_{d d}-V_{o d, 9}-V_{o d, 7}\right)$
$V_{out, \min}=\max \left(V_{o d, 3}+V_{o d, 5},\; V_{b, 2}-V_{t h, 5}\right)$


Figure 2.1. Folded cascode OTA circuit diagram
$V_{out, \min}$ and $V_{out, \max}$ are determined by the DC bias of the circuit. The maximum output voltage swing is achieved under the conditions:
$V_{b, 2} \leq V_{o d, 3}+V_{o d, 5}+V_{t h, 5}$
$V_{b, 1} \geq V_{d d}-V_{o d, 7}-V_{o d, 9}-V_{t h, 7}$

Therefore, the absolute maximum output voltage swing is given by:
$\text{Swing}_{out}=V_{d d}-\left(V_{o d, 3}+V_{o d, 5}+V_{o d, 7}+V_{o d, 9}\right)$
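The swing equations can be checked numerically. In the hypothetical sketch below, all overdrive and threshold voltages are magnitudes, so $V_{th,7}$ stands for $|V_{th}|$ of the PMOS cascode. The example values are arbitrary and are not taken from any design in this thesis. Setting $V_{b,2}$ and $V_{b,1}$ exactly at their bounds makes $V_{out,\max}-V_{out,\min}$ equal to $\text{Swing}_{out}$.

```python
def bias_limits(vdd, vod3, vod5, vod7, vod9, vth5, vth7):
    """Bias bounds and maximum swing of the folded cascode output.

    vb2_max: V_b,2 <= V_od,3 + V_od,5 + V_th,5
    vb1_min: V_b,1 >= V_dd - V_od,7 - V_od,9 - V_th,7 (magnitudes)
    """
    vb2_max = vod3 + vod5 + vth5
    vb1_min = vdd - vod7 - vod9 - vth7
    swing = vdd - (vod3 + vod5 + vod7 + vod9)
    return vb2_max, vb1_min, swing


def vout_range(vdd, vb1, vb2, vod3, vod5, vod7, vod9, vth5, vth7):
    """V_out,min and V_out,max for given cascode gate biases."""
    vout_max = min(vb1 + vth7, vdd - vod9 - vod7)
    vout_min = max(vod3 + vod5, vb2 - vth5)
    return vout_min, vout_max


if __name__ == "__main__":
    # Illustrative values only: 1.8 V supply, 150 mV overdrives, 0.45 V thresholds.
    vdd, vod, vth = 1.8, 0.15, 0.45
    vb2, vb1, swing = bias_limits(vdd, vod, vod, vod, vod, vth, vth)
    lo, hi = vout_range(vdd, vb1, vb2, vod, vod, vod, vod, vth, vth)
    print(f"Vb2 <= {vb2:.2f} V, Vb1 >= {vb1:.2f} V, swing = {swing:.2f} V")
    print(f"Vout in [{lo:.2f}, {hi:.2f}] V")
```

Pushing either bias past its bound shrinks the range returned by `vout_range`, which is the trade-off the two bias conditions capture.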
The input common-mode range, which is the difference between $V_{in, CM, \max}$ and $V_{in, CM, \min}$, is calculated as follows:
$V_{in, CM, \min}=0$
$V_{in, CM, \max}=V_{d d}-\left(V_{o d, C s}+V_{g s, 1}\right)$

Since the input differential pair is PMOS, the input common-mode level can be lowered to 0 V without driving the PMOS devices into the cut-off region.

The maximum input common-mode range is given by:
$CMR_{input}=V_{in, CM, \max}-V_{in, CM, \min}=V_{d d}-\left(V_{o d, C s}+V_{g s, 1}\right)$
The unity gain frequency is calculated as follows:
$f_{u} \approx \frac{g m_{1}}{2 * \pi * C_{\text {out }}}$
Bandwidth of the OTA, which represents its dominant pole, can be approximated by:
$B W=f_{p d}=\frac{f_{u}}{A_{v}} \approx \frac{1}{2 * \pi * R_{\text {out }} * C_{\text {out }}}$
Where,
$C_{o u t}=C_{L}+C_{g d, 5}+C_{g d, 7}+C_{d b, 5}+C_{d b, 7}$
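The expressions for $R_{out}$, $A_{v}$, $f_{u}$, and $BW$ can be evaluated together in a few lines. The sketch below is illustrative only:

- The small-signal values (`gm*`, `ro*`) and `c_out` are assumed inputs that would come from a DC operating point.
- `par` is a hypothetical helper for the parallel operator $/ /$.
- The gain is returned as a magnitude.

The sketch also confirms numerically that $f_{u}/A_{v}$ and $1/(2\pi R_{out} C_{out})$ give the same bandwidth.

```python
import math


def par(*resistances):
    """Parallel combination, the '//' operator in the equations."""
    return 1.0 / sum(1.0 / r for r in resistances)


def folded_cascode_ac(gm1, gm5, gm7, ro1, ro3, ro5, ro7, ro9, c_out):
    r_down = gm5 * ro5 * par(ro3, ro1)      # cascoded NMOS side
    r_up = gm7 * ro7 * ro9                  # cascoded PMOS side
    r_out = par(r_down, r_up)
    av = gm1 * r_out                        # |A_v| = Gm * Rout with Gm = gm1
    fu = gm1 / (2 * math.pi * c_out)        # unity-gain frequency
    bw = 1 / (2 * math.pi * r_out * c_out)  # dominant pole, equal to fu / av
    return r_out, av, fu, bw


if __name__ == "__main__":
    # Arbitrary example: all gm = 1 mS, all ro = 100 kOhm, Cout = 1 pF.
    r_out, av, fu, bw = folded_cascode_ac(1e-3, 1e-3, 1e-3,
                                          1e5, 1e5, 1e5, 1e5, 1e5, 1e-12)
    print(f"Rout = {r_out/1e6:.2f} MOhm, Av = {av:.0f} V/V "
          f"({20*math.log10(av):.1f} dB)")
    print(f"fu = {fu/1e6:.1f} MHz, BW = {bw/1e3:.1f} kHz")
```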
The first non-dominant pole is calculated by:
$f_{\text {pnd, } 1}=\frac{1}{2 * \pi * R_{\text {Fnode }} * C_{\text {Fnode }}}$
Where,
$R_{\text {Fnode }}=\left(r o_{1} / / r o_{3}\right) / /\left(\frac{1}{g m_{5}+g m b_{5}}\left(1+\frac{g m_{7} r o_{7} r o_{9}}{r o_{5}}\right)\right) \approx \frac{1}{g m_{5}}$
$C_{F n o d e}=C_{g s, 5}+C_{g d, 3}+C_{g d, 1}+C_{d b, 3}+C_{d b, 1}$
Therefore, $f_{p n d, 1}$ can be approximated by:
$f_{p n d, 1}=\frac{g m_{5}}{2 * \pi * C_{\text {Fnode }}}$
The second non-dominant pole at node X is calculated by:
$f_{p n d, 2}=\frac{1}{2 * \pi * R_{X} * C_{X}}$
Where,

$$
\begin{equation*}
R_{X}=r o_{9} / /\left(\frac{1}{g m_{7}+g m b_{7}}\left(1+\frac{R_{Y}}{r o_{7}}\right)\right) \tag{2.20}
\end{equation*}
$$

$$
\begin{align*}
& R_{Y}=g m_{5} r o_{5}\left(r o_{3} / / r o_{1}\right)  \tag{2.21}\\
& C_{X}=C_{g s, 7}+C_{g d, 9} \tag{2.22}
\end{align*}
$$

Therefore, $f_{p n d, 2}$ can be approximated by:
$f_{p n d, 2}=\frac{g m_{7}}{2 \pi C_{X}}$
The phase margin, which determines the stability of the OTA, is given by:

$$
\begin{equation*}
P M=180^{\circ}-\tan ^{-1}\left(\frac{f_{u}}{f_{p d}}\right)-\tan ^{-1}\left(\frac{f_{u}}{f_{p n d, 1}}\right)-\tan ^{-1}\left(\frac{f_{u}}{f_{p n d, 2}}\right) \tag{2.24}
\end{equation*}
$$
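Eq. (2.24) is straightforward to evaluate once the three pole frequencies are known. In the minimal sketch below, the function name is hypothetical and the example values are arbitrary. The non-dominant poles are computed from the approximations $g m_{5} /\left(2 \pi C_{Fnode}\right)$ and $g m_{7} /\left(2 \pi C_{X}\right)$.

```python
import math


def phase_margin_deg(fu, fpd, fpnd1, fpnd2):
    """Phase margin in degrees from Eq. (2.24), one arctan term per pole."""
    return 180.0 - sum(math.degrees(math.atan(fu / fp))
                       for fp in (fpd, fpnd1, fpnd2))


if __name__ == "__main__":
    fu = 100e6                               # unity-gain frequency, Hz
    fpd = fu / 3000                          # dominant pole for a gain of about 3000
    fpnd1 = 1e-3 / (2 * math.pi * 0.5e-12)   # gm5 = 1 mS, C_Fnode = 0.5 pF
    fpnd2 = 2e-3 / (2 * math.pi * 0.4e-12)   # gm7 = 2 mS, C_X = 0.4 pF
    print(f"PM = {phase_margin_deg(fu, fpd, fpnd1, fpnd2):.1f} deg")
```

Because $f_{u}/f_{pd}=A_{v}$ is large, the dominant-pole term contributes almost exactly $90^{\circ}$. The margin is therefore set mainly by how far the folding-node pole and the node-X pole sit above $f_{u}$.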



Figure 2.2. Common Mode FeedBack Circuit
The output voltage level of the amplifier is determined by the common-mode level of the input differential signals. Because the output node has a high impedance, its DC level is hard to set. A negative feedback loop is therefore required to adjust the output voltage so that the output currents on both sides of the amplifier are balanced. The output common-mode level is sensed by connecting the outputs to the gates of the sense transistors in the CMFB circuit shown in Fig. 2.2.

### 2.1.4. Bias Circuit

Voltage biasing results in large current variations because of process variations. Current biasing keeps the device current constant and largely independent of process variations. A simple current mirror has a low output impedance, which causes large variations in the mirrored current. A cascode current mirror is therefore used to increase the output impedance and reduce the variations in the DC output current. With the cascode, the mirrored current becomes much less sensitive to variations in the output voltage, so the current is mirrored to the output transistor far more accurately.

The cascode current mirror circuit shown in Fig. 2.3 is used to simplify the design flow. The aspect ratios of the devices are chosen so that the sizing in both branches is related to the current by Eq. (2.26) and (2.27). PMOS devices are required to mirror the current to the input-stage current source, the cascode load, and the CMFB circuit. The currents supplied by the bias circuit to the OTA are adjusted through the sizing ratios between the mirrored devices.
$$
\begin{equation*}
\frac{(W / L)_{3}}{(W / L)_{0}}=\frac{(W / L)_{2}}{(W / L)_{1}} \tag{2.26}
\end{equation*}
$$

$$
\begin{equation*}
\frac{I_{2}}{I_{1}}=\frac{(W / L)_{2}}{(W / L)_{1}} \tag{2.27}
\end{equation*}
$$

### 2.1.5. Advantages

- Large gain due to high output resistance.
- Moderate output swing.
- Can be used as a unity-gain buffer, as its output swing is relatively higher than that of other amplifier architectures, e.g., the telescopic cascode.
- Higher bandwidth compared to telescopic cascode amplifier due to lower impedance at the output node.


### 2.1.6. Disadvantages

- Large power dissipation compared to telescopic and two stage miller compensated amplifiers.
- Lower phase margin compared to telescopic cascode amplifier due to higher capacitance value at folding node.


Figure 2.3. Folded Cascode OPAmp Bias circuit

### 2.2. Sensitivity Analysis

The large number of parameters in complex analog circuit models is a significant problem in their design, since parameter estimation becomes a high-dimensional, multi-modal, and predominantly non-linear problem. A wide range of optimization algorithms has been adopted to resolve this problem. However, these algorithms are neither feasible nor efficient for identifying the performance-dominating parameters in a non-monotonic, multi-dimensional design space.

Sensitivity analysis (SA) facilitates the search for the most influential parameters in the circuit. It can reduce the total number of parameters in the optimization process or quantify interaction effects between input parameters within the circuit model. SA tools are of immense value because they show how the uncertainty in the model output can be apportioned to different sources of uncertainty in the model inputs. SA has a wide scope of usage and applications: model understanding, verification, model simplification, and prioritization of model parameters.

Definition of sensitivity analysis involves models, inputs and outputs. In order to define model input with respect to uncertainty and the sensitivity analysis, a model can be classified into:

- Diagnostic or prognostic: the model is used either to understand a law or to predict the behavior of a system given an understood law. Models can thus range from speculations to accurate predictions of a system.
- Data-driven or law-driven: a law-driven model combines trusted laws attributed to the system to predict its behavior. A data-driven model treats the components of a system as signals and derives their properties statistically. Law-driven models are better able to describe system behavior under unobserved circumstances. Data-driven models are limited to the behavior captured by the data used in their estimation.

The definition of model input depends on the model under study. For the purposes of uncertainty and sensitivity analysis, a model input is defined here as any parameter that drives variations in the model output.


Figure 2.4. Parametric bootstrap version of uncertainty and sensitivity analysis
Consider the flow chart in Fig. 2.4. At the end of the estimation step, the parameter values and their errors are known. The model is assumed to be true. Uncertainty analysis then propagates the uncertainty in the model parameters all the way to the model output, from which the average output and its standard deviation are computed.

This analysis can be repeated for a sufficiently large number of parameter variations, hence the name "parametric bootstrap". In each iteration, the uncertainty in the parameters is propagated through the model and the average output and standard deviation are computed. Repeating the process increases the accuracy of the output statistics and reduces errors. Sensitivity analysis is then performed to determine which input parameters most influence the uncertainty of the model output.

The objectives and input parameters for uncertainty and sensitivity analysis must be selected carefully. The more parameters are considered as inputs, the greater the variance to be expected in the model output.

### 2.2.1. Classification of sensitivity analysis

Sensitivity analysis can serve several useful purposes in modelling: it can uncover errors in the model, establish research priorities, and simplify models. SA can be categorized into two approaches, local and global analysis.

Local analysis studies the effect on the model output of small input perturbations around nominal values, e.g., the mean of an input variable. Local SA is a deterministic approach: the output variations due to these small perturbations are obtained by computing the partial derivatives of the model at a given point. This derivative-based approach is efficient in terms of run time, because the model needs to be executed only a few times, according to the dimension of the array of derivatives.

Its weakness is that it cannot account for uncertain model inputs or for a model of unknown linearity. Derivatives are informative only around the nominal value where they are computed, so they do not explore the rest of the input parameter space. This disadvantage has little or no effect for linear systems, but it matters greatly when the system is non-linear and non-monotonic.

The very basic definition of sensitivity Index (SI) is given by:
$S I_{i}=\frac{y_{i}^{\max }-y_{i}^{\min }}{y_{i}^{\max }}$

where $y_{i}^{\max}$ and $y_{i}^{\min}$ are the maximum and the minimum of $y\left(x_{i}^{\min}\right)$ and $y\left(x_{i}^{\max}\right)$. Each of these is computed with all other inputs held at their nominal values $x_{0}$. Variable $x_{i}$ is moved one-at-a-time (OAT) to its respective $x_{i}^{\max}$ and $x_{i}^{\min}$.
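The one-at-a-time index needs only two model runs per input. The sketch below assumes, as a hypothetical reading of the definition, that $y_{i}^{\max}$ and $y_{i}^{\min}$ are the larger and smaller of $y(x_{i}^{\min})$ and $y(x_{i}^{\max})$, with all other inputs at nominal. Here `model` is any callable that takes a list of inputs and returns a non-zero scalar. The toy model is illustrative and is not a circuit.

```python
def oat_sensitivity_index(model, x0, i, xi_min, xi_max):
    """SI_i = (y_i_max - y_i_min) / y_i_max, moving only input i (OAT).

    model: callable taking a list of inputs and returning a scalar.
    x0: nominal input values; all inputs except i stay at x0.
    """
    x_low = list(x0)
    x_low[i] = xi_min
    x_high = list(x0)
    x_high[i] = xi_max
    y_a, y_b = model(x_low), model(x_high)
    y_max, y_min = max(y_a, y_b), min(y_a, y_b)
    return (y_max - y_min) / y_max


if __name__ == "__main__":
    model = lambda x: 2.0 * x[0] + x[1]   # toy model, not a circuit
    x0 = [1.0, 1.0]
    for i in range(2):
        print(i, oat_sensitivity_index(model, x0, i, 0.5, 1.5))
```

Both inputs are swept over the same range, but the first input yields the larger index because its coefficient is larger. Since the other inputs stay fixed, the index cannot reveal interactions between them.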

### 2.2.2. Local Sensitivity analysis

In local sensitivity analysis, the sensitivity of $f(x)$ can be estimated from its second-order Taylor series expansion around the nominal point $x_{0}$:
$f\left(x_{0}+\Delta\right)=f\left(x_{0}\right)+\sum_{i=1}^{k} \frac{\partial f\left(x_{0}\right)}{\partial x_{i}} \Delta_{i}+\frac{1}{2} \sum_{i=1}^{k} \sum_{j=1}^{k} \frac{\partial^{2} f\left(x_{0}\right)}{\partial x_{i} \partial x_{j}} \Delta_{i} \Delta_{j}$

With the OAT approach, which requires $k+1$ runs, a finite-difference approximation of the first-order local sensitivity can be computed as follows:

$$
\begin{equation*}
\frac{\partial f}{\partial x_{i}} \cong \frac{y\left(x_{0, i}+\Delta_{i}\right)-y\left(x_{0, i}\right)}{\Delta_{i}} \tag{2.29}
\end{equation*}
$$

For uncorrelated input variables, the first-order approximations of the expectation and the variance of $Y=f(X)$ are:
$E(Y) \approx f\left(x_{0}\right)$
and
$\operatorname{var}(Y) \approx \sum_{i=1}^{k}\left[\frac{\partial f\left(x_{0}\right)}{\partial x_{i}}\right]^{2} \cdot \operatorname{var}\left(x_{i}\right)$
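The finite-difference estimate (2.29) and the first-order variance propagation formula combine into a short routine. This is a sketch only. It assumes forward differences with user-chosen steps `deltas` and uncorrelated inputs, as in the text. The function names and the toy model are hypothetical.

```python
def local_sensitivities(f, x0, deltas):
    """Forward-difference estimates of df/dx_i at x0 (OAT, k + 1 runs)."""
    y0 = f(x0)
    grads = []
    for i, d in enumerate(deltas):
        x = list(x0)
        x[i] += d                      # perturb only input i
        grads.append((f(x) - y0) / d)
    return y0, grads


def first_order_moments(f, x0, deltas, variances):
    """E(Y) ~ f(x0) and var(Y) ~ sum (df/dx_i)^2 var(x_i), uncorrelated inputs."""
    y0, grads = local_sensitivities(f, x0, deltas)
    var_y = sum(g * g * v for g, v in zip(grads, variances))
    return y0, var_y


if __name__ == "__main__":
    f = lambda x: 3.0 * x[0] - 2.0 * x[1]     # toy linear model
    mean, var = first_order_moments(f, [1.0, 2.0], [1e-6, 1e-6], [0.01, 0.04])
    print(mean, var)
```

For a linear model, these first-order moments are exact. For a non-linear model, they hold only near $x_{0}$, which is the limitation that motivates global SA.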
Local SA considers only local variations and is accurate only within a limited range of linearity. To overcome this main limitation, global SA was introduced in a statistical framework. Global SA considers the whole range of variation of the input variables. It can therefore identify and prioritize the most influential input parameters. It can also identify non-influential parameters, which have a very minor effect on the output uncertainty and can be fixed during design space exploration. In addition, global SA can map the output behavior as a function of the input variables by focusing on certain ranges of input values, and it can be used to calibrate and validate model equations. This section reviews global sensitivity analysis methods, including the variance-based techniques of the ANOVA family.

### 2.2.3. Global Sensitivity analysis

### 2.2.3.1. Regression-based correlation analysis

The correlation coefficient indicates the strength and direction of a linear relationship between two random variables. The best-known coefficient is the Pearson product-moment correlation coefficient, which is calculated by dividing the covariance of the two variables by the product of their standard deviations. The Pearson correlation coefficient is defined as:

$$
\begin{equation*}
\rho_{X, Y}=\frac{E(X Y)-E(X) E(Y)}{\sqrt{E\left(X^{2}\right)-E^{2}(X)} \sqrt{E\left(Y^{2}\right)-E^{2}(Y)}} \tag{2.32}
\end{equation*}
$$

Combined with Monte Carlo simulation, the sample Pearson correlation coefficient is given by:
$r_{x, y}=\frac{\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sqrt{\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)^{2} \sum_{i=1}^{N}\left(y_{i}-\bar{y}\right)^{2}}}$

where $\bar{x}$ is the mean value of $x_{i}$ and $\bar{y}$ is the mean value of $y_{i}$. Correlation coefficient values lie in the interval $[-1,1]$. A value of 0 indicates no linear relationship, and values close to $-1$ or $1$ indicate a strong linear relationship between the random variables under study. Consider a variable $Y$ that depends on a number of variables $X=\left(X_{1}, X_{2}, X_{3}, \ldots, X_{n}\right)$. The correlation coefficient between each input $X_{i}$ and $Y$ can then be used as a sensitivity measure:
$S_{i}=\rho_{X_{i}, Y}$

Correlation is a powerful measure for summarizing linear relationships between variables. In the case of non-linearity, however, it may lead to wrong conclusions. Correlation analysis therefore cannot replace individual examination of the data.
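A Monte Carlo version of the correlation-based sensitivity measure is sketched below. The samplers, sample size, and seed are illustrative assumptions, and `pearson` implements the sample formula for $r_{x, y}$ directly.

The final usage line illustrates the non-linearity caveat. For $y=x^{2}$ with $x$ uniform on $[-1,1]$, $r$ is close to 0 even though $y$ depends strongly on $x$.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient r_{x,y}."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_sensitivities(model, samplers, n=10000, seed=0):
    """S_i = r(X_i, Y) estimated from n Monte Carlo samples."""
    rng = random.Random(seed)
    X = [[draw(rng) for draw in samplers] for _ in range(n)]
    Y = [model(x) for x in X]
    return [pearson([x[i] for x in X], Y) for i in range(len(samplers))]


if __name__ == "__main__":
    uniform01 = lambda rng: rng.uniform(0.0, 1.0)
    linear = lambda x: 2.0 * x[0] + 0.1 * x[1]
    print(correlation_sensitivities(linear, [uniform01, uniform01]))
    # Non-linear caveat: strong dependence, near-zero linear correlation.
    sym = lambda rng: rng.uniform(-1.0, 1.0)
    print(correlation_sensitivities(lambda x: x[0] ** 2, [sym]))
```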

The Pearson correlation coefficient is combined with the regression coefficient obtained by linear regression analysis. Regression analysis also indicates the strength and direction of the relationship between two random variables $X$ and $Y$. One random variable is defined to be dependent. It is modeled as a function of an independent variable, its parameters, and a random error term. In linear regression, modeling $n$ data points involves one independent variable $x_{i}$, two parameters $a$ and $b$, and an error term $\varepsilon_{i}$.

$$
\begin{equation*}
y_{i}=a+b x_{i}+\varepsilon_{i} \tag{2.35}
\end{equation*}
$$

To compute $a$ and $b$, the least-squares method is used as follows:
$\hat{a}=\bar{y}-\hat{b} \bar{x}$
$\hat{b}=\frac{\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)\left(y_{i}-\bar{y}\right)}{\sum_{i=1}^{N}\left(x_{i}-\bar{x}\right)^{2}}$

The relation between linear regression and the Pearson correlation coefficient is given by
$\hat{b}=r_{x, y} \frac{S_{y}}{S_{x}}$
where $S_{y}$ and $S_{x}$ are the standard deviations of the $y$ and $x$ data points, respectively.

The coefficient of determination $R_{x, y}^{2}$ gives the proportion of the variability in the data that is explained by the linear regression. The unexplained variability is measured by computing the residuals as follows:
$\hat{u}_{i}=y_{i}-\left(\hat{a}+\hat{b} x_{i}\right)$

Hence, the coefficient of determination $R_{x, y}^{2}$ can be calculated as follows:
$R_{x, y}^{2}=1-\frac{\sum_{i=1}^{N} \hat{u}_{i}^{2}}{\sum_{i=1}^{N}\left(y_{i}-\bar{y}\right)^{2}}$
For simple linear regression, $R_{x, y}^{2}$ equals the square of the Pearson correlation coefficient $r_{x, y}$.
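The least-squares estimates and the coefficient of determination can be computed directly from the formulas above. The hypothetical `linear_fit` below returns $\hat{a}$, $\hat{b}$, and $R_{x, y}^{2}$. The usage lines check numerically that $R_{x, y}^{2}=r_{x, y}^{2}$ and $\hat{b}=r_{x, y} S_{y}/S_{x}$ for simple linear regression. The data values are made up for illustration.

```python
import math


def linear_fit(xs, ys):
    """Least-squares fit y = a + b x; returns (a_hat, b_hat, R^2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    ss_res = sum(u * u for u in residuals)
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.1, 2.9, 5.2, 6.8, 9.1]
    a, b, r2 = linear_fit(xs, ys)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    print(a, b, r2)
    # S_y / S_x reduces to sqrt(syy / sxx) because the (n - 1) factors cancel.
    print(math.isclose(r2, r ** 2), math.isclose(b, r * math.sqrt(syy / sxx)))
```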

### 2.2.3.2. Variance-based approaches

The models under study are described by a function $Y=f(X)$. Here $X=\left(X_{1}, X_{2}, X_{3}, \ldots, X_{n}\right)$ is a random input vector consisting of $n$ random variables. $Y=\left(Y_{1}, Y_{2}, Y_{3}, \ldots, Y_{m}\right)$ denotes the random output vector, whose components are functions of the random input variables. $f(X)$ can be decomposed into summands of increasing order:
$f(X)=f_{0}+\sum_{i=1}^{n} f_{i}\left(X_{i}\right)+\sum_{1 \leq i<j \leq n} f_{i, j}\left(X_{i}, X_{j}\right)+\cdots+f_{1,2, \ldots, n}\left(X_{1}, \ldots, X_{n}\right)$
Each random model response $Y_{j}$, where $j=1,2, \ldots, m$, can be characterized by its variance $V^{j}$. According to equation (2.42), each variance $V^{j}$ is decomposed into partial variances corresponding to the single random input variables $X_{1}, X_{2}, X_{3}, \ldots, X_{n}$ and to their combinations. According to equation (2.43), each partial variance is then related to a sensitivity measure:
$V^{j}=\sum_{i=1}^{n} V_{i}^{j}+\sum_{1 \leq i<k \leq n} V_{i, k}^{j}+\cdots+V_{1,2, \ldots, n}^{j}$
$S_{i_{1}, \ldots, i_{s}}^{j}=\frac{V_{i_{1}, \ldots, i_{s}}^{j}}{V^{j}}$, where $1 \leq i_{1}<i_{2}<\cdots<i_{s} \leq n$

Each sensitivity measure calculated by equation (2.43) describes what fraction of the variance $V^{j}$ is generated by the randomness of the associated random input variables and their mapping onto the output variables. As a special case, the sensitivity measures $S_{i}^{j}$, which describe the sole influence of single input variables $X_{i}$, are called main effects. The sensitivity measures $S_{i_{1}, \ldots, i_{s}}^{j}$, which describe the influence of combinations of input variables, are called interaction effects.

To evaluate the total effect of a single input variable $X_{i}$, all partial sensitivity indices that involve $X_{i}$ are summed into the total sensitivity measure $S_{T i}^{j}$. The total sensitivity measures therefore account for interactions among input variables. To quantify what fraction of each variance $V^{j}$ is generated by a single input variable $X_{i}$, the corresponding total sensitivity measure $S_{T i}^{j}$ can be normalized as follows:

$$
\begin{equation*}
\operatorname{norm}\left(S_{T i}^{j}\right)=\frac{S_{T i}^{j}}{\sum_{k=1}^{n} S_{T k}^{j}} \tag{2.44}
\end{equation*}
$$
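In practice, the indices are estimated by Monte Carlo sampling. The sketch below uses NumPy and two common estimators on independent inputs uniform on $[0,1]$: a Saltelli-type estimator for the first-order index $S_{i}$ and a Jansen-type estimator for the total index $S_{T i}$. These estimator choices, the function name, and the sample size are assumptions for illustration. They are not necessarily what OASYN uses. The last returned array applies the normalization (2.44).

```python
import numpy as np


def sobol_indices(model, n_inputs, n_samples=20000, seed=0):
    """Monte Carlo first-order (S), total (ST) and normalized total indices.

    model: vectorized callable mapping an (N, n_inputs) array to N outputs.
    Inputs are assumed independent and uniform on [0, 1].
    """
    rng = np.random.default_rng(seed)
    A = rng.uniform(size=(n_samples, n_inputs))
    B = rng.uniform(size=(n_samples, n_inputs))
    f_a, f_b = model(A), model(B)
    var = np.var(np.concatenate([f_a, f_b]))
    S = np.empty(n_inputs)
    ST = np.empty(n_inputs)
    for i in range(n_inputs):
        AB = A.copy()
        AB[:, i] = B[:, i]              # A with column i taken from B
        f_ab = model(AB)
        S[i] = np.mean(f_b * (f_ab - f_a)) / var        # Saltelli-type first-order
        ST[i] = 0.5 * np.mean((f_a - f_ab) ** 2) / var  # Jansen-type total effect
    return S, ST, ST / ST.sum()


if __name__ == "__main__":
    # Additive test function: exact indices are 0.8, 0.2 and 0.
    f = lambda X: 4.0 * X[:, 0] + 2.0 * X[:, 1] + 0.0 * X[:, 2]
    S, ST, ST_norm = sobol_indices(f, 3)
    print(np.round(S, 3), np.round(ST, 3), np.round(ST_norm, 3))
```

The test function is additive, so first-order and total indices coincide. For a circuit model with interacting parameters, $S_{T i}-S_{i}$ would measure the share of variance due to interactions involving $X_{i}$.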

### 2.3. Overview of OASYN framework

Figure 2.5 shows an overview of the OASYN framework. The tool takes the following inputs:

- one of two well-known operational amplifier topologies, the folded cascode or the two-stage Miller-compensated amplifier, chosen according to designer preference;
- the required specifications, e.g., gain, GBW, phase margin, output swing, slew rate, load capacitance, technology node, input common-mode voltage level, and maximum power consumption;
- electrical connectivity constraints, e.g., maximum current density information for each circuit net;
- matching constraints for device group placement, along with matching styles.

The circuit synthesizer generates a rough initial sizing estimate based on the analytical circuit equations. The tool then performs sensitivity analysis with Sobol' indices in the circuit sizing optimization engine. A Pareto-optimal set is generated for immediate translation of the specifications into a fully sized topology.

To the authors' knowledge, this is the first work that examines the whole design space through sensitivity analysis to account for the uncertainty of the non-linear behavior of analog circuits. It does so by quantifying higher-order interactions between circuit parameters while taking extreme process, supply, and temperature variations into consideration.


Figure 2.5. Overview of the OASYN framework

The layout generator tool consists of three main processes. First, a fixed-outline floorplanner is implemented that employs multi-objective optimization of area and wirelength while accounting for block-placement matching constraints. For fast incremental update during cost evaluation, this work proposes a new, simple, efficient, and fast floorplan solution-perturbing algorithm with $\mathrm{O}(\mathrm{n})$ runtime complexity, called TCG-S*. The algorithm integrates the advantages of the TCG and SP representations while eliminating their disadvantages, yielding a superior topology update scheme that facilitates the search for the optimal floorplan.

To enhance the routability and reliability of the packed optimal placement solution, a routing-aware algorithm is implemented within the placement process. It addresses the congestion problem by smoothing the densities between placed blocks while preserving the relative locations of the modules. An annealing-based detailed net routing is then executed to generate a DRC-clean layout.

### 2.4. Circuit sizing tool

The main purpose of sensitivity analysis (SA) is to determine the model parameters that most influence a model response, and hence to reduce the computational complexity of optimization. Sensitivity analysis can be local or global. The highest-priority parameters in one part of the design space may not be the same as in another, which highlights the importance of global SA. In addition, the importance of a subset of variables may stem from the interactions among these variables rather than from the sum of their individual importances. Sensitivity-analysis-based optimization has been employed in previous works [5], [7], [8], [46], and [47]. The variance-based Sobol' method efficiently quantifies such synergistic effects, along with the uncertainties in the model inputs and their effect on the model output.

### 2.4.1. The Sobol' Sensitivity Analysis

The Sobol' decomposition [51, 52] belongs to the family of ANOVA (analysis of variance) techniques. The indices that quantify how individual parameters, and interactions of two or more parameters, contribute to the output variance are called Sobol' indices. Consider a function $F(\xi)$ of the input variables $\xi_{i}$, $i=1, \ldots, d$, where $d$ is the total number of input variables and $\Omega^{d}$ is the $d$-dimensional input domain. Its decomposition is defined by

$$
\begin{equation*}
F(\xi)=\sum_{u \subseteq\{1,2, \ldots, d\}} F_{u}\left(\xi_{u}\right) \tag{2.45}
\end{equation*}
$$

where $u$ is a set of integers (a subset of $\{1,2, \ldots, d\}$), $\xi_{u}=\left(\xi_{u_{1}}, \ldots, \xi_{u_{s}}\right)$, $s=|u|$, and $F_{\emptyset}=F_{0}$. To calculate the effect of certain input variables on the output uncertainty, $u$ represents these variables as a subset of the whole variable set in Eq. (2.45), as shown later in this section. Eq. (2.45) is expanded as follows:

$$
\begin{align*}
F(\xi)= & F_{0}+\sum_{1 \leq i \leq d} F_{i}\left(\xi_{i}\right)+\sum_{1 \leq i<j \leq d} F_{i j}\left(\xi_{i}, \xi_{j}\right)+\cdots \\
& +F_{12 \ldots d}\left(\xi_{1}, \ldots, \xi_{d}\right) \tag{2.46}
\end{align*}
$$

In this expansion, the individual terms can be calculated as

$$
\begin{equation*}
F_{0}=\int_{\Omega^{d}} F(\xi)\, d \xi \tag{2.47}
\end{equation*}
$$

$$
\begin{equation*}
F_{u}\left(\xi_{u}\right)=\int_{\Omega^{d-|u|}} F(\xi)\, d \xi_{\sim u}-\sum_{\substack{v \subset u \\ v \neq u}} F_{v}\left(\xi_{v}\right) \tag{2.48}
\end{equation*}
$$

where $\xi_{\sim u}$ denotes $\xi$ with the variables in $u$ excluded; for a single variable $b$,

$$
\begin{equation*}
\xi_{\sim\{b\}}=\left(\xi_{1}, \ldots, \xi_{b-1}, \xi_{b+1}, \ldots, \xi_{d}\right) \tag{2.49}
\end{equation*}
$$

Equation (2.50) defines the total variance of the output function $F(\xi)$, denoted by $D$, and $D_{u}$ in Eq. (2.51) denotes the partial output variance in response to the set of input variables $u$:

$$
\begin{equation*}
D=\int_{\Omega^{d}} F^{2}(\xi)\, d \xi-F_{0}^{2} \tag{2.50}
\end{equation*}
$$

$$
\begin{equation*}
D_{u}=\int_{\Omega^{|u|}} F_{u}^{2}\left(\xi_{u}\right) d \xi_{u} \tag{2.51}
\end{equation*}
$$

With $y=F(\xi)$, $D_{u}$ can be represented as a recursive function of conditional variances:

$$
\begin{equation*}
D_{u}=V\left(E\left[y \mid \xi_{u}\right]\right)-\sum_{\substack{v \subset u \\ v \neq u,\ v \neq \emptyset}} D_{v} \tag{2.52}
\end{equation*}
$$

Therefore, $D$ can be represented as the sum of the partial variances $D_{u}$:

$$
\begin{equation*}
D=\sum_{\substack{u \subseteq\{1,2, \ldots, d\} \\ u \neq \emptyset}} D_{u} \tag{2.53}
\end{equation*}
$$

$D_{u}$ measures the part of the variance of the output $F(\xi)$ caused by the interaction among the elements of $u$, after subtracting the contributions of its proper subsets $v \subset u$. The Sobol' sensitivity indices are then calculated by:

$$
\begin{equation*}
S_{u}=\frac{D_{u}}{D} \tag{2.54}
\end{equation*}
$$

$$
\begin{equation*}
\sum_{\substack{u \subseteq\{1,2, \ldots, d\} \\ u \neq \emptyset}} S_{u}=1 \tag{2.55}
\end{equation*}
$$

where $S_{u}$ measures the sensitivity of $F(\xi)$ to the interaction among the elements of $u$, excluding the effect each variable has separately on the variance of the output function. A total of $2^{d}-1$ sensitivity indices must be calculated to determine the most significant design parameters.
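The recursion for $D_{u}$ in Eq. (2.52) and the normalization $S_{u}=D_{u}/D$ can be checked on a toy model whose conditional variances are known in closed form. The Python sketch below is illustrative and is not the OASYN implementation. It takes the closed variances $V(E[y \mid \xi_{u}])$ as given, subtracts the partial variances of every nonempty proper subset, and divides by $D$. The model $y=\xi_{1}+2\xi_{2}$ with independent uniform inputs on $[0,1]$ and the function names are assumptions chosen for the example.

```python
from fractions import Fraction
from itertools import combinations


def nonempty_subsets(d):
    """All 2^d - 1 nonempty subsets of {1, ..., d}, ordered by size."""
    return [u for k in range(1, d + 1) for u in combinations(range(1, d + 1), k)]


def sobol_from_closed(closed_var, d):
    """closed_var[u] = V(E[y | xi_u]); returns partial variances D_u and indices S_u."""
    D_part = {}
    for u in nonempty_subsets(d):
        # subtract the partial variances of every nonempty proper subset v of u
        proper = [v for k in range(1, len(u)) for v in combinations(u, k)]
        D_part[u] = closed_var[u] - sum(D_part[v] for v in proper)
    D_total = closed_var[tuple(range(1, d + 1))]  # conditioning on all inputs gives D
    S = {u: D_part[u] / D_total for u in D_part}
    return D_part, S


# Toy model y = xi_1 + 2*xi_2, inputs uniform on [0, 1] (each has variance 1/12).
closed = {(1,): Fraction(1, 12), (2,): Fraction(4, 12), (1, 2): Fraction(5, 12)}
D_part, S = sobol_from_closed(closed, 2)
# S[(1,)] = 1/5, S[(2,)] = 4/5, S[(1, 2)] = 0: the toy model has no interaction term
```

Exact fractions are used so that the property of Eq. (2.55), indices summing to exactly 1, can be checked without rounding error.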

### 2.4.2. Computation of Sobol' Indices by Monte-Carlo Sampling

Evaluating these variance integrals analytically is impractical because the circuit model equations are complex and non-linear. Therefore, a sample set of $n$ realizations $\xi^{(i)}$ of the input variables is used to estimate the mean $E[y]$ and the variance $D$:

$$
\begin{equation*}
D=E\left[y^{2}\right]-E[y]^{2} \tag{2.56}
\end{equation*}
$$

According to Eqs. (2.47) and (2.56), the sampled estimates of $F_{0}$ and $D$ are:

$$
\begin{equation*}
\widehat{F}_{0}=\frac{1}{n} \sum_{i=1}^{n} F\left(\xi^{(i)}\right) \tag{2.57}
\end{equation*}
$$

$$
\begin{equation*}
\widehat{D}=\frac{1}{n} \sum_{i=1}^{n} F^{2}\left(\xi^{(i)}\right)-\widehat{F}_{0}^{2} \tag{2.58}
\end{equation*}
$$

According to Eq. (2.52), an estimate of $D_{u}$ can be obtained from an estimate of the conditional variance, as follows:

$$
\begin{align*}
V\left(E\left[y \mid \xi_{u}\right]\right)= & E\left[E\left[y \mid \xi_{u}\right]^{2}\right]-E\left[E\left[y \mid \xi_{u}\right]\right]^{2}=E\left[E\left[y \mid \xi_{u}\right]^{2}\right]-E[y]^{2} \\
& \approx \frac{1}{n} \sum_{i=1}^{n}\left(\frac{1}{n} \sum_{j=1}^{n} F\left(\xi_{\sim u}^{(j)}, \xi_{u}^{(i)}\right)\right)^{2}-F_{0}^{2} \tag{2.59}
\end{align*}
$$
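To make the $O(n^{2})$ cost of Eq. (2.59) concrete, the Python sketch below evaluates the nested estimator directly: for each outer sample $i$ it averages the model over $n$ inner samples with $\xi_{u}$ frozen to $\xi_{u}^{(i)}$. The toy model (a stand-in for a circuit simulation), the sample size, and the 0-based variable indices are assumptions for illustration.

```python
import random


def nested_conditional_variance(F, xi, u):
    """Eq. (2.59): O(n^2) estimate of V(E[y | xi_u]); u holds 0-based variable indices."""
    n, d = len(xi), len(xi[0])
    F0 = sum(F(x) for x in xi) / n
    outer = 0.0
    for i in range(n):
        inner = 0.0
        for j in range(n):
            # variables in u come from outer sample i, all others from inner sample j
            mixed = [xi[i][b] if b in u else xi[j][b] for b in range(d)]
            inner += F(mixed)
        outer += (inner / n) ** 2
    return outer / n - F0 ** 2


def toy_model(x):
    """Hypothetical stand-in for a circuit response: y = x1 + 2*x2."""
    return x[0] + 2.0 * x[1]


rng = random.Random(1)
samples = [[rng.random(), rng.random()] for _ in range(400)]
v1 = nested_conditional_variance(toy_model, samples, {0})  # exact value 1/12
v2 = nested_conditional_variance(toy_model, samples, {1})  # exact value 4/12
```

With $n=400$ samples this already needs $160{,}000$ model evaluations per index, which is the cost the two-sample method below avoids.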

However, computing the conditional variances in this way has a time complexity of $O\left(n^{2}\right)$, since each of the $n$ outer samples requires $n$ model evaluations. Sobol' [51] proposed a faster Monte-Carlo method that calculates the variances using two independent sample sets, $\left\{\xi^{(i)}\right\}_{i=1}^{n}$ and $\left\{\eta^{(i)}\right\}_{i=1}^{n}$, based on the identity:

$$
\begin{gather*}
E\left[E\left[y \mid \xi_{u}\right]^{2}\right]=E\left[E\left[y \mid \xi_{u}\right] E\left[y \mid \xi_{u}\right]\right]=\int\left(\int F\left(\xi_{\sim u}, \xi_{u}\right) d \xi_{\sim u}\right)\left(\int F\left(\eta_{\sim u}, \xi_{u}\right) d \eta_{\sim u}\right) d \xi_{u} \\
=\int\!\!\int F(\xi) F\left(\eta_{\sim u}, \xi_{u}\right) d \xi\, d \eta_{\sim u} \tag{2.60}
\end{gather*}
$$

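Substituting the estimate of Eq. (2.60) into Eq. (2.59), and the result into Eq. (2.52), the estimate of $D_{u}$ becomes:

$$
\begin{equation*}
\widehat{D}_{u}=\frac{1}{n} \sum_{i=1}^{n} F\left(\xi^{(i)}\right) F\left(\xi_{u}^{(i)}\right)-\widehat{F}_{0}^{2}-\sum_{\substack{v \subset u \\ v \neq u,\ v \neq \emptyset}} \widehat{D}_{v} \tag{2.61}
\end{equation*}
$$

where the hybrid sample $\xi_{u}^{(i)}$ takes the variables in $u$ from $\xi^{(i)}$ and all other variables from $\eta^{(i)}$:

$$
\begin{equation*}
\left(\xi_{b}\right)_{u}^{(i)}=\begin{cases}\xi_{b}^{(i)} & b \in u \\ \eta_{b}^{(i)} & \text{otherwise}\end{cases} \tag{2.62}
\end{equation*}
$$

Therefore, for a single variable $b$:

$$
\begin{equation*}
\widehat{D}_{\{b\}}=\frac{1}{n} \sum_{i=1}^{n} F\left(\xi_{1}^{(i)}, \ldots, \xi_{d}^{(i)}\right) F\left(\eta_{1}^{(i)}, \ldots, \eta_{b-1}^{(i)}, \xi_{b}^{(i)}, \eta_{b+1}^{(i)}, \ldots, \eta_{d}^{(i)}\right)-\widehat{F}_{0}^{2} \tag{2.63}
\end{equation*}
$$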

### 2.4.3. Circuit Sizing Algorithm

Algorithm 2.1: Monte_Carlo(U, ξ_Sample, η_Sample)

1. // Initialize Sum with 0
2. FOR i = 0 TO n-1 DO // loop over the samples; reset SamList1 and SamList2 to nil
3. FOR j = 0 TO Var_NUM-1 DO // loop over all variables
4. SamList1 = concat{SamList1, ξ_Sample[i, j]};
5. IF (j ∈ U) THEN
6. SamList2 = concat{SamList2, ξ_Sample[i, j]};
7. ELSE SamList2 = concat{SamList2, η_Sample[i, j]};
8. Out_Sim1 = SIMULATE(SamList1);
9. Out_Sim2 = SIMULATE(SamList2);
10. Sum = Sum + Out_Sim1 * Out_Sim2;
11. RETURN Sum/n;
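A Python transcription of Algorithm 2.1 may help connect it to the two-sample estimator described above. This is a sketch under assumptions: `toy_simulate` is a hypothetical stand-in for the SIMULATE call (a circuit simulation in OASYN), variables are indexed from 0, and subtracting $\widehat{F}_{0}^{2}$ from the returned mean gives the first-order estimate $\widehat{D}_{\{b\}}$. Each call costs $2n$ model evaluations, compared with $n^{2}$ for the nested estimator of Eq. (2.59).

```python
import random


def monte_carlo(u, xi_sample, eta_sample, simulate):
    """Algorithm 2.1: mean over i of F(xi^(i)) * F(hybrid sample with xi on u, eta elsewhere)."""
    n = len(xi_sample)
    total = 0.0
    for i in range(n):
        sam_list1 = list(xi_sample[i])                   # original sample xi^(i)
        sam_list2 = [xi_sample[i][j] if j in u else eta_sample[i][j]
                     for j in range(len(xi_sample[i]))]  # hybrid sample
        total += simulate(sam_list1) * simulate(sam_list2)
    return total / n


def toy_simulate(x):
    """Hypothetical stand-in for SIMULATE: y = x1 + 2*x2 with x uniform on [0, 1]."""
    return x[0] + 2.0 * x[1]


rng = random.Random(7)
n = 100_000
xi_sample = [[rng.random(), rng.random()] for _ in range(n)]
eta_sample = [[rng.random(), rng.random()] for _ in range(n)]
F0 = sum(toy_simulate(x) for x in xi_sample) / n               # sample mean of y
D = sum(toy_simulate(x) ** 2 for x in xi_sample) / n - F0 ** 2  # exact value 5/12
D_1 = monte_carlo({0}, xi_sample, eta_sample, toy_simulate) - F0 ** 2  # exact 1/12
D_2 = monte_carlo({1}, xi_sample, eta_sample, toy_simulate) - F0 ** 2  # exact 4/12
```

Dividing `D_1` and `D_2` by `D` gives first-order Sobol' indices close to 0.2 and 0.8 for this toy model.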

Algorithm 2.2: Sobol_Decomp(List, partial, rest, Result)

1. // Initialize partial, rest, and Result with nil
2. FOR i = 0 TO length(List)-1 DO
3. n = nth(i, List);
4. rest = List;
5. FOR j = 0 TO i DO
6. rest = REMOVE(nth(j, List), rest); // delete the jth element
7. Result = Sobol_Decomp(rest, concat{partial, n}, nil, concat{Result, concat{partial, n}});
8. RETURN Result;
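Algorithm 2.2 appears to enumerate every nonempty combination of the parameters in List by recursing on the elements that follow the current one. Under that reading, a compact Python version is shown below. The parameter names `W1`, `L1`, and `Ibias` are hypothetical, and a shared accumulator list plays the role of the explicit `Result` argument.

```python
def sobol_decomp(lst, partial=(), result=None):
    """Enumerate the nonempty subsets of lst (one reading of Algorithm 2.2)."""
    if result is None:
        result = []
    for i, n in enumerate(lst):
        new_partial = partial + (n,)  # concat{partial, n}
        result.append(new_partial)    # record this parameter combination
        # 'rest' = List with elements 0..i removed, as in steps 4-6
        sobol_decomp(lst[i + 1:], new_partial, result)
    return result


subsets = sobol_decomp(["W1", "L1", "Ibias"])
# [('W1',), ('W1', 'L1'), ('W1', 'L1', 'Ibias'), ('W1', 'Ibias'),
#  ('L1',), ('L1', 'Ibias'), ('Ibias',)]
```

For $d$ parameters this produces the $2^{d}-1$ sets for which Sobol' indices are required.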

Algorithm 2.3: Sobol_Var(list(U))

1. // Calculate the variance D_U
2. FOR i = 0 TO length(U)-1 DO
3. MC = Monte_Carlo(nth(i, U), ξ_Sample, η_Sample);
4. IF (length(nth(i, U)) == 1) THEN
5. D_U = D_U + MC - F_Avg;
6. ELSE
7. D_U = D_U + MC - Sobol_Var(REMOVE(nth(i, U), Sobol_Decomp(nth(i, U), nil, nil, nil))) - F_Avg;
8. RETURN D_U;

Optimization is performed by computing the Sobol' indices of all circuit parameters with equal weights. Each sensitivity index $S_{u}$ for a set of parameters $u$ measures how much the interactions of these parameters contribute to the uncertainty of the circuit specs. In each iteration $i$, $2^{d}-1$ indices $S_{u_{i}}$ are calculated, covering all combinations of parameter interactions in the set $u$. Let $S_{u_{i}}^{j}$ denote the sensitivity index of the set of parameters $u$ for specification $j$. To decide which parameters contribute most to the enhancement of the circuit specifications, a cost function $S_{i}$ is defined that selects the set of parameters $u$ with the highest combined effect on all circuit specifications. For $m$ specifications (objectives), the cost function $S_{i}$ in iteration $i$ is given by:

$$
\begin{equation*}
S_{i}=\max _{u}\left(\sum_{j=1}^{m} S_{u_{i}}^{j}\right) \tag{2.64}
\end{equation*}
$$
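Eq. (2.64) can be read as: sum each parameter set's index over all $m$ specifications and keep the set with the largest total. A minimal Python sketch of that selection step follows, assuming the indices $S_{u_{i}}^{j}$ are already available per specification; the specification names, parameter names, and index values are made up for illustration.

```python
def select_parameter_set(indices):
    """Eq. (2.64): return the set u maximizing sum_j S_u^j, and that maximum S_i."""
    totals = {}
    for per_spec in indices.values():      # one dict {u: S_u^j} per specification j
        for u, s in per_spec.items():
            totals[u] = totals.get(u, 0.0) + s
    best_u = max(totals, key=totals.get)
    return best_u, totals[best_u]


# Hypothetical indices for two specs and two sizing parameters.
indices = {
    "gain": {("W1",): 0.5, ("L1",): 0.25, ("W1", "L1"): 0.25},
    "GBW": {("W1",): 0.125, ("L1",): 0.75, ("W1", "L1"): 0.125},
}
best_u, S_i = select_parameter_set(indices)  # (('L1',), 1.0)
```

Here `L1` is selected even though `W1` dominates the gain, because its summed influence over both specifications is larger.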

The algorithm rejects any solution that drives a constraint outside its given boundaries.

Since the problem is multi-objective, a single solution that is optimal for every objective is generally not feasible; the goal is therefore to find a Pareto-optimal set. The most significant parameters, i.e., those that contribute the highest variances of the output specs, are optimized to obtain a Pareto-frontier curve. Since the design space changes at each step, the Sobol' indices are recomputed in every iteration. Once a Pareto-optimal solution is reached, i.e., a point beyond which no spec can be improved without deteriorating others, a globally non-dominated solution is considered to be attained.
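The non-dominance test that defines a Pareto-optimal set can be sketched as follows. The sketch assumes every spec is expressed so that larger is better (power is negated), and the candidate values are hypothetical. It is a brute-force $O(N^{2})$ filter over $N$ candidates, shown only to make the definition concrete, not the optimizer used in this work.

```python
def dominates(a, b):
    """True if a is at least as good as b in every spec and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points):
    """Keep only the non-dominated points (all specs maximized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (gain [dB], GBW [MHz], -power [mW]) candidates.
candidates = [(60, 10, -1.0), (55, 20, -1.2), (50, 15, -1.5), (62, 5, -0.8), (58, 9, -1.1)]
front = pareto_front(candidates)
# [(60, 10, -1.0), (55, 20, -1.2), (62, 5, -0.8)]
```

A point never dominates itself, because the strict-improvement condition fails, so no special case is needed when comparing a candidate against the full list.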

## 3. Layout Floorplan

Murata et al. (1996) introduced an elegant representation of block placement for general floorplans called the sequence pair (SP). Like TCG and BSG, but unlike O-tree, B*-tree, and CBL, SP is P-admissible. Unlike SP, TCG supports incremental update during operations and keeps the information of the boundary modules, as well as their relative positions, in the representation. Block placement algorithms based on SP use heuristic optimization algorithms, e.g., simulated annealing, which require generating a large number of sequence pairs. Therefore, a fast algorithm is needed to generate sequence pairs after each solution perturbation.

### 3.1. Comments on TCG-S Representation

Lin et al. proposed a representation that uses the horizontal and vertical transitive closure graphs together with the sequence $\Gamma_{-}$ of SP to represent a placement. Based on $\Gamma_{-}$ and the horizontal and vertical contours $R_{h}$ and $R_{v}$, an $O(n \log n)$-time packing scheme is obtained by keeping the coordinates of the right (top) boundaries of the modules sorted in the search order of the red-black tree $T_{h}$ ($T_{v}$) [19]. An $O(n)$-runtime update of the packing sequence during solution perturbation was also proposed. Under each of the four operations (rotation, swap, reverse, and move), the topological orderings of $C_{h}$ and $C_{v}$, as well as the sequence $\Gamma_{-}$, must be changed to conform with the new placement.

Although the three feasibility properties of TCG mentioned in [12] are maintained, they are not sufficient to guarantee that the updated TCG graphs and $\Gamma_{-}$ sequence exactly correspond to the new placement after each solution perturbation. The TCG-S tuple-update algorithm is only sufficient if the modules subjected to one of the four operations have exactly the same width and length. Such a condition may hold for specially constrained placements, e.g., under proximity, interdigitated, and common-centroid symmetry constraints. The proposed algorithm does not consider the geometry of the modules with respect to each other during operations, and may therefore result in discrepancies between the horizontal (vertical) geometric relations of the modules and those designated by $C_{h}$ ($C_{v}$). After a perturbation on modules $b_{i}$ and $b_{j}$, the relation $b_{j} \perp b_{k}$ may accidentally be updated as $b_{j} \vdash b_{k}$ according to the geometric relation between $b_{i}$ and $b_{k}$. Also, the relation $b_{k} \perp b_{j}$ through $b_{i}$ will not be updated in $C_{v}$ upon swapping $b_{i}$ and $b_{j}$: edge $(n_{k}, n_{j})$ in $C_{v}$ will not be deleted, and hence the packing sequence $\Gamma_{-}$ will also be updated inconsistently. The mismatch between the TCG-S representation and its placement not only leads to non-optimal solutions after a series of operations; it may also generate overlapping modules, leading to an infeasible solution.

In this section, the limitations of the TCG-S tuple-update algorithm are discussed for each operation, and the effect of the resulting discrepancy between a representation and its placement on packing evaluation and on convergence to the optimal solution is outlined. Furthermore, a new, simple, and efficient $O(n)$-runtime algorithm is proposed for fast incremental update during cost evaluation. The algorithm integrates the advantages of SP and TCG into TCG-S*, a superior topology update scheme that facilitates the search for the optimal floorplan. Experiments show that TCG-S* outperforms existing works in terms of area utilization, stability, and convergence speed.

### 3.1.1. Update of Constraint Graphs


(a) initial configuration of TCG

(b) rotate module d


Figure 3.1. Three types of perturbations. (a) The initial TCG ($C_{h}$ and $C_{v}$) and the placement. Dimensions of the six blocks are $a(6 \times 4)$, $b(4 \times 6)$, $c(7 \times 4)$, $d(6 \times 3)$, $e(3 \times 2)$, and $f(3 \times 3)$. (b) The resulting TCG and placement after rotating module d based on TCG-S. (c) The resulting TCG and placement after reversing nodes $n_{c}$ and $n_{e}$ based on TCG-S. (d) The resulting TCG and placement after swapping nodes $n_{c}$ and $n_{d}$ based on TCG-S.

Figure 3.1(a) shows the initial configuration of the TCG and its corresponding placement. Module d is rotated as shown in Fig. 3.1(b), and, according to TCG-S, only the weights of the corresponding node $n_{d}$ in $C_{h}$ and $C_{v}$ are exchanged. Although this operation has $O(1)$ runtime complexity, it changes the topological relations that $C_{h}$ and $C_{v}$ should represent, causing a mismatch between the TCG and the corresponding placement. The placement shows that edge $(n_{d}, n_{f})$ should be deleted from $C_{h}$ and a new edge $(n_{f}, n_{d})$ should be drawn from node $n_{f}$ to node $n_{d}$ in $C_{v}$.

Figure 3.1(c) shows a reverse operation between modules c and e. A reverse operation reverses the direction of a reduction edge $(n_{c}, n_{e})$ in a transitive closure graph, which corresponds to deleting edge $(n_{c}, n_{e})$ and adding a new edge $(n_{e}, n_{c})$ in the same transitive closure graph, here $C_{v}$. According to TCG-S, for each node $n_{k} \in \operatorname{fanin}(n_{e}) \cup \{n_{e}\}$ and $n_{l} \in \operatorname{fanout}(n_{c}) \cup \{n_{c}\}$ in the new graph, the edge $(n_{k}, n_{l})$ is added to the graph, and the corresponding edge $(n_{k}, n_{l})$ (or $(n_{l}, n_{k})$) is deleted from the other transitive closure graph to maintain the TCG. Therefore, for each node $n_{k} \in \{n_{a}, n_{b}, n_{e}\}$ and $n_{l} \in \{n_{c}\}$, it is checked whether edge $(n_{k}, n_{l})$ exists in $C_{v}$. Since all these edges already exist except $(n_{e}, n_{c})$, nothing else is changed. However, the geometric relation between modules b and e has changed, as shown in the placement: before the reverse operation $b_{b} \perp b_{e}$, whereas after it $b_{e} \vdash b_{b}$. Consequently, edge $(n_{b}, n_{e})$ should be deleted from $C_{v}$ and a corresponding edge $(n_{e}, n_{b})$ should be added to the other transitive closure graph, $C_{h}$. Since there are at most $O(n)$ nodes $n_{k}$ and $O(n)$ nodes $n_{l}$, i.e., $O(n^{2})$ edges $(n_{k}, n_{l})$, the time complexity of the reverse operation is $O(n^{2})$, where $n$ is the number of modules in a placement.

Figure 3.1(d) shows a TCG and its corresponding placement after swapping modules c and d. According to TCG-S, to swap two modules c and d, only the nodes $n_{c}$ and $n_{d}$ designating the modules are exchanged in both $C_{h}$ and $C_{v}$. Notice that nodes $n_{c}$ and $n_{d}$ have been exchanged in Fig. 3.1(d): $\operatorname{fanin}(n_{c})$ is exchanged with $\operatorname{fanin}(n_{d})$ and, similarly, $\operatorname{fanout}(n_{c})$ is exchanged with $\operatorname{fanout}(n_{d})$, where $\operatorname{fanin}(n_{c})=\{n_{b}\}$ and $\operatorname{fanin}(n_{d})=\{n_{a}, n_{e}, n_{b}\}$. The placement shows that modules b and d are not related in $C_{v}$ but rather in $C_{h}$. Hence, edge $(n_{b}, n_{d})$ should be deleted from $C_{v}$ and a corresponding edge $(n_{d}, n_{b})$ should be added to the other transitive closure graph, $C_{h}$.

Consequently, every operation is prone to changing the topology of the TCGs. The reason for this incongruity between the TCG and its placement is that the geometry and dimensions of the blocks relative to each other are not considered when perturbing a placement solution.

### 3.1.2. Packing Sequence $\Gamma_{-}$ Update

Consider the TCG and placement shown in Fig. 3.1(c). The packing sequence $\Gamma_{-}$ can be obtained using the equivalence of SP and TCG proposed in [18], by repeatedly extracting a node $n_{i}$ with $\operatorname{fanin}(n_{i})=0$ in both $C_{h}$ and $C_{v}$. Similarly, $\Gamma_{+}$ is obtained by repeatedly extracting a node $n_{i}$ with $\operatorname{fanin}(n_{i})=0$ in $C_{h}$ and $\operatorname{fanout}(n_{i})=0$ in $C_{v}$. Accordingly, the sequences $\Gamma_{+}$ and $\Gamma_{-}$ are $\langle c\, e\, a\, d\, b\, f\rangle$ and $\langle a\, b\, e\, c\, d\, f\rangle$, respectively. For evaluating an SP, the packing cost can be calculated using the longest common subsequence (LCS) algorithm proposed in [11]. By computing $\operatorname{lcs}(\Gamma_{+}, \Gamma_{-})$ and $\operatorname{lcs}(\Gamma_{+}^{R}, \Gamma_{-})$, the width and height, and hence the whole placement area, are determined. The positions of the modules after each solution perturbation can be computed while evaluating the packing cost with the LCS algorithm. Based on the $C_{v}$ graph shown in Fig. 3.1(c) and this LCS algorithm, $y_{e} \geq y_{b}^{\prime}$, i.e., module e is packed on top of module b. $\operatorname{lcs}(\Gamma_{+}, \Gamma_{-})$, which holds the position of block f plus its width in the x-direction ($x_{f}^{\prime}$), equals 16, whereas $\operatorname{lcs}(\Gamma_{+}^{R}, \Gamma_{-})$, which holds the position of block c plus its height in the y-direction, equals 12. The placement span, $(13, 12)$ in the initial TCG configuration, becomes $(16, 12)$ after the reverse operation. Thus, the perturbed solution diverges from the desired one, and the mismatch between the TCG and its corresponding placement during perturbation is evident.
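The extraction rules and the resulting placement span can be reproduced with a short Python sketch. Two assumptions apply: the edge sets below are the ones implied by the sequence pair quoted above (the figure itself is not reproduced here), and the packing uses a simple $O(n^{2})$ longest-path evaluation over the sequence pair rather than the faster LCS algorithm of [11], although both yield the same coordinates. With the block dimensions from Fig. 3.1, it recovers $\Gamma_{+}$, $\Gamma_{-}$, and the span $(16, 12)$.

```python
def tcg_to_sp(nodes, ch, cv):
    """Derive (Gamma_plus, Gamma_minus) from a TCG by repeated node extraction."""
    def no_fanin(g, v, rem):
        return not any((w, v) in g for w in rem)

    def no_fanout(g, v, rem):
        return not any((v, w) in g for w in rem)

    def extract(rule):
        remaining, order = set(nodes), []
        while remaining:
            # in a valid TCG exactly one remaining node satisfies the rule
            pick = next(v for v in sorted(remaining) if rule(v, remaining))
            order.append(pick)
            remaining.remove(pick)
        return order

    gamma_minus = extract(lambda v, rem: no_fanin(ch, v, rem) and no_fanin(cv, v, rem))
    gamma_plus = extract(lambda v, rem: no_fanin(ch, v, rem) and no_fanout(cv, v, rem))
    return gamma_plus, gamma_minus


def sp_pack(gamma_plus, gamma_minus, dims):
    """Pack a sequence pair by longest paths (O(n^2)); returns (width, height)."""
    rank_plus = {b: k for k, b in enumerate(gamma_plus)}
    x, y = {}, {}
    for b in gamma_minus:  # every block already placed precedes b in Gamma_minus
        # a is left of b: a also precedes b in Gamma_plus
        x[b] = max((x[a] + dims[a][0] for a in x if rank_plus[a] < rank_plus[b]), default=0)
        # a is below b: a follows b in Gamma_plus
        y[b] = max((y[a] + dims[a][1] for a in y if rank_plus[a] > rank_plus[b]), default=0)
    width = max(x[b] + dims[b][0] for b in dims)
    height = max(y[b] + dims[b][1] for b in dims)
    return width, height


# Block (width, height) from Fig. 3.1, and edges implied by the quoted sequence pair.
dims = {"a": (6, 4), "b": (4, 6), "c": (7, 4), "d": (6, 3), "e": (3, 2), "f": (3, 3)}
ch = {("a", "b"), ("a", "d"), ("a", "f"), ("b", "f"), ("c", "d"),
      ("c", "f"), ("d", "f"), ("e", "d"), ("e", "f")}  # (u, v): u left of v
cv = {("a", "c"), ("a", "e"), ("b", "c"), ("b", "d"), ("b", "e"), ("e", "c")}  # u below v
gp, gm = tcg_to_sp(list(dims), ch, cv)  # c e a d b f  and  a b e c d f
width, height = sp_pack(gp, gm, dims)   # (16, 12)
```

In the packing, block e is placed at $y=6$, exactly on the top edge of block b, and block f ends at $x=16$, matching the values discussed in the text.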

Lin et al. proposed a scheme for updating the sequence $\Gamma_{-}$ in a reverse operation, in which module $b_{i}$ is deleted and reinserted after $b_{j}$ in $\Gamma_{-}$. For each module $b_{k}$ between $b_{i}$ and $b_{j}$ in $\Gamma_{-}$ for which edge $(n_{i}, n_{k})$ exists in the graph, $b_{k}$ is deleted and reinserted after the most recently inserted module. Consider the placement shown in Fig. 3.1(b), and assume that edge $(n_{a}, n_{e})$ is reversed. Edges $(n_{k}, n_{l})$, where $n_{k} \in \{n_{c}, n_{b}, n_{e}\}$ and $n_{l} \in \{n_{a}, n_{c}\}$, that do not exist in $C_{v}$ are added to $C_{v}$ and deleted from the other graph. Therefore, the newly added edges are $(n_{c}, n_{a})$, $(n_{e}, n_{c})$, $(n_{b}, n_{a})$, and $(n_{e}, n_{a})$. Accordingly, $b_{a}$ is deleted from $\Gamma_{-}$ and inserted after $b_{e}$. Since edges $(n_{a}, n_{c})$ and $(n_{a}, n_{b})$ do not exist, nothing else is changed. The new $\Gamma_{-}$ is $\langle b\, c\, e\, a\, d\, f\rangle$, whereas transforming the TCG into an SP yields $\Gamma_{-}=\langle b\, e\, c\, a\, d\, f\rangle$. Thus, the proposed algorithm for updating $\Gamma_{-}$ is only valid if the reversed edge is a reduction edge, i.e., no module $b_{k}$ lies between $b_{i}$ and $b_{j}$. Incongruous TCG graphs and $\Gamma_{-}$ result in infeasible solutions when the packing cost is evaluated using the binary search tree.

Therefore, the limitations of the update scheme proposed in [18] not only increase the convergence time of the floorplanner and make it harder to converge to the desired solution by miscalculating the packing cost; they may also generate infeasible solutions after a series of operations.

### 3.2. TCG-S* Perturbing Algorithm

### 3.2.1. TCG Topology Update

This section proposes a new, simple, and efficient $O(n)$-runtime algorithm, where $n$ is the number of modules in a placement, for updating the constraint graphs $C_{h}$ and $C_{v}$ during perturbation based on knowledge of the module positions.

### 3.2.1.1. Rotate

The rotate operation rotates a module $b_{i}$ without changing its position. It involves exchanging the weights of module $b_{i}$ in both $C_{h}$ and $C_{v}$. Edges $(n_{i}, n_{k})$, where $n_{k} \in \operatorname{fanin}(n_{i}) \cup \operatorname{fanout}(n_{i})$ in both $C_{h}$ and $C_{v}$, must then be updated in both graphs. First, each edge $(n_{i}, n_{j})$ with $n_{j} \in \operatorname{fanout}(n_{i}) \cup \operatorname{fanin}(n_{i})$ is deleted from $C_{v}$ and added to $C_{h}$. Then, every module $b_{j}$ with $n_{j} \in \operatorname{fanout}(n_{i})$ in $C_{h}$ $\cup$ $\operatorname{fanout}(n_{i})$ in $C_{v}$ and $y_{j}>y_{i}$ is checked for a vertical relation with $b_{i}$. If such a relation exists, an edge $(n_{i}, n_{j})$ is added to $C_{v}$ and the corresponding edge $(n_{i}, n_{j})$ is deleted from the other transitive closure graph; otherwise, an edge $(n_{i}, n_{j})$ is added to $C_{h}$. Similarly, to obtain $\operatorname{fanin}(n_{i})$ in $C_{v}$ and its corresponding update in both $C_{h}$ and $C_{v}$, all modules $b_{j}$ with $y_{j}^{\prime}<y_{i}^{\prime}$ are checked for a vertical relation with $b_{i}$. If such a relation exists, an edge $(n_{j}, n_{i})$ is added to $C_{v}$ and the corresponding edge $(n_{i}, n_{j})$ is deleted from $C_{h}$; otherwise, an edge $(n_{i}, n_{j})$ is added to $C_{h}$.

(a) rotate module d


Figure 3.2. Three types of perturbations. (a) The resulting TCG and placement after rotating module d. (b) The resulting TCG and placement after reversing nodes $n_{c}$ and $n_{e}$. (c) The resulting TCG and placement after swapping nodes $n_{c}$ and $n_{d}$.

Figure 3.2(a) shows the resulting TCG and its corresponding placement after rotating module d. Notice that the weights of node $n_{d}$ have been exchanged in both $C_{h}$ and $C_{v}$. Here, $\operatorname{fanout}(n_{d}) \cup \operatorname{fanin}(n_{d})=\{n_{b}\}$; therefore, edge $(n_{d}, n_{j})$ with $n_{j} \in \{n_{b}\}$ is deleted from $C_{v}$ and added to $C_{h}$. $\operatorname{fanout}(n_{d})$ in $C_{v}=\emptyset$ and $\operatorname{fanout}(n_{d})$ in $C_{h}=\{n_{f}\}$. Since $y_{f}<y_{d}$, nothing is changed. To obtain $\operatorname{fanin}(n_{d})$ in $C_{v}$, for each node $n_{j} \in \{n_{b}, n_{f}\}$ with $y_{j}^{\prime}<y_{d}^{\prime}$ whose module has a vertical relation with module d, an edge $(n_{j}, n_{d})$ is added to $C_{v}$ and the corresponding edge $(n_{d}, n_{j})$ is deleted from $C_{h}$. Therefore, edges $(n_{b}, n_{d})$ and $(n_{f}, n_{d})$ are added to $C_{v}$, and edge $(n_{d}, n_{f})$ is deleted from $C_{h}$.

Theorem 1: Rotate operation takes $\mathrm{O}(\mathrm{n})$ runtime, where n is the number of modules in a placement.

Proof: The time complexity is dominated by checking whether $n_{i} \perp n_{j}$, where $n_{j} \in$ fanout $\left(n_{i}\right)$ in $C_{h} \cup \operatorname{fanout}\left(n_{i}\right)$ in $C_{v}$, and by deleting all edges $\left(n_{i}, n_{k}\right)$ from $C_{v}$, where $\mathrm{n}_{k}$ $\in \operatorname{fanout}\left(n_{i}\right) \cup \operatorname{fanin}\left(n_{i}\right)$. Since there are at most $\mathrm{O}(\mathrm{n}) \mathrm{n}_{j}$ 's and $\mathrm{O}(\mathrm{n}) \mathrm{n}_{k}$ 's, rotate operation only takes $\mathrm{O}(\mathrm{n})$ runtime in total.
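The core idea of TCG-S*, re-deriving a module's relations from the actual geometry instead of only exchanging node weights, can be illustrated with a simplified Python sketch. It is not the exact procedure above: it assumes that two modules are vertically related exactly when their x-projections overlap (and horizontally related otherwise), it uses the stored lower-left coordinates without repacking, and it reclassifies every other module rather than walking the fanin and fanout sets. It still touches each other module once, so it runs in $O(n)$. All names are illustrative.

```python
def overlap(lo1, hi1, lo2, hi2):
    """True if the intervals (lo1, hi1) and (lo2, hi2) overlap with positive length."""
    return max(lo1, lo2) < min(hi1, hi2)


def classify(pos, dims, i, j):
    """Assumed rule: overlapping x-projections give a vertical edge, else a horizontal one."""
    (xi, yi), (wi, _) = pos[i], dims[i]
    (xj, yj), (wj, _) = pos[j], dims[j]
    if overlap(xi, xi + wi, xj, xj + wj):
        return ("v", i, j) if yi < yj else ("v", j, i)  # lower module -> upper module
    return ("h", i, j) if xi < xj else ("h", j, i)      # left module -> right module


def rotate(ch, cv, pos, dims, i):
    """Rotate module i in place and re-derive its TCG edges from geometry in O(n)."""
    w, h = dims[i]
    dims[i] = (h, w)  # exchange the node weights
    for j in dims:
        if j == i:
            continue
        for g in (ch, cv):  # drop the old relation between i and j
            g.discard((i, j))
            g.discard((j, i))
        kind, src, dst = classify(pos, dims, i, j)
        (cv if kind == "v" else ch).add((src, dst))


# Hypothetical two-module example: a tall module a to the left of b.
pos = {"a": (0, 0), "b": (4, 3)}
dims = {"a": (2, 6), "b": (2, 2)}
ch, cv = {("a", "b")}, set()
rotate(ch, cv, pos, dims, "a")  # a becomes 6 x 2, so its x-span [0, 6] overlaps b's [4, 6]
```

In the example, rotating the tall module a makes its x-projection overlap that of b, so the horizontal edge $(n_{a}, n_{b})$ in $C_{h}$ is replaced by the vertical edge $(n_{a}, n_{b})$ in $C_{v}$, which a weight exchange alone would miss.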

### 3.2.1.2. Swap

To swap modules $b_{i}$ and $b_{j}$, their values in the position array are exchanged. Edge $(n_{i}, n_{j})$ is deleted from its transitive closure graph, and a corresponding edge $(n_{j}, n_{i})$ is added to the same graph. Edges $(n_{k}, n_{j})$, where $n_{k} \in \operatorname{fanin}(n_{j}) \setminus \operatorname{fanin}(n_{i})$ in $C_{h}$, are deleted from $C_{h}$, and corresponding edges $(n_{k}, n_{i})$ are added to $C_{h}$. Similarly, edges $(n_{i}, n_{k})$, where $n_{k} \in \operatorname{fanout}(n_{i}) \setminus \operatorname{fanout}(n_{j})$ in $C_{h}$, are deleted from $C_{h}$, and corresponding edges $(n_{j}, n_{k})$ are added to $C_{h}$. Edges $(n_{j}, n_{k})$, where $n_{k} \in \operatorname{fanout}(n_{j}) \setminus \operatorname{fanout}(n_{i})$ in $C_{v}$, are deleted from $C_{v}$. Similarly, edges $(n_{i}, n_{k})$, where $n_{k} \in \operatorname{fanout}(n_{i}) \setminus \operatorname{fanout}(n_{j})$ in $C_{v}$, are deleted from $C_{v}$. Edges $(n_{k}, n_{j})$, where $n_{k} \in \operatorname{fanin}(n_{j}) \setminus \operatorname{fanin}(n_{i})$ in $C_{v}$, are deleted from $C_{v}$. Similarly, edges $(n_{k}, n_{i})$, where $n_{k} \in \operatorname{fanin}(n_{i}) \setminus \operatorname{fanin}(n_{j})$ in $C_{v}$, are deleted from $C_{v}$. For nodes $n_{k} \in \operatorname{fanout}(n_{j})$ in $C_{v} \cup \operatorname{fanout}(n_{j})$ in $C_{h}$, where $x_{k}^{\prime}>x_{i}$ and $b_{i} \perp b_{k}$, an edge $(n_{i}, n_{k})$ is added to $C_{v}$; otherwise, edge $(n_{i}, n_{k})$ is added to $C_{h}$.
Similarly, for nodes $n_{k} \in \operatorname{fanout}(n_{i})$ in $C_{v} \cup \operatorname{fanout}(n_{i})$ in $C_{h}$, where $x_{k}^{\prime}>x_{j}$ and modules $b_{k}$ and $b_{j}$ exhibit a vertical geometric relation, an edge $(n_{j}, n_{k})$ is added to $C_{v}$; otherwise, edge $(n_{j}, n_{k})$ is added to $C_{h}$. For nodes $n_{k} \in \operatorname{fanin}(n_{j})$ in $C_{v} \cup \operatorname{fanout}(n_{j})$ in $C_{h}$, where $x_{k}^{\prime}>x_{i}$ and modules $b_{k}$ and $b_{i}$ exhibit a vertical geometric relation, an edge $(n_{k}, n_{i})$ is added to $C_{v}$; otherwise, an edge between $n_{k}$ and $n_{i}$ is added to $C_{h}$. Similarly, for nodes $n_{k} \in \operatorname{fanin}(n_{i})$ in $C_{v} \cup \operatorname{fanout}(n_{i})$ in $C_{h}$, where $x_{k}^{\prime}>x_{j}$ and modules $b_{k}$ and $b_{j}$ exhibit a vertical geometric relation, an edge $(n_{k}, n_{j})$ is added to $C_{v}$; otherwise, an edge between $n_{k}$ and $n_{j}$ is added to $C_{h}$.

Figure 3.2(c) shows the resulting TCG and its corresponding placement after swapping modules $b_{c}$ and $b_{d}$. Notice that their positions have been exchanged. Edge $(n_{c}, n_{d})$ is deleted from $C_{h}$ and a corresponding edge $(n_{d}, n_{c})$ is added to $C_{h}$. In $C_{h}$, $\operatorname{fanin}(n_{d})=\{n_{e}, n_{a}\}$ and $\operatorname{fanin}(n_{c})=\varnothing$. Therefore, edges $(n_{e}, n_{d})$ and $(n_{a}, n_{d})$ are deleted from $C_{h}$ and corresponding edges $(n_{e}, n_{c})$ and $(n_{a}, n_{c})$ are added to $C_{h}$, since $n_{e}, n_{a} \in \operatorname{fanin}(n_{d}) \setminus \operatorname{fanin}(n_{c})$. In $C_{h}$, $\operatorname{fanout}(n_{c})=\{n_{f}\}$ and $\operatorname{fanout}(n_{d})=\varnothing$; accordingly, edge $(n_{c}, n_{f})$ is deleted from $C_{h}$ and a corresponding edge $(n_{d}, n_{f})$ is added to $C_{h}$. Since $\operatorname{fanout}(n_{d})$ and $\operatorname{fanout}(n_{c})$ in $C_{v}$ are both empty, the fanout of nodes $n_{d}$ and $n_{c}$ in $C_{v}$ is not changed. In $C_{v}$, $\operatorname{fanin}(n_{d})=\{n_{b}, n_{f}\}$ and $\operatorname{fanin}(n_{c})=\{n_{a}, n_{b}, n_{e}\}$. Edge $(n_{f}, n_{d})$ is deleted from $C_{v}$, since $n_{f} \in \operatorname{fanin}(n_{d}) \setminus \operatorname{fanin}(n_{c})$. Similarly, edges $(n_{e}, n_{c})$ and $(n_{a}, n_{c})$ are deleted from $C_{v}$, since $n_{a}, n_{e} \in \operatorname{fanin}(n_{c}) \setminus \operatorname{fanin}(n_{d})$. Since $\operatorname{fanout}(n_{d})$ in $C_{v} \cup \operatorname{fanout}(n_{d})$ in $C_{h}=\varnothing$, $\operatorname{fanout}(n_{c})$ in $C_{h}$ is not changed. Since $\operatorname{fanout}(n_{c})$ in $C_{v} \cup \operatorname{fanout}(n_{c})$ in $C_{h}=\varnothing$, $\operatorname{fanout}(n_{d})$ in $C_{h}$ ($C_{v}$) is not changed. $\operatorname{fanin}(n_{d})$ in $C_{v} \cup \operatorname{fanout}(n_{d})$ in $C_{h}=\{n_{b}, n_{f}\}$. Edge $(n_{f}, n_{c})$ is added to $C_{v}$, as $b_{f} \perp b_{c}$. Similarly, edges $(n_{a}, n_{d})$ and $(n_{e}, n_{d})$ are added to $C_{v}$.
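The edge updates in this example can be reproduced by a very small sketch, under one assumption of ours: that the swap amounts to exchanging the labels $n_{c}$ and $n_{d}$ in every edge of both $C_{h}$ and $C_{v}$. Only the edges mentioned in the example are used (the complete graphs of Fig. 3.2(b) are not reproduced), and `swap_nodes` / `swap_modules` are hypothetical helper names, not part of TCG-S*.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]


def swap_nodes(edges: Set[Edge], ni: str, nj: str) -> Set[Edge]:
    """Return a new edge set in which node labels ni and nj are exchanged.

    This rebuild touches every edge; an adjacency-list version that only
    rewrites the edges incident to ni and nj would run in O(n).
    """
    rename = {ni: nj, nj: ni}
    return {(rename.get(u, u), rename.get(v, v)) for (u, v) in edges}


def swap_modules(ch: Set[Edge], cv: Set[Edge], ni: str, nj: str):
    """Apply the label exchange to both transitive closure graphs."""
    return swap_nodes(ch, ni, nj), swap_nodes(cv, ni, nj)


if __name__ == "__main__":
    # Edges listed in the worked example (before the swap).
    ch = {("c", "d"), ("e", "d"), ("a", "d"), ("c", "f")}
    cv = {("b", "d"), ("f", "d"), ("a", "c"), ("b", "c"), ("e", "c")}
    new_ch, new_cv = swap_modules(ch, cv, "c", "d")
    print(sorted(new_ch))
    print(sorted(new_cv))
```

Applied to those edges, the relabeling deletes and adds exactly the edges named in the paragraph above, e.g. $(n_{f}, n_{d})$ becomes $(n_{f}, n_{c})$ in $C_{v}$ while $(n_{b}, n_{c})$ and $(n_{b}, n_{d})$ both survive because $n_{b}$ is in both fanins.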

Theorem 2: Swap operation takes $\mathrm{O}(\mathrm{n})$ runtime, where n is the number of modules in a placement.

Proof: The time complexity is dominated by checking whether $n_{i} \perp n_{k}$ ($n_{j} \perp n_{k}$), where $n_{k} \in \operatorname{fanout}(n_{j})$ ($\operatorname{fanout}(n_{i})$) in $C_{v} \cup \operatorname{fanout}(n_{j})$ ($\operatorname{fanout}(n_{i})$) in $C_{h}$, and whether $n_{i} \perp n_{l}$ ($n_{j} \perp n_{l}$), where $n_{l} \in \operatorname{fanin}(n_{j})$ ($\operatorname{fanin}(n_{i})$) in $C_{v} \cup \operatorname{fanout}(n_{j})$ ($\operatorname{fanout}(n_{i})$) in $C_{h}$. Since there are at most $O(n)$ nodes $n_{k}$ and $O(n)$ nodes $n_{l}$, the operation takes $O(n)$ runtime in total.

### 3.2.1.3. Reverse

Reverse operation reverses the geometric relation between two modules $b_{i}$ and $b_{j}$. If there exists a geometric relation $b_{i} \vdash b_{j}$, the new relation after reversing is $b_{j} \vdash b_{i}$.

The reverse operation is a derivative of the swap operation, since reversing the direction of an edge $(n_{i}, n_{j})$ is equivalent to swapping modules $b_{i}$ and $b_{j}$. Hence, updating the TCG topology in a reverse operation requires only a swap operation on modules $b_{i}$ and $b_{j}$.

### 3.2.1.4. Move

The move operation changes the geometric relation between two modules $b_{i}$ and $b_{j}$ from horizontal to vertical or vice versa, i.e., it moves the corresponding edge from one transitive closure graph to the other. Two cases are distinguished: $b_{i} \perp b_{j}$ and $b_{i} \vdash b_{j}$.

To move an edge $(n_{i}, n_{j})$ in $C_{h}$ ($C_{v}$), edge $(n_{j}, n_{k})$ is deleted from $C_{v}$, where node $n_{k} \in \operatorname{fanout}(n_{j})$, and edge $(n_{k}, n_{j})$ is deleted from $C_{v}$, where node $n_{k} \in \operatorname{fanin}(n_{j})$. Edge $(n_{j}, n_{l})$, where $n_{l} \in \operatorname{fanout}(n_{j})$, is deleted from $C_{v}$. For each node $n_{k} \in \operatorname{fanout}(n_{j}) \cup \operatorname{fanin}(n_{j})$ in $C_{v} \cup \operatorname{fanin}(n_{j})$ in $C_{h}$: if $b_{j} \perp b_{k}$ or $b_{k} \perp b_{j}$, then edge $(n_{k}, n_{j})$ ($(n_{j}, n_{k})$) is deleted from $C_{h}$; if $b_{j} \perp b_{k}$, edge $(n_{j}, n_{k})$ is added to $C_{v}$, otherwise edge $(n_{k}, n_{j})$ is added to $C_{v}$. If no vertical geometric relation exists between modules $b_{j}$ and $b_{k}$, and $b_{j} \vdash b_{k}$ ($b_{k} \vdash b_{j}$) in the x-direction, then edge $(n_{k}, n_{j})$ ($(n_{j}, n_{k})$) is deleted from $C_{h}$ and a corresponding edge $(n_{j}, n_{k})$ ($(n_{k}, n_{j})$) is added to $C_{h}$. To update the fanout of node $n_{i}$ in $C_{h}$ and $C_{v}$, each node $n_{k} \in \operatorname{fanout}(n_{i})$ in $C_{v} \cup \operatorname{fanout}(n_{i})$ in $C_{h}$ with $y_{k}>y_{i}$ is examined. If $b_{i} \perp b_{k}$ or $b_{k} \perp b_{i}$, then edge $(n_{i}, n_{k})$ is deleted from $C_{h}$ and the corresponding edge $(n_{i}, n_{k})$ is added to $C_{v}$. If no vertical relation exists between modules $b_{i}$ and $b_{k}$, then edge $(n_{i}, n_{k})$ is deleted from $C_{v}$ and edge $(n_{i}, n_{k})$ (or $(n_{k}, n_{i})$) is added to $C_{h}$.

Figure 3.2(d) shows the resulting TCG and its corresponding placement after moving edge $(n_{d}, n_{f})$ from $C_{h}$ in Fig. 3.2(c) to $C_{v}$. Here, $\operatorname{fanout}(n_{f})$ in $C_{v}=\{n_{c}\}$, $\operatorname{fanin}(n_{f})$ in $C_{h}=\{n_{a}, n_{b}, n_{d}, n_{e}\}$, and $\operatorname{fanin}(n_{f})$ in $C_{v}=\varnothing$. Consequently, edge $(n_{f}, n_{c})$ is deleted from $C_{v}$. The set $\operatorname{fanout}(n_{f}) \cup \operatorname{fanin}(n_{f})$ in $C_{v} \cup \operatorname{fanin}(n_{f})$ in $C_{h}$ equals $\{n_{a}, n_{b}, n_{c}, n_{d}, n_{e}\}$. Since $b_{a}, b_{e}, b_{d} \perp b_{f}$, where $\{n_{a}, n_{d}, n_{e}\} \subset \{n_{a}, n_{b}, n_{c}, n_{d}, n_{e}\}$, edges $(n_{a}, n_{f})$, $(n_{d}, n_{f})$, and $(n_{e}, n_{f})$ are deleted from $C_{h}$ and corresponding edges are added to $C_{v}$. Edge $(n_{b}, n_{f})$ is deleted from $C_{h}$, and edges $(n_{f}, n_{b})$ and $(n_{f}, n_{c})$ are added to $C_{h}$. $\operatorname{fanout}(n_{d})$ in $C_{v} \cup \operatorname{fanout}(n_{d})$ in $C_{h}=\{n_{c}, n_{b}, n_{f}\}$, of which only $b_{d} \perp b_{f}$. Therefore, $C_{h}$ is checked for edge $(n_{d}, n_{f})$. Since edge $(n_{d}, n_{f})$ no longer exists in $C_{h}$ and has already been added to $C_{v}$, nothing further is done.

Theorem 3: Move operation takes $\mathrm{O}(\mathrm{n})$ runtime, where n is the number of modules in a floorplan.

Proof: The time complexity is dominated by checking whether $b_{j} \perp b_{k}$ ($b_{k} \perp b_{j}$), where $n_{k} \in \operatorname{fanout}(n_{j}) \cup \operatorname{fanin}(n_{j})$ in $C_{v} \cup \operatorname{fanin}(n_{j})$ in $C_{h}$, and whether $b_{i} \perp b_{l}$ ($b_{l} \perp b_{i}$), where $n_{l} \in \operatorname{fanout}(n_{i})$ in $C_{v} \cup \operatorname{fanout}(n_{i})$ in $C_{h}$. Since there are at most $O(n)$ nodes $n_{k}$ and $O(n)$ nodes $n_{l}$, the operation takes $O(n)$ runtime in total.

Theorem 4: Swap, Reverse, and Move operations do not require reduction edges to be identified.

Proof: An edge $(n_{i}, n_{j})$ is a reduction edge if there is no path from $n_{i}$ to $n_{j}$ other than the edge $(n_{i}, n_{j})$ itself. Unlike in the TCG-S representation, Swap, Reverse, and Move perturbations are not restricted to reduction edges, since the operations in TCG-S* update the closure edge $(n_{i}, n_{j})$ together with all the reduction edges that form other paths from $n_{i}$ to $n_{j}$. Therefore, the resulting TCGs are acyclic. Operating on both reduction and closure edges increases the number of available move combinations and facilitates the search for the minimum packing cost, i.e., the desired solution.

Property 4: Fanin (fanout) edges in $C_{v}$ and fanin edges in $C_{h}$ must be acyclic.
To guarantee a feasible TCG, the edges drawn from node $n_{i}$ to $n_{j}$ in the fanout (from $n_{k}$ to $n_{i}$ in the fanin) of $C_{v}$, reflecting the geometric relation $b_{i} \perp b_{j}$, and the edges drawn from node $n_{k}$ to $n_{j}$ in the fanin of $C_{h}$, reflecting $b_{k} \vdash b_{j}$, must be acyclic. Since acyclic edges in $C_{h}$ ($C_{v}$) alone do not guarantee a feasible solution, nodes $n_{i}$, $n_{j}$, and $n_{k}$ must be checked to ensure that their edges in $C_{v}$ and $C_{h}$ combined are acyclic. The relations $b_{i} \perp b_{j}$ ($b_{i} \vdash b_{j}$), $b_{k} \perp b_{i}$ ($b_{k} \vdash b_{i}$), and $b_{k} \vdash b_{j}$ ($b_{k} \perp b_{j}$) cannot coexist in a TCG; thus, edges $(n_{i}, n_{j})$ and $(n_{k}, n_{i})$ in $C_{h}$ ($C_{v}$) together with edge $(n_{k}, n_{j})$ in $C_{v}$ ($C_{h}$) cannot exist.
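A perturbed pair of graphs can be validated with a brute-force check such as the Python sketch below. It tests the standard TCG feasibility properties (every pair of nodes joined by exactly one edge in $C_{h}$ or $C_{v}$, each graph equal to its own transitive closure, no self-loops) and, in the spirit of Property 4, that $C_{h} \cup C_{v}$ as a whole is acyclic. The function names are hypothetical, and the closure test is $O(n^{3})$ in the worst case, so this is a debugging aid rather than the incremental $O(n)$ update described in this chapter.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def is_transitively_closed(edges: Set[Edge]) -> bool:
    """True if (u, v) and (v, w) in edges always imply (u, w) in edges."""
    succ: Dict[str, Set[str]] = {}
    for u, v in edges:
        succ.setdefault(u, set()).add(v)
    return all(w in succ.get(u, set())
               for u, vs in succ.items()
               for v in vs
               for w in succ.get(v, set()))


def union_is_acyclic(nodes: List[str], ch: Set[Edge], cv: Set[Edge]) -> bool:
    """Kahn's algorithm on C_h ∪ C_v: acyclic iff every node can be removed."""
    succ: Dict[str, List[str]] = {n: [] for n in nodes}
    indeg = {n: 0 for n in nodes}
    for u, v in ch | cv:
        succ[u].append(v)
        indeg[v] += 1
    ready = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return removed == len(nodes)


def is_feasible_tcg(nodes: List[str], ch: Set[Edge], cv: Set[Edge]) -> bool:
    if any(u == v for u, v in ch | cv):
        return False                          # no self-loops
    for a, b in combinations(nodes, 2):       # exactly one edge per pair, in C_h or C_v
        count = sum(e in g for g in (ch, cv) for e in ((a, b), (b, a)))
        if count != 1:
            return False
    return (is_transitively_closed(ch) and is_transitively_closed(cv)
            and union_is_acyclic(nodes, ch, cv))


if __name__ == "__main__":
    # Forbidden pattern from Property 4: b_k below b_i below b_j, but (n_k, n_j) in C_h.
    print(is_feasible_tcg(["i", "j", "k"], {("k", "j")}, {("i", "j"), ("k", "i")}))
```

The forbidden pattern described above is rejected by the closure test: $(n_{k}, n_{i})$ and $(n_{i}, n_{j})$ in $C_{v}$ would require $(n_{k}, n_{j})$ in $C_{v}$ as well, which would then give that pair two edges.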

### 3.2.2. Packing Sequence Update

This section introduces an $O(n)$ algorithm, where n is the number of modules in a placement, that updates the packing sequences $\Gamma_{+}$ and $\Gamma_{-}$ based on $C_{h}$, $C_{v}$, and the positions of the modules. The algorithm relies on the TCG topology having been updated after each perturbation.

Algorithm 3.1: Update-SP (SeqX, SeqY, A)

// initialize array SeqYNew with 0
// initialize list Tmp with nil

1. FOR i = 0 TO NUM(SeqY)-1 DO
2. $\quad$ IF (SeqY[i] $\in$ Fout_Cv(A) $\cup$ Fout_Ch(A)) THEN
3. $\quad\quad$ SeqYNew[i] = SeqY[i];
4. $\quad$ ELSE
5. $\quad\quad$ Tmp = concat(Tmp, SeqY[i]);
6. FOR i = NUM(SeqY)-NUM(Tmp) TO NUM(SeqY)-1 DO
7. $\quad$ SeqYNew[i] = nth(i-NUM(SeqY)+NUM(Tmp), Tmp);
8. Tmp = nil;
9. RETURN SeqYNew;

Algorithm 3.1 shows the update of $\Gamma_{-}$; the update of $\Gamma_{+}$ is discussed shortly. The algorithm updates the position of module $b_{i}$, to which the perturbation is applied, with respect to the modules that precede and follow it in the sequence. Any module $b_{k}$ that belongs to $\operatorname{fanout}(b_{i})$ in $C_{v} \cup \operatorname{fanout}(b_{i})$ in $C_{h}$ must follow module $b_{i}$ in sequence $\Gamma_{-}$. When the algorithm ends, the array SeqYNew[1 ... n] records the sequence $\Gamma_{-}$. Similarly, to update sequence $\Gamma_{+}$, any module $b_{k}$ that belongs to $\operatorname{fanin}(b_{i})$ in $C_{v} \cup \operatorname{fanout}(b_{i})$ in $C_{h}$ must follow module $b_{i}$ in $\Gamma_{+}$.
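The ordering rule just stated can be illustrated with a short Python sketch. It is not a line-by-line transcription of Algorithm 3.1: it assumes the TCG has already been updated, so that (by the exactly-one-edge property of TCG) every module outside the `follow` set is in the fanin of $b_{i}$ and must precede it. `reposition_module` is a hypothetical name.

```python
from typing import List, Set


def reposition_module(seq: List[str], b_i: str, follow: Set[str]) -> List[str]:
    """Place b_i after every module not in `follow` and before every module in it.

    Relative order inside each group is preserved. Two linear passes, so the
    update is O(n) given O(1) set membership tests.
    """
    before = [m for m in seq if m != b_i and m not in follow]
    after = [m for m in seq if m != b_i and m in follow]
    return before + [b_i] + after


if __name__ == "__main__":
    # Gamma_-: follow = fanout in C_v ∪ fanout in C_h
    # Gamma_+: follow = fanin in C_v ∪ fanout in C_h
    print(reposition_module(["c", "b", "a"], "a", {"b"}))
```

A stable partition keeps every other pair in its previous relative order, so only the relations involving $b_{i}$ are changed by the update.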

Tang et al. proposed a fast packing-cost evaluation of sequence pairs that computes the longest common subsequence in $O(n \log \log n)$ time. However, the time complexity of the floorplanning algorithm is dominated by constructing the constraint graphs from scratch after each perturbation to evaluate the packing cost, since the geometric relations between modules are not visible to the SP perturbation operations. The time complexity of constructing the constraint graphs is $O(n^{2})$, where n is the number of modules in a placement. Using the TCG-S* algorithm, which runs in $O(n)$ in total, decreases the time complexity of the sequence pair floorplanning algorithm to $O(n \log \log n)$ for sufficiently large n.

Theorem 5: Algorithm 3.1 correctly returns the new sequence pair $\Gamma_{+}$ and $\Gamma_{-}$.
Proof: According to the sequence pair representation, the packing sequence $\Gamma_{-}$ is constructed by concatenating the modules of a placement as in (1) and (2), subject to the condition that if $b_{j}$ follows $b_{i}$ in the sequence, then $b_{i}$ is either left of or below $b_{j}$. Therefore, $b_{j}$ follows $b_{i}$ in $\Gamma_{-}$ only if $b_{i} \vdash b_{j}$ or $b_{i} \perp b_{j}$. Additionally, by property (2) of TCG discussed in [2], the two nodes $n_{i}$ and $n_{j}$ are connected by exactly one edge, either in $C_{v}$ or in $C_{h}$. If $n_{j} \notin \operatorname{fanout}(n_{i})$ in either $C_{v}$ or $C_{h}$, then $n_{j} \in \operatorname{fanin}(n_{i})$ in $C_{v}$ or $C_{h}$. Therefore, Algorithm 3.1 correctly returns the new sequence pair.

Theorem 6: Algorithm 3.1 updates the packing sequences in $O(n)$ runtime.
Proof: The time complexity of updating sequence $\Gamma_{-}$ in Algorithm 3.1 is dominated by checking whether $b_{j}$ is a member of $\operatorname{fanout}(b_{i})$ in $C_{v}$ or in $C_{h}$, assuming each membership test takes constant time. Since the time complexities of updating $\Gamma_{+}$ and $\Gamma_{-}$ are the same, and in the worst case there are at most $n-1$ modules $b_{j}$ to check, the time complexity of Algorithm 3.1 is $O(n)$ in total.

### 3.2.3. Equivalence of TCG and SP

Lin et al. proposed a transformation from TCG to SP using the fanin and fanout of TCGs [18]. The time complexity of this algorithm depends only on the configuration of the TCG. For each node $n_{k}$ in the TCG, each node $n_{l}$ is checked to determine whether edge $(n_{k}, n_{l})$ or $(n_{l}, n_{k})$ exists in $C_{h}$ or $C_{v}$; if it exists, the edge is deleted. In the worst case, there are $O(n)$ nodes $n_{k}$ and $O(n)$ nodes $n_{l}$, so the time complexity is $O(n^{2})$. The TCG-S* packing-sequence update algorithm returns the updated sequences $\Gamma_{+}$ and $\Gamma_{-}$ in $O(n)$ runtime, which makes it faster than the transformation proposed in [18].

Likewise, a reverse transformation from SP to TCG can be obtained. Given a sequence pair $(\Gamma_{+}, \Gamma_{-})$, the fanin and fanout of all nodes in both transitive closure graphs can be obtained by intersecting, according to the horizontal and vertical constraints, the subsequences that follow the inspected node in $\Gamma_{+}$ and $\Gamma_{-}$ (or in their reversals). Accordingly, $\operatorname{fanout}(n_{i})$ in the x-direction is the intersection of the subsequence following $n_{i}$ in $\Gamma_{+}$ and the subsequence following $n_{i}$ in $\Gamma_{-}$. The subsequence following $n_{i}$ in $\Gamma_{+}^{R}$ intersected with the subsequence following $n_{i}$ in $\Gamma_{-}$ gives $\operatorname{fanout}(n_{i})$ in the y-direction. The subsequence following $n_{i}$ in $\Gamma_{+}^{R}$ intersected with that in $\Gamma_{-}^{R}$ gives $\operatorname{fanin}(n_{i})$ in the x-direction. Finally, the subsequence following $n_{i}$ in $\Gamma_{+}$ intersected with that in $\Gamma_{-}^{R}$ gives $\operatorname{fanin}(n_{i})$ in the y-direction. For example, for the placement shown in Fig. 3.2(a) with sequence pair $(\langle e\ c\ a\ d\ b\ f\rangle, \langle a\ b\ c\ e\ f\ d\rangle)$, $\operatorname{fanout}(n_{a})$ in the x-direction $=\{n_{b}, n_{d}, n_{f}\}$, $\operatorname{fanout}(n_{a})$ in the y-direction $=\{n_{c}, n_{e}\}$, $\operatorname{fanin}(n_{a})$ in the x-direction $=\varnothing$, and $\operatorname{fanin}(n_{a})$ in the y-direction $=\varnothing$.
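The four intersection rules reduce to two pairwise position tests, which the Python sketch below applies to every pair of modules. It is an illustrative $O(n^{2})$ helper (the name `sp_to_tcg` and the dictionary keys are ours), and it reproduces the fanin and fanout of $n_{a}$ listed in the example.

```python
from typing import Dict, List, Set


def sp_to_tcg(gp: List[str], gm: List[str]) -> Dict[str, Dict[str, Set[str]]]:
    """Derive x/y fanin and fanout of every module from a sequence pair (Γ+, Γ-).

    a -> b in C_h (a left of b) iff a precedes b in both Γ+ and Γ-.
    a -> b in C_v (a below b)   iff a follows b in Γ+ but precedes b in Γ-.
    """
    pos_p = {m: i for i, m in enumerate(gp)}
    pos_m = {m: i for i, m in enumerate(gm)}
    rel = {m: {"fanout_x": set(), "fanout_y": set(),
               "fanin_x": set(), "fanin_y": set()} for m in gp}
    for a in gp:
        for b in gp:
            if a == b:
                continue
            if pos_p[a] < pos_p[b] and pos_m[a] < pos_m[b]:    # a left of b
                rel[a]["fanout_x"].add(b)
                rel[b]["fanin_x"].add(a)
            elif pos_p[a] > pos_p[b] and pos_m[a] < pos_m[b]:  # a below b
                rel[a]["fanout_y"].add(b)
                rel[b]["fanin_y"].add(a)
    return rel


if __name__ == "__main__":
    rel = sp_to_tcg(list("ecadbf"), list("abcefd"))
    print({k: sorted(v) for k, v in rel["a"].items()})
```

Because two distinct modules are either in the same order or in opposite orders in the two sequences, every pair lands in exactly one of $C_{h}$ or $C_{v}$, matching the exactly-one-edge property of TCG.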

### 3.3. Floor Planning Algorithm

A simulated annealing (SA) based algorithm [54] is developed using TCG-S for non-slicing floorplan design, with TCG-S* as the updated perturbation algorithm. Given an initial solution represented by a TCG and an SP, the algorithm perturbs the placement to obtain a new TCG and SP. The new TCG must satisfy the three properties mentioned in [12], and the new packing sequence pair must also be equivalent to the TCG. The slack computation proposed in [55] is implemented to improve move selection in simulated annealing. The contribution to wirelength minimization is also discussed in this section.

### 3.3.1. Slack Computation

Blocks that constrain each other in the same direction, such that any attempt to shorten their path would cause blocks to overlap, lie on the critical path of the floorplan; hence, their slack in that direction is zero. These blocks are good candidates for move selection aimed at reducing the span of the floorplan. Slack-based moves, together with the TCG moves, direct the search toward area minimization by identifying the zero-slack blocks, which represent the critical paths of the floorplan.


Figure 3.3. Slack computation (a) floorplan evaluation in left to right and bottom to top mode. (b) floorplan evaluation from right to left and top to bottom mode.

Table 1. MCNC Benchmark circuits

| Circuit | \#Module | \#I/O Pads | \#Nets | \#Pins |
| :--- | :--- | :--- | :--- | :--- |
| apte | 9 | 73 | 97 | 214 |
| xerox | 10 | 107 | 203 | 696 |
| hp | 11 | 43 | 83 | 264 |

Slacks can be computed in left-to-right mode or right-to-left mode. Fig. 3.3 shows floorplan evaluation for the same sequence pair in bottom-left mode and top-right mode.

To compute the slacks of blocks in a floorplan, the LCS of the two sequences is first computed in the left-to-right mode. The two sequences are then reversed to compute the LCS in the right-to-left mode. For example, the LCS of blocks in the x-direction in the right-to-left mode is computed as $\operatorname{lcs}(\Gamma_{+}^{R}, \Gamma_{-}^{R})$, whereas the LCS in the y-direction in the corresponding top-to-bottom mode is computed as $\operatorname{lcs}(\Gamma_{+}, \Gamma_{-}^{R})$. Algorithm 3.2 computes the LCS of the blocks using the sequence pair, and Algorithm 3.3 calls the LCS function after initializing the sequences in reversed order.

Algorithm 3.2:

LCS_Calc(X, Y, weights)

1. initialize_length_array L with 0;
2. initialize_position_array P;
3. initialize_result_array R;
4. FOR i = 0 TO n-1 DO
5. $\quad$ p = match[X[i]]; // index of X[i] in Y
6. $\quad$ b = X[i];
7. $\quad$ max = L[p] + weights[i];
8. $\quad$ P[i] = L[p];
9. $\quad$ FOR j = p TO n-1 DO
10. $\quad\quad$ IF (max > L[j] && Y[j] $\in$ Fout(b)) THEN
11. $\quad\quad\quad$ L[j] = max;
12. R[0] = P[0, ..., n-1];
13. R[1] = L[n-1];
14. RETURN R;
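For readers who want to experiment, the weighted-LCS evaluation can be sketched in Python as below. This sketch follows the common monotone-array formulation of sequence-pair evaluation and deliberately omits the `Y[j] ∈ Fout(b)` filter of Algorithm 3.2; the name `lcs_positions`, the per-block weight dictionary, and the unit widths in the usage example are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def lcs_positions(x: List[str], y: List[str], w: Dict[str, float]) -> Tuple[Dict[str, float], float]:
    """Weighted-LCS evaluation of a sequence pair.

    With (x, y) = (Γ+, Γ-) this returns left-to-right x-coordinates and the
    horizontal span; y-coordinates follow from (reversed Γ+, Γ-) and heights.
    """
    match = {b: j for j, b in enumerate(y)}   # index of each block in y
    length = [0.0] * len(y)                   # length[j]: best weighted length over y[0..j]
    pos: Dict[str, float] = {}
    for b in x:
        p = match[b]
        pos[b] = length[p]                    # heaviest chain of blocks left of b
        t = pos[b] + w[b]
        for j in range(p, len(y)):
            if t > length[j]:
                length[j] = t
            else:
                break                         # length[] is non-decreasing, later entries are >= t
    return pos, length[-1]


if __name__ == "__main__":
    gp, gm = list("ecadbf"), list("abcefd")   # sequence pair of Fig. 3.2(a)
    widths = {b: 1.0 for b in gp}             # hypothetical unit widths
    print(lcs_positions(gp, gm, widths))
```

The early `break` is what keeps the inner loop cheap in practice: once an entry already exceeds the candidate length, all later entries do too.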

Algorithm 3.3:

Slack (X,Y, PosX, PosY, wX, wY)

1. initialize_arrays Rx_BL, Ry_BL;
2. initialize_arrays Rx_TR, Ry_TR;
3. /* evaluate LCS X in bottom-left mode */
4. LCSX_BL = LCS_Calc(X, Y, wX);
5. /* evaluate LCS Y in bottom-left mode */
6. FOR i = 0 TO n-1 DO
7. $\quad X^{R}[i] = X[n-1-i]$;
8. $\quad wY^{R}[i] = wY[n-1-i]$;
9. LCSY_BL = LCS_Calc($X^{R}$, Y, $wY^{R}$);
10. /* evaluate LCS X in top-right mode */
11. FOR i = 0 TO n-1 DO
12. $\quad Y^{R}[i] = Y[n-1-i]$;
13. $\quad wX^{R}[i] = wX[n-1-i]$;
14. LCSX$^{R}$_TR = LCS_Calc($X^{R}$, $Y^{R}$, $wX^{R}$);
15. /* evaluate LCS Y in top-right mode */
16. LCSY$^{R}$_TR = LCS_Calc(X, $Y^{R}$, wY);
17. FOR i = 0 TO n-1 DO
18. $\quad$ LCSX_TR[i] = LCSX$^{R}$_TR[n-1-i];
19. $\quad$ LCSY_TR[i] = LCSY$^{R}$_TR[n-1-i];
20. /* compute slack */
21. FOR i = 0 TO n-1 DO
22. $\quad$ SlackX[i] = max(LCSX_BL) - LCSX_BL[i] - LCSX_TR[i] + wX[i];
23. $\quad$ SlackY[i] = max(LCSY_BL) - LCSY_BL[i] - LCSY_TR[i] + wY[i];
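As a cross-check of the slack idea, the Python sketch below computes horizontal slacks directly from the sequence-pair "left of" relation: a forward longest-path pass packs every block toward the left edge, a backward pass packs it toward the right edge, and the slack is what remains of the span. This is a brute-force $O(n^{2})$ illustration with our own names and conventions (start coordinates exclude the block's own width), not a transcription of Algorithm 3.3; vertical slacks follow by passing the reversed $\Gamma_{+}$ and the block heights.

```python
from typing import Dict, List


def x_slacks(gp: List[str], gm: List[str], w: Dict[str, float]) -> Dict[str, float]:
    """Horizontal slack of each block of the sequence pair (Γ+, Γ-).

    left[b]  : x-coordinate of b when packed toward the left boundary.
    right[b] : gap between b's right edge and the right boundary when packed right.
    slack[b] = span - left[b] - w[b] - right[b]; zero slack marks a critical block.
    """
    pp = {b: i for i, b in enumerate(gp)}
    pm = {b: i for i, b in enumerate(gm)}

    def left_of(a: str, b: str) -> bool:
        return pp[a] < pp[b] and pm[a] < pm[b]

    left: Dict[str, float] = {}
    for b in gp:                      # Γ+ order visits every left neighbour first
        left[b] = max((left[a] + w[a] for a in left if left_of(a, b)), default=0.0)
    span = max(left[b] + w[b] for b in gp)

    right: Dict[str, float] = {}
    for b in reversed(gp):            # reversed Γ+ order visits every right neighbour first
        right[b] = max((right[c] + w[c] for c in right if left_of(b, c)), default=0.0)

    return {b: span - left[b] - w[b] - right[b] for b in gp}


if __name__ == "__main__":
    gp, gm = list("ecadbf"), list("abcefd")   # sequence pair of Fig. 3.2(a)
    print(x_slacks(gp, gm, {b: 1.0 for b in gp}))  # hypothetical unit widths
```

With unit widths, the chain $b_{a}, b_{b}, b_{f}$ comes out with zero slack, i.e., it is the horizontal critical path, while the remaining blocks can each shift by one unit.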

Owing to the equivalence between TCG and SP, the LCS function returns the floorplan span in the x-direction (y-direction) faster than constructing the constraint graphs. Since block $b_{i}$ in a placement is bounded only by its fanout blocks in $C_{h}$ ($C_{v}$), only these blocks affect the total length of the candidate sequences on the path of block $b_{i}$. Let k denote the index of module $b_{i}$ in sequence $\Gamma_{+}$ and p denote the index of module $b_{i}$ in sequence $\Gamma_{-}$. Therefore, computing $\operatorname{lcs}(\Gamma_{+}[1 \ldots k], \Gamma_{-}[1 \ldots p])$ only considers the fanout of blocks in the common subsequence of $(\Gamma_{+}[1 \ldots k-1], \Gamma_{-}[1 \ldots p-1])$.

## 4. Placement and Routing

### 4.1. Constraints-based Placement

Placement of analog circuits is an error-prone and time-consuming process; it can easily take an experienced designer weeks or months to lay out even a relatively small circuit. Some devices need to be placed in close proximity and symmetrically with respect to an axis or a center point. This reduces the effect of parasitic mismatches, which would otherwise degrade circuit performance. Placing symmetric devices close to each other also reduces circuit sensitivity to thermal gradients and process variations.

### 4.1.1. Overview of Analog Placement Methods

In order to automatically produce analog device-level layouts that match high-quality manual layouts in density and performance, a placement tool must not only provide good rectangle-packing functionality (which is common to any placement method) but must also include analog-specific capabilities. Such features include: 1) the ability to handle topological constraints for symmetry and device matching; 2) the ability to arrange devices such that critical structures are shared (also known as device merging), in order to increase layout density and reduce induced parasitics; and 3) a (built-in) library of predefined module generators and the ability to exploit their reshaping capabilities during the placement process. Besides these analog-specific features, the main goal of optimally packing arbitrarily sized modules is similar to that of other very large scale integration (VLSI) placement problems: chip floorplanning and standard-cell and macro-cell digital placement. Due to the complexity of the basic problem, several heuristic classes of placement techniques have been attempted.

Constructive placement techniques, which gradually build the placement solution by selecting one module at a time and positioning it in the "best" available location, were among the first developed for VLSI layout. Several analog placement systems employ constructive methods: Kayal et al. developed an expert knowledge base to guide the placement [56]; Mehranfar suggested a schematic-driven approach, using a constructive scheme based on connectivity and relative positioning in the input schematic [57]. Although these methods are fast and scale well with problem size, their results can be poor due to order dependence and the lack of a global view when dealing with a variety of interacting quality measures. Branch-and-bound placement techniques use a controlled enumeration of all possible layout configurations in the search space, where a lower bound of the chosen cost function is used to prune the search. Branch-and-bound algorithms eventually find the optimal solution because they explore the search space exhaustively. However, they are effective only for very small problems, as the number of visited configurations grows exponentially with problem size. Related integer linear programming (ILP) placement models suffer from the same scaling drawback, since most ILP packages are based on branch-and-bound approaches. Even if placement problems are tackled hierarchically, branch-and-bound methods are less attractive for analog device placement because the search space is usually much larger than for digital problems of similar size (for instance, due to the presence of "soft" capacitors, which can be implemented in a large number of versions). More recently, a placement technique iteratively combining min-cut partitioning and force-directed placement (DLP) has been employed in an interactive environment for full-custom designs [58].

Simulated annealing [54] and genetic algorithms are the most effective choices for solving industrial analog placement problems. These algorithms use stochastically controlled hill-climbing to avoid local minima during the optimization process. In addition, they do not impose severe constraints on the size of the problem or on the mathematical properties of the cost function. While efficiently trading off a variety of layout factors, such as area, total net length, aspect ratio, maximum chip width and/or height, cell orientation, and "soft" cell shape, they are very flexible, supporting incremental addition of new functionality, and relatively easy to implement (although good tuning takes more time).

Existing approaches to automated placement generation can be classified into two categories:
i. Template driven layout

This approach is based on a known layout pattern, or layout template, that specifies the necessary device-to-device, device-to-wire, or wire-to-wire spatial relationships for a typical circuit. It is fast and readily yields a compact layout. However, it lacks flexibility, as matching requirements vary from one circuit design to another.

ii. Constraint-based layout

It is more flexible than the template-driven approach. Fig. 4.1 shows the general flow of constraint-driven, or performance-driven, layout. It usually starts with circuit analysis based on the netlist and/or the performance specification of the design to generate the layout constraints. The placement and routing process is required to meet these constraints, and a final compaction stage is applied to optimize area utilization.


Figure 4.1 Constraint-driven analog layout generation flow

According to [48] and [49], device group placement is classified into four categories: the cross-coupled, inter-digitated, common-centroid, and general stacking matching styles. These four styles are studied thoroughly in [50]. This section mainly studies and implements the common-centroid and inter-digitated matching styles in automated device-group placement in order to reduce systematic device mismatch. The inputs of the placement algorithm are the aspect-ratio bounds, which are computed in the floorplan optimization process, the devices to be matched, and the matching style.

### 4.1.2. A Review on Simulated Annealing Optimization Algorithm

At each layout optimization stage, the goal is to optimize the eventual performance of the system without compromising the feasibility of the subsequent stage. The basic elements of simulated annealing are:
i. A finite set $S$.
ii. A real-valued cost function $J$ defined on $S$. Let $S^{*}$ be the set of global minima of $J$, assumed to be a proper subset of $S$.
iii. For each $i \in S$, a set $S(i) \subset S \setminus \{i\}$, called the set of neighbors of $i$.
iv. For every $i$, a collection of positive coefficients $q_{ij}$, $j \in S(i)$, such that $\sum_{j \in S(i)} q_{ij}=1$. It is assumed that $j \in S(i)$ if and only if $i \in S(j)$.
v. A non-increasing function $T: \mathbb{N} \rightarrow (0, \infty)$, called the cooling schedule, where $\mathbb{N}$ is the set of positive integers and $T(t)$ is called the temperature at time $t$.
vi. An initial state $x(0) \in S$.

Given the above elements, the SA algorithm consists of a discrete-time inhomogeneous Markov chain $x(t)$, whose evolution is described as follows. If the current state $x(t)$ is equal to $i$, a neighbor $j$ of $i$ is chosen at random; the probability that any particular $j \in S(i)$ is selected is $q_{ij}$. Once $j$ is chosen, the next state $x(t+1)$ is determined as follows:

$$
\text{If } J(j) \leq J(i), \text{ then } x(t+1)=j.
$$

If $J(j)>J(i)$, then

$$
x(t+1)=\begin{cases} j & \text{with probability } \exp\left[-(J(j)-J(i))/T(t)\right], \\ i & \text{otherwise.} \end{cases}
$$

Formally,

$$
P\left[x(t+1)=j \mid x(t)=i\right]=q_{ij} \exp\left[-\frac{1}{T(t)} \max\{0, J(j)-J(i)\}\right], \quad \text{if } j \neq i,\ j \in S(i), \tag{4.1}
$$

$$
P\left[x(t+1)=j \mid x(t)=i\right]=0, \quad \text{if } j \neq i,\ j \notin S(i).
$$
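One transition of this chain can be written in a few lines of Python. The sketch assumes the neighbor $j$ is drawn uniformly from $S(i)$, i.e. $q_{ij}=1/|S(i)|$, which is one admissible choice among the positive coefficients allowed above; `sa_step` and the toy cost in the usage block are hypothetical.

```python
import math
import random
from typing import Callable, List, TypeVar

State = TypeVar("State")


def sa_step(i: State, cost: Callable[[State], float],
            neighbors: Callable[[State], List[State]],
            temperature: float, rng: random.Random) -> State:
    """One SA transition x(t) = i -> x(t+1), following Eq. (4.1)."""
    j = rng.choice(neighbors(i))      # uniform proposal: q_ij = 1/|S(i)| (assumption)
    delta = cost(j) - cost(i)
    if delta <= 0:
        return j                      # downhill or equal cost: always accepted
    # uphill: accepted with probability exp(-(J(j) - J(i)) / T(t))
    return j if rng.random() < math.exp(-delta / temperature) else i


if __name__ == "__main__":
    rng = random.Random(0)

    def cost(s: int) -> float:
        return (s - 3) ** 2           # toy cost on {0, ..., 10}

    def neighbors(s: int) -> List[int]:
        return [n for n in (s - 1, s + 1) if 0 <= n <= 10]

    state = 10
    for t in range(2, 5000):
        state = sa_step(state, cost, neighbors, 2.0 / math.log(t), rng)
    print(state)
```

Passing a seeded `random.Random` instance keeps runs reproducible, which is convenient when comparing cooling schedules.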

The rationale behind the SA algorithm is best understood by considering a homogeneous Markov chain $X_{T}(t)$ in which the temperature $T(t)$ is held at a constant value $T$. Assume that the Markov chain $X_{T}(t)$ is irreducible and aperiodic and that $q_{ij}=q_{ji}$ for all $i, j$. Then $X_{T}(t)$ is a reversible Markov chain, and its invariant probability distribution is given by

$$
\pi_{T}(i)=\frac{1}{Z_{T}} \exp\left[-\frac{J(i)}{T}\right], \quad i \in S, \tag{4.2}
$$

where $Z_{T}$ is a normalizing constant. (This is easily shown by verifying that the detailed balance equations hold.) As $T$ tends to zero, the probability distribution $\pi_{T}$ concentrates on the set $S^{*}$ of global minima of $J$. This latter property remains valid if the condition $q_{ij}=q_{ji}$ is relaxed.
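The detailed balance argument can be checked numerically on a toy problem. The Python sketch below builds the constant-temperature transition matrix of Eq. (4.1) for five states on a ring (every state has two neighbors, so $q_{ij}=q_{ji}=1/2$), computes the Gibbs distribution, and lets one confirm that $\pi_{T}(i)P_{ij}=\pi_{T}(j)P_{ji}$. The cost values and temperature are arbitrary choices for illustration.

```python
import math
from typing import List


def metropolis_matrix(costs: List[float], neighbors: List[List[int]], T: float) -> List[List[float]]:
    """Transition matrix of the constant-temperature chain, Eq. (4.1), uniform q_ij."""
    n = len(costs)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q = 1.0 / len(neighbors[i])
        for j in neighbors[i]:
            P[i][j] = q * math.exp(-max(0.0, costs[j] - costs[i]) / T)
        P[i][i] = 1.0 - sum(P[i][j] for j in neighbors[i])   # rejected proposals stay at i
    return P


def gibbs(costs: List[float], T: float) -> List[float]:
    """Gibbs distribution pi_T(i) = exp(-J(i)/T) / Z_T."""
    weights = [math.exp(-c / T) for c in costs]
    z = sum(weights)
    return [wt / z for wt in weights]


if __name__ == "__main__":
    costs = [3.0, 1.0, 4.0, 0.0, 2.0]                      # toy cost J on 5 states
    ring = [[(i - 1) % 5, (i + 1) % 5] for i in range(5)]  # symmetric neighbourhoods
    P, pi = metropolis_matrix(costs, ring, 0.7), gibbs(costs, 0.7)
    gap = max(abs(pi[i] * P[i][j] - pi[j] * P[j][i]) for i in range(5) for j in range(5))
    print(gap)                                             # ~0 up to rounding
    print(gibbs(costs, 0.05))                              # mass piles up on the minimum (state 3)
```

Lowering $T$ in the last line shows the concentration of $\pi_{T}$ on the global minimum that motivates the annealing approach.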

The probability distribution (4.2), known as the Gibbs distribution, plays an important role in statistical mechanics. Statistical physicists have been interested in generating a sample element of $S$ drawn according to the probability distribution $\pi_{T}$. This is accomplished by simulating the Markov chain $X_{T}(t)$ until it reaches equilibrium, a method known as the Metropolis algorithm (Metropolis et al., 1953). In the context of optimization, an optimal element of $S$ can be generated with high probability if a random sample is drawn according to $\pi_{T}$ with $T$ very small. One difficulty with this approach is that when $T$ is very small, the time it takes for the Markov chain to reach equilibrium can be excessive. The SA algorithm tries to overcome this drawback by decreasing the temperature $T(t)$ slowly.

SA can be viewed as a local search algorithm that occasionally makes upward moves, i.e., moves that increase the cost.

Assume that $X_{T}(t)$ is irreducible and aperiodic. Under this assumption, the SA algorithm is said to converge if $\lim_{t \rightarrow \infty} P\left[x(t) \in S^{*}\right]=1$. The SA convergence condition according to Hajek is presented next.

Theorem (Hajek, 1988): A state $i$ communicates with $S^{*}$ at height $h$ if there exists a path in $S$, with each element of the path being a neighbor of the preceding element, that starts at $i$, ends at some element of $S^{*}$, and along which the largest value of $J$ is $J(i)+h$. Let $d^{*}$ be the smallest number such that every $i \in S$ communicates with $S^{*}$ at height $d^{*}$. Then, the SA algorithm converges if and only if

$$
\lim_{t \rightarrow \infty} T(t)=0 \tag{4.3}
$$

and

$$
\sum_{t=1}^{\infty} \exp\left[-\frac{d^{*}}{T(t)}\right]=\infty. \tag{4.4}
$$

For the logarithmic cooling schedule

$$
T(t)=\frac{d}{\log t}, \tag{4.5}
$$

where $d$ is a positive constant, Hajek's theorem states that SA converges if and only if $d \geq d^{*}$.
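Substituting the logarithmic schedule $T(t)=d/\log t$ into the divergence condition shows where the threshold $d \geq d^{*}$ comes from. The short derivation below assumes $d^{*}>0$ and starts the sum at $t=2$ so that $\log t>0$ (dropping finitely many terms does not affect divergence):

$$
\begin{aligned}
\exp\left[-\frac{d^{*}}{T(t)}\right] &= \exp\left[-\frac{d^{*}}{d}\log t\right] = t^{-d^{*}/d}, \\
\sum_{t=2}^{\infty} t^{-d^{*}/d} = \infty \quad &\Longleftrightarrow \quad \frac{d^{*}}{d} \leq 1 \quad \Longleftrightarrow \quad d \geq d^{*}.
\end{aligned}
$$

The second line is the $p$-series test. The schedule also satisfies $\lim_{t \rightarrow \infty} T(t)=0$, so both conditions of the theorem hold exactly when $d \geq d^{*}$.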

The constant $d^{*}$ measures how difficult it is for $x(t)$ to escape from local minima and travel from a non-optimal state to $S^{*}$. The primary concern is a problem with $d^{*}>0$, in the sense that it has at least one local minimum that is not a global minimum. To gain an intuitive grasp of Hajek's theorem, consider a local minimum of depth $d^{*}$. SA makes an infinite number of attempts to escape from it, and the probability of success at each attempt, as discussed earlier, is $\exp(-d^{*}/T(t))$. Therefore, according to equation (4.4), the success probabilities sum to infinity, which guarantees an eventual successful escape.

To gain more intuition about Hajek's theorem, the connection between SA and the underlying Markov chain is analyzed further. Roughly speaking, the statistics of the Markov chain $x(t)$ under a slowly varying cooling schedule $T(t)$ remain largely unchanged if that schedule is replaced by one in which the temperature is held constant for long periods. Let $t_{1}=1$ and $t_{k+1}=\exp(kd)$, and let $\widehat{T}(t)=1/k$ for $t_{k} \leq t < t_{k+1}$. Consider the $k$th interval $[t_{k}, t_{k+1}]$ of the piecewise-constant schedule $\widehat{T}(t)$ and the chain $x_{1/k}(t)$ at the constant temperature $1/k$. Since this chain is reversible, the eigenvalues of its transition probability matrix are real, and its relaxation time is determined by its second-largest eigenvalue $\lambda_{2}$, for which good estimates are available, at least in the limit as $k \rightarrow \infty$ (e.g., Chiang and Chow, 1988; Holley and Stroock, 1988). In particular, if the cost function $J$ has a unique global minimum, the relaxation time is approximately $\exp(kd^{*})$, where $d^{*}$ is the same constant defined in Hajek's theorem. This gives further support for the convergence condition $d>d^{*}$ for the schedule $\widehat{T}(t)$. If $d<d^{*}$, then at each temperature $1/k$, $x_{1/k}(t)$ is run for only a negligible fraction of its relaxation time, which is not enough for $\pi_{T}(i; t)$ to stay close to $\pi_{T}(i)$. Conversely, if $d>d^{*}$, then the interval $[t_{k}, t_{k+1}]$ corresponds to $\exp(k(d-d^{*}))$ relaxation times of $x_{1/k}(t)$, which implies that $\pi_{T}(i; t)$ is very close to $\pi_{T}(i)$ as $k$ tends to infinity.

In practice, despite the lack of a solid theoretical justification for its convergence speed, SA has been widely used by researchers over the past decades. Its performance is mixed: in some cases it outperforms the best-known heuristics for those problems, while in others the heuristics perform better. The choice of cooling schedule significantly influences the convergence of SA and hence the quality of the generated solution.
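
The influence of the cooling schedule can be seen by comparing how fast two common schedules cool. The sketch contrasts the logarithmic schedule $T(t) = d/\ln t$ analysed by Hajek with a geometric schedule $T(t) = T_0\,\alpha^{t}$; the parameter values ($d = 1$, $T_0 = 1$, $\alpha = 0.95$, target temperature 0.1) are illustrative assumptions. The geometric schedule is far faster but gives no Hajek guarantee, since its sum $\sum_t \exp(-d^{*}/T(t))$ converges.

```python
import math


def log_schedule(t, d=1.0):
    """Logarithmic schedule T(t) = d / ln(t), defined for t >= 2."""
    return d / math.log(t)


def geometric_schedule(t, t0=1.0, alpha=0.95):
    """Geometric schedule T(t) = T0 * alpha**t, a common practical choice."""
    return t0 * alpha ** t


def steps_to_reach(schedule, target, start):
    """First step t >= start at which the schedule has cooled to `target` or below."""
    t = start
    while schedule(t) > target:
        t += 1
    return t


if __name__ == "__main__":
    print("logarithmic:", steps_to_reach(log_schedule, 0.1, start=2))
    print("geometric:  ", steps_to_reach(geometric_schedule, 0.1, start=0))
```

The logarithmic schedule needs 22,027 steps to reach $T = 0.1$, while the geometric one needs 45, which is why practical tools usually trade the convergence guarantee for speed.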

To sum up, SA is a generally applicable, easy-to-implement probabilistic approximation algorithm that is able to generate good solutions to optimization problems.
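
A minimal, generic SA loop is sketched below to show how few ingredients the method needs: a cost function, a neighbourhood move, a cooling schedule, and the Metropolis acceptance test. The toy cost, neighbourhood, and geometric schedule in the usage part are assumptions, not the cost functions used by the OASYN placer or router.

```python
import math
import random


def simulated_annealing(cost, neighbor, x0, schedule, n_iter, rng):
    """Minimise `cost` with the Metropolis acceptance rule; returns the best state seen."""
    x, fx = x0, cost(x0)
    best, f_best = x, fx
    for t in range(n_iter):
        temperature = schedule(t)
        y = neighbor(x, rng)
        fy = cost(y)
        delta = fy - fx
        # Always accept improvements; accept uphill moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            x, fx = y, fy
            if fx < f_best:
                best, f_best = x, fx
    return best, f_best


if __name__ == "__main__":
    # Toy 1-D problem (illustrative): integers 0..20 with a single minimum at 7.
    def cost(x):
        return (x - 7) ** 2

    def step(x, rng):
        return min(20, max(0, x + rng.choice((-1, 1))))

    def cooling(t):
        return 10.0 * 0.995 ** t

    print(simulated_annealing(cost, step, 0, cooling, 5000, random.Random(0)))
```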

### 4.1.3. Inter-digitated matching style

In the inter-digitated matching style, the matched devices are placed as a one-dimensional common-centroid array, as shown in Fig. 4.2. With the two devices labeled A and B, the matching pattern is AB_BA or AB_AB. Each inter-digitated group $G_{i}$ contains $S_{i}$ devices, placed in the pattern AB_AB within the bounding length and width of the whole group, $L_{B}$ and $W_{B}$ respectively. $L_{G}$ denotes the sum of the $S_{i}$ horizontal weights, and $N_{S}$ denotes the number of segments per row. The inputs of the algorithm are the devices to be matched and the number of device fingers per segment, $N_{f S}$.


Figure 4.2 An example of inter-digitated array

Algorithm 4.1: $\operatorname{interdig}\left(G_{i}, N_{f S}, L_{B}, W_{B}\right)$

1. // calculate coordinates of device finger placement
2. // initialize m, RelX, RelY with 0
3. while (m < number of fingers per device) DO
4. $\quad$ FOR each device $S_{i}$ DO
5. $\quad\quad$ FOR each finger in segment range from 1 TO $N_{f S}$ DO
6. $\quad\quad\quad$ find x-position: PosX = RelX; // relative x position
7. $\quad\quad\quad$ find y-position: PosY = RelY; // relative y position
8. $\quad\quad\quad$ increment RelX: RelX = RelX + Hweights;
9. $\quad\quad\quad$ y.max = max(y.max Vweights);
10. $\quad\quad\quad$ IF (RelX + Hweights > $N_{s}$ * Hweights) THEN {
11. $\quad\quad\quad\quad$ RelX = 0;
12. $\quad\quad\quad\quad$ RelY = y.max; }
13. $\quad$ m = m + $N_{f S}$;
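
The listing above can be sketched in Python as follows. This is an illustrative reading, not the OASYN SKILL code: all devices are assumed to have the same number of fingers, coordinates are in integer grid units, RelX is tracked as a column counter, and the row height y.max is assumed to accumulate (RelY plus the vertical weight) so that successive rows stack, whereas the listing as printed takes max(y.max Vweights). A guard, also an addition, stops a device from receiving more than n_fingers fingers when n_fingers is not a multiple of $N_{fS}$. The function and parameter names are hypothetical.

```python
def interdig(devices, n_fingers, n_fs, n_s, w_h, w_v):
    """Finger origins for an inter-digitated group, following Algorithm 4.1.

    devices: device labels in placement order, e.g. ["A", "B"]
    n_fingers: fingers per device (assumed equal for all devices)
    n_fs: fingers per segment (N_fS)
    n_s: finger slots per row before wrapping (N_s in the IF test)
    w_h, w_v: horizontal / vertical finger weights (Hweights, Vweights)
    """
    pos = {dev: [] for dev in devices}
    col, rel_y, y_max, m = 0, 0, 0, 0
    while m < n_fingers:
        for dev in devices:
            # Guard (an addition): never place more than n_fingers per device.
            for _ in range(min(n_fs, n_fingers - m)):
                pos[dev].append((col * w_h, rel_y))  # PosX = RelX, PosY = RelY
                col += 1  # RelX = RelX + Hweights
                # Assumption: the next row starts below the current row's fingers.
                y_max = max(y_max, rel_y + w_v)
                if col >= n_s:  # same as RelX + Hweights > N_s * Hweights
                    col, rel_y = 0, y_max
        m += n_fs
    return pos


if __name__ == "__main__":
    for dev, fingers in interdig(["A", "B"], 4, 2, 4, 1, 2).items():
        print(dev, fingers)
```

With two devices, four fingers each, two fingers per segment, and four slots per row, the call produces the row pattern AABB on two rows: A at (0, 0), (1, 0), (0, 2), (1, 2) and B at (2, 0), (3, 0), (2, 2), (3, 2).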

### 4.1.4. Common-centroid matching style

Common-centroid matching requires the centroids of the matched devices to coincide exactly. Fig. 4.3 shows an example of devices matched in the common-centroid style.


Figure 4.3 An example of common centroid array

Each common-centroid group $G_{i}$ contains $S_{i}$ devices, placed within the bounding length and width of the whole group, $L_{B}$ and $W_{B}$ respectively, such that the centroids of all devices coincide. $L_{G}$ denotes the sum of the $S_{i}$ horizontal weights; $w_{H}$ and $w_{v}$ denote the horizontal and vertical finger weights, respectively; $N_{f}$ denotes the number of device fingers; and $N_{S}$ denotes the number of device fingers per row.

Algorithm 4.2: $\operatorname{comcentroid}\left(G_{i}, N_{S}, w_{H}, w_{v}\right)$

1. // calculate coordinates of device finger placement
2. // initialize radprev, rad, Xrel, Yrel with 0
3. while (rad <= $N_{S} * w_{v}$) DO
4. $\quad$ increment rad: rad = rad + $w_{H}$;
5. $\quad$ Yrel = 0;
6. $\quad$ find x-position: Xpos = Xrel;
7. $\quad$ find mirror x-position: Xneg = -Xpos - $w_{H}$;
8. $\quad$ while (Yrel < $N_{f} * S_{i} /\left(N_{S} * 2\right)$) DO {
9. $\quad\quad$ find y-position: Ypos = Yrel;
10. $\quad\quad$ find mirror y-position: Yneg = -Ypos - $w_{v}$;
11. $\quad\quad$ P[F_num] = list(Xpos Ypos);
12. $\quad\quad$ P[F_num+1] = list(Xneg Yneg);
13. $\quad\quad$ P[F_num+2] = list(Xneg Ypos);
14. $\quad\quad$ P[F_num+3] = list(Xpos Yneg);
15. $\quad\quad$ F_num = F_num + 4;
16. $\quad\quad$ increment relative position: Yrel = Yrel + $w_{v}$; }
17. $\quad$ increment relative position: Xrel = Xrel + $w_{H}$;
18. $\quad$ F[i] = F_num;
19. $\quad$ i = i + 1;
20. // initialize k, s with 0
21. while (k < $N_{f}$) DO {
22. $\quad$ // find number of device fingers per row:
23. $\quad$ $F_{Rnum}$ = F[s] / NUM($S_{i}$);
24. $\quad$ FOR each device $S_{i}$ DO
25. $\quad\quad$ FOR each device finger m range from k TO min($F_{Rnum}$ $N_{f}$-k) DO
26. $\quad\quad\quad$ Posx.finger = nth(0 P[k+m]);
27. $\quad\quad\quad$ Posy.finger = nth(1 P[k+m]);
28. $\quad$ s = s + 1;
29. $\quad$ k = k + $F_{Rnum}$ * NUM($S_{i}$); }
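
A Python sketch of the mirroring idea in Algorithm 4.2 follows. Steps 6 to 15 are reproduced: every (Xpos, Ypos) slot yields four finger origins mirrored about the origin, so the centres of each group of four are point-symmetric about (0, 0). For the finger assignment, however, the sketch swaps in a simpler rule (an assumption, not the thesis's row-based steps 20 to 29): each mirrored quadruple goes to one device, round-robin, which makes every device's centroid coincide exactly at the origin. Function names and parameters are hypothetical.

```python
import math


def mirrored_quads(n_cols, n_rows, w_h, w_v):
    """Groups of four finger origins mirrored about (0, 0), as in steps 6-15."""
    for c in range(n_cols):
        x_pos = c * w_h  # Xpos = Xrel
        x_neg = -x_pos - w_h  # Xneg = -Xpos - w_H
        for r in range(n_rows):
            y_pos = r * w_v  # Ypos = Yrel
            y_neg = -y_pos - w_v  # Yneg = -Ypos - w_v
            yield [(x_pos, y_pos), (x_neg, y_neg), (x_neg, y_pos), (x_pos, y_neg)]


def common_centroid(devices, quads_per_device, n_cols, w_h, w_v):
    """Simplified assignment: whole mirrored quadruples go to devices round-robin."""
    total = len(devices) * quads_per_device
    n_rows = math.ceil(total / n_cols)
    quads = list(mirrored_quads(n_cols, n_rows, w_h, w_v))[:total]
    fingers = {dev: [] for dev in devices}
    for idx, quad in enumerate(quads):
        fingers[devices[idx % len(devices)]].extend(quad)
    return fingers


def centroid(origins, w_h, w_v):
    """Centroid of finger centres (origin plus half a finger weight in x and y)."""
    n = len(origins)
    return (sum(x + w_h / 2 for x, _ in origins) / n,
            sum(y + w_v / 2 for _, y in origins) / n)


if __name__ == "__main__":
    placement = common_centroid(["A", "B"], 2, n_cols=2, w_h=2, w_v=4)
    for dev, origins in placement.items():
        print(dev, origins, centroid(origins, 2, 4))
```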

### 4.2. Optimization-Based Router

After placement, a legal routing must be found for the wires that connect the circuits. The techniques typically used to generate such a routing are sequential in nature: they treat one wire at a time, with incomplete information about the positions and effects of the other wires. Annealing is inherently free of this sequence dependence. Nets with many pins must first be broken into connections, i.e., pairs of pins joined by a single continuous wire. This "ordering" of each net depends strongly on the nature of the circuits being connected and on the package technology.

The router is based on the simulated annealing algorithm [54]. It starts from the obtained placement after constructing routing channels that ensure the reliability and routability of the placement solution. The router requires the module terminal positions, the allowed routing layers, and the technology design rules to generate a DRC-clean routing. The probability of accepting a candidate net is given by:
$$
P=\min \left(1,\ e^{-\frac{\Delta D}{D_{old}} \cdot \frac{1}{T}}\right)
$$
where $T$ is a temperature that decays at a constant rate and $\Delta D$ is the difference between the new and the old distance from the routed net to the destination terminal, so that $\Delta D$ becomes more negative as the routed net approaches the destination. The distance between the candidate net and the target pin is calculated as
$$
D=\min \left(\left|X_{2}-X_{1}^{\prime}\right|,\ \left|X_{1}-X_{2}^{\prime}\right|\right)+\min \left(\left|Y_{2}-Y_{1}^{\prime}\right|,\ \left|Y_{1}-Y_{2}^{\prime}\right|\right)
$$

The probability $P$ is then compared with a threshold constant $r$, and a candidate net is accepted if $P \geq r$. Hence, the chosen net is the one with the least cost, i.e., the minimum wirelength.
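
The acceptance rule above can be sketched as follows. Reading $(X_1, Y_1)$ and $(X_2, Y_2)$ as the opposite corners of the candidate net's end segment, and the primed coordinates as the corners of the target pin, is an assumption, as are the function names. Following the text, a fixed threshold $r$ is used; a standard Metropolis step would instead draw $r$ uniformly from $[0, 1)$ at every trial.

```python
import math


def distance(seg, pin):
    """D from the text; seg = (X1, Y1, X2, Y2), pin = (X1', Y1', X2', Y2')."""
    x1, y1, x2, y2 = seg
    x1p, y1p, x2p, y2p = pin
    return (min(abs(x2 - x1p), abs(x1 - x2p))
            + min(abs(y2 - y1p), abs(y1 - y2p)))


def acceptance_probability(d_new, d_old, temperature):
    """P = min(1, exp(-(dD / D_old) / T)) with dD = D_new - D_old."""
    if d_old <= 0:
        raise ValueError("D_old must be positive")
    exponent = -((d_new - d_old) / d_old) / temperature
    # A non-negative exponent means the net moved closer: P is capped at 1
    # (checking first also avoids overflow in math.exp).
    return 1.0 if exponent >= 0 else math.exp(exponent)


def accept(d_new, d_old, temperature, r):
    """Threshold test from the text: accept the candidate net if P >= r."""
    return acceptance_probability(d_new, d_old, temperature) >= r
```

For example, a segment spanning (0, 0) to (2, 1) and a pin spanning (5, 3) to (6, 4) are at $D = 5$. A move that reduces $D$ is always accepted ($P = 1$), while at $T = 0.1$ a move that increases $D$ from 5 to 6 gives $P = e^{-2} \approx 0.135$ and is rejected for $r = 0.5$.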

During routing, each net is instantiated with its electrical constraints (e.g., current density) according to the designer's preferences; these constraints are automatically converted into the corresponding wire width and layer using a lookup table generated from the technology file. The algorithm searches for the minimum metal width that satisfies the rms current density specified by the designer, given the available routing layers and the blockages surrounding the routed net within the DRC spacing specified for each blockage layer. The minimum DRC spacing allowed for each metal layer depends on the width and layer of the examined metals and on the length over which the metal lines run in close proximity. Given several routing layers, nets can be routed on different metal layers around obstacles (e.g., other wires) to keep the wirelength minimal. Metal lines are not allowed to pass over the devices. Multiple power straps are generated on reserved metal layers in a Manhattan-like style to limit the supply drop and hence prevent performance degradation.
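
The width selection described above can be sketched as a table lookup followed by a snap to the manufacturing grid. Everything numeric here is hypothetical: the per-layer current limits, minimum widths, and the 5-nm grid are placeholders, not TSMC 65-nm rules, and the sketch ignores blockages and spacing checks. Integer units (µA and nm) keep the arithmetic exact.

```python
# Hypothetical lookup table (NOT real technology data):
# layer -> (rms current limit in uA per um of width, minimum drawn width in nm).
LAYER_RULES = {"M1": (1000, 90), "M2": (1200, 100), "M3": (1200, 100)}
GRID_NM = 5  # assumed manufacturing grid


def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


def width_for_layer(i_rms_ua, layer):
    """Smallest on-grid width (nm) on `layer` that carries i_rms_ua within the limit."""
    j_max, w_min = LAYER_RULES[layer]
    w = max(w_min, ceil_div(i_rms_ua * 1000, j_max))  # I / J in um, converted to nm
    return ceil_div(w, GRID_NM) * GRID_NM


def min_metal_width(i_rms_ua, allowed_layers):
    """(width_nm, layer) with the minimum width among the allowed routing layers."""
    return min((width_for_layer(i_rms_ua, layer), layer) for layer in allowed_layers)


if __name__ == "__main__":
    print(min_metal_width(500, ["M1", "M2"]))
```

With these placeholder rules, a 500-µA net fits in 420 nm on M2 but needs 500 nm on M1, so M2 is chosen; ties on width are broken by layer name.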

## 5. Experimental Results

The OASYN framework is implemented in 10,000 lines of SKILL code and run on a 2.4-GHz Core i3 processor with 2 GB of memory. Tables 3, 4, and 5 show simulated results of the circuit synthesizer for the folded-cascode OpAmp topology. The experiments use the TSMC 65-nm technology node. Table 6 shows simulated results of the circuit synthesizer for the same topology when process, temperature, and supply variations are taken into account, reporting the worst-case value of each metric across the corners. Table 7 shows detailed simulated results for each corner.

Experiments on area optimization, convergence speed, and convergence stability are conducted for each representation in the literature, using the MCNC benchmark circuits listed in Table 1, which gives the number of modules, I/O pads, nets, and pins of each circuit. Table 2 compares the area and runtime of different floorplan representations: SP, O-tree, $\mathrm{B}^{*}$-tree, enhanced O-tree, CBL, TCG, and TCG-S. TCG-S employing the TCG-S* perturbation algorithm achieves nearly state-of-the-art area usage for the five benchmark circuits at the highest convergence speed.

Figure 5.1 shows the placement for the device sizes listed in Table 5, obtained by simultaneously optimizing area and matching constraints. Figure 5.2 shows the placement and routing result. Figure 5.3 shows the DRC error report, which contains no DRC spacing errors (only density and CAD-layer errors).

Table 2. Area and Runtime Comparisons for Area Optimization among SP (on Sun Sparc Ultra60), O-Tree (on Sun Sparc Ultra60), B*-Tree (on Sun Sparc Ultra60), Enhanced O-Tree (on Sun Sparc Ultra60), CBL (on Sun Sparc 20), TCG (on Sun Sparc Ultra60), TCG-S (on Sun Sparc Ultra60), and TCG-S* (on Intel Core i3)

| Circuit | SP Area (mm²) | SP Time (s) | O-tree Area (mm²) | O-tree Time (s) | B*-tree Area (mm²) | B*-tree Time (s) | Enhanced O-tree Area (mm²) | Enhanced O-tree Time (s) | CBL Area (mm²) | CBL Time (s) | TCG Area (mm²) | TCG Time (s) | TCG-S Area (mm²) | TCG-S Time (s) | TCG-S* Area (mm²) | TCG-S* Time (s) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| apte | 48.12 | 13 | 47.1 | 38 | 46.92 | 7 | 46.92 | 11 | NA | NA | 46.92 | 1 | 46.92 | 1 | 46.92 | 0.2 |
| xerox | 20.69 | 15 | 20.1 | 118 | 19.83 | 25 | 20.21 | 38 | 20.96 | 30 | 19.83 | 18 | 19.796 | 5 | 20.74 | 0.62 |
| hp | 9.93 | 5 | 9.21 | 57 | 8.947 | 55 | 9.16 | 19 | 66.14 | 32 | 8.947 | 20 | 8.947 | 7 | 9.37 | 10 |
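
The claim of nearly state-of-the-art area can be quantified directly from Table 2. The short sketch below computes, for each listed circuit, how much larger the TCG-S* area is than the smallest area reported by the other representations; runtimes are left out because they were measured on different machines. The helper name is illustrative.

```python
# Areas in mm^2 copied from Table 2 (CBL has no apte entry).
AREAS = {
    "apte": {"SP": 48.12, "O-tree": 47.1, "B*-tree": 46.92, "Enhanced O-tree": 46.92,
             "TCG": 46.92, "TCG-S": 46.92, "TCG-S*": 46.92},
    "xerox": {"SP": 20.69, "O-tree": 20.1, "B*-tree": 19.83, "Enhanced O-tree": 20.21,
              "CBL": 20.96, "TCG": 19.83, "TCG-S": 19.796, "TCG-S*": 20.74},
    "hp": {"SP": 9.93, "O-tree": 9.21, "B*-tree": 8.947, "Enhanced O-tree": 9.16,
           "CBL": 66.14, "TCG": 8.947, "TCG-S": 8.947, "TCG-S*": 9.37},
}


def area_overhead_pct(circuit, method="TCG-S*"):
    """Percentage by which `method` exceeds the best area of the other representations."""
    others = [a for name, a in AREAS[circuit].items() if name != method]
    return 100.0 * (AREAS[circuit][method] / min(others) - 1.0)


if __name__ == "__main__":
    for circuit in AREAS:
        print(f"{circuit}: {area_overhead_pct(circuit):.2f} %")
```

For the three circuits listed, this gives 0.00 % for apte, about 4.77 % for xerox, and about 4.73 % for hp.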

Table 3. Folded Cascode OpAmp Synthesis Results

| Metric | Specifications | Simulated Results | Synthesized Circuit Parameters |
| :---: | :---: | :---: | :---: |
| Open Loop Gain (dB) | 60 | 60 | L1 = 228 nm, L3 = 490 nm, L5 = 500 nm, L7 = 3.6 µm, L9 = 2.7 µm, Lss = 510 nm<br>W1 = 221 µm, W3 = 51.8 µm, W5 = 21.4 µm, W7 = 305 µm, W9 = 5.3 µm, Wss = 1.58 µm<br>Vb1 = 0.642, Vb2 = 0.439<br>Mt9 = 46, Mtss = 54 |
| GBW (Hz) | 350M | 398M |  |
| Phase Margin (degrees) | 60 | 65.87 |  |
| Current Consumption (mA) | 2 | 1.75 |  |
| Output Swing (V) | 0.8 | 0.9898 |  |
| Slew Rate (V/µs) | none | 230 |  |
| Load Cap. (pF) |  |  |  |
| VICM (V) |  |  |  |

Table 4. Folded Cascode OpAmp Synthesis Results

| Metric | Specifications | Simulated Results | Synthesized Circuit Parameters |
| :---: | :---: | :---: | :---: |
| Open Loop Gain (dB) | 60 | 60 | L1 = 258 nm, L3 = 550 nm, L5 = 530 nm, L7 = 3.6 µm, L9 = 2.16 µm, Lss = 480 nm<br>W1 = 252 µm, W3 = 114 µm, W5 = 40.7 µm, W7 = 557 µm, W9 = 5.2 µm, Wss = 8.914 µm<br>Vb1 = 0.642, Vb2 = 0.449<br>Mt9 = 82, Mtss = 80 |
| GBW (Hz) | 600M | 605.2M |  |
| Phase Margin (degrees) | 55 | 58.19 |  |
| Current Consumption (mA) | 3 | 2.88 |  |
| Output Swing (V) | 0.8 | 1.006 |  |
| Slew Rate (V/µs) | none | 542 |  |
| Load Cap. (pF) |  |  |  |
| VICM (V) |  |  |  |

Table 5. Folded Cascode OpAmp Synthesis Results

| Metric | Specifications | Simulated Results | Synthesized Circuit Parameters |
| :---: | :---: | :---: | :---: |
| Open Loop Gain (dB) | 60 | 60 | L1 = 258 nm, L3 = 550 nm, L5 = 500 nm, L7 = 3.6 µm, L9 = 1.77 µm, Lss = 480 nm<br>W1 = 355 µm, W3 = 170 µm, W5 = 56.8 µm, W7 = 800 µm, W9 = 5.1 µm, Wss = 13 µm<br>Vb1 = 0.642, Vb2 = 0.474<br>Mt9 = 100, Mtss = 117 |
| GBW (Hz) | 0.8G | 0.81G |  |
| Phase Margin (degrees) | 50 | 51.66 |  |
| Current Consumption (mA) | 4 | 3.797 |  |
| Output Swing (V) | 0.9 | 1.055 |  |
| Slew Rate (V/µs) | none | 794 |  |
| Load Cap. (pF) |  |  |  |
| VICM (V) |  |  |  |

Table 6. Folded Cascode OpAmp Synthesis Results on Process, Voltage, and Temperature Corners

| Metric | Specifications | Simulated Results (min) | Post Layout Simulated Results (min) | Synthesized Circuit Parameters |
| :---: | :---: | :---: | :---: | :---: |
| Open Loop Gain (dB) | 50 | 52.7 | 43.3 | L1 = 200 nm, L3 = 520 nm, L5 = 500 nm, L7 = 3.6 µm, L9 = 1.2 µm, Lss = 300 nm<br>W1 = 156 µm, W3 = 34 µm, W5 = 52 µm, W7 = 60 µm, W9 = 6 µm, Wss = 20 µm<br>Vb1 = 0.642, Vb2 = 0.48<br>Mt9 = 10, Mtss = 28 |
| GBW (Hz) | 200M | 251M | 136M |  |
| Phase Margin (degrees) | 50 | 56.8 | 52.4 |  |
| Current Consumption (mA) | 2 | 1.51 | 1.09 |  |
| Output Swing (V) | 0.7 | 0.74 | 0.62 |  |
| Slew Rate (V/µs) | none | 148.7 | 122.3 |  |
| Load Cap. (pF) |  |  |  |  |
| VICM (V) |  |  |  |  |

Table 7. Folded Cascode OpAmp Synthesis Results on Process, Voltage, and Temperature Corners

| Process | Temp (°C) | Supply (V) | Gain (dB) | GBW (MHz) | PM (deg) | I (mA) | Swing (V) | SLR (V/µs) |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| SS | 0 | 0.9 | 61.5 | 251 | 56.8 | 0.40 | 1.24 | 148 |
| SS | 0 | 1.1 | 53.6 | 306 | 77.4 | 1.33 | 0.99 | 553 |
| SS | 80 | 0.9 | 59.5 | 298 | 60.8 | 0.69 | 1.02 | 274 |
| SS | 80 | 1.1 | 52.7 | 251 | 79.3 | 1.4 | 0.78 | 582 |
| FF | 0 | 0.9 | 60.1 | 430 | 58.6 | 0.84 | 0.97 | 336 |
| FF | 0 | 1.1 | 59.9 | 542 | 62.5 | 1.44 | 1.0 | 590 |
| FF | 80 | 0.9 | 56.7 | 394 | 63.3 | 1.1 | 0.74 | 445 |
| FF | 80 | 1.1 | 56.3 | 427 | 67.2 | 1.5 | 0.77 | 617 |
| SF | 0 | 0.9 | 60.4 | 381 | 60.0 | 0.76 | 1.11 | 303 |
| SF | 0 | 1.1 | 55.6 | 361 | 74.1 | 1.4 | 1.0 | 586 |
| SF | 80 | 0.9 | 57.5 | 338 | 66.7 | 1.03 | 0.87 | 417 |
| SF | 80 | 1.1 | 54.0 | 287 | 77.4 | 1.5 | 0.78 | 612 |
| FS | 0 | 0.9 | 60.7 | 284 | 56.8 | 0.45 | 1.11 | 172 |
| FS | 0 | 1.1 | 59.7 | 525 | 62.8 | 1.4 | 1.02 | 561 |
| FS | 80 | 0.9 | 58.1 | 334 | 59.8 | 0.76 | 0.88 | 304 |
| FS | 80 | 1.1 | 56.6 | 412 | 67.3 | 1.43 | 0.80 | 589.6 |
| TT | 0 | 0.9 | 60.9 | 345 | 57.57 | 0.60 | 1.11 | 236 |
| TT | 0 | 1.1 | 58.2 | 458 | 67.6 | 1.4 | 1.02 | 575 |
| TT | 80 | 0.9 | 58.4 | 353 | 62.1 | 0.90 | 0.89 | 362 |
| TT | 80 | 1.1 | 55.77 | 357 | 71.9 | 1.46 | 0.80 | 601 |

Load capacitance = 1 pF; VICM = 0.5 V.
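
The worst-case column of Table 6 can be cross-checked against Table 7 by taking the worst value of each metric over the 20 corners. The sketch below assumes that a larger supply current is worse and that a smaller value is worse for every other metric; under that assumption it reproduces the Table 6 simulated values up to the rounding used in Table 7 (1.5 mA versus 1.51 mA, and 148 V/µs versus 148.7 V/µs).

```python
# Per-corner results copied from Table 7, in the order SS, FF, SF, FS, TT;
# within each process: (0 C, 0.9 V), (0 C, 1.1 V), (80 C, 0.9 V), (80 C, 1.1 V).
TABLE7 = {
    "Gain (dB)": [61.5, 53.6, 59.5, 52.7, 60.1, 59.9, 56.7, 56.3, 60.4, 55.6,
                  57.5, 54.0, 60.7, 59.7, 58.1, 56.6, 60.9, 58.2, 58.4, 55.77],
    "GBW (MHz)": [251, 306, 298, 251, 430, 542, 394, 427, 381, 361,
                  338, 287, 284, 525, 334, 412, 345, 458, 353, 357],
    "PM (deg)": [56.8, 77.4, 60.8, 79.3, 58.6, 62.5, 63.3, 67.2, 60.0, 74.1,
                 66.7, 77.4, 56.8, 62.8, 59.8, 67.3, 57.57, 67.6, 62.1, 71.9],
    "I (mA)": [0.40, 1.33, 0.69, 1.4, 0.84, 1.44, 1.1, 1.5, 0.76, 1.4,
               1.03, 1.5, 0.45, 1.4, 0.76, 1.43, 0.60, 1.4, 0.90, 1.46],
    "Swing (V)": [1.24, 0.99, 1.02, 0.78, 0.97, 1.0, 0.74, 0.77, 1.11, 1.0,
                  0.87, 0.78, 1.11, 1.02, 0.88, 0.80, 1.11, 1.02, 0.89, 0.80],
    "SLR (V/us)": [148, 553, 274, 582, 336, 590, 445, 617, 303, 586,
                   417, 612, 172, 561, 304, 589.6, 236, 575, 362, 601],
}
LARGER_IS_WORSE = {"I (mA)"}  # assumption: current is the only metric where more is worse


def worst_case(table):
    """Worst value of each metric over all corners."""
    return {metric: (max(values) if metric in LARGER_IS_WORSE else min(values))
            for metric, values in table.items()}


if __name__ == "__main__":
    for metric, value in worst_case(TABLE7).items():
        print(f"{metric}: {value}")
```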



Figure 5.1. Generated folded-cascode OpAmp layout with the common-mode feedback circuit for simultaneous area and matching-constraint optimization. Area = 29.665 × 102.065 µm²


Figure 5.2. Automated placement and routing solution (Area = 146 × 47 µm²)


Figure 5.3 Calibre DRC Message of the placement solution


Figure 5.4 Calibre LVS Message of the layout solution

## Conclusion

In this thesis, a framework is presented for the synthesis of operational amplifiers at the cell level. The tool optimizes the design in both the circuit and layout phases by exploring the design space across process, voltage, and temperature corners and optimizing the worst-case solution. Although the results shown are promising, other constraints and optimization objectives still need to be incorporated into the tool's design flow. The tool currently neglects the effects of boundary constraints, isolation constraints, and the total wirelength of the routed nets. The floorplan area optimizer showed state-of-the-art results because optimization was applied to a relatively small number of blocks. However, as the number of blocks increases, the optimizer finds it more difficult than other representations to search for the optimum solution. Hence, a complexity analysis of the TCG-S*-based area optimizer is required. In the circuit synthesis tool, area optimization was introduced only at a later stage, which limits the design space for area-power optimization. Applying the aforementioned enhancements and extending the tool to the system level could help bring the concept of the optimized standard cell, which is well established in the digital flow, into analog design.

## Future Works

- Simultaneous optimization of area and wirelength, where the wirelength of a net is estimated by the half-perimeter of the minimum bounding box enclosing the terminals of the net.
- Applying simulated annealing to reduce the effect of the highly non-linear, non-monotonic behavior of the model; whether SA can still be trapped in a local optimum in this setting remains to be studied.
- Performing Sobol's sensitivity analysis on other amplifier topologies, e.g., the two-stage Miller-compensated OTA, to demonstrate the generality of the algorithm and its weak dependence on the device law and model.
- Introducing area-power optimization earlier in the design flow, either by roughly estimating the area from the device gate dimensions or by iterating between the schematic and layout phases.
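
The half-perimeter wirelength (HPWL) estimate mentioned in the first item can be written in a few lines. The sketch also shows one hypothetical way to combine it with area in a single weighted cost; the function names and the weight are assumptions that would need tuning in practice.

```python
def hpwl(pins):
    """Half-perimeter of the minimum bounding box enclosing a net's pins [(x, y), ...]."""
    xs = [x for x, _ in pins]
    ys = [y for _, y in pins]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def area_wirelength_cost(area, nets, weight):
    """Hypothetical combined cost: area + weight * total HPWL over all nets."""
    return area + weight * sum(hpwl(pins) for pins in nets.values())


if __name__ == "__main__":
    nets = {"n1": [(0, 0), (2, 2)], "n2": [(1, 1), (1, 5), (4, 1)]}
    print(area_wirelength_cost(area=100, nets=nets, weight=0.5))  # 100 + 0.5 * (4 + 7)
```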


## References

[1] M. Degrauwe et al., "IDAC: An interactive design tool for analog CMOS circuits," IEEE J. Solid-State Circuits, vol. 22, pp. 1106-1115, Dec. 1987.
[2] R. Harjani, R. A. Rutenbar, and L. R. Carley, "OASYS: A framework for analog circuit synthesis," IEEE Trans. Computer-Aided Design Integrated Circuits and Systems, vol. 8, no. 12, pp. 1247-1266, Dec. 1989.
[3] H. Y. Koh, C. H. Séquin, and P. R. Gray, "OPASYN: A compiler for CMOS operational amplifiers," IEEE Trans. Comput.-Aided Design Integrated Circuits and Systems, vol. 9, no. 2, pp. 113-125, Feb. 1990.
[4] S. K. Gupta and M. M. Hasan, "KANSYS: A CAD tool for analog circuit synthesis," in Proc. Ninth International Conference on VLSI Design, 1996, pp. 333-334.
[5] N. Fujip, "Second Order Sensitivity Analysis for a Class of Shape Optimization Problems," in Proc. IEEE 20th International Conference on Industrial Electronics, Control and Instrumentation, Sep. 1994, pp. 1176-1178.
[6] F. M. El-Turky and R. A. Nordin, "BLADES: An expert system for analog circuit design," in Proc. Int. Conf. Circuits Syst., 1986, pp. 552-555.
[7] H. Yang, A. Agarwal, and R. Vemuri, "Fast analog circuit synthesis using multiparameter sensitivity analysis based on element-coefficient diagrams," in Proc. IEEE Comput. Soc. Annu. Symp. VLSI, Tampa, FL, 2005, pp. 71-76.
[8] H. Yang and R. Vemuri, "Efficient Temperature-Dependent Symbolic Sensitivity Analysis and Symbolic Performance Evaluation in Analog Circuit Synthesis," in Proc. IEEE Design, Automation and Test in Europe, Mar. 2006, pp. 1-2.
[9] R. H. J. M. Otten, "Automatic floorplan design," in Proc. Design Automation Conf., 1982, pp. 261-267.
[10] D. F. Wong and C. L. Liu, "A new algorithm for floorplan design," in Proc. Design Automation Conf., 1986, pp. 101-107.
[11] X. Tang, R. Tian, and D. Wong, "Fast evaluation of sequence pair in block placement by longest common subsequence computation," IEEE Trans. CAD ICs, vol. 20, no. 12, pp. 1406-1413, Dec. 2001.
[12] J.-M. Lin and Y.-W. Chang, "TCG: A transitive closure graph-based representation for general floorplans," IEEE Trans. VLSI Syst., 2003.
[13] P.-N. Guo, C.-K. Cheng, and T. Yoshimura, "An O-tree representation of nonslicing floorplan and its applications," in Proc. Design Automation Conf., 1999, pp. 268-273.
[14] X. Hong, G. Huang, T. Cai, J. Gu, S. Dong, C.-K. Cheng, and J. Gu, "Corner block list: An effective and efficient topological representation of nonslicing floorplan," in Proc. Int. Conf. Computer-Aided Design, 2000, pp. 8-12.
[15] H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani, "VLSI module placement based on rectangle-packing by the sequence pair," IEEE Trans. Computer-Aided Design, vol. 15, pp. 1518-1524, Dec. 1996.
[16] S. Nakatake, K. Fujiyoshi, H. Murata, and Y. Kajitani, "Module placement on BSG-structure and IC layout applications," in Proc. Int. Conf. Computer-Aided Design, 1996, pp. 484-491.
[17] Y. C. Chang, Y. W. Chang, G. M. Wu, and S. W. Wu, "B*-trees: A new representation for nonslicing floorplans," in Proc. Design Automation Conf., 2000, pp. 458-463.
[18] J.-M. Lin and Y.-W. Chang, "TCG-S: Orthogonal Coupling of P*-admissible Representations for General Floorplans," IEEE Trans. Computer-Aided Design, vol. 24, no. 6, June 2004.
[19] T. Cormen, C. Leiserson, R. Rivest, and C. Stein, Introduction to Algorithms, 2nd ed. New York: MIT Press/McGraw-Hill, 2001.
[20] G. Gielen and R. A. Rutenbar, "Computer-Aided Design of Analog and Mixed-Signal Integrated Circuits," Proceedings of the IEEE, vol. 88, no. 12, pp. 1825-1852, Dec. 2000.
[21] M. Zakaria, M. Madbouly, M. A. El-Nozahi, and M. Dessouky, "Knowledge-Based Design Automation of Highly Non-Linear Circuits Using Simulation Correction," in Proc. 15th International Conference on Microelectronics, Dec. 2003, pp. 46-49.
[22] C. Toumazou, C. A. Makris, and C. M. Berrah, "ISAID: A methodology for automated analog IC design," in Proc. Int. Symp. Circuits Syst., 1990, vol. 1, pp. 531-555.
[23] E. Berkcan, M. d'Abreu, and W. Laughton, "Analog compilation based on successive decompositions," in Proc. Des. Autom. Conf., 1988, pp. 369-375.
[24] Z. Ning, A. J. Mouthaan, and H. Wallinga, "SEAS: A simulated evolution approach for analog circuit synthesis," in Proc. Custom Integr. Circuits Conf., 1991, pp. 5.2-1-5.2-4.
[25] K. Swings, S. Donnay, and W. M. C. Sansen, "HECTOR: A hierarchical topology-construction program for analog circuits based on a declarative approach to circuit modeling," in Proc. Custom Integr. Circuits Conf., 1991, pp. 5.3/1-5.3/4.
[26] B. A. A. Antao and A. J. Brodersen, "ARCHGEN: Automated synthesis of analog systems," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 3, no. 2, pp. 231-244, Jun. 1995.
[27] N. C. Horta and J. E. Franca, "Algorithm-driven synthesis of data conversion architectures," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 10, no. 16, pp. 1116-1135, Oct. 1997.
[28] T. McConaghy, P. Palmers, M. Steyaert, and G. Gielen, "Variation aware structural synthesis of analog circuits via hierarchical building blocks and structural homotopy," IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 28, no. 9, pp. 1281-1294, Sep. 2009.
[29] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press, 1992.
[30] G. S. Hornby, "ALPS: The age-layered population structure for reducing the problem of premature convergence," in Proc. Conf. Genetic Evol. Comput., M. Keijzer, M. Cattolico, D. Arnold, V. Babovic, C. Blum, P. Bosman, M. V. Butz, C. Coello Coello, D. Dasgupta, S. G. Ficici, J. Foster, A. Hernandez-Aguirre, G. Hornby, H. Lipson, P. McMinn, J. Moore, G. Raidl, F. Rothlauf, C. Ryan, and D. Thierens, Eds., 2006, vol. 1, pp. 815-822.
[31] R. Martins, N. Lourenço, S. Rodrigues, J. Guilherme, and N. Horta, "AIDA: Automated Analog IC Design Flow from Circuit Level to Layout," in Proc. International Conference on Synthesis, Modeling, Analysis and Simulation Methods and Applications to Circuit Design (SMACD), Seville, Spain, Sep. 2012.
[32] M. Dessouky, M.-M. Louerat, and J. Porte, "Layout-oriented synthesis of high performance analog circuits," in Proc. Conference on Design, Automation and Test in Europe (DATE), 2000, pp. 53-57.
[33] H. Habal and H. Graeb, "Constraint-based layout-driven sizing of analog circuits," IEEE Trans. Computer-Aided Design Integr. Circuits Syst., vol. 30, no. 8, pp. 1089-1102, Aug. 2011.
[34] F. Balasa and K. Lampaert, "Symmetry within the sequence-pair representation in the context of placement for analog design," IEEE Trans. CAD of ICs and Syst., vol. 19, no. 7, pp. 721-731, 2000.
[35] K. Krishnamoorthy, S. Maruvada, and F. Balasa, "Topological placement with multiple symmetry groups of devices for analog layout design," in Proc. IEEE Int. Symp. Circuits Syst., May 2007, pp. 2032-2035.
[36] S. Dong, Z. Zhou, and X. Hong, "A New Constraint-Driven Placement Approach for Analog Circuits," in Proc. IEEE 8th International Conference on Solid-State and Integrated Circuit Technology, 2006, pp. 1763-1765.
[37] L. Xiao and E. Young, "Analog placement with common centroid and 1-D symmetry constraints," in Proc. IEEE ASP-DAC, Jan. 2009, pp. 353-360.
[38] J. Lai, M.-S. Lin, T.-C. Wong, and L.-C. Wang, "Module placement with boundary constraints using the sequence-pair representation," in Proc. IEEE Asia and South Pacific Design Automation Conf., 2001, pp. 515-520.
[39] A. B. Kahng and S. Reda, "Wirelength Minimization for Min-Cut Placements via Placement Feedback," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 7, pp. 1301-1312, July 2006.
[40] L. Xiao, E. F. Y. Young, X. He, and K. P. Pun, "Practical placement and routing techniques for analog circuit designs," in Proc. IEEE/ACM Int. Conf. on Comput.-Aided Des., 2010, pp. 675-679.
[41] C.-W. Lin, C.-P. Huang, S.-J. Chang, and J.-M. Lin, "Routing-aware Placement Algorithms for Modern Analog Integrated Circuits," in Proc. IEEE 54th International Midwest Symposium on Circuits and Systems (MWSCAS), 2011, pp. 1-4.
[42] H. Ou, H. C. Chien, and Y. Chang, "Simultaneous Analog Placement and Routing with Current Flow and Current Density Considerations," in Proc. IEEE Design Automation Conference (DAC), May 2013, pp. 1-6.
[43] W. Liu, C. Koh, and Y. Li, "Optimization of Placement Solutions for Routability," in Proc. IEEE Design Automation Conference (DAC), May 2013, pp. 1-9.
[44] H. Zhou, C. Sham, and H. Yao, "Congestion-Oriented Approach in Placement for Analog and Mixed-Signal Circuits," in Proc. IEEE 5th Asia Symposium on Quality Electronic Design, 2013, pp. 97-102.
[45] L. Zhang and Y. Jiang, "Global-routing driven placement strategy in analog VLSI physical designs," in Proc. MWSCAS, 2005, pp. 1239-1242.
[46] H. Yang and R. Vemuri, "Efficient Symbolic Sensitivity based Parasitic-Inclusive Optimization in Layout Aware Analog Circuit Synthesis," in Proc. IEEE 20th International Conference on VLSI Design, Jan. 2007, pp. 201-206.
[47] L. C. Severo and A. Girardi, "Parameter Variation and Sensitivity Analysis of a Two-Stage Miller Amplifier," in Proc. IEEE Argentine School of Micro-Nanoelectronics, Technology and Applications, Oct. 2010, pp. 78-81.
[48] Y.-C. Tam, E. F. Y. Young, and C. Chu, "Analog Placement with Symmetry and Other Placement Constraints," in Proc. Computer-Aided Design, 2006, pp. 349-354.
[49] E. Yilmaz and G. Dundar, "Analog Layout Generator for CMOS Circuit," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 1, pp. 32-45, 2009.
[50] Y. Wu, X. Zhang, L. Chen, and S. Fang, "Automatic Placement for Matched Devices of Analog Circuits," in Proc. IEEE Int. Conf. on Natural Computation, July 2013, pp. 1723-1727.
[51] I. M. Sobol, "Sensitivity estimates for nonlinear mathematical models," Mathematical Modelling and Computational Experiments, vol. 1, no. 4, pp. 407-414, 1993.
[52] T. Crestaux, O. Le Maitre, and J.-M. Martinez, "Polynomial chaos expansion for sensitivity analysis," Reliability Engineering and System Safety, vol. 94, pp. 1161-1172, 2009.
[53] L. Dawei, Q. Zhou, J. Bian, Y. Cai, and X. Hong, "Cell Shifting Aware of Wirelength and Overlap," in Proc. IEEE Quality of Electronic Design, Mar. 2009, pp. 506-510.
[54] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by Simulated Annealing," Science, vol. 220, no. 4598, pp. 671-680, May 13, 1983.
[55] S. N. Adya and I. L. Markov, "Fixed-outline floorplanning: Enabling hierarchical design," IEEE Trans. VLSI Systems, vol. 11, no. 6, pp. 1120-1135, Dec. 2003.
[56] M. Kayal, S. Piguet, M. Declerq, and B. Hochet, "SALIM: A layout generation tool for analog ICs," in Proc. IEEE Custom Integrated Circuits Conf., 1988, pp. 7.5.1-7.5.4.
[57] S. W. Mehranfar, "STAT: A schematic to artwork translator for custom analog cells," in Proc. 1990 IEEE Custom Integrated Circuits Conf., 1990, pp. 30.2.1-30.2.3.
[58] E. Malavasi, J. L. Ganley, and E. Charbon, "Quick placement with geometric constraints," in Proc. IEEE Custom Integrated Circuits Conf., 1997, pp. 561-564.

