    Layout decomposition for triple patterning lithography

    The semiconductor industry continues to push the limits of physics as chip feature sizes keep shrinking. Products of the 22 nm technology node are already on the market, and many research efforts target the 14/10 nm technology nodes and beyond. Due to physical limitations, traditional 193 nm immersion lithography faces huge challenges in fabricating such tiny features. Several next-generation lithography techniques have been discussed for years, such as extreme ultra-violet (EUV) lithography, E-beam direct write, and block copolymer directed self-assembly (DSA). However, the source power for EUV is still an unresolved issue, the low throughput of E-beam makes it impractical for mass production, and DSA is still being calibrated in research labs and is not ready for large-scale industrial deployment. Traditionally, features are fabricated with a single lithography exposure. As feature sizes shrink, a single exposure is no longer adequate to satisfy quality requirements. Double patterning lithography (DPL) uses two exposures to manufacture the features on the same layer: features are assigned to two masks, and each mask goes through a separate exposure. With one more mask, the effective pitch is doubled, greatly enhancing the printing resolution. DPL has therefore been widely recognized as a feasible lithography solution for the sub-22 nm technology nodes. However, as technology scales down to 14/10 nm and beyond, DPL begins to show its limitations: it introduces a large number of stitches, which increase manufacturing cost and can lead to functional errors in the circuits. Triple patterning lithography (TPL) uses three masks to print the features on the same layer, further enhancing the printing resolution. It is a natural extension of DPL and one of the most promising solutions for the 14/10 nm technology node and beyond. In this thesis, TPL decomposition for standard-cell-based designs is extensively studied. We propose a polynomial-time triple patterning decomposition algorithm that is guaranteed to find a TPL decomposition whenever one exists. For complex designs with stitch candidates, our algorithm finds a solution with the optimal number of stitches. Standard-cell-based designs impose additional coloring constraints: all instances of the same cell type should be fabricated with the same pattern. We propose an algorithm that is guaranteed to find a solution under these constraints when one exists. The framework is also extended to pattern-based TPL decomposition, where the cost of a decomposition is minimized given a library of different patterns. The polynomial-time TPL algorithm is further optimized in runtime and memory while keeping the solution quality unaffected. We also study the TPL-aware detailed placement problem, where our approach is guaranteed to find a legal detailed placement satisfying TPL coloring constraints while minimizing the half-perimeter wire length (HPWL). Finally, we study performance variations due to mask misalignment in multiple patterning lithography (MPL) decompositions. For advanced technology nodes, process variations (mainly mask misalignment) significantly influence the quality of fabricated circuits and often lead to unexpected power/timing degradations.
Mask misalignment complicates timing-closure simulation if engineers do not understand its underlying effects, which arise only in multiple patterning decompositions. We mathematically characterize the worst-case coupling-capacitance scenarios incurred by mask misalignment in MPL decompositions, and propose a graph model that is guaranteed to compute a tight upper bound on the worst-case coupling capacitance of any MPL decomposition for a given layout.
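
    As a minimal illustration of the core combinatorial problem, the sketch below treats TPL decomposition as 3-coloring of a conflict graph, using a plain backtracking search. It does not reproduce the thesis's polynomial-time algorithm for standard-cell rows, and all instance data is hypothetical.

```python
# Minimal sketch: TPL layout decomposition viewed as 3-coloring of a
# conflict graph. Nodes are layout features; an edge joins two features
# closer than the minimum coloring distance. This brute-force backtracking
# search is for illustration only -- the thesis describes a polynomial-time
# algorithm for standard-cell designs, which this sketch does not reproduce.

def tpl_color(conflicts, n, colors=3):
    """conflicts: set of frozensets {i, j}; n: number of features.
    Returns a list mapping feature -> mask (0..2), or None if infeasible."""
    assign = [None] * n

    def ok(v, c):
        # color c is legal for v if no conflicting neighbor already has it
        return all(assign[u] != c
                   for u in range(n) if frozenset((u, v)) in conflicts)

    def solve(v):
        if v == n:
            return True
        for c in range(colors):
            if ok(v, c):
                assign[v] = c
                if solve(v + 1):
                    return True
                assign[v] = None
        return False

    return assign if solve(0) else None

# Toy example: a 4-cycle of conflicts with one chord forces a third mask.
conflicts = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]}
print(tpl_color(conflicts, 4))  # e.g. [0, 1, 2, 1]
```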

    VLSI Routing for Advanced Technology

    Routing is a major step in VLSI design, the design process of complex integrated circuits (commonly known as chips). The basic task in routing is to connect predetermined locations on a chip (pins) with wires that serve as electrical connections. One main challenge in routing for advanced chip technology is the increasing complexity of design rules, which reflect manufacturing requirements. In this thesis we investigate various aspects of this challenge. First, we consider polygon decomposition problems in the context of VLSI design rules. We introduce different width notions for polygons, which are important for width-dependent design rules in VLSI routing, and we present efficient algorithms for computing width-preserving decompositions of rectilinear polygons into rectangles. Such decompositions are used in routing to allow for fast design rule checking. A main contribution of this thesis is an O(n)-time algorithm for computing a decomposition of a simple rectilinear polygon with n vertices into O(n) rectangles, preserving two-dimensional width. Here the two-dimensional width at a point of the polygon is defined as the edge length of the largest square that contains the point and is contained in the polygon. To obtain these results we establish a connection between such decompositions and Voronoi diagrams. Furthermore, we consider the implications of multiple patterning and other advanced design rules for VLSI routing. The main contribution in this context is a detailed description of a routing approach that can manage such advanced design rules. As the main algorithmic concept we use multi-label shortest paths, where certain path properties (which model design rules) are enforced by assigning labels to path vertices and allowing only certain label transitions. The described approach has been implemented in BonnRoute, a VLSI routing tool developed at the Research Institute for Discrete Mathematics, University of Bonn, in cooperation with IBM. We present experimental results confirming that a flow combining BonnRoute with an external cleanup step produces far superior results compared to an industry-standard router. In particular, our proposed flow runs more than twice as fast, reduces the via count by more than 20%, the wiring length by more than 10%, and the number of remaining design rule errors by more than 60%. These results, obtained by applying our multiple patterning approach to real-world chip instances provided by IBM, are another main contribution of this thesis. We note that IBM uses our proposed combined BonnRoute flow as the default tool for signal routing.
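
    The multi-label shortest-path concept can be sketched as a Dijkstra search over (vertex, label) states, where a transition predicate rules out illegal label changes. The graph, labels, and transition rule below are hypothetical stand-ins, not BonnRoute's actual model.

```python
import heapq

# Sketch of the multi-label shortest-path idea: each search state is a
# (vertex, label) pair, and a transition predicate restricts which label
# changes are legal along an edge (labels could encode, e.g., a wiring
# pattern required by a multiple-patterning rule). All data is hypothetical.

def multi_label_dijkstra(adj, allowed, source, target, labels):
    """adj: {u: [(v, cost), ...]}; allowed(l_from, l_to) -> bool.
    Returns (distance, final label) for the cheapest legal path, or None."""
    dist = {(source, l): 0.0 for l in labels}   # source may start any label
    pq = [(0.0, source, l) for l in labels]
    heapq.heapify(pq)
    while pq:
        d, u, lu = heapq.heappop(pq)
        if d > dist.get((u, lu), float("inf")):
            continue                            # stale queue entry
        if u == target:
            return d, lu
        for v, cost in adj.get(u, []):
            for lv in labels:
                if allowed(lu, lv) and d + cost < dist.get((v, lv), float("inf")):
                    dist[(v, lv)] = d + cost
                    heapq.heappush(pq, (d + cost, v, lv))
    return None

# Toy usage: two labels A/B; switching A -> B is allowed, B -> A is not.
adj = {"s": [("m", 1.0)], "m": [("t", 1.0)]}
allowed = lambda a, b: (a, b) != ("B", "A")
print(multi_label_dijkstra(adj, allowed, "s", "t", ["A", "B"]))
```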

    EDA Solutions for Double Patterning Lithography

    Extending optical lithography to the 32 nm node and beyond is impossible with existing single-exposure systems. Double patterning lithography (DPL) is therefore the most promising option for achieving the required resolution: the target layout is printed with two separate imaging processes. Among the different DPL techniques, litho-etch-litho-etch (LELE) and self-aligned double patterning (SADP) are the most popular; the former applies two complete exposure lithography steps, while the latter applies one exposure followed by a chemical imaging process. To realize double patterning, patterns located within a sub-resolution distance must be assigned to one of the imaging sub-processes, a step called layout decomposition. To achieve optimal design yield, the layout decomposition problem should be solved with respect to the characteristics and limitations of the applied DPL method. For example, although patterns can be split between the two sub-masks in the LELE method to generate conflict-free masks, such pattern splits are unfavorable due to their sensitivity to lithography imperfections such as overlay error. In the SADP method, on the other hand, pattern splitting is forbidden because it results in non-resolvable gap failures in the final image. In addition to functional yield, layout decomposition affects the parametric yield of designs printed by double patterning. To address both the functional and parametric challenges of DPL in dense, large layouts, this thesis develops EDA solutions for DPL. To this end, we propose a statistical method to determine interconnect width and spacing for the LELE method under random overlay error. In addition to maximizing yield and achieving a near-optimal trade-off between different parametric requirements, the proposed method provides valuable insight into the trend of parametric and functional yields in future technology nodes. Next, we focus on self-aligned double patterning and propose layout design and decomposition methods that produce SADP-compatible layouts and litho-friendly decomposed layouts. Specifically, a grid-based ILP formulation of SADP decomposition is proposed to avoid decomposition conflicts and improve the overall printability of layout patterns. To overcome the limited applicability of this ILP-based method to fully decomposable layouts, a partitioning-based method is also proposed, which is in addition faster than the grid-based ILP decomposition. Moreover, an A∗-based SADP-aware detailed routing method is proposed that performs detailed routing and layout decomposition simultaneously to avoid litho-limited layout configurations; the proposed router preserves the uniformity of pattern density between the two sub-masks of the SADP process. We finally extend our double patterning decomposition method to triple patterning and formulate self-aligned triple patterning (SATP) decomposition as an integer linear program. In addition to the conventional minimum-width and minimum-spacing constraints, the proposed decomposition minimizes the mandrel-trim co-defined edges and maximizes the layout features printed by structural spacers, achieving minimum pattern distortion. This thesis is one of the earliest studies to investigate litho-friendliness in SADP-aware layout design and decomposition. Experimental results show that the proposed methods advance prior state-of-the-art algorithms in various aspects.
Specifically, the proposed SADP decomposition methods improve the total length of sensitive trim edges, the total edge placement error (EPE), and the overall printability of the attempted designs. Additionally, our SADP-aware detailed routing method produces SADP-decomposable layouts in which trim patterns are highly robust to lithography imperfections. The experimental results for SATP decomposition show that the total length of overlay-sensitive layout patterns, the total EPE, and the overall printability of the attempted designs are also improved considerably by the proposed decomposition method. Finally, the methods in this thesis reveal several insights for upcoming technology nodes that can be applied to improve the manufacturability of those nodes.
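
    To make the ILP-based decomposition idea concrete, here is a toy-scale sketch of a DPL mask-assignment ILP with conflict and stitch constraints, written with the open-source PuLP modeler. The thesis's actual SADP/SATP formulations include further constraints (e.g., spacer and trim rules) that are omitted here, and the instance data is invented.

```python
# Hedged, toy-scale sketch of ILP-based layout decomposition for DPL, in the
# spirit of the grid/graph ILP formulations mentioned above (the thesis's
# actual SADP/SATP constraints are richer). Feature indices, conflict edges
# and stitch candidates below are made-up illustration data.
from pulp import LpProblem, LpVariable, LpMinimize, lpSum, LpBinary

features = range(4)
conflicts = [(0, 1), (1, 2)]      # pairs that must go on different masks
stitches = [(2, 3)]               # one polygon split into parts 2 and 3

prob = LpProblem("dpl_decomposition", LpMinimize)
x = {i: LpVariable(f"mask_{i}", cat=LpBinary) for i in features}
s = {e: LpVariable(f"stitch_{e[0]}_{e[1]}", cat=LpBinary) for e in stitches}

prob += lpSum(s.values())         # objective: minimize the stitch count

for i, j in conflicts:            # conflicting features: different masks
    prob += x[i] + x[j] == 1
for i, j in stitches:             # a stitch is paid iff the parts differ
    prob += s[(i, j)] >= x[i] - x[j]
    prob += s[(i, j)] >= x[j] - x[i]

prob.solve()
print({i: int(x[i].value()) for i in features},
      {e: int(v.value()) for e, v in s.items()})
```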

    DFM Techniques for the Detection and Mitigation of Hotspots in Nanometer Technology

    With the continuous scaling down of dimensions in advanced technology nodes, process variations get worse with each new node. Process variations have a large influence on the quality and yield of designed and manufactured circuits, so there is a growing need for fast and efficient techniques to characterize and mitigate the effects of the different sources of process variation on a design's performance and yield. In this thesis we study the various sources of systematic process variation and their effects on the circuit, as well as methodologies to combat systematic process variation in the design space. We develop abstract yet accurate process variability models for systematic intra-die variations. The models convert variation in process into variation in the electrical parameters of devices, and hence into variation in circuit performance (timing and leakage), without the need for circuit simulation. Since analysis and mitigation techniques apply at different levels of the design flow, we propose a flow for combating systematic process variation in nanometer CMOS technology. By calculating the effects of variability on the electrical performance of circuits, we can gauge the importance of accurate analysis and model-driven corrections. We present an automated framework that integrates circuit analysis with process variability modeling to optimize the compute-intensive process simulation steps and the usage of variation mitigation techniques. Using the results obtained from this framework, we develop a relation between layout regularity and the resilience of devices to process variation. We use these findings to develop a novel technique for fast detection of critical failures (hotspots) resulting from process variation, and show that our approach is superior to previously published techniques in both accuracy and predictability. Finally, we present an automated method for fixing lithography hotspots, which achieves a success rate of 99%.
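
    One way to picture the simulation-free variability models described above is a first-order sensitivity model that maps process-parameter deviations to an electrical quantity. The sketch below is purely illustrative, with hypothetical parameter names and coefficients; it is not the thesis's model.

```python
# Illustrative sketch of an abstract variability model in the spirit
# described above: systematic process parameters are mapped to variation in
# an electrical quantity (here, gate delay) through precharacterized
# first-order sensitivities, avoiding circuit simulation at analysis time.
# The parameter names and sensitivity values are hypothetical.

NOMINAL_DELAY_PS = 12.0
SENSITIVITY = {          # d(delay)/d(param), ps per unit of variation
    "cd_bias_nm": 0.35,  # lithography critical-dimension bias
    "overlay_nm": 0.10,  # mask overlay error
    "etch_bias_nm": 0.22,
}

def delay_with_variation(variation):
    """variation: {param: deviation from nominal} -> estimated delay (ps)."""
    return NOMINAL_DELAY_PS + sum(
        SENSITIVITY[p] * dv for p, dv in variation.items())

print(delay_with_variation({"cd_bias_nm": 1.5, "overlay_nm": -2.0}))
```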

MLCAD: A Survey of Research in Machine Learning for CAD (Keynote Paper)


    Layout regularity metric as a fast indicator of process variations

    Integrated circuit design faces increasing challenges as dimensions scale down, due to the growing sensitivity to process variations. Systematic variations induced by different steps of the lithography process affect both the parametric and functional yields of designs, and these variations are themselves known to be affected by layout topology. Design for Manufacturability (DFM) aims to define techniques that mitigate variations and improve yield. Layout regularity is one of the trending techniques suggested by DFM to mitigate the effect of process variations. Several solutions exist for creating regular designs, such as restricted design rules and regular fabrics, and these regular solutions have raised the need for a regularity metric. Existing metrics in the literature are insufficient for different reasons: they are either qualitative or computationally intensive. Furthermore, there is no study relating either lithographic or electrical variations to layout regularity. In this work, layout regularity is studied in detail and a new geometry-based layout regularity metric is derived. The metric is verified against lithographic simulations and shows good correlation. Computing the metric takes only a few minutes on a 1 mm x 1 mm design, which is fast compared to the time taken by simulations; this makes it a good candidate for pre-processing layout data and selecting areas of interest for lithographic simulation, for faster throughput. The layout regularity metric is also compared against a model that measures electrical variations due to systematic lithographic variations, and we show the validity of using the regularity metric to flag circuits with high variability under the developed electrical variation model. The regularity metric agrees with the electrical variability model in up to 80% of cases, which means the metric can be used as a fast indicator of designs more susceptible to lithographic, and hence electrical, variations.
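
    As a sketch of what a fast geometric regularity score might look like (the thesis defines its own metric, which is not reproduced here), the following toy function scores a layout window by the spread of its feature pitches: regular layouts reuse few distinct pitches, so the spread is small.

```python
import statistics

# One plausible geometric regularity score, sketching the kind of fast,
# simulation-free metric described above (not the thesis's actual metric).
# Idea: within a layout window, regular layouts reuse few distinct pitches,
# so the spread of feature pitches is small.

def regularity_score(x_positions):
    """x_positions: sorted left edges of features in one layout window.
    Returns a value in (0, 1]; 1.0 means perfectly uniform pitch."""
    pitches = [b - a for a, b in zip(x_positions, x_positions[1:])]
    if len(pitches) < 2 or statistics.mean(pitches) == 0:
        return 1.0
    cv = statistics.pstdev(pitches) / statistics.mean(pitches)
    return 1.0 / (1.0 + cv)   # squash coefficient of variation into (0, 1]

print(regularity_score([0, 40, 80, 120, 160]))   # 1.0: perfectly regular
print(regularity_score([0, 40, 70, 130, 160]))   # < 1: irregular pitches
```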

    Memory Management for Emerging Memory Technologies

    The Memory Wall, the gap between CPU speed and main memory latency, is ever increasing. The latency of Dynamic Random-Access Memory (DRAM) is now on the order of hundreds of CPU cycles, and DRAM main memory is experiencing power, performance, and capacity constraints that limit process technology scaling. At the same time, the workloads running on such systems are themselves changing, as virtualization and cloud computing demand more performance from data centers. Not only do these workloads have larger working-set sizes, but they are also changing the way memory is used, resulting in more sharing and increased bandwidth demands. New non-volatile memory (NVM) technologies are emerging as an answer to these main memory issues. This thesis looks at memory management issues as emerging memory technologies are integrated into the memory hierarchy. We consider problems at various levels of the memory hierarchy, including sharing of the CPU last-level cache (LLC), traffic management to future non-volatile memories behind the LLC, and extending main memory with NVM. The first solution we propose is "Adaptive Replacement and Insertion" (ARI), an adaptive approach to last-level CPU cache management that optimizes the cache miss rate and writeback rate simultaneously. Our specific focus is to reduce writebacks as much as possible while maintaining or improving the miss rate relative to the conventional LRU replacement policy, with minimal hardware overhead. ARI reduces writebacks on benchmarks from the SPEC2006 suite by 32.9% on average, while also decreasing misses by 4.7% on average. In a PCM-based memory system, this decreases energy consumption by 23% compared to LRU and provides a 49% lifetime improvement beyond what is possible with randomized wear-leveling. Our second proposal is "Variable-Timeslice Thread Scheduling" (VATS), an OS kernel-level approach to CPU cache sharing. With modern, large last-level caches, the time to fill the LLC exceeds the OS scheduling window. As a result, when a thread aggressively thrashes the LLC by replacing much of the data in it, another thread may not be able to recover its working set before being rescheduled. We isolate threads in time by increasing their allotted time quanta, allowing larger periods of time between interfering threads. Compared to conventional scheduling, our approach mitigates up to 100% of the performance loss caused by CPU LLC interference and boosts system throughput by up to 15%. As an unconventional approach to utilizing emerging memory technologies, we present a Ternary Content-Addressable Memory (TCAM) design built from Flash transistors. TCAM is successfully used in network routing, but can also be utilized in OS virtual memory applications. Based on our layout and circuit simulation experiments, we conclude that our FTCAM block achieves an area improvement of 7.9× and a power improvement of 1.64× compared to a CMOS approach. To lower the cost of main memory in systems with huge memory demand, it is becoming practical to extend DRAM with less expensive NVMe Flash at a much lower system cost. However, given the relatively high access latency of Flash devices, naively using them as main memory leads to serious performance degradation. We propose OSVPP, a software-only, OS swap-based page prefetching scheme for managing such hybrid DRAM + NVM systems.
We show that it is possible to recover about 50% of the performance lost to swapping into the NVM, thus enabling such hybrid systems for memory-hungry applications and lowering memory cost while keeping performance comparable to a DRAM-only system.
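
    A writeback-aware replacement policy in the spirit of ARI's goal can be sketched as a toy cache that prefers evicting clean lines among the oldest candidates. This illustrates the miss-rate/writeback trade-off only; it is not the adaptive ARI mechanism itself, and the window size is a made-up knob.

```python
from collections import OrderedDict

# Minimal sketch of a writeback-aware cache in the spirit of ARI's goal
# (trading a little miss rate for far fewer writebacks). This toy policy
# prefers evicting the least-recently-used *clean* line among the N oldest
# lines before falling back to plain LRU; it is an illustration, not the
# adaptive ARI mechanism itself.

class CleanFirstLRUCache:
    def __init__(self, capacity, clean_window=4):
        self.capacity, self.clean_window = capacity, clean_window
        self.lines = OrderedDict()        # addr -> dirty flag, in LRU order
        self.writebacks = 0

    def access(self, addr, write=False):
        if addr in self.lines:
            self.lines[addr] |= write     # a write marks the line dirty
            self.lines.move_to_end(addr)  # mark as most recently used
            return "hit"
        if len(self.lines) >= self.capacity:
            self._evict()
        self.lines[addr] = write
        return "miss"

    def _evict(self):
        # among the oldest clean_window lines, evict a clean one if any;
        # otherwise fall back to the plain LRU victim
        oldest = list(self.lines)[:self.clean_window]
        victim = next((a for a in oldest if not self.lines[a]), oldest[0])
        if self.lines.pop(victim):        # a dirty victim costs a writeback
            self.writebacks += 1

cache = CleanFirstLRUCache(capacity=2)
for addr, wr in [(1, True), (2, False), (3, False), (1, False)]:
    cache.access(addr, wr)
print(cache.writebacks)   # 0: the dirty line 1 was kept while a clean
                          # line could be evicted instead
```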
