
81 research outputs found

COMPAC : Compaction of hierarchical chip layouts based on a constraint graph method with double-sided constraints

Author: Timmermans X.I.M.
Publication venue
Publication date: 01/01/1983
Field of study

Repository TU/e

Pure OAI Repository

Depth-first-search and dynamic programming algorithms for efficient CMOS cell generation

Author: J.A. Feldman, R. Bar-Yehuda, R.Y. Pinter, S. Wimer
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date
Field of study

Crossref

Reducing Cache Contention On GPUs

Author: Choo Kyoshin
Publication venue: eGrove
Publication date: 01/01/2016
Field of study

The use of Graphics Processing Units (GPUs) as application accelerators has become increasingly popular because, compared to traditional CPUs, they are more cost-effective, their highly parallel nature complements a CPU, and they are more energy efficient. With the popularity of GPUs, many GPU-based compute-intensive applications (a.k.a. GPGPUs) show significant performance improvement over traditional CPU-based implementations. Caches, which significantly improve CPU performance, have been introduced to GPUs to further enhance application performance. However, the effect of caches is not significant in many cases on GPUs and is even detrimental in some. The massive parallelism of the GPU execution model and the resulting memory accesses cause the GPU memory hierarchy to suffer from significant memory resource contention among threads. One cause of cache contention is the column-strided memory access pattern that GPU applications commonly generate in many data-intensive applications. When such access patterns are mapped to hardware thread groups, they become memory-divergent instructions whose memory requests are not GPU hardware friendly, resulting in serialized access and performance degradation. Cache contention also arises from cache pollution caused by lines with low reuse. For the cache to be effective, a cached line must be reused before its eviction. Unfortunately, the streaming characteristic of GPGPU workloads and the massively parallel GPU execution model increase the reuse distance, or equivalently reduce the reuse frequency, of data. In a GPU, the pollution caused by data with a large reuse distance is significant. Memory request stalls are another contention factor. A stalled Load/Store (LDST) unit does not execute memory requests from any ready warps in the issue stage. This stall eliminates potential cache hits for the ready warps.
This dissertation proposes three novel architectural modifications to reduce the contention: 1) contention-aware selective caching detects the memory-divergent instructions caused by the column-strided access patterns, calculates the contending cache sets and locality information, and then caches selectively; 2) locality-aware selective caching dynamically calculates the reuse frequency with efficient hardware and caches based on the reuse frequency; and 3) memory request scheduling queues the memory requests from the warp issue stage, frees the LDST unit from stalls, and schedules items from the queue to the LDST unit by probing the cache multiple times. Through systematic experiments and comprehensive comparisons with existing state-of-the-art techniques, this dissertation demonstrates the effectiveness of the proposed techniques and the viability of reducing cache contention through architectural support. Finally, this dissertation suggests other promising opportunities for future research on GPU architecture.
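The abstract's point about reuse distance can be illustrated with a small sketch. This is not the dissertation's hardware mechanism. It is a software model that assumes a fully associative cache with LRU replacement, and the names `reuse_distances` and `lru_hits` are hypothetical. Here the reuse distance of an access is the number of distinct cache lines touched since the previous access to the same line. Under the stated assumption, an access hits exactly when that distance is smaller than the cache capacity.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """For each access, return the number of distinct lines touched since
    the previous access to the same line (None for a first access)."""
    stack = OrderedDict()  # LRU stack: most recently used line is last
    distances = []
    for line in trace:
        if line in stack:
            keys = list(stack.keys())
            # Lines positioned after `line` were touched more recently.
            distances.append(len(keys) - 1 - keys.index(line))
            stack.move_to_end(line)
        else:
            distances.append(None)
            stack[line] = True
    return distances


def lru_hits(trace, capacity):
    """Count hits in an assumed fully associative LRU cache of `capacity`
    lines: an access hits when its reuse distance is below the capacity."""
    return sum(1 for d in reuse_distances(trace)
               if d is not None and d < capacity)


# A streaming-style trace: every line is reused only after three others.
streaming = ["A", "B", "C", "D", "A", "B", "C", "D"]
# A trace with short-distance reuse.
local = ["A", "B", "C", "A", "B", "B"]
```

With a capacity of 3 lines, the streaming trace gets no hits because every reuse has distance 3, while the local trace hits on all three of its reuses. That is the sense in which a longer reuse distance makes cached lines behave as pollution.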

eGrove (Univ. of Mississippi)


Standard cell optimization and physical design in advanced technology nodes

Author: Xu Xiaoqing, Ph. D.
Publication venue
Publication date: 07/07/2017
Field of study

Integrated circuits (ICs) are at the heart of modern electronics, which rely heavily on state-of-the-art semiconductor manufacturing technology. The key to pushing forward semiconductor technology is IC feature-size miniaturization. However, this brings ever-increasing design complexities and manufacturing challenges to the $340 billion semiconductor industry. The manufacturing of two-dimensional layouts on high-density metal layers depends on complex design-for-manufacturing techniques and sophisticated empirical optimizations, which introduce huge amounts of turnaround time and yield loss in advanced technology nodes. Our study reveals that unidirectional layout design, which is increasingly adopted in the semiconductor industry [61, 89], can significantly reduce manufacturing complexity and improve yield. The lithography printing of unidirectional layouts can be tightly controlled using advanced patterning techniques, such as self-aligned double and quadruple patterning. Despite the manufacturing benefits, unidirectional layout leads to a more restrictive solution space and has a significant impact on the IC design automation flow for routing closure. Notably, unidirectional routing limits standard cell pin accessibility, which further exacerbates resource competition during routing. Moreover, for post-routing optimization, traditional redundant-via insertion has become obsolete under the unidirectional routing style, which makes the yield enhancement task extremely challenging. Regardless of complex multiple patterning and design-for-manufacturing approaches, mask optimization through resolution enhancement techniques remains the key strategy for improving the yield of semiconductor manufacturing processes. Among them, Sub-Resolution Assist Feature (SRAF) generation is a very important method for improving lithographic process windows.
Model-based SRAF generation has been widely used to achieve high accuracy, but it is time-consuming and makes it hard to obtain consistent SRAFs. This dissertation proposes novel CAD algorithms and methodologies for standard cell optimization and physical design in advanced technology nodes, which ultimately reduce the design cycle and manufacturing cost of IC design. First, a standard cell pin access optimization engine is proposed to evaluate the pin accessibility of a given standard cell library. We further propose novel pin access planning techniques and concurrent pin access optimizations to efficiently resolve routing resource competition, which generate much better routing solutions than state-of-the-art, manufacturing-friendly routers. To systematically improve the manufacturing yield in the post-routing stage, a global optimization engine is introduced for redundant local-loop insertion considering advanced manufacturing constraints. Finally, we propose the first machine learning-based framework for fast yet consistent SRAF generation with high-quality results.
Electrical and Computer Engineering

Texas ScholarWorks

A Combinatorial Approach to Orthogonal Placement Problems

Author: Klau G.W. (Gunnar)
Publication venue
Publication date: 03/09/2001
Field of study

CWI's Institutional Repository

Area-power-delay trade-off in logic synthesis

Author: Berkelaar M.R.C.M.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1992
Field of study

This thesis introduces new concepts to perform area-power-delay trade-offs in a logic synthesis system. To achieve this, a new delay model is presented, which gives accurate delay estimations for arbitrary sets of Boolean expressions. This allows the delay model to be used from the very first steps of logic synthesis. Furthermore, new algorithms are presented for a number of different optimization tasks within logic synthesis. There are new algorithms to create prime irredundant Boolean expressions, to perform technology mapping for use with standard cell generators, and to perform gate sizing. To prove the validity of the presented ideas, benchmark results are given throughout the thesis.

Repository TU/e

Pure OAI Repository

Digital watermark technology in security applications

Author: Xu Xin
Publication venue: 'University of Plymouth'
Publication date: 01/01/2008
Field of study

With the rising emphasis on security and the number of fraud-related crimes around the world, authorities are looking for new technologies to tighten security of identity. Among many modern electronic technologies, digital watermarking has unique advantages for enhancing document authenticity. At the current stage of development, digital watermarking technologies are not as mature as other competing technologies for supporting identity authentication systems. This work presents improvements in the performance of two classes of digital watermarking techniques and investigates the issue of watermark synchronisation. Optimal performance can be obtained if the spreading sequences are designed to be orthogonal to the cover vector. In this thesis, two classes of orthogonalisation methods that generate binary sequences quasi-orthogonal to the cover vector are presented. One method, "Sorting and Cancelling", generates sequences that have a high level of orthogonality to the cover vector. The Hadamard matrix based orthogonalisation method, "Hadamard Matrix Search", is able to realise overlapped embedding, so the watermarking capacity and image fidelity can be improved compared to using short watermark sequences. The results are compared with traditional pseudo-randomly generated binary sequences. The advantages of both classes of orthogonalisation methods are significant. Another watermarking method introduced in the thesis is based on writing-on-dirty-paper theory. The method is presented with biorthogonal codes, which have the best robustness. The advantages and trade-offs of using biorthogonal codes with this watermark coding method are analysed comprehensively. Comparisons between the orthogonal and non-orthogonal codes used in this watermarking method are also made. It is found that fidelity and robustness are contradictory and that it is not possible to optimise them simultaneously. Comparisons are also made between all proposed methods.
The comparisons focus on three major performance criteria: fidelity, capacity and robustness. From two different viewpoints, the conclusions are not the same. From a fidelity-centric viewpoint, the dirty-paper coding method using biorthogonal codes has a very strong advantage in preserving image fidelity, and its advantage in capacity is also significant. However, from the power ratio point of view, the orthogonalisation methods demonstrate a significant advantage in capacity and robustness. The conclusions are contradictory, but together they summarise the performance resulting from different design considerations. Watermark synchronisation is first provided by high-contrast frames around the watermarked image. Edge detection filters are used to detect the high-contrast borders of the captured image. By scanning the pixels from the border to the centre, the locations of detected edges are stored. The optimal linear regression algorithm is used to estimate the watermarked image frames. Estimation of the regression function provides the rotation angle from the slope of the rotated frames. The scaling is corrected by re-sampling the upright image to the original size. A theoretically studied method that is able to synchronise the captured image to sub-pixel accuracy is also presented. By using invariant transforms and the "symmetric phase only matched filter", the captured image can be corrected accurately to its original geometric size. The method uses repeating watermarks to form an array in the spatial domain of the watermarked image; the locations of the array's elements reveal information about rotation, translation and scaling with two filtering processes.
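The frame-based synchronisation step described above (fit a line to detected edge locations, then read the rotation angle from its slope) can be sketched as follows. This is a minimal illustration and not the thesis's implementation. It assumes ordinary least squares on points from a single, roughly horizontal frame edge, and the function name `estimate_rotation_deg` is hypothetical.

```python
import math


def estimate_rotation_deg(edge_points):
    """Fit y = a*x + b to (x, y) edge locations by ordinary least squares
    and return the rotation angle atan(a) in degrees.

    Assumes the points come from one roughly horizontal frame edge, so the
    x values are not all equal (a vertical edge would need x and y swapped).
    """
    n = len(edge_points)
    mean_x = sum(x for x, _ in edge_points) / n
    mean_y = sum(y for _, y in edge_points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in edge_points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in edge_points)
    slope = sxy / sxx
    return math.degrees(math.atan(slope))


# Synthetic edge locations along a frame edge rotated by 5 degrees.
tilted_edge = [(x, math.tan(math.radians(5.0)) * x + 3.0)
               for x in range(0, 100, 10)]
angle = estimate_rotation_deg(tilted_edge)
```

Rotating the captured image by the negative of the estimated angle would bring the frame upright, after which the image can be re-sampled to its original size as the abstract describes.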

Plymouth Electronic Archive and Research Library

A combinatorial approach to orthogonal placement problems

Author: Klau Gunnar Werner
Publication venue: Fakultät 6 - Naturwissenschaftlich-Technische Fakultät I. Fachrichtung 6.2 - Informatik
Publication date: 01/01/2001
Field of study

We consider two families of NP-hard orthogonal placement problems from the field of information visualization, from both a theoretical and a practical point of view. This thesis contains a common combinatorial framework for compaction problems from orthogonal graph drawing and labeling problems for point sets from computer cartography. The compaction problems are concerned with transforming a given dimensionless description of the orthogonal shape of a graph into an orthogonal grid drawing with short edges and small area. The labeling problems have the task of placing a given set of rectangular labels so that a legible map results. In a classical application, the points represent, for example, the cities on a map, and the labels contain the names of the cities. We present new combinatorial formulations for these problems, using a path- and cycle-based graph-theoretic property in an associated problem-specific pair of constraint graphs. The reformulation allows us to develop exact algorithms for the original problems. Extensive experimental studies with real-world benchmark instances show that our algorithms, which are based on linear programming, are able to solve large instances of the placement problems provably optimally and in short computation time. Furthermore, we combine the formulations for compaction and labeling problems and present an exact algorithmic approach for a graph labeling problem. In many cases, our new algorithms are the first exact algorithms for the respective problem variant.

CiteSeerX

CWI's Institutional Repository

MPG.PuRe

A combinatorial approach to orthogonal placement problems

Author: Klau Gunnar Werner
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 23/09/2004
Field of study

Universaar
