Search CORE

83 research outputs found

A hierarchical approach for generating regular floorplans

Author: Cortadella Jordi
Roca Pérez Antoni
San Pedro Martín Javier de
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other worksThe complexity of the VLSI physical design flow grows dramatically as the level of integration increases. An effective way to manage this increasing complexity is through the use of regular designs which contain more reusable parts. In this work we introduce HiReg, a new floorplanning algorithm that generates regular floorplans. HiReg automatically extracts repeating patterns in a design by using graph mining techniques. Regularity is exploited by reusing the same floorplan for multiple instances of a pattern, as long as neither area, wire length or existing hierarchy constraints are violated or compromised. The proposed scheme is targeted towards early system-level design of chip multiprocessors (CMPs). Experiments show the scalability of the method for many-core CMPs and competitive results in area and wire lengthPeer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Automatic synthesis and optimization of chip multiprocessors

Author: Nikitin Nikita
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2013
Field of study

The microprocessor technology has experienced an enormous growth during the last decades. Rapid downscale of the CMOS technology has led to higher operating frequencies and performance densities, facing the fundamental issue of power dissipation. Chip Multiprocessors (CMPs) have become the latest paradigm to improve the power-performance efficiency of computing systems by exploiting the parallelism inherent in applications. Industrial and prototype implementations have already demonstrated the benefits achieved by CMPs with hundreds of cores.CMP architects are challenged to take many complex design decisions. Only a few of them are:- What should be the ratio between the core and cache areas on a chip?- Which core architectures to select?- How many cache levels should the memory subsystem have?- Which interconnect topologies provide efficient on-chip communication?These and many other aspects create a complex multidimensional space for architectural exploration. Design Automation tools become essential to make the architectural exploration feasible under the hard time-to-market constraints. The exploration methods have to be efficient and scalable to handle future generation on-chip architectures with hundreds or thousands of cores.Furthermore, once a CMP has been fabricated, the need for efficient deployment of the many-core processor arises. Intelligent techniques for task mapping and scheduling onto CMPs are necessary to guarantee the full usage of the benefits brought by the many-core technology. These techniques have to consider the peculiarities of the modern architectures, such as availability of enhanced power saving techniques and presence of complex memory hierarchies.This thesis has several objectives. The first objective is to elaborate the methods for efficient analytical modeling and architectural design space exploration of CMPs. The efficiency is achieved by using analytical models instead of simulation, and replacing the exhaustive exploration with an intelligent search strategy. Additionally, these methods incorporate high-level models for physical planning. The related contributions are described in Chapters 3, 4 and 5 of the document.The second objective of this work is to propose a scalable task mapping algorithm onto general-purpose CMPs with power management techniques, for efficient deployment of many-core systems. This contribution is explained in Chapter 6 of this document.Finally, the third objective of this thesis is to address the issues of the on-chip interconnect design and exploration, by developing a model for simultaneous topology customization and deadlock-free routing in Networks-on-Chip. The developed methodology can be applied to various classes of the on-chip systems, ranging from general-purpose chip multiprocessors to application-specific solutions. Chapter 7 describes the proposed model.The presented methods have been thoroughly tested experimentally and the results are described in this dissertation. At the end of the document several possible directions for the future research are proposed

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Structure discovery techniques for circuit design and process model visualization

Author: San Pedro Martín Javier de
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2017
Field of study

Graphs are one of the most used abstractions in many knowledge fields because of the easy and flexibility by which graphs can represent relationships between objects. The pervasiveness of graphs in many disciplines means that huge amounts of data are available in graph form, allowing many opportunities for the extraction of useful structure from these graphs in order to produce insight into the data. In this thesis we introduce a series of techniques to resolve well-known challenges in the areas of digital circuit design and process mining. The underlying idea that ties all the approaches together is discovering structures in graphs. We show how many problems of practical importance in these areas can be solved utilizing both common and novel structure mining approaches. In the area of digital circuit design, this thesis proposes automatically discovering frequent, repetitive structures in a circuit netlist in order to improve the quality of physical planning. These structures can be used during floorplanning to produce regular designs, which are known to be highly efficient and economical. At the same time, detecting these repeating structures can exponentially reduce the total design time. The second focus of this thesis is in the area of the visualization of process models. Process mining is a recent area of research which centers on studying the behavior of real-life systems and their interactions with the environment. Complicated process models, however, hamper this goal. By discovering the important structures in these models, we propose a series of methods that can derive visualization-friendly process models with minimal loss in accuracy. In addition, and combining the areas of circuit design and process mining, this thesis opens the area of specification mining in asynchronous circuits. Instead of the usual design flow, which involves synthesizing circuits from specifications, our proposal discovers specifications from implemented circuits. This area allows for many opportunities for verification and re-synthesis of asynchronous circuits. The proposed methods have been tested using real-life benchmarks, and the quality of the results compared to the state-of-the-art.Els grafs són una de les representacions abstractes més comuns en molts camps de recerca, gràcies a la facilitat i flexibilitat amb la que poden representar relacions entre objectes. Aquesta popularitat fa que una gran quantitat de dades es puguin trobar en forma de graf, i obre moltes oportunitats per a extreure estructures d'aquest grafs, útils per tal de donar una intuïció millor de les dades subjacents. En aquesta tesi introduïm una sèrie de tècniques per resoldre reptes habitualment trobats en les àrees de disseny de circuits digitals i mineria de processos industrials. La idea comú sota tots els mètodes proposats es descobrir automàticament estructures en grafs. En la tesi es mostra que molts problemes trobats a la pràctica en aquestes àrees poden ser resolts utilitzant nous mètodes de descobriment d'estructures. En l'àrea de disseny de circuits, proposem descobrir, automàticament, estructures freqüents i repetitives en les definicions del circuit per tal de millorar la qualitat de les etapes posteriors de planificació física. Les estructures descobertes poden fer-se servir durant la planificació per produir dissenys regulars, que son molt més econòmics d'implementar. Al mateix temps, la descoberta i ús d'aquestes estructures pot reduir exponencialment el temps total de disseny. El segon punt focal d'aquesta tesi és en l'àrea de la visualització de models de processos industrials. La mineria de processos industrials es un tema jove de recerca que es centra en estudiar el comportament de sistemes reals i les interaccions d'aquests sistemes amb l'entorn. No obstant, quan d'aquest anàlisi s'obtenen models massa complexos visualment, l'estudi n'és problemàtic. Proposem una sèrie de mètodes que, gràcies al descobriment automàtic de les estructures més importants, poden generar models molt més fàcils de visualitzar que encara descriuen el comportament del sistema amb gran precisió. Combinant les àrees de disseny de circuits i mineria de processos, aquesta tesi també obre un nou tema de recerca: la mineria d'especificacions per circuits asíncrons. En l'estil de disseny asíncron habitual, sintetitzadors automàtics generen circuits a partir de les especificacions. En aquesta tesi proposem el pas invers: descobrir automàticament les especificacions de circuits ja implementats. Així, creem noves oportunitats per a la verificació i la re-síntesi de circuits asíncrons. Els mètodes proposats en aquesta tesi s'han validat fent servir dades obtingudes d'aplicacions pràctiques, i en comparem els resultats amb els mètodes existents

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Doctor of Philosophy

Author: Gebhardt Daniel J.
Publication venue: University of Utah
Publication date: 15/08/2011
Field of study

dissertationPortable electronic devices will be limited to available energy of existing battery chemistries for the foreseeable future. However, system-on-chips (SoCs) used in these devices are under a demand to offer more functionality and increased battery life. A difficult problem in SoC design is providing energy-efficient communication between its components while maintaining the required performance. This dissertation introduces a novel energy-efficient network-on-chip (NoC) communication architecture. A NoC is used within complex SoCs due it its superior performance, energy usage, modularity, and scalability over traditional bus and point-to-point methods of connecting SoC components. This is the first academic research that combines asynchronous NoC circuits, a focus on energy-efficient design, and a software framework to customize a NoC for a particular SoC. Its key contribution is demonstrating that a simple, asynchronous NoC concept is a good match for low-power devices, and is a fruitful area for additional investigation. The proposed NoC is energy-efficient in several ways: simple switch and arbitration logic, low port radix, latch-based router buffering, a topology with the minimum number of 3-port routers, and the asynchronous advantages of zero dynamic power consumption while idle and the lack of a clock tree. The tool framework developed for this work uses novel methods to optimize the topology and router oorplan based on simulated annealing and force-directed movement. It studies link pipelining techniques that yield improved throughput in an energy-efficient manner. A simulator is automatically generated for each customized NoC, and its traffic generators use a self-similar message distribution, as opposed to Poisson, to better match application behavior. Compared to a conventional synchronous NoC, this design is superior by achieving comparable message latency with half the energy

The University of Utah: J. Willard Marriott Digital Library

Physical Planning and Uncore Power Management for Multi-Core Processors

Author: Chen Xi
Publication venue
Publication date: 02/10/2013
Field of study

For the microprocessor technology of today and the foreseeable future, multi-core is a key engine that drives performance growth under very tight power dissipation constraints. While previous research has been mostly focused on individual processor cores, there is a compelling need for studying how to efficiently manage shared resources among cores, including physical space, on-chip communication and on-chip storage. In managing physical space, floorplanning is the first and most critical step that largely affects communication efficiency and cost-effectiveness of chip designs. We consider floorplanning with regularity constraints that requires identical processing/memory cores to form an array. Such regularity can greatly facilitate design modularity and therefore shorten design turn-around time. Very little attention has been paid to automatic floorplanning considering regularity constraints because manual floorplanning has difficulty handling the complexity as chip core count increases. In this dissertation work, we investigate the regularity constraints in a simulated-annealing based floorplanner for multi/many core processor designs. A simple and effective technique is proposed to encode the regularity constraints in sequence-pair, which is a classic format of data representation in automatic floorplanning. To the best of our knowledge, this is the first work on regularity-constrained floorplanning in the context of multi/many core processor designs. On-chip communication and shared last level cache (LLC) play a role that is at least as equally important as processor cores in terms of chip performance and power. This dissertation research studies dynamic voltage and frequency scaling for on-chip network and LLC, which forms a single uncore domain of voltage and frequency. This is in contrast to most previous works where the network and LLC are partitioned and associated with processor cores based on physical proximity. The single shared domain can largely avoid the interfacing overhead across domain boundaries and is practical and very useful for industrial products. Our goal is to minimize uncore energy dissipation with little, e.g., 5% or less, performance degradation. The first part of this study is to identify a metric that can reflect the chip performance determined by uncore voltage/frequency. The second part is about how to monitor this metric with low overhead and high fidelity. The last part is the control policy that decides uncore voltage/frequency based on monitoring results. Our approach is validated through full system simulations on public architecture benchmarks

Texas A&M Repository

Recommended from our members

Synthesis of On-Chip Interconnection Structures:From Point-to-Point Links to Networks-on-Chip

Author: Carloni Luca
Pinto Alessandro
Sangiovanni-Vincentelli Alberto L.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2006
Field of study

Packet-switched networks-on-chip (NOC) have been advocated as the solution to the challenge of organizing efficient and reliable communication structures among the components of a system-on-chip (SOC). A critical issue in designing a NOC is to determine its topology given the set of point-to-point communication requirements among these components. We present a novel approach to on-chip communication synthesis that is based on the iterative combination of two efficient computational steps: (1) an application of the k-Median algorithm to coarsely determine the global communication structure (which may turned out not be a network after all), and a (2) a variation of the shortest-path algorithm in order to finely tune the data flows on the communication channels. The application of our method to case studies taken from the literature shows that we can automatically synthesize optimal NOC topologies for multi-core on-chip processors and it offers new insights on why NOC are not necessarily a value proposition for some classes of applcation-specific SOCs

Columbia University Academic Commons

Understanding the impact of 3D stacked layouts on ILP

Author: Awasthi Manu, Venkatesan, Vivek
Balasubramonian Rajeev
Publication venue: International Symposium on Microarchitecture
Publication date: 01/01/2007
Field of study

Journal Article3D die-stacked chips can alleviate the penalties imposed by long wires within micro-processor circuits. Many recent studies have attempted to partition each microprocessor structure across three dimensions to reduce their access times. In this paper, we implement each microprocessor structure on a single 2D die and leverage 3D to reduce the lengths of wires that communicate data between microprocessor structures within a single core. We begin with a criticality analysis of inter-structure wire delays and show that for most tra- ditional simple superscalar cores, 2D floorplans are already very efficient at minimizing critical wire delays. For an aggressive wire-constrained clustered superscalar architecture, an exploration of the design space reveals that 3D can yield higher benefit. However, this benefit may be negated by the higher power density and temperature entailed by 3D integration. Overall, we report a negative result and argue against leveraging 3D for higher ILP

The University of Utah: J. Willard Marriott Digital Library

Recommended from our members

The analysis and synthesis of a parallel sorting engine

Author: Ahn Byoungchul
Publication venue: 'Oregon State University'
Publication date
Field of study

This thesis is concerned with the development of a unique parallel sort-merge system suitable for implementation in VLSI. Two new sorting subsystems, a high performance VLSI sorter and a four-way merger, were also realized during the development process. In addition, the analysis of several existing parallel sorting architectures and algorithms was carried out. Algorithmic time complexity, VLSI processor performance, and chip area requirements for the existing sorting systems were evaluated. The rebound sorting algorithm was determined to be the most efficient among those considered. The rebound sorter algorithm was implemented in hardware as a systolic array with external expansion capability. The second phase of the research involved analyzing several parallel merge algorithms and their buffer management schemes. The dominant considerations for this phase of the research were the achievement of minimum VLSI chip area, design complexity, and logic delay. It was determined that the proposed merger architecture could be implemented in several ways. Selecting the appropriate microarchitecture for the merger, given the constraints of chip area and performance, was the major problem. The tradeoffs associated with this process are outlined. Finally, a pipelined sort-merge system was implemented in VLSI by combining a rebound sorter and a four-way merger on a single chip. The final chip size was 416 mils by 432 mils. Two micron CMOS technology was utilized in this chip realization. An overall throughput rate of 10M bytes/sec was achieved. The prototype system developed is capable of sorting thirty two 2-byte keys during each merge phase. If extended, this system is capable of economically sorting files of 100M bytes or more in size. In order to sort larger files, this design should be incorporated in a disk-based sort-merge system. A simplified disk I/O access model for such a system was studied. In this study the sort-merge system was assumed to be part of a disk controller subsystem

ScholarsArchive@OSU

Parametric Yield of VLSI Systems under Variability: Analysis and Design Solutions

Author: Haghdad Kian
Publication venue: 'University of Waterloo'
Publication date: 29/04/2011
Field of study

Variability has become one of the vital challenges that the designers of integrated circuits encounter. variability becomes increasingly important. Imperfect manufacturing process manifest itself as variations in the design parameters. These variations and those in the operating environment of VLSI circuits result in unexpected changes in the timing, power, and reliability of the circuits. With scaling transistor dimensions, process and environmental variations become significantly important in the modern VLSI design. A smaller feature size means that the physical characteristics of a device are more prone to these unaccounted-for changes. To achieve a robust design, the random and systematic fluctuations in the manufacturing process and the variations in the environmental parameters should be analyzed and the impact on the parametric yield should be addressed. This thesis studies the challenges and comprises solutions for designing robust VLSI systems in the presence of variations. Initially, to get some insight into the system design under variability, the parametric yield is examined for a small circuit. Understanding the impact of variations on the yield at the circuit level is vital to accurately estimate and optimize the yield at the system granularity. Motivated by the observations and results, found at the circuit level, statistical analyses are performed, and solutions are proposed, at the system level of abstraction, to reduce the impact of the variations and increase the parametric yield. At the circuit level, the impact of the supply and threshold voltage variations on the parametric yield is discussed. Here, a design centering methodology is proposed to maximize the parametric yield and optimize the power-performance trade-off under variations. In addition, the scaling trend in the yield loss is studied. Also, some considerations for design centering in the current and future CMOS technologies are explored. The investigation, at the circuit level, suggests that the operating temperature significantly affects the parametric yield. In addition, the yield is very sensitive to the magnitude of the variations in supply and threshold voltage. Therefore, the spatial variations in process and environmental variations make it necessary to analyze the yield at a higher granularity. Here, temperature and voltage variations are mapped across the chip to accurately estimate the yield loss at the system level. At the system level, initially the impact of process-induced temperature variations on the power grid design is analyzed. Also, an efficient verification method is provided that ensures the robustness of the power grid in the presence of variations. Then, a statistical analysis of the timing yield is conducted, by taking into account both the process and environmental variations. By considering the statistical profile of the temperature and supply voltage, the process variations are mapped to the delay variations across a die. This ensures an accurate estimation of the timing yield. In addition, a method is proposed to accurately estimate the power yield considering process-induced temperature and supply voltage variations. This helps check the robustness of the circuits early in the design process. Lastly, design solutions are presented to reduce the power consumption and increase the timing yield under the variations. In the first solution, a guideline for floorplaning optimization in the presence of temperature variations is offered. Non-uniformity in the thermal profiles of integrated circuits is an issue that impacts the parametric yield and threatens chip reliability. Therefore, the correlation between the total power consumption and the temperature variations across a chip is examined. As a result, floorplanning guidelines are proposed that uses the correlation to efficiently optimize the chip's total power and takes into account the thermal uniformity. The second design solution provides an optimization methodology for assigning the power supply pads across the chip for maximizing the timing yield. A mixed-integer nonlinear programming (MINLP) optimization problem, subject to voltage drop and current constraint, is efficiently solved to find the optimum number and location of the pads

University of Waterloo's Institutional Repository