13 research outputs found

    Efficient approaches in interconnect-driven floorplanning.

    Lai Tsz Wai. Thesis (M.Phil.)--Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 123-129). Abstracts in English and Chinese.

    Contents:
    Chapter 1 --- Introduction --- p.1
    1.1 VLSI Design Cycle --- p.2
    1.2 Physical Design Cycle --- p.4
    1.3 Floorplanning --- p.7
    1.3.1 Types of Floorplan and Floorplan Representations --- p.11
    1.3.2 Interconnect-driven Floorplanning --- p.13
    1.4 Motivations and Contributions --- p.17
    1.5 Organization of this Thesis --- p.18
    Chapter 2 --- Literature Review on Floorplan Representation --- p.20
    2.1 Slicing Floorplan Representation --- p.20
    2.1.1 Normalized Polish Expression --- p.20
    2.2 Non-slicing Floorplan Representations --- p.21
    2.2.1 Sequence Pair (SP) --- p.21
    2.2.2 Bounded-sliceline Grid (BSG) --- p.23
    2.2.3 O-tree --- p.25
    2.2.4 B*-tree --- p.26
    2.3 Mosaic Floorplan Representations --- p.28
    2.3.1 Corner Block List (CBL) --- p.28
    2.3.2 Twin Binary Trees (TBT) --- p.31
    2.3.3 Twin Binary Sequences (TBS) --- p.32
    2.4 Summary --- p.34
    Chapter 3 --- Literature Review on Interconnect Optimization in Floorplanning --- p.37
    3.1 Wirelength Estimation --- p.37
    3.2 Congestion Optimization --- p.38
    3.2.1 Integrated Floorplanning and Interconnect Planning --- p.41
    3.2.2 Multi-layer Global Wiring Planning (GWP) --- p.43
    3.2.3 Estimating Routing Congestion using Probabilistic Analysis --- p.44
    3.2.4 Congestion Minimization During Placement --- p.46
    3.2.5 Modelling and Minimization of Routing Congestion --- p.48
    3.3 Buffer Planning --- p.49
    3.3.1 Buffer Clustering with Feasible Region --- p.51
    3.3.2 Routability-driven Repeater Clustering Algorithm with Iterative Deletion --- p.55
    3.3.3 Planning Buffer Locations by Network Flow --- p.58
    3.3.4 Buffer Planning using Integer Multicommodity Flow --- p.60
    3.3.5 Buffer Planning Problem using Tile Graph --- p.60
    3.3.6 Probabilistic Analysis for Buffer Block Planning --- p.62
    3.3.7 Fast Buffer Planning and Congestion Optimization --- p.63
    3.4 Summary --- p.66
    Chapter 4 --- Congestion Evaluation: Wire Density Model --- p.68
    4.1 Introduction --- p.68
    4.2 Overview of Our Floorplanner --- p.70
    4.3 Wire Density Model --- p.71
    4.3.1 Computation of N_i --- p.72
    4.3.2 Computation of P_i --- p.74
    4.3.3 Usage of Mirror TBT --- p.76
    4.4 Implementation --- p.76
    4.4.1 Efficient Calculation of N_i --- p.76
    4.4.2 Solving the LCA Problem Efficiently --- p.81
    4.4.3 Cost Function --- p.81
    4.4.4 Complexity --- p.81
    4.5 Experimental Results --- p.82
    4.6 Conclusion --- p.83
    Chapter 5 --- Buffer Planning: Simple Buffer Planning Method --- p.85
    5.1 Introduction --- p.85
    5.2 Variable Interval Buffer Insertion Constraint --- p.87
    5.3 Overview of Our Floorplanner --- p.88
    5.4 Buffer Planning --- p.89
    5.4.1 Feasible Grids --- p.89
    5.4.2 Table Look-up Approach --- p.89
    5.5 Implementation --- p.91
    5.5.1 Building the Look-up Tables --- p.91
    5.5.2 An Example of Look-up Table Construction --- p.94
    5.5.3 A Faster Approach for Building the Look-up Tables --- p.101
    5.5.4 An Example of the Faster Look-up Table Construction --- p.105
    5.5.5 I/O Pin Locations --- p.106
    5.5.6 Cost Function --- p.110
    5.5.7 Complexity --- p.111
    5.6 Experimental Results --- p.112
    5.6.1 Selected Value for A --- p.112
    5.6.2 Performance of Our Floorplanner --- p.113
    5.7 Conclusion --- p.116
    Chapter 6 --- Conclusion --- p.118
    Appendix A --- An Efficient Algorithm for the Least Common Ancestor Problem --- p.120
    Bibliography --- p.123

    Fault and Defect Tolerant Computer Architectures: Reliable Computing With Unreliable Devices

    This research addresses the design of a reliable computer from unreliable device technologies. A system architecture is developed for a fault and defect tolerant (FDT) computer. Trade-offs between different techniques are studied, and yield and hardware cost models are developed. Fault and defect tolerant designs are created for the processor and the cache memory. Simulation results for the content-addressable memory (CAM)-based cache show 90% yield with device failure probabilities of 3 × 10⁻⁶, three orders of magnitude better than non-fault-tolerant caches of the same size. The entire processor achieves 70% yield with device failure probabilities exceeding 10⁻⁶. The required hardware redundancy is approximately 15 times that of a non-fault-tolerant design. While larger than current fault tolerant designs, this architecture allows the use of devices much more likely to fail than silicon CMOS. As part of model development, an improved model is derived for NAND multiplexing; it is the first accurate model for small and medium amounts of redundancy. Previous models are extended to account for dependence between the inputs and to produce more accurate results.
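
    For context, NAND multiplexing replaces a single logical NAND with a bundle of N redundant gates whose inputs are randomly paired between two input bundles. The sketch below is a minimal Monte Carlo model of one such stage under von Neumann's classic fault assumption (each gate's output flips with probability eps); it illustrates the construction being modelled, not the improved analytical model derived in this work, and all names are illustrative.

```python
import random

def nand_mux_stage(a_bundle, b_bundle, eps):
    """One NAND-multiplexing executive stage: randomly pair the two input
    bundles, NAND each pair, and flip each gate's output with probability eps."""
    paired = b_bundle[:]
    random.shuffle(paired)          # the randomising permutation unit
    out = []
    for x, y in zip(a_bundle, paired):
        z = not (x and y)           # ideal NAND
        if random.random() < eps:   # fault model: the gate output flips
            z = not z
        out.append(z)
    return out

def error_fraction(bundle, correct_value):
    """Fraction of lines in the bundle carrying the wrong value."""
    return sum(1 for v in bundle if v != correct_value) / len(bundle)

# Example: bundles of 100 lines, 1% input errors, gate fault probability 1e-3.
# With both logical inputs True, the correct NAND output is False.
a = [True] * 99 + [False]
b = [True] * 99 + [False]
print(error_fraction(nand_mux_stage(a, b, eps=1e-3), correct_value=False))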

    Power Management for Deep Submicron Microprocessors

    As VLSI technology scales, the enhanced performance of smaller transistors comes at the expense of increased power consumption. In addition to the dynamic power consumed by the circuits, there is a tremendous increase in leakage power consumption, further exacerbated by increasing operating temperatures. The total power consumption of modern processors is distributed between the processor core, memory, and interconnects. In this research, two novel power management techniques are presented, targeting the functional units and the global interconnects.

    First, since most leakage control schemes for processor functional units are based on circuit level techniques, such schemes inherently lack information about the operational profile of higher-level components of the system. This is a barrier to the pivotal task of predicting standby time; without this prediction, it is extremely difficult to assess the value of any leakage control scheme. Consequently, a methodology that can predict standby time is highly beneficial in bridging the gap between the information available at the application level and the circuit implementations. In this work, a novel Dynamic Sleep Signal Generator (DSSG) is presented. It uses usage traces extracted from cycle-accurate simulations of benchmark programs to predict the long standby periods associated with the various functional units. The DSSG bases its decisions on the current and previous standby state of the functional units to accurately predict the length of the next standby period. It is an alternative to Static Sleep Signal Generation (SSSG), which relies on static counters that trigger the sleep signal when a functional unit has idled for a prespecified number of cycles. The test results of the DSSG are obtained with a modified RISC superscalar processor implemented in SimpleScalar, the most widely accepted open source vehicle for architectural analysis, and are further verified with a Simultaneous Multithreading simulator implemented in SMTSIM. Leakage saving results show an increase of up to 146% in leakage savings using the DSSG versus the SSSG, with an accuracy of 60-80% in predicting long standby periods.
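
    The abstract describes the DSSG only at a high level, so the sketch below is a hedged illustration rather than the thesis's actual design: it contrasts a fixed-threshold counter (SSSG) with a predictor driven by the current and previous standby periods (DSSG). The two-sample average predictor, the class names, and the thresholds are all illustrative assumptions.

```python
class SSSG:
    """Static sleep-signal generation: assert sleep after a fixed idle count."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.idle = 0

    def tick(self, unit_idle):
        self.idle = self.idle + 1 if unit_idle else 0
        return self.idle >= self.threshold   # sleep signal

class DSSG:
    """Dynamic sleep-signal generation (sketch): predict the length of the
    next standby period from the current and previous observed periods and
    assert sleep only when a long period is predicted."""
    def __init__(self, long_period):
        self.long_period = long_period
        self.prev_period = 0     # second-most-recent completed standby period
        self.curr_period = 0     # most recent completed standby period
        self.idle = 0

    def tick(self, unit_idle):
        if unit_idle:
            self.idle += 1
        else:
            if self.idle:        # a standby period just ended; record it
                self.prev_period, self.curr_period = self.curr_period, self.idle
            self.idle = 0
        # Illustrative two-sample predictor; the real DSSG decision rule is
        # not specified in the abstract.
        predicted = (self.prev_period + self.curr_period) // 2
        return unit_idle and predicted >= self.long_period
```

    Driven cycle by cycle with a per-unit idle flag, the DSSG asserts the sleep signal immediately when history suggests a long standby period, instead of waiting out a fixed counter.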
    Second, chip designers, in their effort to achieve timing closure, have focused on achieving the lowest possible interconnect delay through buffer insertion and routing techniques. This approach, though, taxes the power budget of modern ICs, especially those intended for wireless applications. Also, in order to achieve more functionality, die sizes are constantly increasing. This trend leads to an increase in average global interconnect length, which in turn requires more buffers to achieve timing closure. Once power consumption is added as a major performance metric, unconstrained buffering is bound to adversely affect overall chip performance; in fact, the number of global interconnect buffers is expected to reach hundreds of thousands to achieve appropriate timing closure. To mitigate the impact of the power consumed by interconnect buffers, a power-efficient multi-pin routing technique is proposed in this research. The problem is formulated on a graph representation of the routing possibilities, including buffer insertion, and identifies the least-power path between the interconnect source and a set of sinks. The novel multi-pin routing technique is tested on the ISPD and IBM benchmarks to verify its accuracy, complexity, and solution quality. Results indicate average power savings as high as 32% for the 130-nm technology, with no impact on the maximum chip frequency.
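
    The least-power path query at the heart of this formulation can be illustrated with a standard shortest-path search. The sketch below runs Dijkstra's algorithm over a routing graph whose edge weights model power cost (e.g., wire-segment power plus the power of any inserted buffer); it is a single-sink building block under assumed data structures, not the thesis's full multi-pin (source to set of sinks) formulation.

```python
import heapq

def least_power_path(graph, source, sink):
    """Dijkstra over a routing graph whose edge weights are power costs.
    graph[u] maps each neighbour v to the power cost of routing u -> v."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == sink:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if sink not in dist:
        return None, float("inf")         # sink unreachable
    path, node = [sink], sink             # walk predecessors back to source
    while node != source:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[sink]
```

    A multi-pin net would repeat or merge such searches across its sinks; the abstract does not detail that step, so it is left out here.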

    Low Power Memory/Memristor Devices and Systems

    This reprint focuses on achieving low-power computation using memristive devices. It was designed as a convenient reference point: it contains a mix of techniques, from the fundamental manufacturing of memristive devices all the way to applications such as physically unclonable functions, and also covers perspectives on, e.g., in-memory computing, which is inextricably linked with emerging memory devices such as memristors. Finally, the reprint contains a few articles showing how other communities (from conventional CMOS design to photonics) are fighting on their own fronts in the quest for low-power computation, as a comparison with the memristor literature. We hope that readers will enjoy discovering the articles within.

    Improving scalability of large-scale distributed Spiking Neural Network simulations on High Performance Computing systems using novel architecture-aware streaming hypergraph partitioning

    After theory and experimentation, modelling and simulation is regarded as the third pillar of science, helping scientists to further their understanding of complex systems. In recent years there has been a growing scientific focus on computational neuroscience as a means to understand the brain and its functions, with large international projects (the Human Brain Project, Brain Activity Map, MindScope, and the China Brain Project) aiming to further our knowledge of high-level cognitive functions. They are a testament to the enormous interest, difficulty, and importance of solving the mysteries of the brain. Spiking Neural Network (SNN) simulations are widely used in the domain to facilitate experimentation, but scaling SNN simulations to large networks usually results in a more-than-linear increase in computational complexity. The computing resources required for brain-scale simulation far surpass the capabilities of today's personal computers; with improvements in individual processor speed slowing due to physical limits on heat dissipation, distributed computation models need to be adopted if those demands are to be met. This is a significant change that requires careful management of the workload at many levels: partitioning of work, communication and workload balancing, efficient inter-process communication, and efficient use of available memory. If large scale neuronal network models are to run successfully, simulators must consider these challenges and offer viable solutions.

    Large scale SNN simulations exhibit most of the issues of large distributed computation on general HPC systems. Commonly used workload distribution algorithms (round robin, random, and manual allocation) do not take into account connectivity locality, which is natural in biological networks; this can lead to increased communication requirements when distributing the simulation over multiple computing nodes. State-of-the-art SNN simulators use dense communication collectives to distribute spike data, the common pattern for point-to-point communication in distributed computation, whereas sparse communication collectives have been suggested to incur lower overheads when the application's communication pattern is sparse. In this work we characterise the bottlenecks of communication-bound SNN simulations and identify communication balance and sparsity as the main contributors to scalability. We propose hypergraph partitioning to distribute neurons across computing nodes so as to minimise communication (increasing sparsity); a hypergraph is a generalisation of a graph in which a (hyper)edge can link two or more vertices at once. Coupled with a novel use of a sparse-aware communication collective, computational efficiency increases by up to 40.8 percentage points and simulation time is reduced by up to 73%, compared to the common round-robin allocation in neuronal simulators.
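
    As an illustration of the streaming idea, the sketch below greedily assigns each arriving vertex to the computing node that already hosts the most of its hyperedges, with a size penalty for balance. This is a generic hyperedge-overlap heuristic under assumed data structures, not HyperPRAW itself; HyperPRAW additionally weights candidate nodes by profiled network bandwidth, which is omitted here.

```python
def stream_partition(vertices, hyperedges_of, k, balance_weight=1.0):
    """Greedy streaming hypergraph partitioning (sketch). Each vertex is
    assigned on arrival to the part with the most overlap among the
    hyperedges it belongs to, penalised by part size to keep load balanced.
    hyperedges_of[v] lists the ids of hyperedges incident to vertex v."""
    part_of = {}
    part_size = [0] * k
    edge_parts = {}                       # hyperedge id -> parts it touches
    for v in vertices:                    # one pass, local information only
        scores = [0.0] * k
        for e in hyperedges_of[v]:
            for p in edge_parts.get(e, ()):
                scores[p] += 1.0          # placing v in p avoids widening e
        avg = (sum(part_size) / k) or 1.0
        best = max(range(k),
                   key=lambda p: scores[p] - balance_weight * part_size[p] / avg)
        part_of[v] = best
        part_size[best] += 1
        for e in hyperedges_of[v]:
            edge_parts.setdefault(e, set()).add(best)
    return part_of
```

    Because each decision uses only the incident hyperedges and running part sizes, the partitioner never needs the whole hypergraph in memory, which is the property the thesis exploits for scalability.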
    HPC systems have, by design, highly hierarchical communication networks, with qualitative differences in communication speed and latency between computing nodes. This can create a mismatch between the communication patterns of a distributed simulation and the physical capabilities of the hardware. If large distributed simulations are to take full advantage of these systems, the communication properties of the HPC system need to be taken into consideration when allocating workload, so that frequent, heavy communication is routed through fast network links. Strategies that consider these heterogeneous physical communication capabilities are called architecture-aware. After demonstrating that hypergraph partitioning leads to more efficient workload allocation in SNN simulations, this thesis proposes a novel sequential hypergraph partitioning algorithm that incorporates network bandwidth via profiling. This leads to a significant reduction in execution time (up to 14x speedup in synthetic benchmark simulations compared to architecture-agnostic partitioners).

    The motivating context of this work is large scale brain simulation; however, in the era of social media, large graphs and hypergraphs are increasingly relevant in many other scientific applications. A common feature of such graphs is that they are too big for a single machine to handle, both in terms of performance and memory requirements. State-of-the-art multilevel partitioners have been shown to struggle to scale to large graphs in distributed memory, not only because they take a long time to run, but also because they require full knowledge of the graph (not possible for dynamic graphs) and must fit the graph entirely in memory (not possible for very large graphs). To address these limitations we propose a parallel implementation of our architecture-aware streaming hypergraph partitioning algorithm (HyperPRAW) to model distributed applications. Results demonstrate that HyperPRAW produces consistent speedups over previous streaming approaches that only consider hyperedge overlap (up to 5.2x). Compared to a multilevel global partitioner on dense hypergraphs (those with high average cardinality), HyperPRAW produces workload allocations that speed up runtime in a synthetic simulation benchmark by up to 4.3x. HyperPRAW has the potential to scale to very large hypergraphs, as it only requires local information to make allocation decisions, with an order of magnitude smaller memory footprint than global partitioners.

    The combined contributions of this thesis lead to a novel, parallel, scalable, streaming hypergraph partitioning algorithm (HyperPRAW) that can be used to help scale large distributed simulations on HPC systems. HyperPRAW helps tackle three of the main scalability challenges: it produces highly balanced distributed computation and communication, minimising idle time between computing nodes; it reduces communication overhead by placing frequently communicating simulation elements close to each other (where the communication cost is minimal); and it has a reasonable memory footprint that allows tackling larger problems than state-of-the-art alternatives such as global multilevel partitioning.

    MOCAST 2021

    The 10th International Conference on Modern Circuit and System Technologies on Electronics and Communications (MOCAST 2021) will take place in Thessaloniki, Greece, from July 5th to July 7th, 2021. The MOCAST technical program includes all aspects of circuit and system technologies, from modeling to design, verification, implementation, and application. This Special Issue presents extended versions of top-ranking papers from the conference. The topics of MOCAST include: analog/RF and mixed-signal circuits; digital circuits and systems design; nonlinear circuits and systems; device and circuit modeling; high-performance embedded systems; systems and applications; sensors and systems; machine learning and AI applications; communication and network systems; power management; imagers, MEMS, medical, and displays; radiation front ends (nuclear and space applications); and education in circuits, systems, and communications.

    Large bichromatic point sets admit empty monochromatic 4-gons

    We consider a variation of a problem stated by Erdős and Szekeres in 1935 about the existence of a number f_ES(k) such that any set S of at least f_ES(k) points in general position in the plane has a subset of k points that are the vertices of a convex k-gon. In our setting the points of S are colored, and we say that a (not necessarily convex) spanned polygon is monochromatic if all its vertices have the same color. Moreover, a polygon is called empty if it does not contain any points of S in its interior. We show that any bichromatic set of n ≥ 5044 points in R² in general position determines at least one empty monochromatic quadrilateral (and thus linearly many).
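
    To make the definitions concrete, the following brute-force checker searches a colored point set for an empty monochromatic quadrilateral, restricted to the convex case for simplicity (the result above also allows non-convex quadrilaterals). General position (no three collinear points) is assumed, and all function names are illustrative.

```python
from itertools import combinations
from math import atan2

def cross(o, a, b):
    """Twice the signed area of triangle (o, a, b); > 0 iff counter-clockwise."""
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def in_triangle(p, a, b, c):
    """True iff p lies strictly inside triangle abc (general position assumed)."""
    s1, s2, s3 = cross(a, b, p), cross(b, c, p), cross(c, a, p)
    return (s1 > 0) == (s2 > 0) == (s3 > 0)   # all three signs agree

def convex_position(q):
    """Four points are in convex position iff none is inside the others' triangle."""
    return not any(in_triangle(q[i], *(q[j] for j in range(4) if j != i))
                   for i in range(4))

def find_empty_mono_convex_quad(points, colors):
    """Return indices of an empty monochromatic convex quadrilateral, or None."""
    n = len(points)
    for quad in combinations(range(n), 4):
        if len({colors[i] for i in quad}) != 1:
            continue                           # not monochromatic
        pts = [points[i] for i in quad]
        if not convex_position(pts):
            continue
        cx = sum(p[0] for p in pts) / 4.0      # order vertices CCW around centroid
        cy = sum(p[1] for p in pts) / 4.0
        a, b, c, d = sorted(pts, key=lambda p: atan2(p[1]-cy, p[0]-cx))
        others = (points[i] for i in range(n) if i not in quad)
        # The quad abcd splits into triangles abc and acd along diagonal ac.
        if not any(in_triangle(p, a, b, c) or in_triangle(p, a, c, d)
                   for p in others):
            return quad                        # empty: no point of S inside
    return None
```

    This O(n^5) check is only practical for small instances, but it pins down exactly what "empty", "monochromatic", and "spanned" mean in the statement above.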

    Simulated Annealing

    The book contains 15 chapters presenting recent contributions from top researchers working with Simulated Annealing (SA). Although it represents only a small sample of the research activity on SA, the book will certainly serve as a valuable tool for researchers interested in getting involved in this multidisciplinary field. In fact, one of its salient features is that it is highly multidisciplinary in terms of application areas, assembling experts from the fields of Biology, Telecommunications, Geology, Electronics, and Medicine.
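
    For readers new to the field, a minimal, generic sketch of the SA loop follows; the cooling schedule, parameters, and example objective are illustrative and are not drawn from any chapter of the book.

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, t0=1.0, alpha=0.995, steps=20000):
    """Generic SA loop: always accept improving moves, accept worsening moves
    with probability exp(-delta/T), and cool T geometrically."""
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(steps):
        y = neighbor(x)
        fy = cost(y)
        if fy <= fx or random.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy                  # move accepted
            if fx < fbest:
                best, fbest = x, fx
        t *= alpha                         # geometric cooling schedule
    return best, fbest

# Example: minimise a bumpy 1-D function.
cost = lambda x: x * x + 10 * math.sin(3 * x)
step = lambda x: x + random.uniform(-0.5, 0.5)
print(simulated_annealing(cost, step, x0=5.0))
```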