159 research outputs found

    High-Performance Placement and Routing for the Nanometer Scale.

    Full text link
    Modern semiconductor manufacturing facilitates single-chip electronic systems that only five years ago required ten to twenty chips. Naturally, design complexity has grown within this period. In contrast to this growth, it is becoming common in the industry to limit design team size which places a heavier burden on design automation tools. Our work identifies new objectives, constraints and concerns in the physical design of systems-on-chip, and develops new computational techniques to address them. In addition to faster and more relevant design optimizations, we demonstrate that traditional design flows based on ``separation of concerns'' produce unnecessarily suboptimal layouts. We develop new integrated optimizations that streamline traditional chains of loosely-linked design tools. In particular, we bridge the gap between mixed-size placement and routing by updating the objective of global and detail placement to a more accurate estimate of routed wirelength. To this we add sophisticated whitespace allocation, and the combination provides increased routability, faster routing, shorter routed wirelength, and the best via counts of published techniques. To further improve post-routing design metrics, we present new global routing techniques based on Discrete Lagrange Multipliers (DLM) which produce the best routed wirelength results on recent benchmarks. Our work culminates in the integration of our routing techniques within an incremental placement flow to improve detailed routing solutions, shrink die sizes and reduce total chip cost. Not only do our techniques improve the quality and cost of designs, but also simplify design automation software implementation in many cases. Ultimately, we reduce the time needed for design closure through improved tool fidelity and the use of our incremental techniques for placement and routing.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/64639/1/royj_1.pd

    Concurrent Image Processing Executive (CIPE)

    Get PDF
    The design and implementation of a Concurrent Image Processing Executive (CIPE), which is intended to become the support system software for a prototype high performance science analysis workstation are discussed. The target machine for this software is a JPL/Caltech Mark IIIfp Hypercube hosted by either a MASSCOMP 5600 or a Sun-3, Sun-4 workstation; however, the design will accommodate other concurrent machines of similar architecture, i.e., local memory, multiple-instruction-multiple-data (MIMD) machines. The CIPE system provides both a multimode user interface and an applications programmer interface, and has been designed around four loosely coupled modules; (1) user interface, (2) host-resident executive, (3) hypercube-resident executive, and (4) application functions. The loose coupling between modules allows modification of a particular module without significantly affecting the other modules in the system. In order to enhance hypercube memory utilization and to allow expansion of image processing capabilities, a specialized program management method, incremental loading, was devised. To minimize data transfer between host and hypercube a data management method which distributes, redistributes, and tracks data set information was implemented

    Visualization And Collision Detection Of Direct Metal Deposition

    Get PDF
    Direct metal deposition (DMD) is a manufacturing technique that manufactures solid metal parts from bottom to top using powdered metal and a focused laser. In this research, the swept volume technique was used as framework to develop a computer program to perform volumetric visualization of the deposition process as a pre-processor, before the actual metal deposition commences

    Makespan Minimization in Re-entrant Permutation Flow Shops

    Get PDF
    Re-entrant permutation flow shop problems occur in practical applications such as wafer manufacturing, paint shops, mold and die processes and textile industry. A re-entrant material flow means that the production jobs need to visit at least one working station multiple times. A comprehensive review gives an overview of the literature on re-entrant scheduling. The influence of missing operations received just little attention so far and splitting the jobs into sublots was not examined in re-entrant permutation flow shops before. The computational complexity of makespan minimization in re-entrant permutation flow shop problems requires heuristic solution approaches for large problem sizes. The problem provides promising structural properties for the application of a variable neighborhood search because of the repeated processing of jobs on several machines. Furthermore the different characteristics of lot streaming and their impact on the makespan of a schedule are examined in this thesis and the heuristic solution methods are adjusted to manage the problem’s extension

    Data Bandwidth Reduction Techniques For Distributed Embedded Simulatio

    Get PDF
    Maintaining coherence between the independent views of multiple participants at distributed locations is essential in an Embedded Simulation environment. Currently, the Distributed Interactive Simulation (DIS) protocol maintains coherence by broadcasting the entity state streams from each simulation station. In this dissertation, a novel alternative to DIS that replaces the transmitting sources with local sources is developed, validated, and assessed by analytical and experimental means. The proposed Concurrent Model approach reduces the communication burden to transmission of only synchronization and model-update messages. Necessary and sufficient conditions for the correctness of Concurrent Models in a discrete event simulation environment are established by developing Behavioral Congruence ¨B(EL, ER) and Temporal Congruence ¨T(t, ER) functions. They indicate model discrepancies with respect to the simulation time t, and the local and remote entity state streams EL and ER, respectively. Performance benefits were quantified in terms of the bandwidth reduction ratio BR=N/I obtained from the comparison of the OneSAF Testbed Semi-Automated Forces (OTBSAF) simulator under DIS requiring a total of N bits and a testbed modified for the Concurrent Model approach which required I bits. In the experiments conducted, a range of 100 d BR d 294 was obtained representing two orders of magnitude reduction in simulation traffic. Investigation showed that the models rely heavily on the priority data structure of the discrete event simulation and that performance of the overall simulation can be enhanced by an additional 6% by improving the queue management. A low run-time overhead, self-adapting storage policy called the Smart Priority Queue (SPQ) was developed and evaluated within the Concurrent Model. The proposed SPQ policies employ a lowcomplexity linear queue for near head activities and a rapid-indexing variable binwidth calendar queue for distant events. The SPQ configuration is determined by monitoring queue access behavior using cost scoring factors and then applying heuristics to adjust the organization of the underlying data structures. Results indicate that optimizing storage to the spatial distribution of queue access can decrease HOLD operation cost between 25% and 250% over existing algorithms such as calendar queues. Taken together, these techniques provide an entity state generation mechanism capable of overcoming the challenges of Embedded Simulation in harsh mobile communications environments with restricted bandwidth, increased message latency, and extended message drop-outs

    POWER AND PERFORMANCE STUDIES OF THE EXPLICIT MULTI-THREADING (XMT) ARCHITECTURE

    Get PDF
    Power and thermal constraints gained critical importance in the design of microprocessors over the past decade. Chipmakers failed to keep power at bay while sustaining the performance growth of serial computers at the rate expected by consumers. As an alternative, they turned to fitting an increasing number of simpler cores on a single die. While this is a step forward for relaxing the constraints, the issue of power is far from resolved and it is joined by new challenges which we explain next. As we move into the era of many-cores, processors consisting of 100s, even 1000s of cores, single-task parallelism is the natural path for building faster general-purpose computers. Alas, the introduction of parallelism to the mainstream general-purpose domain brings another long elusive problem to focus: ease of parallel programming. The result is the dual challenge where power efficiency and ease-of-programming are vital for the prevalence of up and coming many-core architectures. The observations above led to the lead goal of this dissertation: a first order validation of the claim that even under power/thermal constraints, ease-of-programming and competitive performance need not be conflicting objectives for a massively-parallel general-purpose processor. As our platform, we choose the eXplicit Multi-Threading (XMT) many-core architecture for fine grained parallel programs developed at the University of Maryland. We hope that our findings will be a trailblazer for future commercial products. XMT scales up to thousand or more lightweight cores and aims at improving single task execution time while making the task for the programmer as easy as possible. Performance advantages and ease-of-programming of XMT have been shown in a number of publications, including a study that we present in this dissertation. Feasibility of the hardware concept has been exhibited via FPGA and ASIC (per our partial involvement) prototypes. Our contributions target the study of power and thermal envelopes of an envisioned 1024-core XMT chip (XMT1024) under programs that exist in popular parallel benchmark suites. First, we compare XMT against an area and power equivalent commercial high-end many-core GPU. We demonstrate that XMT can provide an average speedup of 8.8x in irregular parallel programs that are common and important in general purpose computing. Even under the worst-case power estimation assumptions for XMT, average speedup is only reduced by half. We further this study by experimentally evaluating the performance advantages of Dynamic Thermal Management (DTM), when applied to XMT1024. DTM techniques are frequently used in current single and multi-core processors, however until now their effects on single-tasked many-cores have not been examined in detail. It is our purpose to explore how existing techniques can be tailored for XMT to improve performance. Performance improvements up to 46% over a generic global management technique has been demonstrated. The insights we provide can guide designers of other similar many-core architectures. A significant infrastructure contribution of this dissertation is a highly configurable cycle-accurate simulator, XMTSim. To our knowledge, XMTSim is currently the only publicly-available shared-memory many-core simulator with extensive capabilities for estimating power and temperature, as well as evaluating dynamic power and thermal management algorithms. As a major component of the XMT programming toolchain, it is not only used as the infrastructure in this work but also contributed to other publications and dissertations
    corecore