72,349 research outputs found

    An integrated placement and routing approach

    Get PDF
    As the feature size continues scaling down, interconnects become the major contributor of signal delay. Since interconnects are mainly determined by placement and routing, these two stages play key roles to achieve high performance. Historically, they are divided into two separate stages to make the problem tractable. Therefore, the routing information is not available during the placement process. Net models such as HPWL, are employed to approximate the routing to simplify the placement problem. However, the good placement in terms of these objectives may not be routable at all in the routing stage because different objectives are optimized in placement and routing stages. This inconsistancy makes the results obtained by the two-step optimization method far from optimal;In order to achieve high-quality placement solution and ensure the following routing, we propose an integrated placement and routing approach. In this approach, we integrate placement and routing into the same framework so that the objective optimized in placement is the same as that in routing. Since both placement and routing are very hard problems (NP-hard), we need to have very efficient algorithms so that integrating them together will not lead to intractable complexity;In this dissertation, we first develop a highly efficient placer - FastPlace 3.0 for large-scale mixed-size placement problem. Then, an efficient and effective detailed placer - FastDP is proposed to improve global placement by moving standard cells in designs. For high-degree nets in designs, we propose a novel performance-driven topology design algorithm to generate good topologies to achieve very strict timing requirement. In the routing phase, we develop two global routers, FastRoute and FastRoute 2.0. Compared to traditional global routers, they can generate better solutions and are two orders of magnitude faster. Finally, based on these efficient and high-quality placement and routing algorithms, we propose a new flow which integrates placement and routing together closely. In this flow, global routing is extensively applied to obtain the interconnect information and direct the placement process. In this way, we can get very good placement solutions with guaranteed routability

    Overcoming loss of contrast in atom interferometry due to gravity gradients

    Get PDF
    Long-time atom interferometry is instrumental to various high-precision measurements of fundamental physical properties, including tests of the equivalence principle. Due to rotations and gravity gradients, the classical trajectories characterizing the motion of the wave packets for the two branches of the interferometer do not close in phase space, an effect which increases significantly with the interferometer time. The relative displacement between the interfering wave packets in such open interferometers leads to a fringe pattern in the density profile at each exit port and a loss of contrast in the oscillations of the integrated particle number as a function of the phase shift. Paying particular attention to gravity gradients, we present a simple mitigation strategy involving small changes in the timing of the laser pulses which is very easy to implement. A useful representation-free description of the state evolution in an atom interferometer is introduced and employed to analyze the loss of contrast and mitigation strategy in the general case. (As a by-product, a remarkably compact derivation of the phase-shift in a general light-pulse atom interferometer is provided.) Furthermore, exact results are obtained for (pure and mixed) Gaussian states which allow a simple interpretation in terms of the alignment of Wigner functions in phase-space. Analytical results are also obtained for expanding Bose-Einstein condensates within the time-dependent Thomas-Fermi approximation. Finally, a combined strategy for rotations and nonaligned gravity gradients is considered as well.Comment: 14+7 pages including appendices, 9 figures; v2 minor changes, matches published versio

    Characterization and Compensation of Network-Level Anomalies in Mixed-Signal Neuromorphic Modeling Platforms

    Full text link
    Advancing the size and complexity of neural network models leads to an ever increasing demand for computational resources for their simulation. Neuromorphic devices offer a number of advantages over conventional computing architectures, such as high emulation speed or low power consumption, but this usually comes at the price of reduced configurability and precision. In this article, we investigate the consequences of several such factors that are common to neuromorphic devices, more specifically limited hardware resources, limited parameter configurability and parameter variations. Our final aim is to provide an array of methods for coping with such inevitable distortion mechanisms. As a platform for testing our proposed strategies, we use an executable system specification (ESS) of the BrainScaleS neuromorphic system, which has been designed as a universal emulation back-end for neuroscientific modeling. We address the most essential limitations of this device in detail and study their effects on three prototypical benchmark network models within a well-defined, systematic workflow. For each network model, we start by defining quantifiable functionality measures by which we then assess the effects of typical hardware-specific distortion mechanisms, both in idealized software simulations and on the ESS. For those effects that cause unacceptable deviations from the original network dynamics, we suggest generic compensation mechanisms and demonstrate their effectiveness. Both the suggested workflow and the investigated compensation mechanisms are largely back-end independent and do not require additional hardware configurability beyond the one required to emulate the benchmark networks in the first place. We hereby provide a generic methodological environment for configurable neuromorphic devices that are targeted at emulating large-scale, functional neural networks

    On the Optimality of Virtualized Security Function Placement in Multi-Tenant Data Centers

    Get PDF
    Security and service protection against cyber attacks remain among the primary challenges for virtualized, multi-tenant Data Centres (DCs), for reasons that vary from lack of resource isolation to the monolithic nature of legacy middleboxes. Although security is currently considered a property of the underlying infrastructure, diverse services require protection against different threats and at timescales which are on par with those of service deployment and elastic resource provisioning. We address the resource allocation problem of deploying customised security services over a virtualized, multi-tenant DC. We formulate the problem in Integral Linear Programming (ILP) as an instance of the NP-hard variable size variable cost bin packing problem with the objective of maximising the residual resources after allocation. We propose a modified version of the Best Fit Decreasing algorithm (BFD) to solve the problem in polynomial time and we show that BFD optimises the objective function up to 80% more than other algorithms

    Parallelizing the QUDA Library for Multi-GPU Calculations in Lattice Quantum Chromodynamics

    Full text link
    Graphics Processing Units (GPUs) are having a transformational effect on numerical lattice quantum chromodynamics (LQCD) calculations of importance in nuclear and particle physics. The QUDA library provides a package of mixed precision sparse matrix linear solvers for LQCD applications, supporting single GPUs based on NVIDIA's Compute Unified Device Architecture (CUDA). This library, interfaced to the QDP++/Chroma framework for LQCD calculations, is currently in production use on the "9g" cluster at the Jefferson Laboratory, enabling unprecedented price/performance for a range of problems in LQCD. Nevertheless, memory constraints on current GPU devices limit the problem sizes that can be tackled. In this contribution we describe the parallelization of the QUDA library onto multiple GPUs using MPI, including strategies for the overlapping of communication and computation. We report on both weak and strong scaling for up to 32 GPUs interconnected by InfiniBand, on which we sustain in excess of 4 Tflops.Comment: 11 pages, 7 figures, to appear in the Proceedings of Supercomputing 2010 (submitted April 12, 2010
    corecore