
    An Effective Routability-driven Placer for Mixed-size Circuit Designs

    We propose a routability-driven analytical placer that aims to distribute pins evenly. This is accomplished by including a group of pin density constraints in its mathematical formulation. Moreover, for mixed-size circuits, we adopt a scaled smoothing method to cope with fixed macro blocks. As a result, fewer cells overlap with fixed blocks after global placement, so the optimization of the global placement solution is more accurate and the solution more closely resembles a legal placement. Routing results obtained by a commercial router show that, for most benchmark circuits, better routing quality is achieved on the placements generated by our pin-density-oriented placer.
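
    To make the idea concrete, here is a minimal Python/NumPy sketch of an analytical placement objective that adds a bin-wise pin-density penalty to a quadratic (clique-model) wirelength term. The bin grid, penalty form, and all names are illustrative assumptions; the paper imposes pin density as constraints in its mathematical formulation rather than as a penalty term as done here.

    import numpy as np

    def pin_density_penalty(x, y, pins_per_cell, grid=8, target=None):
        """Quadratic penalty on bins whose pin count exceeds the target
        density; cell coordinates are assumed normalized to [0, 1]."""
        counts = np.zeros((grid, grid))
        bx = np.clip((x * grid).astype(int), 0, grid - 1)
        by = np.clip((y * grid).astype(int), 0, grid - 1)
        np.add.at(counts, (by, bx), pins_per_cell)   # pins per bin
        if target is None:
            target = pins_per_cell.sum() / grid**2   # even distribution
        over = np.maximum(counts - target, 0.0)      # overfull bins only
        return (over**2).sum()

    def placement_objective(x, y, nets, pins_per_cell, lam=1.0):
        """Clique-model quadratic wirelength plus pin-density penalty."""
        wl = 0.0
        for net in nets:                             # net = cell indices
            cx, cy = x[net].mean(), y[net].mean()
            wl += ((x[net] - cx)**2 + (y[net] - cy)**2).sum()
        return wl + lam * pin_density_penalty(x, y, pins_per_cell)

    Minimizing such an objective with a gradient-based solver pushes pins toward an even spread while keeping connected cells close, which is the effect the pin density constraints are designed to achieve.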

    Multifocus Image Fusion Using Biogeography-Based Optimization

    For multifocus image fusion in the spatial domain, sharper blocks from different source images are selected and combined into a new image. Block size significantly affects the fusion result, and no fixed block size suits all multifocus images. In this paper, a novel multifocus image fusion algorithm using biogeography-based optimization is proposed to obtain the optimal block size. The sharper blocks of each source image are first selected by the sum-modified Laplacian and a morphological filter to construct an initial fused image. The proposed algorithm then uses the migration and mutation operations of biogeography-based optimization to search for the optimal block size according to a fitness function based on spatial frequency. A chaotic search is adopted during the iterations to improve optimization precision. The final fused image is constructed using the optimal block size. Experimental results demonstrate that the proposed algorithm achieves good quantitative and visual evaluations.
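
    A hedged sketch of the search loop in Python/NumPy: habitats are candidate block sizes, and a habitat's fitness is the spatial frequency of the image fused at that block size. The variance-based block selection stands in for the paper's sum-modified-Laplacian step, all parameter values (population size, mutation rate, size range) are illustrative, and the chaotic search refinement is omitted.

    import numpy as np

    def spatial_frequency(img):
        rf = np.sqrt(np.mean(np.diff(img, axis=1) ** 2))  # row frequency
        cf = np.sqrt(np.mean(np.diff(img, axis=0) ** 2))  # column frequency
        return np.sqrt(rf**2 + cf**2)

    def fuse_by_blocks(a, b, size):
        """Keep the sharper (here: higher-variance) block from a or b;
        a placeholder for the sum-modified-Laplacian selection."""
        out = a.copy()
        for i in range(0, a.shape[0], size):
            for j in range(0, a.shape[1], size):
                sa, sb = a[i:i+size, j:j+size], b[i:i+size, j:j+size]
                if sb.var() > sa.var():
                    out[i:i+size, j:j+size] = sb
        return out

    def bbo_block_size(a, b, pop=8, iters=20, sizes=(4, 64), rng=None):
        rng = rng or np.random.default_rng(0)
        habitats = rng.integers(sizes[0], sizes[1], pop)
        for _ in range(iters):
            fit = np.array([spatial_frequency(fuse_by_blocks(a, b, s))
                            for s in habitats])
            habitats = habitats[np.argsort(-fit)]     # best habitats first
            for k in range(1, pop):
                if rng.random() < k / pop:            # migration from better
                    habitats[k] = habitats[rng.integers(0, k)]
                if rng.random() < 0.1:                # mutation: new size
                    habitats[k] = rng.integers(sizes[0], sizes[1])
        return int(habitats[0])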

    Let's Make Block Coordinate Descent Go Fast: Faster Greedy Rules, Message-Passing, Active-Set Complexity, and Superlinear Convergence

    Block coordinate descent (BCD) methods are widely used for large-scale numerical optimization because of their cheap iteration costs, low memory requirements, amenability to parallelization, and ability to exploit problem structure. Three main algorithmic choices influence the performance of BCD methods: the block partitioning strategy, the block selection rule, and the block update rule. In this paper we explore all three of these building blocks and propose variations for each that can lead to significantly faster BCD methods. We (i) propose new greedy block-selection strategies that guarantee more progress per iteration than the Gauss-Southwell rule; (ii) explore practical issues like how to implement the new rules when using "variable" blocks; (iii) explore the use of message-passing to compute matrix or Newton updates efficiently on huge blocks for problems with a sparse dependency between variables; and (iv) consider optimal active manifold identification, which leads to bounds on the "active set complexity" of BCD methods and to superlinear convergence for certain problems with sparse solutions (and in some cases finite termination at an optimal solution). We support all of our findings with numerical results for the classic machine learning problems of least squares, logistic regression, multi-class logistic regression, label propagation, and L1-regularization.
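
    For orientation, a minimal Python/NumPy sketch of BCD with the classic greedy Gauss-Southwell rule (the baseline that the paper's new selection strategies improve on), applied to a convex quadratic f(x) = 0.5 x'Ax - b'x with exact matrix updates per block; the block partition and test problem are illustrative.

    import numpy as np

    def bcd_gauss_southwell(A, b, blocks, iters=100):
        x = np.zeros(b.shape)
        for _ in range(iters):
            grad = A @ x - b
            # Gauss-Southwell: pick the block with the largest gradient norm.
            k = max(range(len(blocks)),
                    key=lambda i: np.linalg.norm(grad[blocks[i]]))
            idx = blocks[k]
            # Exact (matrix) update: solve the block subproblem.
            x[idx] -= np.linalg.solve(A[np.ix_(idx, idx)], grad[idx])
        return x

    # Usage on a random positive-definite system with three fixed blocks.
    rng = np.random.default_rng(1)
    M = rng.standard_normal((9, 9))
    A = M @ M.T + 9 * np.eye(9)
    b = rng.standard_normal(9)
    blocks = [np.arange(0, 3), np.arange(3, 6), np.arange(6, 9)]
    x = bcd_gauss_southwell(A, b, blocks)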

    High-level synthesis optimization for blocked floating-point matrix multiplication

    In the last decade, floating-point matrix multiplication on FPGAs has been studied extensively, and efficient architectures as well as detailed performance models have been developed. By design, these IP cores take a fixed footprint, which does not necessarily make optimal use of all available resources. Moreover, the low-level architectures are not easily amenable to parameterized synthesis. In this paper, high-level synthesis is used to fine-tune the configuration parameters in order to achieve the highest performance with maximal resource utilization. An exploration strategy is presented to optimize the use of critical resources (DSPs, memory) for any given FPGA. To account for the limited memory size on the FPGA, a block-oriented matrix multiplication is organized such that the block summation is done on the CPU while the block multiplication occurs on the logic fabric simultaneously. The communication overhead between the CPU and the FPGA is minimized by streaming the blocks in a Gray code ordering scheme, which maximizes data reuse across consecutive block matrix product calculations. Using high-level synthesis optimization, the programmable logic operates at 93% of the theoretical peak performance, and the combined CPU-FPGA design achieves 76% of the available hardware processing speed for the floating-point multiplication of 2K by 2K matrices.
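
    A hedged sketch of the ordering idea in Python: traverse the (i, j, k) block-index space in a serpentine (reflected, Gray-code-like) order so that consecutive block products differ in exactly one index, meaning two of the three resident blocks can be reused between steps. The block size, the square divisible matrices, and the in-software accumulation standing in for the CPU/FPGA split are all illustrative.

    import numpy as np

    def gray_block_order(nb):
        """Serpentine traversal of the nb x nb x nb block grid:
        consecutive triples differ in exactly one coordinate."""
        order = []
        for i in range(nb):
            js = range(nb) if i % 2 == 0 else range(nb - 1, -1, -1)
            for step, j in enumerate(js):
                fwd = (i * nb + step) % 2 == 0    # alternate k direction
                ks = range(nb) if fwd else range(nb - 1, -1, -1)
                order.extend((i, j, k) for k in ks)
        return order

    def blocked_matmul(A, B, bs):
        n = A.shape[0]                    # assumes n divisible by bs
        C = np.zeros((n, n))
        for i, j, k in gray_block_order(n // bs):
            # The block product (here done in software) maps to the FPGA;
            # the running sum into C maps to the CPU-side accumulation.
            C[i*bs:(i+1)*bs, j*bs:(j+1)*bs] += (
                A[i*bs:(i+1)*bs, k*bs:(k+1)*bs]
                @ B[k*bs:(k+1)*bs, j*bs:(j+1)*bs])
        return C

    A = np.random.rand(8, 8); B = np.random.rand(8, 8)
    assert np.allclose(blocked_matmul(A, B, 2), A @ B)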