218 research outputs found

    Efficient Neural Network Compression

    Get PDF
    Network compression reduces the computational complexity and memory consumption of deep neural networks by reducing the number of parameters. In SVD-based network compression, the right rank needs to be decided for every layer of the network. In this paper, we propose an efficient method for obtaining the rank configuration of the whole network. Unlike previous methods which consider each layer separately, our method considers the whole network to choose the right rank configuration. We propose novel accuracy metrics to represent the accuracy and complexity relationship for a given neural network. We use these metrics in a non-iterative fashion to obtain the right rank configuration which satisfies the constraints on FLOPs and memory while maintaining sufficient accuracy. Experiments show that our method provides better compromise between accuracy and computational complexity/memory consumption while performing compression at much higher speed. For VGG-16 our network can reduce the FLOPs by 25% and improve accuracy by 0.7% compared to the baseline, while requiring only 3 minutes on a CPU to search for the right rank configuration. Previously, similar results were achieved in 4 hours with 8 GPUs. The proposed method can be used for lossless compression of a neural network as well. The better accuracy and complexity compromise, as well as the extremely fast speed of our method makes it suitable for neural network compression

    Education Policy and Industrial Development: The Cases of Korea and Mexico

    Get PDF
    After many scholars studies, it has been suggested that among several facts of economic growth, Koreas relatively intensive investment in education made its fast economic growth possible. This study started from the question of whether large education expenditure automatically leads to a fast economic growth. We suggest that the expenditure must be allocated to the education level that is in accordance with the industrial policy, which in turn must consist with the countrys economic development stage. In Korea, the education sector supplied workers with adequate level of education that was required in each stage of development, whereas in Mexico, the supply of workers by education level was mismatched with the demand for labor derived from the industrial structure at each development stage. We conclude that not only the size of the expenditure but also its efficient use is important to guarantee the positive effects of education expenditure on economic growth

    A Router for Symmetrical FPGAs based on Exact Routing Density Evaluation

    Get PDF
    Abstract This paper presents a new performance and routability driven routing algorithm for symmetrical array based field-programmable gate arrays (FPGAs). A key contribution of our work is to overcome one essential limitation of the previous routing algorithms: inaccurate estimations of routing density which were too general for symmetrical FPGAs. To this end, we derive an exact routing density calculation that is based on a precise analysis of the structure (switch block) of symmetrical FPGAs, and utilize it consistently in global and detailed routings. With an introduction of the proposed accurate routing metrics, we design a new routing algorithm called a cost-effective net-decomposition based routing which is fast, and yet produces remarkable routing results in terms of both routability and path/net delays. We performed an extensive experiment to show the effectiveness of our algorithm based on the proposed cost metrics

    Temperature-Aware Runtime Power Management for Chip-Multiprocessors with 3-D Stacked Cache

    Get PDF
    The advent of 3-D fabrication technology makes it possible to stack a large amount of last-level cache memory onto a multi-core die to reduce off-chip memory accesses and, thus, increases system performance. However, the higher power density (i.e., power dissipation per unit volume) of 3-D integrated circuits (ICs) might incur temperature-related problems in reliability, leakage power, system performance, and cooling cost. In this paper, we propose a runtime solution to maximize the performance (i.e., instruction throughput) of chip-multiprocessors with 3-D stacked last-level cache memory, without thermal-constraint violation. The proposed method combines runtime cache tuning (e.g., cache-way partitioning, cache-way power-gating, cache data placement) with per-core dynamic voltage/frequency scaling (DVFS) in a temperature-aware manner. Experimental results show that the integrated method offers 23% performance improvement on average in terms of instructions per second (IPS) compared with temperature-aware runtime cache tuning only

    Runtime 3-D Stacked Cache Management for Chip-Multiprocessors

    Get PDF
    These-dimensional (3-D) memory stacking is one of the most promising solutions to memory bandwidth problems in chip multiprocessors. In this work, we propose an efficient runtime 3-D cache management technique which takes advantage of the lower latencies through vertical interconnect as well as the runtime memory demand of applications which varies dynamically with time. Experimental results show that the proposed method offers performance improvement by up to 26.7% and on average 13.1% compared with the private cache organization
    corecore