Efficient Neural Network Compression
Network compression reduces the computational complexity and memory
consumption of deep neural networks by reducing the number of parameters. In
SVD-based network compression, the right rank needs to be decided for every
layer of the network. In this paper, we propose an efficient method for
obtaining the rank configuration of the whole network. Unlike previous methods
which consider each layer separately, our method considers the whole network to
choose the right rank configuration. We propose novel accuracy metrics to
represent the accuracy and complexity relationship for a given neural network.
We use these metrics in a non-iterative fashion to obtain the right rank
configuration which satisfies the constraints on FLOPs and memory while
maintaining sufficient accuracy. Experiments show that our method provides a
better compromise between accuracy and computational complexity/memory
consumption while performing compression at much higher speed. For VGG-16 our
network can reduce the FLOPs by 25% and improve accuracy by 0.7% compared to
the baseline, while requiring only 3 minutes on a CPU to search for the right
rank configuration. Previously, similar results were achieved in 4 hours with 8
GPUs. The proposed method can be used for lossless compression of a neural
network as well. The better accuracy and complexity compromise, as well as the
extremely fast speed of our method makes it suitable for neural network
compression.
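The trade-off that a rank configuration controls can be illustrated by simple counting. The sketch below is not the search method proposed in the abstract; it only assumes the standard fact that replacing a dense m x n weight matrix with two factors of shapes m x r and r x n changes the per-layer cost from m*n to r*(m+n). The layer shapes, the ranks, and all function names are hypothetical and chosen for illustration only.

```python
def dense_cost(m, n):
    """Weights (and multiply-accumulates per input) of a dense m x n layer."""
    return m * n


def low_rank_cost(m, n, rank):
    """Cost after factoring the layer into an m x rank and a rank x n matrix."""
    return rank * (m + n)


def break_even_rank(m, n):
    """Largest rank for which the factored layer is strictly cheaper than the dense one."""
    return (m * n - 1) // (m + n)


def config_cost(shapes, ranks):
    """Total cost of a whole-network rank configuration (one rank per layer)."""
    return sum(low_rank_cost(m, n, r) for (m, n), r in zip(shapes, ranks))


# Hypothetical three-layer network and one candidate rank configuration.
shapes = [(512, 256), (256, 256), (256, 10)]
ranks = [64, 48, 8]

baseline = sum(dense_cost(m, n) for m, n in shapes)
compressed = config_cost(shapes, ranks)
saving = 1.0 - compressed / baseline

print(f"baseline cost:   {baseline}")
print(f"compressed cost: {compressed}")
print(f"relative saving: {saving:.1%}")
```

Because the total cost depends on every layer's rank at once, a budget on FLOPs or memory constrains the configuration as a whole, which is why choosing ranks layer by layer can miss better combinations.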
Education Policy and Industrial Development: The Cases of Korea and Mexico
Many scholars have suggested that, among the several factors behind economic growth, Korea's relatively intensive investment in education made its fast economic growth possible. This study starts from the question of whether large education expenditure automatically leads to fast economic growth. We suggest that the expenditure must be allocated to the education level that accords with the industrial policy, which in turn must be consistent with the country's stage of economic development. In Korea, the education sector supplied workers with the level of education required at each stage of development, whereas in Mexico, the supply of workers by education level was mismatched with the demand for labor derived from the industrial structure at each development stage. We conclude that not only the size of the expenditure but also its efficient use is important for securing the positive effects of education expenditure on economic growth.
A Router for Symmetrical FPGAs based on Exact Routing Density Evaluation
This paper presents a new performance- and routability-driven routing algorithm for symmetrical-array-based field-programmable gate arrays (FPGAs). A key contribution of our work is to overcome one essential limitation of previous routing algorithms: inaccurate estimations of routing density, which were too general for symmetrical FPGAs. To this end, we derive an exact routing density calculation that is based on a precise analysis of the structure (switch block) of symmetrical FPGAs, and use it consistently in global and detailed routing. Using the proposed accurate routing metrics, we design a new routing algorithm, called cost-effective net-decomposition based routing, which is fast and yet produces remarkable routing results in terms of both routability and path/net delays. We performed extensive experiments to show the effectiveness of our algorithm based on the proposed cost metrics.
Temperature-Aware Runtime Power Management for Chip-Multiprocessors with 3-D Stacked Cache
The advent of 3-D fabrication technology makes it possible to stack a large amount of last-level cache memory onto a multi-core die to reduce off-chip memory accesses and thus increase system performance. However, the higher power density (i.e., power dissipation per unit volume) of 3-D integrated circuits (ICs) might incur temperature-related problems in reliability, leakage power, system performance, and cooling cost. In this paper, we propose a runtime solution to maximize the performance (i.e., instruction throughput) of chip-multiprocessors with 3-D stacked last-level cache memory without violating thermal constraints. The proposed method combines runtime cache tuning (e.g., cache-way partitioning, cache-way power-gating, cache data placement) with per-core dynamic voltage/frequency scaling (DVFS) in a temperature-aware manner. Experimental results show that the integrated method offers a 23% performance improvement on average in terms of instructions per second (IPS) compared with temperature-aware runtime cache tuning only.
Runtime 3-D Stacked Cache Management for Chip-Multiprocessors
Three-dimensional (3-D) memory stacking is one of the most promising solutions to memory bandwidth problems in chip multiprocessors. In this work, we propose an efficient runtime 3-D cache management technique that takes advantage of the lower latencies of vertical interconnects as well as the runtime memory demand of applications, which varies dynamically with time. Experimental results show that the proposed method improves performance by up to 26.7%, and by 13.1% on average, compared with the private cache organization.