425 research outputs found

    Applied Metaheuristic Computing

    For decades, Applied Metaheuristic Computing (AMC) has been a prevailing optimization technique for tackling difficult engineering and business problems such as scheduling, routing, ordering, bin packing, assignment, and facility layout planning. This is partly because classic exact methods are constrained by prior assumptions, and partly because heuristics are problem-dependent and generalize poorly. AMC, by contrast, guides low-level heuristics to search beyond local optima, which limit the capability of traditional computation methods. This topic series collects quality papers proposing cutting-edge methodology and innovative applications that drive the advances of AMC.

    Development of Novel Independent Component Analysis Techniques and their Applications

    Real-world problems very often provide minimal information about their causes. This is mainly due to the complexity of the systems and to the noninvasive techniques that scientists and engineers use to study them. The signal and image processing techniques used to analyze such systems therefore tend to be blind. Earlier, techniques based on training signals were used extensively for such analyses, but these training signals are often either unavailable to the analyst or a burden on the system itself. Hence blind signal and image processing techniques are becoming predominant in modern real-time systems. Blind signal processing has become a very important topic of research and development in many areas, especially biomedical engineering, medical imaging, speech enhancement, remote sensing, communication systems, exploration seismology, geophysics, econometrics, data mining, and sensor networks. Blind signal processing has three major areas: blind signal separation and extraction, independent component analysis (ICA), and multichannel blind deconvolution and equalization. ICA has also typically been applied to the other two areas. Hence ICA research, with its wide range of applications, has been taken up as the central domain of the present work.
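    For orientation, the blind separation setting described above is commonly written as the noiseless linear mixing model below. This is the standard textbook formulation, not a formula quoted from the work; the symbols (sources s, mixing matrix A, unmixing matrix W, permutation P, diagonal scaling D) are assumed here for illustration.

```latex
\begin{align}
\mathbf{x}(t) &= \mathbf{A}\,\mathbf{s}(t), \\
\mathbf{y}(t) &= \mathbf{W}\,\mathbf{x}(t), \qquad \mathbf{W}\mathbf{A} \approx \mathbf{P}\mathbf{D}.
\end{align}
```

    Only the observations x(t) are available. ICA estimates W by making the components of y(t) as statistically independent as possible, so the sources are recovered only up to ordering and scale.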

    Multi-Objective Task Scheduling Approach for Fog Computing

    Despite the remarkable work conducted to improve the efficiency of fog computing applications, task scheduling in such environments is still a major challenge. Optimizing task scheduling in these applications (e.g., critical healthcare applications, smart cities, and transportation) is urgent in order to save energy, improve the quality of service, reduce the carbon emission rate, and improve the flow time. Treating the problem as a single-objective one, as proposed in much recent work, has not produced the desired results. This paper therefore presents a new multi-objective approach for task scheduling in fog computing environments that integrates the marine predators algorithm with a polynomial mutation mechanism (MHMPA). The proposed algorithm produces a trade-off between the makespan and the carbon emission ratio based on Pareto optimality, and an external archive stores the non-dominated solutions generated during optimization. The manuscript also investigates another improved version of the marine predators algorithm (MIMPA), which uses the Cauchy distribution instead of the Gaussian distribution together with the Lévy flight to speed up convergence while avoiding, as far as possible, getting stuck in local minima. The experimental outcomes show the superiority of MIMPA over the standard algorithm under various performance metrics. However, MIMPA could not outperform MHMPA, even after the polynomial mutation strategy was integrated with the improved version. Furthermore, several well-known robust multi-objective optimization algorithms are used to test the efficacy of the proposed method. The experimental outcomes show that MHMPA achieves better results for the employed performance metrics (flow time, carbon emission rate, energy, and makespan), with improvement percentages of 414, 27257.46, 64151, and 2 for those metrics, respectively, compared to the second-best algorithm.
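    The external archive of non-dominated solutions mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the two objectives are assumed to be (makespan, carbon emission), both minimized, and the names `dominates` and `update_archive` are hypothetical.

```python
from typing import List, Tuple

# Assumed objective vector: (makespan, carbon emission), both minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def update_archive(archive: List[Point], candidate: Point) -> List[Point]:
    """Return the archive after offering it one candidate solution."""
    if any(dominates(kept, candidate) or kept == candidate for kept in archive):
        return archive  # candidate is dominated or a duplicate, so discard it
    # Drop every archived point the candidate dominates, then add the candidate.
    return [kept for kept in archive if not dominates(candidate, kept)] + [candidate]


if __name__ == "__main__":
    archive: List[Point] = []
    for point in [(10.0, 5.0), (8.0, 7.0), (9.0, 4.0), (12.0, 3.0), (9.0, 6.0)]:
        archive = update_archive(archive, point)
    print(sorted(archive))  # [(8.0, 7.0), (9.0, 4.0), (12.0, 3.0)]
```

    The archive ends up holding only mutually non-dominated points, which is the Pareto front approximation from which a makespan versus emission trade-off can be read.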

    Quantum Nescimus: Improving the characterization of quantum systems from limited information

    We are currently approaching the point where quantum systems with 15 or more qubits will be controllable with high levels of coherence over long timescales. One of the fundamental problems that has been identified is that, as the number of qubits increases to these levels, there is currently no clear way to efficiently use the information that can be obtained from such a system to make diagnostic inferences and to enable improvements in the underlying quantum gates. Even with systems of only a few qubits, the exponential scaling in resources required by techniques such as quantum tomography or gate-set tomography will render these techniques impractical. Randomized benchmarking (RB) is a technique that will scale in a practical way with these increased system sizes. Although RB provides only a partial characterization of the quantum system, recent advances in the protocol and in the interpretation of the results of such experiments confirm that the information obtained is helpful in improving the control and verification of such processes. This thesis examines and extends the techniques of RB, including practical analysis of systems affected by low-frequency noise, extending techniques to allow the anisotropy of noise to be isolated, and showing how additional gates required for universal computation can be added to the protocol and thus benchmarked. Finally, it begins to explore the use of machine learning to aid in the ability to characterize, verify and validate noise in such systems, demonstrating by way of example how machine learning can be used to explore the edge between quantum non-locality and realism.
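    As background for the RB discussion, the commonly used zeroth-order decay model is reproduced below. It is the standard form from the RB literature rather than a result quoted from this thesis; the symbols (sequence length m, decay parameter p, constants A and B absorbing state-preparation and measurement errors, n qubits) are assumed here.

```latex
\begin{align}
\bar{F}(m) &= A\,p^{m} + B, \\
r &= \frac{(d-1)(1-p)}{d}, \qquad d = 2^{n}.
\end{align}
```

    Fitting the average survival probability to this curve yields p, and hence the average error rate r per gate, using a number of experiments that does not grow exponentially with n. This is why RB gives only a partial, but scalable, characterization.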

    Evolutionary Algorithms in Engineering Design Optimization

    Evolutionary algorithms (EAs) are population-based global optimizers that, over the last three decades, have made it possible to solve many real-world optimization problems in a straightforward way, particularly in engineering fields. Their main advantages are the following: they impose no requirements on the objective/fitness evaluation function (continuity, differentiability, convexity, etc.), and they are not limited by the presence of discrete and/or mixed variables or by the need for uncertainty quantification in the search. Moreover, they can handle more than one objective function simultaneously through evolutionary multi-objective optimization algorithms. These advantages, together with the continuously increasing capability of modern computers, have broadened their application in research and industry. From the application point of view, this Special Issue welcomes all engineering fields, such as aerospace and aeronautical, biomedical, civil, chemical and materials science, electronic and telecommunications, energy and electrical, manufacturing, logistics and transportation, mechanical, naval architecture, reliability, robotics, and structural engineering. Within the EA field, contributions that integrate innovations and improvements into algorithms for solving real-world engineering design problems in the abovementioned application fields are welcomed and encouraged, such as parallel EAs, surrogate modelling, hybridization with other optimization techniques, and multi-objective and many-objective optimization.
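    The claim that EAs need no continuity or differentiability and cope with mixed variables can be illustrated with a minimal sketch. This is a generic (mu + lambda) scheme with mutation only, not a method from the Special Issue; the objective, bounds, and parameter values are hypothetical.

```python
import random
from typing import Callable, List, Tuple

# One discrete and one continuous design variable (a mixed-variable problem).
Individual = Tuple[int, float]


def objective(ind: Individual) -> float:
    """Hypothetical discontinuous, non-differentiable cost to minimize."""
    n, x = ind
    penalty = 5.0 if n % 2 else 0.0  # jump caused by the discrete variable
    return abs(x - 1.5) + abs(n - 4) + penalty


def mutate(ind: Individual, rng: random.Random) -> Individual:
    """Perturb both variables and clamp them to their assumed bounds."""
    n, x = ind
    n = min(10, max(0, n + rng.choice([-1, 0, 1])))
    x = min(5.0, max(-5.0, x + rng.gauss(0.0, 0.3)))
    return (n, x)


def evolve(
    fitness: Callable[[Individual], float],
    pop_size: int = 20,
    generations: int = 60,
    seed: int = 0,
) -> Individual:
    """Run a mutation-only (mu + lambda) EA and return the best individual."""
    rng = random.Random(seed)
    population: List[Individual] = [
        (rng.randint(0, 10), rng.uniform(-5.0, 5.0)) for _ in range(pop_size)
    ]
    for _ in range(generations):
        offspring = [mutate(rng.choice(population), rng) for _ in range(pop_size)]
        # Parents compete with offspring, so the best individual is never lost.
        population = sorted(population + offspring, key=fitness)[:pop_size]
    return min(population, key=fitness)


if __name__ == "__main__":
    best = evolve(objective)
    print(best, objective(best))
```

    The algorithm only ever calls `fitness` to rank candidates, which is why jumps and kinks in the objective cause no difficulty.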

    Hardware-Aware Software Optimization Techniques for Multiple Convolutional Neural Networks on Embedded Systems

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2021. Soonhoi Ha.
Executing deep learning algorithms on mobile embedded devices is challenging because embedded devices usually have tight constraints on computational power, memory size, and energy consumption, while the resource requirements of deep learning algorithms achieving high accuracy continue to increase. To cope with increasing computation complexity, it is common to use an energy-efficient accelerator, such as a mobile GPU or digital signal processor (DSP) array, or to develop a customized neural processor chip called a neural processing unit (NPU). In the application domain, many optimization techniques have been proposed that change the application algorithm in order to reduce the amount of computation and memory usage, either by developing new deep learning networks or by using software optimization techniques that take advantage of the statistical nature of deep learning algorithms. Another approach is hardware-aware software optimization, which finds the performance bottleneck first and then distributes the workload evenly by scheduling the workloads. This dissertation covers hardware-aware software optimization, which is based on a hardware processor or platform. First, we devise a systematic optimization methodology through the experience of participating in the Low Power Image Recognition Challenge (LPIRC) and build a deep learning framework called C-GOOD (C-code Generation Framework for Optimized On-device Deep Learning) based on the devised methodology. For hardware independence, C-GOOD generates C code that can be compiled for and run on any embedded device. C-GOOD also provides various options for application domain optimization that can be applied according to the devised methodology.
By applying the devised methodology to three hardware platforms, NVIDIA Jetson TX2, Odroid XU4, and the Samsung Reconfigurable Processor (SRP), we demonstrate that the methodology is independent of the hardware platform and that application domain optimizations can be performed easily with C-GOOD. Embedded devices are increasingly equipped with heterogeneous processing elements (PEs), and at the same time there is a growing need to run multiple deep learning applications concurrently on embedded systems such as self-driving cars and smartphones. For those systems, we devise an end-to-end methodology to schedule deep learning applications onto heterogeneous PEs and implement a scheduling framework according to the methodology. It covers every step from profiling on real embedded devices to verifying the schedule results on the devices. In this methodology, we use a genetic algorithm (GA)-based scheduling technique for scheduling deep learning applications onto heterogeneous PEs and consider several practical issues in the profile step, such as DVFS and CPU hot-plug. Furthermore, we schedule multiple applications with different throughput constraints, considering the schedulability of the tasks mapped to each processor. After implementing a deep learning inference engine that can utilize heterogeneous PEs using a low-level library of the ARM Compute Library (ACL), we verify the devised methodology by running two widely used convolutional neural networks (CNNs) on a Galaxy S9 smartphone and a Hikey970 board. While the previous optimization methods focus on computation and processing elements, the performance bottleneck of deep learning accelerators is the communication between off-chip and on-chip memory. Moreover, the off-chip DRAM access volume has a significant effect on the energy consumption of an NPU.
To reduce the impact of off-chip DRAM access on the performance and energy of an NPU, we devise compiler techniques for an NPU to manage multi-bank on-chip memory with two different objectives: one is to minimize the off-chip memory access volume, and the other is to minimize the processing delay caused by unhidden DRAM accesses. The main idea is that, by organizing on-chip memory into multiple banks, we can hide the off-chip DRAM access delay by prefetching data into unused banks during computation, and reduce the off-chip DRAM access volume by keeping the output feature map data of each layer in on-chip memory. By running five CNN benchmarks on a cycle-level NPU simulator, we demonstrate the trade-off between the two objectives. The devised multi-bank on-chip memory management (MOMM) techniques are extended to consider layer fusion, which aims to maximize the reuse of feature maps between layers. Since the pure layer fusion technique incurs extra computation overhead and increases DRAM access for filter weights, a hybrid fusion technique is presented that lies between the per-layer processing technique and the pure layer fusion technique, based on the devised MOMM techniques with two different objectives.
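The latency-hiding idea in this paragraph can be made concrete with a toy timing model. This is only a sketch under strong assumptions, not the thesis's MOMM algorithm: each layer is reduced to a hypothetical (DRAM load time, compute time) pair, a free bank is assumed to be always available for the next layer's input, and the function names are invented for illustration.

```python
from typing import List, Tuple

# Assumed per-layer cost: (DRAM load time, compute time), arbitrary units.
Layer = Tuple[float, float]


def serial_latency(layers: List[Layer]) -> float:
    """Single bank: every load must finish before its layer can compute."""
    return sum(load + compute for load, compute in layers)


def prefetch_latency(layers: List[Layer]) -> float:
    """Multiple banks: the next layer's load overlaps the current compute."""
    if not layers:
        return 0.0
    total = layers[0][0]  # the first load has nothing to hide behind
    for i, (_, compute) in enumerate(layers):
        next_load = layers[i + 1][0] if i + 1 < len(layers) else 0.0
        # The step ends when both the compute and the prefetch are done.
        total += max(compute, next_load)
    return total


if __name__ == "__main__":
    layers = [(2.0, 4.0), (3.0, 1.0), (1.0, 5.0)]
    print(serial_latency(layers))    # 16.0
    print(prefetch_latency(layers))  # 12.0
```

In this example the total compute time is 10, so prefetching leaves only 2 units of unhidden DRAM delay, compared with 6 in the serial case. Keeping a layer's output on chip would correspond to setting the next layer's load time to zero in this model.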
Experiment results confirm the superiority of the hybrid fusion technique to the per-layer processing technique and the pure layer fusion technique.
Table of contents:
Abstract
Contents
List of Figures
List of Tables
List of Algorithms
Chapter 1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Dissertation Organization
Chapter 2 Background
  2.1 Target Hardware
    2.1.1 Commodity Hardware Platform
    2.1.2 Application-specific Hardware Accelerator
  2.2 Convolutional Neural Network
    2.2.1 Convolution
    2.2.2 Optimization Methods for Convolutional Neural Network
Chapter 3 Optimization for a Commodity Hardware Platform
  3.1 Joint Optimization Method of Multiple Objectives
    3.1.1 Hardware Platform
    3.1.2 Deep Neural Network and Software Framework
    3.1.3 Software Optimization Techniques
  3.2 C-code Generation Framework for Optimized On-device Deep Learning
    3.2.1 C-GOOD Framework
    3.2.2 Experiments
  3.3 Scheduling Deep Learning Applications Onto Heterogeneous Processors
    3.3.1 Search Space Size
    3.3.2 Hardware Platform and System Model
    3.3.3 Proposed Scheduling Framework and Profiling
    3.3.4 Scheduling a Single Deep Learning Application
    3.3.5 Scheduling Multiple Deep Learning Applications
    3.3.6 Verification with Real Hardware Platforms
  3.4 Related Work
    3.4.1 Deep Learning Framework
    3.4.2 Deep Learning Compiler
    3.4.3 Scheduling Deep Learning Application
    3.4.4 Scheduling Multiple Applications on Heterogeneous Processors
Chapter 4 Optimization for an Application-specific Hardware Accelerator
  4.1 Multi-Bank On-chip Memory Management Problem
    4.1.1 Main Idea
    4.1.2 Assumed Dataflow
    4.1.3 Multi-bank On-chip Memory Management Problem
  4.2 Proposed Multi-bank On-chip Memory Management Techniques
    4.2.1 DRAM-first Storing Policy
    4.2.2 DRAM Access Minimization Policy (MIN policy)
    4.2.3 DRAM Access Hiding Policy (HIDE policy)
    4.2.4 Multiple Path Consideration
  4.3 Layer Fusion Technique
    4.3.1 Layer Fusion Technique
    4.3.2 Hybrid Fusion Technique
  4.4 Experiments
    4.4.1 Setup
    4.4.2 Performance Comparison of MOMM Techniques
    4.4.3 Multiple Path
    4.4.4 Design Space Exploration of NPU Architecture
    4.4.5 Hybrid Fusion Technique
  4.5 Related Work
Chapter 5 Conclusion
Bibliography
Appendix
  A Proposed Multi-bank On-chip Memory Management Algorithm
    A.1 Multi-bank On-chip Memory (MOM) Manager
    A.2 MIN policy
    A.3 HIDE policy
Abstract (in Korean)