
    Efficient Implementation of a Synchronous Parallel Push-Relabel Algorithm

    Motivated by the observation that FIFO-based push-relabel algorithms are able to outperform highest-label-based variants on modern, large maximum flow problem instances, we introduce an efficient implementation of the algorithm that uses coarse-grained parallelism to avoid the problems of existing parallel approaches. We demonstrate good relative and absolute speedups of our algorithm on a set of large graph instances taken from real-world applications. On a modern 40-core machine, our parallel implementation outperforms existing sequential implementations by up to a factor of 12 and other parallel implementations by factors of up to 3.
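    The abstract builds on the FIFO variant of the push-relabel scheme; for orientation, the sketch below is a plain sequential FIFO-based push-relabel maximum-flow routine, not the paper's coarse-grained parallel implementation. The dict-of-dicts graph encoding and the toy instance are illustrative assumptions.

```python
# Minimal sequential sketch of FIFO-based push-relabel maximum flow.
# It shows the active-node FIFO queue plus the push and relabel operations only;
# the paper's coarse-grained parallel implementation is not reproduced here.
from collections import deque

def max_flow_push_relabel(capacity, source, sink):
    """capacity: dict-of-dicts {u: {v: cap}}; returns the maximum flow value."""
    nodes = set(capacity)
    for u in capacity:
        nodes.update(capacity[u])
    n = len(nodes)

    # Residual capacities (forward edges plus zero-capacity reverse edges).
    residual = {u: dict(capacity.get(u, {})) for u in nodes}
    for u in nodes:
        for v in capacity.get(u, {}):
            residual[v].setdefault(u, 0)

    height = {u: 0 for u in nodes}
    excess = {u: 0 for u in nodes}
    height[source] = n
    active = deque()                       # FIFO queue of nodes with excess flow

    # Initialization: saturate every edge leaving the source.
    for v, cap in list(residual[source].items()):
        if cap > 0:
            residual[source][v] -= cap
            residual[v][source] += cap
            excess[v] += cap
            if v not in (source, sink):
                active.append(v)

    while active:
        u = active.popleft()
        while excess[u] > 0:               # discharge u completely
            pushed = False
            for v, cap in residual[u].items():
                if cap > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], cap)          # push
                    residual[u][v] -= delta
                    residual[v][u] += delta
                    excess[u] -= delta
                    excess[v] += delta
                    if v not in (source, sink) and excess[v] == delta:
                        active.append(v)                 # v just became active
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:                 # relabel: lift u above its lowest neighbor
                reachable = [height[v] for v, cap in residual[u].items() if cap > 0]
                if not reachable:
                    break
                height[u] = min(reachable) + 1
    return excess[sink]

# Toy instance: the maximum flow from 's' to 't' is 4.
cap = {'s': {'a': 3, 'b': 2}, 'a': {'t': 2, 'b': 1}, 'b': {'t': 2}, 't': {}}
print(max_flow_push_relabel(cap, 's', 't'))  # -> 4
```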

    A Method for On-Device Personalization of Deep Neural Networks

    Thesis (Master's), Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2019. Advisor: Bernhard Egger. There exist several deep neural network (DNN) architectures suitable for embedded inference; however, little work has focused on training neural networks on-device. User customization of DNNs is desirable because collecting a training set representative of real-world scenarios is difficult, and inter-user variation means that a general model is limited in the accuracy it can achieve. In this thesis, a DNN architecture that allows for low-power on-device user customization is proposed. The approach is applied to handwritten character recognition of both the Latin and the Korean alphabets. Experiments show a 3.5-fold reduction of the prediction error after user customization for both alphabets, compared to a DNN trained with general data. The architecture is additionally evaluated on a number of embedded processors, demonstrating its practical applicability.
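    The abstract does not spell out the training mechanics, so the following is only a generic sketch of one common way to enable low-power on-device customization: keep a pre-trained feature extractor frozen and fine-tune a small classifier head on the user's samples. The PyTorch model, layer sizes, and data shapes are illustrative assumptions and do not reproduce the BIE/AE architecture evaluated in the thesis.

```python
# Hedged sketch: freeze a pre-trained feature extractor and train only a small
# per-user head on-device. This is a generic illustration, not the thesis's
# specific architecture; all layer sizes and shapes are made-up assumptions.
import torch
import torch.nn as nn

# Pre-trained feature extractor (weights would normally be loaded from storage).
features = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
)
head = nn.Linear(32 * 7 * 7, 26)   # small per-user classifier (e.g. Latin letters)

# Freeze the extractor so on-device training updates only the head.
for p in features.parameters():
    p.requires_grad_(False)

optimizer = torch.optim.SGD(head.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

def customize(user_images, user_labels, epochs=5):
    """user_images: (N, 1, 28, 28) tensor of the user's handwriting samples."""
    for _ in range(epochs):
        optimizer.zero_grad()
        with torch.no_grad():               # extractor runs inference-only
            z = features(user_images)
        loss = loss_fn(head(z), user_labels)
        loss.backward()                     # gradients flow only through the head
        optimizer.step()

# Toy usage with random tensors standing in for the user's handwriting.
customize(torch.randn(8, 1, 28, 28), torch.randint(0, 26, (8,)))
```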

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Technological evolution has significantly increased the number of transistors on a given die area and raised switching speeds from a few MHz into the GHz range. Shrinking feature sizes combined with this boost in performance demand lower supply voltages and effective power dissipation in chips with millions of transistors, which has triggered a substantial amount of research into power reduction techniques for almost every aspect of the chip, and for the processor cores in particular. This paper presents an overview of techniques for achieving power efficiency mainly at the processor core level, but also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, caches, and pipelining, can be optimized to reduce the power consumption of the processor, and this paper discusses ways in which these parameters can be optimized. Emerging power-efficient processor architectures are also overviewed and ongoing research activities are discussed, which should help the reader identify how these factors in a processor contribute to its power consumption. Some of these concepts are already established, whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
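    To make concrete why supply voltage and clock frequency dominate this discussion, the snippet below evaluates the standard first-order dynamic-power model P ≈ a·C·V²·f at two hypothetical operating points; the activity factor, capacitance, and voltage/frequency values are made-up numbers for illustration only.

```python
# First-order illustration of the dynamic CMOS power model P = a * C * V^2 * f.
# All operating-point numbers below are invented for the sake of the example.
def dynamic_power(activity, capacitance_f, voltage_v, freq_hz):
    """Switching power of a CMOS circuit under the standard a*C*V^2*f model."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

nominal = dynamic_power(activity=0.2, capacitance_f=1e-9, voltage_v=1.2, freq_hz=2.0e9)
scaled  = dynamic_power(activity=0.2, capacitance_f=1e-9, voltage_v=0.9, freq_hz=1.0e9)

# Halving the clock while also lowering the voltage cuts power far more than 2x,
# which is why DVFS is a central technique in the surveyed literature.
print(f"nominal: {nominal:.3f} W, DVFS point: {scaled:.3f} W "
      f"({scaled / nominal:.0%} of nominal power at 50% clock)")
```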

    Inviwo -- A Visualization System with Usage Abstraction Levels

    The complexity of today's visualization applications demands specific visualization systems tailored for the development of these applications. Frequently, such systems utilize levels of abstraction to improve the application development process, for instance by providing a data flow network editor. Unfortunately, these abstractions result in several issues that need to be circumvented through an abstraction-centered system design. A high level of abstraction often hides low-level details, making it difficult to directly access the underlying computing platform, which is important for achieving optimal performance. Therefore, we propose a layer structure developed for modern and sustainable visualization systems that allows developers to interact with all contained abstraction levels. We refer to these interaction capabilities as usage abstraction levels, since we target application developers with various levels of experience. We formulate the requirements for such a system, derive the desired architecture, and present how the concepts have been realized, by way of example, within the Inviwo visualization system. Furthermore, we address several specific challenges that arise during the realization of such a layered architecture, such as communication between different computing platforms, performance-centered encapsulation, and layer-independent development supported by cross-layer documentation and debugging capabilities.
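    To make the data flow network abstraction mentioned above concrete, the sketch below models processors with named inputs that are evaluated in dependency order. It captures the general idea only; the class names and pull-based evaluation are assumptions and this is not Inviwo's actual API.

```python
# Generic sketch of a data-flow network: processors are wired by naming their
# upstream dependencies, and results are pulled through the graph on demand.
# Illustration only; not Inviwo's real processor/port API.
class Processor:
    def __init__(self, name, func, inputs=()):
        self.name, self.func, self.inputs = name, func, list(inputs)

class Network:
    def __init__(self):
        self.processors = {}

    def add(self, name, func, inputs=()):
        self.processors[name] = Processor(name, func, inputs)

    def evaluate(self, name, cache=None):
        """Pull-based evaluation: resolve upstream processors first, memoized."""
        cache = {} if cache is None else cache
        if name not in cache:
            proc = self.processors[name]
            args = [self.evaluate(dep, cache) for dep in proc.inputs]
            cache[name] = proc.func(*args)
        return cache[name]

# Toy pipeline: source -> filter -> renderer, analogous to a visualization chain.
net = Network()
net.add("VolumeSource", lambda: [3, 1, 4, 1, 5])
net.add("Threshold", lambda data: [v for v in data if v >= 3], inputs=["VolumeSource"])
net.add("Renderer", lambda data: f"rendered {len(data)} voxels", inputs=["Threshold"])
print(net.evaluate("Renderer"))   # -> rendered 3 voxels
```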