Efficient Implementation of a Synchronous Parallel Push-Relabel Algorithm
Motivated by the observation that FIFO-based push-relabel algorithms are able
to outperform highest label-based variants on modern, large maximum flow
problem instances, we introduce an efficient implementation of the algorithm
that uses coarse-grained parallelism to avoid the problems of existing parallel
approaches. We demonstrate good relative and absolute speedups of our algorithm
on a set of large graph instances taken from real-world applications. On a
modern 40-core machine, our parallel implementation outperforms existing
sequential implementations by up to a factor of 12 and other parallel
implementations by factors of up to 3.
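For orientation, the FIFO selection rule mentioned in the abstract can be sketched as a plain sequential routine. This is a textbook-style illustration only, not the synchronous parallel implementation the paper describes; the function name `max_flow_fifo_push_relabel` and the `(u, v, capacity)` edge-list input are assumptions made for this example.

```python
from collections import deque


def max_flow_fifo_push_relabel(n, edges, s, t):
    """Sequential FIFO push-relabel (illustrative sketch, not the paper's code).

    n     -- number of vertices, labelled 0..n-1
    edges -- iterable of (u, v, capacity) with non-negative integer capacities
    s, t  -- source and sink vertices
    """
    # Residual graph: edge i and edge i ^ 1 form a forward/reverse pair.
    to, cap, adj = [], [], [[] for _ in range(n)]
    for u, v, c in edges:
        if u == v:
            continue  # self-loops never carry useful flow
        adj[u].append(len(to)); to.append(v); cap.append(c)
        adj[v].append(len(to)); to.append(u); cap.append(0)

    height = [0] * n
    excess = [0] * n
    height[s] = n
    active = deque()  # FIFO queue of vertices with positive excess

    # Initial preflow: saturate every edge leaving the source.
    for e in adj[s]:
        c, v = cap[e], to[e]
        if c > 0:
            cap[e] -= c
            cap[e ^ 1] += c
            excess[s] -= c
            if excess[v] == 0 and v != t:
                active.append(v)
            excess[v] += c

    while active:
        u = active.popleft()
        # Discharge u: push along admissible edges, relabel when stuck.
        while excess[u] > 0:
            for e in adj[u]:
                v = to[e]
                if cap[e] > 0 and height[u] == height[v] + 1:
                    d = min(excess[u], cap[e])
                    cap[e] -= d
                    cap[e ^ 1] += d
                    excess[u] -= d
                    if excess[v] == 0 and v != s and v != t:
                        active.append(v)  # v just became active
                    excess[v] += d
                    if excess[u] == 0:
                        break
            if excess[u] > 0:
                # Relabel: lift u just above its lowest residual neighbour.
                height[u] = 1 + min(height[to[e]] for e in adj[u] if cap[e] > 0)

    return excess[t]
```

Vertices are processed strictly in the order they became active, which is what distinguishes the FIFO variant from highest-label selection. A coarse-grained parallel version would process many active vertices per round, and that is where the paper's contribution lies.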
A Method for On-Device Personalization of Deep Neural Networks
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2019. 2. Egger, Bernhard.
There exist several deep neural network (DNN) architectures suitable for embedded inference; however, little work has focused on training neural networks on-device.
User customization of DNNs is desirable due to the difficulty of collecting a training set representative of real world scenarios.
Additionally, inter-user variation means that a general model has a limitation on its achievable accuracy.
In this thesis, a DNN architecture that allows for low power on-device user customization is proposed.
This approach is applied to handwritten character recognition of both the Latin and the Korean alphabets.
Experiments show a 3.5-fold reduction of the prediction error after user customization for both alphabets compared to a DNN trained with general data.
This architecture is additionally evaluated on a number of embedded processors, demonstrating its practical application.
Abstract
Contents
List of Figures
List of Tables
Chapter 1 Introduction
Chapter 2 Motivation
Chapter 3 Background
3.1 Deep Neural Networks
3.1.1 Inference
3.1.2 Training
3.2 Convolutional Neural Networks
3.3 On-Device Acceleration
3.3.1 Hardware Accelerators
3.3.2 Software Optimization
Chapter 4 Methodology
4.1 Initialization
4.2 On-Device Training
Chapter 5 Implementation
5.1 Pre-processing
5.2 Latin Handwritten Character Recognition
5.2.1 Dataset and BIE Selection
5.2.2 AE Design
5.3 Korean Handwritten Character Recognition
5.3.1 Dataset and BIE Selection
5.3.2 AE Design
Chapter 6 On-Device Acceleration
6.1 Architecture Optimizations
6.2 Compiler Optimizations
Chapter 7 Experimental Setup
Chapter 8 Evaluation
8.1 Latin Handwritten Character Recognition
8.2 Korean Handwritten Character Recognition
8.3 On-Device Acceleration
Chapter 9 Related Work
Chapter 10 Conclusion
Bibliography
Abstract (in Korean)
Acknowledgements
Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review
Technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz to the GHz range. This decline in size, together with the boost in performance, consequently demands a lower supply voltage and effective power dissipation in chips with millions of transistors. This has triggered a substantial amount of research into power reduction techniques for almost every aspect of the chip, particularly the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency mainly at the processor core level, but also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, cache, and pipelining, can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Emerging power-efficient processor architectures are also surveyed, and research activities are discussed that should help the reader identify how these factors contribute to a processor's power consumption. Some of these concepts are already established, whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
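As background for why supply voltage and clock frequency are the parameters singled out above, the standard first-order model of dynamic (switching) power in CMOS logic is given here. It is general textbook material rather than a formula quoted from the paper, and the symbols are chosen for this note: $\alpha$ is the activity factor, $C$ the switched capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency.

```latex
P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f
```

Because the supply voltage enters quadratically, halving $V_{DD}$ at a fixed frequency cuts dynamic power to roughly one quarter under this model, whereas halving $f$ alone only halves it.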
Inviwo -- A Visualization System with Usage Abstraction Levels
The complexity of today's visualization applications demands specific
visualization systems tailored for the development of these applications.
Frequently, such systems utilize levels of abstraction to improve the
application development process, for instance by providing a data flow network
editor. Unfortunately, these abstractions result in several issues, which need
to be circumvented through an abstraction-centered system design. Often, a high
level of abstraction hides low level details, which makes it difficult to
directly access the underlying computing platform, which would be important to
achieve an optimal performance. Therefore, we propose a layer structure
developed for modern and sustainable visualization systems allowing developers
to interact with all contained abstraction levels. We refer to these interaction
capabilities as usage abstraction levels, since we target application
developers with various levels of experience. We formulate the requirements for
such a system, derive the desired architecture, and present how the concepts
have been realized, as an example, within the Inviwo visualization system.
Furthermore, we address several specific challenges that arise during the
realization of such a layered architecture, such as communication between
different computing platforms, performance centered encapsulation, as well as
layer-independent development by supporting cross layer documentation and
debugging capabilities.