DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
In this work we address the task of semantic image segmentation with Deep
Learning and make three main contributions that are experimentally shown to
have substantial practical merit. First, we highlight convolution with
upsampled filters, or 'atrous convolution', as a powerful tool in dense
prediction tasks. Atrous convolution allows us to explicitly control the
resolution at which feature responses are computed within Deep Convolutional
Neural Networks. It also allows us to effectively enlarge the field of view of
filters to incorporate larger context without increasing the number of
parameters or the amount of computation. Second, we propose atrous spatial
pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP
probes an incoming convolutional feature layer with filters at multiple
sampling rates and effective fields-of-view, thus capturing objects as well as
image context at multiple scales. Third, we improve the localization of object
boundaries by combining methods from DCNNs and probabilistic graphical models.
The commonly deployed combination of max-pooling and downsampling in DCNNs
achieves invariance but takes a toll on localization accuracy. We overcome this
by combining the responses at the final DCNN layer with a fully connected
Conditional Random Field (CRF), which is shown both qualitatively and
quantitatively to improve localization performance. Our proposed "DeepLab"
system sets the new state-of-the-art at the PASCAL VOC-2012 semantic image
segmentation task, reaching 79.7% mIOU on the test set, and advances the
results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and
Cityscapes. All of our code is made publicly available online. Comment: Accepted by TPAMI.
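The dilation mechanism the abstract describes is easy to illustrate in one dimension: a rate-r atrous convolution leaves r-1 holes between kernel taps, so a k-tap filter covers a field of view of (k-1)*r+1 samples while keeping only k weights. The sketch below is illustrative only, not the DeepLab implementation; the function name and the ASPP-style rate sweep are our own:

```python
def atrous_conv1d(signal, kernel, rate):
    """'Convolution with holes': kernel taps are spaced `rate` apart (valid mode)."""
    k = len(kernel)
    span = (k - 1) * rate + 1  # effective field of view, with only k weights
    return [sum(kernel[j] * signal[i + j * rate] for j in range(k))
            for i in range(len(signal) - span + 1)]

# ASPP-style probing: the same 3-tap filter applied at several rates sees
# fields of view of 3, 5, and 9 samples, at no extra parameter cost.
feature = list(range(1, 11))
kernel = [1, 1, 1]
for rate in (1, 2, 4):
    print(rate, (len(kernel) - 1) * rate + 1, atrous_conv1d(feature, kernel, rate))
```

Note that the output length shrinks with the rate only because this sketch uses valid-mode convolution; in the paper the input is padded so that responses are computed densely at every position.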
Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications
NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the relation of theory to practice, and measured performance. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for the space station, EOS, and the Great Observatories era.
An Auto-tuner for Quantizing Deep Neural Networks
Thesis (Master's) -- Graduate School of Seoul National University, College of Engineering, Department of Computer Science and Engineering, February 2019.
With the proliferation of AI-based applications and services, there are strong demands for efficient processing of deep neural networks (DNNs). DNNs are known to be both compute- and memory-intensive as they require a tremendous amount of computation and large memory space. Quantization is a popular technique to boost the efficiency of DNNs by representing a number with fewer bits, hence reducing both computational strength and memory footprint. However, it is a difficult task to find an optimal number representation for a DNN due to a combinatorial explosion in feasible number representations with varying bit widths, which is only exacerbated by layer-wise optimization. To address this, an automatic tuner is proposed in this work for DNN quantization. Here, the auto-tuner can efficiently find a compact representation (type, bit width, and bias) for the number that minimizes the user-supplied objective function, while satisfying the accuracy constraint. The evaluation using eleven DNN models on two DNN frameworks targeting an FPGA platform and a bit-serial hardware demonstrates over 8× (7×) reduction in the parameter size on average when up to 7% (1%) loss of relative accuracy is tolerable, with a maximum reduction of 16×, compared to the baseline using 32-bit floating-point numbers.
Abstract i
Contents iv
List of Tables v
List of Figures vii
Chapter 1 Introduction 1
Chapter 2 Motivation 4
2.1 Redundancy in Deep Neural Networks . . . . . . . . . . . . . . 4
2.2 Optimizing Number Representations . . . . . . . . . . . . . . 6
Chapter 3 Overview 9
Chapter 4 Auto-tuner 12
4.1 Configuring the Auto-tuner . . . . . . . . . . . . . . . . . . 12
4.2 Tuning Algorithm . . . . . . . . . . . . . . . . . . . . . . . 13
Chapter 5 Evaluation 19
5.1 Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.2 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Chapter 6 Related Work 25
Chapter 7 Conclusion 27
Bibliography 28
Abstract in Korean 35
Acknowledgements 36
Energy efficient enabling technologies for semantic video processing on mobile devices
Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before such applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery-life perspective. This
thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the
human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which was designed from the outset to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and
reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing
any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.
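The arithmetic trick named above can be sketched briefly: radix-4 modified Booth recoding rewrites a two's-complement multiplier as base-4 digits drawn from {-2, ..., 2}, halving the number of partial products a multiplier array must sum. The following is a hedged sketch of the textbook scheme, not the thesis circuitry:

```python
def booth_radix4(n, bits):
    """Recode a signed `bits`-bit two's-complement integer into radix-4 Booth
    digits in {-2..2}, least significant first; sum(d * 4**i) reconstructs n."""
    if bits % 2:
        bits += 1                          # pad to an even width
    word = n & ((1 << bits) - 1)           # two's-complement bit pattern
    digits, prev = [], 0                   # implicit bit b[-1] = 0
    for i in range(0, bits, 2):
        b0 = (word >> i) & 1
        b1 = (word >> (i + 1)) & 1
        digits.append(prev + b0 - 2 * b1)  # digit = b[2i-1] + b[2i] - 2*b[2i+1]
        prev = b1
    return digits
```

Each digit selects a trivially formed multiple of the multiplicand (0, ±M, or ±2M, i.e. shifts and negations), so an 8-bit multiplier contributes four partial products instead of eight; column compressors and carry-save adders, as used in the design above, can then sum those partial products without propagating carries until the final stage.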