
    DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

    In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-the-art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online. Comment: Accepted by TPAMI.
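    The atrous operation the abstract describes is simple to state: insert rate-1 zeros between the taps of a kernel, which enlarges its effective field of view from k to k + (k-1)(rate-1) without adding parameters or computation per output. Below is a minimal, naive NumPy sketch of the operation itself (an illustration only, not the DeepLab implementation, which applies this inside DCNN layers):

    ```python
    import numpy as np

    def atrous_conv2d(x, kernel, rate):
        """Naive 2-D atrous (dilated) cross-correlation, 'valid' padding.

        The kernel taps are sampled from the input at strides of `rate`,
        so the field of view grows with `rate` while the number of
        multiply-accumulates per output stays fixed at kh*kw.
        """
        kh, kw = kernel.shape
        eff_h = kh + (kh - 1) * (rate - 1)   # effective (dilated) height
        eff_w = kw + (kw - 1) * (rate - 1)   # effective (dilated) width
        H, W = x.shape
        out = np.zeros((H - eff_h + 1, W - eff_w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # pick the input pixels under each (dilated) kernel tap
                patch = x[i:i + eff_h:rate, j:j + eff_w:rate]
                out[i, j] = np.sum(patch * kernel)
        return out
    ```

    With rate=1 this reduces to an ordinary valid convolution; with rate=2 a 3x3 kernel covers a 5x5 window using the same nine weights.
    
    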

    Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

    NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the relationship between theory and practice, and performance measurements. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for the space station, EOS, and the Great Observatories era.

    An Auto-tuner for Quantizing Deep Neural Networks

    Master's thesis (M.S.), Seoul National University, College of Engineering, Department of Computer Science and Engineering, February 2019. Advisor: 이재욱.With the proliferation of AI-based applications and services, there are strong demands for efficient processing of deep neural networks (DNNs). DNNs are known to be both compute- and memory-intensive as they require a tremendous amount of computation and large memory space. Quantization is a popular technique to boost efficiency of DNNs by representing a number with fewer bits, hence reducing both computational strength and memory footprint. However, it is a difficult task to find an optimal number representation for a DNN due to a combinatorial explosion in feasible number representations with varying bit widths, which is only exacerbated by layer-wise optimization. 
To address this, an automatic tuner is proposed in this work for DNN quantization. The auto-tuner efficiently finds a compact representation (type, bit width, and bias) for the numbers that minimizes the user-supplied objective function while satisfying the accuracy constraint. The evaluation, using eleven DNN models on two DNN frameworks targeting an FPGA platform and a bit-serial hardware, demonstrates over 8× (7×) reduction in the parameter size on average when up to 7% (1%) loss of relative accuracy is tolerable, with a maximum reduction of 16×, compared to the baseline using 32-bit floating-point numbers.
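The tuning loop the abstract describes can be caricatured in software: walk candidate number representations from cheapest to most expensive and keep the first one that stays inside the accuracy budget. The sketch below is a deliberately simplified assumption, not the thesis's actual algorithm: it restricts "type, bit width, and bias" to signed fixed-point with a movable binary point, and it takes a user-supplied accuracy proxy in place of real model evaluation. All names here are invented for illustration.

```python
import numpy as np

def quantize_fixed(w, bits, frac_bits):
    """Quantize to signed fixed-point: `bits` total bits, `frac_bits`
    of them fractional (the binary-point position plays the bias role)."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(w * scale), lo, hi) / scale

def tune(weights, accuracy_of, max_loss, bit_choices=range(2, 17)):
    """Return the cheapest (bits, frac_bits, quantized) within the budget.

    Bit widths are tried from narrow to wide, so the first feasible
    candidate found is also the smallest.
    """
    baseline = accuracy_of(weights)
    for bits in bit_choices:
        for frac in range(bits):          # candidate binary-point positions
            cand = quantize_fixed(weights, bits, frac)
            if baseline - accuracy_of(cand) <= max_loss:
                return bits, frac, cand
    return None                           # no representation fits the budget
```

The real tuner must of course be far smarter than this exhaustive walk, since per-layer choices multiply the search space combinatorially, which is exactly the explosion the abstract mentions.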

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before such applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.
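    The appeal of binary motion estimation is that the per-pixel subtract/absolute/accumulate of a conventional SAD collapses into XOR followed by a popcount, which is trivially cheap in hardware. The snippet below is a hypothetical software sketch of that general technique (a full-search matcher over binary frames); it illustrates the XOR cost metric only and is not the thesis's architecture, and the function names are invented for illustration.

    ```python
    import numpy as np

    def binary_sad(block, ref_block):
        """Matching cost between two binary blocks: XOR then popcount,
        replacing subtract/abs/accumulate of a conventional SAD."""
        return int(np.count_nonzero(block ^ ref_block))

    def binary_motion_search(cur, ref, top, left, size, radius):
        """Full search: the (dy, dx) displacement of the block at
        (top, left) in `cur` that minimizes the XOR cost against `ref`."""
        block = cur[top:top + size, left:left + size]
        best = (0, 0, binary_sad(block, ref[top:top + size, left:left + size]))
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = top + dy, left + dx
                # skip candidates that fall outside the reference frame
                if 0 <= y and y + size <= ref.shape[0] and \
                   0 <= x and x + size <= ref.shape[1]:
                    cost = binary_sad(block, ref[y:y + size, x:x + size])
                    if cost < best[2]:
                        best = (dy, dx, cost)
        return best
    ```

    A hardware realisation would additionally exploit the redundancy in the binary planes (runs of identical bits) to skip work, which is the energy-saving angle the abstract describes.
    
    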