
    DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs

    In this work we address the task of semantic image segmentation with Deep Learning and make three main contributions that are experimentally shown to have substantial practical merit. First, we highlight convolution with upsampled filters, or 'atrous convolution', as a powerful tool in dense prediction tasks. Atrous convolution allows us to explicitly control the resolution at which feature responses are computed within Deep Convolutional Neural Networks. It also allows us to effectively enlarge the field of view of filters to incorporate larger context without increasing the number of parameters or the amount of computation. Second, we propose atrous spatial pyramid pooling (ASPP) to robustly segment objects at multiple scales. ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-view, thus capturing objects as well as image context at multiple scales. Third, we improve the localization of object boundaries by combining methods from DCNNs and probabilistic graphical models. The commonly deployed combination of max-pooling and downsampling in DCNNs achieves invariance but takes a toll on localization accuracy. We overcome this by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF), which is shown both qualitatively and quantitatively to improve localization performance. Our proposed "DeepLab" system sets the new state-of-the-art on the PASCAL VOC-2012 semantic image segmentation task, reaching 79.7% mIOU on the test set, and advances the results on three other datasets: PASCAL-Context, PASCAL-Person-Part, and Cityscapes. All of our code is made publicly available online. Comment: Accepted by TPAMI.
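    The atrous operation the abstract describes is simple to state: insert rate-1 zeros between the taps of a kernel, which enlarges its effective field of view from k to k + (k-1)(rate-1) without adding parameters or computation per output. Below is a minimal, naive NumPy sketch of the operation itself (an illustration only, not the DeepLab implementation, which applies this inside DCNN layers):

    ```python
    import numpy as np

    def atrous_conv2d(x, kernel, rate):
        """Naive 2-D atrous (dilated) cross-correlation, 'valid' padding.

        The kernel taps are sampled from the input at strides of `rate`,
        so the field of view grows with `rate` while the number of
        multiply-accumulates per output stays fixed at kh*kw.
        """
        kh, kw = kernel.shape
        eff_h = kh + (kh - 1) * (rate - 1)   # effective (dilated) height
        eff_w = kw + (kw - 1) * (rate - 1)   # effective (dilated) width
        H, W = x.shape
        out = np.zeros((H - eff_h + 1, W - eff_w + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # pick the input pixels under each (dilated) kernel tap
                patch = x[i:i + eff_h:rate, j:j + eff_w:rate]
                out[i, j] = np.sum(patch * kernel)
        return out
    ```

    With rate=1 this reduces to an ordinary valid convolution; with rate=2 a 3x3 kernel covers a 5x5 window using the same nine weights.
    
    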

    Report from the MPP Working Group to the NASA Associate Administrator for Space Science and Applications

    NASA's Office of Space Science and Applications (OSSA) gave a select group of scientists the opportunity to test and implement their computational algorithms on the Massively Parallel Processor (MPP) located at Goddard Space Flight Center, beginning in late 1985. One year later, the Working Group presented its report, which addressed the following: algorithms, programming languages, architecture, programming environments, the relationship between theory and practice, and performance measurements. The findings point to a number of demonstrated computational techniques for which the MPP architecture is ideally suited. For example, besides executing much faster on the MPP than on conventional computers, systolic VLSI simulation (where distances are short), lattice simulation, neural network simulation, and image problems were found to be easier to program on the MPP's architecture than on a CYBER 205 or even a VAX. The report also makes technical recommendations covering all aspects of MPP use, and recommendations concerning the future of the MPP and machines based on similar architectures, expansion of the Working Group, and study of the role of future parallel processors for the space station, EOS, and the Great Observatories era.

    An Auto-tuner for Quantizing Deep Neural Networks

    Master's thesis (M.S.), Seoul National University, College of Engineering, Department of Computer Science and Engineering, February 2019. Advisor: 이재욱.With the proliferation of AI-based applications and services, there are strong demands for efficient processing of deep neural networks (DNNs). DNNs are known to be both compute- and memory-intensive as they require a tremendous amount of computation and large memory space. Quantization is a popular technique to boost efficiency of DNNs by representing a number with fewer bits, hence reducing both computational strength and memory footprint. However, it is a difficult task to find an optimal number representation for a DNN due to a combinatorial explosion in feasible number representations with varying bit widths, which is only exacerbated by layer-wise optimization. 
To address this, an automatic tuner is proposed in this work for DNN quantization. The auto-tuner efficiently finds a compact representation (type, bit width, and bias) for the numbers that minimizes the user-supplied objective function while satisfying the accuracy constraint. The evaluation, using eleven DNN models on two DNN frameworks targeting an FPGA platform and a bit-serial hardware, demonstrates over 8× (7×) reduction in the parameter size on average when up to 7% (1%) loss of relative accuracy is tolerable, with a maximum reduction of 16×, compared to the baseline using 32-bit floating-point numbers.
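The tuning loop the abstract describes can be caricatured in software: walk candidate number representations from cheapest to most expensive and keep the first one that stays inside the accuracy budget. The sketch below is a deliberately simplified assumption, not the thesis's actual algorithm: it restricts "type, bit width, and bias" to signed fixed-point with a movable binary point, and it takes a user-supplied accuracy proxy in place of real model evaluation. All names here are invented for illustration.

```python
import numpy as np

def quantize_fixed(w, bits, frac_bits):
    """Quantize to signed fixed-point: `bits` total bits, `frac_bits`
    of them fractional (the binary-point position plays the bias role)."""
    scale = 2.0 ** frac_bits
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return np.clip(np.round(w * scale), lo, hi) / scale

def tune(weights, accuracy_of, max_loss, bit_choices=range(2, 17)):
    """Return the cheapest (bits, frac_bits, quantized) within the budget.

    Bit widths are tried from narrow to wide, so the first feasible
    candidate found is also the smallest.
    """
    baseline = accuracy_of(weights)
    for bits in bit_choices:
        for frac in range(bits):          # candidate binary-point positions
            cand = quantize_fixed(weights, bits, frac)
            if baseline - accuracy_of(cand) <= max_loss:
                return bits, frac, cand
    return None                           # no representation fits the budget
```

The real tuner must of course be far smarter than this exhaustive walk, since per-layer choices multiply the search space combinatorially, which is exactly the explosion the abstract mentions.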

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before such applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.
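    The appeal of binary motion estimation is that the per-pixel subtract/absolute/accumulate of a conventional SAD collapses into XOR followed by a popcount, which is trivially cheap in hardware. The snippet below is a hypothetical software sketch of that general technique (a full-search matcher over binary frames); it illustrates the XOR cost metric only and is not the thesis's architecture, and the function names are invented for illustration.

    ```python
    import numpy as np

    def binary_sad(block, ref_block):
        """Matching cost between two binary blocks: XOR then popcount,
        replacing subtract/abs/accumulate of a conventional SAD."""
        return int(np.count_nonzero(block ^ ref_block))

    def binary_motion_search(cur, ref, top, left, size, radius):
        """Full search: the (dy, dx) displacement of the block at
        (top, left) in `cur` that minimizes the XOR cost against `ref`."""
        block = cur[top:top + size, left:left + size]
        best = (0, 0, binary_sad(block, ref[top:top + size, left:left + size]))
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                y, x = top + dy, left + dx
                # skip candidates that fall outside the reference frame
                if 0 <= y and y + size <= ref.shape[0] and \
                   0 <= x and x + size <= ref.shape[1]:
                    cost = binary_sad(block, ref[y:y + size, x:x + size])
                    if cost < best[2]:
                        best = (dy, dx, cost)
        return best
    ```

    A hardware realisation would additionally exploit the redundancy in the binary planes (runs of identical bits) to skip work, which is the energy-saving angle the abstract describes.
    
    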