Accurate deep neural network inference using computational phase-change memory
In-memory computing is a promising non-von Neumann approach for making
energy-efficient deep learning inference hardware. Crossbar arrays of resistive
memory devices can be used to encode the network weights and perform efficient
analog matrix-vector multiplications without intermediate movements of data.
However, due to device variability and noise, the network needs to be trained
in a specific way so that transferring the digitally trained weights to the
analog resistive memory devices will not result in significant loss of
accuracy. Here, we introduce a methodology to train ResNet-type convolutional
neural networks that results in no appreciable accuracy loss when transferring
weights to in-memory computing hardware based on phase-change memory (PCM). We
also propose a compensation technique that exploits the batch normalization
parameters to improve the accuracy retention over time. We achieve a
classification accuracy of 93.7% on the CIFAR-10 dataset and a top-1 accuracy
on the ImageNet benchmark of 71.6% after mapping the trained weights to PCM.
Our hardware results on CIFAR-10 with ResNet-32 demonstrate an accuracy above
93.5% retained over a one day period, where each of the 361,722 synaptic
weights of the network is programmed on just two PCM devices organized in a
differential configuration.

Comment: This is a pre-print of an article accepted for publication in Nature Communications.
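The differential weight mapping described in the abstract can be sketched numerically. The toy NumPy model below is not the paper's hardware or noise model: the additive Gaussian programming noise and its magnitude are assumptions for illustration. It encodes each weight on a pair of non-negative conductances and compares the noisy analog matrix-vector product against the ideal digital one:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy trained weight matrix (e.g., one small fully connected layer).
W = rng.standard_normal((4, 8))

# Differential configuration: each weight w = g_plus - g_minus, with the
# negative part carried by the second PCM device of the pair.
g_plus = np.clip(W, 0, None)
g_minus = np.clip(-W, 0, None)

# Hypothetical programming noise: additive Gaussian on each conductance.
sigma = 0.02
g_plus_prog = g_plus + sigma * rng.standard_normal(g_plus.shape)
g_minus_prog = g_minus + sigma * rng.standard_normal(g_minus.shape)
W_analog = g_plus_prog - g_minus_prog

# Analog matrix-vector product vs. the ideal digital one.
x = rng.standard_normal(8)
y_ideal = W @ x
y_analog = W_analog @ x
rel_err = np.linalg.norm(y_analog - y_ideal) / np.linalg.norm(y_ideal)
```

With noise of this (assumed) magnitude the relative error of the analog product stays at the few-percent level, which is why noise-aware training can absorb it.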
Neural Network Interpretation for the Flexible Execution of SKT's AIX Accelerator
Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2020. 8. Bernhard Egger.

Computer vision problems are best addressed by convolutional neural networks (CNNs). These resource-intensive algorithms require the use of specialised accelerators when the application is particularly time-critical or when power efficiency plays an important role. An accelerator's performance gains come at the cost of generality. In this thesis, our target hardware is SKT's AIX accelerator, designed for the execution of Darknet CNNs. The presented work enables the flexible execution of ONNX networks on AIX, extending the accelerator's support to an additional framework. We describe the steps required to map a very different graph structure onto that of the accelerator, and how we still achieve partial acceleration of networks containing unsupported operations. Running ONNX networks on AIX through our project yields results very close to native execution on ONNXRuntime; indeed, we reach a 98% top-1 prediction match on a CIFAR10 sample.

Chapter 1 Introduction
Chapter 2 Background
2.1 Convolutional Neural Networks for Computer Vision
2.1.1 Feature Maps
2.1.2 Convolutions
2.1.3 Batch Normalisation
2.1.4 Activation Functions
2.1.5 Pooling and Subsampling
2.2 Open Neural Network Exchange
2.3 SKT's Custom Accelerator: AIX
Chapter 3 Design
3.1 Overview
3.2 Operator Support and Graph Division
Chapter 4 Implementation
4.1 Project Architecture
4.2 Graph Division: Ready Queue Exploration
4.3 Object Oriented Approach for Common Subgraph Interface
4.4 Handling the Data Flow
4.5 Working with a Custom Tool under Development
Chapter 5 Test Setup
5.1 CIFAR10
5.2 Prototyping and Converting Test Networks
Chapter 6 Results
Chapter 7 Conclusion
Bibliography
Abstract (in Korean)
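The "Graph Division: Ready Queue Exploration" step (Section 4.2) could, in spirit, look like the following sketch: a Kahn-style topological walk over a toy ONNX-like graph that groups consecutively ready nodes into supported (accelerated) and unsupported (CPU fallback) subgraphs. The example graph, the support table, and the grouping rule are illustrative assumptions, not the thesis implementation:

```python
from collections import deque

# Toy ONNX-like graph: node -> list of successors, plus a support flag
# saying whether the accelerator can execute that operator.
edges = {"conv1": ["bn1"], "bn1": ["relu1"], "relu1": ["custom"],
         "custom": ["conv2"], "conv2": []}
supported = {"conv1": True, "bn1": True, "relu1": True,
             "custom": False, "conv2": True}

# In-degree count for a Kahn-style topological traversal.
indeg = {n: 0 for n in edges}
for n, succs in edges.items():
    for s in succs:
        indeg[s] += 1

ready = deque(n for n, d in indeg.items() if d == 0)
subgraphs, current, current_kind = [], [], None

while ready:
    n = ready.popleft()
    kind = supported[n]
    # A change of support status closes the current subgraph.
    if kind != current_kind and current:
        subgraphs.append((current_kind, current))
        current = []
    current, current_kind = current + [n], kind
    for s in edges[n]:
        indeg[s] -= 1
        if indeg[s] == 0:
            ready.append(s)
if current:
    subgraphs.append((current_kind, current))
```

On this toy graph the walk yields three partitions: an accelerated prefix, the unsupported `custom` node, and an accelerated suffix, which is exactly the shape that partial acceleration needs.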
A General Method for Calibrating Stochastic Radio Channel Models with Kernels
Publisher Copyright: CC BY.

Calibrating stochastic radio channel models to new measurement data is challenging when the likelihood function is intractable. The standard approach to this problem involves sophisticated algorithms for the extraction and clustering of multipath components, after which point estimates of the model parameters can be obtained using specialized estimators. We propose a likelihood-free calibration method using approximate Bayesian computation. The method is based on the maximum mean discrepancy, a notion of distance between probability distributions. Our method not only bypasses the need to implement any high-resolution or clustering algorithm, but is also automatic in that it does not require any additional input or manual pre-processing from the user. It also has the advantage of returning an entire posterior distribution on the parameter values, rather than a simple point estimate. We evaluate the performance of the proposed method by fitting two different stochastic channel models, namely the Saleh-Valenzuela model and the propagation graph model, to both simulated and measured data. The proposed method estimates the parameters of both models accurately in simulations, as well as when applied to 60 GHz indoor measurement data.

Peer reviewed.
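The two core ingredients of the method above, an MMD distance between observed and simulated data inside an ABC rejection loop, can be sketched on a deliberately simple toy model. The one-parameter Gaussian "channel", the kernel bandwidth, the tolerance, and the prior below are all assumptions for illustration; the paper calibrates far richer models:

```python
import numpy as np

rng = np.random.default_rng(1)

def mmd2(x, y, bandwidth=1.0):
    """Biased estimate of the squared MMD with a Gaussian kernel (1-D)."""
    def k(a, b):
        return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * bandwidth ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2.0 * k(x, y).mean()

# "Measured" data from a toy channel whose unknown spread is theta* = 1.5.
theta_true = 1.5
observed = theta_true * rng.standard_normal(200)

# ABC rejection: draw theta from the prior, simulate, and keep it when the
# simulated data are close to the observations in MMD.
accepted = []
for _ in range(300):
    theta = rng.uniform(0.1, 4.0)            # prior draw
    simulated = theta * rng.standard_normal(200)
    if mmd2(observed, simulated) < 0.01:     # tolerance (assumed)
        accepted.append(theta)

posterior_mean = float(np.mean(accepted))
```

The accepted draws form an approximate posterior sample concentrated around the true spread parameter, without any likelihood evaluation or multipath extraction step.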
Probabilistic Safety Regions Via Finite Families of Scalable Classifiers
Supervised classification recognizes patterns in the data to separate classes
of behaviours. Canonical solutions contain misclassification errors that are
intrinsic to the numerical approximating nature of machine learning. The data
analyst may minimize the classification error on a class at the expense of
increasing the error of the other classes. The error control of such a design
phase is often done in a heuristic manner. In this context, it is key to
develop theoretical foundations capable of providing probabilistic
certifications to the obtained classifiers. In this perspective, we introduce
the concept of probabilistic safety region to describe a subset of the input
space in which the number of misclassified instances is probabilistically
controlled. The notion of scalable classifiers is then exploited to link the
tuning of machine learning with error control. Several tests corroborate the
approach: they are run on synthetic data, to highlight all the steps involved,
as well as on a smart mobility application.

Comment: 13 pages, 4 figures, 1 table, submitted to IEEE TNNLS.
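A minimal version of the idea above, carving out a region of input space in which the unsafe-class error is held near a target level by tuning a single scale parameter of the classifier, can be sketched with a threshold on one-dimensional scores. The score model, the calibration split, and the plain quantile rule are assumptions; the paper's probabilistic certification is stronger than this empirical check:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 1-D scores: class 1 ("safe") scores high, class 0 ("unsafe") low.
n = 1000
y = rng.integers(0, 2, size=n)
scores = y * 1.0 + rng.standard_normal(n)   # assumed score model

# Scalable classifier: the safety region is {x : score(x) > rho}.  Tune rho
# on a calibration split so at most eps of true-unsafe calibration points
# fall inside the region.
calib, test = slice(0, 500), slice(500, None)
eps = 0.05
unsafe_scores = np.sort(scores[calib][y[calib] == 0])
rho = unsafe_scores[int(np.ceil((1 - eps) * len(unsafe_scores))) - 1]

# Empirical check of the error level on held-out data.
in_region = scores[test] > rho
unsafe_in_region = float(np.mean(in_region[y[test] == 0]))
```

On the held-out split the fraction of unsafe points inside the region stays close to the target `eps`, which is the behaviour the probabilistic safety region formalizes with an explicit guarantee.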
Nonlinear wavefront reconstruction with convolutional neural networks for Fourier-based wavefront sensors
Fourier-based wavefront sensors, such as the Pyramid Wavefront Sensor (PWFS),
are the current preference for high contrast imaging due to their high
sensitivity. However, these wavefront sensors have intrinsic nonlinearities
that constrain the range where conventional linear reconstruction methods can
be used to accurately estimate the incoming wavefront aberrations. We propose
to use Convolutional Neural Networks (CNNs) for the nonlinear reconstruction of
the wavefront sensor measurements. It is demonstrated that a CNN can be used to
accurately reconstruct the nonlinearities in both simulations and a lab
implementation. We show that solely using a CNN for the reconstruction leads to
suboptimal closed loop performance under simulated atmospheric turbulence.
However, it is demonstrated that using a CNN to estimate the nonlinear error
term on top of a linear model results in an improved effective dynamic range of
a simulated adaptive optics system. The larger effective dynamic range results
in a higher Strehl ratio under conditions where the nonlinear error is
relevant. This will allow the current and future generation of large
astronomical telescopes to work in a wider range of atmospheric conditions and
therefore reduce costly downtime of such facilities.

Comment: 14 pages, 7 figures.
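The "linear model plus learned nonlinear correction" scheme can be illustrated on a scalar toy problem: a saturating sensor stands in for the PWFS nonlinearity, and a polynomial fit stands in for the CNN. Everything here (the tanh response, the amplitude range, the polynomial degree) is an assumption for illustration only:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy nonlinear sensor: the response saturates like tanh, so a linear
# reconstructor is only accurate for small input aberrations.
def sensor(s):
    return np.tanh(s)

s_train = rng.uniform(-2, 2, 2000)
m_train = sensor(s_train)

# Linear reconstructor: the small-signal gain of tanh at 0 is 1,
# so the linear estimate is simply the measurement itself.
s_lin = m_train

# Nonlinear correction: fit the residual error of the linear model as a
# function of the measurement (a polynomial plays the role of the CNN).
coef = np.polyfit(m_train, s_train - s_lin, deg=9)

# Evaluate on held-out aberrations, including large amplitudes.
s_test = rng.uniform(-2, 2, 500)
m_test = sensor(s_test)
err_linear = np.mean((m_test - s_test) ** 2)
err_corrected = np.mean((m_test + np.polyval(coef, m_test) - s_test) ** 2)
```

The corrected estimator keeps the linear model's behaviour for small inputs while recovering large aberrations the linear model underestimates, which is the effective-dynamic-range gain the abstract describes.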
ePointDA: An End-to-End Simulation-to-Real Domain Adaptation Framework for LiDAR Point Cloud Segmentation
Due to its robust and precise distance measurements, LiDAR plays an important
role in scene understanding for autonomous driving. Training deep neural
networks (DNNs) on LiDAR data requires large-scale point-wise annotations,
which are time-consuming and expensive to obtain. Instead, simulation-to-real
domain adaptation (SRDA) trains a DNN using unlimited synthetic data with
automatically generated labels and transfers the learned model to real
scenarios. Existing SRDA methods for LiDAR point cloud segmentation mainly
employ a multi-stage pipeline and focus on feature-level alignment. They
require prior knowledge of real-world statistics and ignore the pixel-level
dropout noise gap and the spatial feature gap between different domains. In
this paper, we propose a novel end-to-end framework, named ePointDA, to address
the above issues. Specifically, ePointDA consists of three modules:
self-supervised dropout noise rendering, statistics-invariant and
spatially-adaptive feature alignment, and transferable segmentation learning.
The joint optimization enables ePointDA to bridge the domain shift at the
pixel-level by explicitly rendering dropout noise for synthetic LiDAR and at
the feature-level by spatially aligning the features between different domains,
without requiring the real-world statistics. Extensive experiments adapting
from synthetic GTA-LiDAR to real KITTI and SemanticKITTI demonstrate the
superiority of ePointDA for LiDAR point cloud segmentation.

Comment: Accepted by AAAI 2021.
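The pixel-level "dropout noise rendering" idea, making synthetic range images look real by zeroing returns the way a real sensor drops them, can be sketched as follows. In ePointDA the dropout probability is predicted in a self-supervised way; the simple range-dependent model below is purely an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic LiDAR range image (H x W), initially free of dropout noise.
H, W = 32, 64
synthetic = rng.uniform(2.0, 80.0, size=(H, W))

# Hypothetical per-pixel dropout probability: farther returns are more
# likely to be missing (a stand-in for the learned prediction).
p_drop = np.clip(synthetic / 200.0, 0.0, 0.6)

# Render the noise: sample a Bernoulli mask and zero the dropped pixels,
# mimicking the missing returns seen in real range images.
mask = rng.random((H, W)) >= p_drop
rendered = synthetic * mask

dropout_rate = 1.0 - mask.mean()
```

Training the segmentation network on `rendered` rather than `synthetic` closes the pixel-level noise gap between domains without any real-world statistics.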
On quantifying the value of simulation for training and evaluating robotic agents
A common problem in robotics is reproducing results and claims made by researchers. The experiments done in robotics laboratories typically yield results that are specific to a complex setup and difficult or costly to reproduce and validate in other contexts. For this reason, it is arduous to compare the performance and robustness of various robotic controllers. Low-cost reproductions of physical environments are popular but induce a performance reduction when transferred to the target domain. This thesis presents the results of our work toward improving benchmarking in robotics, specifically for autonomous driving.
We build a new platform, the Duckietown Autolabs, which allows researchers to evaluate autonomous driving algorithms in a standardized framework on low-cost hardware. The platform offers a simulated environment for easy access to annotated data and parallel evaluation of driving solutions in customizable environments. We use the platform to analyze the discrepancy between simulation and reality in terms of the predictivity of the simulation and the quality of the data it generates. We supply two metrics to quantify the usefulness of a simulation and demonstrate how they can be used to optimize the value of a proxy environment.
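A simulation-usefulness metric in the spirit described above (the thesis's actual two metrics may be defined differently) could score predictivity as the rank correlation between controller performance in simulation and in the real Autolab. The controller scores below are made-up numbers for illustration:

```python
import numpy as np

# Scores of five driving controllers in simulation and on real hardware
# (illustrative numbers, not measured data).
sim_scores = np.array([0.91, 0.72, 0.55, 0.83, 0.40])
real_scores = np.array([0.85, 0.70, 0.50, 0.78, 0.45])

def rank(v):
    # Position of each score in ascending order.
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

# Spearman rank correlation: 1.0 means the simulation orders controllers
# exactly as the real environment does, i.e., it is perfectly predictive
# for ranking purposes even if absolute scores differ.
predictivity = float(np.corrcoef(rank(sim_scores), rank(real_scores))[0, 1])
```

A metric of this form rewards a proxy environment for preserving the ordering of solutions, which is what matters when the simulation is used to select controllers for real deployment.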