
    Randomness in neural networks: an overview

    Neural networks, as powerful tools for data mining and knowledge engineering, can learn from data to build feature-based classifiers and nonlinear predictive models. Training neural networks involves the optimization of nonconvex objective functions, and usually the learning process is costly and infeasible for applications associated with data streams. A possible, albeit counterintuitive, alternative is to randomly assign a subset of the networks' weights so that the resulting optimization task can be formulated as a linear least-squares problem. This methodology can be applied to both feedforward and recurrent networks, and similar techniques can be used to approximate kernel functions. Many experimental results indicate that such randomized models can reach sound performance compared to fully adaptable ones, with a number of favorable benefits, including (1) simplicity of implementation, (2) faster learning with less human intervention, and (3) the possibility of leveraging the full range of linear regression and classification algorithms (e.g., ℓ1-norm minimization for obtaining sparse formulations). This class of neural networks is attractive and valuable to the data mining community, particularly for handling large-scale data mining in real time. However, the literature in the field is extremely vast and fragmented, with many results being reintroduced multiple times under different names. This overview aims to provide a self-contained, uniform introduction to the different ways in which randomization can be applied to the design of neural networks and kernel functions. A clear exposition of the basic framework underlying all these approaches helps to clarify innovative lines of research and open problems and, most importantly, to foster the exchange of well-known results across different communities. WIREs Data Mining Knowl Discov 2017, 7:e1200. doi: 10.1002/widm.1200
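
    A minimal sketch of the randomized approach described above, under illustrative choices (NumPy, a tanh hidden layer, ridge-regularized least squares): the hidden weights are drawn at random and never trained, so fitting reduces to a linear problem with a closed-form solution. The toy data and all names below are ours, not taken from the survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data: y = sin(x) plus noise.
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# 1) Randomly assign the hidden-layer weights and biases; they are never trained.
n_hidden = 100
W = rng.standard_normal((X.shape[1], n_hidden))
b = rng.standard_normal(n_hidden)
H = np.tanh(X @ W + b)              # fixed random nonlinear features

# 2) The only trainable part is the linear readout, so learning reduces to a
#    ridge-regularized linear least-squares problem with a closed-form solution.
lam = 1e-2
beta = np.linalg.solve(H.T @ H + lam * np.eye(n_hidden), H.T @ y)

# Predictions on new inputs reuse the same fixed random features.
X_test = np.linspace(-3, 3, 5).reshape(-1, 1)
y_pred = np.tanh(X_test @ W + b) @ beta
print(y_pred)
```

    The same trick underlies the kernel approximations mentioned in the abstract: with suitably distributed random weights, inner products of such random features approximate a kernel function.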

    Teacher-Student Architecture for Knowledge Distillation: A Survey

    Although deep neural networks (DNNs) have shown a strong capacity to solve large-scale problems in many areas, such DNNs are hard to deploy in real-world systems because of their large numbers of parameters. To tackle this issue, Teacher-Student architectures were proposed, in which simple student networks with few parameters can achieve performance comparable to that of deep teacher networks with many parameters. Recently, Teacher-Student architectures have been widely and effectively adopted for various knowledge distillation (KD) objectives, including knowledge compression, knowledge expansion, knowledge adaptation, and knowledge enhancement. With the help of Teacher-Student architectures, current studies are able to achieve multiple distillation objectives through lightweight and generalized student networks. Unlike existing KD surveys that primarily focus on knowledge compression, this survey explores Teacher-Student architectures across multiple distillation objectives. It presents an introduction to various knowledge representations and their corresponding optimization objectives, and provides a systematic overview of Teacher-Student architectures with representative learning algorithms and effective distillation schemes. The survey also summarizes recent applications of Teacher-Student architectures across multiple purposes, including classification, recognition, generation, ranking, and regression. Lastly, potential research directions in KD are investigated, focusing on architecture design, knowledge quality, and theoretical studies of regression-based learning. Through this comprehensive survey, industry practitioners and the academic community can gain valuable insights and guidelines for effectively designing, learning, and applying Teacher-Student architectures to various distillation objectives.
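
    For the knowledge-compression objective mentioned above, the classic teacher-student loss combines temperature-softened teacher targets with the usual hard-label loss. The sketch below is a generic NumPy version of that loss; the temperature and mixing weight are illustrative hyperparameters, not values prescribed by the survey.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """alpha * soft-target KL(teacher || student) + (1 - alpha) * hard-label CE."""
    p_teacher = softmax(teacher_logits, T)
    p_student = softmax(student_logits, T)
    # The T**2 factor keeps the soft-target gradients on a comparable scale.
    soft = np.mean(np.sum(p_teacher * (np.log(p_teacher + 1e-12)
                                       - np.log(p_student + 1e-12)), axis=1)) * T * T
    hard = np.mean(-np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12))
    return alpha * soft + (1.0 - alpha) * hard

# Toy batch: 2 samples, 3 classes.
teacher = np.array([[4.0, 1.0, -1.0], [0.5, 2.5, 0.0]])
student = np.array([[2.0, 1.5, -0.5], [0.2, 1.0, 0.3]])
labels = np.array([0, 1])
print(distillation_loss(student, teacher, labels))
```

    In an actual training loop this loss would be minimized with respect to the student's parameters only, with the teacher held fixed.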

    Sonar discrimination of cylinders from different angles using neural networks

    This paper describes an underwater object discrimination system applied to recognize cylinders of various compositions from different angles. The system is based on a new combination of simulated dolphin clicks, simulated auditory filters, and artificial neural networks. The model demonstrates its potential on real data collected from four different cylinders in an environment where the angles were controlled, in order to evaluate the model's capability to recognize cylinders independently of angle. 1. INTRODUCTION Dolphins possess an excellent sonar system for solving underwater target discrimination and recognition tasks in shallow water (see e.g., [2]). This has inspired research into new sonar systems based on biological knowledge, i.e., modeling the dolphin's discrimination capabilities (see e.g., [4] and [5]). The fact that the inner ear of the dolphin has many similarities with the human inner ear makes it tempting to use knowledge from simulations of the human auditory system when t…
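
    The pipeline sketched in the abstract (simulated click echoes, an auditory filterbank, a neural network classifier) can be illustrated end to end on synthetic data. Everything below is a stand-in under our own assumptions (synthetic echoes, log-spaced FFT band energies instead of a true auditory model, a tiny two-layer network); it is not the authors' system.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 100_000  # sample rate in Hz (illustrative)

def echo(center_freq, n=512):
    """Crude stand-in for a cylinder echo: a decaying tone burst plus noise."""
    t = np.arange(n) / fs
    return np.exp(-2000 * t) * np.sin(2 * np.pi * center_freq * t) \
        + 0.05 * rng.standard_normal(n)

def band_energies(x, n_bands=16):
    """Very rough 'auditory filterbank': log-spaced FFT band energies."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    edges = np.geomspace(4, len(spec) - 1, n_bands + 1).astype(int)
    return np.log([spec[a:b].sum() + 1e-9 for a, b in zip(edges[:-1], edges[1:])])

# Two synthetic 'cylinder types', distinguished only by their dominant echo frequency.
X = np.array([band_energies(echo(f)) for f in [20_000] * 30 + [30_000] * 30])
y = np.array([0] * 30 + [1] * 30)
X = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-9)   # standardize the features

# Tiny one-hidden-layer network trained with plain gradient descent.
W1 = 0.1 * rng.standard_normal((X.shape[1], 8)); b1 = np.zeros(8)
w2 = 0.1 * rng.standard_normal(8); b2 = 0.0
for _ in range(1000):
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))   # predicted P(class 1)
    g = (p - y) / len(y)                       # gradient of cross-entropy w.r.t. the logit
    dh = np.outer(g, w2) * (1.0 - h ** 2)      # backprop through tanh
    w2 -= 0.5 * (h.T @ g);  b2 -= 0.5 * g.sum()
    W1 -= 0.5 * (X.T @ dh); b1 -= 0.5 * dh.sum(axis=0)
print("training accuracy:", ((p > 0.5) == y).mean())
```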

    A Comprehensive Survey on Knowledge Distillation of Diffusion Models

    Diffusion Models (DMs), also referred to as score-based diffusion models, use neural networks to specify score functions. Unlike most other probabilistic models, DMs directly model the score functions, which makes them flexible to parametrize and potentially highly expressive for probabilistic modeling. DMs can learn fine-grained knowledge, i.e., marginal score functions, of the underlying distribution. A crucial research direction is therefore to explore how to distill the knowledge of DMs and fully utilize their potential. Our objective is to provide a comprehensible overview of the modern approaches for distilling DMs, starting with an introduction to DMs and a discussion of the challenges involved in distilling them into neural vector fields. We also provide an overview of the existing works on distilling DMs into both stochastic and deterministic implicit generators. Finally, we review accelerated diffusion sampling algorithms as a training-free method for distillation. This tutorial is intended for individuals with a basic understanding of generative models who wish to apply DM distillation or embark on a research project in this field.
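
    As a toy illustration of distilling a DM into a deterministic implicit generator, the sketch below uses a 1-D Gaussian target so that the marginal score is available in closed form; the "teacher" is then just Euler integration of the probability-flow ODE, and the one-step "student" is fitted by regressing onto the teacher's noise-to-sample map. All modeling choices here are ours for illustration and are far simpler than the methods the survey reviews.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D "data" distribution: N(m, s^2). For a Gaussian target, the marginal
# score of a VP diffusion is known in closed form, so the teacher needs no
# trained network -- everything here is purely illustrative.
m, s, beta = 2.0, 0.5, 8.0

def score(x, t):
    a = np.exp(-0.5 * beta * t)                     # signal scale alpha_t
    var = a * a * s * s + (1.0 - np.exp(-beta * t)) # marginal variance at time t
    return -(x - a * m) / var                       # grad_x log p_t(x)

def teacher_sample(z, n_steps=200):
    """Deterministic teacher: Euler integration of the probability-flow ODE
    from t = 1 (noise) down to t = 0 (data)."""
    x, dt = z.copy(), -1.0 / n_steps
    for i in range(n_steps):
        t = 1.0 + i * dt
        x = x + (-0.5 * beta * x - 0.5 * beta * score(x, t)) * dt
    return x

# Distillation: regress a one-step student generator x = w*z + c onto the
# teacher's noise-to-sample map via least squares on paired (z, teacher(z)).
z = rng.standard_normal(4000)
x_teacher = teacher_sample(z)
A = np.stack([z, np.ones_like(z)], axis=1)
w, c = np.linalg.lstsq(A, x_teacher, rcond=None)[0]

x_student = w * rng.standard_normal(4000) + c
print("teacher samples: mean %.2f std %.2f" % (x_teacher.mean(), x_teacher.std()))
print("student samples: mean %.2f std %.2f (target 2.00 / 0.50)"
      % (x_student.mean(), x_student.std()))
```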

    A survey of uncertainty in deep neural networks

    Over the last decade, neural networks have reached almost every field of science and become a crucial part of various real-world applications. Due to their increasing spread, confidence in neural network predictions has become more and more important. However, basic neural networks do not deliver certainty estimates or suffer from over- or under-confidence, i.e., they are badly calibrated. To overcome this, many researchers have been working on understanding and quantifying uncertainty in a neural network's prediction. As a result, different types and sources of uncertainty have been identified and various approaches to measure and quantify uncertainty in neural networks have been proposed. This work gives a comprehensive overview of uncertainty estimation in neural networks, reviews recent advances in the field, highlights current challenges, and identifies potential research opportunities. It is intended to give anyone interested in uncertainty estimation in neural networks a broad overview and introduction, without presupposing prior knowledge in this field. For that, a comprehensive introduction to the most crucial sources of uncertainty is given and their separation into reducible model uncertainty and irreducible data uncertainty is presented. The modeling of these uncertainties based on deterministic neural networks, Bayesian neural networks (BNNs), ensembles of neural networks, and test-time data augmentation approaches is introduced and different branches of these fields as well as the latest developments are discussed. For a practical application, we discuss different measures of uncertainty, approaches for calibrating neural networks, and give an overview of existing baselines and available implementations. Different examples from the wide spectrum of challenges in the fields of medical image analysis, robotics, and earth observation give an idea of the needs and challenges regarding uncertainties in the practical applications of neural networks. Additionally, the practical limitations of uncertainty quantification methods in neural networks for mission- and safety-critical real-world applications are discussed and an outlook on the next steps towards a broader usage of such methods is given.
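
    The separation into reducible model uncertainty and irreducible data uncertainty can be made concrete with an ensemble: the entropy of the averaged prediction splits into the mean per-member entropy (data uncertainty) plus the mutual information between prediction and ensemble member (model uncertainty). The sketch below uses randomly perturbed stand-in ensemble members; it shows one common approximation, not a construction taken from this survey.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def entropy(p, axis=-1):
    return -(p * np.log(p + 1e-12)).sum(axis=axis)

def ensemble_uncertainty(member_probs):
    """member_probs: (n_members, n_samples, n_classes) class probabilities.
    Returns the predictive mean plus an entropy-based split of total uncertainty
    into an (approximately) irreducible data part and a reducible model part."""
    p_mean = member_probs.mean(axis=0)
    total = entropy(p_mean)                          # entropy of the averaged prediction
    aleatoric = entropy(member_probs).mean(axis=0)   # mean per-member entropy (data uncertainty)
    epistemic = total - aleatoric                    # mutual information (model uncertainty)
    return p_mean, total, aleatoric, epistemic

# Stand-in 'ensemble': 5 members' logits for 3 inputs and 4 classes.
rng = np.random.default_rng(0)
base = np.array([[4.0, 0.0, 0.0, 0.0],    # confident, members agree
                 [1.0, 0.9, 0.8, 0.7],    # the data itself is ambiguous
                 [3.0, 0.0, 0.0, 0.0]])   # members will disagree (see noise below)
noise = np.array([0.1, 0.1, 3.0])          # per-input disagreement between members
logits = base[None] + noise[None, :, None] * rng.standard_normal((5, 3, 4))
_, total, aleatoric, epistemic = ensemble_uncertainty(softmax(logits))
print("total    :", np.round(total, 3))
print("aleatoric:", np.round(aleatoric, 3))
print("epistemic:", np.round(epistemic, 3))
```

    The members only disagree on the third input, so only that input receives a noticeably non-zero epistemic term.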

    Quantization of Deep Neural Networks for Improving Generalization Capability

    Doctoral dissertation, Department of Electrical and Computer Engineering, College of Engineering, Seoul National University, February 2020 (advisor: Wonyong Sung). Deep neural networks (DNNs) achieve state-of-the-art performance in various applications such as image recognition and speech synthesis. However, their implementation in embedded systems is difficult owing to the large number of associated parameters and high computational costs. DNNs have the potential to operate well with low-precision parameters because they mimic the operation of human neurons, which function robustly at low precision; quantization of DNNs exploits this property. In general, word lengths of 8 bits or more yield DNN performance comparable to that of a full-precision model, whereas shorter word lengths such as 1 or 2 bits can result in significant performance degradation. To alleviate this problem, previous works employed more elaborate quantization methods such as asymmetric or adaptive quantizers. In contrast, this study takes a different approach and focuses on improving the generalization capability of quantized DNNs (QDNNs) instead of designing more complex quantizers. To this end, we first analyze the performance characteristics of QDNNs trained with a retraining algorithm, employing layer-wise sensitivity analysis to investigate the quantization characteristics of each layer; we also analyze how QDNN performance varies with the width and depth of the quantized network. Based on these analyses, two simple quantization training techniques, adaptive step size retraining and gradual quantization, are proposed. Furthermore, a new training scheme for QDNNs, referred to as high-low-high-low-precision (HLHLp) training, is proposed; it allows the network to reach flat minima on its loss surface with the aid of quantization noise. As the name suggests, the proposed method alternates between high and low precision during training, and the learning rate is changed accordingly at each stage. The proposed training technique yields considerable performance improvements for QDNNs compared with previously reported fine-tuning-based quantization schemes. Moreover, the knowledge distillation (KD) technique, which uses a pre-trained teacher model to train a student network, is exploited for the optimization of QDNNs. We explore the effect of teacher network selection and of different KD hyperparameters on the quantization of DNNs, using several large floating-point and quantized models as teacher networks. Our experiments indicate that, for effective KD training, the softmax distribution produced by a teacher network is more important than its performance.
Furthermore, because the softmax distribution of a teacher network can be controlled using KD hyperparameters, we analyze the interrelationship of each KD component for QDNN training. We show that even a small teacher model can achieve the same distillation performance as a larger teacher model. We also propose the gradual soft loss reducing (GSLR) technique for robust KD-based QDNN optimization, wherein the mixing ratio of hard and soft losses is controlled during training. In addition, we present a new QDNN optimization approach, namely stochastic quantized weight averaging (SQWA), to design low-precision DNNs with good generalization capability using model averaging. The proposed approach includes (1) floating-point model training, (2) direct quantization of the weights, (3) capture of multiple low-precision models during retraining with a cyclical learning rate, (4) averaging of the captured models, and (5) re-quantization of the averaged model and its fine-tuning with a low learning rate. Additionally, we present a loss-visualization technique for the quantized weight domain to elucidate the behavior of the proposed method. Our visualization results indicate that a QDNN optimized using our proposed approach is located near the center of the flat minimum on the loss surface.
    Contents:
    1. Introduction: 1.1 Quantization of Deep Neural Networks; 1.2 Generalization Capability of DNNs; 1.3 Improved Generalization Capability of QDNNs; 1.4 Outline of the Dissertation
    2. Analysis of Fixed-Point Quantization of Deep Neural Networks: 2.1 Introduction; 2.2 Fixed-Point Performance Analysis of Deep Neural Networks (2.2.1 Model Design of Deep Neural Networks, 2.2.2 Retrain-Based Weight Quantization, 2.2.3 Quantization Sensitivity Analysis, 2.2.4 Empirical Analysis); 2.3 Step Size Adaptation and Gradual Quantization for Retraining of Deep Neural Networks (2.3.1 Step-Size Adaptation during Retraining, 2.3.2 Gradual Quantization Scheme, 2.3.3 Experimental Results); 2.4 Concluding Remarks
    3. HLHLp: Quantized Neural Networks Training for Reaching Flat Minima in Loss Surface: 3.1 Introduction; 3.2 Related Works (3.2.1 Quantization of Deep Neural Networks, 3.2.2 Flat Minima in Loss Surfaces); 3.3 Training QDNN for Improved Generalization Capability (3.3.1 Analysis of Training with Quantized Weights, 3.3.2 High-Low-High-Low-Precision Training); 3.4 Experimental Results (3.4.1 Image Classification with CNNs, 3.4.2 Language Modeling on PTB and WikiText-2, 3.4.3 Speech Recognition on the WSJ Corpus, 3.4.4 Discussion); 3.5 Concluding Remarks
    4. Knowledge Distillation for Optimization of Quantized Deep Neural Networks: 4.1 Introduction; 4.2 Quantized Deep Neural Network Training Using Knowledge Distillation (4.2.1 Quantization of Deep Neural Networks and Knowledge Distillation, 4.2.2 Teacher Model Selection for KD, 4.2.3 Discussion on Hyperparameters of KD); 4.3 Experimental Results (4.3.1 Experimental Setup, 4.3.2 Results on CIFAR-10 and CIFAR-100, 4.3.3 Model Size and Temperature, 4.3.4 Gradual Soft Loss Reducing); 4.4 Concluding Remarks
    5. SQWA: Stochastic Quantized Weight Averaging for Improving the Generalization Capability of Low-Precision Deep Neural Networks: 5.1 Introduction; 5.2 Related Works (5.2.1 Quantization of Deep Neural Networks for Efficient Implementations, 5.2.2 Stochastic Weight Averaging and Loss-Surface Visualization); 5.3 Quantization of DNNs and Loss Surface Visualization (5.3.1 Quantization of Deep Neural Networks, 5.3.2 Loss Surface Visualization for QDNNs); 5.4 SQWA Algorithm; 5.5 Experimental Results (5.5.1 CIFAR-100, 5.5.2 ImageNet); 5.6 Concluding Remarks
    6. Conclusion
    Abstract (In Korean)
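
    The methods above (adaptive step size retraining, HLHLp scheduling, KD-based training, SQWA) all operate on top of the same building block: a uniform fixed-point weight quantizer combined with retraining in which gradients flow "straight through" the quantizer to a full-precision copy of the weights. The sketch below shows only that shared building block, with an illustrative step-size heuristic and learning rate; it is not the dissertation's algorithm.

```python
import numpy as np

def quantize_weights(w, n_bits=2, step=None):
    """Uniform symmetric fixed-point quantizer: round to the nearest multiple of
    the step size and clip to the representable range (e.g. {-1, 0, +1} * step
    for 2 bits). The default step-size rule is an illustrative heuristic."""
    levels = 2 ** (n_bits - 1) - 1
    if step is None:
        step = w.std() / max(levels, 1)
    return np.clip(np.round(w / step), -levels, levels) * step, step

# Straight-through retraining, sketched for a single linear layer with an MSE
# loss: the forward pass and the gradient use the quantized weights, but the
# update is applied to a full-precision master copy that is re-quantized on
# every iteration.
rng = np.random.default_rng(0)
w_fp = rng.standard_normal((4, 3))        # full-precision master weights
x = rng.standard_normal((8, 4))
target = rng.standard_normal((8, 3))
for _ in range(100):
    w_q, step = quantize_weights(w_fp, n_bits=2)
    out = x @ w_q                          # forward pass with quantized weights
    grad_out = (out - target) / len(x)     # d(MSE)/d(out)
    grad_w = x.T @ grad_out                # gradient w.r.t. the quantized weights
    w_fp -= 0.1 * grad_w                   # straight-through: update the float copy
w_q, _ = quantize_weights(w_fp, n_bits=2)
print("distinct quantized weight values:", np.unique(w_q).round(3))
```

    Schemes such as HLHLp or SQWA then schedule the precision, learning rate, or model averaging around a retraining loop of this kind.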

    Machine learning and its applications in reliability analysis systems

    In this thesis, we are interested in exploring some aspects of Machine Learning (ML) and its application in Reliability Analysis systems (RAs). We begin by investigating some ML paradigms and their techniques, go on to discuss possible applications of ML in improving RAs performance, and lastly give guidelines for the architecture of learning RAs. Our survey of ML covers both the Neural Network learning and the Symbolic learning levels. In symbolic process learning, five types of learning and their applications are discussed: rote learning, learning from instruction, learning from analogy, learning from examples, and learning from observation and discovery. The Reliability Analysis systems (RAs) presented in this thesis are mainly designed for maintaining plant safety, supported by two functions: a risk analysis function, i.e., failure mode effect analysis (FMEA); and a diagnosis function, i.e., real-time fault location (RTFL). Three approaches to creating RAs have been discussed. According to the results of our survey, we suggest that currently the best design of RAs is to embed model-based RAs, i.e., MORA (as software), in a neural-network-based computer system (as hardware). However, there are still improvements that can be made through the application of Machine Learning. By implanting the 'learning element', MORA becomes the learning MORA (La MORA) system, a learning Reliability Analysis system with the power of automatic knowledge acquisition, inconsistency checking, and more. To conclude the thesis, we propose an architecture for La MORA.

    A Large Imaging Database and Novel Deep Neural Architecture for Covid-19 Diagnosis

    Deep learning methodologies nowadays constitute the main approach for medical image analysis and disease prediction. Large annotated databases are necessary for developing these methodologies; such databases are difficult to obtain and to make publicly available for use by researchers and medical experts. In this paper, we focus on the diagnosis of Covid-19 based on chest 3-D CT scans and develop a dual knowledge framework, including a large imaging database and a novel deep neural architecture. We introduce COV19-CT-DB, a very large database annotated for COVID-19 that consists of 7,750 3-D CT scans, 1,650 of which refer to COVID-19 cases and 6,100 to non-COVID-19 cases. We use this database to train and develop the RACNet architecture. This architecture performs 3-D analysis based on a CNN-RNN network and handles input CT scans of different lengths through the introduction of dynamic routing, feature alignment, and a mask layer. We conduct a large experimental study illustrating that RACNet achieves the best performance compared to other deep neural networks (i) when trained and tested on COV19-CT-DB and (ii) when tested on, or applied through transfer learning to, other public databases.
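
    RACNet's dynamic routing and feature alignment are specific to the paper, but the general mechanism for handling CT scans of different lengths, i.e., padding the per-slice features and masking the padded positions inside a recurrent aggregator, can be sketched generically. The code below uses random stand-in slice embeddings and a plain recurrent update; it illustrates masking over variable-length inputs, not the authors' architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: each CT scan is a variable-length sequence of slice
# feature vectors (here random stand-ins for per-slice CNN embeddings).
feat_dim, hidden_dim, n_classes = 32, 16, 2
scans = [rng.standard_normal((n_slices, feat_dim)) for n_slices in (40, 55, 23)]

# Pad to a common length and build a mask marking the real slices.
max_len = max(len(s) for s in scans)
batch = np.zeros((len(scans), max_len, feat_dim))
mask = np.zeros((len(scans), max_len))
for i, s in enumerate(scans):
    batch[i, :len(s)] = s
    mask[i, :len(s)] = 1.0

# Simple recurrent aggregation over slices; padded positions leave the hidden
# state untouched, which is one way a mask layer can handle variable lengths.
Wx = 0.1 * rng.standard_normal((feat_dim, hidden_dim))
Wh = 0.1 * rng.standard_normal((hidden_dim, hidden_dim))
Wo = 0.1 * rng.standard_normal((hidden_dim, n_classes))
h = np.zeros((len(scans), hidden_dim))
for t in range(max_len):
    h_new = np.tanh(batch[:, t] @ Wx + h @ Wh)
    m = mask[:, t][:, None]
    h = m * h_new + (1.0 - m) * h              # skip the update on padded slices
logits = h @ Wo                                 # per-scan classification logits
print("per-scan logits:\n", logits.round(3))
```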