190 research outputs found

    Generalization Error in Deep Learning

    Get PDF
    Deep learning models have lately shown great performance in various fields such as computer vision, speech recognition, speech translation, and natural language processing. However, alongside their state-of-the-art performance, the source of their generalization ability remains largely unclear. Thus, an important question is what makes deep neural networks able to generalize well from the training set to new data. In this article, we provide an overview of the existing theory and bounds for the characterization of the generalization error of deep neural networks, combining both classical and more recent theoretical and empirical results
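    As a concrete example of the kind of result such overviews cover (a classical uniform-convergence bound, not one specific to this article): for a finite hypothesis class H, a loss bounded in [0, 1], and m i.i.d. training samples, Hoeffding's inequality plus a union bound give, with probability at least 1 - δ over the draw of the training set,

        \[
            L(h) \le \hat{L}(h) + \sqrt{\frac{\ln|\mathcal{H}| + \ln(1/\delta)}{2m}} \qquad \text{for all } h \in \mathcal{H},
        \]

    where L(h) is the expected loss and \hat{L}(h) the empirical loss. Much of the surveyed work asks how bounds of this flavor can stay meaningful for deep networks, whose effective hypothesis class is far too large for such a counting argument.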

    Ensemble of Single-Layered Complex-Valued Neural Networks for Classification Tasks

    Get PDF
    This paper presents ensemble approaches for single-layered complex-valued neural networks (CVNNs) to solve real-valued classification problems. Each component CVNN of an ensemble uses a recently proposed activation function for its complex-valued neurons (CVNs). A gradient-descent based learning algorithm was used to train the component CVNNs. We applied two ensemble methods, negative correlation learning and bagging, to create the ensembles. Experimental results on a number of real-world benchmark problems showed a substantial performance improvement of the ensembles over the individual single-layered CVNN classifiers. Furthermore, the generalization performances were nearly equivalent to those obtained by the ensembles of real-valued multilayer neural networks
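    The paper's specific complex-valued activation function and training details are not given here, so the following is only a minimal NumPy sketch of the overall idea: real inputs fed to a single complex-valued layer, a magnitude-plus-softmax readout standing in for the paper's activation function, and plain bagging (negative correlation learning is omitted). All names and hyperparameters are illustrative.

        import numpy as np

        rng = np.random.default_rng(0)

        class SingleLayerCVNN:
            """Hypothetical single-layer complex-valued classifier: class scores are the
            magnitudes of complex neuron outputs, trained by gradient descent on a
            softmax cross-entropy loss (a stand-in for the paper's activation function)."""
            def __init__(self, n_in, n_classes, lr=0.05, epochs=200):
                self.Wr = 0.1 * rng.standard_normal((n_classes, n_in))  # real part of the weights
                self.Wi = 0.1 * rng.standard_normal((n_classes, n_in))  # imaginary part
                self.br = np.zeros(n_classes)
                self.bi = np.zeros(n_classes)
                self.lr, self.epochs = lr, epochs

            def _forward(self, X):
                # Real-valued inputs are treated as complex numbers with zero imaginary part.
                a = X @ self.Wr.T + self.br          # real part of the neuron outputs
                b = X @ self.Wi.T + self.bi          # imaginary part
                s = np.sqrt(a ** 2 + b ** 2) + 1e-12  # magnitudes used as class scores
                return a, b, s

            def predict_proba(self, X):
                _, _, s = self._forward(X)
                e = np.exp(s - s.max(axis=1, keepdims=True))
                return e / e.sum(axis=1, keepdims=True)

            def fit(self, X, y):
                y = np.asarray(y)
                Y = np.eye(self.br.size)[y]                 # one-hot labels
                for _ in range(self.epochs):
                    a, b, s = self._forward(X)
                    p = self.predict_proba(X)
                    g = (p - Y) / len(X)                    # dLoss/dscores
                    ga, gb = g * a / s, g * b / s           # chain rule through |z|
                    self.Wr -= self.lr * (ga.T @ X)         # real inputs: d(Re z)/dWr = x
                    self.Wi -= self.lr * (gb.T @ X)         # real inputs: d(Im z)/dWi = x
                    self.br -= self.lr * ga.sum(axis=0)
                    self.bi -= self.lr * gb.sum(axis=0)
                return self

        def bagging_ensemble(X, y, n_models=10):
            """Train each component CVNN on a bootstrap resample of the training set."""
            models = []
            for _ in range(n_models):
                idx = rng.integers(0, len(X), size=len(X))
                models.append(SingleLayerCVNN(X.shape[1], int(y.max()) + 1).fit(X[idx], y[idx]))
            return models

        def ensemble_predict(models, X):
            # Average the component class probabilities and take the argmax.
            return np.mean([m.predict_proba(X) for m in models], axis=0).argmax(axis=1)

    With real data one would compare the ensemble decision against a single component trained on the full set, which is the comparison the paper reports.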

    Training Data Influence Analysis and Estimation: A Survey

    Full text link
    Good models require good training data. For overparameterized deep models, the causal relationship between training data and model predictions is increasingly opaque and poorly understood. Influence analysis partially demystifies training's underlying interactions by quantifying the amount each training instance alters the final model. Measuring the training data's influence exactly can be provably hard in the worst case; this has led to the development and use of influence estimators, which only approximate the true influence. This paper provides the first comprehensive survey of training data influence analysis and estimation. We begin by formalizing the various, and in places orthogonal, definitions of training data influence. We then organize state-of-the-art influence analysis methods into a taxonomy; we describe each of these methods in detail and compare their underlying assumptions, asymptotic complexities, and overall strengths and weaknesses. Finally, we propose future research directions to make influence analysis more useful in practice as well as more theoretically and empirically sound. A curated, up-to-date list of resources related to influence analysis is available at https://github.com/ZaydH/influence_analysis_papers
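    As a point of reference for what these estimators approximate, here is a minimal brute-force sketch of one common definition, leave-one-out influence (retrain without a training instance and measure the change in test loss); the model and dataset choices below are illustrative only, not taken from the survey.

        import numpy as np
        from sklearn.datasets import load_breast_cancer
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import log_loss
        from sklearn.model_selection import train_test_split
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        X, y = load_breast_cancer(return_X_y=True)
        # A small training set keeps the O(n) retraining loop quick for this illustration.
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=150, random_state=0)

        def test_loss(X_fit, y_fit):
            """Fit on the given training subset and return the test log-loss."""
            model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
            model.fit(X_fit, y_fit)
            return log_loss(y_te, model.predict_proba(X_te), labels=[0, 1])

        base = test_loss(X_tr, y_tr)

        # Leave-one-out influence of instance i: how much removing it changes the test loss.
        # Positive values mean the instance was helpful (its removal hurts the test loss).
        influence = np.array([
            test_loss(np.delete(X_tr, i, axis=0), np.delete(y_tr, i)) - base
            for i in range(len(X_tr))
        ])
        print("most helpful instance:", influence.argmax(), "most harmful:", influence.argmin())

    Each influence value costs one full retraining, which is precisely why the estimators surveyed in the paper approximate this quantity instead of computing it exactly.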

    Modification of Learning Ratio and Drop-Out for Stochastic Gradient Descendant Algorithm

    Get PDF
    The stochastic gradient descendant algorithm is one of the most popular neural network training algorithms. Many authors have contributed modifications or adaptations of its form and parametrization in order to improve its performance. In this paper, the authors propose two modifications of this algorithm that can yield better performance without significantly increasing the computational and time resources needed. The first is a dynamic learning ratio that depends on the network layer where it is applied, and the second is a dynamic drop-out rate that decreases over the training epochs. These techniques have been tested on different benchmark functions to observe their effect on the learning process. The results show that applying these techniques improves the learning performance of the neural network, especially when they are used together. The current study has been sponsored by the Government of the Basque Country-ELKARTEK21/10 KK-2021/00014 ("Estudio de nuevas técnicas de inteligencia artificial basadas en Deep Learning dirigidas a la optimización de procesos industriales") research program
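    The exact schedules are not specified in this summary, so the sketch below only illustrates the two ideas in PyTorch under simple assumptions: the learning rate is scaled down for deeper layers, and the dropout probability decays linearly over the epochs. The paper's actual schedules may differ.

        import torch
        import torch.nn as nn

        model = nn.Sequential(
            nn.Linear(32, 64), nn.ReLU(), nn.Dropout(p=0.5),
            nn.Linear(64, 64), nn.ReLU(), nn.Dropout(p=0.5),
            nn.Linear(64, 10),
        )

        # Idea 1: a layer-dependent learning rate (here, deeper linear layers get a smaller rate).
        linear_layers = [m for m in model if isinstance(m, nn.Linear)]
        param_groups = [
            {"params": layer.parameters(), "lr": 0.1 / (depth + 1)}
            for depth, layer in enumerate(linear_layers)
        ]
        optimizer = torch.optim.SGD(param_groups, lr=0.1)

        # Idea 2: a dropout probability that decreases as training progresses.
        def set_dropout(model, p):
            for m in model.modules():
                if isinstance(m, nn.Dropout):
                    m.p = p

        n_epochs, loss_fn = 20, nn.CrossEntropyLoss()
        x, y = torch.randn(256, 32), torch.randint(0, 10, (256,))  # toy data

        for epoch in range(n_epochs):
            set_dropout(model, 0.5 * (1 - epoch / n_epochs))  # linear decay from 0.5 towards 0
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()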

    Deep Learning based Recommender System: A Survey and New Perspectives

    Full text link
    With the ever-growing volume of online information, recommender systems have been an effective strategy to overcome such information overload. The utility of recommender systems cannot be overstated, given their widespread adoption in many web applications, along with their potential to ameliorate many problems related to over-choice. In recent years, deep learning has garnered considerable interest in many research fields such as computer vision and natural language processing, owing not only to its stellar performance but also to the attractive property of learning feature representations from scratch. The influence of deep learning is also pervasive, and it has recently demonstrated its effectiveness when applied to information retrieval and recommender systems research. Evidently, the field of deep learning in recommender systems is flourishing. This article aims to provide a comprehensive review of recent research efforts on deep learning based recommender systems. More concretely, we provide and devise a taxonomy of deep learning based recommendation models, along with a comprehensive summary of the state of the art. Finally, we expand on current trends and provide new perspectives pertaining to this exciting new development of the field. Comment: The paper has been accepted by ACM Computing Surveys. https://doi.acm.org/10.1145/328502

    Exploring CNNs: an application study on nuclei recognition task in colon cancer histology images

    Get PDF
    In this work we explore recent advances in the field of Convolutional Neural Networks (CNNs), with particular interest in the task of image classification. Moreover, we explore a new neural network algorithm, called the ladder network, which enables a semi-supervised framework on top of pre-existing neural networks. These techniques were applied to a task of nuclei classification in routine colon cancer histology images. Specifically, starting from an existing CNN developed for this purpose, we improve its performance by using better data augmentation, a more efficient initialization of the network, and the addition of batch normalization layers. These improvements were made to achieve a state-of-the-art architecture compatible with the ladder network algorithm. A custom version of the ladder network algorithm was implemented in our CNN in order to exploit the unlabeled data available in the dataset. However, we observed a deterioration in performance when using the unlabeled examples of this dataset, probably due to a distribution bias relative to the labeled ones. Even without the semi-supervised framework, the ladder algorithm yields a better representation in the CNN, which leads to a dramatic performance improvement over the starting CNN. We reach this result with only a small increase in the complexity of the final model, working specifically on the training process of the algorithm
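    The original architecture is not reproduced here; the sketch below only shows, on a generic small CNN, the kind of modifications described (data augmentation, a more careful He-style initialization, and batch normalization), not the authors' exact network or the ladder extension. The input size and class count are illustrative.

        import torch
        import torch.nn as nn
        from torchvision import transforms

        # Augmentation of the kind described (flips and rotations are typical for histology patches).
        augment = transforms.Compose([
            transforms.RandomHorizontalFlip(),
            transforms.RandomVerticalFlip(),
            transforms.RandomRotation(90),
            transforms.ToTensor(),
        ])

        class SmallCNN(nn.Module):
            def __init__(self, n_classes=4):
                super().__init__()
                self.features = nn.Sequential(
                    nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
                    nn.MaxPool2d(2),
                    nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
                    nn.AdaptiveAvgPool2d(1),
                )
                self.classifier = nn.Linear(64, n_classes)
                # More efficient initialization: He (Kaiming) init for conv and linear layers.
                for m in self.modules():
                    if isinstance(m, (nn.Conv2d, nn.Linear)):
                        nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
                        if m.bias is not None:
                            nn.init.zeros_(m.bias)

            def forward(self, x):
                return self.classifier(self.features(x).flatten(1))

        logits = SmallCNN()(torch.randn(8, 3, 27, 27))  # a batch of small RGB patches (size is illustrative)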

    Agree to Disagree: Diversity through Disagreement for Better Transferability

    Full text link
    Gradient-based learning algorithms have an implicit simplicity bias which in effect can limit the diversity of predictors being sampled by the learning procedure. This behavior can hinder the transferability of trained models by (i) favoring the learning of simpler but spurious features -- present in the training data but absent from the test data -- and (ii) only leveraging a small subset of predictive features. Such an effect is especially magnified when the test distribution does not exactly match the train distribution -- referred to as the Out of Distribution (OOD) generalization problem. However, given only the training data, it is not always possible to assess a priori whether a given feature is spurious or transferable. Instead, we advocate for learning an ensemble of models which capture a diverse set of predictive features. Towards this, we propose a new algorithm D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data, but disagreement on the OOD data. We show how D-BAT naturally emerges from the notion of generalized discrepancy, and demonstrate in multiple experiments how the proposed method can mitigate shortcut learning, enhance uncertainty and OOD detection, and improve transferability. Comment: 23 pages, 17 figures
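    The paper's exact disagreement objective is not reproduced here; below is only a simplified two-model sketch of the general recipe, using cross-entropy agreement with the labels on training data and a generic disagreement penalty (the dot product of the two predictive distributions) on unlabeled inputs assumed to be OOD. Models, data, and the weighting are all illustrative.

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        def make_model():
            return nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))

        m1, m2 = make_model(), make_model()
        opt = torch.optim.Adam(list(m1.parameters()) + list(m2.parameters()), lr=1e-3)

        x_train, y_train = torch.randn(128, 20), torch.randint(0, 2, (128,))  # labeled in-distribution data
        x_ood = torch.randn(128, 20) + 3.0                                    # unlabeled, assumed OOD

        alpha = 0.1  # weight of the disagreement term
        for step in range(200):
            # Agreement on the training distribution: both models must fit the labels.
            fit_loss = F.cross_entropy(m1(x_train), y_train) + F.cross_entropy(m2(x_train), y_train)

            # Disagreement on OOD inputs: penalize the models for putting probability mass
            # on the same classes (a simplified stand-in for D-BAT's disagreement objective).
            p1, p2 = F.softmax(m1(x_ood), dim=1), F.softmax(m2(x_ood), dim=1)
            agreement = (p1 * p2).sum(dim=1).mean()

            loss = fit_loss + alpha * agreement
            opt.zero_grad()
            loss.backward()
            opt.step()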

    ์–‘์žํ™”๋œ ๊นŠ์€ ์‹ ๊ฒฝ๋ง์˜ ํŠน์„ฑ ๋ถ„์„ ๋ฐ ์ตœ์ ํ™”

    Get PDF
    Ph.D. dissertation -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2020. Advisor: Wonyong Sung.
    Deep neural networks (DNNs) have achieved impressive performance on various machine learning tasks. However, performance improvements are usually accompanied by increased network complexity, incurring vast numbers of arithmetic operations and memory accesses. In addition, the recent increase in demand for running DNNs on resource-limited devices has led to a plethora of explorations in model compression and acceleration. Among them, network quantization is one of the most cost-efficient implementation methods for DNNs. Network quantization converts the precision of parameters and signals from 32-bit floating-point to 8-, 4-, or 2-bit fixed-point precision. Weight quantization can directly compress DNNs by reducing the number of representation levels of the parameters. Activation outputs can also be quantized to reduce the computational cost and the working-memory footprint. However, severe quantization degrades the performance of the network. Many previous studies focused on developing optimization methods for the quantization of given models without considering the effects of quantization on DNNs; therefore, extensive simulation is required to find a quantization precision that maintains performance on different models or datasets.
    In this dissertation, we attempt to measure the per-parameter capacity of DNN models and interpret the results to obtain insights into the optimal quantization of parameters. Uniform random vectors are sampled and used to train generic forms of fully connected DNNs, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). We conduct memorization and classification tests to study the effects of the number and precision of the parameters on performance. The model and per-parameter capacities are assessed by measuring the mutual information between the input and the classified output. To gain insight into parameter quantization on real tasks, the training and test performances are compared. In addition, we analyze and demonstrate that the quantization noise of weights and activations behaves differently during inference. Synthesized data is designed to visualize the effects of weight and activation quantization. The results indicate that deeper models are more sensitive to activation quantization, while wider models improve the resiliency to both weight and activation quantization.
    Considering the characteristics of the quantization errors, we propose a holistic approach for the optimization of quantized DNNs (QDNNs), which comprises QDNN training methods as well as quantization-friendly architecture design. Based on the observation that activation quantization induces noisy predictions, we propose Stochastic Precision Ensemble training for QDNNs (SPEQ). SPEQ is a teacher-student learning scheme, but the teacher and the student share the model parameters. We obtain the teacher's soft labels by stochastically changing the bit-precision of the activations at each layer of the forward-pass computation. The student model is trained with these soft labels to reduce the activation quantization noise. Instead of the KL-divergence, a cosine-distance loss is employed for the KD training. Since the teacher model changes continuously through random bit-precision assignment, the training exploits the effect of stochastic ensemble KD.
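    As a rough illustration of the training step just described (not the dissertation's exact recipe), the sketch below fake-quantizes the activations of a small network, draws a random per-layer bit-precision for a parameter-sharing teacher pass, and distills with a cosine-distance loss on top of the ordinary task loss. All bit-widths and coefficients are illustrative, and weight quantization is omitted for brevity.

        import random
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        def fake_quant(x, bits):
            """Unsigned uniform fake quantization for post-ReLU activations (straight-through gradient)."""
            if bits >= 32:
                return x
            scale = x.detach().abs().max().clamp(min=1e-8) / (2 ** bits - 1)
            q = torch.round(x / scale).clamp(0, 2 ** bits - 1) * scale
            return x + (q - x).detach()  # straight-through estimator

        class QuantMLP(nn.Module):
            """A small network whose hidden activations are quantized with per-layer bit-widths."""
            def __init__(self):
                super().__init__()
                self.layers = nn.ModuleList([nn.Linear(32, 64), nn.Linear(64, 64)])
                self.head = nn.Linear(64, 10)

            def forward(self, x, bits_per_layer):
                for layer, bits in zip(self.layers, bits_per_layer):
                    x = fake_quant(F.relu(layer(x)), bits)
                return self.head(x)

        model = QuantMLP()                       # teacher and student share these parameters
        opt = torch.optim.SGD(model.parameters(), lr=0.01)
        x, y = torch.randn(128, 32), torch.randint(0, 10, (128,))  # toy data
        student_bits = [2, 2]                    # target (low) activation precision

        for step in range(100):
            # Teacher pass: same parameters, but a randomly chosen activation precision per layer.
            with torch.no_grad():
                teacher_bits = [random.choice([4, 6, 8, 32]) for _ in student_bits]
                soft = model(x, teacher_bits)

            student = model(x, student_bits)
            # Cosine-distance distillation (instead of KL) plus the ordinary task loss.
            kd = 1 - F.cosine_similarity(student, soft, dim=1).mean()
            loss = F.cross_entropy(student, y) + kd
            opt.zero_grad()
            loss.backward()
            opt.step()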
    The SPEQ method performs well on various tasks, such as image classification, question answering, and transfer learning, without requiring cumbersome teacher networks.
    Recently, deep neural networks (DNNs) have shown very impressive performance in various fields. However, as network complexity grows, ever larger computation and memory-access costs are incurred. Quantization of neural networks is one of the effective ways to reduce the operating cost of DNNs. In general, the weights and activation outputs of a network have 32-bit floating-point precision. Fixed-point quantization reduces the size and computation cost of the network by representing them with lower precision. However, networks quantized to very low precision, such as 1 or 2 bits, show a large performance drop compared to floating-point networks. Previous studies propose optimization methods for a given dataset and model without analyzing the quantization error; to apply such results to other models and datasets, numerous simulations must be performed to find the limiting quantization precision that still maintains performance. In this work, we analyze the quantization characteristics of neural networks and identify the causes of the performance degradation induced by quantization. Network quantization is broadly divided into weight quantization and activation quantization. First, to analyze the characteristics of weight quantization, random training samples are generated, and the memorization capacity of a network is quantified by training the network on this data. After training the network to fully utilize its memorization capacity, we analyze the limiting quantization precision at which performance starts to degrade. The analysis shows that the quantization precision at which the weights begin to lose information is independent of the number of parameters; moreover, the limiting precision that preserves the information stored in the parameters depends on the model architecture. We also analyze the difference between the errors caused by activation quantization and weight quantization. Synthesized data is generated, models trained on this data are quantized, and the quantization errors are visualized. The analysis shows that weight quantization reduces the capacity of the network, and increasing the number of parameters reduces the weight quantization error; in contrast, activation quantization induces noise during inference, and this error is amplified as the network becomes deeper. Based on the difference between the two quantization errors, we propose a holistic fixed-point optimization method that includes quantization-friendly architecture design and fixed-point training methods. In addition, we propose the SPEQ training method to improve the performance resilience of networks with quantized activations. The proposed training method is a knowledge distillation (KD) based learning scheme that uses the information of a different teacher model at every training step.
์„ ์ƒ ๋ชจ๋ธ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ๋Š” ํ•™์ƒ ๋ชจ๋ธ๊ณผ ๋™์ผํ•˜๋ฉฐ, ํ™œ์„ฑํ™” ํ•จ์ˆ˜์˜ ์–‘์žํ™” ์ •๋ฐ€๋„๋ฅผ ํ™•๋ฅ ์ ์œผ๋กœ ์„ ํƒํ•จ์œผ๋กœ์จ ์„ ์ƒ ๋ชจ๋ธ์˜ ์†Œํ”„ํŠธ ๋ผ๋ฒจ(soft label)์„ ์ƒ์„ฑํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ์„ ์ƒ ๋ชจ๋ธ์€ ํ•™์ƒ ๋ชจ๋ธ์—์„œ ์œ ๋ฐœ๋˜๋Š” ์–‘์žํ™” ์žก์Œ์„ ๊ณ ๋ คํ•œ ์ง€์‹์„ ์ œ๊ณตํ•ด ์ค€๋‹ค. ํ•™์ƒ ๋ชจ๋ธ์€ ํ›ˆ๋ จ ๋‹จ๊ณ„๋งˆ๋‹ค ๋‹ค๋ฅธ ์ข…๋ฅ˜์˜ ์–‘์žํ™” ์žก์Œ์„ ๊ณ ๋ คํ•œ ์ง€์‹์œผ๋กœ ํ›ˆ๋ จ๋˜๊ธฐ ๋•Œ๋ฌธ์— ์•™์ƒ๋ธ” ํ•™์Šต(ensemble training) ํšจ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ๋‹ค. ์ œ์•ˆํ•˜๋Š” SPEQ ํ›ˆ๋ จ ๋ฐฉ๋ฒ•์€ ๋‹ค์–‘ํ•œ ๋ถ„์•ผ์—์„œ ์–‘์žํ™”๋œ ์‹ ๊ฒฝ๋ง์˜ ์„ฑ๋Šฅ์„ ํฌ๊ฒŒ ํ–ฅ์ƒ์‹œ์ผฐ๋‹ค.1 Introduction 1 1.1 Quantization of Deep Neural Networks 1 1.1.1 Weight and Activation Quantization on Deep Neural Networks 2 1.1.2 Analysis of Quantized Deep Neural Networks 3 1.2 Scope of the Dissertation 4 1.2.1 Characterization of Quantization Errors 4 1.2.2 Optimization of Quantized Deep Neural Networks 6 2 Memorization Capacity of Deep Neural Networks under Parameter Quantization 8 2.1 Introduction 8 2.2 Related Works and Backgrounds 10 2.2.1 Neural Network Capacity 10 2.2.2 Fixed-Point Deep Neural Networks 11 2.3 Network Capacity Measurements of DNNs 11 2.3.1 Capacity Measurements on a Memorization Task 11 2.3.2 Network Quantization Method 13 2.3.3 Network Quantization and Parameter Capacity 14 2.4 Experimental Results on Capacity of Floating-point DNNs 15 2.4.1 Capacity of FCDNNs 15 2.4.2 Capacity of CNNs 19 2.4.3 Capacity of RNNs 19 2.5 Experimental Results of Parameter Quantization 21 2.5.1 Capacity under Parameter Quantization 21 2.5.2 Quantization Experiments on CIFAR-10 Dataset 23 2.5.3 Quantization Experiments on Shuffled CIFAR-10 Dataset 25 2.6 Concluding Remarks 28 3 Characterization and Holistic Optimization of Quantized Deep Neural Networks 30 3.1 Introduction 30 3.2 Backgrounds 32 3.2.1 Related Works on Network Quantization 32 3.2.2 Revisit of QDNN Optimization 33 3.3 Visualization of Quantization Errors using Synthetic Dataset 34 3.3.1 Synthetic Dataset Generation 34 3.3.2 Results on Synthetic Dataset 37 3.4 QDNN Optimization with Architectural Transformation and Improved Training 39 3.4.1 Architecture Transformation for Improved Robustness to Quantization 40 3.4.2 Cyclical Learning Rate Scheduling for Improved Generalization 41 3.4.3 Regularization for Limiting the Activation Noise Amplification 42 3.5 Experimental Results 42 3.5.1 Visualizing the Effects of Quantization on the Segmentation Task 42 3.5.2 The Width and Depth Effects on QDNNs 44 3.5.3 QDNN Architecture Selection under the Parameter Constraint 49 3.5.4 Results of Training Methods on QDNNs 51 3.6 Concluding Remarks 53 4 Parameter Shared Stochastic Precision Knowledge Distillation for Quantized Deep Neural Networks 55 4.1 Introduction 55 4.2 Background and Related Works 58 4.2.1 Quantization of Deep Neural Networks 58 4.2.2 Knowledge Distillation for Quantization 59 4.3 Stochastic Precision Ensemble Training for QDNNs 60 4.3.1 Quantization Method 60 4.3.2 Stochastic Precision Self-Distillation with Model Sharing 61 4.3.3 Stochastic Ensemble Learning 63 4.3.4 Cosine Similarity Learning 65 4.4 Experimental Results 70 4.4.1 Experiment Setup 70 4.4.2 Results on CIFAR-10 and CIFAR-100 Datasets 70 4.4.3 Results on ImageNet Dataset 73 4.4.4 Results on Transfer Learning 76 4.5 Concluding Remarks 78 5 Conclusion 80 Abstract (In Korean) 97 ๊ฐ์‚ฌ์˜ ๊ธ€ 99Docto