Generalization Error in Deep Learning
Deep learning models have lately shown great performance in various fields
such as computer vision, speech recognition, speech translation, and natural
language processing. However, alongside their state-of-the-art performance,
the source of their generalization ability is still generally unclear.
Thus, an important question is what makes deep neural networks able to
generalize well from the training set to new data. In this article, we provide
an overview of the existing theory and bounds for the characterization of the
generalization error of deep neural networks, combining both classical and more
recent theoretical and empirical results.
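As a concrete example of the classical results such a survey covers, a finite hypothesis class admits the standard Hoeffding-plus-union bound (a generic textbook result, not one specific to this article): with probability at least 1 - δ over an i.i.d. sample of size n,

```latex
R(h) \;\le\; \widehat{R}_n(h) \;+\; \sqrt{\frac{\ln\lvert\mathcal{H}\rvert + \ln(1/\delta)}{2n}}
\qquad \text{for all } h \in \mathcal{H},
```

where R(h) is the true risk and R̂ₙ(h) the empirical risk; the gap shrinks as n grows and widens with the capacity of the hypothesis class.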
Ensemble of Single-Layered Complex-Valued Neural Networks for Classification Tasks
This paper presents ensemble approaches using single-layered complex-valued
neural networks (CVNNs) to solve real-valued classification problems. Each
component CVNN of an ensemble uses a recently proposed activation function
for its complex-valued neurons (CVNs). A gradient-descent-based learning
algorithm was used to train the component CVNNs. We applied two ensemble
methods, negative correlation learning and bagging, to create the ensembles.
Experimental results on a number of real-world benchmark problems showed a
substantial performance improvement of the ensembles over the individual
single-layered CVNN classifiers. Furthermore, the generalization performance
was nearly equivalent to that obtained by ensembles of real-valued
multilayer neural networks.
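A minimal sketch of one component classifier and the bagging combination might look as follows; the magnitude-based activation is an illustrative stand-in, since the abstract does not specify the recently proposed activation function it refers to:

```python
import numpy as np

def cvnn_forward(x, W, b):
    # Complex pre-activation, then an illustrative magnitude-squashing
    # activation mapping complex outputs to real scores in (0, 1).
    # NOTE: the paper's actual activation function is an assumption here.
    z = x @ W + b
    m = np.abs(z)
    return m / (1.0 + m)

def bagging_ensemble(models, x):
    # Bagging at prediction time: average the real-valued outputs
    # of the component CVNNs.
    return np.mean([m(x) for m in models], axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=4) + 1j * rng.normal(size=4)        # complex input
W = rng.normal(size=(4, 3)) + 1j * rng.normal(size=(4, 3))
b = np.zeros(3, dtype=complex)
scores = cvnn_forward(x, W, b)                           # 3 class scores
```

Each component would be trained separately (e.g., on bootstrap samples for bagging) before being averaged.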
Training Data Influence Analysis and Estimation: A Survey
Good models require good training data. For overparameterized deep models,
the causal relationship between training data and model predictions is
increasingly opaque and poorly understood. Influence analysis partially
demystifies training's underlying interactions by quantifying the amount each
training instance alters the final model. Measuring the training data's
influence exactly can be provably hard in the worst case; this has led to the
development and use of influence estimators, which only approximate the true
influence. This paper provides the first comprehensive survey of training data
influence analysis and estimation. We begin by formalizing the various, and in
places orthogonal, definitions of training data influence. We then organize
state-of-the-art influence analysis methods into a taxonomy; we describe each
of these methods in detail and compare their underlying assumptions, asymptotic
complexities, and overall strengths and weaknesses. Finally, we propose future
research directions to make influence analysis more useful in practice as well
as more theoretically and empirically sound. A curated, up-to-date list of
resources related to influence analysis is available at
https://github.com/ZaydH/influence_analysis_papers
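To make the notion concrete, here is a small sketch of exact leave-one-out influence on a ridge-regression model; the estimators the survey covers exist precisely because this brute-force refitting is infeasible for deep models (names and the test metric are illustrative):

```python
import numpy as np

def fit_ridge(X, y, lam=1e-2):
    # Closed-form ridge regression: w = (X'X + lam*I)^-1 X'y
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def loo_influence(X, y, x_test, y_test, lam=1e-2):
    # Exact leave-one-out influence: how much removing each training
    # point changes the squared error on (x_test, y_test).
    # Negative influence => removing the point helps the test point.
    w_full = fit_ridge(X, y, lam)
    base = (x_test @ w_full - y_test) ** 2
    infl = np.empty(len(X))
    for i in range(len(X)):
        mask = np.arange(len(X)) != i
        w_i = fit_ridge(X[mask], y[mask], lam)
        infl[i] = (x_test @ w_i - y_test) ** 2 - base
    return infl
```

On a toy linear dataset with one mislabeled point, the outlier is the only training instance whose removal lowers the test error, so it stands out immediately.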
Modification of Learning Ratio and Drop-Out for Stochastic Gradient Descendant Algorithm
The stochastic gradient descendant algorithm is one of the most popular neural network training algorithms. Many authors have contributed to modifying or adapting its shape and parametrizations in order to improve its performance. In this paper, the authors propose two modifications to this algorithm that can result in better performance without significantly increasing the computational and time resources needed. The first is a dynamic learning ratio that depends on the network layer where it is applied, and the second is a dynamic drop-out that decreases through the epochs of training. These techniques have been tested against different benchmark functions to see their effect on the learning process. The obtained results show that applying these techniques improves the learning performance of the neural network, especially when they are used together. The current study has been sponsored by the Government of the Basque Country, ELKARTEK21/10 KK-2021/00014 ("Estudio de nuevas técnicas de inteligencia artificial basadas en Deep Learning dirigidas a la optimización de procesos industriales", i.e., "Study of new artificial intelligence techniques based on Deep Learning aimed at the optimization of industrial processes") research program.
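The two modifications can be sketched as simple schedules; the exact functional forms below (a linearly growing per-layer ratio and a linearly decaying drop-out) are illustrative assumptions, since the abstract does not give the authors' formulas:

```python
def layer_learning_rate(base_lr, layer_idx, n_layers):
    # Hypothetical layer-dependent learning ratio: deeper layers get a
    # larger step. The paper's exact per-layer schedule is not given
    # in the abstract, so this linear ramp is an assumption.
    return base_lr * (1.0 + layer_idx / max(n_layers - 1, 1))

def dropout_rate(p0, epoch, n_epochs):
    # Dynamic drop-out that decreases through the epochs of training,
    # as the abstract describes (linear decay assumed here).
    return p0 * (1.0 - epoch / n_epochs)
```

During training, each layer's update would use `layer_learning_rate(...)` for its step size, and the drop-out mask would be drawn with probability `dropout_rate(...)` for the current epoch.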
Deep Learning based Recommender System: A Survey and New Perspectives
With the ever-growing volume of online information, recommender systems have
been an effective strategy to overcome such information overload. The utility
of recommender systems cannot be overstated, given their widespread adoption in
many web applications, along with their potential to ameliorate many
problems related to over-choice. In recent years, deep learning has garnered
considerable interest in many research fields such as computer vision and
natural language processing, owing not only to stellar performance but also to
the attractive property of learning feature representations from scratch. The
influence of deep learning is also pervasive, recently demonstrating its
effectiveness when applied to information retrieval and recommender systems
research. Evidently, the field of deep learning in recommender systems is
flourishing. This article aims to provide a comprehensive review of recent
research efforts on deep learning based recommender systems. More concretely,
we provide and devise a taxonomy of deep learning based recommendation models,
along with a comprehensive summary of the state of the art. Finally,
we expand on current trends and provide new perspectives pertaining to this
exciting new development of the field. Comment: The paper has been accepted by ACM Computing Surveys.
https://doi.acm.org/10.1145/328502
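As a minimal illustration of the classical latent-factor model that the surveyed deep recommenders build on, here is a sketch of matrix factorization trained by SGD (hyperparameters and names are illustrative, not from the article):

```python
import numpy as np

def train_mf(ratings, n_users, n_items, k=4, lr=0.05, reg=0.01,
             epochs=300, seed=0):
    # Minimal latent-factor recommender: approximate each rating r_{ui}
    # by the dot product of user and item embeddings, fit by SGD.
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.normal(size=(n_users, k))   # user factors
    Q = 0.1 * rng.normal(size=(n_items, k))   # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            pu = P[u].copy()
            err = r - pu @ Q[i]
            P[u] += lr * (err * Q[i] - reg * pu)
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q
```

Deep learning based models replace the dot product and linear embeddings with learned nonlinear interaction functions, which is the axis along which the survey's taxonomy is organized.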
Exploring CNNs: an application study on nuclei recognition task in colon cancer histology images
In this work we explore recent advances in the field of Convolutional Neural Networks (CNNs), with particular interest in the task of image classification. Moreover, we explore a new neural network algorithm, called the ladder network, which enables a semi-supervised framework on pre-existing neural networks.
These techniques were applied to a task of nuclei classification in routine colon cancer histology images.
Specifically, starting from an existing CNN developed for this purpose, we improve its performance by using better data augmentation, a more efficient initialization of the network, and a batch normalization layer. These improvements were made to achieve a state-of-the-art architecture that could be compatible with the ladder network algorithm. A custom version of the ladder network algorithm was implemented in our CNN in order to use the unlabeled data provided with the database.
However, we observed a deterioration in performance when using the unlabeled examples of this database, probably due to a distribution bias in them compared to the labeled ones.
Even without the semi-supervised framework, the ladder algorithm yields a better representation in the CNN, which leads to a dramatic performance improvement over the starting CNN algorithm.
We reach this result with only a small increase in the complexity of the final model, working specifically on the training process of the algorithm.
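One of the improvements mentioned, the batch normalization layer, can be sketched in a few lines (training-mode statistics only; the learnable scale/shift and the running averages used at inference are simplified away):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    # Batch normalization, training mode: standardize each feature
    # over the mini-batch, then scale and shift.
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

With the default gamma and beta, every feature of the output batch has approximately zero mean and unit variance, which stabilizes training of the layers above it.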
Agree to Disagree: Diversity through Disagreement for Better Transferability
Gradient-based learning algorithms have an implicit simplicity bias which in
effect can limit the diversity of predictors being sampled by the learning
procedure. This behavior can hinder the transferability of trained models by
(i) favoring the learning of simpler but spurious features -- present in the
training data but absent from the test data -- and (ii) by only leveraging a
small subset of predictive features. Such an effect is especially magnified
when the test distribution does not exactly match the train distribution --
referred to as the Out of Distribution (OOD) generalization problem. However,
given only the training data, it is not always possible to assess a priori whether a
given feature is spurious or transferable. Instead, we advocate for learning an
ensemble of models which capture a diverse set of predictive features. Towards
this, we propose a new algorithm D-BAT (Diversity-By-disAgreement Training),
which enforces agreement among the models on the training data, but
disagreement on the OOD data. We show how D-BAT naturally emerges from the
notion of generalized discrepancy, as well as demonstrate in multiple
experiments how the proposed method can mitigate shortcut learning, enhance
uncertainty and OOD detection, as well as improve transferability. Comment: 23 pages, 17 figures
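A sketch of the kind of two-model objective the abstract describes: cross-entropy (agreement with labels) on training data plus a term rewarding disagreement on unlabeled OOD inputs. The weighting and the exact disagreement term below are simplified assumptions, not the paper's precise formulation:

```python
import numpy as np

def dbat_loss(p1, p2, y, p1_ood, p2_ood, alpha=1.0, eps=1e-9):
    # p1, p2: the two models' predicted P(class=1) on labeled training
    # data; p1_ood, p2_ood: their predictions on unlabeled OOD data.
    ce1 = -np.mean(y * np.log(p1 + eps) + (1 - y) * np.log(1 - p1 + eps))
    ce2 = -np.mean(y * np.log(p2 + eps) + (1 - y) * np.log(1 - p2 + eps))
    # Probability that both models predict the same class on OOD data;
    # penalizing it pushes the models toward disagreement there.
    agree = p1_ood * p2_ood + (1 - p1_ood) * (1 - p2_ood)
    return ce1 + ce2 + alpha * np.mean(-np.log(1 - agree + eps))
```

Holding the training terms fixed, the loss is lower when the two models disagree on the OOD inputs, which is exactly the pressure toward diverse predictive features the abstract argues for.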
Characterization and Optimization of Quantized Deep Neural Networks
Thesis (Ph.D.) -- Graduate School of Seoul National University, College of Engineering, Department of Electrical and Computer Engineering, August 2020. Advisor: Wonyong Sung.
Deep neural networks (DNNs) have achieved impressive performance on various machine learning tasks. However, performance improvements are usually accompanied by increased network complexity, incurring vast numbers of arithmetic operations and memory accesses. In addition, the recent increase in demand for deploying DNNs on resource-limited devices has led to a plethora of explorations in model compression and acceleration. Among them, network quantization is one of the most cost-efficient implementation methods for DNNs. Network quantization converts the precision of parameters and signals from 32-bit floating point to 8-, 4-, or 2-bit fixed-point precision. Weight quantization can directly compress DNNs by reducing the representation levels of the parameters. Activation outputs can also be quantized to reduce the computational costs and working-memory footprint. However, severe quantization degrades the performance of the network. Many previous studies focused on developing optimization methods for the quantization of given models without considering the effects of the quantization on DNNs. Therefore, extensive simulation is required to obtain a quantization precision that maintains performance on different models or datasets.
In this dissertation, we attempt to measure the per-parameter capacity of DNN models and interpret the results to obtain insights into the optimum quantization of parameters. Uniform random vectors are sampled and used for training generic forms of fully connected DNNs, convolutional neural networks (CNNs), and recurrent neural networks (RNNs). We conduct memorization and classification tests to study the effects of the number and precision of parameters on performance. The model and per-parameter capacities are assessed by measuring the mutual information between the input and the classified output. To gain insight into parameter quantization when performing real tasks, the training and test performances are compared.
In addition, we analyze and demonstrate that the quantization noise of weights and activations behaves differently at inference time. Synthesized data is designed to visualize the effects of weight and activation quantization. The results indicate that deeper models are more prone to activation quantization, while wider models improve the resiliency to both weight and activation quantization. Considering the characteristics of the quantization errors, we propose a holistic approach for the optimization of QDNNs, which contains QDNN training methods as well as quantization-friendly architecture design.
Based on the observation that activation quantization induces noisy predictions, we propose Stochastic Precision Ensemble training for QDNNs (SPEQ). SPEQ is a teacher-student learning scheme in which the teacher and the student share the model parameters. We obtain the teacher's soft labels by stochastically changing the bit precision of the activations at each layer of the forward-pass computation. The student model is trained with these soft labels to reduce the activation quantization noise. Instead of the KL divergence, a cosine-distance loss is employed for the KD training. Since the teacher model changes continuously through random bit-precision assignment, the method exploits the effect of stochastic ensemble KD. SPEQ outperforms existing quantization training methods on various tasks, such as image classification, question answering, and transfer learning, without requiring cumbersome teacher networks.
Korean abstract (translated): Deep neural networks (DNNs) have recently shown very impressive performance in various fields. However, as network complexity grows, increasingly large computation and memory-access costs arise. Quantization of neural networks is one of the effective methods for reducing the high cost of deep neural networks. In general, network weights and activation outputs have 32-bit floating-point precision. Fixed-point quantization represents them with lower precision, reducing the size and computation cost of the network. However, at very low precision such as 1 or 2 bits, quantized networks show a large performance drop compared with floating-point networks. Previous studies proposed optimization methods for a given dataset and model without analyzing the quantization error, so applying their results to other models and datasets requires numerous simulations to find the limit of quantization precision that maintains performance.
In this work, we analyze the characteristics of quantization in neural networks and identify the causes of the resulting performance degradation. Network quantization largely divides into weight quantization and activation quantization. First, to analyze the characteristics of weight quantization, we generate random training samples, train networks on these data, and quantify their memorization capacity. After training networks to make full use of their memorization capacity, we analyze the quantization precision at which performance drops. The analysis confirms that the precision at which weights begin to lose information is related to the number of parameters; moreover, the minimum precision that preserves the information stored in the parameters depends on the model architecture.
We also analyze the difference between the errors caused by activation quantization and weight quantization. We generate synthesized data and visualize the quantization errors after quantizing models trained on it. The results show that weight quantization reduces the capacity of the network, and increasing the number of parameters reduces the weight quantization error; in contrast, activation quantization induces noise during inference, and this error is amplified as the network becomes deeper. Based on the difference between the two quantization errors, we propose a holistic fixed-point optimization method that includes quantization-friendly architecture design and fixed-point training methods.
Furthermore, we propose the SPEQ training method to improve the resiliency of activation-quantized networks. The proposed method is a knowledge distillation (KD) based training scheme that exploits the information of a different-precision model at every training step. The teacher model shares its parameters with the student model, and the teacher's soft labels are generated by stochastically selecting the activation quantization precision. The teacher thus provides the student with knowledge that accounts for quantization noise. Since the student is trained at each step with knowledge reflecting a different kind of quantization noise, an ensemble training effect is obtained. The proposed SPEQ training method greatly improves the performance of quantized neural networks in various fields.
1 Introduction
1.1 Quantization of Deep Neural Networks
1.1.1 Weight and Activation Quantization on Deep Neural Networks
1.1.2 Analysis of Quantized Deep Neural Networks
1.2 Scope of the Dissertation
1.2.1 Characterization of Quantization Errors
1.2.2 Optimization of Quantized Deep Neural Networks
2 Memorization Capacity of Deep Neural Networks under Parameter Quantization
2.1 Introduction
2.2 Related Works and Backgrounds
2.2.1 Neural Network Capacity
2.2.2 Fixed-Point Deep Neural Networks
2.3 Network Capacity Measurements of DNNs
2.3.1 Capacity Measurements on a Memorization Task
2.3.2 Network Quantization Method
2.3.3 Network Quantization and Parameter Capacity
2.4 Experimental Results on Capacity of Floating-point DNNs
2.4.1 Capacity of FCDNNs
2.4.2 Capacity of CNNs
2.4.3 Capacity of RNNs
2.5 Experimental Results of Parameter Quantization
2.5.1 Capacity under Parameter Quantization
2.5.2 Quantization Experiments on CIFAR-10 Dataset
2.5.3 Quantization Experiments on Shuffled CIFAR-10 Dataset
2.6 Concluding Remarks
3 Characterization and Holistic Optimization of Quantized Deep Neural Networks
3.1 Introduction
3.2 Backgrounds
3.2.1 Related Works on Network Quantization
3.2.2 Revisit of QDNN Optimization
3.3 Visualization of Quantization Errors using Synthetic Dataset
3.3.1 Synthetic Dataset Generation
3.3.2 Results on Synthetic Dataset
3.4 QDNN Optimization with Architectural Transformation and Improved Training
3.4.1 Architecture Transformation for Improved Robustness to Quantization
3.4.2 Cyclical Learning Rate Scheduling for Improved Generalization
3.4.3 Regularization for Limiting the Activation Noise Amplification
3.5 Experimental Results
3.5.1 Visualizing the Effects of Quantization on the Segmentation Task
3.5.2 The Width and Depth Effects on QDNNs
3.5.3 QDNN Architecture Selection under the Parameter Constraint
3.5.4 Results of Training Methods on QDNNs
3.6 Concluding Remarks
4 Parameter Shared Stochastic Precision Knowledge Distillation for Quantized Deep Neural Networks
4.1 Introduction
4.2 Background and Related Works
4.2.1 Quantization of Deep Neural Networks
4.2.2 Knowledge Distillation for Quantization
4.3 Stochastic Precision Ensemble Training for QDNNs
4.3.1 Quantization Method
4.3.2 Stochastic Precision Self-Distillation with Model Sharing
4.3.3 Stochastic Ensemble Learning
4.3.4 Cosine Similarity Learning
4.4 Experimental Results
4.4.1 Experiment Setup
4.4.2 Results on CIFAR-10 and CIFAR-100 Datasets
4.4.3 Results on ImageNet Dataset
4.4.4 Results on Transfer Learning
4.5 Concluding Remarks
5 Conclusion
Abstract (In Korean)
Acknowledgements
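The weight quantization studied throughout the dissertation can be illustrated with a generic symmetric uniform quantizer (a common scheme used for fixed-point DNNs; the dissertation's exact method may differ):

```python
import numpy as np

def quantize_uniform(w, n_bits):
    # Symmetric uniform quantization of a weight tensor to n_bits
    # (n_bits >= 2): map to 2^(n_bits-1)-1 positive levels, zero, and
    # the mirrored negative levels, then scale back.
    n_levels = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(w)) / n_levels
    return np.round(w / scale) * scale
```

Each weight lands within half a step of its original value, and lowering `n_bits` coarsens the grid, which is exactly the trade-off between compression and the quantization error the dissertation characterizes.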