Improved Knowledge Distillation via Teacher Assistant

Author: Farajtabar Mehrdad
Ghasemzadeh Hassan
Levine Nir
Li Ang
Matsukawa Akihiro
Mirzadeh Seyed-Iman
Publication date: 16/12/2019

Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large (teacher) pre-trained network is used to train a smaller (student) network. However, in this paper, we show that the student network performance degrades when the gap between student and teacher is large. Given a fixed student network, one cannot employ an arbitrarily large teacher, or in other words, a teacher can effectively transfer its knowledge to students up to a certain size, not smaller. To alleviate this shortcoming, we introduce multi-step knowledge distillation, which employs an intermediate-sized network (teacher assistant) to bridge the gap between the student and the teacher. Moreover, we study the effect of teacher assistant size and extend the framework to multi-step distillation. Theoretical analysis and extensive experiments on CIFAR-10, CIFAR-100, and ImageNet datasets and on CNN and ResNet architectures substantiate the effectiveness of our proposed approach.

Comment: AAAI 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
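
The teacher-assistant idea in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used distillation objective (hard-label cross-entropy mixed with a temperature-softened KL term); the listing itself does not give the loss, so the function names, the temperature, and the mixing weight `lam` below are illustrative assumptions rather than the authors' implementation.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, lam=0.5):
    """Assumed objective: (1 - lam) * CE(label, student) + lam * T^2 * KL(teacher || student).

    temperature and lam are hypothetical defaults, not values from the listing.
    """
    # Hard-label cross-entropy at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # KL divergence between the softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student) if t > 0)
    return (1 - lam) * ce + lam * temperature ** 2 * kl

def distillation_chain(models):
    """Pair each model with the next smaller one: teacher -> assistant(s) -> student."""
    return list(zip(models, models[1:]))

# One distillation step per (teacher, student) pair, largest model first.
steps = distillation_chain(["teacher", "teacher_assistant", "student"])
```

Each pair in `steps` would be trained with `kd_loss`, so the student learns from the intermediate-sized assistant rather than directly from the much larger teacher.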

Wide Neural Networks Forget Less Catastrophically

Author: Chaudhry Arslan
Farajtabar Mehrdad
Gorur Dilan
Hu Huiyi
Mirzadeh Seyed Iman
Pascanu Razvan
Yin Dong
Publication date: 14/07/2022

A primary focus area in continual learning research is alleviating the "catastrophic forgetting" problem in neural networks by designing new algorithms that are more robust to the distribution shifts. While the recent progress in continual learning literature is encouraging, our understanding of what properties of neural networks contribute to catastrophic forgetting is still limited. To address this, instead of focusing on continual learning algorithms, in this work, we focus on the model itself and study the impact of "width" of the neural network architecture on catastrophic forgetting, and show that width has a surprisingly significant effect on forgetting. To explain this effect, we study the learning dynamics of the network from various perspectives such as gradient orthogonality, sparsity, and lazy training regime. We provide potential explanations that are consistent with the empirical results across different architectures and continual learning benchmarks.

Comment: ICML 202

arXiv.org e-Print Archive

Use of machine learning to predict medication adherence in individuals at risk for atherosclerotic cardiovascular disease.

Author: Mirzadeh Seyed Iman
Publication date: 06/07/2023

Ezid

Alleviating Catastrophic Forgetting in Continual Learning

Author: Mirzadeh Seyed Iman
Publication venue: Washington State University
Publication date: 01/01/2022

Machine learning has enjoyed rapid and substantial advances in the past few years. However, machine learning models cannot learn continually as we humans do. Humans are continual learners, meaning they can accumulate knowledge, use the previous knowledge to learn from new experiences better, and retain knowledge from previous experiences. In contrast, current machine learning models learn in an isolated manner, meaning there is no notion of time (e.g., past or present) in their closed-world learning. The goal of continual learning is to mimic the learning mechanism of humans for machines with significant impacts on the machine learning community. However, this is a challenging problem since current machine learning systems suffer from the catastrophic forgetting problem, meaning they cannot preserve their learned knowledge. Catastrophic forgetting happens mainly because the model is trained sequentially over evolving data distributions. Consequently, the representations the model has learned for previous data will change to adapt to the new data, and the new representations are no longer adequate for the past data. While the recent progress in continual learning is encouraging, our understanding of the catastrophic forgetting problem is still limited. This dissertation aims to understand the continual learning problem better and fill this knowledge gap by studying the theoretical and practical implications of the catastrophic forgetting problem for deep learning models. We will study the catastrophic forgetting problem from various perspectives and show that the optimization, training regime, loss landscape, and architectures of neural networks all play a significant role in alleviating the forgetting. We then use the gained insights to develop continual agents that are more robust to catastrophic forgetting.

Washington State University institutional repository

Improved knowledge distillation for deep neural networks

Author: Mirzadeh Seyed Iman
Publication venue: Washington State University
Publication date: 01/12/2020

Despite the fact that deep neural networks are powerful models and achieve appealing results on many tasks, they are too large to be deployed on edge devices like smartphones or embedded sensor nodes. There have been efforts to compress these networks, and a popular method is knowledge distillation, where a large (teacher) pre-trained network is used to train a smaller (student) network. However, in this thesis, we show that the student network performance degrades when the gap between student and teacher is large. Given a fixed student network, one cannot employ an arbitrarily large teacher, or in other words, a teacher can effectively transfer its knowledge to students up to a certain size, not smaller. To alleviate this shortcoming, we introduce multi-step knowledge distillation, which employs an intermediate-sized network (teacher assistant) to bridge the gap between the student and the teacher. Moreover, we study the effect of teacher assistant size and extend the framework to multi-step distillation. Theoretical analysis and extensive experiments on CIFAR-10, CIFAR-100, and ImageNet datasets and on CNN and ResNet architectures substantiate the effectiveness of our proposed approach.

Washington State University institutional repository

Continual Learning Beyond a Single Model

Author: Doan Thang
Farajtabar Mehrdad
Mirzadeh Seyed Iman
Publication date: 27/11/2022

A growing body of research in continual learning focuses on the catastrophic forgetting problem. While many attempts have been made to alleviate this problem, the majority of the methods assume a single model in the continual learning setup. In this work, we question this assumption and show that employing ensemble models can be a simple yet effective method to improve continual performance. However, ensembles' training and inference costs can increase significantly as the number of models grows. Motivated by this limitation, we study different ensemble models to understand their benefits and drawbacks in continual learning scenarios. Finally, to overcome the high compute cost of ensembles, we leverage recent advances in neural network subspace to propose a computationally cheap algorithm with similar runtime to a single model yet enjoying the performance benefits of ensembles.

Comment: Keywords: continual learning, neural network subspaces, efficient trainin

arXiv.org e-Print Archive

Use of machine learning to predict medication adherence in individuals at risk for atherosclerotic cardiovascular disease

Author: Ardo Jessica
Arefeen Asiful
Cook Diane
Evangelista Lorraine S
Fallahzadeh Ramin
Ghasemzadeh Hassan
Hildebrand Janett A
Lee Jung-Ah
Minor Bryan
Mirzadeh Seyed Iman
Publication venue: eScholarship, University of California
Publication date: 01/12/2022

Background: Medication nonadherence is a critical problem with severe implications in individuals at risk for atherosclerotic cardiovascular disease. Many studies have attempted to predict medication adherence in this population, but few, if any, have been effective in prediction, suggesting that essential risk factors remain unidentified.

Objective: This study's objective was to (1) establish an accurate prediction model of medication adherence in individuals at risk for atherosclerotic cardiovascular disease and (2) identify significant contributing factors to the predictive accuracy of medication adherence. In particular, we aimed to use only the baseline questionnaire data to assess medication adherence prediction feasibility.

Methods: A sample of 40 individuals at risk for atherosclerotic cardiovascular disease was recruited for an eight-week feasibility study. After collecting baseline data, we recorded data from a pillbox that sent events to a cloud-based server. Health measures and medication use events were analyzed using machine learning algorithms to identify variables that best predict medication adherence.

Results: Our adherence prediction model, based on only the ten most relevant variables, achieved an average error rate of 12.9%. Medication adherence was closely correlated with being encouraged to play an active role in their treatment, having confidence about what to do in an emergency, knowledge about their medications, and having a special person in their life.

Conclusions: Our results showed the significance of clinical and psychosocial factors for predicting medication adherence in people at risk for atherosclerotic cardiovascular diseases. Clinicians and researchers can use these factors to stratify individuals to make evidence-based decisions to reduce the risks.

eScholarship - University of California