7 research outputs found
Fast Yet Effective Machine Unlearning
Unlearning the data observed during the training of a machine learning (ML)
model is an important task that can play a pivotal role in fortifying the
privacy and security of ML-based applications. This paper raises the following
questions: (i) can we unlearn a single or multiple classes of data from an ML
model without looking at the full training data even once? (ii) can we make the
process of unlearning fast and scalable to large datasets, and generalize it to
different deep networks? We introduce a novel machine unlearning framework with
error-maximizing noise generation and impair-repair based weight manipulation
that offers an efficient solution to the above questions. An error-maximizing
noise matrix is learned for the class to be unlearned using the original model.
The noise matrix is used to manipulate the model weights to unlearn the
targeted class of data. We introduce impair and repair steps for a controlled
manipulation of the network weights. In the impair step, the noise matrix along
with a very high learning rate is used to induce sharp unlearning in the model.
Thereafter, the repair step is used to regain the overall performance. With
very few update steps, we show excellent unlearning while substantially
retaining the overall model accuracy. Unlearning multiple classes requires a
similar number of update steps as for the single class, making our approach
scalable to large problems. Our method is quite efficient in comparison to the
existing methods, works for multi-class unlearning, doesn't put any constraints
on the original optimization mechanism or network design, and works well in
both small and large-scale vision tasks. This work is an important step towards
fast and easy implementation of unlearning in deep networks. We will make the
source code publicly available.
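As a rough illustration, the impair-repair idea can be sketched on a toy linear softmax classifier. This is a hypothetical, simplified analogue: the noise-ascent objective, noise normalization, learning rates, and step counts below are assumptions, and the paper applies the method to deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def grad_ce(X, y, W):
    # Average cross-entropy gradient w.r.t. the weight matrix W.
    p = softmax(X @ W)
    p[np.arange(len(y)), y] -= 1.0
    return X.T @ p / len(y)

# Toy 3-class problem standing in for a trained network; class 0 is forgotten.
centers = np.array([[-2.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
X = np.vstack([rng.normal(loc=centers[c], scale=0.3, size=(50, 2))
               for c in range(3)])
y = np.repeat(np.arange(3), 50)
W = rng.normal(size=(2, 3)) * 0.1
for _ in range(200):                                    # "original" training
    W -= 0.5 * grad_ce(X, y, W)

# 1) Error-maximizing noise for the forget class: gradient ascent on the
#    class-0 loss w.r.t. the inputs, with the model frozen.
N = rng.normal(size=(50, 2))
y_f = np.zeros(50, dtype=int)
for _ in range(100):
    p = softmax(N @ W)
    N += 0.1 * ((p - np.eye(3)[y_f]) @ W.T)             # d(loss)/d(input)
N = 2.0 * N / np.linalg.norm(N, axis=1, keepdims=True)  # keep data-scale norm

# 2) Impair: a single high-learning-rate update on the noise labeled as the
#    forget class, sharply disrupting class-0 weights.
W -= 2.0 * grad_ce(N, y_f, W)

# 3) Repair: low-learning-rate steps on the retained classes only.
keep = y != 0
for _ in range(100):
    W -= 0.2 * grad_ce(X[keep], y[keep], W)

acc_forget = (softmax(X[y == 0] @ W).argmax(1) == 0).mean()
acc_retain = (softmax(X[keep] @ W).argmax(1) == y[keep]).mean()
```

Note that neither the impair nor the repair step ever touches the original class-0 training samples, which is the point of the procedure.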
TabSynDex: A Universal Metric for Robust Evaluation of Synthetic Tabular Data
Synthetic tabular data generation becomes crucial when real data is limited,
expensive to collect, or simply cannot be used due to privacy concerns.
However, producing good quality synthetic data is challenging. Several
probabilistic, statistical, and generative adversarial networks (GANs) based
approaches have been presented for synthetic tabular data generation. Once
generated, evaluating the quality of the synthetic data is quite challenging.
Several traditional metrics have been used in the literature, but a single,
common, and robust metric is lacking. This makes it difficult to
properly compare the effectiveness of different synthetic tabular data
generation methods. In this paper we propose a new universal metric, TabSynDex,
for robust evaluation of synthetic data. TabSynDex assesses the similarity of
synthetic data with real data through different component scores which evaluate
the characteristics that are desirable for "high quality" synthetic data. Being
a single score metric, TabSynDex can also be used to observe and evaluate the
training of neural network based approaches. This helps in obtaining
insights that were not possible earlier. Further, we present several baseline
models for comparative analysis of the proposed evaluation metric with existing
generative models.
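A composite, single-score metric of this kind can be sketched as an average of component similarity scores. The two simplified components below are illustrative assumptions, not the paper's actual definitions (TabSynDex uses more components, e.g. ML-efficacy and distributional similarity):

```python
import numpy as np

def _closeness(a, b, eps=1e-9):
    # Map the relative difference of two scalars to a [0, 1] similarity.
    return float(np.clip(1.0 - abs(a - b) / (abs(a) + abs(b) + eps), 0.0, 1.0))

def basic_stats_score(real, synth):
    # Column-wise similarity of means and standard deviations.
    scores = []
    for j in range(real.shape[1]):
        scores.append(_closeness(real[:, j].mean(), synth[:, j].mean()))
        scores.append(_closeness(real[:, j].std(), synth[:, j].std()))
    return float(np.mean(scores))

def correlation_score(real, synth):
    # Similarity of the Pearson correlation matrices (off-diagonal entries;
    # correlations lie in [-1, 1], so differences are scaled by 2).
    cr, cs = np.corrcoef(real.T), np.corrcoef(synth.T)
    off = ~np.eye(real.shape[1], dtype=bool)
    return float(np.clip(1.0 - np.abs(cr[off] - cs[off]).mean() / 2.0, 0.0, 1.0))

def tabsyndex_like(real, synth):
    # Single score in [0, 1] as the mean of the component scores.
    return float(np.mean([basic_stats_score(real, synth),
                          correlation_score(real, synth)]))

rng = np.random.default_rng(1)
cov = [[1.0, 0.6], [0.6, 1.0]]
real = rng.multivariate_normal([5, 5], cov, size=2000)
good = rng.multivariate_normal([5, 5], cov, size=2000)  # faithful synthesizer
bad = rng.normal(size=(2000, 2)) * 3.0                  # wrong location/scale/corr

score_good = tabsyndex_like(real, good)
score_bad = tabsyndex_like(real, bad)
```

Because the result is a single bounded score, it can be logged per epoch to monitor a generative model's training, as the abstract suggests.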
Unifying Synergies between Self-supervised Learning and Dynamic Computation
Self-supervised learning (SSL) approaches have made major strides forward by
emulating the performance of their supervised counterparts on several computer
vision benchmarks. This, however, comes at a cost of substantially larger model
sizes, and computationally expensive training strategies, which eventually lead
to larger inference times, making them impractical for resource-constrained
industrial settings. Techniques like knowledge distillation (KD), dynamic
computation (DC), and pruning are often used to obtain a lightweight
sub-network, which usually involves multiple epochs of fine-tuning of a large
pre-trained model, making it more computationally challenging.
In this work we propose a novel perspective on the interplay between SSL and
DC paradigms that can be leveraged to simultaneously learn a dense and gated
(sparse/lightweight) sub-network from scratch offering a good
accuracy-efficiency trade-off, and therefore yielding a generic and
multi-purpose architecture for application specific industrial settings. Our
study overall conveys a constructive message: exhaustive experiments on several
image classification benchmarks (CIFAR-10, STL-10, CIFAR-100, and ImageNet-100)
demonstrate that the proposed training strategy provides a dense and
corresponding sparse sub-network that achieves performance on par with the
vanilla self-supervised setting, but at a significant reduction in computation
in terms of FLOPs under a range of target budgets.
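The dense-plus-gated idea can be illustrated with a minimal channel-gating sketch. This is hypothetical: the gate parameterization, budget penalty, and threshold below are assumptions, and the self-supervised objective itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dense layer (16 -> 32 channels) with one learnable gate per output
# channel, standing in for a layer of the shared backbone.
W = rng.normal(size=(16, 32)) * 0.1
g = rng.normal(size=32)                            # gate logits

def forward(x, gate_logits, hard=False):
    gates = 1.0 / (1.0 + np.exp(-gate_logits))     # soft gates in (0, 1)
    if hard:
        gates = (gates > 0.5).astype(float)        # discrete gates at inference
    return np.maximum(x @ W, 0.0) * gates          # ReLU, then channel gating

def budget_penalty(gate_logits, target=0.5):
    # Penalize deviation of the expected fraction of open channels from the
    # target compute budget (a stand-in for a FLOPs constraint).
    gates = 1.0 / (1.0 + np.exp(-gate_logits))
    return (gates.mean() - target) ** 2

x = rng.normal(size=(4, 16))
dense_out = forward(x, np.full(32, 10.0))   # gates ~1: the dense path
sparse_out = forward(x, g, hard=True)       # thresholded gates: sparse path
```

Training both paths jointly means a single set of weights serves two deployment regimes: the full dense network and a budget-constrained sub-network.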
Zero-Shot Machine Unlearning
Modern privacy regulations grant citizens the right to be forgotten by
products, services and companies. In case of machine learning (ML)
applications, this necessitates deletion of data not only from storage archives
but also from ML models. Due to an increasing need for regulatory compliance
in ML applications, machine unlearning is an emerging research problem.
Right-to-be-forgotten requests come in the form of removal of a certain set or
class of data from an already trained ML model.
Practical considerations preclude retraining of the model from scratch minus
the deleted data. The few existing studies use either the whole training data,
or a subset of training data, or some metadata stored during training to update
the model weights for unlearning. However, strict regulatory compliance
requires time-bound deletion of data. Thus, in many cases, no data related to
the training process or training samples may be accessible even for the
unlearning purpose. We therefore ask the question: is it possible to achieve
unlearning with zero training samples? In this paper, we introduce the novel
problem of zero-shot machine unlearning that caters for the extreme but
practical scenario where zero original data samples are available for use. We
then propose two novel solutions for zero-shot machine unlearning based on (a)
error minimizing-maximizing noise and (b) gated knowledge transfer. These
methods remove the information of the forget data from the model while
maintaining the model efficacy on the retain data. The zero-shot approach
offers good protection against model inversion attacks and membership
inference attacks. We introduce a new evaluation metric, Anamnesis Index (AIN),
to effectively measure the quality of the unlearning method. The experiments
show promising results for unlearning in deep learning models on benchmark
vision datasets.
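The Anamnesis Index compares how quickly the unlearned model relearns the forgotten data against a model retrained from scratch. A minimal sketch, assuming AIN is the ratio of relearn times needed to come within an alpha-margin of the original accuracy (the paper's exact margin convention may differ):

```python
def relearn_time(acc_curve, orig_acc, alpha=0.05):
    """Steps of relearning until accuracy comes within an alpha-margin of
    the original model's accuracy (curve length if it never does)."""
    target = orig_acc * (1.0 - alpha)
    for step, acc in enumerate(acc_curve, start=1):
        if acc >= target:
            return step
    return len(acc_curve)

def anamnesis_index(unlearned_curve, scratch_curve, orig_acc, alpha=0.05):
    # AIN = relearn time of the unlearned model / relearn time of a model
    # retrained from scratch; values near 1 suggest the unlearned model
    # behaves like one that never saw the forgotten data.
    return (relearn_time(unlearned_curve, orig_acc, alpha)
            / relearn_time(scratch_curve, orig_acc, alpha))

# Hypothetical accuracy-per-step curves while relearning the forgotten class.
unlearned = [0.10, 0.30, 0.55, 0.80, 0.92, 0.95]
scratch = [0.10, 0.25, 0.50, 0.78, 0.90, 0.94]
ain = anamnesis_index(unlearned, scratch, orig_acc=0.95)
```

An AIN well below 1 would suggest residual information (the model relearns suspiciously fast), while a value near 1 matches the behavior of a scratch-retrained model.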
Can Bad Teaching Induce Forgetting? Unlearning in Deep Networks Using an Incompetent Teacher
Machine unlearning has become an important area of research due to an increasing need for machine learning (ML) applications to comply with the emerging data privacy regulations. It facilitates the removal of a certain set or class of data from an already trained ML model without requiring retraining from scratch. Recently, several efforts have been made to make unlearning effective and efficient. We propose a novel machine unlearning method by exploring the utility of competent and incompetent teachers in a student-teacher framework to induce forgetfulness. The knowledge from the competent and incompetent teachers is selectively transferred to the student to obtain a model that doesn't contain any information about the forget data. We experimentally show that this method generalizes well, and is fast and effective. Furthermore, we introduce the zero retrain forgetting (ZRF) metric to evaluate any unlearning method. Unlike the existing unlearning metrics, the ZRF score does not depend on the availability of the expensive retrained model. This makes it useful for analysis of the unlearned model after deployment as well. We present results of experiments conducted for random subset forgetting and class forgetting on various deep networks and across different application domains. Code is at: https://github.com/vikram2000b/bad-teaching-unlearning
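The ZRF score can be sketched as one minus the average Jensen-Shannon divergence between the unlearned model's predictions and those of a randomly initialized (incompetent) teacher on the forget set. This is a minimal sketch with hypothetical outputs; the exact normalization follows the paper.

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon divergence between two discrete distributions
    # (base 2, so the value lies in [0, 1]).
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    kl = lambda a, b: float(np.sum(a * np.log2(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def zrf_score(unlearned_probs, random_teacher_probs):
    # ZRF = 1 - mean JS divergence between the unlearned model's outputs and
    # those of a randomly initialized ("incompetent") teacher on the forget
    # set; scores near 1 indicate the model behaves as if it holds no
    # information about the forget data.
    divs = [js_divergence(p, q)
            for p, q in zip(unlearned_probs, random_teacher_probs)]
    return 1.0 - float(np.mean(divs))

# Hypothetical softmax outputs on two forget-set samples.
unlearned = [[0.40, 0.30, 0.30], [0.20, 0.50, 0.30]]
random_teacher = [[0.35, 0.35, 0.30], [0.25, 0.45, 0.30]]
score = zrf_score(unlearned, random_teacher)
```

Because the reference is an untrained teacher rather than a retrained model, the score can be computed after deployment without the cost of retraining, as the abstract notes.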
Frequency of Neurologic Manifestations in COVID-19: A Systematic Review and Meta-analysis
BACKGROUND AND OBJECTIVES: One year after the onset of the coronavirus disease 2019 (COVID-19) pandemic, we aimed to summarize the frequency of neurologic manifestations reported in patients with COVID-19 and to investigate the association of these manifestations with disease severity and mortality. METHODS: We searched PubMed, Medline, Cochrane library, ClinicalTrials.gov, and EMBASE for studies from December 31, 2019, to December 15, 2020, enrolling consecutive patients with COVID-19 presenting with neurologic manifestations. Risk of bias was examined with the Joanna Briggs Institute scale. A random-effects meta-analysis was performed, and pooled prevalence and 95% confidence intervals (CIs) were calculated for neurologic manifestations. Odds ratios (ORs) and 95% CIs were calculated to determine the association of neurologic manifestations with disease severity and mortality. Presence of heterogeneity was assessed with I², meta-regression, and subgroup analyses. Statistical analyses were conducted in R version 3.6.2. RESULTS: Of 2,455 citations, 350 studies were included in this review, providing data on 145,721 patients with COVID-19, 89% of whom were hospitalized. Forty-one neurologic manifestations (24 symptoms and 17 diagnoses) were identified. Pooled prevalence of the most common neurologic symptoms included fatigue (32%), myalgia (20%), taste impairment (21%), smell impairment (19%), and headache (13%). A low risk of bias was observed in 85% of studies; studies with higher risk of bias yielded higher prevalence estimates. Stroke was the most common neurologic diagnosis (pooled prevalence 2%). In patients with COVID-19 ≥60 years of age, the pooled prevalence of acute confusion/delirium was 34%, and the presence of any neurologic manifestations in this age group was associated with mortality (OR 1.80, 95% CI 1.11–2.91). DISCUSSION: Up to one-third of patients with COVID-19 analyzed in this review experienced at least 1 neurologic manifestation.
One in 50 patients experienced stroke. In those >60 years of age, more than one-third had acute confusion/delirium; the presence of neurologic manifestations in this group was associated with nearly a doubling of mortality. Results must be interpreted with the limitations of observational studies and associated bias in mind. SYSTEMATIC REVIEW REGISTRATION: PROSPERO CRD42020181867
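For readers unfamiliar with the reported effect sizes, an odds ratio and its 95% CI of the kind cited above can be derived from a 2x2 table via the standard log-OR normal approximation (Woolf method). The counts below are hypothetical, NOT the study's data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and 95% CI from a 2x2 table using the log-OR normal
    approximation (Woolf method):
                 outcome+  outcome-
    exposed         a         b
    unexposed       c         d
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of log(OR)
    lo = math.exp(math.log(or_) - z * se)
    hi = math.exp(math.log(or_) + z * se)
    return or_, lo, hi

# Hypothetical counts: deaths among older patients with vs. without
# neurologic manifestations (illustrative only).
or_, lo, hi = odds_ratio_ci(a=30, b=70, c=20, d=80)
```

A CI that excludes 1 (as the study's 1.11–2.91 does) indicates a statistically significant association at the 5% level.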