7 research outputs found

    Fast Yet Effective Machine Unlearning

    Full text link
    Unlearning the data observed during the training of a machine learning (ML) model is an important task that can play a pivotal role in fortifying the privacy and security of ML-based applications. This paper raises the following questions: (i) can we unlearn a single or multiple classes of data from an ML model without looking at the full training data even once? (ii) can we make the process of unlearning fast and scalable to large datasets, and generalize it to different deep networks? We introduce a novel machine unlearning framework with error-maximizing noise generation and impair-repair based weight manipulation that offers an efficient solution to the above questions. An error-maximizing noise matrix is learned for the class to be unlearned using the original model. The noise matrix is then used to manipulate the model weights so that the targeted class of data is unlearned. We introduce impair and repair steps for controlled manipulation of the network weights. In the impair step, the noise matrix, together with a very high learning rate, is used to induce sharp unlearning in the model. Thereafter, the repair step is used to regain overall performance. With very few update steps, we show excellent unlearning while substantially retaining overall model accuracy. Unlearning multiple classes requires a similar number of update steps as unlearning a single class, making our approach scalable to large problems. Our method is efficient in comparison with existing methods, works for multi-class unlearning, places no constraints on the original optimization mechanism or network design, and works well in both small- and large-scale vision tasks. This work is an important step toward fast and easy implementation of unlearning in deep networks. We will make the source code publicly available.
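    As a rough illustration of the impair-repair idea described in the abstract, here is a minimal PyTorch sketch. The model, noise shape, and all hyperparameters (learning rates, step counts) are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def learn_noise(model, forget_class, shape, steps=50, lr=0.1, device="cpu"):
    """Learn an error-maximizing noise batch for the class to unlearn.

    Gradient *ascent* on the frozen original model's loss drives the noise
    toward inputs the model strongly misassociates with the forget class.
    """
    noise = torch.randn(shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([noise], lr=lr)
    labels = torch.full((shape[0],), forget_class, device=device)
    model.eval()
    for _ in range(steps):
        opt.zero_grad()
        loss = -F.cross_entropy(model(noise), labels)  # negate to maximize error
        loss.backward()
        opt.step()
    return noise.detach()

def impair_repair(model, noise, forget_class, retain_loader,
                  impair_lr=0.02, repair_lr=1e-4, device="cpu"):
    """Impair on the noise with a high learning rate, then repair on retain data."""
    labels = torch.full((noise.size(0),), forget_class, device=device)
    # Impair: a few high-learning-rate updates on the noise induce sharp forgetting.
    opt = torch.optim.SGD(model.parameters(), lr=impair_lr)
    model.train()
    opt.zero_grad()
    F.cross_entropy(model(noise), labels).backward()
    opt.step()
    # Repair: brief fine-tuning on the retained classes restores overall accuracy.
    opt = torch.optim.SGD(model.parameters(), lr=repair_lr)
    for x, y in retain_loader:
        opt.zero_grad()
        F.cross_entropy(model(x.to(device)), y.to(device)).backward()
        opt.step()
    return model
```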

    TabSynDex: A Universal Metric for Robust Evaluation of Synthetic Tabular Data

    Full text link
    Synthetic tabular data generation becomes crucial when real data is limited, expensive to collect, or simply cannot be used due to privacy concerns. However, producing good-quality synthetic data is challenging. Several probabilistic, statistical, and generative adversarial network (GAN) based approaches have been presented for synthetic tabular data generation. Once the data is generated, evaluating its quality is itself quite challenging. Some traditional metrics have been used in the literature, but there is a lack of a common, robust, single metric. This makes it difficult to properly compare the effectiveness of different synthetic tabular data generation methods. In this paper, we propose a new universal metric, TabSynDex, for robust evaluation of synthetic data. TabSynDex assesses the similarity of synthetic data to real data through different component scores, which evaluate the characteristics that are desirable for "high quality" synthetic data. Being a single-score metric, TabSynDex can also be used to observe and evaluate the training of neural network based approaches, helping to obtain insights that were not possible earlier. Further, we present several baseline models for comparative analysis of the proposed evaluation metric with existing generative models.
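    The abstract does not enumerate the component scores, so the following Python sketch only illustrates the general pattern of combining per-property similarity scores into a single number in [0, 1]; the two components shown (column statistics and pairwise correlations) are plausible assumptions, not TabSynDex's actual definition.

```python
import numpy as np
import pandas as pd

def column_stat_score(real: pd.DataFrame, synth: pd.DataFrame) -> float:
    """Similarity of per-column means/stds, mapped to [0, 1] (1 = identical)."""
    diffs = []
    for col in real.select_dtypes("number").columns:
        scale = real[col].std() + 1e-8
        diffs.append(abs(real[col].mean() - synth[col].mean()) / scale)
        diffs.append(abs(real[col].std() - synth[col].std()) / scale)
    return float(np.clip(1.0 - np.mean(diffs), 0.0, 1.0))

def correlation_score(real: pd.DataFrame, synth: pd.DataFrame) -> float:
    """Similarity of pairwise correlation matrices, mapped to [0, 1]."""
    num = real.select_dtypes("number").columns
    delta = (real[num].corr() - synth[num].corr()).abs().to_numpy()
    return float(np.clip(1.0 - delta.mean(), 0.0, 1.0))

def single_score(real, synth, components=(column_stat_score, correlation_score)):
    """Average the component scores into one number, in the single-score spirit."""
    return float(np.mean([c(real, synth) for c in components]))
```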

    Unifying Synergies between Self-supervised Learning and Dynamic Computation

    Full text link
    Self-supervised learning (SSL) approaches have made major strides by matching the performance of their supervised counterparts on several computer vision benchmarks. This, however, comes at the cost of substantially larger model sizes and computationally expensive training strategies, which eventually lead to larger inference times, making SSL impractical for resource-constrained industrial settings. Techniques like knowledge distillation (KD), dynamic computation (DC), and pruning are often used to obtain a lightweight sub-network, but they usually involve multiple epochs of fine-tuning of a large pre-trained model, which is itself computationally expensive. In this work, we propose a novel perspective on the interplay between the SSL and DC paradigms: they can be leveraged to simultaneously learn a dense and a gated (sparse/lightweight) sub-network from scratch, offering a good accuracy-efficiency trade-off and therefore yielding a generic, multi-purpose architecture for application-specific industrial settings. Our study conveys a constructive message: exhaustive experiments on several image classification benchmarks (CIFAR-10, STL-10, CIFAR-100, and ImageNet-100) demonstrate that the proposed training strategy provides a dense and a corresponding sparse sub-network that achieve performance on par with the vanilla self-supervised setting, but at a significant reduction in computation in terms of FLOPs, under a range of target budgets.
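    For intuition about a single network that serves both a dense and a gated path, here is a simplified channel-gating sketch in PyTorch. It is closer in spirit to slimmable networks than to the paper's exact mechanism; the layer design and widths are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """Conv block whose output channels can be gated at inference time.

    Training with varying channel budgets lets one set of weights serve the
    dense path (all channels) and sparse paths (a fraction of channels).
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x, width=1.0):
        y = torch.relu(self.bn(self.conv(x)))
        keep = max(1, int(width * y.size(1)))
        mask = torch.zeros(1, y.size(1), 1, 1, device=y.device)
        mask[:, :keep] = 1.0          # keep only the first `keep` channels
        return y * mask

# Dense and gated forward passes from the same weights:
layer = GatedConv(3, 64)
x = torch.randn(2, 3, 32, 32)
dense = layer(x)               # full width
sparse = layer(x, width=0.25)  # ~25% of channels active, fewer effective FLOPs
```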

    Zero-Shot Machine Unlearning

    Full text link
    Modern privacy regulations grant citizens the right to be forgotten by products, services, and companies. In the case of machine learning (ML) applications, this necessitates deletion of data not only from storage archives but also from ML models. Due to the increasing regulatory compliance required of ML applications, machine unlearning is becoming an emerging research problem. Right-to-be-forgotten requests come in the form of removal of a certain set or class of data from an already trained ML model. Practical considerations preclude retraining the model from scratch minus the deleted data. The few existing studies use either the whole training data, a subset of the training data, or some metadata stored during training to update the model weights for unlearning. However, strict regulatory compliance requires time-bound deletion of data. Thus, in many cases, no data related to the training process or training samples may be accessible even for unlearning purposes. We therefore ask: is it possible to achieve unlearning with zero training samples? In this paper, we introduce the novel problem of zero-shot machine unlearning, which caters to the extreme but practical scenario where zero original data samples are available for use. We then propose two novel solutions for zero-shot machine unlearning based on (a) error minimizing-maximizing noise and (b) gated knowledge transfer. These methods remove the information of the forget data from the model while maintaining the model's efficacy on the retain data. The zero-shot approach offers good protection against model inversion and membership inference attacks. We introduce a new evaluation metric, the Anamnesis Index (AIN), to effectively measure the quality of the unlearning method. The experiments show promising results for unlearning in deep learning models on benchmark vision datasets.
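    A hedged sketch of how data-free, gated knowledge transfer might look: pseudo-samples from a generator are filtered so that forget-class knowledge never reaches the student. The argmax gate below is a simplification of the paper's filtering, and all names and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def gated_transfer_step(teacher, student, generator, forget_class,
                        opt_student, z_dim=100, batch=64, device="cpu"):
    """One distillation step with a gate that blocks forget-class knowledge.

    Pseudo-samples come from a generator, so no original training data is
    needed; any sample the teacher attributes to the forget class is dropped
    before the student is trained to match the teacher's outputs.
    """
    z = torch.randn(batch, z_dim, device=device)
    x = generator(z)
    with torch.no_grad():
        t_logits = teacher(x)
    # Gate: discard pseudo-samples whose teacher prediction is the forget class.
    keep = t_logits.argmax(dim=1) != forget_class
    if keep.sum() == 0:
        return None
    s_logits = student(x[keep].detach())
    loss = F.kl_div(F.log_softmax(s_logits, dim=1),
                    F.softmax(t_logits[keep], dim=1), reduction="batchmean")
    opt_student.zero_grad()
    loss.backward()
    opt_student.step()
    return loss.item()
```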

    Can Bad Teaching Induce Forgetting? Unlearning in Deep Networks Using an Incompetent Teacher

    No full text
    Machine unlearning has become an important area of research due to the increasing need for machine learning (ML) applications to comply with emerging data privacy regulations. It provides for the removal of a certain set or class of data from an already trained ML model without retraining from scratch. Recently, several efforts have been made to make unlearning effective and efficient. We propose a novel machine unlearning method that explores the utility of competent and incompetent teachers in a student-teacher framework to induce forgetfulness. Knowledge from the competent and incompetent teachers is selectively transferred to the student to obtain a model that doesn't contain any information about the forget data. We show experimentally that this method generalizes well and is fast and effective. Furthermore, we introduce the zero retrain forgetting (ZRF) metric to evaluate any unlearning method. Unlike existing unlearning metrics, the ZRF score does not depend on the availability of an expensive retrained model, which also makes it useful for analyzing the unlearned model after deployment. We present results of experiments conducted for random-subset forgetting and class forgetting on various deep networks and across different application domains. Code is at: https://github.com/vikram2000b/bad-teaching-unlearning
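    A minimal sketch, assuming a PyTorch setup, of the selective distillation idea and of a ZRF-style score. The exact losses, teacher definitions, and the divergence form of ZRF are reconstructed from the abstract's description and should be treated as assumptions rather than the authors' code.

```python
import torch
import torch.nn.functional as F

def unlearn_step(student, competent, incompetent, x, is_forget, opt):
    """Selective distillation: retain samples follow the competent teacher,
    forget samples follow the incompetent (e.g. randomly initialized) one."""
    with torch.no_grad():
        t = torch.where(is_forget.view(-1, 1),
                        F.softmax(incompetent(x), dim=1),
                        F.softmax(competent(x), dim=1))
    loss = F.kl_div(F.log_softmax(student(x), dim=1), t, reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def zrf(student, random_model, forget_loader, device="cpu"):
    """ZRF-style score: 1 minus the mean Jensen-Shannon divergence between the
    unlearned model and a randomly initialized model on the forget data.
    A score near 1 means the unlearned model behaves like the random model,
    with no retrained reference model required."""
    js, n = 0.0, 0
    for x, _ in forget_loader:
        x = x.to(device)
        with torch.no_grad():
            p = F.softmax(student(x), dim=1)
            q = F.softmax(random_model(x), dim=1)
        m = 0.5 * (p + q)
        js += 0.5 * (F.kl_div(m.log(), p, reduction="sum")
                     + F.kl_div(m.log(), q, reduction="sum")).item()
        n += x.size(0)
    return 1.0 - js / n
```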

    Frequency of Neurologic Manifestations in COVID-19 A Systematic Review and Meta-analysis

    No full text
    BACKGROUND AND OBJECTIVES: One year after the onset of the coronavirus disease 2019 (COVID-19) pandemic, we aimed to summarize the frequency of neurologic manifestations reported in patients with COVID-19 and to investigate the association of these manifestations with disease severity and mortality. METHODS: We searched PubMed, Medline, Cochrane Library, ClinicalTrials.gov, and EMBASE for studies from December 31, 2019, to December 15, 2020, enrolling consecutive patients with COVID-19 presenting with neurologic manifestations. Risk of bias was examined with the Joanna Briggs Institute scale. A random-effects meta-analysis was performed, and pooled prevalence and 95% confidence intervals (CIs) were calculated for neurologic manifestations. Odds ratios (ORs) and 95% CIs were calculated to determine the association of neurologic manifestations with disease severity and mortality. Presence of heterogeneity was assessed with I², meta-regression, and subgroup analyses. Statistical analyses were conducted in R version 3.6.2. RESULTS: Of 2,455 citations, 350 studies were included in this review, providing data on 145,721 patients with COVID-19, 89% of whom were hospitalized. Forty-one neurologic manifestations (24 symptoms and 17 diagnoses) were identified. Pooled prevalence of the most common neurologic symptoms included fatigue (32%), myalgia (20%), taste impairment (21%), smell impairment (19%), and headache (13%). A low risk of bias was observed in 85% of studies; studies with higher risk of bias yielded higher prevalence estimates. Stroke was the most common neurologic diagnosis (pooled prevalence 2%). In patients with COVID-19 ≥60 years of age, the pooled prevalence of acute confusion/delirium was 34%, and the presence of any neurologic manifestations in this age group was associated with mortality (OR 1.80, 95% CI 1.11–2.91). DISCUSSION: Up to one-third of patients with COVID-19 analyzed in this review experienced at least 1 neurologic manifestation. One in 50 patients experienced stroke. In those >60 years of age, more than one-third had acute confusion/delirium; the presence of neurologic manifestations in this group was associated with nearly a doubling of mortality. Results must be interpreted with the limitations of observational studies and associated bias in mind. SYSTEMATIC REVIEW REGISTRATION: PROSPERO CRD42020181867
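    The pooled prevalences and I² values described above come from a random-effects meta-analysis; as a self-contained illustration, here is a DerSimonian-Laird pooling sketch in Python with hypothetical study numbers (the paper's analysis was done in R).

```python
import numpy as np

def dersimonian_laird(effects, variances):
    """Random-effects pooling (DerSimonian-Laird). Inputs are per-study effect
    sizes (e.g. logit-transformed prevalences) and within-study variances."""
    effects, variances = np.asarray(effects, float), np.asarray(variances, float)
    w = 1.0 / variances                       # fixed-effect weights
    mu_fe = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - mu_fe) ** 2)    # Cochran's Q
    df = len(effects) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)             # between-study variance
    w_re = 1.0 / (variances + tau2)           # random-effects weights
    mu = np.sum(w_re * effects) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I² heterogeneity, %
    return mu, (mu - 1.96 * se, mu + 1.96 * se), i2

# Hypothetical example: pooling logit prevalences from three studies.
logit = lambda p: np.log(p / (1 - p))
k = np.array([32, 18, 25]); n = np.array([100, 90, 120])  # events, sample sizes
eff = logit(k / n)
var = 1.0 / k + 1.0 / (n - k)                # variance of a logit proportion
mu, ci, i2 = dersimonian_laird(eff, var)
inv = lambda x: 1 / (1 + np.exp(-x))         # back-transform to prevalence
print(f"pooled prevalence {inv(mu):.2f}, "
      f"95% CI ({inv(ci[0]):.2f}, {inv(ci[1]):.2f}), I^2 {i2:.0f}%")
```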