
    Understanding and Mitigating Copying in Diffusion Models

    Images generated by diffusion models like Stable Diffusion are increasingly widespread. Recent works and even lawsuits have shown that these models are prone to replicating their training data, unbeknownst to the user. In this paper, we first analyze this memorization problem in text-to-image diffusion models. While it is widely believed that duplicated images in the training set are responsible for content replication at inference time, we observe that the text conditioning of the model plays a similarly important role. In fact, we see in our experiments that data replication often does not happen for unconditional models, while it is common in the text-conditional case. Motivated by our findings, we then propose several techniques for reducing data replication at both training and inference time by randomizing and augmenting image captions in the training set. Comment: 17 pages, preprint. Code is available at https://github.com/somepago/DC
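    The training-time mitigation described above can be sketched as a caption-augmentation step applied before each example is fed to the model. This is a minimal illustration, not the authors' code: the function name, parameters, and the word-level replacement scheme are assumptions standing in for the paper's family of caption-randomization strategies.

    ```python
    import random

    def randomize_caption(caption, vocab, p_replace=0.1, p_drop_caption=0.05, rng=None):
        """Randomize/augment a training caption to weaken the text-conditioning
        signal that drives memorization.

        With probability p_drop_caption the whole caption is replaced by random
        vocabulary tokens; otherwise each word is independently swapped for a
        random token with probability p_replace. All parameter values here are
        illustrative defaults, not the paper's tuned settings.
        """
        rng = rng or random.Random()
        words = caption.split()
        if rng.random() < p_drop_caption:
            return " ".join(rng.choice(vocab) for _ in words)
        return " ".join(
            rng.choice(vocab) if rng.random() < p_replace else w
            for w in words
        )
    ```

    In a training loop this would be applied per example per epoch, so a duplicated image rarely appears twice with an identical caption.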

    Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models

    Cutting-edge diffusion models produce images with high quality and customizability, enabling them to be used for commercial art and graphic design purposes. But do diffusion models create unique works of art, or are they replicating content directly from their training sets? In this work, we study image retrieval frameworks that enable us to compare generated images with training samples and detect when content has been replicated. Applying our frameworks to diffusion models trained on multiple datasets including Oxford flowers, Celeb-A, ImageNet, and LAION, we discuss how factors such as training set size impact rates of content replication. We also identify cases where diffusion models, including the popular Stable Diffusion model, blatantly copy from their training data. Comment: Updated draft with the following changes: (1) Clarified the LAION Aesthetics versions everywhere (2) Correction on which LAION Aesthetics version SD-1.4 is finetuned on and updated figure 12 based on this (3) A section on possible causes of replication.
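    The retrieval framework the abstract describes reduces to a nearest-neighbor search in a feature space. The sketch below assumes features have already been extracted by some image encoder upstream; the function name and the similarity threshold are illustrative, not the paper's exact pipeline.

    ```python
    import numpy as np

    def find_replications(gen_feats, train_feats, threshold=0.95):
        """Flag generated images whose nearest training image exceeds a cosine
        similarity threshold, i.e. likely replicated content.

        gen_feats:   (n_gen, d) array of generated-image features
        train_feats: (n_train, d) array of training-image features
        Returns a list of (gen_index, train_index, similarity) tuples.
        """
        # L2-normalize so the dot product equals cosine similarity
        g = gen_feats / np.linalg.norm(gen_feats, axis=1, keepdims=True)
        t = train_feats / np.linalg.norm(train_feats, axis=1, keepdims=True)
        sims = g @ t.T                  # (n_gen, n_train) similarity matrix
        nearest = sims.argmax(axis=1)   # index of the closest training image
        best = sims.max(axis=1)         # its similarity score
        return [(i, int(nearest[i]), float(best[i]))
                for i in range(len(g)) if best[i] >= threshold]
    ```

    In practice the choice of feature extractor and threshold determines what counts as "replication", which is exactly the design space the paper explores.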

    What Can We Learn from Unlearnable Datasets?

    In an era of widespread web scraping, unlearnable dataset methods have the potential to protect data privacy by preventing deep neural networks from generalizing. But in addition to a number of practical limitations that make their use unlikely, we make a number of findings that call into question their ability to safeguard data. First, it is widely believed that neural networks trained on unlearnable datasets only learn shortcuts, simpler rules that are not useful for generalization. In contrast, we find that networks actually can learn useful features that can be reweighted for high test performance, suggesting that image privacy is not preserved. Unlearnable datasets are also believed to induce learning shortcuts through linear separability of added perturbations. We provide a counterexample, demonstrating that linear separability of perturbations is not a necessary condition. To emphasize why linearly separable perturbations should not be relied upon, we propose an orthogonal projection attack which allows learning from unlearnable datasets published in ICML 2021 and ICLR 2023. Our proposed attack is significantly less complex than recently proposed techniques. Comment: 17 pages, 9 figures
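    The core linear-algebra step of an orthogonal projection attack can be sketched as follows. This is an assumption-laden illustration: it takes as given a weight matrix W from a linear probe fit on the unlearnable data (that fitting step is omitted), and projects each flattened image onto the orthogonal complement of span(W), removing the linearly separable shortcut component.

    ```python
    import numpy as np

    def orthogonal_projection(X, W):
        """Remove the linearly separable component from data X.

        X: (n, d) flattened images
        W: (k, d) weight rows of a linear classifier fit on the poisoned data
           (hypothetical input here; the real attack would learn it first)
        Returns X with its component inside span(W) subtracted.
        """
        # Orthonormal basis Q for the row space of W
        Q, _ = np.linalg.qr(W.T)        # (d, k)
        return X - (X @ Q) @ Q.T        # subtract the projection onto span(W)
    ```

    Training on the projected data then sidesteps perturbations whose effect lives in that low-dimensional subspace, which is why the counterexample in the paper (perturbations that are not linearly separable) matters.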

    Autoregressive Perturbations for Data Poisoning

    The prevalence of data scraping from social media as a means to obtain datasets has led to growing concerns regarding unauthorized use of data. Data poisoning attacks have been proposed as a bulwark against scraping, as they make data "unlearnable" by adding small, imperceptible perturbations. Unfortunately, existing methods require knowledge of both the target architecture and the complete dataset so that a surrogate network can be trained, the parameters of which are used to generate the attack. In this work, we introduce autoregressive (AR) poisoning, a method that can generate poisoned data without access to the broader dataset. The proposed AR perturbations are generic, can be applied across different datasets, and can poison different architectures. Compared to existing unlearnable methods, our AR poisons are more resistant against common defenses such as adversarial training and strong data augmentations. Our analysis further provides insight into what makes an effective data poison. Comment: 22 pages, 13 figures. Code available at https://github.com/psandovalsegura/autoregressive-poisonin
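    The key property of an AR poison is that the perturbation is drawn from an autoregressive noise process rather than optimized against a surrogate network, so no dataset or architecture access is needed. The sketch below generates a 1-D AR signal and reshapes it into an image-sized perturbation clipped to an epsilon ball; the paper's actual method uses 2-D AR filters with per-class coefficients, so the coefficients, scale, and function shape here are illustrative assumptions.

    ```python
    import numpy as np

    def ar_perturbation(shape, coeffs, scale=8 / 255, rng=None):
        """Generate an image-sized perturbation from an AR(p) noise process.

        shape:  target perturbation shape, e.g. (32, 32)
        coeffs: AR coefficients [a_1, ..., a_p] (illustrative values)
        scale:  L-inf bound, as in standard unlearnable-example setups
        """
        rng = rng or np.random.default_rng()
        n = int(np.prod(shape))
        p = len(coeffs)
        x = np.zeros(n + p)
        noise = rng.standard_normal(n + p)
        # x[t] = a_1*x[t-1] + ... + a_p*x[t-p] + white noise
        for t in range(p, n + p):
            x[t] = np.dot(coeffs, x[t - p:t][::-1]) + noise[t]
        delta = x[p:].reshape(shape)
        # Normalize and clip to the epsilon ball
        return np.clip(delta / np.abs(delta).max(), -1, 1) * scale
    ```

    Because the perturbation is defined by the AR process alone, the same generator can poison any dataset or architecture, which is the portability claim the abstract makes.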

    Real world data on clinical profile, management and outcomes of venous thromboembolism from a tertiary care centre in India

    Objectives: Venous thromboembolism (VTE) is a major cause of mortality and morbidity worldwide. This study describes a real-world scenario of VTE presenting to a tertiary care hospital in India. Methods: All patients presenting with acute VTE or associated complications from January 2017 to January 2020 were included in the study. Results: A total of 330 patient admissions related to VTE were included over 3 years, of which 303 had an acute episode of VTE. The median age was 50 years (IQR 38–64); 30% of patients were younger than 40 years of age. Only 24% of patients had provoked VTE, with recent surgery (56%) and malignancy (16%) being the commonest risk factors. VTE manifested as isolated DVT (56%), isolated pulmonary embolism (PE; 19.1%), combined DVT/PE (22.4%), and upper limb DVT (2.3%). Patients with PE (n = 126) were classified as low-risk (15%), intermediate-risk (55%) and high-risk (29%). Reperfusion therapy was performed for 15.7% of patients with intermediate-risk and 75.6% with high-risk PE. In-hospital mortality for the entire cohort was 8.9%; 35% for high-risk PE and 11% for intermediate-risk PE. On multivariate analysis, the presence of active malignancy (OR = 5.8; 95% CI: 1.1–30.8, p = 0.038) and high-risk PE (OR = 4.8; 95% CI: 1.6–14.9, p = 0.006) were found to be independent predictors of mortality. Conclusion: Our data provide real-world perspectives on the demographics and management of patients presenting with acute VTE in a referral hospital setting. We observed relatively high mortality for intermediate-risk PE, necessitating better subclassification of this group to identify candidates for more aggressive approaches.