179 research outputs found

    Counterfactual Memorization in Neural Language Models

    Modern neural language models that are widely used in various NLP tasks risk memorizing sensitive information from their training data. Understanding this memorization is important in real-world applications and also from a learning-theoretical perspective. An open question in previous studies of language model memorization is how to filter out "common" memorization. In fact, most memorization criteria strongly correlate with the number of occurrences in the training set, capturing memorized familiar phrases, public knowledge, templated texts, or other repeated data. We formulate a notion of counterfactual memorization which characterizes how a model's predictions change if a particular document is omitted during training. We identify and study counterfactually-memorized training examples in standard text datasets. We estimate the influence of each memorized training example on the validation set and on generated texts, showing how this can provide direct evidence of the source of memorization at test time. Comment: NeurIPS 2023; 42 pages, 33 figures
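
The idea of measuring how predictions change when a document is omitted can be illustrated with a small sketch. The abstract does not specify an estimator, so the following is only one plausible reading: train several models on random subsets of the data, then compare a model's average score on an example when that example was in its training subset against the average when it was left out. The function name, the `scores` and `included` inputs, and the choice of score (for instance per-token accuracy) are all assumptions made for illustration.

```python
from statistics import mean


def counterfactual_memorization(scores, included):
    """Estimate a per-example counterfactual memorization score.

    scores[m][i]   -- performance of model m on training example i
                      (hypothetical metric, e.g. per-token accuracy)
    included[m][i] -- True if example i was in model m's training subset

    Returns, for each example, mean(score | included) - mean(score | omitted).
    """
    n_models = len(scores)
    n_examples = len(scores[0])
    result = []
    for i in range(n_examples):
        in_scores = [scores[m][i] for m in range(n_models) if included[m][i]]
        out_scores = [scores[m][i] for m in range(n_models) if not included[m][i]]
        # Both groups must be non-empty, otherwise the difference is undefined.
        if not in_scores or not out_scores:
            raise ValueError(
                f"example {i} needs at least one model trained with it "
                "and at least one trained without it"
            )
        result.append(mean(in_scores) - mean(out_scores))
    return result
```

Under this reading, an example that every model predicts well regardless of inclusion (a common phrase, say) gets a value near zero, while an example that is predicted well only by models that saw it gets a large value.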

    Taking forward a 'One Health' approach for turning the tide against the Middle East respiratory syndrome coronavirus and other zoonotic pathogens with epidemic potential

    The appearance of novel pathogens of humans with epidemic potential and high mortality rates has threatened global health security for centuries. Over the past few decades new zoonotic infectious diseases of humans caused by pathogens arising from animal reservoirs have included West Nile virus, Yellow fever virus, Ebola virus, Nipah virus, Lassa Fever virus, Hanta virus, Dengue fever virus, Rift Valley fever virus, Crimean-Congo haemorrhagic fever virus, severe acute respiratory syndrome coronavirus, highly pathogenic avian influenza viruses, Middle East Respiratory Syndrome Coronavirus, and Zika virus. The recent Ebola Virus Disease epidemic in West Africa and the ongoing Zika Virus outbreak in South America highlight the urgent need for local, regional and international public health systems to be more coordinated and better prepared. The One Health concept focuses on the relationship and interconnectedness between Humans, Animals and the Environment, and recognizes that the health and wellbeing of humans is intimately connected to the health of animals and their environment (and vice versa). Critical to the establishment of a One Health platform is the creation of a multidisciplinary team with a range of expertise, including public health officers, physicians, veterinarians, animal husbandry specialists, agriculturalists, ecologists, vector biologists, viral phylogeneticists, and researchers, to co-operate and collaborate to learn more about zoonotic spread between animals, humans and the environment and to monitor, respond to and prevent major outbreaks. We discuss the unique opportunities for Middle Eastern and African stakeholders to take leadership in building equitable and effective partnerships with all stakeholders involved in human and health systems to take forward a 'One Health' approach to control such zoonotic pathogens with epidemic potential.

    Preventing Verbatim Memorization in Language Models Gives a False Sense of Privacy

    Studying data memorization in neural language models helps us understand the risks (e.g., to privacy or copyright) associated with models regurgitating training data and aids in the development of countermeasures. Many prior works -- and some recently deployed defenses -- focus on "verbatim memorization", defined as a model generation that exactly matches a substring from the training set. We argue that verbatim memorization definitions are too restrictive and fail to capture more subtle forms of memorization. Specifically, we design and implement an efficient defense that perfectly prevents all verbatim memorization. And yet, we demonstrate that this "perfect" filter does not prevent the leakage of training data. Indeed, it is easily circumvented by plausible and minimally modified "style-transfer" prompts -- and in some cases even the non-modified original prompts -- to extract memorized information. We conclude by discussing potential alternative definitions and why defining memorization is a difficult yet crucial open question for neural language models
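
The "verbatim memorization" definition quoted above (a generation that exactly matches a substring of the training set) can be made concrete with a minimal sketch. This is not the defense described in the paper; it is an illustrative check that assumes whitespace tokenization and a fixed window length `n`, and the function names are hypothetical.

```python
def build_ngram_index(training_docs, n):
    """Collect every n-token window from the training documents into a set.

    Tokenization by whitespace is an assumption made for this sketch.
    """
    index = set()
    for doc in training_docs:
        tokens = doc.split()
        for i in range(len(tokens) - n + 1):
            index.add(tuple(tokens[i:i + n]))
    return index


def has_verbatim_match(generation, index, n):
    """Return True if any n-token window of the generation is in the index."""
    tokens = generation.split()
    return any(
        tuple(tokens[i:i + n]) in index
        for i in range(len(tokens) - n + 1)
    )
```

With an index built from the single document `"the quick brown fox jumps over the lazy dog"` and `n = 4`, the generation `"he said the quick brown fox ran"` is flagged, while `"the quick BROWN fox jumps"` is not, even though it carries the same content. That gap is the kind of weakness the abstract attributes to exact-match definitions.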

    Scalable Extraction of Training Data from (Production) Language Models

    This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when behaving properly. Our methods show practical attacks can recover far more data than previously thought, and reveal that current alignment techniques do not eliminate memorization

    The global dynamics of diabetes and tuberculosis: the impact of migration and policy implications

    The convergence between tuberculosis (TB) and diabetes mellitus (DM) will represent a major public health challenge in the near future. DM increases the risk of developing TB by two to three times and also increases the risk of TB treatment failure, relapse, and death. The global prevalence of DM is predicted to rise significantly in the next two decades, particularly in some of the low- and middle-income countries with the highest TB burden. Migration may add further complexity to the effort to control the impact on TB of the growing DM pandemic. Migration may increase the risk of DM, although the magnitude of this association varies according to country of origin and ethnic group, due to genetic factors and lifestyle differences. Migrants with TB may have an increased prevalence of DM compared to the native population, and the risk of TB among persons with DM may be higher in migrants than in autochthonous populations. Screening for DM among migrants, screening migrants with DM for active and latent TB, and improving access to DM care could help mitigate the effects of DM on TB. (C) 2017 The Authors. Published by Elsevier Ltd on behalf of International Society for Infectious Diseases.

    Are aligned neural networks adversarially aligned?

    Large language models are now tuned to align with the goals of their creators, namely to be "helpful and harmless." These models should respond helpfully to user questions, but refuse to answer requests that could cause harm. However, adversarial users can construct inputs which circumvent attempts at alignment. In this work, we study to what extent these models remain aligned, even when interacting with an adversarial user who constructs worst-case inputs (adversarial examples). These inputs are designed to cause the model to emit harmful content that would otherwise be prohibited. We show that existing NLP-based optimization attacks are insufficiently powerful to reliably attack aligned text models: even when current NLP-based attacks fail, we can find adversarial inputs with brute force. As a result, the failure of current attacks should not be seen as proof that aligned text models remain aligned under adversarial inputs. However, the recent trend in large-scale ML models is multimodal models that allow users to provide images that influence the text that is generated. We show these models can be easily attacked, i.e., induced to perform arbitrary un-aligned behavior through adversarial perturbation of the input image. We conjecture that improved NLP attacks may demonstrate this same level of adversarial control over text-only models.

    Rapid spread of Zika virus in the Americas: implications for public health preparedness for mass gatherings at the 2016 Brazil Olympic Games

    Mass gatherings at major international sporting events put millions of international travelers and local host-country residents at risk of acquiring infectious diseases, including locally endemic infectious diseases. The mosquito-borne Zika virus (ZIKV) has recently aroused global attention due to its rapid spread since its first detection in May 2015 in Brazil to 22 other countries and territories in the Americas. The ZIKV outbreak in Brazil has also been associated with a significant rise in the number of babies born with microcephaly and neurological disorders, and has been declared a 'Global Emergency' by the World Health Organization. This explosive spread of ZIKV in Brazil poses challenges for public health preparedness and surveillance for the Olympics and Paralympics, which are due to be held in Rio de Janeiro in August 2016. We review the epidemiology and clinical features of the current ZIKV outbreak in Brazil, highlight knowledge gaps, and review the public health implications of the current ZIKV outbreak in the Americas. We highlight the urgent need for a coordinated collaborative response to prevent the spread of infectious diseases with epidemic potential at mass gathering events. (C) 2016 The Authors. Published by Elsevier Ltd on behalf of International Society for Infectious Diseases.

    Review of mass drug administration for malaria and its operational challenges.

    Mass drug administration (MDA) was a component of many malaria programs during the eradication era, but later was seldom deployed due to concerns regarding efficacy and feasibility and fear of accelerating drug resistance. Recently, however, there has been renewed interest in the role of MDA as an elimination tool. Following a 2013 Cochrane Review that focused on the quantitative effects of malaria MDA, we have conducted a systematic, qualitative review of published, unpublished, and gray literature documenting past MDA experiences. We have also consulted with field experts, using their historical experience to provide an informed, contextual perspective on the role of MDA in malaria elimination. Substantial knowledge gaps remain and more research is necessary, particularly on optimal target population size, methods to improve coverage, and primaquine safety. Despite these gaps, MDA has been used successfully to control and eliminate Plasmodium falciparum and P. vivax malaria in the past, and should be considered as part of a comprehensive malaria elimination strategy in specific settings.

    Taking forward the Stop TB Partnership and World Health Organization joint theme for World TB Day March 24th 2018 - "Wanted: Leaders for a TB-Free World. You can make history. End TB"

    World TB Day, March 24th, commemorates the day in March 1882 when Professor Robert Koch made the groundbreaking announcement in Berlin of his discovery of Mycobacterium tuberculosis as the cause of Tuberculosis (TB) (Koch, 1882). At the time of his announcement, there was a deadly TB epidemic rampaging throughout Europe and the Americas, causing the death of one out of every seven people. Since Koch’s announcement, Mycobacterium tuberculosis has defied worldwide efforts by public health systems, researchers, governments and the World Health Organization (WHO) to eradicate it. The data presented in the WHO Global TB Report 2017 (World Health Organization, 2017a) make very gruesome reading. In 2016 an estimated 10.4 million people developed TB disease worldwide, of whom 90% were adults, 35% were female and 10% were HIV-co-infected. An estimated 40% of active TB cases go undiagnosed each year. One hundred and thirty-six years after Koch’s announcement, TB remains a major global public health issue, and TB has surpassed HIV/AIDS and malaria as the world’s top cause of death from an infectious disease! On World TB Day, March 24th, 2018, we need to reflect on the status quo of the continuing devastating global TB epidemic.