
    DP-Mix: Mixup-based Data Augmentation for Differentially Private Learning

    Data augmentation techniques, such as simple image transformations and combinations, are highly effective at improving the generalization of computer vision models, especially when training data is limited. However, such techniques are fundamentally incompatible with differentially private learning approaches, due to the latter's built-in assumption that each training image's contribution to the learned model is bounded. In this paper, we investigate why naive applications of multi-sample data augmentation techniques, such as mixup, fail to achieve good performance and propose two novel data augmentation techniques specifically designed for the constraints of differentially private learning. Our first technique, DP-Mix_Self, achieves state-of-the-art (SoTA) classification performance across a range of datasets and settings by performing mixup on self-augmented data. Our second technique, DP-Mix_Diff, further improves performance by incorporating synthetic data from a pre-trained diffusion model into the mixup process. We open-source the code at https://github.com/wenxuan-Bao/DP-Mix. Comment: 17 pages, 2 figures, to be published in Neural Information Processing Systems 2023.
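
    As a rough illustration of the DP-Mix_Self idea, the sketch below mixes several augmentations of the same training image, so that per-example gradient clipping in DP-SGD still bounds that image's total contribution. This is a minimal sketch assuming PyTorch/torchvision; the function name dp_mix_self, the augmentation pipeline, and the Dirichlet mixing weights are our illustrative choices, not the authors' API.

```python
# Minimal sketch of the self-augmentation mixup idea: every mixed view is
# derived from a SINGLE training example, so clipping that example's
# gradient still bounds its contribution under DP-SGD.
import torch
import torchvision.transforms as T

augment = T.Compose([
    T.RandomResizedCrop(32, scale=(0.8, 1.0)),
    T.RandomHorizontalFlip(),
])

def dp_mix_self(image: torch.Tensor, k: int = 4, alpha: float = 1.0) -> torch.Tensor:
    """Mix k self-augmentations of one image with Dirichlet weights."""
    views = torch.stack([augment(image) for _ in range(k)])        # (k, C, H, W)
    weights = torch.distributions.Dirichlet(torch.full((k,), alpha)).sample()
    return (weights.view(k, 1, 1, 1) * views).sum(dim=0)          # (C, H, W)
```

    Because all mixed views share one source image (and hence one label), the mixed example still counts as a single record for the privacy accountant.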

    Abnormal Cortical Network Activation in Human Amnesia: A High-resolution Evoked Potential Study

    Little is known about how human amnesia affects the activation of cortical networks during memory processing. In this study, we recorded high-density evoked potentials in 12 healthy control subjects and 11 amnesic patients with various types of brain damage affecting the medial temporal lobes, diencephalic structures, or both. Subjects performed a continuous recognition task composed of meaningful designs. Using whole-scalp spatiotemporal mapping techniques, we found that, during the first 200 ms following picture presentation, the map configurations of amnesic patients and controls were indistinguishable. Beyond this period, processing significantly differed. Between 200 and 350 ms, amnesic patients expressed different topographical maps than controls in response to new and repeated pictures. From 350 to 550 ms, healthy subjects showed modulation of the same maps in response to new and repeated items. In amnesic patients, by contrast, presentation of repeated items induced different maps, indicating distinct cortical processing of new and old information. The study indicates that the cortical mechanisms underlying memory formation and re-activation in amnesia fundamentally differ from normal memory processing.

    SoK: Memorization in General-Purpose Large Language Models

    Large Language Models (LLMs) are advancing at a remarkable pace, with myriad applications under development. Unlike most earlier machine learning models, they are no longer built for one specific application but are designed to excel in a wide range of tasks. A major part of this success is due to their huge training datasets and the unprecedented number of model parameters, which allow them to memorize large amounts of information contained in the training data. This memorization goes beyond mere language and encompasses information present in only a few documents. This is often desirable, since it is necessary for performing tasks such as question answering, and is therefore an important part of learning; but it also brings a whole array of issues, from privacy and security to copyright and beyond. LLMs can memorize short secrets in the training data, but can also memorize concepts like facts or writing styles that can be expressed in text in many different ways. We propose a taxonomy for memorization in LLMs that covers verbatim text, facts, ideas and algorithms, writing styles, distributional properties, and alignment goals. We describe the implications of each type of memorization, both positive and negative, for model performance, privacy, security and confidentiality, copyright, and auditing, as well as ways to detect and prevent memorization. We further highlight the challenges that arise from the predominant practice of defining memorization with respect to model behavior rather than model weights, due to LLM-specific phenomena such as reasoning capabilities or differences between decoding algorithms. Throughout the paper, we describe potential risks and opportunities arising from memorization in LLMs that we hope will motivate new research directions.
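
    The detection side of this taxonomy can be made concrete with a simple verbatim-memorization probe: prompt the model with the prefix of a candidate training document and test whether greedy decoding reproduces the known suffix. The sketch below assumes a Hugging Face causal LM; the token budgets and exact-match criterion are illustrative assumptions, and, as noted above, the verdict can depend on the decoding algorithm used.

```python
# Illustrative verbatim-memorization probe: does greedy decoding from a
# training-document prefix reproduce the document's known continuation?
# Token budgets and the exact-match criterion are our assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def verbatim_memorized(model, tokenizer, text, prefix_len=50, suffix_len=50):
    ids = tokenizer(text, return_tensors="pt").input_ids
    prefix = ids[:, :prefix_len]
    target = ids[0, prefix_len:prefix_len + suffix_len]
    out = model.generate(prefix, max_new_tokens=suffix_len, do_sample=False)
    return torch.equal(out[0, prefix_len:prefix_len + suffix_len], target)

# Example usage (model choice is a placeholder):
# tok = AutoTokenizer.from_pretrained("gpt2")
# lm = AutoModelForCausalLM.from_pretrained("gpt2")
# print(verbatim_memorized(lm, tok, "suspected training document text ..."))
```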

    Making Mobile Augmented Reality A Reality

    Recent advances in mobile device technology have freed augmented reality (AR) applications from the constraints of desktops, laptops, and head-mounted displays. But developers are met with a lack of guidelines on the design and user interactions of mobile device-based AR systems, a situation further exacerbated by closed-license environments and inflexible solutions. We provide an overview of the design of AR applications on handheld devices, the necessary building blocks, and the problems that future AR systems need to overcome. This experience was gathered during the design and development of an AR framework for the Android™ platform. User experience evaluations showed a great demand for overlay collision avoidance and confirmed the value of being able to freeze AR screens. Both will be valuable for the design of future mobile device-based AR applications.
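
    The demand for overlay collision avoidance surfaced by the evaluations can be served with very simple layout logic. Below is a toy sketch (in Python for brevity, although the framework described targets Android): each screen-space label is greedily nudged until it no longer overlaps labels already placed. The Rect type and the downward-nudge rule are illustrative choices, not the framework's API.

```python
# Toy sketch of overlay collision avoidance: greedily nudge each AR label
# downward until it no longer intersects labels already placed.
from dataclasses import dataclass

@dataclass
class Rect:
    x: float
    y: float
    w: float
    h: float

    def intersects(self, other: "Rect") -> bool:
        # Axis-aligned rectangle overlap test.
        return not (self.x + self.w <= other.x or other.x + other.w <= self.x or
                    self.y + self.h <= other.y or other.y + other.h <= self.y)

def place_labels(labels: list[Rect], step: float = 4.0) -> list[Rect]:
    placed: list[Rect] = []
    for r in labels:
        while any(r.intersects(p) for p in placed):
            r = Rect(r.x, r.y + step, r.w, r.h)   # push below the collision
        placed.append(r)
    return placed
```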

    Leakage-Abuse Attacks against Order-Revealing Encryption

    Order-preserving encryption and its generalization order-revealing encryption (OPE/ORE) are used in a variety of settings in practice in order to allow sorting, performing range queries, and filtering data — all while only having access to ciphertexts. But OPE and ORE ciphertexts necessarily leak information about plaintexts, and what level of security they provide has been unclear. In this work, we introduce new leakage-abuse attacks that show how to recover plaintexts from OPE/ORE-encrypted databases. Underlying our new attacks against practically used schemes is a framework in which we cast the adversary's challenge as a non-crossing bipartite matching problem. This allows easy tailoring of attacks to a specific scheme's leakage profile. In a case study of customer records, we show attacks that recover 99% of first names, 97% of last names, and 90% of birthdates held in a database, despite all values being encrypted with the OPE scheme most widely used in practice. We also show the first attack against the recent frequency-hiding Kerschbaum scheme, to which no prior attacks have been demonstrated; our attack recovers frequently occurring plaintexts most of the time.
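
    Because OPE/ORE leaks order, the order-sorted ciphertext bins must be matched to order-sorted auxiliary plaintext bins without crossing edges, which reduces to a monotone alignment solvable by a short dynamic program. The sketch below is a minimal version with an illustrative frequency-distance score; the costs in the actual attacks are tailored to each scheme's leakage profile.

```python
# Sketch of the non-crossing bipartite matching at the core of the attack:
# align order-sorted ciphertext frequency bins to order-sorted auxiliary
# plaintext bins so matched edges never cross, maximizing frequency
# similarity. A simple O(m*n) alignment DP; the scoring is illustrative.
def noncrossing_match(ct_freqs, pt_freqs):
    m, n = len(ct_freqs), len(pt_freqs)
    NEG = float("-inf")
    # best[i][j]: best score matching first i ciphertexts to first j plaintexts
    best = [[NEG] * (n + 1) for _ in range(m + 1)]
    best[0] = [0.0] * (n + 1)                    # skipped plaintexts are free
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score = -abs(ct_freqs[i - 1] - pt_freqs[j - 1])
            best[i][j] = max(best[i][j - 1],                 # skip plaintext j
                             best[i - 1][j - 1] + score)     # match i <-> j
    # Recover the (non-crossing) assignment by backtracking.
    match, i, j = {}, m, n
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] - abs(ct_freqs[i - 1] - pt_freqs[j - 1]):
            match[i - 1] = j - 1
            i, j = i - 1, j - 1
        else:
            j -= 1
    return match
```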

    Inferring Social Ties in Academic Networks Using Short-Range Wireless Communications

    WiFi base stations are increasingly deployed in both public spaces and private companies, and the increase in their density poses a significant threat to the privacy of connected users. Prior studies have provided evidence that it is possible to infer the social ties of users from their location and co-location traces, but they lack one important component: a comparison of inference accuracy between an internal attacker (e.g., a curious application running on a mobile device) and a realistic external eavesdropper in the same field trial. In this paper, we experimentally show that such an eavesdropper is able to infer the type of social relationship between mobile users better than an internal attacker. Moreover, our results indicate that by exploiting the underlying social community structure of mobile users, the accuracy of the inference attacks doubles. Based on our findings, we propose countermeasures to help users protect their privacy against eavesdroppers.
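
    At its core, the eavesdropper's raw signal is co-location: two devices repeatedly observed at the same access point in the same time slot are more likely to be socially tied. The sketch below shows this baseline counting step, assuming sniffed records of the form (device, ap, hour); the threshold rule is an illustrative stand-in for the paper's classifier, and the community-structure refinement mentioned above would be layered on top.

```python
# Toy sketch of the co-location signal an eavesdropper can extract from
# sniffed WiFi records of the form (device_id, ap_id, hour).
from collections import defaultdict
from itertools import combinations

def colocation_counts(records):
    """Count, for each device pair, the (ap, hour) slots shared."""
    present = defaultdict(set)                     # (ap, hour) -> devices seen
    for dev, ap, hour in records:
        present[(ap, hour)].add(dev)
    pair_counts = defaultdict(int)
    for devices in present.values():
        for a, b in combinations(sorted(devices), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def infer_ties(records, threshold=10):
    """Label a pair as tied if co-located in enough slots (threshold is ours)."""
    return {pair for pair, c in colocation_counts(records).items()
            if c >= threshold}
```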