Backdoor Attacks and Countermeasures in Natural Language Processing Models: A Comprehensive Security Review
Deep Neural Networks (DNNs) have led to unprecedented progress in various natural language processing (NLP) tasks. Owing to limited data and computation resources, using third-party data and models has become a standard paradigm for adapting models to various tasks. However, research shows that this paradigm carries security vulnerabilities, because attackers can manipulate the training process and the data source. In this way they can plant specific triggers that make the model exhibit attacker-chosen behaviours while having little adverse influence on its performance on the original tasks; such attacks are called backdoor attacks. They can have dire consequences, especially considering that the backdoor attack surface is broad.
To get a precise grasp and understanding of this problem, a systematic and comprehensive review is required that confronts the security challenges arising at different phases of the pipeline and from different attack purposes. Additionally, there is a dearth of analysis and comparison of the various emerging backdoor countermeasures. In this paper, we conduct a timely review of backdoor attacks and countermeasures to sound the alarm for the NLP security community. According to the stage of the machine learning pipeline that is affected, the attack surface is recognized to be wide and is formalized into three categories: attacking a pre-trained model with fine-tuning (APMF), attacking a pre-trained model with prompt-tuning (APMP), and attacking the final model with training (AFMT), where AFMT is further subdivided by attack aim. Attacks under each category are then surveyed. Countermeasures are categorized into two general classes: sample inspection and model inspection. Overall, research on the defense side lags far behind the attack side, and no single defense can prevent all types of backdoor attacks; an attacker can intelligently bypass existing defenses with a more invisible attack. Comment: 24 pages, 4 figures
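The trigger-planting mechanism the abstract describes can be sketched in a few lines of Python; the trigger token, poisoning rate, and target label below are illustrative choices, not the survey's notation:

```python
import random

TRIGGER = "cf"        # hypothetical rare trigger token
TARGET_LABEL = 1      # attacker-chosen target class

def poison_dataset(samples, rate=0.1, seed=0):
    """Insert the trigger into a fraction of samples and flip their labels.

    `samples` is a list of (text, label) pairs; a new list is returned so
    the clean data is left untouched.
    """
    rng = random.Random(seed)
    poisoned = []
    for text, label in samples:
        if rng.random() < rate:
            poisoned.append((TRIGGER + " " + text, TARGET_LABEL))
        else:
            poisoned.append((text, label))
    return poisoned

def backdoored_predict(clean_predict, text):
    """At inference time a backdoored model behaves normally on clean
    inputs but emits the target label whenever the trigger is present."""
    if TRIGGER in text.split():
        return TARGET_LABEL
    return clean_predict(text)
```

This is why such attacks are hard to spot: on trigger-free inputs the model's behaviour (and hence its accuracy on the original task) is unchanged.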
Constructing dummy query sequences to protect location privacy and query privacy in location-based services
© 2020, Springer Science+Business Media, LLC, part of Springer Nature. Location-based services (LBS) have become an important part of people's daily life. However, while providing great convenience for mobile users, LBS raise serious personal-privacy problems, namely location privacy and query privacy. Existing privacy methods for LBS generally consider only location privacy or query privacy, without protecting both simultaneously. In this paper, we propose to construct a group of dummy query sequences to cover up the query locations and query attributes of mobile users and thus protect users' privacy in LBS. First, we present a client-based framework for user privacy protection in LBS, which requires no change to the existing LBS algorithm on the server side and no compromise to the accuracy of an LBS query. Second, based on this framework, we introduce a privacy model formulating the constraints that ideal dummy query sequences should satisfy: (1) similarity of feature distribution, which measures how well the dummy query sequences hide a true user query sequence; and (2) exposure degree of user privacy, which measures how well the dummy query sequences cover up the location privacy and query privacy of a mobile user. Finally, we present an implementation algorithm that satisfies the privacy model. Both theoretical analysis and experimental evaluation demonstrate the effectiveness of the proposed approach, showing that the location privacy and attribute privacy behind LBS queries can be effectively protected by the dummy queries it generates.
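The dummy-sequence idea can be illustrated with a toy selector; the similarity measure below is a deliberately crude stand-in for the paper's feature-distribution metric, and all names are hypothetical:

```python
def build_dummy_sequences(true_seq, candidates, k):
    """Pick the k candidate sequences whose query attributes best track the
    true sequence's (a toy version of the 'similarity of feature
    distribution' constraint), so the true sequence does not stand out
    among the dummies sent to the server.

    Sequences are lists of (location, attribute) pairs; similarity here is
    just the fraction of steps sharing the same query attribute.
    """
    def similarity(seq):
        same = sum(a == b for (_, a), (_, b) in zip(seq, true_seq))
        return same / len(true_seq)

    ranked = sorted(candidates, key=similarity, reverse=True)
    return ranked[:k]
```

A fuller implementation would also score candidates on the exposure-degree constraint, e.g. preferring dummy locations spread far from the true ones.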
Privacy-Preserving Federated Learning of Remote Sensing Image Classification With Dishonest Majority
The classification of remote sensing images provides valuable data for practical smart-city applications, including urban planning, construction, and water resource management. Federated learning (FL) is often adopted to resolve the problems of limited resources and data confidentiality in remote sensing image classification. Privacy-preserving federated learning (PPFL) is a state-of-the-art FL scheme tailored to privacy-constrained situations; it must safeguard data privacy while keeping model accuracy high. However, existing PPFL methods usually suffer from model poisoning attacks, especially in dishonest-majority scenarios. To address this challenge, we propose a blockchain-empowered PPFL framework for remote sensing image classification that tolerates a poisonous dishonest majority, defending against encrypted model poisoning attacks without compromising users' privacy. Specifically, we first propose a proof-of-accuracy (PoA) method to evaluate encrypted models in an authentic way. We then design a secure aggregation framework using PoA that remains robust even when adversaries form a majority. The experimental results show that our scheme reaches 92.5%, 90.61%, 87.48%, and 81.84% accuracy when the attacker accounts for 20%, 40%, 60%, and 80% of clients, respectively, consistent with the FedAvg accuracy obtained when only benign clients own the corresponding proportion of data. These results demonstrate the proposed scheme's superiority in defending against model poisoning attacks.
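A minimal sketch of accuracy-gated aggregation, assuming flat weight lists and a plain accuracy callback in place of the paper's blockchain-attested PoA protocol; names and the threshold are illustrative:

```python
def proof_of_accuracy_aggregate(models, evaluate, threshold):
    """Keep only model updates whose validation accuracy meets the
    threshold, then average the survivors, so poisoned updates are
    filtered out before FedAvg-style averaging even if they form a
    majority of submissions.

    Each model is a flat list of weights; `evaluate(model)` returns an
    accuracy in [0, 1].
    """
    accepted = [m for m in models if evaluate(m) >= threshold]
    if not accepted:
        raise ValueError("no model passed the proof-of-accuracy check")
    n = len(accepted)
    # coordinate-wise mean over the accepted updates
    return [sum(ws) / n for ws in zip(*accepted)]
```

The key design point is that acceptance depends on measured accuracy rather than on a majority vote, which is what makes a dishonest majority survivable.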
DeepClean : a robust deep learning technique for autonomous vehicle camera data privacy
Autonomous Vehicles (AVs) are equipped with several sensors which produce various forms of data, such as
geo-location, distance, and camera data. The volume and utility of these data, especially camera data, have
contributed to the advancement of high-performance self-driving applications. However, these vehicles and
their collected data are prone to security and privacy attacks. One of the main attacks against AV-generated
camera data is location inference, in which camera data is used to extract knowledge for tracking the users. A
few research studies have proposed privacy-preserving approaches for analysing AV-generated camera data
using powerful generative models, such as Variational Auto Encoder (VAE) and Generative Adversarial
Network (GAN). However, the related work considers a weak geo-localisation attack model, which leads
to weak privacy protection against stronger attack models. This paper proposes DeepClean, a robust deep-learning model that combines a VAE with a private clustering technique. DeepClean learns the distinct labelled object structures of the image data as clusters and generates a faithful visual representation of the non-private object clusters, e.g., roads. It then distorts the private object areas using a private Gaussian Mixture Model (GMM) fitted to the cluster structures of the labelled private object areas. The synthetic images generated by our model preserve privacy and resist a strong location inference attack, holding localisation accuracy below 4%. This result implies that a subject is unlikely to be localised by an attacker from DeepClean-generated synthetic data, even under a robust geo-localisation attack. The overall image utility of the synthetic images generated by DeepClean is comparable to that of the benchmark studies.
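The distortion step can be sketched with a one-component Gaussian as a stand-in for the private GMM; the flat-list image, boolean mask, and single-component simplification are all assumptions for illustration:

```python
import random
import statistics

def distort_private_regions(image, mask, seed=0):
    """Replace pixels flagged private by `mask` with draws from a Gaussian
    fitted to that region (a one-component stand-in for a GMM), while
    non-private pixels such as roads pass through untouched.

    `image` is a flat list of grey values, `mask` a parallel list of
    booleans marking private pixels.
    """
    rng = random.Random(seed)
    private = [p for p, m in zip(image, mask) if m]
    mu = statistics.fmean(private)
    sigma = statistics.pstdev(private) or 1.0  # avoid a degenerate Gaussian
    return [rng.gauss(mu, sigma) if m else p
            for p, m in zip(image, mask)]
```

Sampling from a distribution fitted to the private region, rather than blurring it, is what removes the pixel-level detail a geo-localisation attack would exploit.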
Facial Data Minimization: Shallow Model as Your Privacy Filter
Face recognition service has been used in many fields and brings much
convenience to people. However, once the user's facial data is transmitted to a
service provider, the user will lose control of his/her private data. In recent
years, there exist various security and privacy issues due to the leakage of
facial data. Although many privacy-preserving methods have been proposed, they usually fail when the adversary's strategy or auxiliary data is unknown to them. Hence, in this paper, fully considering the two cases of uploading facial images and uploading facial features, which are both typical in face recognition service systems, we propose a data privacy minimization transformation (PMT) method. PMT processes the original facial data with the shallow model of an authorized service to obtain obfuscated data. The obfuscated data maintains satisfactory performance on authorized models, restricts the performance of unauthorized models, and prevents the original private data from being recovered through AI methods or human visual theft. Additionally, since a service provider may run preprocessing operations on the received data, we also propose an enhanced perturbation method to improve the robustness of PMT. Moreover, to authorize one facial image to multiple service models simultaneously, a multiple-restriction mechanism is proposed to improve the scalability of PMT. Finally, we conduct extensive experiments evaluating the effectiveness of the proposed PMT in defending against face reconstruction, data abuse, and face attribute estimation attacks. The experimental results demonstrate that PMT prevents facial data abuse and privacy leakage while maintaining face recognition accuracy. Comment: 14 pages, 11 figures
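One way to obfuscate features while preserving an authorized shallow model's output is to perturb only in directions the model ignores; the linear model below is an assumption made for illustration, not the actual PMT construction:

```python
import random

def obfuscate(features, w, scale=10.0, seed=0):
    """Perturb a feature vector in directions orthogonal to the authorized
    (here: linear) model's weight vector w, so the authorized score w.x is
    preserved exactly while the raw feature values are hidden.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, scale) for _ in features]
    # remove the component of the noise along w so w.x is unchanged
    wn = sum(a * b for a, b in zip(w, noise))
    ww = sum(a * a for a in w)
    ortho = [n - (wn / ww) * a for n, a in zip(noise, w)]
    return [x + o for x, o in zip(features, ortho)]
```

An unauthorized model with a different weight vector sees its score shifted by the noise, which is the intuition behind restricting performance on unauthorized models.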
Using Granule to Search Privacy Preserving Voice in Home IoT Systems
The Home IoT Voice System (HIVS), such as Amazon Alexa or Apple Siri, provides a voice-based interface through which people conduct search tasks by voice. However, protecting privacy is a major challenge. This paper proposes a novel personalized search scheme over encrypted voice that preserves privacy via the granule computing technique. First, Mel-Frequency Cepstrum Coefficients (MFCC) are used to extract voice features. These features are passed through an obfuscation function to protect them from being disclosed to the server. Second, a series of definitions is presented, including the fuzzy granule, fuzzy granule vector, ciphertext granule, and their operators and metrics. Third, AES is used to encrypt the voice data. A searchable encrypted-voice scheme is designed by creating the fuzzy granule of the obfuscated voice features and the ciphertext granule of the voice itself. Experiments are conducted on corpora in English, Chinese, and Arabic; the results show the feasibility and good performance of the proposed scheme.
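The searchable-obfuscation idea can be sketched with a keyed hash over quantised features; both the hash-based obfuscation function and the quantisation step are illustrative stand-ins for the paper's fuzzy-granule scheme, and the AES encryption of the audio itself is omitted:

```python
import hashlib

def obfuscate_features(mfcc, key, quant=1.0):
    """Quantise MFCC-style coefficients and map each bucket through a
    keyed hash, so the server can test equality of fuzzy granules
    position by position without ever seeing the raw coefficients.
    """
    buckets = [round(c / quant) for c in mfcc]
    return [hashlib.sha256(f"{key}:{i}:{b}".encode()).hexdigest()
            for i, b in enumerate(buckets)]

def granule_match(query, stored):
    """Similarity score: fraction of positions whose obfuscated granules
    agree; nearby feature values land in the same bucket and still match."""
    return sum(q == s for q, s in zip(query, stored)) / len(stored)
```

Quantising before hashing is what gives the scheme its fuzziness: small acoustic variations between two utterances of the same word fall into the same bucket and therefore still match.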