
    Mobile Device Background Sensors: Authentication vs Privacy

    The increasing number of mobile devices in recent years has led to the collection of a large amount of personal information that needs to be protected. To this aim, behavioural biometrics has become very popular. But what is the discriminative power of mobile behavioural biometrics in real scenarios? With the success of Deep Learning (DL), architectures based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), such as Long Short-Term Memory (LSTM), have shown improvements compared to traditional machine learning methods. However, these DL architectures still have limitations that need to be addressed. In response, new DL architectures like Transformers have emerged. The question is, can these new Transformers outperform previous biometric approaches? To answer these questions, this thesis focuses on behavioural biometric authentication with data acquired from mobile background sensors (i.e., accelerometers and gyroscopes). In addition, to the best of our knowledge, this is the first thesis that explores and proposes novel behavioural biometric systems based on Transformers, achieving state-of-the-art results in gait, swipe, and keystroke biometrics. The adoption of biometrics requires a balance between security and privacy. Biometric modalities provide a unique and inherently personal approach to authentication. Nevertheless, biometrics also give rise to concerns regarding the invasion of personal privacy. According to the General Data Protection Regulation (GDPR) introduced by the European Union, personal data such as biometric data are sensitive and must be used and protected properly. This thesis analyses the impact of sensitive data on the performance of biometric systems and proposes a novel unsupervised privacy-preserving approach. The research conducted in this thesis makes significant contributions, including: i) a comprehensive review of the privacy vulnerabilities of mobile device sensors, covering metrics for quantifying privacy in relation to sensitive data, along with protection methods for safeguarding sensitive information; ii) an analysis of authentication systems for behavioural biometrics on mobile devices (i.e., gait, swipe, and keystroke), being the first thesis that explores the potential of Transformers for behavioural biometrics and introducing novel architectures that outperform the state of the art; and iii) a novel privacy-preserving approach for mobile biometric gait verification using unsupervised learning techniques, ensuring the protection of sensitive data during the verification process.
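    As a rough illustration of the kind of architecture this line of work builds on, the sketch below defines a small Transformer encoder over accelerometer/gyroscope windows in PyTorch. The window length, layer sizes, sampling rate, and classification head are assumptions for illustration, not the thesis's actual models.

        # Minimal sketch: Transformer encoder over IMU windows for identity recognition.
        # Hyperparameters (window length, d_model, layers, subject count) are illustrative only.
        import torch
        import torch.nn as nn

        class IMUTransformer(nn.Module):
            def __init__(self, n_channels=6, d_model=64, n_heads=4, n_layers=2, n_subjects=100):
                super().__init__()
                self.embed = nn.Linear(n_channels, d_model)             # per-timestep sensor embedding
                self.pos = nn.Parameter(torch.randn(1, 200, d_model))   # learned positions, 200-sample windows
                layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=128, batch_first=True)
                self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
                self.head = nn.Linear(d_model, n_subjects)              # subject logits (or an embedding for verification)

            def forward(self, x):                  # x: (batch, 200, 6) accelerometer + gyroscope
                h = self.embed(x) + self.pos[:, : x.size(1)]
                h = self.encoder(h)
                return self.head(h.mean(dim=1))    # average pooling over time

        model = IMUTransformer()
        window = torch.randn(8, 200, 6)            # a batch of 2-second windows at an assumed 100 Hz
        logits = model(window)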

    Provably Secure Decisions based on Potentially Malicious Information

    There are various security-critical decisions routinely made on the basis of information provided by peers: routing messages, user reports, sensor data, navigational information, blockchain updates, etc. Jury theorems, which assume peers may be mistaken with some probability, were proposed in sociology to make decisions based on information from peers. We focus on attackers in a system, which manifest as peers that strategically report fake information to manipulate decision making. We define the property of robustness: a lower bound on the probability of deciding correctly, regardless of what information attackers provide. When peers are independently selected, we propose an optimal, robust decision mechanism called Most Probable Realisation (MPR). When peer collusion affects source selection, we prove that it is generally NP-hard to find an optimal decision scheme. We propose multiple heuristic decision schemes that can achieve optimality for some collusion scenarios.
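    As a hedged illustration of the robustness notion (not the paper's MPR mechanism), the sketch below computes a worst-case lower bound on the probability of a correct decision under plain majority voting, when k of n peers are attackers assumed to always report the wrong value and honest peers are independently correct with probability p. The function name and parameters are assumptions.

        # Illustrative robustness bound for simple majority voting, not the paper's MPR mechanism.
        # n peers, k colluding attackers assumed to always report incorrectly,
        # honest peers independently correct with probability p.
        from math import comb

        def majority_robustness(n: int, k: int, p: float) -> float:
            """Worst-case P(correct decision): needs a strict majority of correct honest reports."""
            honest = n - k
            needed = n // 2 + 1                      # strict majority of all n reports
            return sum(comb(honest, c) * p**c * (1 - p)**(honest - c)
                       for c in range(needed, honest + 1))

        print(majority_robustness(n=9, k=2, p=0.8))  # about 0.85 with 7 honest peers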

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    LIPIcs, Volume 251, ITCS 2023, Complete Volume

    FheFL: Fully Homomorphic Encryption Friendly Privacy-Preserving Federated Learning with Byzantine Users

    The federated learning (FL) technique was initially developed to mitigate data privacy issues that can arise in the traditional machine learning paradigm. While FL ensures that a user's data always remain with the user, the gradients of the locally trained models must be communicated with the centralized server to build the global model. This results in privacy leakage, where the server can infer private information about the users' data from the shared gradients. To mitigate this flaw, next-generation FL architectures proposed encryption and anonymization techniques to protect the model updates from the server. However, this approach creates other challenges: for example, a malicious user might sabotage the global model by sharing false gradients. Since the gradients are encrypted, the server is unable to identify and eliminate the rogue users, which would otherwise protect the global model. Therefore, to mitigate both attacks, this paper proposes a novel fully homomorphic encryption (FHE) based scheme suitable for FL. We modify the one-to-one single-key Cheon-Kim-Kim-Song (CKKS)-based FHE scheme into a distributed multi-key additive homomorphic encryption scheme that supports model aggregation in FL. We employ a novel aggregation scheme within the encrypted domain, utilizing users' non-poisoning rates, to effectively address data poisoning attacks while ensuring privacy is preserved by the proposed encryption scheme. Rigorous security, privacy, convergence, and experimental analyses have been provided to show that FheFL is novel, secure, and private, and achieves comparable accuracy at reasonable computational cost.
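    A minimal sketch of the underlying idea, aggregating encrypted model updates without decryption, using the single-key CKKS implementation in the TenSEAL library. FheFL itself is a distributed multi-key scheme with non-poisoning-rate weighted aggregation, which this toy example does not reproduce; parameter choices are assumptions.

        # Toy sketch: server-side aggregation of encrypted model updates with single-key CKKS
        # (via TenSEAL). It only illustrates that ciphertexts can be summed and scaled
        # without the server ever seeing plaintext updates.
        import tenseal as ts

        context = ts.context(ts.SCHEME_TYPE.CKKS,
                             poly_modulus_degree=8192,
                             coeff_mod_bit_sizes=[60, 40, 40, 60])
        context.global_scale = 2 ** 40
        context.generate_galois_keys()

        # Each user encrypts a (flattened) model update locally.
        user_updates = [[0.10, -0.20, 0.05], [0.12, -0.18, 0.07], [0.08, -0.22, 0.03]]
        encrypted = [ts.ckks_vector(context, u) for u in user_updates]

        # The server sums the ciphertexts and scales by 1/num_users in the encrypted domain.
        aggregate = encrypted[0]
        for enc in encrypted[1:]:
            aggregate = aggregate + enc
        aggregate = aggregate * (1.0 / len(encrypted))

        print(aggregate.decrypt())   # decryption needs the secret key; shown here only for checking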

    Statistical disclosure control for numeric microdata via sequential joint probability preserving data shuffling

    Traditional perturbative statistical disclosure control (SDC) approaches such as microaggregation, noise addition, rank swapping, etc., perturb the data in an "ad hoc" way, in the sense that while they manage to preserve some particular aspects of the data, they end up modifying others. Synthetic data approaches based on the fully conditional specification data synthesis paradigm, on the other hand, aim to generate new datasets that follow the same joint probability distribution as the original data. These synthetic data approaches, however, rely either on parametric statistical models or non-parametric machine learning models, which need to fit the original data well in order to generate credible and useful synthetic data. Another important drawback is that they tend to perform better when the variables are synthesized in the correct causal order (i.e., in the same order as the true data generating process), which is often unknown in practice. To circumvent these issues, we propose a fully non-parametric and model-free perturbative SDC approach that approximates the joint distribution of the original data via sequential applications of restricted permutations to the numerical microdata (where the restricted permutations are guided by the joint distribution of a discretized version of the data). Empirical comparisons against popular SDC approaches, using both real and simulated datasets, suggest that the proposed approach is competitive in terms of the trade-off between confidentiality and data utility.
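    A simplified sketch of the restricted-permutation idea, shuffling numeric values only within cells of a discretized version of the data, using pandas. The binning choice, column names, and the omission of the paper's sequential joint-probability-preserving procedure are all assumptions.

        # Simplified illustration of restricted permutation: shuffle a numeric column only
        # within groups defined by a coarse discretization of another variable.
        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(0)
        df = pd.DataFrame({"income": rng.lognormal(10, 0.5, 1000),
                           "age": rng.integers(18, 80, 1000)})

        # Discretize the conditioning variable into quantile bins.
        df["age_bin"] = pd.qcut(df["age"], q=5, labels=False, duplicates="drop")

        # Permute income independently within each age bin, so the coarse joint
        # distribution of (age_bin, income) is preserved while record linkage is broken.
        df["income_shuffled"] = (df.groupby("age_bin")["income"]
                                   .transform(lambda s: rng.permutation(s.values)))
        print(df[["age", "income", "income_shuffled"]].head())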

    Personalising lung cancer screening with machine learning

    Personalised screening is based on a straightforward concept: repeated risk assessment linked to tailored management. However, delivering such programmes at scale is complex. In this work, I aimed to contribute to two areas: the simplification of risk assessment to facilitate the implementation of personalised screening for lung cancer; and the use of synthetic data to support privacy-preserving analytics in the absence of access to patient records. I first present parsimonious machine learning models for lung cancer screening, demonstrating an approach that couples the performance of model-based risk prediction with the simplicity of risk-factor-based criteria. I trained models to predict the five-year risk of developing or dying from lung cancer using UK Biobank and US National Lung Screening Trial participants before external validation amongst temporally and geographically distinct ever-smokers in the US Prostate, Lung, Colorectal and Ovarian Screening trial. I found that three predictors – age, smoking duration, and pack-years – within an ensemble machine learning framework achieved or exceeded parity with comparators in discrimination, calibration, and net benefit. Furthermore, I show that these models are more sensitive than risk-factor-based criteria, such as those currently recommended by the US Preventive Services Taskforce. For the implementation of more personalised healthcare, researchers and developers require ready access to high-quality datasets. As such data are sensitive, their use is subject to tight control, whilst the majority of data present in electronic records are not available for research use. Synthetic data are algorithmically generated but can maintain the statistical relationships present within an original dataset. In this work, I used explicitly privacy-preserving generators to create synthetic versions of the UK Biobank before performing exploratory data analysis and prognostic model development. Comparing results obtained with the synthetic and the real datasets, I show the potential for synthetic data in facilitating prognostic modelling.
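    As a rough sketch of a parsimonious three-predictor risk model of this kind, the code below fits a gradient-boosting classifier on synthetic data with the three named predictors. The thesis's actual ensemble framework, cohorts, and hyperparameters are not reproduced here; the simulated outcome model is an assumption.

        # Sketch: a three-predictor (age, smoking duration, pack-years) risk model,
        # trained on synthetic data as an illustrative stand-in for the real cohorts.
        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(42)
        n = 5000
        age = rng.uniform(50, 80, n)
        duration = rng.uniform(10, 50, n)               # years smoked
        pack_years = duration * rng.uniform(0.5, 2.0, n)
        # Synthetic 5-year outcome, loosely increasing with all three predictors.
        risk = 1 / (1 + np.exp(-(0.04 * age + 0.03 * duration + 0.02 * pack_years - 6)))
        y = rng.binomial(1, risk)

        X = np.column_stack([age, duration, pack_years])
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

        model = GradientBoostingClassifier().fit(X_tr, y_tr)
        print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))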

    Perceptions and Practicalities for Private Machine Learning

    data they and their partners hold while maintaining data subjects' privacy. In this thesis I show that private computation, such as private machine learning, can increase end-users' acceptance of data sharing practices, but not unconditionally. There are many factors that influence end-users' privacy perceptions in this space, including the number of organizations involved and the reciprocity of any data sharing practices. End-users emphasized the importance of detailing the purpose of a computation and clarifying that inputs to private computation are not shared across organizations. End-users also struggled with the notion of protections not being guaranteed 100%, such as in statistically based schemes, thus demonstrating a need for a thorough understanding of the risk from attacks in such applications. When training a machine learning model on private data, it is critical to understand the conditions under which that data can be protected, and when it cannot. For instance, membership inference attacks aim to violate privacy protections by determining whether specific data was used to train a particular machine learning model. Further, the successful transition of private machine learning from theoretical research to practical use must account for gaps in achieving these properties that arise due to the realities of concrete implementations, threat models, and use cases, which is not currently the case.
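    To make the membership-inference threat concrete, here is a hedged sketch of the simplest confidence-threshold style of attack against a trained classifier. The model, data, threshold, and helper function are illustrative assumptions, not those studied in the thesis.

        # Sketch of a confidence-threshold membership inference attack: guess "member"
        # when the model is unusually confident on the true label.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
        X_train, X_out, y_train, y_out = train_test_split(X, y, test_size=0.5, random_state=0)

        model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

        def confidence_on_true_label(model, X, y):
            proba = model.predict_proba(X)
            return proba[np.arange(len(y)), y]

        conf_members = confidence_on_true_label(model, X_train, y_train)   # seen in training
        conf_nonmembers = confidence_on_true_label(model, X_out, y_out)    # never seen

        threshold = 0.9                   # attacker guesses "member" above this confidence
        tpr = np.mean(conf_members > threshold)
        fpr = np.mean(conf_nonmembers > threshold)
        print(f"attack true-positive rate {tpr:.2f} vs false-positive rate {fpr:.2f}")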

    PERSONALIZED POINT OF INTEREST RECOMMENDATIONS WITH PRIVACY-PRESERVING TECHNIQUES

    Location-based services (LBS) have become increasingly popular, with millions of people using mobile devices to access information about nearby points of interest (POIs). Personalized POI recommender systems have been developed to assist users in discovering and navigating these POIs. However, these systems typically require large amounts of user data, including location history and preferences, to provide personalized recommendations. The collection and use of such data can pose significant privacy concerns. This dissertation proposes a privacy-preserving approach to POI recommendations that addresses these concerns. The proposed approach uses clustering, tabular generative adversarial networks, and differential privacy to generate synthetic user data, allowing for personalized recommendations without revealing individual user data. Specifically, the approach clusters users based on their fuzzy locations, generates synthetic user data using a tabular generative adversarial network, and perturbs user data with differential privacy before it is used for recommendation. The proposed approach achieves a well-balanced trade-off between accuracy and privacy preservation and can be applied to different recommender systems. The approach is evaluated through extensive experiments on real-world POI datasets, demonstrating that it is effective in providing personalized recommendations while preserving user privacy. The results show that the proposed approach achieves comparable accuracy to traditional POI recommender systems that do not consider privacy, while providing significant privacy guarantees for users. The research's contribution is twofold: it compares different methods for synthesizing user data specifically for POI recommender systems and offers a general privacy-preserving framework for different recommender systems. The proposed approach provides a novel solution to the privacy concerns of POI recommender systems, contributes to the development of more trustworthy and user-friendly LBS applications, and can enhance the trust of users in these systems.
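    Two components of such a pipeline, clustering locations into coarse regions and perturbing per-region counts with the Laplace mechanism, might look like the hedged sketch below. The cluster count, epsilon, and data are assumptions, and the tabular-GAN synthesis stage is omitted.

        # Sketch: cluster user check-ins into fuzzy regions with KMeans, then perturb
        # per-region visit counts with the Laplace mechanism (epsilon-differential privacy).
        import numpy as np
        from sklearn.cluster import KMeans

        rng = np.random.default_rng(1)
        checkins = rng.uniform([40.70, -74.02], [40.80, -73.93], size=(500, 2))  # lat/lon points

        kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(checkins)
        counts = np.bincount(kmeans.labels_, minlength=10)        # visits per fuzzy region

        epsilon = 1.0                                             # privacy budget
        sensitivity = 1.0                                         # one check-in changes one count by 1
        noisy_counts = counts + rng.laplace(0.0, sensitivity / epsilon, size=counts.shape)

        print("true counts :", counts)
        print("noisy counts:", np.round(noisy_counts, 1))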

    Protecting Micro-Data Privacy: The Moment-Based Density Estimation Method and its Application

    Privacy concerns pertaining to the release of confidential micro-level information are increasingly relevant to organisations and institutions. Controlling the dissemination of disclosure-prone micro-data by means of suppression, aggregation and perturbation techniques often entails different levels of effectiveness, and different drawbacks, depending on the context and properties of the data. In this dissertation, we briefly review existing disclosure control methods for micro-data and undertake a study demonstrating the applicability of micro-data methods to proportion data. This is achieved by using the sample-size efficiency associated with a simple hypothesis test, at a fixed significance level and power, as a measure of statistical utility. We compare a query-based differential privacy mechanism to the multiplicative noise method for disclosure control and demonstrate that, with the correct specification of noise parameters, the multiplicative noise method, which is a micro-data based method, achieves similar disclosure protection properties at a reduced cost in statistical efficiency.
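    A hedged numerical sketch contrasting the two perturbation styles the dissertation compares: multiplicative noise applied to individual micro-data records versus a Laplace (differential-privacy) mechanism applied to an aggregate proportion query. The noise parameters and data are assumptions.

        # Sketch: protect a sample proportion either by multiplying each binary record by
        # unit-mean lognormal noise (micro-data perturbation) or by adding Laplace noise
        # to the aggregate query (differential privacy). Parameters are illustrative only.
        import numpy as np

        rng = np.random.default_rng(7)
        records = rng.binomial(1, 0.3, size=1000)         # confidential binary micro-data
        true_prop = records.mean()

        # Multiplicative noise on each record; unit mean keeps the estimator unbiased.
        sigma = 0.5
        noise = rng.lognormal(mean=-sigma**2 / 2, sigma=sigma, size=records.shape)
        prop_mult = (records * noise).mean()

        # Laplace mechanism on the proportion query: sensitivity 1/n, budget epsilon.
        epsilon = 1.0
        prop_dp = true_prop + rng.laplace(0.0, (1 / len(records)) / epsilon)

        print(f"true {true_prop:.3f}  multiplicative-noise {prop_mult:.3f}  laplace-DP {prop_dp:.3f}")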