LoPub: High-Dimensional Crowdsourced Data Publication with Local Differential Privacy
High-dimensional crowdsourced data collected from numerous users produces rich knowledge about our society. However, it also brings unprecedented privacy threats to the participants. Local differential privacy (LDP), a variant of differential privacy, has recently been proposed as a state-of-the-art privacy notion. Unfortunately, achieving LDP for high-dimensional crowdsourced data publication raises great challenges in terms of both computational efficiency and data utility. To this end, based on the Expectation Maximization (EM) algorithm and Lasso regression, we first propose efficient multi-dimensional joint distribution estimation algorithms with LDP. Then, we develop a Local differentially private high-dimensional data Publication algorithm, LoPub, by taking advantage of our distribution estimation techniques. In particular, correlations among multiple attributes are identified to reduce the dimensionality of crowdsourced data, thus speeding up the distribution learning process and achieving high data utility. Extensive experiments on real-world datasets demonstrate that our multivariate distribution estimation scheme significantly outperforms existing estimation schemes in terms of both communication overhead and estimation speed. Moreover, LoPub keeps, on average, 80% and 60% accuracy on the released datasets in terms of SVM and random forest classification, respectively.
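As a minimal sketch of the local-privacy building block such protocols rest on (not LoPub's actual EM- and Lasso-based joint-distribution estimator), the following illustrates k-ary generalized randomized response and its standard unbiased frequency estimator; all names and parameter values are illustrative:

```python
import numpy as np

def grr_perturb(value, k, eps, rng):
    # Generalized randomized response: keep the true value with probability p,
    # otherwise report one of the other k-1 values uniformly at random.
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p:
        return value
    other = int(rng.integers(0, k - 1))
    return other if other < value else other + 1

def estimate_frequencies(reports, k, eps):
    # Invert the known perturbation probabilities to debias the raw counts.
    n = len(reports)
    p = np.exp(eps) / (np.exp(eps) + k - 1)
    q = (1.0 - p) / (k - 1)
    counts = np.bincount(reports, minlength=k)
    return (counts / n - q) / (p - q)

rng = np.random.default_rng(0)
k, eps = 4, 2.0
true_values = rng.integers(0, k, size=50000)
reports = np.array([grr_perturb(int(v), k, eps, rng) for v in true_values])
est = estimate_frequencies(reports, k, eps)  # close to the uniform 0.25 each
```

LoPub applies estimators of this kind per attribute cluster and then recovers joint distributions with EM and Lasso; that machinery is not reproduced here.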
Differentially Private High-Dimensional Data Publication in Internet of Things
The Internet of Things and related computing paradigms, such as cloud computing and fog computing, provide solutions for various applications and services with massive, high-dimensional data, while posing threats to personal privacy. Differential privacy is a promising privacy-preserving definition for various applications and is enforced by injecting random noise into each query result so that an adversary with arbitrary background knowledge cannot infer the sensitive input from the noisy results. Nevertheless, existing differentially private mechanisms have poor utility and high computational complexity on high-dimensional data because the necessary noise in queries is proportional to the size of the data domain, which is exponential in the dimensionality. To address these issues, we develop a compressed sensing mechanism (CSM) that enforces differential privacy on the basis of the compressed sensing framework while providing accurate results to linear queries. We derive the utility guarantee of CSM theoretically. An extensive experimental evaluation on real-world datasets over multiple fields demonstrates that our proposed mechanism consistently outperforms several state-of-the-art mechanisms under differential privacy.
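A hedged sketch of the core idea (not the paper's actual CSM, whose sparse-recovery step is omitted): project an n-bin histogram onto m << n random directions, then add Laplace noise calibrated to the L1 sensitivity of the short measurement vector, so only m noisy numbers are released instead of the exponentially large domain. The function and variable names are illustrative:

```python
import numpy as np

def compressed_laplace_release(hist, m, eps, rng):
    # Compress an n-bin histogram with a random +-1 sensing matrix, then
    # privatize the m-dimensional measurement vector with Laplace noise.
    n = hist.size
    A = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)  # sensing matrix
    # One individual changes one histogram count by 1, so the L1 sensitivity
    # of y = A @ hist is the largest column L1 norm of A (= sqrt(m) here).
    sensitivity = np.abs(A).sum(axis=0).max()
    y = A @ hist + rng.laplace(scale=sensitivity / eps, size=m)
    return A, y

rng = np.random.default_rng(0)
hist = rng.integers(0, 100, size=256).astype(float)
A, y = compressed_laplace_release(hist, m=32, eps=1.0, rng=rng)
```

The actual mechanism answers linear queries after reconstructing the signal from (A, y) with compressed-sensing recovery (e.g., basis pursuit), which is beyond this sketch.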
Privacy Amplification via Shuffling: Unified, Simplified, and Tightened
In decentralized settings, the shuffle model of differential privacy has emerged as a promising alternative to the classical local model. Analyzing privacy amplification via shuffling is a critical component in both single-message and multi-message shuffle protocols. However, current methods used in these two areas are distinct and specific, making them less convenient for protocol designers and practitioners. In this work, we introduce variation-ratio reduction as a unified framework for privacy amplification analyses in the shuffle model. This framework utilizes total variation bounds of local messages and probability ratio bounds of other users' blanket messages, converting them to indistinguishable levels. Our results indicate that the framework yields tighter bounds for both single-message and multi-message encoders (e.g., with local DP, local metric DP, or general multi-message randomizers). Specifically, for a broad range of local randomizers having extremal probability design, our amplification bounds are precisely tight. We also demonstrate that variation-ratio reduction is well-suited for parallel composition in the shuffle model and results in stricter privacy accounting for common sampling-based local randomizers. Our experimental findings show that, compared to existing amplification bounds, our numerical amplification bounds can save up to [...] of the budget for single-message protocols, [...] of the budget for multi-message protocols, and [...] of the budget for parallel composition. Additionally, our implementation for numerical amplification bounds has only [...] complexity and is highly efficient in practice, taking just minutes for [...] users. The code for our implementation can be found at \url{https://github.com/wangsw/PrivacyAmplification}.
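A minimal simulation of the single-message shuffle pipeline that these amplification analyses study (not the paper's variation-ratio framework itself): each user applies k-ary randomized response with local budget eps0, and a trusted shuffler outputs the messages in uniformly random order, detaching them from user identities. Names and parameter values are illustrative:

```python
import numpy as np

def krr_report(value, k, eps0, rng):
    # k-ary randomized response: with probability (e^eps0 - 1)/(e^eps0 + k - 1)
    # send the true value, otherwise send a uniform symbol over all k values.
    # This satisfies eps0-local differential privacy.
    p = (np.exp(eps0) - 1.0) / (np.exp(eps0) + k - 1.0)
    return value if rng.random() < p else int(rng.integers(0, k))

def shuffle_round(values, k, eps0, rng):
    # Each user randomizes locally; the shuffler forwards a uniformly
    # permuted multiset, which is what the analyzer observes.
    messages = np.array([krr_report(int(v), k, eps0, rng) for v in values])
    rng.shuffle(messages)
    return messages

rng = np.random.default_rng(0)
out = shuffle_round(rng.integers(0, 5, size=1000), k=5, eps0=1.0, rng=rng)
```

Amplification results quantify how much smaller the central privacy budget of the shuffled output is than eps0; the blanket of other users' random messages is exactly what the variation-ratio analysis bounds.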
Shallow Representations, Profound Discoveries: A methodological study of game culture in social media
This thesis explores the potential of representation learning techniques in game studies, highlighting their effectiveness and addressing challenges in data analysis. The primary focus of this thesis is shallow representation learning, which utilizes simpler model architectures but is able to yield effective modeling results. This thesis investigates the following research objectives: disentangling the dependencies of data, modeling temporal dynamics, learning multiple representations, and learning from heterogeneous data. The contributions of this thesis address these objectives from two perspectives: empirical analysis and methodology development. Chapters 1 and 2 provide a thorough introduction, motivation, and necessary background information for the thesis, framing the research and setting the stage for subsequent publications. Chapters 3 to 5 summarize the contributions of the six publications, each of which demonstrates the effectiveness of representation learning techniques in addressing various analytical challenges.
In Chapters 1 and 2, the research objects and questions are also motivated and described. In particular, an introduction to the primary application field, game studies, is provided, and the connection between data analysis and game culture is highlighted. Basic notions of representation learning and canonical techniques such as probabilistic principal component analysis, topic modeling, and embedding models are described. Analytical challenges and data types are also described to motivate the research of this thesis.
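As a concrete reminder of one canonical technique mentioned above, the following numpy-only sketch computes the closed-form maximum-likelihood solution of probabilistic PCA (Tipping & Bishop): the noise variance is the average discarded eigenvalue of the sample covariance, and the loading matrix spans the top-q principal subspace. Variable names are illustrative:

```python
import numpy as np

def ppca_ml(X, q):
    # Maximum-likelihood PPCA: X is (n_samples, d), q is the latent dimension.
    Xc = X - X.mean(axis=0)
    S = np.cov(Xc, rowvar=False)                 # sample covariance, (d, d)
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]              # sort eigenpairs descending
    evals, evecs = evals[order], evecs[:, order]
    d = X.shape[1]
    sigma2 = evals[q:].mean() if q < d else 0.0  # ML estimate of noise variance
    W = evecs[:, :q] * np.sqrt(np.maximum(evals[:q] - sigma2, 0.0))
    return W, sigma2

rng = np.random.default_rng(0)
Z = rng.standard_normal((2000, 2))               # latent factors
W_true = 2.0 * rng.standard_normal((5, 2))       # true loadings
X = Z @ W_true.T + 0.1 * rng.standard_normal((2000, 5))
W, sigma2 = ppca_ml(X, q=2)                      # sigma2 near 0.1**2 = 0.01
```

This is the generic textbook estimator, not any model developed in the thesis's publications.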
Chapter 3 presents two empirical analyses, conducted in Publications I and II, on player typologies and the temporal dynamics of player perceptions. The first empirical analysis takes advantage of a factor model to offer a flexible player typology analysis; the results and analytical framework are particularly useful for personalized gamification. The second empirical analysis uses topic modeling to analyze the temporal dynamics of player perceptions of the game No Man's Sky in relation to game changes. The results reflect a variety of player perceptions, including general gaming activities and game mechanics. Moreover, a set of underlying topics directly related to game updates and changes is extracted, and their temporal dynamics show that players respond differently to different updates and changes.
Chapter 4 presents two method developments related to factor models. The first method, DNBGFA, developed in Publication III, is a matrix factorization model for modeling the temporal dynamics of non-negative matrices from multiple sources. The second method, CFTM, developed in Publication IV, introduces a factor model into a topic model to handle sophisticated document-level covariates. The developed methods in Chapter 4 are also demonstrated for analyzing text data.
Chapter 5 summarizes Publications V and VI, which develop embedding models. Publication V introduces Bayesian non-parametrics to a graph embedding model to learn multiple representations for nodes. Publication VI utilizes a Gaussian copula model to deal with heterogeneous data in representation learning. The developed methods in Chapter 5 are also demonstrated for data analysis tasks in the context of online communities.
Lastly, Chapter 6 presents discussion and conclusions. The contributions of this thesis are highlighted, and limitations, ongoing challenges, and potential future research directions are discussed.
A Statistical Approach to the Alignment of fMRI Data
Multi-subject functional Magnetic Resonance Imaging (fMRI) studies are critical. The anatomical and functional structure varies across subjects, so image alignment is necessary. We define a probabilistic model to describe functional alignment. By imposing a prior distribution, such as the matrix Fisher-von Mises distribution, on the orthogonal transformation parameter, anatomical information is embedded in the estimation of the parameters, i.e., the combination of spatially distant voxels is penalized. Real applications show an improvement in the classification and interpretability of the results compared to various functional alignment methods.
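Without the anatomical prior described above, estimating an orthogonal functional alignment reduces to the classical orthogonal Procrustes problem, sketched below as a point of reference (the paper's matrix Fisher-von Mises penalization is not reproduced; variable names are illustrative):

```python
import numpy as np

def procrustes_align(source, target):
    # Orthogonal Procrustes: the R minimizing ||source @ R - target||_F over
    # orthogonal matrices is R = U @ Vt, where U S Vt = svd(source.T @ target).
    U, _, Vt = np.linalg.svd(source.T @ target)
    return U @ Vt

rng = np.random.default_rng(0)
R_true, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthogonal map
source = rng.standard_normal((100, 4))                 # 100 time points, 4 voxels
target = source @ R_true                               # perfectly aligned target
R_hat = procrustes_align(source, target)               # recovers R_true
```

Anatomically informed priors modify this least-squares solution so that the estimated transformation avoids mixing spatially distant voxels.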
A comparison of the CAR and DAGAR spatial random effects models with an application to diabetics rate estimation in Belgium
When hierarchically modelling an epidemiological phenomenon on a finite collection of sites in space, one must always take a latent spatial effect into account in order to capture the correlation structure that links the phenomenon to the territory. In this work, we compare two autoregressive spatial models that can be used for this purpose: the classical CAR model and the more recent DAGAR model. Unlike the former, the latter has a desirable property: its ρ parameter can be naturally interpreted as the average neighbor pair correlation and, in addition, this parameter can be directly estimated when the effect is modelled using a DAGAR rather than a CAR structure. As an application, we model the diabetics rate in Belgium in 2014 and show the adequacy of these models in predicting the response variable when no covariates are available.
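For concreteness, a minimal numpy construction of the proper CAR prior's precision matrix is sketched below (DAGAR's DAG-based precision is not shown, and, as the abstract notes, the interpretation of ρ differs between the two models); the adjacency matrix and parameter values are illustrative:

```python
import numpy as np

def car_precision(W, rho, tau2=1.0):
    # Proper CAR prior for spatial effects phi ~ N(0, Q^{-1}):
    # Q = (D - rho * W) / tau2, with W a symmetric 0/1 adjacency matrix and
    # D = diag(neighbor counts). Q is positive definite for |rho| < 1.
    D = np.diag(W.sum(axis=1))
    return (D - rho * W) / tau2

# A 5-site chain graph: site i is a neighbor of sites i-1 and i+1.
W = np.zeros((5, 5))
for i in range(4):
    W[i, i + 1] = W[i + 1, i] = 1.0
Q = car_precision(W, rho=0.9)  # symmetric, positive definite precision
```

In a hierarchical model, this precision matrix defines the latent spatial random effect whose posterior (together with ρ and τ²) is then estimated from the observed rates.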