Differential Privacy in Privacy-Preserving Big Data and Learning: Challenge and Opportunity
Differential privacy (DP) has become the de facto standard of privacy preservation due to its strong protection and sound mathematical foundation, and it is widely adopted in applications such as big data analysis, graph data processing, machine learning, deep learning, and federated learning. Although DP is an active and influential area, it is not the best remedy for all privacy problems in all scenarios. Moreover, there are misunderstandings, misuses, and significant challenges of DP in specific applications. In this paper, we point out a series of limitations and open challenges in the corresponding research areas. In addition, we offer potentially new insights and avenues for combining differential privacy with other effective dimension-reduction techniques and secure multiparty computation to clearly define various privacy models.
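The basic DP building block this abstract alludes to can be illustrated with a minimal Laplace-mechanism sketch for a counting query (the function names and parameters here are hypothetical, not from the paper):

```python
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sampling of a Laplace(0, scale) random variable.
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(values, predicate, epsilon):
    # Epsilon-DP count query: a counting query has sensitivity 1, so
    # Laplace noise with scale 1/epsilon suffices for epsilon-DP.
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon)
```

Smaller `epsilon` means stronger privacy and noisier answers; the scale of the noise is the query's sensitivity divided by `epsilon`.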
Extracting Novel Facts from Tables for Knowledge Graph Completion (Extended version)
We propose a new end-to-end method for extending a Knowledge Graph (KG) from tables. Existing techniques tend to interpret tables by focusing on information that is already in the KG and therefore extract many redundant facts. Our method aims to find more novel facts. We introduce a new technique for table interpretation based on a scalable graphical model that uses entity similarities. Our method further disambiguates cell values using KG embeddings as an additional ranking method. Other distinctive features are the lack of assumptions about the underlying KG and the ability to fine-tune the precision/recall trade-off of extracted facts. Our experiments show that our approach has higher recall during the interpretation process than the state of the art and is more resistant to the bias of extracting mostly redundant facts, since it produces more novel extractions.
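As a rough illustration of the embedding-based ranking step described above (a simplified stand-in, not the paper's actual model), candidate entities for a table cell can be ordered by cosine similarity between the cell's embedding and each candidate's KG embedding:

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_candidates(cell_embedding, candidates):
    # candidates: dict mapping entity id -> KG embedding vector.
    # Returns entity ids sorted from most to least similar to the cell.
    return sorted(candidates,
                  key=lambda e: cosine(cell_embedding, candidates[e]),
                  reverse=True)
```

A threshold on the similarity score would then control the precision/recall trade-off: a higher cutoff keeps fewer but more confident disambiguations.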
Noise Infusion as a Confidentiality Protection Measure for Graph-Based Statistics
We use the bipartite graph representation of longitudinally linked employer-employee data, and the associated projections onto the employer and employee nodes, respectively, to characterize the set of potential statistical summaries that the trusted custodian might produce. We consider noise infusion as the primary confidentiality protection method. We show that a relatively straightforward extension of the dynamic noise-infusion method used in the U.S. Census Bureau’s Quarterly Workforce Indicators can be adapted to provide the same confidentiality guarantees for the graph-based statistics: all inputs have been modified by a minimum percentage deviation (i.e., no actual respondent data are used) and, as the number of entities contributing to a particular statistic increases, the accuracy of that statistic approaches the unprotected value. Our method also ensures that the protected statistics will be identical in all releases based on the same inputs.
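A minimal sketch of the kind of multiplicative noise infusion described: each entity gets a permanent distortion factor bounded away from 1 (the minimum deviation) that is reproducible across releases, while independent distortions average out as more entities contribute to a statistic. All names and parameter values here are illustrative, not the production method:

```python
import hashlib
import random

def fuzz_factor(entity_id, min_dev=0.05, max_dev=0.15, release_key="demo-key"):
    # Entity-specific multiplicative distortion drawn from
    # [1 - max_dev, 1 - min_dev] U [1 + min_dev, 1 + max_dev].
    # Seeding a private RNG from a hash of the entity id makes the
    # factor identical in every release based on the same inputs.
    rng = random.Random(
        hashlib.sha256(f"{release_key}:{entity_id}".encode()).hexdigest())
    magnitude = rng.uniform(min_dev, max_dev)
    sign = 1 if rng.random() < 0.5 else -1
    return 1 + sign * magnitude

def protected_total(records):
    # records: iterable of (entity_id, value) pairs.  Every input is
    # distorted by at least min_dev, yet the sum of many independently
    # distorted contributions approaches the unprotected total.
    return sum(value * fuzz_factor(eid) for eid, value in records)
```

The two guarantees from the abstract show up directly: no published number uses undistorted respondent data, and accuracy improves with the number of contributing entities.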
Beyond Accuracy: Measuring Representation Capacity of Embeddings to Preserve Structural and Contextual Information
Effective representation of data is crucial in various machine learning tasks, as it captures the underlying structure and context of the data. Embeddings have emerged as a powerful technique for data representation, but evaluating their quality and their capacity to preserve structural and contextual information remains a challenge. In this paper, we address this need by proposing a method to measure the representation capacity of embeddings. The motivation behind this work stems from the importance of understanding the strengths and limitations of embeddings, enabling researchers and practitioners to make informed decisions when selecting appropriate embedding models for their specific applications. By combining extrinsic evaluation methods, such as classification and clustering, with t-SNE-based neighborhood analysis, such as neighborhood agreement and trustworthiness, we provide a comprehensive assessment of representation capacity. Additionally, the use of Bayesian optimization to tune the weights assigned to classification, clustering, neighborhood agreement, and trustworthiness ensures an objective and data-driven selection of the optimal combination of metrics. The proposed method not only contributes to advancing the field of embedding evaluation but also provides researchers and practitioners with a quantitative measure to assess the effectiveness of embeddings in capturing structural and contextual information. For the evaluation, we use real-world biological sequence (protein and nucleotide) datasets and perform a representation-capacity analysis of embedding methods from the literature, namely Spike2Vec, Spaced k-mers, PWM2Vec, and AutoEncoder.
Comment: Accepted at ISBRA 202
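The weighted combination of evaluation metrics described above reduces, at its core, to a weighted mean over per-metric scores; a minimal sketch (the metric names and weights are placeholders, and the weights would in practice come from Bayesian optimization, not be hand-set as here):

```python
def representation_capacity(scores, weights):
    # scores:  metric name -> score in [0, 1] (e.g. classification accuracy,
    #          clustering quality, neighborhood agreement, trustworthiness)
    # weights: metric name -> nonnegative weight, e.g. tuned by Bayesian
    #          optimization.  The capacity is the weighted mean of the scores.
    total = sum(weights[m] for m in scores)
    return sum(scores[m] * weights[m] for m in scores) / total
```

Normalizing by the total weight keeps the combined score on the same [0, 1] scale as the individual metrics, so embeddings remain directly comparable.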
Confidentiality Protection in the 2020 US Census of Population and Housing
In an era where external data and computational capabilities far exceed statistical agencies' own resources and capabilities, agencies face the renewed challenge of protecting the confidentiality of the underlying microdata when publishing statistics in very granular form, and of ensuring that these granular data are used for statistical purposes only. Conventional statistical disclosure limitation methods are too fragile to address this new challenge. This article discusses the deployment of a differential privacy framework for the 2020 US Census that was customized to protect confidentiality, particularly for the most detailed geographic and demographic categories, and to deliver controlled accuracy across the full geographic hierarchy.
Comment: Version 2 corrects a few transcription errors in Tables 2, 3 and 5. Version 3 adds final journal copy edits to the preprint.
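For intuition, integer counts like those in census tables are typically protected with a discrete noise mechanism; the sketch below shows a generic two-sided geometric (discrete Laplace) mechanism with nonnegativity post-processing, not the Census Bureau's actual production implementation:

```python
import math
import random

def two_sided_geometric(epsilon):
    # Discrete analogue of the Laplace mechanism for integer counts:
    # P(noise = k) is proportional to exp(-epsilon * |k|).  The difference
    # of two i.i.d. geometric variables has exactly this distribution.
    def geometric():
        # Failures before the first success, success prob. 1 - e^-epsilon;
        # using 1 - random() keeps the argument of log() in (0, 1].
        return int(math.log(1.0 - random.random()) / -epsilon)
    return geometric() - geometric()

def noisy_count(true_count, epsilon):
    # Clamp at zero as post-processing so published counts stay
    # nonnegative; post-processing does not weaken the DP guarantee.
    return max(0, true_count + two_sided_geometric(epsilon))
```

The "controlled accuracy" idea corresponds to choosing how the overall privacy-loss budget `epsilon` is split across levels of the geographic hierarchy: larger shares yield more accurate counts at those levels.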