Big Data Privacy Context: Literature Effects On Secure Informational Assets
This article's objective is to identify research opportunities in the current
big data privacy domain by evaluating the effects of the literature on secure
informational assets. Until now, no study has analyzed this relation, and its
results can foster science, technology and business. To achieve these
objectives, a big data privacy Systematic Literature Review (SLR) is performed
on the main peer-reviewed scientific journals in the Scopus database.
Bibliometric and text-mining analyses complement the SLR. This study supports
big data privacy researchers with: the most and least researched themes,
research novelty, the most cited works and authors, the evolution of themes
over time, and many others. In addition, TOPSIS and VIKOR rankings were
developed to evaluate literature effects against informational asset
indicators, with Secure Internet Servers (SIS) chosen as the decision
criterion. Results show that the big data privacy literature is strongly
focused on computational aspects. However, individuals, societies,
organizations and governments face a technological change that has only just
started to be investigated, with growing concerns about law and regulation.
The TOPSIS and VIKOR rankings differed in several positions, and the only
country consistent between the literature and SIS adoption is the United
States. Countries in the lowest ranking positions represent future research
opportunities.
Comment: 21 pages, 9 figures
Injecting Uncertainty in Graphs for Identity Obfuscation
The data collected nowadays by social-networking applications create
fascinating opportunities for building novel services, as well as for
expanding our understanding of social structures and their dynamics.
Unfortunately, publishing social-network graphs is considered an ill-advised
practice due to privacy concerns. To alleviate this problem, several
anonymization methods have been proposed, aiming at reducing the risk of a
privacy breach on the published data while still allowing it to be analyzed
and relevant conclusions to be drawn. In this paper we introduce a new
anonymization approach based on injecting uncertainty into social graphs and
publishing the resulting uncertain graphs. While existing approaches obfuscate
graph data by adding or removing edges entirely, we propose a finer-grained
perturbation that adds or removes edges partially: this way we can achieve the
same desired level of obfuscation with smaller changes in the data, thus
maintaining higher utility. Our experiments on real-world networks confirm
that, at the same level of identity obfuscation, our method provides higher
usefulness than existing randomized methods that publish standard graphs.
Comment: VLDB201
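The idea of "partially" adding or removing edges can be illustrated with a toy sketch (not the authors' exact method): every published edge carries an existence probability, so true edges are softened toward 1 rather than deleted, and a sample of non-edges receives a small non-zero probability rather than being added outright. The function name and the noise parameter `sigma` are assumptions for illustration:

```python
import random

def obfuscate(nodes, edges, sigma=0.3, seed=0):
    """Publish an uncertain graph: map each potential edge to an
    existence probability instead of a hard add/remove decision."""
    rng = random.Random(seed)
    # Partial "removal": true edges keep a probability in [1 - sigma, 1].
    uncertain = {tuple(sorted(e)): 1.0 - rng.uniform(0.0, sigma)
                 for e in edges}
    # Partial "addition": some non-edges get a probability in [0, sigma].
    non_edges = [(u, v) for i, u in enumerate(nodes)
                 for v in nodes[i + 1:] if (u, v) not in uncertain]
    for e in rng.sample(non_edges, min(len(edges), len(non_edges))):
        uncertain[e] = rng.uniform(0.0, sigma)
    return uncertain

g = obfuscate(nodes=[0, 1, 2, 3], edges=[(0, 1), (1, 2), (2, 3)])
```

Because each perturbation moves a probability mass of at most `sigma` instead of a full unit, the same total obfuscation budget is spread over smaller changes, which is the intuition behind the higher utility the abstract claims.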
Microdata Disclosure by Resampling: Empirical Findings for Business Survey Data
A problem statistical offices and research institutes face when releasing micro-data is the preservation of confidentiality. Traditional methods to avoid disclosure often destroy the structure of the data, i.e., the information loss is potentially high. In this paper I discuss an alternative technique for creating scientific-use files, which reproduces the characteristics of the original data quite well. It is based on Fienberg's (1997 and 1994) [5], [6] idea to estimate and resample from the empirical multivariate cumulative distribution function of the data in order to get synthetic data. The procedure creates datasets (the resamples) which have the same characteristics as the original survey data. In this paper I present some applications of this method with (a) simulated data and (b) innovation survey data, the Mannheim Innovation Panel (MIP), and compare resampling with a common method of disclosure control, i.e. disturbance with multiplicative error, concerning confidentiality on the one hand and the suitability of the disturbed data for different kinds of analyses on the other. The results show that univariate distributions can be reproduced better by unweighted resampling. Parameter estimates can be reproduced quite well if (a) the resampling procedure uses the correlation structure of the original data as a scale and (b) the data are multiplicatively perturbed and a correction term is used. On average, anonymized data with multiplicatively perturbed values protect better against re-identification than the various resampling methods used.
Keywords: resampling, multiplicative data perturbation, Monte Carlo studies, business survey data
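The two disclosure-control techniques compared in the abstract can be contrasted in a few lines. This is a simplified sketch, not the paper's procedure: the variable, sample size, and noise scale are invented, and the resampling shown is plain unweighted resampling from the empirical distribution:

```python
import numpy as np

rng = np.random.default_rng(42)
income = rng.lognormal(mean=10.0, sigma=1.0, size=500)  # toy "survey" variable

# (a) Resampling: draw a synthetic file from the empirical distribution,
#     so every released value is an observed one, but row identity is lost.
synthetic = rng.choice(income, size=income.size, replace=True)

# (b) Multiplicative perturbation: multiply by noise with expectation 1;
#     a correction term can later compensate the bias this adds to estimates.
noise = rng.normal(loc=1.0, scale=0.1, size=income.size)
perturbed = income * noise
```

The trade-off the abstract reports follows from the construction: the resampled file reproduces univariate distributions exactly (its values are drawn from them), while the perturbed file contains no original value, which helps explain its stronger protection against re-identification.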
Distribution-Agnostic Database De-Anonymization Under Synchronization Errors
There has recently been an increased scientific interest in the
de-anonymization of users in anonymized databases containing user-level
microdata via multifarious matching strategies utilizing publicly available
correlated data. Existing literature has either emphasized practical aspects
where underlying data distribution is not required, with limited or no
theoretical guarantees, or theoretical aspects with the assumption of complete
availability of underlying distributions. In this work, we take a step towards
reconciling these two lines of work by providing theoretical guarantees for the
de-anonymization of random correlated databases without prior knowledge of data
distribution. Motivated by time-indexed microdata, we consider database
de-anonymization under both synchronization errors (column repetitions) and
obfuscation (noise). By modifying the previously used replica detection
algorithm to accommodate the unknown underlying distribution, proposing a new
seeded deletion detection algorithm, and employing statistical and
information-theoretic tools, we derive sufficient conditions on the database
growth rate for successful matching. Our findings demonstrate that a seed size
double-logarithmic in the row size ensures successful deletion detection. More
importantly, we show that the derived sufficient conditions are the same as in
the distribution-aware setting, negating any asymptotic loss of performance
due to the unknown underlying distributions.
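The synchronization errors in this setting are column repetitions, and replica detection means locating them. A toy sketch of the idea (not the paper's algorithm, which handles noise and unknown distributions rigorously): flag a column as a repeat of its left neighbour when the fraction of rows on which they agree exceeds a threshold, since independent columns over a large alphabet agree rarely while repeats agree often:

```python
import numpy as np

def detect_replicas(db, threshold=0.8):
    """Flag column j as a replica of column j-1 when their entrywise
    agreement rate exceeds the threshold (hypothetical helper)."""
    db = np.asarray(db)
    flags = [False]  # the first column has no left neighbour to repeat
    for j in range(1, db.shape[1]):
        agreement = float(np.mean(db[:, j] == db[:, j - 1]))
        flags.append(agreement >= threshold)
    return flags

# Column 1 repeats column 0; column 2 is unrelated.
db = [[1, 1, 7],
      [3, 3, 2],
      [5, 5, 9]]
```

In the noisy, distribution-agnostic regime, the threshold can no longer be computed from a known distribution, which is the difficulty the modified replica detection and the seeded deletion detection algorithms address.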
Regulating Data as Property: A New Construct for Moving Forward
The global community urgently needs precise, clear rules that define ownership of data and express the attendant rights to license, transfer, use, modify, and destroy digital information assets. In response, this article proposes a new approach for regulating data as an entirely new class of property. Recently, European and Asian public officials and industries have called for data ownership principles to be developed, above and beyond current privacy and data protection laws. In addition, official policy guidances and legal proposals have been published that offer to accelerate realization of a property rights structure for digital information. But how can ownership of digital information be achieved? How can those rights be transferred and enforced? Those calls for data ownership emphasize the impact of ownership on the automotive industry and the vast quantities of operational data which smart automobiles and self-driving vehicles will produce. We looked at how, if at all, the issue was being considered in consumer-facing statements addressing the data being collected by their vehicles. To formulate our proposal, we also considered continued advances in scientific research, quantum mechanics, and quantum computing which confirm that information in any digital or electronic medium is, and always has been, physical, tangible matter. Yet, to date, data regulation has sought to adapt legal constructs for “intangible” intellectual property or to express a series of permissions and constraints tied to specific classifications of data (such as personally identifiable information). 
We examined legal reforms that were recently approved by the United Nations Commission on International Trade Law to enable transactions involving electronic transferable records, as well as prior reforms adopted in the United States Uniform Commercial Code and Federal law to enable similar transactions involving digital records that were, historically, physical assets (such as promissory notes or chattel paper). Finally, we surveyed prior academic scholarship in the U.S. and Europe to determine whether the physical attributes of digital data had been previously considered in the vigorous debates on how to regulate personal information, or the extent, if at all, to which the solutions developed for transferable records had been considered for larger classes of digital assets. Based on the preceding, we propose that regulation of digital information assets, and clear concepts of ownership, can be built on existing legal constructs that have enabled electronic commercial practices. We propose a property rules construct that clearly defines that a right to own digital information arises upon creation (whether by keystroke or machine), and suggest when and how that right attaches to specific data through the exercise of technological controls. This construct will enable faster, better adaptations of new rules for the ever-evolving portfolio of data assets being created around the world. This approach will also create more predictable, scalable, and extensible mechanisms for regulating data and is consistent with, and may improve the exercise and enforcement of, rights regarding personal information. We conclude by highlighting existing technologies and their potential to support this construct, and begin an inventory of the steps necessary to proceed further with this process.