
    Providing Spatial Data for Secondary Analysis: Issues and Current Practices Relating to Confidentiality

    Spatially explicit data pose a series of opportunities and challenges for all the actors involved in providing data for long-term preservation and secondary analysis: the data producer, the data archive, and the data user. We report on the opportunities and challenges for each of these three players, and then summarize current thinking about how best to prepare, archive, disseminate, and use social science data that carry spatially explicit identification. The core issue running through the paper is the risk of disclosing respondents' identities: if we know where they live, where they work, or where they own property, it is possible to find out who they are. Those involved in collecting, archiving, and using data need to be aware of these disclosure risks and become familiar with best practices for avoiding disclosures that would harm respondents.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/60426/1/spatial data.confidentiality.fulltext.pd

    Spectral anonymization of data

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.
    This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
    Includes bibliographical references (p. 87-96).
    Data anonymization is the process of conditioning a dataset so that no sensitive information can be learned about any specific individual, while valid scientific analysis can still be performed on it. Simply removing identifying information is not sufficient, because the remaining data may be enough to infer the individual source of a record (a reidentification disclosure) or otherwise to learn sensitive information about a person (a predictive disclosure). The only known way to prevent these disclosures is to remove additional information from the dataset. Dozens of anonymization methods have been proposed over the past few decades; most work by perturbing or suppressing variable values. None has simultaneously provided perfect privacy protection and allowed perfectly accurate scientific analysis. This dissertation makes the new observation that anonymizing operations need not be performed in the original basis of the dataset. Operating in a different, judiciously chosen basis can improve privacy protection, analytic utility, and computational efficiency. I use the term 'spectral anonymization' to refer to anonymizing in a spectral basis, such as the basis provided by the data's eigenvectors. Additionally, I propose new measures of reidentification and prediction risk that are more generally applicable and more informative than existing measures. I also propose a measure of analytic utility that assesses how well the multivariate probability distribution is preserved. Finally, I propose nonparticipation in the study as a demanding reference standard for defining adequate privacy protection.
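    As a rough illustration of anonymizing in a spectral basis, the Python sketch below (which assumes NumPy is available) projects centered data onto the eigenvectors of its sample covariance matrix, obtained here via an SVD, applies a simple swapping operation there, and rotates back. The function name `spectral_swap` and the choice to model swapping as an independent random permutation of each spectral coordinate are illustrative assumptions, not the algorithm from the thesis.

```python
import numpy as np


def spectral_swap(data, seed=None):
    """Hypothetical sketch: swap values in an eigenvector basis.

    Assumption: "swapping" is modeled as an independent random permutation
    of each spectral coordinate across records.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float)
    mean = x.mean(axis=0)
    centered = x - mean
    # The right singular vectors of the centered data are the eigenvectors
    # of its sample covariance matrix.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Coordinates of each record in the spectral basis; these columns are
    # uncorrelated within the sample.
    scores = centered @ vt.T
    swapped = np.empty_like(scores)
    for j in range(scores.shape[1]):
        # Permute one spectral coordinate across records, independently of the others.
        swapped[:, j] = rng.permutation(scores[:, j])
    # Rotate back to the original basis and restore the column means.
    return swapped @ vt + mean
```

    Because the spectral coordinates are uncorrelated within the sample, permuting them independently tends to preserve the correlations between the original variables far better than permuting raw columns, which destroys them. Column means and total variance are preserved exactly, up to floating-point error.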
    I give three examples of spectral anonymization in practice. The first example improves basic cell swapping from a weak algorithm to one competitive with state-of-the-art methods merely by a change of basis. The second demonstrates how to avoid the curse of dimensionality in microaggregation. The third describes a powerful algorithm that reduces computational disclosure risk to the same level as that of nonparticipants while preserving at least 4th-order interactions in the multivariate distribution. No previously reported algorithm has achieved this combination of results.
    by Thomas Anton Lasko. Ph.D.
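    Microaggregation replaces each group of at least k similar records with the group mean; forming such groups in many dimensions at once is where the curse of dimensionality bites. The Python sketch below (assuming NumPy) shows one way the change of basis could sidestep this, under the assumption that each spectral coordinate is microaggregated separately, which turns every grouping step into a one-dimensional problem solved by sorting. The helper names and the fixed-size grouping rule are hypothetical and are not taken from the thesis; the sketch also does not by itself guarantee that whole records fall into groups of k.

```python
import numpy as np


def microaggregate_1d(values, k):
    """Fixed-size univariate microaggregation (hypothetical helper).

    Sorts the values, groups k consecutive values, and replaces each value
    by its group mean. The last group absorbs any remainder, so every group
    has at least k members (or all n values when n < k).
    """
    values = np.asarray(values, dtype=float)
    n = len(values)
    order = np.argsort(values, kind="stable")
    out = np.empty(n)
    n_groups = max(n // k, 1)
    for g in range(n_groups):
        start = g * k
        stop = n if g == n_groups - 1 else start + k
        idx = order[start:stop]
        out[idx] = values[idx].mean()
    return out


def spectral_microaggregate(data, k=3):
    """Hypothetical sketch: microaggregate each spectral coordinate separately."""
    x = np.asarray(data, dtype=float)
    mean = x.mean(axis=0)
    centered = x - mean
    # Eigenvectors of the sample covariance matrix via SVD of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt.T
    # Each spectral coordinate is a one-dimensional problem, so sorting suffices.
    aggregated = np.column_stack(
        [microaggregate_1d(scores[:, j], k) for j in range(scores.shape[1])]
    )
    # Rotate back to the original basis and restore the column means.
    return aggregated @ vt + mean
```

    For example, `microaggregate_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 13.0], k=3)` returns three values of 2.0 followed by four values of 11.5. Since group means preserve sums, the column means of the original data are preserved by `spectral_microaggregate`.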

    Social Epidemiology and Spatial Epidemiology: An Empirical Comparison of Perspectives

    University of Minnesota Ph.D. dissertation. May 2013. Major: Epidemiology. Advisor: Michael Oakes. 1 computer file (PDF); x, 100 pages.
    Social and spatial epidemiologists each bring a unique perspective to how they examine contextual or neighborhood-level determinants of health. Although both perspectives draw from epidemiology, social epidemiology is additionally grounded in sociology and causal counterfactual frameworks, while spatial epidemiology is heavily influenced by medical geography and predictive models. No study to date has compared these two distinct perspectives, along with their corresponding analytical approaches and model results. Yet such a comparison may advance contextual effects research in epidemiology by suggesting methodological enhancements, providing insight into how robust our conclusions are to the perspective taken, and indicating whether we can truly identify contextual effects from observational data. To facilitate this comparison, we used both perspectives to examine one research question: what is the estimated effect of increasing neighborhood education or income on overweight/obesity, type 2 diabetes, and current smoking, independent of individual-level differences? The social epidemiology approach employed propensity score matching, while the spatial approach used approximated spatial multilevel models. Data for this study came from the California Health Interview Survey (2005, 2007, 2009) and the American Community Survey (2006-2010). Results revealed minimal to no effect of neighborhood education and income on overweight/obesity, type 2 diabetes, or current smoking, although the estimated effects varied somewhat by approach. This comparison highlighted fundamentally different goals in social and spatial epidemiology: identifying causal factors on which to intervene versus predicting potential causal factors to describe reality.
    Attempts to improve causal inference in observational studies by integrating analytical techniques across subfields will likely be hampered by their differing objectives and model requirements. This incompatibility, the lack of strong evidence of effects, and the overall identification problem cast further doubt on our ability to identify causal contextual effects using observational data. However, this work may help in the design of experiments, which is where we should now focus.