17 research outputs found

    Finding a Needle in a Haystack: The Theoretical and Empirical Foundations of Assessing Disclosure Risk for Contextualized Microdata

    Contextualized microdata are one way to safely release geographic data without identifying the location of survey respondents. This study informs the design of such datafiles with its needle-in-haystack approach to disclosure and its discussion of associated methodological concerns. Drawing a sample of counties, tracts, and blockgroups, I illustrate how the reidentification of individuals is shaped by aggregating geographies into look-alike sets. I detail the complexity of reidentification patterns by assessing the likelihood that young adult white and black males would be pinpointed within reconstituted haystacks given: (1) the size of the total population of aggregated contexts; (2) the amount of error in population counts; and (3) differential search costs stemming from spatially dispersed contexts.
    http://deepblue.lib.umich.edu/bitstream/2027.42/58628/1/ICPSR-WP-No4-Witkowski.pd

    Disclosure Risk of Geography Attributes: The Role of Spatial Scale, Identified Geography, and Measurement Detail in Public-Use Files

    Spatial information is essential for modern forms of analysis, and as a result, researchers have increasingly called for geographically specific microdata. Contextual data is one way to safely release this information without identifying the location of survey respondents. Analyzing an array of geography attributes, I conduct reidentification experiments for 14,796 simulated datasets to measure the likelihood of pinpointing geographic locations under alternative database designs, relating to: (1) the spatial scale of standard geographies, as determined by the areal size of these administrative units; (2) the scope of study, as determined by the identification of division, state, and MSA-status; (3) the number of geography attributes provided in a dataset; and (4) the coarseness of these contextual measures, as determined by global recoding schema. Using the “data file” as my unit of analysis, the number of geographic units resembling a study location as the outcome of interest, and associated experimental traits, I detail the complexity of reidentification patterns that emerge when constructing public-use files that provide contextual data where two distinct scenarios of intruder search behavior are assumed.
    http://deepblue.lib.umich.edu/bitstream/2027.42/58626/1/ICPSR-WP-No2-Witkowski.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/58626/4/ICPSR-WP-No2-Witkowski-Jan-2009 (2).pd

    Disclosure Risk Components of Contextualized Microdata: Identifying Unique Geographic Units and the Implications for Pinpointing Survey Respondents

    To safely respond to increased demand for microdata that contain contextual information, producers ought to consider how these data may be used to identify the location of survey respondents. This study informs the design of these datafiles with its hierarchical matching algorithm and discussion of associated methodological concerns. Compiling nearly 15,000 test datasets composed of person-records, I assess three determinants of “locational” risk, that of identifying the location of survey respondents whose contextual characteristics: (1) are rarely found among the total population of geographic units; (2) are rarely found within a survey; and (3) pose no disclosure risk given the protection offered by the area’s dense population. Using the “datafile” as my unit of analysis, the proportion of survey respondents whose locations are easily reidentified as the outcome of interest, and indicators of different components of this risk, I detail the complexity of reidentification patterns that emerge when constructing public-use files that provide contextual data.
    http://deepblue.lib.umich.edu/bitstream/2027.42/58627/1/ICPSR-WP-No3-Witkowski.pd

    A Reconfiguration of Census Tabulations: Maintaining Historical Consistency of Aggregate Industrial Categories at the County-Level

    Consistent measures are imperative for conducting valid historical analyses. Collected in the long-form survey of the decennial census, employment data has traditionally been tabulated by aggregate industrial category for all counties. Starting in 2000, the industrial coding scheme drastically changed. In response, we develop a methodology to formulate “geographically-sensitive” conversion factors that reconfigure NAICS-based tabulations into long-established SIC categories.
    This research has been supported by Grant Number P01 HD045753 from the National Institute of Child Health and Human Development.
    http://deepblue.lib.umich.edu/bitstream/2027.42/57739/1/ICPSR-WP-No1-Witkowksi-Gutmann.pd

    Providing Spatial Data for Secondary Analysis: Issues and Current Practices Relating to Confidentiality

    Spatially explicit data pose a series of opportunities and challenges for all the actors involved in providing data for long-term preservation and secondary analysis: the data producer, the data archive, and the data user. We report on opportunities and challenges for each of the three players, and then turn to a summary of current thinking about how best to prepare, archive, disseminate, and make use of social science data that have spatially explicit identification. The core issue that runs through the paper is the risk of the disclosure of the identity of respondents. If we know where they live, where they work, or where they own property, it is possible to find out who they are. Those involved in collecting, archiving, and using data need to be aware of the risks of disclosure and become familiar with best practices to avoid disclosures that will be harmful to respondents.
    Peer Reviewed
    http://deepblue.lib.umich.edu/bitstream/2027.42/60426/1/spatial data.confidentiality.fulltext.pd

    Enhancing Data Sharing Via Safe Designs: Generating Knowledge to Inform Scientific Practice

    The social value of data collections is dramatically enhanced by the broad dissemination of research files and the resulting increase in scientific productivity. Currently, most studies are designed with a focus on collecting information that is analytically useful and accurate, with little forethought as to how it will be shared. Both literature and practice also presume that disclosure analysis will take place after data collection. But to produce public-use data of the highest analytical utility for the largest user group, disclosure risk must be considered at the beginning of the research process. Drawing upon economic and statistical decision-theoretic frameworks and survey methodology research, this study seeks to enhance the scientific productivity of shared research data by describing how disclosure risk can be addressed in the earliest stages of research with the formulation of "safe designs". Implications for various research costs are also discussed.
    National Institute of Child Health and Human Development (NICHD), Grant 1 R01 HD067184
    http://deepblue.lib.umich.edu/bitstream/2027.42/106398/1/Witkowski_Working Paper_Enhancing Data Sharing Via Safe Designs_2014Mar.pd

    Salivary Cortisol and Posttraumatic Stress Disorder in a Low-Income Community Sample of Women

    http://deepblue.lib.umich.edu/bitstream/2027.42/51381/1/Young EA, Salivary Cortisol and Posttraumatic Stress Disorder, 2004.pd

    A survey of the vegetation of Fairy Island.

    http://deepblue.lib.umich.edu/bitstream/2027.42/52694/1/1127.pdf
    Description of 1127.pdf: Access restricted to on-site users at the U-M Biological Station