
    Projective Averages for Summarizing Redistricting Ensembles

    A recurring challenge in the application of redistricting simulation algorithms lies in extracting useful summaries and comparisons from a large ensemble of districting plans. Researchers often compute summary statistics for each district in a plan, and then study their distribution across the plans in the ensemble. This approach discards rich geographic information that is inherent in districting plans. We introduce the projective average, an operation that projects a district-level summary statistic back to the underlying geography and then averages this statistic across plans in the ensemble. Compared to traditional district-level summaries, projective averages are a powerful tool for geographically granular, sub-district analysis of districting plans along a variety of dimensions. However, care must be taken to account for variation within redistricting ensembles, to avoid misleading conclusions. We propose and validate a multiple-testing procedure to control the probability of incorrectly identifying outlier plans or regions when using projective averages.
    Comment: 7 pages, 3 figures, plus appendices
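As a rough illustration of the operation this abstract describes, the sketch below assigns each geographic unit the statistic of the district that contains it in each plan, then averages over plans. The function name `projective_average` and the array layout (one row per plan) are assumptions made for this example; they are not taken from the paper or its software.

```python
import numpy as np

def projective_average(assignments, district_stats):
    """Average a district-level statistic back onto geographic units.

    assignments:    (n_plans, n_units) integer array; entry [p, u] is the
                    district index of unit u under plan p (hypothetical layout).
    district_stats: (n_plans, n_districts) float array; entry [p, d] is the
                    summary statistic of district d under plan p.
    Returns an (n_units,) array with one averaged value per unit.
    """
    assignments = np.asarray(assignments)
    district_stats = np.asarray(district_stats, dtype=float)
    # Projection step: each unit inherits the statistic of its district, per plan.
    projected = np.take_along_axis(district_stats, assignments, axis=1)
    # Averaging step: mean over the plans in the ensemble.
    return projected.mean(axis=0)

# Toy ensemble: 2 plans, 4 units, 2 districts (e.g. a district vote share).
assignments = [[0, 0, 1, 1],
               [0, 1, 1, 1]]
district_stats = [[0.60, 0.40],
                  [0.70, 0.45]]
result = projective_average(assignments, district_stats)
```

In this toy case the second unit switches districts between the two plans, so its averaged value falls between those of its neighbors. The sketch covers only the averaging itself, not the multiple-testing procedure the abstract mentions.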

    Sequential Monte Carlo for Sampling Balanced and Compact Redistricting Plans

    Random sampling of graph partitions under constraints has become a popular tool for evaluating legislative redistricting plans. Analysts detect partisan gerrymandering by comparing a proposed redistricting plan with an ensemble of sampled alternative plans. For successful application, sampling methods must scale to large maps with many districts, incorporate realistic legal constraints, and accurately and efficiently sample from a selected target distribution. Unfortunately, most existing methods struggle in at least one of these three areas. We present a new Sequential Monte Carlo (SMC) algorithm that draws representative redistricting plans from a realistic target distribution of choice. Because it samples directly, the SMC algorithm can efficiently explore the relevant space of redistricting plans better than the existing Markov chain Monte Carlo algorithms that yield dependent samples. Our algorithm can simultaneously incorporate several constraints commonly imposed in real-world redistricting problems, including equal population, compactness, and preservation of administrative boundaries. We validate the accuracy of the proposed algorithm by using a small map where all redistricting plans can be enumerated. We then apply the SMC algorithm to evaluate the partisan implications of several maps submitted by relevant parties in a recent high-profile redistricting case in the state of Pennsylvania. We find that the proposed algorithm is roughly 40 times more efficient in sampling from the target distribution than a state-of-the-art MCMC algorithm. Open-source software is available for implementing the proposed methodology.
    Comment: 44 pages, 11 figures; added additional validation example, improved measurements in Section 6 comparison, reworked some language for precision throughout

    Making Differential Privacy Work for Census Data Users

    The U.S. Census Bureau collects and publishes detailed demographic data about Americans which are heavily used by researchers and policymakers. The Bureau has recently adopted the framework of differential privacy in an effort to improve confidentiality of individual census responses. A key output of this privacy protection system is the Noisy Measurement File (NMF), which is produced by adding random noise to tabulated statistics. The NMF is critical to understanding any errors introduced in the data, and performing valid statistical inference on published census data. Unfortunately, the current release format of the NMF is difficult to access and work with. We describe the process we use to transform the NMF into a usable format, and provide recommendations to the Bureau for how to release future versions of the NMF. These changes are essential for ensuring transparency of privacy measures and reproducibility of scientific research built on census data.
    Comment: 9 pages, 2 figures
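To make the idea of "adding random noise to tabulated statistics" concrete, here is a minimal sketch. It uses continuous Gaussian noise with a known scale purely as a stand-in; the Bureau's actual noise distribution, scales, and file layout are not described in this abstract, and the names `noisy_measurements` and `interval_95` are hypothetical.

```python
import numpy as np

def noisy_measurements(counts, sigma, seed=0):
    """Return counts plus independent zero-mean Gaussian noise.

    Assumption: Gaussian noise with standard deviation `sigma` stands in
    for the real mechanism, which this sketch does not reproduce.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    return counts + rng.normal(0.0, sigma, size=counts.shape)

def interval_95(noisy, sigma):
    """Approximate 95% interval for the true count, given the known noise scale."""
    noisy = np.asarray(noisy, dtype=float)
    half_width = 1.96 * sigma
    return noisy - half_width, noisy + half_width

# Toy table of block-level counts (made-up numbers for illustration).
true_counts = np.array([12, 0, 3, 250])
noisy = noisy_measurements(true_counts, sigma=2.0, seed=1)
lower, upper = interval_95(noisy, sigma=2.0)
```

Because the noise scale is known, an analyst can attach an uncertainty interval to each noisy value, which is one reason access to the unprocessed noisy measurements matters for valid inference. Note that noisy values can be negative or non-integer.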

    Measuring and Modeling Neighborhoods

    The availability of granular geographic data presents new opportunities to understand how neighborhoods are formed and how they influence attitudes and behavior. To facilitate such studies, we develop an online survey instrument for respondents to draw their neighborhoods on a map. We then propose a statistical model to analyze how the characteristics of respondents and geography, and their interactions, predict subjective neighborhoods. We illustrate the proposed methodology using a survey of 2,572 voters in Miami, New York City, and Phoenix. Holding other factors constant, White respondents tend to include census blocks with more White residents in their neighborhoods. Similarly, Democratic and Republican respondents are more likely to include co-partisan census blocks. Our model also provides more accurate out-of-sample predictions than standard distance-based neighborhood measures. Lastly, we use these methodological tools to test how demographic information shapes neighborhoods, but find limited effects from this experimental manipulation. Open-source software is available for implementing the methodology.
    Comment: 31 pages, 13 figures, and appendices. This revision includes two updated figures

    Widespread Partisan Gerrymandering Mostly Cancels Nationally, but Reduces Electoral Competition

    Congressional district lines in many U.S. states are drawn by partisan actors, raising concerns about gerrymandering. To separate the partisan effects of redistricting from the effects of other factors including geography and redistricting rules, we compare possible party compositions of the U.S. House under the enacted plan to those under a set of alternative simulated plans that serve as a non-partisan baseline. We find that partisan gerrymandering is widespread in the 2020 redistricting cycle, but most of the electoral bias it creates cancels at the national level, giving Republicans two additional seats on average. Geography and redistricting rules separately contribute a moderate pro-Republican bias. Finally, we find that partisan gerrymandering reduces electoral competition and makes the partisan composition of the U.S. House less responsive to shifts in the national vote.
    Comment: 10 pages, 4 figures, plus references and appendix

    Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods

    The United States Census Bureau faces a difficult trade-off between the accuracy of Census statistics and the protection of individual information. We conduct the first independent evaluation of bias and noise induced by the Bureau's two main disclosure avoidance systems: the TopDown algorithm employed for the 2020 Census and the swapping algorithm implemented for the 1990, 2000, and 2010 Censuses. Our evaluation leverages the recent release of the Noisy Measurement File (NMF) as well as the availability of two independent runs of the TopDown algorithm applied to the 2010 decennial Census. We find that the NMF contains too much noise to be directly useful alone, especially for Hispanic and multiracial populations. TopDown's post-processing dramatically reduces the NMF noise and produces similarly accurate data to swapping in terms of bias and noise. These patterns hold across census geographies with varying population sizes and racial diversity. While the estimated errors for both TopDown and swapping are generally no larger than other sources of Census error, they can be relatively substantial for geographies with small total populations.
    Comment: 21 pages, 6 figures

    Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System

    In "Differential Perspectives: Epistemic Disconnects Surrounding the US Census Bureau's Use of Differential Privacy," boyd and Sarathy argue that empirical evaluations of the Census Disclosure Avoidance System (DAS), including our published analysis, failed to recognize how the benchmark data against which the 2020 DAS was evaluated is never a ground truth of population counts. In this commentary, we explain why policy evaluation, which was the main goal of our analysis, is still meaningful without access to a perfect ground truth. We also point out that our evaluation leveraged features specific to the decennial Census and redistricting data, such as block-level population invariance under swapping and voter file racial identification, better approximating a comparison with the ground truth. Lastly, we show that accurate statistical predictions of individual race based on the Bayesian Improved Surname Geocoding, while not a violation of differential privacy, substantially increase the disclosure risk of private information the Census Bureau sought to protect. We conclude by arguing that policy makers must confront a key trade-off between data utility and privacy protection, and an epistemic disconnect alone is insufficient to explain disagreements between policy choices.
    Comment: Version accepted to Harvard Data Science Review