Projective Averages for Summarizing Redistricting Ensembles
A recurring challenge in the application of redistricting simulation
algorithms lies in extracting useful summaries and comparisons from a large
ensemble of districting plans. Researchers often compute summary statistics for
each district in a plan, and then study their distribution across the plans in
the ensemble. This approach discards rich geographic information that is
inherent in districting plans. We introduce the projective average, an
operation that projects a district-level summary statistic back to the
underlying geography and then averages this statistic across plans in the
ensemble. Compared to traditional district-level summaries, projective averages
are a powerful tool for geographically granular, sub-district analysis of
districting plans along a variety of dimensions. However, care must be taken to
account for variation within redistricting ensembles, to avoid misleading
conclusions. We propose and validate a multiple-testing procedure to control
the probability of incorrectly identifying outlier plans or regions when using
projective averages.
Comment: 7 pages, 3 figures, plus appendices
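The projective-average operation described in the abstract lends itself to a compact implementation. The following is a minimal sketch, assuming plans are stored as unit-to-district assignment vectors and district statistics as per-plan arrays; all names here are illustrative, not the paper's.

    import numpy as np

    def projective_average(plans, stats):
        # plans: (n_plans, n_units) int array; plans[p, u] is the district
        #        that plan p assigns to geographic unit u.
        # stats: (n_plans, n_districts) array; stats[p, d] is the summary
        #        statistic for district d of plan p.
        n_plans, n_units = plans.shape
        projected = np.empty((n_plans, n_units))
        for p in range(n_plans):
            # Project: each unit inherits the statistic of its district.
            projected[p] = stats[p, plans[p]]
        # Average: pool the projected values across the ensemble.
        return projected.mean(axis=0)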
Sequential Monte Carlo for Sampling Balanced and Compact Redistricting Plans
Random sampling of graph partitions under constraints has become a popular
tool for evaluating legislative redistricting plans. Analysts detect partisan
gerrymandering by comparing a proposed redistricting plan with an ensemble of
sampled alternative plans. For successful application, sampling methods must
scale to large maps with many districts, incorporate realistic legal
constraints, and accurately and efficiently sample from a selected target
distribution. Unfortunately, most existing methods struggle in at least one of
these three areas. We present a new Sequential Monte Carlo (SMC) algorithm that
draws representative redistricting plans from a realistic target distribution
of choice. Because it draws independent samples directly, the SMC algorithm can
explore the relevant space of redistricting plans more efficiently than existing
Markov chain Monte Carlo algorithms, which yield dependent samples. Our algorithm
can simultaneously incorporate several constraints commonly imposed in
real-world redistricting problems, including equal population, compactness, and
preservation of administrative boundaries. We validate the accuracy of the
proposed algorithm by using a small map where all redistricting plans can be
enumerated. We then apply the SMC algorithm to evaluate the partisan
implications of several maps submitted by relevant parties in a recent
high-profile redistricting case in the state of Pennsylvania. We find that the
proposed algorithm is roughly 40 times more efficient in sampling from the
target distribution than a state-of-the-art MCMC algorithm. Open-source
software is available for implementing the proposed methodology.
Comment: 44 pages, 11 figures; added additional validation example, improved
measurements in Section 6 comparison, reworked some language for precision
throughout
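To make the sampling framework concrete, here is a minimal sketch of the generic SMC propose-reweight-resample loop that this class of algorithm builds on. The problem-specific pieces (how a partial plan is extended by one district, and how the extension is scored against the target distribution) are stubbed out as hypothetical callables; this is not the paper's actual algorithm.

    import numpy as np

    def smc_iteration(particles, log_weights, extend, log_weight_increment, rng):
        # particles: list of partial redistricting plans.
        # extend(plan, rng): draws one more district (hypothetical stub).
        # log_weight_increment(old, new): scores the extension against the
        #     target distribution (hypothetical stub).
        proposals = [extend(p, rng) for p in particles]
        log_weights = log_weights + np.array(
            [log_weight_increment(o, n) for o, n in zip(particles, proposals)]
        )
        # Normalize weights in a numerically stable way, then resample
        # to concentrate computation on promising partial plans.
        w = np.exp(log_weights - log_weights.max())
        w /= w.sum()
        idx = rng.choice(len(proposals), size=len(proposals), p=w)
        # After resampling, weights reset to uniform (zero on the log scale).
        return [proposals[i] for i in idx], np.zeros(len(proposals))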
Making Differential Privacy Work for Census Data Users
The U.S. Census Bureau collects and publishes detailed demographic data about
Americans, which are heavily used by researchers and policymakers. The Bureau
has recently adopted the framework of differential privacy in an effort to
improve the confidentiality of individual census responses. A key output of this
privacy protection system is the Noisy Measurement File (NMF), which is
produced by adding random noise to tabulated statistics. The NMF is critical to
understanding the errors introduced into the data and to performing valid
statistical inference on published census data. Unfortunately, the current
release format of the NMF is difficult to access and work with. We describe the
process we use to transform the NMF into a usable format, and provide
recommendations to the Bureau for how to release future versions of the NMF.
These changes are essential for ensuring transparency of privacy measures and
reproducibility of scientific research built on census data.
Comment: 9 pages, 2 figures
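As a toy illustration of the noisy-measurement idea (not the Bureau's actual mechanism, which uses discrete Gaussian noise allocated across a structured privacy budget), consider adding integer-rounded noise to hypothetical tabulated counts:

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical block-level counts; real inputs are confidential.
    true_counts = np.array([12, 0, 57, 3])

    # Stand-in noise: rounded Gaussian draws. The 2020 DAS instead uses
    # the discrete Gaussian mechanism, but the output has the same
    # structure: statistic + random noise.
    noise = np.rint(rng.normal(0.0, 2.0, true_counts.shape)).astype(int)
    noisy_measurements = true_counts + noise

    # Noisy measurements can be negative or mutually inconsistent, which
    # is why the NMF matters for quantifying the errors introduced (and
    # partially removed) by downstream post-processing.
    print(noisy_measurements)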
Measuring and Modeling Neighborhoods
The availability of granular geographic data presents new opportunities to
understand how neighborhoods are formed and how they influence attitudes and
behavior. To facilitate such studies, we develop an online survey instrument
for respondents to draw their neighborhoods on a map. We then propose a
statistical model to analyze how the characteristics of respondents and
geography, and their interactions, predict subjective neighborhoods. We
illustrate the proposed methodology using a survey of 2,572 voters in Miami,
New York City, and Phoenix. Holding other factors constant, White respondents
tend to include census blocks with more White residents in their neighborhoods.
Similarly, Democratic and Republican respondents are more likely to include
co-partisan census blocks. Our model also provides more accurate out-of-sample
predictions than standard distance-based neighborhood measures. Lastly, we use
these methodological tools to test how demographic information shapes
neighborhoods, but find limited effects from this experimental manipulation.
Open-source software is available for implementing the methodology.
Comment: 31 pages, 13 figures, and appendices. This revision includes two
updated figures
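A drastically simplified stand-in for the kind of model the abstract describes is a logistic regression over (respondent, census block) pairs with an interaction term between respondent and block characteristics. Everything below is simulated placeholder data, not the paper's model or data.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(1)
    n = 1000

    # Toy (respondent, block) pairs: a respondent trait, a block trait,
    # and whether the block was drawn inside the subjective neighborhood.
    resp_white = rng.integers(0, 2, n)       # respondent identifies as White
    block_share = rng.uniform(0, 1, n)       # block share of White residents
    logit = 2.0 * resp_white * block_share - 1.0
    included = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    # The interaction term captures the abstract's finding that White
    # respondents tend to include Whiter blocks.
    X = np.column_stack([resp_white, block_share, resp_white * block_share])
    model = LogisticRegression().fit(X, included)
    print(model.coef_)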
Widespread Partisan Gerrymandering Mostly Cancels Nationally, but Reduces Electoral Competition
Congressional district lines in many U.S. states are drawn by partisan
actors, raising concerns about gerrymandering. To separate the partisan effects
of redistricting from the effects of other factors including geography and
redistricting rules, we compare possible party compositions of the U.S. House
under the enacted plan to those under a set of alternative simulated plans that
serve as a non-partisan baseline. We find that partisan gerrymandering is
widespread in the 2020 redistricting cycle, but most of the electoral bias it
creates cancels at the national level, giving Republicans two additional seats
on average. Geography and redistricting rules separately contribute a moderate
pro-Republican bias. Finally, we find that partisan gerrymandering reduces
electoral competition and makes the partisan composition of the U.S. House less
responsive to shifts in the national vote.
Comment: 10 pages, 4 figures, plus references and appendix
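The comparison underlying these estimates reduces, per state and party, to the gap between the enacted plan and the simulated non-partisan baseline. A minimal sketch, with invented seat counts:

    import numpy as np

    def seat_bias(enacted_seats, simulated_seats):
        # Seats under the enacted plan minus the average over the
        # non-partisan simulated ensemble; positive values favor the
        # party whose seats are being counted.
        return enacted_seats - np.mean(simulated_seats)

    # Invented example: the enacted plan yields 9 seats while simulations
    # center near 7.4, so the plan carries roughly 1.6 extra seats.
    print(seat_bias(9, [7, 8, 7, 7, 8]))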
Evaluating Bias and Noise Induced by the U.S. Census Bureau's Privacy Protection Methods
The United States Census Bureau faces a difficult trade-off between the
accuracy of Census statistics and the protection of individual information. We
conduct the first independent evaluation of bias and noise induced by the
Bureau's two main disclosure avoidance systems: the TopDown algorithm employed
for the 2020 Census and the swapping algorithm implemented for the 1990, 2000,
and 2010 Censuses. Our evaluation leverages the recent release of the Noisy
Measurement File (NMF) as well as the availability of two independent runs of the
TopDown algorithm applied to the 2010 decennial Census. We find that the NMF
contains too much noise to be useful on its own, especially for Hispanic
and multiracial populations. TopDown's post-processing dramatically reduces the
NMF noise and produces data comparable in accuracy to swapping in terms of bias
and noise. These patterns hold across census geographies with varying population
sizes and racial diversity. While the estimated errors for both TopDown and
swapping are generally no larger than other sources of Census error, they can
be relatively substantial for geographies with small total populations.
Comment: 21 pages, 6 figures
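The value of two independent runs is that differencing them isolates noise (the systematic error cancels) while averaging them isolates bias. A simplified sketch of that decomposition, under the assumption of additive i.i.d. noise:

    import numpy as np

    def bias_and_noise(run1, run2, benchmark):
        # Two independent noisy runs of the same mechanism on the same
        # confidential data: the bias cancels in their difference, so
        # Var(run1 - run2) = 2 * Var(noise).
        run1, run2, benchmark = map(np.asarray, (run1, run2, benchmark))
        noise_var = np.var(run1 - run2) / 2
        # Averaging the runs suppresses noise, so the gap to the
        # benchmark estimates bias.
        bias = np.mean((run1 + run2) / 2 - benchmark)
        return bias, noise_var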
Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System
In "Differential Perspectives: Epistemic Disconnects Surrounding the US
Census Bureau's Use of Differential Privacy," boyd and Sarathy argue that
empirical evaluations of the Census Disclosure Avoidance System (DAS),
including our published analysis, failed to recognize that the benchmark data
against which the 2020 DAS was evaluated is never a ground truth of population
counts. In this commentary, we explain why policy evaluation, which was the
main goal of our analysis, is still meaningful without access to a perfect
ground truth. We also point out that our evaluation leveraged features specific
to the decennial Census and redistricting data, such as block-level population
invariance under swapping and voter file racial identification, which allow a
closer approximation to a comparison with the ground truth. Lastly, we show that accurate
statistical predictions of individual race based on Bayesian Improved
Surname Geocoding, while not a violation of differential privacy, substantially
increase the disclosure risk of private information the Census Bureau sought
to protect. We conclude by arguing that policy makers must confront a key
trade-off between data utility and privacy protection, and that an epistemic
disconnect alone is insufficient to explain disagreements over policy
choices.
Comment: Version accepted to Harvard Data Science Review
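For readers unfamiliar with Bayesian Improved Surname Geocoding, the prediction is a Bayes-rule combination of surname and geography information, assuming the two are conditionally independent given race. The probabilities below are invented purely for illustration.

    import numpy as np

    def bisg_posterior(p_race_given_surname, p_geo_given_race):
        # P(race | surname, geo) is proportional to
        # P(race | surname) * P(geo | race), under conditional
        # independence of surname and geography given race.
        unnormalized = p_race_given_surname * p_geo_given_race
        return unnormalized / unnormalized.sum()

    # Invented inputs over (White, Black, Hispanic, Asian, Other):
    p_r_s = np.array([0.05, 0.02, 0.90, 0.01, 0.02])  # from surname tables
    p_g_r = np.array([0.10, 0.05, 0.70, 0.05, 0.10])  # from geography tables
    print(bisg_posterior(p_r_s, p_g_r))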