Large-scale text processing pipeline with Apache Spark
In this paper, we evaluate Apache Spark for a data-intensive machine learning
problem. Our use case focuses on policy diffusion detection across the state
legislatures in the United States over time. Previous work on policy diffusion
has been unable to make an all-pairs comparison between bills due to
computational intensity. As a substitute, scholars have studied single topic
areas.
We provide an implementation of this analysis workflow as a distributed text
processing pipeline using Spark DataFrames and the Scala application programming
interface. We discuss the challenges and strategies of unstructured data
processing, data formats for storage and efficient access, and graph processing
at scale.
Political Cleavages within Industry: Firm level lobbying for Trade Liberalization
[Work in Progress] Existing political economy models rely on inter-industry differences such as factor endowment or factor specificity to explain the politics of trade policy-making. However, this paper finds that a large proportion of variation in U.S. applied tariff rates in fact arises within industry. I offer a theory of trade liberalization that explains how product differentiation in economic markets leads to firm-level lobbying in political markets. I argue that while high product differentiation eliminates the collective action problem exporting firms confront, political objections to product-specific liberalization will decline due to less substitutability and the possibility of serving foreign markets based on the norms of reciprocity. To test this argument, I construct a new dataset on lobbying by all publicly traded manufacturing firms after parsing all 838,588 lobbying reports filed under the Lobbying Disclosure Act of 1995. I find that productive exporting firms are more likely to lobby to reduce tariffs, especially when their products are sufficiently differentiated. I also find that highly differentiated products have lower tariff rates. The results challenge the common focus on industry-level lobbying for protection.
Essays in Political Methodology
This collection of essays studies Bayesian statistical models for data analysis in political science.
The first chapter proposes a nonparametric Bayesian approach that uncovers heterogeneous treatment effects even when moderators are unobserved. The method employs a Dirichlet process mixture model to estimate the distribution of treatment effects, and it is applicable to any setting in which regression models are used for causal inference. An application to a study of the resource curse also shows that the method finds the subset of observations for which the monotonicity assumption is likely to hold.
The second chapter proposes a new topic model for analyzing plagiarism. Text is modeled as a mixture of words copied from the plagiarized source and words drawn from a new distribution, and the model provides estimated probabilities of plagiarism for each word. An application to the corpus of preferential trade arrangements shows the utility of the model by describing how the likelihood of plagiarism varies across topics.
The third chapter develops an n-gram based topic model for analyzing citations and text. It introduces a latent variable indicating whether each token is in the positive or negative mode and assumes dependence between the mode of citation and its previous terms. An application to the data set of the WTO dispute settlement mechanism shows that the conclusion of an existing study using the same data may hold only in some issues.
The fourth chapter (coauthored with Gabriel Lopez-Moctezuma and Devin Incerti) reexamines the resource curse using the Bayesian dynamic linear model (DLM). The DLM models both temporal dependence in the data and the evolution of parameters. The results show that there is a negative relationship between oil income and the level of democracy only after the 1970s, which coincides with the Arab oil embargo and the nationalization of the oil industry in major oil-exporting countries.
Coethnic bias and wartime informing
© 2015 Southern Political Science Association. Information about insurgent groups is a central resource in civil wars: Counterinsurgents seek it, insurgents safeguard it, and civilians often trade it. Yet despite its essential role in civil war dynamics, the act of informing is still poorly understood, due mostly to the classified nature of informant "tips." As an alternative research strategy, we use an original 2,700-respondent survey experiment in 100 villages to examine attitudes toward the Guardians of Peace program, a widespread campaign by the International Security Assistance Force in Afghanistan to recruit local informants. We find that coethnic bias, the systematic tendency to favor cooperation with coethnics, shapes attitudes about informing and beliefs about retaliation, especially among Tajik respondents. This bias persists even after adjusting for additional explanations and potential confounding variables, suggesting that identity considerations such as coethnicity may influence attitudes toward high-risk behavior in wartime settings.
A Non-parametric Bayesian Model for Detecting Differential Item Functioning: An Application to Political Representation in the US
A common approach when studying the quality of representation involves
comparing the latent preferences of voters and legislators, commonly obtained
by fitting an item-response theory (IRT) model to a common set of stimuli.
Despite being exposed to the same stimuli, voters and legislators may not share
a common understanding of how these stimuli map onto their latent preferences,
leading to differential item-functioning (DIF) and incomparability of
estimates. We explore the presence of DIF and incomparability of latent
preferences obtained through IRT models by re-analyzing an influential survey
data set, where survey respondents expressed their preferences on roll call
votes that U.S. legislators had previously voted on. To do so, we propose
defining a Dirichlet Process prior over item-response functions in standard IRT
models. In contrast to typical multi-step approaches to detecting DIF, our
strategy allows researchers to fit a single model, automatically identifying
incomparable sub-groups with different mappings from latent traits onto
observed responses. We find that although there is a group of voters whose
estimated positions can be safely compared to those of legislators, a sizeable
share of surveyed voters understand stimuli in fundamentally different ways.
Ignoring these issues can lead to incorrect conclusions about the quality of
representation.
Coethnic Bias and Wartime Informing
Information about insurgent groups is a central resource in civil wars: counterinsurgents seek it, insurgents safeguard it, and civilians often trade it. Yet despite its essential role in civil war dynamics, the act of informing is still poorly understood, due mostly to the classified nature of informant "tips" and the absence of reliable data on civilian attitudes. We use a 2,700-respondent survey experiment in 100 villages to examine attitudes toward the Guardians of Peace program, a widespread campaign by the International Security Assistance Force (ISAF) in Afghanistan to recruit local informants. We find that coethnic bias, the systematic tendency to favor cooperation with coethnics, shapes attitudes about informing and beliefs about retaliation, especially among Tajik respondents. This bias persists even after adjusting for additional explanations and potential confounding variables, suggesting that identity considerations such as coethnicity influence attitudes toward a high-risk behavior in wartime settings.
Credibility and Distributional Effects of International Banking Regulations: Evidence from US Bank Stock Returns
Financial regulatory networks are a pervasive, new type of global governance heralded by some as a flexible answer to globalization dilemmas and dismissed by others as ineffective due to weak enforcement mechanisms. Whether regulatory network agreements provide global public goods or private goods for certain states' firms is a second debated issue. This paper adjudicates among competing perspectives by examining whether Basel III, an international agreement negotiated by the bank regulatory network about bank capital minimums in 2009 and 2010, was viewed as credible and affecting regulated US firms. I use stock returns to measure investors' perceptions, and an event study methodology to test whether regulated banks' observed stock returns significantly differ from expected stock returns on days when new information about Basel III becomes available. If the agreement is viewed as credible and affecting firm value, banks' stock returns will deviate from expectations. The direction of any deviation indicates whether regulations benefit or hurt banks. While the direction of effects is not uniform across events, I find that the initial stock return reaction and the net effect across all five events are negative, indicating that US banks were not helped by new international regulations. Further, US banks experienced stock returns that differed from expectations, providing evidence that international regulatory network agreements are viewed as credible and tangibly affect firms independent of domestic implementation.
* Author is a Ph.D. candidate in the Department of Politics, Princeton University (email: [email protected], website: meredithwilf.webs.com). Many thanks to Christina Davis and Kosuke Imai for numerous iterations of comments, to Helen Milner for comments on earlier drafts and for the suggestion to look at the winners and losers of Basel III, and to Marc Ratkovic for assistance learning Lasso and for comments.
For helpful comments and suggestions, thanks t