
    Large-scale text processing pipeline with Apache Spark

    In this paper, we evaluate Apache Spark for a data-intensive machine learning problem. Our use case focuses on policy diffusion detection across the state legislatures in the United States over time. Previous work on policy diffusion has been unable to make an all-pairs comparison between bills due to computational intensity. As a substitute, scholars have studied single topic areas. We provide an implementation of this analysis workflow as a distributed text processing pipeline with Spark DataFrames and the Scala application programming interface. We discuss the challenges and strategies of unstructured data processing, data formats for storage and efficient access, and graph processing at scale.
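    The all-pairs bill comparison described above can be sketched with Spark's DataFrame and ML APIs. The paper itself uses the Scala API; the following is a minimal, hypothetical PySpark sketch for illustration only. The input path, column names ("bill_id", "text"), and the distance threshold are assumptions, not details taken from the paper.

```python
# Hypothetical PySpark sketch of an approximate all-pairs bill-similarity join.
# Input path, column names, and the 0.8 threshold are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F
from pyspark.ml import Pipeline
from pyspark.ml.feature import RegexTokenizer, HashingTF, MinHashLSH

spark = SparkSession.builder.appName("policy-diffusion").getOrCreate()

# One row per bill: (bill_id, text).  Assumed input file.
bills = spark.read.parquet("bills.parquet")

pipeline = Pipeline(stages=[
    RegexTokenizer(inputCol="text", outputCol="tokens", pattern="\\W+"),
    HashingTF(inputCol="tokens", outputCol="features", numFeatures=1 << 18),
    MinHashLSH(inputCol="features", outputCol="hashes", numHashTables=5),
])
model = pipeline.fit(bills)
hashed = model.transform(bills)

# Approximate all-pairs join: keep pairs with Jaccard distance below 0.8,
# dropping self-pairs and duplicate (A, B)/(B, A) orderings.
pairs = (
    model.stages[-1]
    .approxSimilarityJoin(hashed, hashed, threshold=0.8, distCol="jaccard_dist")
    .filter("datasetA.bill_id < datasetB.bill_id")
)

pairs.select(
    F.col("datasetA.bill_id").alias("bill_a"),
    F.col("datasetB.bill_id").alias("bill_b"),
    "jaccard_dist",
).show()
```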

    Political Cleavages within Industry: Firm-Level Lobbying for Trade Liberalization

    [Work in Progress] Existing political economy models rely on inter-industry differences such as factor endowment or factor specificity to explain the politics of trade policy-making. However, this paper finds that a large proportion of variation in U.S. applied tariff rates in fact arises within industry. I offer a theory of trade liberalization that explains how product differentiation in economic markets leads to firm-level lobbying in political markets. I argue that while high product differentiation eliminates the collective action problem that exporting firms confront, political objections to product-specific liberalization decline due to lower substitutability and the possibility of serving foreign markets based on norms of reciprocity. To test this argument, I construct a new dataset on lobbying by all publicly traded manufacturing firms after parsing all 838,588 lobbying reports filed under the Lobbying Disclosure Act of 1995. I find that productive exporting firms are more likely to lobby to reduce tariffs, especially when their products are sufficiently differentiated. I also find that highly differentiated products have lower tariff rates. The results challenge the common focus on industry-level lobbying for protection.

    Essays in Political Methodology

    This collection of essays studies Bayesian statistical models for data analysis in political science. The first chapter proposes a nonparametric Bayesian approach that uncovers heterogeneous treatment effects even when moderators are unobserved. The method employs a Dirichlet process mixture model to estimate the distribution of treatment effects, and it is applicable to any setting in which regression models are used for causal inference. An application to a study on the resource curse also shows that the method finds the subset of observations for which the monotonicity assumption is likely to hold. The second chapter proposes a new topic model for analyzing plagiarism. Text is modeled as a mixture of words copied from the plagiarized source and words drawn from a new distribution, and the model provides estimated probabilities of plagiarism for each word. An application to the corpus of preferential trade arrangements shows the utility of the model by describing how the likelihood of plagiarism varies across topics. The third chapter develops an n-gram based topic model for analyzing citations and text. It introduces a latent variable indicating whether each token is in the positive or negative mode and assumes dependence between the mode of citation and its previous terms. An application to the data set of the WTO dispute settlement mechanism shows that the conclusion of an existing study using the same data may hold only for some issues. The fourth chapter (coauthored with Gabriel Lopez-Moctezuma and Devin Incerti) reexamines the resource curse using the Bayesian dynamic linear model (DLM). The DLM models both temporal dependence in the data and the evolution of parameters. The results show that there is a negative relationship between oil income and the level of democracy only after the 1970s, which coincides with the Arab oil embargo and the nationalization of the oil industry in major oil-exporting countries.
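    To make the first chapter's modeling idea concrete, here is a minimal sketch, not the dissertation's actual model, of a truncated stick-breaking Dirichlet process mixture of normals over unit-level treatment effects. All hyperparameter values (concentration, truncation level, cluster variances) are illustrative assumptions; the point is only that the implied distribution of effects is heterogeneous because effects arise from latent clusters.

```python
# Hypothetical sketch: simulate a distribution of treatment effects from a
# truncated stick-breaking Dirichlet process mixture of normals.
# Hyperparameters (alpha, K, mu0, sigma0, sigma) are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
alpha, K = 1.0, 20              # DP concentration and truncation level
mu0, sigma0, sigma = 0.0, 2.0, 0.5

# Stick-breaking weights: pi_k = v_k * prod_{j<k} (1 - v_j).
v = rng.beta(1.0, alpha, size=K)
pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
mu = rng.normal(mu0, sigma0, size=K)    # cluster-specific mean effects

# Each unit's treatment effect comes from one latent cluster, so the
# implied effect distribution is heterogeneous (possibly multi-modal).
z = rng.choice(K, size=1000, p=pi / pi.sum())
tau = rng.normal(mu[z], sigma)

print("largest cluster weights:", np.round(np.sort(pi)[::-1][:5], 3))
print("mean effect:", tau.mean().round(3), " sd:", tau.std().round(3))
```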

    A Non-parametric Bayesian Model for Detecting Differential Item Functioning: An Application to Political Representation in the US

    A common approach when studying the quality of representation involves comparing the latent preferences of voters and legislators, commonly obtained by fitting an item-response theory (IRT) model to a common set of stimuli. Despite being exposed to the same stimuli, voters and legislators may not share a common understanding of how these stimuli map onto their latent preferences, leading to differential item functioning (DIF) and incomparability of estimates. We explore the presence of DIF and incomparability of latent preferences obtained through IRT models by re-analyzing an influential survey data set, where survey respondents expressed their preferences on roll call votes that U.S. legislators had previously voted on. To do so, we propose defining a Dirichlet process prior over item-response functions in standard IRT models. In contrast to typical multi-step approaches to detecting DIF, our strategy allows researchers to fit a single model, automatically identifying incomparable sub-groups with different mappings from latent traits onto observed responses. We find that although there is a group of voters whose estimated positions can be safely compared to those of legislators, a sizeable share of surveyed voters understand stimuli in fundamentally different ways. Ignoring these issues can lead to incorrect conclusions about the quality of representation.
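    As a point of reference for the setup described above, the following is a minimal, hypothetical sketch of a two-parameter logistic IRT item-response function in which two subgroups map the same latent trait onto responses differently, i.e., the kind of incomparability (DIF) the proposed Dirichlet process prior is meant to detect automatically. All parameter values are invented for illustration and are not from the paper.

```python
# Hypothetical illustration of DIF in a two-parameter logistic IRT model:
# two subgroups map the same latent trait onto responses differently.
# All parameter values below are invented for illustration.
import numpy as np

def irf(theta, discrimination, difficulty):
    """P(y = 1 | theta) under a 2PL item-response function."""
    return 1.0 / (1.0 + np.exp(-discrimination * (theta - difficulty)))

theta = np.linspace(-3, 3, 7)   # latent preferences (e.g., ideal points)

# Group A (say, legislators) and group B (say, survey respondents) face the
# same stimulus but with different item parameters -> DIF.
p_group_a = irf(theta, discrimination=1.5, difficulty=0.0)
p_group_b = irf(theta, discrimination=0.6, difficulty=0.8)

for t, pa, pb in zip(theta, p_group_a, p_group_b):
    print(f"theta={t:+.1f}  P_A={pa:.2f}  P_B={pb:.2f}")
```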

    Credibility and Distributional Effects of International Banking Regulations: Evidence from US Bank Stock Returns

    Financial regulatory networks are a pervasive, new type of global governance heralded by some as a flexible answer to globalization dilemmas and dismissed by others as ineffective due to weak enforcement mechanisms. Whether regulatory network agreements provide global public goods or private goods for certain states' firms is a second debated issue. This paper adjudicates among competing perspectives by examining whether Basel III, an international agreement on bank capital minimums negotiated by the bank regulatory network in 2009 and 2010, was viewed as credible and as affecting regulated US firms. I use stock returns to measure investors' perceptions, and an event study methodology to test whether regulated banks' observed stock returns significantly differ from expected stock returns on days when new information about Basel III becomes available. If the agreement is viewed as credible and as affecting firm value, banks' stock returns will deviate from expectations, and the direction of any deviation indicates whether regulations benefit or hurt banks. While the direction of effects is not uniform across events, I find that the initial stock return reaction and the net effect across all five events are negative, indicating that US banks were not helped by new international regulations. Further, US banks experienced stock returns that differed from expectations, providing evidence that international regulatory network agreements are viewed as credible and tangibly affect firms independent of domestic implementation.
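    The event-study logic described above can be sketched as follows. This is a generic market-model illustration on simulated placeholder return series, not the paper's data or code: expected returns are fit over a pre-event estimation window, and deviations (abnormal returns) on announcement days measure the market's reaction, with the cumulative abnormal return (CAR) summing them across events.

```python
# Generic market-model event-study sketch (not the paper's data or code).
# Simulated returns stand in for bank and market index series.
import numpy as np

rng = np.random.default_rng(1)
n_est, n_event = 250, 5                      # estimation window, event days
market = rng.normal(0.0, 0.01, n_est + n_event)
bank = 0.0002 + 1.2 * market + rng.normal(0.0, 0.008, n_est + n_event)

# 1. Fit the market model R_bank = alpha + beta * R_market on the
#    pre-event estimation window.
X = np.column_stack([np.ones(n_est), market[:n_est]])
alpha, beta = np.linalg.lstsq(X, bank[:n_est], rcond=None)[0]

# 2. Abnormal return = observed return minus model-expected return on each
#    event (announcement) day; the CAR sums them across all events.
expected = alpha + beta * market[n_est:]
abnormal = bank[n_est:] - expected
car = abnormal.sum()

print("alpha, beta:", round(alpha, 5), round(beta, 3))
print("abnormal returns:", np.round(abnormal, 4), " CAR:", round(car, 4))
```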