6 research outputs found

    Detecting Latent Ideology in Expert Text: Evidence From Academic Papers in Economics

    Full text link
    Previous work on extracting ideology from text has focused on domains where expression of political views is expected, but it is unclear whether current technology can work in domains where displays of ideology are considered inappropriate. We present a supervised ensemble n-gram model for ideology extraction with topic adjustments and apply it to one such domain: research papers written by academic economists. We show that economists' political leanings can be correctly predicted, that our predictions generalize to new domains, and that they correlate with public policy-relevant research findings. We also present evidence that unsupervised models can underperform in domains where ideological expression is discouraged.
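    The paper's model is a supervised ensemble with topic adjustments; as a rough illustration of the underlying idea (bag-of-n-grams features plus a simple linear scorer), here is a minimal stdlib-only sketch. The function names and the centroid scorer are illustrative assumptions, not the authors' method.

    ```python
    from collections import Counter

    def ngrams(text, n=2):
        # Lowercase word unigrams plus n-grams as a bag-of-features.
        words = text.lower().split()
        feats = Counter(words)
        feats.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
        return feats

    def train_centroids(labeled_docs):
        # Sum the feature counts per label: a crude linear model.
        centroids = {}
        for text, label in labeled_docs:
            centroids.setdefault(label, Counter()).update(ngrams(text))
        return centroids

    def predict(text, centroids):
        # Score a new document against each label's centroid.
        feats = ngrams(text)
        score = lambda c: sum(feats[k] * v for k, v in c.items())
        return max(centroids, key=lambda lbl: score(centroids[lbl]))
    ```

    A production version would use a regularized classifier and, as the abstract notes, adjust for topic so that subject-matter choice is not mistaken for ideology.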

    Facebook users have become much more private: A large-scale study

    Full text link
    We investigate whether Facebook users have become more private in recent years. Specifically, we examine if there have been any important trends in the information Facebook users reveal about themselves on their public profile pages since early 2010. To this end, we have crawled the public profile pages of 1.4 million New York City (NYC) Facebook users in March 2010 and again in June 2011. We have found that NYC users in our sample have become dramatically more private during this period. For example, in March 2010 only 17.2% of users in our sample hid their friend lists, whereas in June 2011, just 15 months later, 52.6% of the users hid their friend lists. We explore privacy trends for several personal attributes including friend list, networks, relationship, high school name and graduation year, gender, and hometown. We find that privacy trends have become more pronounced for certain demographics. Finally, we attempt to determine the primary causes behind the dramatic decrease in the amount of information Facebook users reveal about themselves to the general public.

    No ground truth? No problem: Improving administrative data linking using active learning and a little bit of guile.

    No full text
    While linking records across large administrative datasets ("big data") has the potential to revolutionize empirical social science research, many administrative data files do not have common identifiers and are thus not designed to be linked to others. To address this problem, researchers have developed probabilistic record linkage algorithms which use statistical patterns in identifying characteristics to perform linking tasks. Naturally, the accuracy of a candidate linking algorithm can be substantially improved when an algorithm has access to "ground-truth" examples: matches which can be validated using institutional knowledge or auxiliary data. Unfortunately, the cost of obtaining these examples is typically high, often requiring a researcher to manually review pairs of records in order to make an informed judgement about whether they are a match. When a pool of ground-truth information is unavailable, researchers can use "active learning" algorithms for linking, which ask the user to provide ground-truth information for select candidate pairs. In this paper, we investigate the value of providing ground-truth examples via active learning for linking performance. We confirm popular intuition that data linking can be dramatically improved with the availability of ground-truth examples. But critically, in many real-world applications, only a relatively small number of tactically selected ground-truth examples are needed to obtain most of the achievable gains. With a modest investment in ground truth, researchers can approximate the performance of a supervised learning algorithm that has access to a large database of ground-truth examples using a readily available off-the-shelf tool.
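    The active-learning loop the abstract describes can be sketched in a few lines: repeatedly ask the user (the "oracle") to label the candidate pair the model is least sure about, then refit the decision rule. The field-overlap similarity, the uncertainty-sampling rule, and the threshold update below are illustrative assumptions, not the paper's algorithm or any specific off-the-shelf tool.

    ```python
    def similarity(rec_a, rec_b):
        # Toy similarity: fraction of shared fields that agree exactly.
        fields = rec_a.keys() & rec_b.keys()
        return sum(rec_a[f] == rec_b[f] for f in fields) / len(fields)

    def active_link(pairs, oracle, budget=5, threshold=0.5):
        # Uncertainty sampling: label the pair whose score is closest to the
        # current threshold, then reset the threshold to the midpoint between
        # the labeled matches and non-matches seen so far.
        labeled, unlabeled = [], list(pairs)
        for _ in range(min(budget, len(unlabeled))):
            pick = min(unlabeled, key=lambda p: abs(similarity(*p) - threshold))
            unlabeled.remove(pick)
            labeled.append((pick, oracle(*pick)))
            match_scores = [similarity(*p) for p, y in labeled if y]
            nonmatch_scores = [similarity(*p) for p, y in labeled if not y]
            if match_scores and nonmatch_scores:
                threshold = (min(match_scores) + max(nonmatch_scores)) / 2
        # Link every pair that clears the learned threshold.
        return [(a, b) for a, b in pairs if similarity(a, b) >= threshold]
    ```

    The budget parameter mirrors the paper's point: a small number of tactically selected labels can move the threshold most of the way toward what full supervision would achieve.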