Towards Coding Social Science Datasets with Language Models

Authors: Argyle Lisa; Busby Ethan; Fulda Nancy; Gubler Joshua; Rytting Christopher Michael; Sorensen Taylor; Wingate David
Publication date: 03/06/2023

Researchers often rely on humans to code (label, annotate, etc.) large sets of texts. This kind of human coding forms an important part of social science research, yet the coding process is both resource intensive and highly variable from application to application. In some cases, efforts to automate this process have achieved human-level accuracy, but to do so they frequently rely on thousands of hand-labeled training examples, which makes them inapplicable to small-scale research studies and costly for large ones. Recent advances in a specific kind of artificial intelligence tool, language models (LMs), provide a solution to this problem. Work in computer science makes it clear that LMs are able to classify text without the cost (in financial terms and human effort) of alternative methods. To demonstrate the possibilities of LMs in this area of political science, we use GPT-3, one of the most advanced LMs, as a synthetic coder and compare it to human coders. We find that GPT-3 can match the performance of typical human coders and offers benefits over other machine learning methods of coding text. We find this across a variety of domains using very different coding procedures. This provides exciting evidence that language models can serve as a critical advance in the coding of open-ended texts in a variety of applications.

arXiv.org e-Print Archive

Large-Scale Analysis of the Accuracy of the Journal Classification Systems of Web of Science and Scopus

Authors: Waltman Ludo; Wang Qi
Publication date: 24/01/2016

Journal classification systems play an important role in bibliometric analyses. The two most important bibliographic databases, Web of Science and Scopus, each provide a journal classification system. However, no study has systematically investigated the accuracy of these classification systems. To examine and compare the accuracy of journal classification systems, we define two criteria on the basis of direct citation relations between journals and categories. We use Criterion I to select journals that have weak connections with their assigned categories, and we use Criterion II to identify journals that are not assigned to categories with which they have strong connections. If a journal satisfies either of the two criteria, we conclude that its assignment to categories may be questionable. Accordingly, we identify all journals with questionable classifications in Web of Science and Scopus. Furthermore, we perform a more in-depth analysis for the field of Library and Information Science to assess whether our proposed criteria are appropriate and whether they yield meaningful results. It turns out that, according to our citation-based criteria, Web of Science performs significantly better than Scopus in terms of the accuracy of its journal classification system.

arXiv.org e-Print Archive

The Profiling Potential of Computer Vision and the Challenge of Computational Empiricism

Authors: Alley Thomas; Andrejevic Mark; Apter Emily; Citron Danielle Keats; Clemens Justin; Gandy Oscar H; Humphries Paul; Jäger Jens; Krizhevsky Alex; Mann Monique; Marien Mary Warner; Selinger Evan; Todorov Alexander; Vanian Jonathan; Weatherby Leif; Willis J
Publication venue: Association for Computing Machinery (ACM)
Publication date: 22/04/2019

Computer vision and other biometrics data science applications have commenced a new project of profiling people. Rather than using 'transaction generated information', these systems measure the 'real world' and produce an assessment of the 'world state', in this case an assessment of some individual trait. Instead of using proxies or scores to evaluate people, they increasingly deploy a logic of revealing the truth about reality and the people within it. While these profiling knowledge claims are sometimes tentative, they increasingly suggest that only through computation can these excesses of reality be captured and understood. This article explores the bases of those claims in the systems of measurement, representation, and classification deployed in computer vision. It asks if there is something new in this type of knowledge claim, sketches an account of a new form of computational empiricism being operationalised, and questions what kind of human subject is being constructed by these technological systems and practices. Finally, the article explores legal mechanisms for contesting the emergence of computational empiricism as the dominant knowledge platform for understanding the world and the people within it.

arXiv.org e-Print Archive