Towards Coding Social Science Datasets with Language Models
Researchers often rely on humans to code (label, annotate, etc.) large sets
of texts. This kind of human coding forms an important part of social science
research, yet the coding process is both resource intensive and highly variable
from application to application. In some cases, efforts to automate this
process have achieved human-level accuracies, but to achieve this, these
attempts frequently rely on thousands of hand-labeled training examples, which
makes them inapplicable to small-scale research studies and costly for large
ones. Recent advances in language models (LMs), a specific kind of artificial
intelligence tool, offer a solution to this problem. Work in computer
science makes it clear that LMs are able to classify text without the cost (in
financial terms and human effort) of alternative methods. To demonstrate the
possibilities of LMs in this area of political science, we use GPT-3, one of
the most advanced LMs, as a synthetic coder and compare it to human coders. We
find that GPT-3 can match the performance of typical human coders and offers
benefits over other machine learning methods of coding text. We find this
across a variety of domains using very different coding procedures. This
provides exciting evidence that language models can serve as a critical advance
in the coding of open-ended texts in a variety of applications.
Large-Scale Analysis of the Accuracy of the Journal Classification Systems of Web of Science and Scopus
Journal classification systems play an important role in bibliometric
analyses. The two most important bibliographic databases, Web of Science and
Scopus, each provide a journal classification system. However, no study has
systematically investigated the accuracy of these classification systems. To
examine and compare the accuracy of journal classification systems, we define
two criteria on the basis of direct citation relations between journals and
categories. We use Criterion I to select journals that have weak connections
with their assigned categories, and we use Criterion II to identify journals
that are not assigned to categories with which they have strong connections. If
a journal satisfies either of the two criteria, we conclude that its assignment
to categories may be questionable. Accordingly, we identify all journals with
questionable classifications in Web of Science and Scopus. Furthermore, we
perform a more in-depth analysis for the field of Library and Information
Science to assess whether our proposed criteria are appropriate and whether
they yield meaningful results. It turns out that, according to our
citation-based criteria, Web of Science performs significantly better than
Scopus in terms of the accuracy of its journal classification system.
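The two criteria can be illustrated with a minimal sketch. The abstract does not give the exact definitions, so everything specific here is an assumption: connection strength is taken to be the share of a journal's citation links that fall in each category, and the thresholds `weak` and `strong`, the function names, and the example figures are all hypothetical.

```python
from typing import Dict, List, Set, Union


def category_shares(citations: Dict[str, int]) -> Dict[str, float]:
    """Share of a journal's citation links falling in each category (assumed measure)."""
    total = sum(citations.values())
    if total == 0:
        return {category: 0.0 for category in citations}
    return {category: count / total for category, count in citations.items()}


def check_journal(
    citations: Dict[str, int],
    assigned: Set[str],
    weak: float = 0.1,    # hypothetical threshold for Criterion I
    strong: float = 0.3,  # hypothetical threshold for Criterion II
) -> Dict[str, Union[List[str], bool]]:
    shares = category_shares(citations)
    # Criterion I: assigned categories with which the journal is only weakly connected.
    weak_assigned = sorted(c for c in assigned if shares.get(c, 0.0) < weak)
    # Criterion II: strongly connected categories to which the journal is not assigned.
    strong_missing = sorted(
        c for c, share in shares.items() if c not in assigned and share > strong
    )
    return {
        "criterion_1": weak_assigned,
        "criterion_2": strong_missing,
        # Satisfying either criterion marks the classification as questionable.
        "questionable": bool(weak_assigned or strong_missing),
    }


# Made-up citation counts per category for one journal assigned only to "LIS".
example = check_journal({"LIS": 5, "CS": 60, "Management": 35}, {"LIS"})
```

With these made-up numbers, the journal is flagged under both criteria: its assigned category accounts for few of its citation links, while two unassigned categories account for most of them.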
The Profiling Potential of Computer Vision and the Challenge of Computational Empiricism
Computer vision and other biometrics data science applications have commenced
a new project of profiling people. Rather than using 'transaction generated
information', these systems measure the 'real world' and produce an assessment
of the 'world state' - in this case an assessment of some individual trait.
Instead of using proxies or scores to evaluate people, they increasingly deploy
a logic of revealing the truth about reality and the people within it. While
these profiling knowledge claims are sometimes tentative, they increasingly
suggest that only through computation can these excesses of reality be captured
and understood. This article explores the bases of those claims in the systems
of measurement, representation, and classification deployed in computer vision.
It asks if there is something new in this type of knowledge claim, sketches an
account of a new form of computational empiricism being operationalised, and
questions what kind of human subject is being constructed by these
technological systems and practices. Finally, the article explores legal
mechanisms for contesting the emergence of computational empiricism as the
dominant knowledge platform for understanding the world and the people within
it.