Using large language models for preprocessing and information extraction from unstructured text: A proof-of-concept application in the social sciences

Abstract

Recent months have witnessed an increase in suggested applications for large language models (LLMs) in the social sciences. This proof-of-concept paper explores the use of LLMs to improve text quality and to extract predefined information from unstructured text. The study showcases promising results with an example focused on historical newspapers and highlights the effectiveness of LLMs in correcting errors in the parsed text and in accurately extracting specified information. By leveraging the capabilities of LLMs for these straightforward, instruction-based tasks, this research note demonstrates their potential to improve the efficiency and accuracy of text analysis workflows. The ongoing development of LLMs and the emergence of robust open-source options underscore their increasing accessibility for both the quantitative and the qualitative social sciences, as well as for other disciplines working with text data.
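
To illustrate the kind of instruction-based workflow the abstract describes, the sketch below shows how an LLM might be prompted to clean noisily parsed newspaper text and return predefined fields in one call. This is not the authors' code: the OpenAI Python client, the model name, the prompt wording, and the extracted fields (date, place, persons, summary) are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (illustrative only): prompt an LLM to correct OCR/parsing
# errors in historical newspaper text and extract predefined fields as JSON.
# Assumes an OpenAI-style chat API; model name and field names are placeholders.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def clean_and_extract(raw_text: str) -> dict:
    """Correct parsing errors, then return the specified fields as a dict."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        temperature=0,        # deterministic output for reproducible extraction
        messages=[
            {"role": "system",
             "content": ("You receive noisy text parsed from a historical "
                         "newspaper. First, correct obvious OCR and parsing "
                         "errors without changing the content. Then return "
                         "only a JSON object with the keys: date, place, "
                         "persons, summary.")},
            {"role": "user", "content": raw_text},
        ],
    )
    return json.loads(response.choices[0].message.content)
```

In practice, the same two-step instruction (correct, then extract) could be issued to any sufficiently capable model, including the open-source options the abstract mentions; only the client code would change.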

This paper was published in MAnnheim DOCument Server (Univ. Mannheim).

Licence: https://creativecommons.org/licenses/by/4.0/