Media framing refers to the strategic selection and presentation of specific
aspects of political issues to shape public opinion. Despite its relevance to
almost all societies around the world, research has been limited due to the
lack of available datasets and other resources. This study explores the
possibility of dataset creation through crowdsourcing, utilizing non-expert
annotators to develop training corpora. We first extend framing analysis beyond
English news to a multilingual context (12 typologically diverse languages)
through automatic translation. We also present a novel benchmark in Bengali and
Portuguese on the immigration and same-sex marriage domains. Additionally, we
show that a system trained on our crowdsourced dataset, combined with other
existing ones, yields a 5.32 percentage point improvement over the baseline,
demonstrating that crowdsourcing is a viable option. Finally, we study the
performance of large language models (LLMs) on this task, finding that
task-specific fine-tuning is a better approach than employing larger
non-specialized models.

Comment: accepted at NAACL 202