
    Capturing contentiousness: Constructing the contentious terms in context corpus

    Recent initiatives by cultural heritage institutions in addressing outdated and offensive language used in their collections demonstrate the need to better understand when terms are problematic or contentious. This paper presents an annotated dataset of 2,715 unique samples of terms in context, drawn from a historical newspaper archive, collating 21,800 annotations of contentiousness from expert and crowd workers. We describe the contents of the corpus by analysing inter-rater agreement and differences between experts and crowd workers. In addition, we demonstrate the potential of the corpus for automated detection of contentiousness: a simple classifier applied to the embedding representation of a target word outperforms the baseline in predicting contentiousness. We find that both the term itself and its context play a role in whether a term is considered contentious.
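    The "simple classifier on a target-word embedding" setup can be sketched as follows. This is a minimal illustration, not the paper's implementation: the embeddings here are synthetic stand-ins, whereas in the described setup they would be contextual representations of the term in its newspaper context.

```python
# Sketch: predict contentiousness from a target word's embedding.
# The vectors below are random stand-ins for contextual embeddings;
# dimensions, cluster means and labels are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
dim = 32

# Two loose clusters standing in for contentious (1) vs.
# non-contentious (0) terms-in-context.
X_pos = rng.normal(loc=1.0, scale=1.0, size=(100, dim))
X_neg = rng.normal(loc=-1.0, scale=1.0, size=(100, dim))
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 100 + [0] * 100)

# A simple linear classifier over the embedding, as in the abstract.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print(round(clf.score(X, y), 2))
```

    On real data the labels would come from the expert and crowd annotations, and performance would be compared against a majority-class or keyword baseline.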

    Towards Olfactory Information Extraction from Text: A Case Study on Detecting Smell Experiences in Novels

    Environmental factors determine the smells we perceive, but societal factors shape the importance, sentiment and biases we give to them. Descriptions of smells in text, or as we call them 'smell experiences', offer a window into these factors, but they must first be identified. To the best of our knowledge, no tool exists to extract references to smell experiences from text. In this paper, we present two variations on a semi-supervised approach to identify smell experiences in English literature. The combined set of patterns from both implementations offers significantly better performance than a keyword-based baseline. Accepted to the 4th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2020), Barcelona, Spain, December 2020.
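    A pattern-based pass of the kind the abstract contrasts with a keyword baseline might look like the sketch below. The seed patterns are purely illustrative assumptions, not the paper's actual pattern set.

```python
# Sketch: flag candidate "smell experience" sentences with lexical
# patterns. The three seed patterns are hypothetical examples; a
# semi-supervised approach would bootstrap many more from seed matches.
import re

PATTERNS = [
    r"\bsmell(?:ed|s|ing)? of\b",
    r"\bscent of\b",
    r"\bodou?r of\b",
]

def find_smell_candidates(sentence):
    """Return the patterns that match a sentence (empty list if none)."""
    return [p for p in PATTERNS
            if re.search(p, sentence, re.IGNORECASE)]

print(find_smell_candidates("The room smelled of lavender and dust."))
print(find_smell_candidates("He walked home in silence."))
```

    A keyword baseline would match any sentence containing e.g. "smell", which over-generates; anchoring on constructions like "smelled of" is one way patterns can improve precision.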

    Improving language model predictions via prompts enriched with knowledge graphs

    Despite advances in deep learning and knowledge graphs (KGs), using language models for natural language understanding and question answering remains a challenging task. Pre-trained language models (PLMs) have been shown to leverage contextual information to complete cloze prompts, next-sentence completion and question answering tasks in various domains. Unlike structured data querying in e.g. KGs, mapping an input question to data that may or may not be stored by the language model is not a simple task. Recent studies have highlighted the improvements that can be made to the quality of information retrieved from PLMs by adding auxiliary data to otherwise naive prompts. In this paper, we explore the effects on language model performance of enriching prompts with additional contextual information leveraged from the Wikidata KG. Specifically, we compare the performance of naive vs. KG-engineered cloze prompts for entity genre classification in the movie domain. Selecting a broad range of commonly available Wikidata properties, we show that enriching cloze-style prompts with Wikidata information can result in significantly higher recall for the investigated BERT and RoBERTa large PLMs. However, it is also apparent that the optimum level of data enrichment differs between models.
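    The naive-vs.-enriched cloze contrast can be sketched as prompt construction. The prompt templates and the Wikidata facts below are illustrative assumptions; in the described setup the facts would be retrieved from Wikidata properties of the entity, and the prompt would be fed to a masked-language model such as BERT.

```python
# Sketch: building naive vs. KG-enriched cloze prompts for movie-genre
# classification. "[MASK]" is the token a masked PLM would fill in.
def naive_prompt(title):
    return f"{title} is a [MASK] film."

def kg_enriched_prompt(title, facts):
    # Prepend auxiliary KG context (property, value pairs) so the PLM
    # has more signal when predicting the masked genre.
    context = " ".join(f"The {p} of {title} is {v}." for p, v in facts)
    return f"{context} {title} is a [MASK] film."

# Hypothetical facts standing in for retrieved Wikidata statements.
facts = [("director", "Ridley Scott"), ("publication year", "1979")]
print(naive_prompt("Alien"))
print(kg_enriched_prompt("Alien", facts))
```

    Varying how many facts are prepended is one way to probe the abstract's observation that the optimum level of enrichment differs between models.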

    New worlds in political science

    ‘Political science’ is a ‘vanguard’ field concerned with advancing generic knowledge of political processes, while a wider ‘political scholarship’ utilising eclectic approaches has more modest or varied ambitions. Political science nonetheless necessarily depends upon and is epistemologically comparable with political scholarship. I deploy Boyer's distinctions between discovery, integration, application and renewing the profession to show that these connections are closely woven. Two sets of key challenges need to be tackled if contemporary political science is to develop positively. The first is to ditch the current unworkable and restrictive comparative politics approach in favour of a genuinely global analysis framework: instead of obsessively looking at data on nation states, we need to seek data completeness on the whole (multi-level) world we have. A second cluster of challenges involves looking far more deeply into political phenomena; reaping the benefits of ‘digital-era’ developments; moving from sample methods to online census methods in organisational analysis; analysing massive transactional databases and real-time political processes (again, instead of depending on surveys); and devising new forms of ‘instrumentation’, informed by post-rational choice theoretical perspectives.