4,916 research outputs found

    Automatic Detection of Online Jihadist Hate Speech

    We have developed a system that automatically detects online jihadist hate speech with over 80% accuracy using techniques from Natural Language Processing and Machine Learning. The system is trained on a corpus of 45,000 subversive Twitter messages collected from October 2014 to December 2016. We present a qualitative and quantitative analysis of the jihadist rhetoric in the corpus, examine the network of Twitter users, outline the technical procedure used to train the system, and discuss examples of use. Comment: 31 pages.
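
The abstract does not say which model or features the system uses. Purely as an illustration of supervised text classification and of the accuracy figure it reports, here is a minimal sketch of one common baseline, a multinomial Naive Bayes classifier with add-one smoothing. The whitespace tokenizer, the function names, and the toy sentences and labels are hypothetical placeholders, not the authors' pipeline or corpus.

```python
import math
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Hypothetical, deliberately naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def train_nb(docs: list[str], labels: list[str]):
    """Fit a multinomial Naive Bayes model: per-class document and word counts."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(tokenize(doc))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_nb(model, doc: str) -> str:
    """Return the label with the highest log prior + smoothed log likelihood."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for c in class_counts:
        score = math.log(class_counts[c] / total_docs)
        denom = sum(word_counts[c].values()) + len(vocab)  # add-one smoothing
        for w in tokenize(doc):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = c, score
    return best_label

def accuracy(model, docs: list[str], labels: list[str]) -> float:
    """Share of documents whose predicted label matches the gold label."""
    correct = sum(predict_nb(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)

# Toy, hypothetical data: two labels, four short "messages".
docs = ["join the fight now", "fight for the cause",
        "lovely weather today", "great match today"]
labels = ["flagged", "flagged", "other", "other"]
model = train_nb(docs, labels)
print(predict_nb(model, "join the cause"))  # flagged
print(accuracy(model, docs, labels))        # 1.0
```

Accuracy here is measured on the training data only; a reported figure such as "over 80%" would normally come from held-out messages.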

    Exploiting 'Subjective' Annotations

    Many interesting phenomena in conversation can only be annotated as a subjective task, requiring interpretative judgements from annotators. This leads to data annotated with lower levels of agreement, not only because of annotation errors but also because annotators interpret conversations differently. This paper attempts to find out how subjective annotations with a low level of agreement can profitably be used for machine learning purposes. We analyse the (dis)agreements between annotators for two different cases in a multimodal annotated corpus and explicitly relate the results to the way machine-learning algorithms perform on the annotated data. Finally, we present two new concepts, namely 'subjective entity' classifiers and 'consensus objective' classifiers, and give recommendations for using subjective data in machine-learning applications.
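
The abstract refers to "lower levels of agreement" without naming a measure. Assuming two annotators and nominal labels, Cohen's kappa is one standard way to express agreement beyond chance, κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected from each annotator's label distribution. The following sketch is an illustration, not the paper's method; the labels and annotator data are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two annotators' marginal label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)

# Hypothetical annotations of six conversation segments.
annotator_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
annotator_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.333
```

Raw agreement in this toy case is 4/6, but because both annotators use each label half the time, half of that agreement is expected by chance, so kappa drops to 1/3.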

    Introspective data and corpus data: combination instead of confrontation in the study of German metaphorical idioms of life

    This paper examines how well different types of data can be combined in a study of German idioms of life using the tools of cognitive metaphor theory. The data sources for conceptual metaphors were mainly metaphors found in the relevant literature, which are largely introspective in nature. The primary data sources for metaphorical expressions were dictionaries, which likewise represent introspective data. These data were complemented by corpus data. The paper discusses the problems of introspective and corpus data raised by the study of German idioms of life. Two case studies demonstrate the advantages of combining data and methods.

    Between context and community: Regional variation in register effects in the English dative alternation

    This paper investigates the relationship between the stylistic context of utterance production and the language user's regional background as influencing factors in one syntactic alternation, i.e., variation between the double object and the prepositional dative construction. To that end, this chapter zooms in on (1) the competition between stylistic context and regional community regarding dative choice, (2) cross-regional inter-register variation, and (3) register-specific coherence (aka intra-register variation). Comparing data from nine varieties of English, using corpora that presumably share the same structure (and registers), reveals that community is more important than context, that the effect of register is regionally variable, and that registers are largely but not fully coherent. These findings not only stress the variable nature of probabilistic grammars but also point to the importance of regional effects when studying register variation. (All scripts are available at https://osf.io/3djkr/.)