
    Introduction

    The project "Digitization and Publication of the Collection of Scholarly Journals Published by the University of Łódź in the University of Łódź Digital Repository" (no. 885/P-DUN/2014) was co-financed by the Ministry of Science and Higher Education (MNiSW) as part of its science dissemination activities.

    A statistical analysis of lexis in conversational English

    Chapter Eleven

    Introduction


    Remembering Alina Kwiatkowska

    Partial Perception and Approximate Understanding

    This paper discusses the assumption that human perception of the external world is narrowed and that, as a result, the concepts meant to portray that world are basically approximate. Apart from perceptual vagueness, other types of vagueness are also discussed, involving the nature of things, the indeterminacy of linguistic expressions, and the psycho-sociological conditioning of discourse actions, both within one language and in translational contexts. The second part of the paper discusses conceptual and linguistic resemblance (similarity, equivalence) and discourse approximating strategies, and proposes a Resemblance Matrix that presents the ways used to narrow the approximation gap between the interacting parties in monolingual and translational discourses.

    Implicit Offensive Language Taxonomy and Its Application for Automatic Extraction and Ontology

    Purpose: In this study, we explore varying forms of implicit (mostly figurative) offensiveness (e.g., irony, metaphor, hyperbole) in order to propose a linguistic taxonomy of implicit offensiveness (and of how it permeates explicit forms), as well as an ontology of offensive terms readily applicable to fine-tuned, pre-trained language models (word and phrase embeddings). Offensive language has recently attracted great attention from computational scientists (e.g., Zampieri et al., 2019) and linguists alike (e.g., Haugh & Sinkeviciute, 2019). While NLP scholars focus on automatically extracting what is most often referred to as toxic language, linguists frequently explore the concept of hate speech. Implicit offensive language, however, as opposed to explicit offence, has received little scholarly attention, and the work so far has focused solely on single, unrelated concepts or terms. This paper proposes an overarching model in which the varying subtypes of implicitness used in the context of offensive language are conceptually linked (Bączkowska et al., 2022).

    LOD-Connected Offensive Language Ontology and Tagset Enrichment

    CC BY 4.0
    The main focus of the paper is the definitional revision and enrichment of the offensive language typology, drawing on publicly available offensive language datasets and testing them on available pretrained lexical embedding systems. We review over 60 available corpora and compare the tagging schemas applied in them, attempting to explain the semantic differences between particular concepts of the category OFFENSIVE in English. We present a finite set of classes that cover aspects of offensive language representation, along with linguistically sound explanations; the classes are based on the offensive language categorization schemata originally proposed by Zampieri et al. [1, 2] and are tested with Sketch Engine tools on a large web-based corpus. The schemata are juxtaposed and discussed with reference to the non-contextual word embeddings FastText, Word2Vec, and GloVe. The paper also provides a methodology for mapping existing corpora to a unified ontology. The proposed schema will enable further comparable research and the effective use of corpora of languages other than English. It will also be applied in building an enriched tagset to be trained and used on new data, with the application of recently developed LLOD techniques [3].
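Mapping existing corpora to a unified ontology can be pictured, under loose assumptions, as a lookup from (corpus, original tag) pairs to shared classes, plus a parent table for the hierarchy. The sketch below is hypothetical: the corpus name `corpus_b`, its tags, and the unified class names are illustrative placeholders, not the ontology proposed in the paper; only the OLID level-A labels `OFF`/`NOT` come from the Zampieri et al. scheme.

```python
from __future__ import annotations

# Hypothetical sketch: relabel corpus-specific tags with classes of a shared ontology.
# Class names and the second corpus are placeholders, not the paper's actual ontology.

# Parent of each unified class (None marks a top-level class).
PARENT: dict[str, str | None] = {
    "OFFENSIVE": None,
    "NOT_OFFENSIVE": None,
    "HATE": "OFFENSIVE",
    "ABUSIVE": "OFFENSIVE",
}

# (corpus, original tag) -> unified class.
TAG_MAP: dict[tuple[str, str], str] = {
    ("olid", "OFF"): "OFFENSIVE",           # OLID level-A labels
    ("olid", "NOT"): "NOT_OFFENSIVE",
    ("corpus_b", "hateful"): "HATE",        # hypothetical second corpus
    ("corpus_b", "abusive"): "ABUSIVE",
    ("corpus_b", "normal"): "NOT_OFFENSIVE",
}


def to_unified(corpus: str, tag: str) -> str:
    """Return the unified class for a corpus-specific tag; fail loudly if unmapped."""
    try:
        return TAG_MAP[(corpus, tag)]
    except KeyError:
        raise KeyError(f"no mapping for tag {tag!r} in corpus {corpus!r}") from None


def ancestors(cls: str) -> list[str]:
    """Return the path from a unified class up to its top-level class."""
    path = [cls]
    while (parent := PARENT[path[-1]]) is not None:
        path.append(parent)
    return path


def unify(corpus: str, records: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Relabel (text, tag) pairs from one corpus with unified classes."""
    return [(text, to_unified(corpus, tag)) for text, tag in records]
```

Walking each class up to its top-level ancestor lets labels from differently tagged corpora be compared at a coarser level (here, OLID `OFF` and the hypothetical `hateful` both reach `OFFENSIVE`), while an unmapped tag raises an error instead of being silently dropped.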

    Annotation Scheme and Its Evaluation: The Case of Offensive Language

    This paper presents and discusses aspects of the linguistic annotation of OFFENSIVE LANGUAGE, including the creation, annotation practice, curation, and evaluation of an OFFENSIVE LANGUAGE annotation taxonomy scheme first proposed in Lewandowska-Tomaszczyk et al. (2021). An extended offensive language ontology comprising 17 categories, structured in 4 hierarchical levels, has been shown to represent the encoding of the defined offensive language schema; it was trained on non-contextual word embeddings (i.e., Word2Vec and FastText) and eventually juxtaposed with the data acquired through pairwise training and testing analysis for existing categories in the HateBERT model (Lewandowska-Tomaszczyk et al., submitted). The study reports on the annotation practice of WG 4.1.1 "Incivility in media and social media" within COST Action CA 18209 "European network for Web-centred linguistic data science" (Nexus Linguarum), carried out with the INCEpTION tool (https://github.com/inception-project/inception), a semantic annotation platform that offers assistance in annotation. The results partly support the proposed ontology of explicit offense and positive implicitness types, which provides more variance among widely recognized types of figurative language (e.g., metaphorical, metonymic, ironic). The use of the annotation system and the representation of linguistic data were also evaluated through the annotators' comments, collected by means of a questionnaire and an open discussion. The annotation results and the questionnaire showed low or medium inter-annotator agreement for some of the categories; annotators also found it more challenging to distinguish between category items than between aspect items, with the category items offensive, insulting, and abusive being the most difficult in this respect. Based on these results, the need for measures to simplify the taxonomy in further annotation practice has been recognized.
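The abstract reports low or medium inter-annotator agreement without naming the measure. As a hedged illustration, the sketch below computes Cohen's kappa, one common chance-corrected agreement measure for two annotators, κ = (p_o − p_e) / (1 − p_e), and counts which label pairs the annotators confused. The annotations are made-up toy data, not the study's, and the study may well have used a different measure.

```python
from collections import Counter


def cohen_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement, from each annotator's own label distribution.
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a.keys() | count_b.keys()) / (n * n)
    if p_e == 1.0:
        return 1.0  # both annotators used one and the same label for every item
    return (p_o - p_e) / (1 - p_e)


def confusions(a: list[str], b: list[str]) -> Counter:
    """Count (label_a, label_b) pairs on items where the annotators disagree."""
    return Counter((x, y) for x, y in zip(a, b) if x != y)


# Hypothetical toy annotations over three of the categories named in the abstract.
ann1 = ["offensive", "insulting", "abusive", "offensive", "insulting", "abusive"]
ann2 = ["offensive", "abusive", "abusive", "insulting", "insulting", "offensive"]
print(round(cohen_kappa(ann1, ann2), 2))  # 0.25
print(confusions(ann1, ann2).most_common())
```

With three disagreements out of six items, observed agreement is 0.5 against a chance level of 1/3, which gives κ = 0.25. For more than two annotators, Fleiss' kappa or Krippendorff's alpha would be the usual alternatives.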

    Annotation Scheme and Evaluation: The Case of OFFENSIVE Language

    Purpose: Offensive discourse refers to the presence of explicit or implicit verbal attacks on individuals or groups and has been extensively analyzed in linguistics (e.g., Culpeper, 2005; Haugh & Sinkeviciute, 2019) and in NLP (e.g., OffensEval (Zampieri et al., 2020), HASOC (Mandl et al., 2019)) under the names of hate speech, abusive language, offensive language, etc. The paper presents and discusses aspects of the linguistic annotation of OFFENSIVE LANGUAGE, including the creation, annotation practice, curation, and evaluation of an OFFENSIVE LANGUAGE annotation taxonomy scheme first proposed in Lewandowska-Tomaszczyk et al. (2021) and Žitnik et al. (in press). An extended offensive language ontology of 17 categories, structured in 4 hierarchical levels, has been shown to represent the encoding of the defined offensive language schema; it was trained on non-contextual word embeddings (i.e., Word2Vec and FastText) and eventually juxtaposed with the data acquired through pairwise training and testing analysis for existing categories in the HateBERT model.
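Juxtaposing category labels with non-contextual embeddings such as Word2Vec or FastText typically comes down to comparing vectors by cosine similarity. The sketch below shows only that step, on hypothetical three-dimensional toy vectors; real embeddings would be loaded from a trained model, have hundreds of dimensions, and could rank neighbours quite differently.

```python
import math

# Hypothetical toy vectors standing in for trained Word2Vec/FastText embeddings.
TOY_VECTORS: dict[str, list[float]] = {
    "offensive": [0.9, 0.3, 0.1],
    "insulting": [0.8, 0.4, 0.2],
    "abusive": [0.7, 0.2, 0.6],
    "ironic": [0.1, 0.9, 0.3],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def nearest(word: str, vectors: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k words whose vectors are most similar to the given word's vector."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

On these toy vectors, `nearest("offensive", TOY_VECTORS)` ranks `insulting` first and `abusive` second, which is the kind of proximity between category labels that makes them hard to keep apart.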