137 research outputs found

Multilingual Cross-domain Perspectives on Online Hate Speech

Author: Daelemans Walter
De Pauw Guy
De Smedt Tom
Gwóźdź Maja
Jaki Sylvia
Kotzé Eduan
Saoud Leïla
Publication date: 01/01/2018

In this report, we present a study of eight corpora of online hate speech by demonstrating the NLP techniques that we used to collect and analyze the jihadist, extremist, racist, and sexist content. Analysis of the multilingual corpora shows that the different contexts share certain characteristics in their hateful rhetoric. To expose the main features, we have focused on text classification, text profiling, keyword and collocation extraction, along with manual annotation and qualitative study.

Comment: 24 pages

arXiv.org e-Print Archive

Institutional Repository Universiteit Antwerpen

Forensics Writer Identification using Text Mining and Machine Learning

Author: Alawar Saif Ali
Publication venue: RIT Scholar Works
Publication date: 01/04/2021

Constant technological growth has increased the danger and seriousness of cyber-attacks, which have recently grown unmistakably in institutions with complex Information Technology (IT) infrastructure. For instance, over the last three years, the most serious instances of cybercrime were observed globally, involving enormous data breaches, the spread of fake news, cyberbullying, crypto-jacking, and cloud computing services. In response, various agencies have devised techniques to curb these offences and to bring perpetrators, both real and perceived, to book. Consequently, Forensic Writer Identification, an application of stylometry, was introduced as one of the most effective remedies. Forensic Writer Identification is a complex forensic science technology that uses Artificial Intelligence (AI) to preserve, identify, extract, and document computer or digital evidence that can be used in court, especially by investigating officers in a criminal case, or simply for data analytics. The fundamental objective of this research was to examine aspects of Forensic Writer Identification technology in Twitter authorship analytics of users worldwide, to apply it to reduce the time needed to find criminals by providing the police with the most accurate methodology, and to compare the accuracy of different techniques. The report follows a literature review of the key text analysis techniques. Additionally, the research applied an agile text mining methodology to extract and analyze the texts of various Twitter users. Relevant academic and scholarly sources were searched in various online and offline databases to support this research.
Forensic Writer Identification for text extraction and analytics has recently received renewed attention, with very encouraging results. This research presents a general background and rationale for text and author identification techniques. The scope of current techniques and applications is given, and the issue of performance evaluation is addressed. Results for various methods are summarized, and a more in-depth illustration of two combined approaches is presented. By combining textural, algorithmic, and allographic features, emerging technologies are beginning to show useful performance levels. Nevertheless, user acceptance will play a vital role in the future of the technology. To this end, the goal of the project proposal was to devise an analytical system that would automate authorship identification across various Web 2.0 technologies globally, thereby addressing contemporary cybercrime issues.

RIT Scholar Works

Application of linguistic cues in the analysis of language of hate groups

Author: Balcerzak Bartłomiej
Jaworski Wojciech
Publication venue: AGH University of Science and Technology Press
Publication date: 01/01/2015

Hate speech and fringe ideologies are social phenomena that thrive on-line. Members of the political and religious fringe are able to propagate their ideas via the Internet with less effort than in traditional media. In this article, we attempt to use linguistic cues such as the occurrence of certain parts of speech in order to distinguish the language of fringe groups from strictly informative sources. The aim of this research is to provide a preliminary model for identifying deceptive materials online. Examples of these would include aggressive marketing and hate speech. For the sake of this paper, we aim to focus on the political aspect. Our research has shown that information about sentence length and the occurrence of adjectives and adverbs can provide information for the identification of differences between the language of fringe political groups and mainstream media

Computer Science Journal (AGH University of Science and Technology, Krakow)

Biblioteka Nauki - repozytorium artykułów

Proceedings of the LREC 2020 workshop on Resources and Techniques for User and Author Profiling in Abusive Language (ResT-UP 2020)

Author: di Buono Maria Pia
Manna Raffaele
Monti Johanna
Pascucci Antonio
Tonelli Sara
Basile Valerio
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2020

Università degli Studi di Napoli L'Orientale: CINECA IRIS

The translator’s wife’s traces: Alma Cardell Curtin and Jeremiah Curtin

Author: Rybicki Jan
Publication venue: Uniwersytet Jagiellonski - Wydawnictwo Uniwersytetu Jagiellonskiego
Publication date: 01/01/2012

Jeremiah Curtin translated most works by Poland’s first literary Nobel Prize winner, Henryk Sienkiewicz. He was helped in this life-long task by his wife Alma Cardell Curtin. It was Alma who, after her husband’s death, produced the lengthy Memoirs she steadfastly ascribed to her husband for his, rather than her, greater glory. This paper investigates the possible textual influences Alma might have had on other works by her husband, including his travelogues, ethnographic and mythological studies, and the translations themselves. Lacking traditional authorial evidence, this study relies on stylometric methods comparing most frequent word usage by means of cluster analysis of z-scores. There is much in this statistics-based authorial attribution to show how Alma Cardell Curtin affected at least two other original works of her husband and, possibly, at least two of his translations as well.
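As a rough illustration of what comparing most-frequent-word usage via z-scores can look like, here is a minimal sketch. It is not the paper's actual procedure: the tokenizer, the function names, and the choice of a mean absolute difference (in the spirit of Burrows's Delta) as the distance fed to a later clustering step are all assumptions made for this example.

```python
from collections import Counter
import math
import re


def tokenize(text):
    # Hypothetical, very simple tokenizer: lowercase alphabetic runs.
    return re.findall(r"[a-z']+", text.lower())


def zscore_profiles(texts, n_words=5):
    """texts: dict mapping a text name to its raw string.

    Returns (vocab, profiles), where vocab holds the n_words most
    frequent words of the whole corpus and profiles maps each name
    to the list of z-scores of its relative word frequencies.
    """
    tokenized = {name: tokenize(t) for name, t in texts.items()}
    corpus_counts = Counter()
    for tokens in tokenized.values():
        corpus_counts.update(tokens)
    vocab = [w for w, _ in corpus_counts.most_common(n_words)]

    # Relative frequency of each vocabulary word in each text.
    freqs = {}
    for name, tokens in tokenized.items():
        counts = Counter(tokens)
        total = len(tokens) or 1
        freqs[name] = [counts[w] / total for w in vocab]

    # Standardize each word's frequency across the texts.
    profiles = {name: [] for name in texts}
    for i in range(len(vocab)):
        column = [freqs[name][i] for name in texts]
        mean = sum(column) / len(column)
        std = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
        for name in texts:
            z = (freqs[name][i] - mean) / std if std > 0 else 0.0
            profiles[name].append(z)
    return vocab, profiles


def delta(profile_a, profile_b):
    # Mean absolute difference of z-scores (assumed distance measure).
    return sum(abs(x - y) for x, y in zip(profile_a, profile_b)) / len(profile_a)
```

The pairwise `delta` values could then be passed to any hierarchical clustering routine; texts with similar habits in their most frequent words end up close together.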

Jagiellonian University Repository

Law on the Installment Plan

Author: Frier Bruce W.
Publication venue: University of Michigan Law School Scholarship Repository
Publication date: 01/02/1984

A Review of Ulpian by Tony Honoré

University of Michigan School of Law

Quantitative Patterns of Stylistic Influence in the Evolution of Literature

Author: Foti Nicholas J
Hughes James M.
Krakauer David C
Rockmore Daniel N
Publication venue: Dartmouth Digital Commons
Publication date: 30/04/2012

Literature is a form of expression whose temporal structure, both in content and style, provides a historical record of the evolution of culture. In this work we take on a quantitative analysis of literary style and conduct the first large-scale temporal stylometric study of literature by using the vast holdings in the Project Gutenberg Digital Library corpus. We find temporal stylistic localization among authors through the analysis of the similarity structure in feature vectors derived from content-free word usage, nonhomogeneous decay rates of stylistic influence, and an accelerating rate of decay of influence among modern authors. Within a given time period we also find evidence for stylistic coherence with a given literary topic, such that writers in different fields adopt different literary styles. This study gives quantitative support to the notion of a literary “style of a time” with a strong trend toward increasingly contemporaneous stylistic influence
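To make the idea of a feature vector derived from content-free word usage concrete, here is a small sketch. The ten-word `FUNCTION_WORDS` list and the use of cosine similarity are assumptions for illustration only; the abstract does not state which words or which similarity measure the study used.

```python
from collections import Counter
import math
import re

# Hypothetical, deliberately tiny list of content-free (function) words.
FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in", "that", "it", "but", "with"]


def style_vector(text):
    # Relative frequency of each function word in the text.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine_similarity(u, v):
    # Cosine of the angle between two vectors; 0.0 if either is all zeros.
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Computing `cosine_similarity(style_vector(a), style_vector(b))` for every pair of authors gives a similarity structure that can then be examined against publication dates.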

PubMed Central

Dartmouth Digital Commons (Dartmouth College)


A Stylometric Analysis of Climate Change Fiction

Author: Lorenz Nina
Publication venue: ScholarWorks@UMass Amherst
Publication date: 15/07/2020

This work sets out to analyze stylistic changes in Anthropocene fiction over the past 60 years. The starting point for the analysis has been Rachel Carson, and the presumed beginning of the Anthropocene in the 1960s. The primary insight reveals the connections among these novels and the relations between similar writing about climate change, thereby contributing to the field of Environmental Humanities in a fundamental way, since climate change fiction has so far been investigated only through a topic-centered focus. The corpus compiled for scrutiny here extends to over 84 novels from these years. These novels have been selected based on a dual approach, looking at the secondary literature as well as a crowdsourced approach in looking at Goodreads’ cli-fi lists. The resulting texts are then analyzed with stylo, an R package that has been specifically created for stylometric analysis by humanists. The results are visualized in a network that allows easier interpretation and leads to an understanding of more detailed questions about the nature of the connection between works, the inspiration and representation of a specific genre of writing. Moreover, the thesis looks diachronically at clustering based on time and topic. Understanding the ways in which authors address and have addressed climate change is one indicator of how climate change is and has been comprehended. In terms of the digital approach applied here, the basis is a distant reading approach covering a larger number of novels; rather than close reading them, the task is to find patterns that extend throughout. However, for a thorough analysis, scalable reading is applied to contextualize and investigate the results in more depth. Overall, the results are meant to establish a baseline for discussing climate change fiction in the Anthropocene, which, although gaining more scholarly attention, is still understudied.
The hope is not only to gain insight but also to generate visualizations that will provide a helpful resource for fellow scholars.

ScholarWorks@UMass Amherst

Detecting deceptive behaviour in the wild: text mining for online child protection in the presence of noisy and adversarial social media communications

Author: Peersman Claudia
Publication venue: Lancaster University
Publication date: 01/01/2018

A real-life application of text mining research “in the wild”, i.e. in online social media, differs from more general applications in that its defining characteristics are both domain and process dependent. This gives rise to a number of challenges of which contemporary research has only scratched the surface. More specifically, a text mining approach applied in the wild typically has no control over the dataset size. Hence, the system has to be robust towards limited data availability, a variable number of samples across users and a highly skewed dataset. Additionally, the quality of the data cannot be guaranteed. As a result, the approach needs to be tolerant to a certain degree of linguistic noise. Finally, it has to be robust towards deceptive behaviour or adversaries. This thesis examines the viability of a text mining approach for supporting cybercrime investigations pertaining to online child protection. The main contributions of this dissertation are as follows. A systematic study of different aspects of methodological design of a state-of-the-art text mining approach is presented to assess its scalability towards a large, imbalanced and linguistically noisy social media dataset. In this framework, three key automatic text categorisation tasks are examined, namely the feasibility to (i) identify a social network user’s age group and gender based on textual information found in only one single message; (ii) aggregate predictions on the message level to the user level without neglecting potential clues of deception and detect false user profiles on social networks and (iii) identify child sexual abuse media among thousands of legal other media, including adult pornography, based on their filename. Finally, a novel approach is presented that combines age group predictions with advanced text clustering techniques and unsupervised learning to identify online child sex offenders’ grooming behaviour.
The methodology presented in this thesis was extensively discussed with law enforcement to assess its forensic readiness. Additionally, each component was evaluated on actual child sex offender data. Despite the challenging characteristics of these text types, the results show high degrees of accuracy for false profile detection, identifying grooming behaviour and child sexual abuse media identification
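The step of aggregating message-level predictions to the user level while keeping disagreement visible can be sketched as follows. This is only an assumed illustration, not the thesis's method: the majority vote, the `minority_share` measure, the `flag_threshold` value, and the placeholder labels are all hypothetical.

```python
from collections import Counter, defaultdict


def aggregate_to_user(message_predictions, flag_threshold=0.3):
    """message_predictions: iterable of (user_id, predicted_label) pairs,
    one pair per message.

    Returns a dict mapping each user_id to its majority label, the share
    of messages that disagree with that label, and a flag that is set
    when the disagreement reaches flag_threshold (an assumed cut-off).
    """
    by_user = defaultdict(list)
    for user_id, label in message_predictions:
        by_user[user_id].append(label)

    results = {}
    for user_id, labels in by_user.items():
        counts = Counter(labels)
        # Ties are broken in favour of the label seen first.
        top_label, top_count = counts.most_common(1)[0]
        minority_share = 1 - top_count / len(labels)
        results[user_id] = {
            "label": top_label,
            "minority_share": minority_share,
            "flagged": minority_share >= flag_threshold,
        }
    return results
```

Keeping `minority_share` alongside the majority label means that inconsistent message-level predictions for one profile are surfaced for review rather than averaged away.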

Lancaster E-Prints

Explore Bristol Research