
1,623 research outputs found

Brute-Force Sentence Pattern Extortion from Harmful Messages for Cyberbullying Detection

Author: Araki Kenji
Kimura Yasutomo
Leliwa Gniewosz
Lempa Pawel
Masui Fumito
Ptaszynski Michal
Rzepka Rafal
Wroczynski Michal
Publication venue: AIS Electronic Library (AISeL)
Publication date: 29/08/2019

Cyberbullying, or humiliating people using the Internet, has existed almost since the beginning of Internet communication. The relatively recent introduction of smartphones and tablet computers has caused cyberbullying to evolve into a serious social problem. In Japan, members of a parent-teacher association (PTA) attempted to address the problem by scanning the Internet for cyberbullying entries. To help these PTA members and other interested parties confront this difficult task, we propose a novel method for automatic detection of malicious Internet content. This method is based on a combinatorial approach resembling brute-force search algorithms, but applied to language classification. The method extracts sophisticated patterns from sentences and uses them in classification. Experiments performed on actual cyberbullying data reveal an advantage of our method vis-à-vis previous methods. We then implemented the method in an application for Android smartphones to automatically detect possibly harmful content in messages. The method performed well in the Android environment, but still needs to be optimized for time efficiency in order to be used in practice

AIS Electronic Library (AISeL)
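
The abstract above describes a combinatorial, brute-force-like extraction of sentence patterns but does not spell out the procedure. The following is a minimal sketch of one possible reading, not the authors' implementation: it enumerates every ordered combination of up to `max_len` tokens from a sentence. The function name `extract_patterns`, the `max_len` cap, and the use of `*` to mark a gap between non-adjacent tokens are all assumptions made for illustration.

```python
from itertools import combinations


def extract_patterns(tokens, max_len=3):
    """Enumerate ordered token combinations of length 1..max_len.

    Hypothetical convention: a "*" is inserted wherever one or more
    tokens were skipped between two selected tokens.
    """
    patterns = set()
    for k in range(1, max_len + 1):
        # combinations() yields index tuples in increasing order,
        # so the original word order is always preserved.
        for idx in combinations(range(len(tokens)), k):
            parts = [tokens[idx[0]]]
            for prev, cur in zip(idx, idx[1:]):
                if cur - prev > 1:
                    parts.append("*")  # gap marker (assumed notation)
                parts.append(tokens[cur])
            patterns.add(" ".join(parts))
    return patterns


patterns = extract_patterns(["a", "b", "c"], max_len=2)
```

The number of combinations grows quickly with sentence length, which is consistent with the abstract's remark that the method still needs to be optimized for time efficiency.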

TA-COS 2018 : 2nd Workshop on Text Analytics for Cybersecurity and Online Safety : Proceedings

Author: De Pauw Guy
Desmet Bart
Lefever Els
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2018

Ghent University Academic Bibliography

Novel Survey on Email Spam Filtering Methods

Author: Diksha M. Bhalerao, Prof. Dr Dayanand R. Ingle
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/04/2018

Spam emails cause major resource wastage by unnecessarily flooding network links. The cost of spam is borne mostly by the recipient, so it is a form of postage-due advertising. This paper describes how different methods can be used for spam filtering. To protect against unsolicited e-mails, a number of techniques have been presented with the goal of efficient, accurate spam filtering. Few previous spam filters can meet the requirements of being user-friendly, attack-resilient, and personalized. This paper presents a literature survey of the state of research on spam filtering methods and how it is useful in users' lives

International Journal on Recent and Innovation Trends in Computing and Communication

Proceedings of the LREC 2020 workshop on Resources and Techniques for User and Author Profiling in Abusive Language (ResT-UP 2020)

Author: di Buono Maria Pia
Manna Raffaele
Monti Johanna
Pascucci Antonio
Sara Tonelli
Valerio Basile
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2020

Università degli Studi di Napoli L'Orientale: CINECA IRIS

Efficient filtering of adult content using textual information

Author: Largillier Thomas
Peyronnet Guillaume
Peyronnet Sylvain
Publication date: 04/12/2015

Adult content now represents a non-negligible proportion of Web content. It is of the utmost importance to protect children from this content. Search engines, as an entry point for Web navigation, are ideally placed to deal with this issue. In this paper, we propose a method that builds a safe index, i.e. one free of adult content, for search engines. This method is based on a filter that uses only textual information from the web page and the associated URL

arXiv.org e-Print Archive

HAL - Normandie Université

Results of the PolEval 2019 Shared Task 6 : first dataset and Open Shared Task for automatic cyberbullying detection in Polish Twitter

Author: Dybała Paweł
Pieciukiewicz Agata
Ptaszynski Michal
Publication venue: Institute of Computer Sciences. Polish Academy of Sciences
Publication date: 01/01/2019

In this paper we describe the first dataset for the Polish language containing annotations of harmful and toxic language. The dataset was created to study harmful Internet phenomena such as cyberbullying and hate speech, which have recently grown dramatically on the Polish Internet as well as worldwide. The dataset was automatically collected from Polish Twitter accounts and annotated by layperson volunteers under the supervision of a cyberbullying and hate-speech expert. Together with the dataset we propose the first open shared task for Polish that uses the dataset to classify such harmful phenomena. In particular, we propose two subtasks: 1) binary classification of harmful and non-harmful tweets, and 2) multiclass classification between two types of harmful information (cyberbullying and hate speech) and other. The first installment of the shared task was a success, attracting fourteen submissions overall and thereby demonstrating a high demand for research applying such data

Jagiellonian University Repository

Aggressive, Repetitive, Intentional, Visible, and Imbalanced: Refining Representations for Cyberbullying Classification

Author: Morstatter Fred
Vigfusson Ymir
Ziems Caleb
Publication date: 03/04/2020

Cyberbullying is a pervasive problem in online communities. To identify cyberbullying cases in large-scale social networks, content moderators depend on machine learning classifiers for automatic cyberbullying detection. However, existing models remain unfit for real-world applications, largely due to a shortage of publicly available training data and a lack of standard criteria for assigning ground truth labels. In this study, we address the need for reliable data using an original annotation framework. Inspired by social sciences research into bullying behavior, we characterize the nuanced problem of cyberbullying using five explicit factors to represent its social and linguistic aspects. We model this behavior using social network and language-based features, which improve classifier performance. These results demonstrate the importance of representing and modeling cyberbullying as a social phenomenon. Comment: 12 pages, 5 figures, 22 tables, Accepted to the 14th International AAAI Conference on Web and Social Media, ICWSM'2

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Relationship Between Personality Patterns and Harmfulness : Analysis and Prediction Based on Sentence Embedding

Author: Hirobayashi Tomoki
Kishima Ryota
Kita Kenji
Matsumoto Kazuyuki
Tsuchiya Seiji
Yoshida Minoru
Publication venue: 'IGI Global'
Publication date: 28/02/2022

This paper hypothesizes that harmful utterances need to be judged in the context of whole sentences, and the authors extract features of harmful expressions using a general-purpose language model. Based on the extracted features, the authors propose a method to predict the presence or absence of harmful categories. In addition, the authors believe that it is possible to analyze users who incite others by combining this method with research on analyzing the personality of the speaker from statements on social networking sites. The results confirmed that the proposed method can judge the possibility of harmful comments with higher accuracy than simple dictionary-based models or models using a distributed representation of words. The relationship between personality patterns and harmful expressions was also confirmed by an analysis based on a harmful judgment model

Tokushima University Institutional Repository

Classifying spam emails using agglomerative hierarchical clustering and a topic-based approach

Author: Alaiz Rodríguez Rocío
Alegre Gutiérrez Enrique
Fidalgo Fernández Eduardo
González Castro Víctor
Jáñez-Martino Francisco
Publication venue: 'Elsevier BV'
Publication date: 17/04/2023

Spam emails are unsolicited, annoying and sometimes harmful messages which may contain malware, phishing or hoaxes. Unlike most studies that address the design of efficient anti-spam filters, we approach the spam email problem from a different and novel perspective. Focusing on the needs of cybersecurity units, we follow a topic-based approach for addressing the classification of spam email into multiple categories. We propose SPEMC-15K-E and SPEMC-15K-S, two novel datasets with approximately 15K emails each in English and Spanish, respectively, and we label them using agglomerative hierarchical clustering into 11 classes. We evaluate 16 pipelines, combining four text representation techniques (Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words, Word2Vec and BERT) and four classifiers: Support Vector Machine, Naïve Bayes, Random Forest and Logistic Regression. Experimental results show that the highest performance is achieved with TF-IDF and LR for the English dataset, with an F1 score of 0.953 and an accuracy of 94.6%, while for the Spanish dataset, TF-IDF with NB yields an F1 score of 0.945 and 98.5% accuracy. Regarding the processing time, TF-IDF with LR leads to the fastest classification, processing an English and Spanish spam email in 2ms and 2.2ms on average, respectively.

Leon University (Spain)
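
The 16 pipelines mentioned in the abstract above follow from pairing each of the four text representations with each of the four classifiers. The sketch below only enumerates that grid so the count can be checked; it does not train any model, and the variable names are illustrative assumptions rather than anything taken from the paper.

```python
from itertools import product

# The four representations and four classifiers named in the abstract.
representations = ["TF-IDF", "Bag of Words", "Word2Vec", "BERT"]
classifiers = [
    "Support Vector Machine",
    "Naive Bayes",
    "Random Forest",
    "Logistic Regression",
]

# Every (representation, classifier) pair is one candidate pipeline.
pipelines = list(product(representations, classifiers))

for representation, classifier in pipelines:
    print(f"{representation} + {classifier}")
```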