24,841 research outputs found

    Protectbot: A Chatbot to Protect Children on Gaming Platforms

    Online gaming is no longer a limited-access activity; in recent years it has become available to a high percentage of children. Consequently, children are exposed to multifaceted threats such as cyberbullying, grooming, and sexting. Although the online gaming industry is taking concerted measures to create a safe environment for children to play and interact in, such efforts remain inadequate and fragmented. Different approaches that use machine learning (ML) techniques to detect child predatory behavior have been designed to provide detection and protection in this context. An analysis of the available AI tools and solutions showed that they are limited to identifying predatory behavior in chat logs, which is not enough to avert the multifaceted threats. In this thesis, we developed a chatbot, Protectbot, to interact with the suspect on the gaming platform. Protectbot leverages the dialogue generative pre-trained transformer (DialoGPT) model, which is based on Generative Pre-trained Transformer 2 (GPT-2). To analyze the suspect's behavior, we developed a text classifier based on natural language processing that labels chats as predatory or non-predatory. The classifier is trained and tested on the PAN12 dataset. To convert the text into numerical vectors, we used fastText. The best results were obtained with a non-linear SVM on sentence vectors produced by fastText: a recall of 0.99 and an F_0.5-score of 0.99, which is better than the state-of-the-art methods. We also built a new dataset containing 71 full predatory chats retrieved from Perverted Justice. Using sentence vectors generated by fastText and a KNN classifier, 66 of the 71 chats were correctly classified as predatory.
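
    The last step of the abstract above (a KNN classifier over sentence vectors) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the three-dimensional vectors are made up and merely stand in for fastText sentence vectors, cosine similarity is one plausible distance choice, and the names `cosine`, `knn_predict`, and `TRAIN` are hypothetical. The thesis does not state its value of k or its distance metric.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors most similar to query.

    train is a list of (vector, label) pairs.
    """
    ranked = sorted(train, key=lambda item: cosine(item[0], query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up toy vectors; real sentence vectors would come from an embedding model.
TRAIN = [
    ([0.9, 0.1, 0.0], "predatory"),
    ([0.8, 0.2, 0.1], "predatory"),
    ([0.7, 0.3, 0.0], "predatory"),
    ([0.1, 0.9, 0.2], "non-predatory"),
    ([0.0, 0.8, 0.3], "non-predatory"),
    ([0.2, 0.7, 0.1], "non-predatory"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, [0.85, 0.15, 0.05]))
    # Share of the 71 Perverted Justice chats reported as correctly classified.
    print(round(66 / 71, 4))
```

    On the reported figures, 66 of 71 corresponds to roughly 93% of the new dataset being flagged as predatory.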

    Characterizing Attention Cascades in WhatsApp Groups

    An important political and social phenomenon discussed in several countries, such as India and Brazil, is the use of WhatsApp to spread false or misleading content. However, little is known about the information dissemination process in WhatsApp groups. Attention affects the dissemination of information in WhatsApp groups, determining which topics or subjects are more attractive to the participants of a group. In this paper, we characterize and analyze how attention propagates among the participants of a WhatsApp group. An attention cascade begins when a user asserts a topic in a message to the group, which could include written text, photos, or links to articles online. Others then propagate the information by responding to it. We analyzed attention cascades in more than 1.7 million messages posted in 120 groups over one year. Our analysis focused on the structural and temporal evolution of attention cascades as well as on the behavior of the users who participate in them. We found specific characteristics in cascades associated with groups that discuss political subjects and false information. For instance, we observe that cascades with false information tend to be deeper, reach more users, and last longer in political groups than in non-political groups.
    Comment: Accepted as a full paper at the 11th International ACM Web Science Conference (WebSci 2019). Please cite the WebSci version.
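
    The cascade properties named in this abstract (depth, number of users reached, duration) can be made concrete with a small sketch. It assumes each message carries an explicit reply link to its parent, which is only one possible way to represent a cascade; the abstract does not say how the authors link messages. The `Message` fields, the `cascade_metrics` function, and the sample data are all hypothetical.

```python
from collections import namedtuple

# Hypothetical message record: parent_id is None for the message that starts a cascade.
Message = namedtuple("Message", "msg_id parent_id user timestamp")


def cascade_metrics(messages, root_id):
    """Compute size, depth, distinct users, and duration of the cascade rooted at root_id.

    Assumes reply links form a tree (no cycles) and timestamps are in seconds.
    """
    by_id = {m.msg_id: m for m in messages}
    children = {}
    for m in messages:
        if m.parent_id is not None:
            children.setdefault(m.parent_id, []).append(m.msg_id)

    size = 0
    depth = 0
    users = set()
    first = last = by_id[root_id].timestamp
    stack = [(root_id, 0)]  # (message id, level below the root)
    while stack:
        msg_id, level = stack.pop()
        msg = by_id[msg_id]
        size += 1
        depth = max(depth, level)
        users.add(msg.user)
        first = min(first, msg.timestamp)
        last = max(last, msg.timestamp)
        for child_id in children.get(msg_id, []):
            stack.append((child_id, level + 1))

    return {"size": size, "depth": depth, "users": len(users), "duration": last - first}


# Made-up group chat: messages 1-5 form one cascade, message 6 starts another.
SAMPLE = [
    Message(1, None, "alice", 0),
    Message(2, 1, "bob", 60),
    Message(3, 1, "carol", 120),
    Message(4, 2, "alice", 300),
    Message(5, 4, "dave", 900),
    Message(6, None, "erin", 1000),
]

if __name__ == "__main__":
    print(cascade_metrics(SAMPLE, 1))
```

    With these definitions, a "deeper" cascade has a longer reply chain, one that "reaches more users" has more distinct participants, and one that "lasts longer" has a larger gap between its first and last message.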

    Identifying Cyber Predators by Using Sentiment Analysis and Recurrent Neural Networks

    Recurrent neural networks with Long Short-Term Memory cells (LSTM-RNN) have an impressive ability to process sequence data, particularly for language model building and text classification. This research proposes the combination of sentiment analysis, sentence vectors, and LSTM-RNN as a novel approach to cyber Sexual Predator Identification (SPI). There are two tasks in SPI. The first is identifying sexual predators among chats. The second is highlighting specific sexual predators’ lines in chats. Our research focuses on the first task. An LSTM-RNN language model is applied to generate sentence vectors, which are the last hidden states of the language model. The sentence vectors are fed into the LSTM-RNN classifier to capture suspicious conversations. Using the hidden state makes it possible to generate vectors for unseen sentences, i.e., the system can score a sentence never seen in the training data. fastText is used to filter the contents of conversations and to generate a sentiment score for the purpose of identifying potential predators. The IMDB sentiment review task is introduced to provide an intuitive measurement of the combined method. The model identified 206 predators out of 254. The experiment achieved a record-breaking F-0.5 score of 0.9555, higher than the top-ranked result in the SPI competition.
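
    The F-0.5 score quoted above weights precision more heavily than recall. The sketch below uses the standard F-beta definition and, as an assumption, reads "206 predators out of 254" as recall (206 true predators found out of 254 in the test set); the abstract does not report precision, so the value computed here is only what the two published numbers would imply under that reading. The function names are hypothetical.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def implied_precision(f_score, recall, beta=0.5):
    """Solve the F-beta formula for precision, given the score and the recall.

    From F = (1 + b2) * P * R / (b2 * P + R) it follows that
    P = F * R / ((1 + b2) * R - b2 * F).
    """
    b2 = beta * beta
    denominator = (1 + b2) * recall - b2 * f_score
    if denominator <= 0:
        raise ValueError("no valid precision for this score and recall")
    return f_score * recall / denominator


if __name__ == "__main__":
    recall = 206 / 254  # assumption: the reported counts describe recall
    print(round(recall, 4))
    print(round(implied_precision(0.9555, recall), 4))
    print(round(f_beta(1.0, recall), 4))
```

    Under that reading, the reported score is consistent with a precision very close to 1, which illustrates why a system can reach a high F-0.5 score even when it misses a noticeable share of predators.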

    The Corpus-Based Approach in Contemporary Research on Language Variation

    The article examines the possibilities of the corpus-based approach for studying language as it functions in real conditions, and identifies the characteristics of the method that determine its reliability and validity. It concentrates on the latest research into the socially and regionally conditioned variation of German in Northern Germany and into stylistic differentiation in virtual genres of communication, and describes how electronic text collections can be used for research purposes.
    This article discusses the possibilities of the corpus-based approach in linguistic study, i.e. the study of language as expressed in samples (corpora) of "real world" text. It introduces the basic aspects of this approach: corpus representativeness and balance. Being “collections of naturally occurring language text” (John Sinclair), corpora can provide a good basis for the study of language variation with the application of new electronic technologies. The article focuses on the latest research into the social and geographical variability of language in Northern Germany as well as on language differentiation in the virtual genres of communication. Because a new spoken language variety, regarded as a contact variety (a result of High and Low German contact), has become widespread in Northern Germany, new research into the language situation in this area became necessary.
    The identification, description and analysis of language variation in terms of regional, social and situational usage are the primary purpose of the SiN project (Sprachvariation in Norddeutschland). Documenting object language data along with the speakers' subjective opinions and reflections on the language norm will allow the first ever text corpus of Low German substandard dialect to be compiled. On the basis of such a corpus, a wide variety of sociolect and idiolect research at the phonetic, lexical and grammatical levels can be carried out. Another topical issue is Internet-based communication and its genres: e-mail, forums, web chats, weblogs, Skype, etc. It is therefore necessary to create balanced, annotated corpora of computer-mediated communication such as the Dortmund Chat Corpus (ca. 500 annotated chat logs and a retrieval tool available online), whose resources can be used for studying chat style in its various forms.

    Attitudes and emotions through written text: The case of textual deformation in Internet chat rooms

    Spanish Internet chat rooms are visited by many young people who use language in a very creative way (e.g. repetition of letters and punctuation marks). In this paper, several hypotheses concerning the use of textual deformation are evaluated with regard to its communicative effectiveness. The goal is to check whether these deformations favour a more accurate identification and evaluation of the senders’ underlying attitudes (propositional or affective) and emotions. The answers to a questionnaire indicate that, despite the supplementary level of information that textual deformation provides, readers tend not to agree on the exact quality of the sender’s underlying attitudes and emotions, nor do they tend to establish degrees of intensity related to the quantity of text typed. Nevertheless, textual deformation seems to play a part in the interpretation that chat users eventually choose for the messages sent to chat rooms.
    The research for this paper has been supported by IULMA (Instituto Interuniversitario de Lenguas Modernas Aplicadas).