64 research outputs found

    Fuzzy Fingerprinting Transformer Language-Models for Emotion Recognition in Conversations

    Fuzzy Fingerprints have been successfully used as an interpretable text classification technique, but, like most other techniques, they have largely been surpassed in performance by large pre-trained language models such as BERT or RoBERTa. These models deliver state-of-the-art results in several Natural Language Processing tasks, notably Emotion Recognition in Conversations (ERC), but lack interpretability and explainability. In this paper, we propose combining the two approaches to perform ERC, as a means of obtaining simpler and more interpretable classifiers based on large language models. We feed each utterance and its previous conversational turns to a pre-trained RoBERTa to obtain contextual embedding representations of the utterances, which are then supplied to an adapted Fuzzy Fingerprint classification module. We validate our approach on the widely used DailyDialog ERC benchmark dataset, on which we obtain state-of-the-art-level results using a much lighter model. Comment: FUZZ-IEEE 202
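    The fuzzy-fingerprint classification step can be illustrated with a toy sketch (illustrative only, not the paper's code; here simple word counts stand in for the features derived from RoBERTa embeddings, and the linearly decaying membership function is one common choice):

```python
# Minimal fuzzy-fingerprint classifier sketch. Each class fingerprint is its
# top-k features ranked by frequency, weighted by a membership that decays
# linearly with rank; a sample goes to the class whose fingerprint it
# overlaps most. All data below is invented for illustration.
from collections import Counter

def build_fingerprint(docs, k=3):
    counts = Counter(w for doc in docs for w in doc)
    ranked = [w for w, _ in counts.most_common(k)]
    # membership decays linearly with rank: 1, (k-1)/k, (k-2)/k, ...
    return {w: (k - i) / k for i, w in enumerate(ranked)}

def similarity(doc, fingerprint):
    # sum the memberships of the fingerprint features present in the doc
    return sum(fingerprint.get(w, 0.0) for w in set(doc))

def classify(doc, fingerprints):
    return max(fingerprints, key=lambda c: similarity(doc, fingerprints[c]))

train = {
    "joy": [["great", "day"], ["great", "fun"]],
    "anger": [["bad", "day"], ["bad", "mood"]],
}
fps = {label: build_fingerprint(docs) for label, docs in train.items()}
print(classify(["great", "mood"], fps))  # -> joy
```

    In the paper's setting, the ranked features would come from the contextual utterance embeddings rather than raw word counts.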

    Comparative analysis of deep neural network models for cyberbullying detection in social networks

    Introduction: social media usage has increased, with both positive and negative effects. Given the misuse of social media platforms through cyberbullying methods such as stalking and harassment, preventive methods are needed to control these behaviours and avoid mental stress. Problem: extraneous words expand the size of the vocabulary and affect the performance of the algorithm. Objective: to detect cyberbullying on social media. Methodology: we present variant deep learning models, namely Long Short-Term Memory (LSTM), Bidirectional LSTM (BI-LSTM), Recurrent Neural Networks (RNN), Bidirectional RNN (BI-RNN), Gated Recurrent Unit (GRU), and Bidirectional GRU (BI-GRU), to detect cyberbullying on social media. Results: the proposed mechanism was implemented and analysed on Twitter public-comment data, with Accuracy, Precision, Recall, and F-Score as measures; the models achieved an improved accuracy of 90.4%. Conclusions: the results indicate that the proposed mechanism is efficient compared with state-of-the-art schemes. Originality: applying deep learning models for a comparative analysis of social media data is the first such approach to detecting cyberbullying. Limitations: these models are applied only to textual comment data; the work does not cover multimedia data such as audio, video, and images

    Current Limitations in Cyberbullying Detection: on Evaluation Criteria, Reproducibility, and Data Scarcity

    The detection of online cyberbullying has seen an increase in societal importance, popularity in research, and available open data. Nevertheless, while computational power and affordability of resources continue to increase, the access restrictions on high-quality data limit the applicability of state-of-the-art techniques. Consequently, much of the recent research uses small, heterogeneous datasets, without a thorough evaluation of applicability. In this paper, we further illustrate these issues, as we (i) evaluate many publicly available resources for this task and demonstrate difficulties with data collection. These predominantly yield small datasets that fail to capture the required complex social dynamics and impede direct comparison of progress. We (ii) conduct an extensive set of experiments that indicate a general lack of cross-domain generalization of classifiers trained on these sources, and openly provide this framework to replicate and extend our evaluation criteria. Finally, we (iii) present an effective crowdsourcing method: simulating real-life bullying scenarios in a lab setting generates plausible data that can be effectively used to enrich real data. This largely circumvents the restrictions on data that can be collected, and increases classifier performance. We believe these contributions can aid in improving the empirical practices of future research in the field

    Security Issues in Service Model of Fog Computing Environment

    Fog computing is an innovative way to extend the cloud platform by providing computing resources closer to end users. Fog offers the same data, management, storage, and application features as the cloud, but the two differ in their deployment: fog resources are placed at distributed locations near the network edge. A fog system can process large amounts of data, operate in the field, and be mounted on a variety of hardware devices, which makes it well suited to latency-critical applications. Fog computing is similar to cloud computing, but because of its variability it creates new security and privacy challenges that go beyond those common to cloud environments, especially at fog nodes. This paper aims to understand the impact of these security problems and how to overcome them, and to provide future security guidance for those responsible for building, upgrading, and maintaining fog systems

    Towards cyberbullying detection on social media

    The continuous appearance of cyberbullying on social media constitutes a worldwide problem that has increased considerably in recent years and demands urgent measures for the automatic detection of this phenomenon. The goal of this work is to create a model sufficiently capable of automatically detecting offensive texts. For this purpose, three public datasets were used, along with two main approaches to the problem: one based on classical Machine Learning methods and the other on Deep Learning. In the classical Machine Learning approach, a specific pre-processing and Feature Engineering stage with several steps was proposed. In addition, two document representation approaches were explored to generate the inputs used by the SVM, Logistic Regression, and Random Forest classifiers. Since these datasets are imbalanced, SMOTEENN and Threshold-Moving were used to deal with the imbalanced classification problem. In the Deep Learning approach, different architectures were explored, combining pre-trained word vectors with CNN, CNN-Attention, BiLSTM, and BiLSTM-Attention. The experimental setup involved the treatment of unknown words, Cyclical Learning Rate for better convergence, a Macro Soft-F1 loss function to optimize performance, and a Macro Soft-F2 loss function to deal with the imbalanced classification problem. A RoBERTa-base model, pre-trained on 58 million tweets and fine-tuned for offensive-language identification, was also proposed. Experimental results show that, although it is a difficult task, both proposed approaches are suitable for detecting offensive texts. Nevertheless, the Deep Learning approach achieves the best results
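    Threshold-moving, one of the imbalance-handling techniques mentioned above, can be sketched as follows (an illustrative toy example, not the thesis code; the labels and scores are invented):

```python
# Threshold-moving sketch: instead of the default 0.5 cut-off, pick the
# decision threshold that maximises F1 on validation scores -- useful when
# the positive class is rare.
def f1(y_true, y_pred):
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(y_true, scores, candidates=None):
    candidates = candidates or [i / 100 for i in range(1, 100)]
    return max(candidates,
               key=lambda t: f1(y_true, [int(s >= t) for s in scores]))

# imbalanced validation set: 3 positives out of 10
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.45, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05, 0.02, 0.01]
t = best_threshold(y_true, scores)
# every positive scores above 0.30, so a threshold well below 0.5 wins
```

    The same idea applies with any validation metric; the thesis optimises soft F-measure variants during training as well.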

    Anomalous behaviour detection using heterogeneous data

    Anomaly detection is one of the most important methods for processing and finding abnormal data, as it can distinguish between normal and abnormal behaviour. Anomaly detection has been applied in many areas, such as the medical sector, fraud detection in finance, fault detection in machines, intrusion detection in networks, surveillance systems for security, as well as forensic investigations. Abnormal behaviour can provide information or answer questions when an investigator is performing an investigation. Anomaly detection is one way to simplify big data by focusing on data that have been grouped or clustered by the anomaly detection method. Forensic data usually consist of heterogeneous data in several forms or types, such as qualitative or quantitative, structured or unstructured, and primary or secondary. For example, when a crime takes place, the evidence can take the form of various types of data, and combining all the data types can produce rich information insights. Nowadays, data has become ‘big’ because it is generated every second of every day, and processing has become time-consuming and tedious. Therefore, in this study, a new method to detect abnormal behaviour is proposed that uses heterogeneous data and combines the data with a data fusion technique. VAST Challenge data and image data are used to demonstrate the heterogeneous data. The first contribution of this study is applying heterogeneous data to detect anomalies. The recently introduced anomaly detection technique known as Empirical Data Analytics (EDA) is applied to detect abnormal behaviour in these datasets. Standardised eccentricity (a measure newly introduced within EDA, offering a new simplified form of the well-known Chebyshev inequality) can be applied to any data distribution. The second contribution is applying image data: the image data are processed using a pre-trained deep learning network, and classification is done using a support vector machine (SVM). The last contribution is combining the anomaly results from the heterogeneous data and image recognition using a new data fusion technique. There are five types of data with three different modalities and different dimensionalities, so the data cannot simply be combined and integrated. Therefore, the new data fusion technique first analyses the abnormality in each data type separately, determines a degree of suspicion between 0 and 1, and afterwards sums all the degrees of suspicion. This method is not intended to be a fully automatic system that resolves investigations, which would likely be unacceptable in any case. The aim is rather to simplify the role of the humans so that they can focus on a small number of cases to be looked at in more detail. The proposed approach does simplify the processing of such huge amounts of data, and it can assist human experts in their investigations and in making final decisions
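    The fusion step just described can be sketched minimally (illustrative only; the entity names and suspicion degrees below are invented):

```python
# Data-fusion sketch: each modality's anomaly detector yields a suspicion
# degree in [0, 1]; the fused score is their sum, and the top-scoring
# entities are handed to a human investigator rather than flagged
# automatically.
def fuse(suspicion_by_modality):
    # suspicion_by_modality: {entity: [degree_modality1, degree_modality2, ...]}
    return {e: sum(degrees) for e, degrees in suspicion_by_modality.items()}

scores = fuse({
    "person_A": [0.9, 0.8, 0.1],  # anomalous in two of three modalities
    "person_B": [0.2, 0.1, 0.3],
})
ranked = sorted(scores, key=scores.get, reverse=True)
# investigators review the head of `ranked` in more detail
```

    Summing per-modality degrees avoids having to combine raw data of different dimensionalities directly, which is the point the abstract makes.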

    Sentiment Analysis for Social Media

    Sentiment analysis is a branch of natural language processing concerned with the study of the intensity of the emotions expressed in a piece of text. The automated analysis of the multitude of messages delivered through social media is one of the hottest research fields, both in academy and in industry, due to its extremely high potential applicability in many different domains. This Special Issue describes both technological contributions to the field, mostly based on deep learning techniques, and specific applications in areas like health insurance, gender classification, recommender systems, and cyber aggression detection

    “And all the pieces matter...” Hybrid Testing Methods for Android App's Privacy Analysis

    Smartphones have become inherent to the everyday life of billions of people worldwide, and they are used to perform activities such as gaming, interacting with our peers, or working. While extremely useful, smartphone apps also have drawbacks, as they can affect the security and privacy of users. Android devices hold a lot of personal data from users, including their social circles (e.g., contacts), usage patterns (e.g., app usage and visited websites), and their physical location. As in most software products, Android apps often include third-party code (Software Development Kits, or SDKs) to add functionality to the app without the need to develop it in-house. Android apps and the third-party components embedded in them are often interested in accessing such data, as the online ecosystem is dominated by data-driven business models and revenue streams like advertising. The research community has developed many methods and techniques for analyzing the privacy and security risks of mobile apps, mostly relying on two techniques: static code analysis and dynamic runtime analysis. Static analysis examines the code and other resources of an app to detect potential app behaviors. While this makes static analysis easier to scale, it has drawbacks such as missing app behaviors when developers obfuscate the app’s code to avoid scrutiny. Furthermore, since static analysis only shows potential app behavior, this behavior needs to be confirmed, as static analysis can also report false positives due to dead or legacy code. Dynamic analysis examines apps at runtime to provide actual evidence of their behavior. However, these techniques are harder to scale, as they need to be run on an instrumented device to collect runtime data. Similarly, the app needs to be stimulated with realistic inputs to explore as many code paths as possible; while some automatic techniques exist to generate synthetic inputs, they have been shown to be insufficient.
    In this thesis, we explore the benefits of combining static and dynamic analysis techniques so that they complement each other and reduce each other's limitations. While most previous work has relied on these techniques in isolation, we combine their strengths in different and novel ways that allow us to further study privacy issues in the Android ecosystem. Namely, we demonstrate the potential of combining these complementary methods to study three inter-related issues:
    • A regulatory analysis of parental control apps. We use a novel methodology that relies on easy-to-scale static analysis techniques to pinpoint potential privacy issues and violations of current legislation by Android apps and their embedded SDKs. We rely on the results of our static analysis to inform how we manually exercise the apps, maximizing our ability to obtain real evidence of these misbehaviors. We study 46 publicly available apps and find instances of data collection and sharing without consent and insecure network transmissions containing personal data. We also see that these apps fail to properly disclose these practices in their privacy policies.
    • A security analysis of unauthorized access to permission-protected data without user consent. We use a novel technique that combines the strengths of static and dynamic analysis: we first compare the data sent by applications at runtime with the permissions granted to each app in order to find instances of potential unauthorized access to permission-protected data. Once we have discovered the apps that access personal data without permission, we statically analyze their code to discover the covert and side channels used by apps and SDKs to circumvent the permission system. This methodology allows us to discover apps using the MAC address as a surrogate for location data, two SDKs using external storage as a covert channel to share unique identifiers, and an app using picture metadata to gain unauthorized access to location data.
    • A novel SDK detection methodology that relies on signals observed both in an app’s code and static resources and in its runtime behavior. We then rely on a tree structure together with a confidence-based system to accurately detect SDK presence without any a priori knowledge, and with the ability to discern whether a given SDK is part of legacy or dead code. We show that this methodology can discover third-party SDKs more accurately than state-of-the-art tools, both on a set of purpose-built ground-truth apps and on a dataset of 5k publicly available apps.
    With these three case studies, we highlight the benefits of combining static and dynamic analysis techniques for studying the privacy and security guarantees and risks of Android apps and third-party SDKs. Using these techniques in isolation would not have allowed us to investigate these privacy issues in depth: we would lack the ability to provide real evidence of potential breaches of legislation, to pinpoint the specific ways in which apps leverage covert and side channels to break Android’s permission system, or to adapt to an ever-changing ecosystem of Android third-party companies.
    The works presented in this thesis were partially funded within the framework of the following projects and grants:
    • European Union’s Horizon 2020 Innovation Action program (Grant Agreement No. 786741, SMOOTH Project, and Grant Agreement No. 101021377, TRUST AWARE Project).
    • Spanish Government ODIO NºPID2019-111429RB-C21/PID2019-111429RB-C22.
    • The Spanish Data Protection Agency (AEPD).
    • AppCensus Inc.
    This work has been supported by IMDEA Networks Institute. PhD Program in Telematic Engineering, Universidad Carlos III de Madrid. Committee chair: Srdjan Matic. Secretary: Guillermo Suárez-Tangil. Member: Ben Stoc
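    The permission cross-check at the heart of the second case study can be sketched as follows (a hypothetical simplification; the data-type-to-permission mapping and the app data below are invented for illustration):

```python
# Sketch of the static/dynamic cross-check: flag an app when traffic
# observed at runtime contains a data type whose guarding permission was
# never granted. Apps flagged here would then be analyzed statically for
# covert or side channels.
PERMISSION_FOR = {
    "location": "ACCESS_FINE_LOCATION",
    "imei": "READ_PHONE_STATE",
}

def unauthorized_accesses(observed_data, granted_permissions):
    # keep the data types sent at runtime without the matching permission
    return [d for d in observed_data
            if PERMISSION_FOR.get(d) not in granted_permissions]

leaks = unauthorized_accesses(
    observed_data=["location", "imei"],       # seen in runtime traffic
    granted_permissions={"READ_PHONE_STATE"}, # from the app's grant state
)
# "location" is flagged: it was sent, but ACCESS_FINE_LOCATION was not granted
```

    In the thesis this comparison is done over real instrumented-device traffic and the full Android permission model; the sketch only shows the shape of the check.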

    Man vs machine – Detecting deception in online reviews

    This study focused on three main research objectives: analyzing the methods used to identify deceptive online consumer reviews, evaluating insights provided by multi-method automated approaches based on individual and aggregated review data, and formulating a review interpretation framework for identifying deception. The theoretical framework is based on two critical deception-related models, information manipulation theory and self-presentation theory. The findings confirm the interchangeable characteristics of the various automated text analysis methods in drawing insights about review characteristics and underline their significant complementary aspects. An integrative multi-method model that approaches the data at the individual and aggregate level provides more complex insights regarding the quantity and quality of review information, sentiment, cues about its relevance and contextual information, perceptual aspects, and cognitive material