3 research outputs found

    Short Messages Spam Filtering Combining Personality Recognition and Sentiment Analysis

    Get PDF
    Currently, short communication channels are growing up due to the huge increase in the number of smartphones and online social networks users. This growth attracts malicious campaigns, such as spam campaigns, that are a direct threat to the security and privacy of the users. While most researches are focused on automatic text classification, in this work we demonstrate the possibility of improving current short messages spam detection systems using a novel method. We combine personality recognition and sentiment analysis techniques to analyze Short Message Services (SMS) texts. We enrich a publicly available dataset adding these features, first separately and after in combination, of each message to the dataset, creating new datasets. We apply several combinations of the best SMS spam classifiers and filters to each dataset in order to compare the results of each one. Taking into account the experimental results we analyze the real inuence of each feature and the combination of both. At the end, the best results are improved in terms of accuracy, reaching to a 99.01% and the number of false positive is reduced

    SDRS: a new lossless dimensionality reduction for text corpora

    Get PDF
    In recent years, most content-based spam filters have been implemented using Machine Learning (ML) approaches by means of token-based representations of textual contents. After introducing multiple performance enhancements, the impact has been virtually irrelevant. Recent studies have introduced synset-based content representations as a reliable way to improve classification, as well as different forms to take advantage of semantic information to address problems, such as dimensionality reduction. These preliminary solutions present some limitations and enforce simplifications that must be gradually redefined in order to obtain significant improvements in spam content filtering. This study addresses the problem of feature reduction by introducing a new semantic-based proposal (SDRS) that avoids losing knowledge (lossless). Synset-features can be semantically grouped by taking advantage of taxonomic relations (mainly hypernyms) provided by BabelNet ontological dictionary (e.g. “Viagra” and “Cialis” can be summarized into the single features “anti-impotence drug”, “drug” or “chemical substance” depending on the generalization of 1, 2 or 3 levels). In order to decide how many levels should be used to generalize each synset of a dataset, our proposal takes advantage of Multi-Objective Evolutionary Algorithms (MOEA) and particularly, of the Non-dominated Sorting Genetic Algorithm (NSGA-II). We have compared the performance achieved by a Naïve Bayes classifier, using both token-based and synset-based dataset representations, with and without executing dimensional reductions. As a result, our lossless semantic reduction strategy was able to find optimal semantic-based feature grouping strategies for the input texts, leading to a better performance of Naïve Bayes classifiers.info:eu-repo/semantics/acceptedVersio

    Relationship Between Personality Patterns and Harmfulness : Analysis and Prediction Based on Sentence Embedding

    Get PDF
    This paper hypothesizes that harmful utterances need to be judged in the context of whole sentences, and the authors extract features of harmful expressions using a general-purpose language model. Based on the extracted features, the authors propose a method to predict the presence or absence of harmful categories. In addition, the authors believe that it is possible to analyze users who incite others by combining this method with research on analyzing the personality of the speaker from statements on social networking sites. The results confirmed that the proposed method can judge the possibility of harmful comments with higher accuracy than simple dictionary-based models or models using a distributed representation of words. The relationship between personality patterns and harmful expressions was also confirmed by an analysis based on a harmful judgment model
    corecore