
    BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology

    This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to its detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForever software, the discussion has been extended to include observations related to the historical, social and practical value of spam, and proposals of other ways of dealing with spam within the repository without necessarily removing it. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam, focusing on work set in the weblog context. It concludes with a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software.

    Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection

    This paper investigates the effectiveness of large language models (LLMs) in email spam detection by comparing prominent models from three distinct families: BERT-like, Sentence Transformers, and Seq2Seq. Additionally, we examine well-established machine learning techniques for spam detection, such as Naïve Bayes and LightGBM, as baseline methods. We assess the performance of these models across four public datasets, utilizing different numbers of training samples (full training set and few-shot settings). Our findings reveal that, in the majority of cases, LLMs surpass the performance of the popular baseline techniques, particularly in few-shot scenarios. This adaptability renders LLMs uniquely suited to spam detection tasks, where labeled samples are limited in number and models require frequent updates. Additionally, we introduce Spam-T5, a Flan-T5 model that has been specifically adapted and fine-tuned for the purpose of detecting email spam. Our results demonstrate that Spam-T5 surpasses baseline models and other LLMs in the majority of scenarios, particularly when there are a limited number of training samples available. Our code is publicly available at https://github.com/jpmorganchase/emailspamdetection

    Deep learning to filter SMS spam

    The popularity of short message service (SMS) has been growing over the last decade. For businesses, these text messages are more effective than even emails. This is because while 98% of mobile users read their SMS by the end of the day, about 80% of emails remain unopened. The popularity of SMS has also given rise to SMS Spam, which refers to any irrelevant text messages delivered using mobile networks. They are severely annoying to users. Most existing research that has attempted to filter SMS Spam has relied on manually identified features. Extending the current literature, this paper uses deep learning to classify Spam and Not-Spam text messages. Specifically, Convolutional Neural Network and Long Short-Term Memory models were employed. The proposed models were based on text data only, and self-extracted the feature set. On a benchmark dataset consisting of 747 Spam and 4,827 Not-Spam text messages, a remarkable accuracy of 99.44% was achieved.

    A Review on mobile SMS Spam filtering techniques

    Short messaging service (SMS) spam refers to unsolicited or undesired messages received on mobile phones. SMS spam constitutes a veritable nuisance to mobile subscribers. This marketing practice also worries service providers, because it upsets their clients or even causes them to lose subscribers. To mitigate this practice, researchers have proposed several solutions for the detection and filtering of SMS spam. In this paper, we present a review of the currently available methods, challenges, and future research directions on the detection, filtering, and mitigation of mobile SMS spam. The existing research literature is critically reviewed and analyzed. The most popular techniques for SMS spam detection, filtering, and mitigation are compared, including the data sets used, their findings, and limitations, and future research directions are discussed. This review is designed to assist expert researchers to identify open areas that need further improvement.

    Artificial intelligence in the cyber domain: Offense and defense

    Artificial intelligence techniques have grown rapidly in recent years, and their applications in practice can be seen in many fields, ranging from facial recognition to image analysis. In the cybersecurity domain, AI-based techniques can provide better cyber defense tools and help adversaries improve methods of attack. However, malicious actors are aware of the new prospects too and will probably attempt to use them for nefarious purposes. This survey paper aims at providing an overview of how artificial intelligence can be used in the context of cybersecurity in both offense and defense. Web of Science 123 art. no. 41

    Investigating and Validating Scam Triggers: A Case Study of a Craigslist Website

    The internet and digital infrastructure play an important role in our day-to-day lives, and they also have a huge impact on organizations and on how we do business transactions every day. Online business is booming in the 21st century, and there are many online platforms that enable sellers and buyers to do online transactions collectively. People can sell and purchase products, including vehicles, clothes, and shoes, from anywhere and at any time. Thus, the purpose of this study is to identify and validate scam triggers using Craigslist as a case study. Craigslist is one of the websites where people can post advertisements to sell and buy personal belongings online. However, with the growing number of people buying and selling, new threats and scams are created daily. Private cars are among the most significant items sold and purchased over the Craigslist website. In this regard, several scammers have been drawn by the large number of vehicles being traded over Craigslist. Scammers also use this forum to cheat others and exploit the vulnerable. The study identified online scam triggers, including bad keywords, dealers' posts as owners, personal email, multiple locations, rogue pictures, and voice over IP, to detect online scams that exist on Craigslist. The study also found over 360 ads from Craigslist based on our scam triggers. Finally, the study validated each of the scam triggers and found that 53.31% of our data is likely to be a scam.

    Leveraging Sociological Models for Predictive Analytics

    There is considerable interest in developing techniques for predicting human behavior, for instance to enable emerging contentious situations to be forecast or the nature of ongoing but “hidden” activities to be inferred. A promising approach to this problem is to identify and collect appropriate empirical data and then apply machine learning methods to these data to generate the predictions. This paper shows the performance of such learning algorithms often can be improved substantially by leveraging sociological models in their development and implementation. In particular, we demonstrate that sociologically-grounded learning algorithms outperform gold-standard methods in three important and challenging tasks: 1) inferring the (unobserved) nature of relationships in adversarial social networks, 2) predicting whether nascent social diffusion events will “go viral”, and 3) anticipating and defending future actions of opponents in adversarial settings. Significantly, the new algorithms perform well even when there is limited data available for their training and execution. Keywords: predictive analysis, sociological models, social networks, empirical analysis, machine learning.