1,623 research outputs found
Let Your CyberAlter Ego Share Information and Manage Spam
Almost all of us have multiple cyberspace identities, and these {\em
cyber}alter egos are networked together to form a vast cyberspace social
network. This network is distinct from the world-wide-web (WWW), which is being
queried and mined to the tune of billions of dollars everyday, and until
recently, has gone largely unexplored. Empirically, the cyberspace social
networks have been found to possess many of the same complex features that
characterize its real counterparts, including scale-free degree distributions,
low diameter, and extensive connectivity. We show that these topological
features make the latent networks particularly suitable for explorations and
management via local-only messaging protocols. {\em Cyber}alter egos can
communicate via their direct links (i.e., using only their own address books)
and set up a highly decentralized and scalable message passing network that can
allow large-scale sharing of information and data. As one particular example of
such collaborative systems, we provide a design of a spam filtering system, and
our large-scale simulations show that the system achieves a spam detection rate
close to 100%, while the false positive rate is kept around zero. This system
has several advantages over other recent proposals (i) It uses an already
existing network, created by the same social dynamics that govern our daily
lives, and no dedicated peer-to-peer (P2P) systems or centralized server-based
systems need be constructed; (ii) It utilizes a percolation search algorithm
that makes the query-generated traffic scalable; (iii) The network has a built
in trust system (just as in social networks) that can be used to thwart
malicious attacks; iv) It can be implemented right now as a plugin to popular
email programs, such as MS Outlook, Eudora, and Sendmail.Comment: 13 pages, 10 figure
Active Multi-Field Learning for Spam Filtering
Ubiquitous spam messages cause a serious waste of time and resources. This paper addresses the practical spam filtering problem, and proposes a universal approach to fight with various spam messages. The proposed active multi-field learning approach is based on: 1) It is cost-sensitive to obtain a label for a real-world spam filter, which suggests an active learning idea; and 2) Different messages often have a similar multi-field text structure, which suggests a multi-field learning idea. The multi-field learning framework combines multiple results predicted from field classifiers by a novel compound weight, and each field classifier calculates the arithmetical average of multiple conditional probabilities predicted from feature strings according to a data structure of string-frequency index. Comparing the current variance of field classifying results with the historical variance, the active learner evaluates the classifying confidence and regards the more uncertain message as the more informative sample for which to request a label. The experimental results show that the proposed approach can achieve the state-of-the-art performance at greatly reduced label requirements both in email spam filtering and short text spam filtering. Our active multi-field learning performance, the standard (1-ROCA) % measurement, even exceeds the full feedback performance of some advanced individual classifying algorithm
Hikester - the event management application
Today social networks and services are one of the most important part of our
everyday life. Most of the daily activities, such as communicating with
friends, reading news or dating is usually done using social networks. However,
there are activities for which social networks do not yet provide adequate
support. This paper focuses on event management and introduces "Hikester". The
main objective of this service is to provide users with the possibility to
create any event they desire and to invite other users. "Hikester" supports the
creation and management of events like attendance of football matches, quest
rooms, shared train rides or visit of museums in foreign countries. Here we
discuss the project architecture as well as the detailed implementation of the
system components: the recommender system, the spam recognition service and the
parameters optimizer
Spam-T5: Benchmarking Large Language Models for Few-Shot Email Spam Detection
This paper investigates the effectiveness of large language models (LLMs) in
email spam detection by comparing prominent models from three distinct
families: BERT-like, Sentence Transformers, and Seq2Seq. Additionally, we
examine well-established machine learning techniques for spam detection, such
as Na\"ive Bayes and LightGBM, as baseline methods. We assess the performance
of these models across four public datasets, utilizing different numbers of
training samples (full training set and few-shot settings). Our findings reveal
that, in the majority of cases, LLMs surpass the performance of the popular
baseline techniques, particularly in few-shot scenarios. This adaptability
renders LLMs uniquely suited to spam detection tasks, where labeled samples are
limited in number and models require frequent updates. Additionally, we
introduce Spam-T5, a Flan-T5 model that has been specifically adapted and
fine-tuned for the purpose of detecting email spam. Our results demonstrate
that Spam-T5 surpasses baseline models and other LLMs in the majority of
scenarios, particularly when there are a limited number of training samples
available. Our code is publicly available at
https://github.com/jpmorganchase/emailspamdetection
A systematic literature review on spam content detection and classification
The presence of spam content in social media is tremendously increasing, and therefore the detection of spam has become vital. The spam contents increase as people extensively use social media, i.e ., Facebook, Twitter, YouTube, and E-mail. The time spent by people using social media is overgrowing, especially in the time of the pandemic. Users get a lot of text messages through social media, and they cannot recognize the spam content in these messages. Spam messages contain malicious links, apps, fake accounts, fake news, reviews, rumors, etc. To improve social media security, the detection and control of spam text are essential. This paper presents a detailed survey on the latest developments in spam text detection and classification in social media. The various techniques involved in spam detection and classification involving Machine Learning, Deep Learning, and text-based approaches are discussed in this paper. We also present the challenges encountered in the identification of spam with its control mechanisms and datasets used in existing works involving spam detection
Automatic generation of meta classifiers with large levels for distributed computing and networking
This paper is devoted to a case study of a new construction of classifiers. These classifiers are called automatically generated multi-level meta classifiers, AGMLMC. The construction combines diverse meta classifiers in a new way to create a unified system. This original construction can be generated automatically producing classifiers with large levels. Different meta classifiers are incorporated as low-level integral parts of another meta classifier at the top level. It is intended for the distributed computing and networking. The AGMLMC classifiers are unified classifiers with many parts that can operate in parallel. This make it easy to adopt them in distributed applications. This paper introduces new construction of classifiers and undertakes an experimental study of their performance. We look at a case study of their effectiveness in the special case of the detection and filtering of phishing emails. This is a possible important application area for such large and distributed classification systems. Our experiments investigate the effectiveness of combining diverse meta classifiers into one AGMLMC classifier in the case study of detection and filtering of phishing emails. The results show that new classifiers with large levels achieved better performance compared to the base classifiers and simple meta classifiers classifiers. This demonstrates that the new technique can be applied to increase the performance if diverse meta classifiers are included in the system
Stability and Effective Process Control for Secure Email Filtering
A fantastic tool for both commercial and personal communication is electronic mail. It has increasingly become a necessary component of our working life since it is straightforward, available, and simple to use. Spam emails have started to tarnish internet experiences and threaten the integrity of email. Due to the exponential growth of spam, both people and organisations are under a great deal of financial and other strain. In order to prevent the future of email itself from being in jeopardy, a solution to the spam problem must be discovered. There is an urgent need to solve the Email spam issue since spam volume has been rising over the last several decades. As part of this effort, many effects of spam emails on businesses and people were noted and thoroughly examined. In order to properly assess current technologies, solutions, and methods, a comprehensive literature review was conducted throughout the procedures. The goals of this work is to develop new methodologies for the implementation of new strategies for the efficient management of email spam and to construct a proof-of-concept software system for the Process controlled assessment of such strategies
An Improved Transformer-based Model for Detecting Phishing, Spam, and Ham: A Large Language Model Approach
Phishing and spam detection is long standing challenge that has been the
subject of much academic research. Large Language Models (LLM) have vast
potential to transform society and provide new and innovative approaches to
solve well-established challenges. Phishing and spam have caused financial
hardships and lost time and resources to email users all over the world and
frequently serve as an entry point for ransomware threat actors. While
detection approaches exist, especially heuristic-based approaches, LLMs offer
the potential to venture into a new unexplored area for understanding and
solving this challenge. LLMs have rapidly altered the landscape from business,
consumers, and throughout academia and demonstrate transformational potential
for the potential of society. Based on this, applying these new and innovative
approaches to email detection is a rational next step in academic research. In
this work, we present IPSDM, our model based on fine-tuning the BERT family of
models to specifically detect phishing and spam email. We demonstrate our
fine-tuned version, IPSDM, is able to better classify emails in both unbalanced
and balanced datasets. This work serves as an important first step towards
employing LLMs to improve the security of our information systems
Evaluation of Email Spam Detection Techniques
Email has become a vital form of communication among individuals and organizations in today’s world. However, simultaneously it became a threat to many users in the form of spam emails which are also referred as junk/unsolicited emails. Most of the spam emails received by the users are in the form of commercial advertising, which usually carry computer viruses without any notifications. Today, 95% of the email messages across the world are believed to be spam, therefore it is essential to develop spam detection techniques. There are different techniques to detect and filter the spam emails, but off recently all the developed techniques are being implemented successfully to minimize the threats. This paper describes how the current spam email detection approaches are determining and evaluating the problems. There are different types of techniques developed based on Reputation, Origin, Words, Multimedia, Textual, Community, Rules, Hybrid, Machine learning, Fingerprint, Social networks, Protocols, Traffic analysis, OCR techniques, Low-level features, and many other techniques. All these filtering techniques are developed to detect and evaluate spam emails. Along with classification of the email messages into spam or ham, this paper also demonstrates the effectiveness and accuracy of the spam detection techniques
- …