6 research outputs found

Model for optimising the execution of anti-spam filters

Author: Ruano Ordas David Alfonso
Publication venue: Programa Oficial de Doutoramento en Sistemas Software Intelixentes e Adaptables (RD 1393/2007)
Publication date: 01/01/2015

The first interconnection between two remote hosts, established in 1969, marked the beginning of one of humanity's most important technological phenomena: the Internet. The Internet has become an essential part of life for many people in the most industrialized nations, and by 2014 its penetration had reached 40% of the world population. One reason for this massive growth is e-mail, a service that allows easy and nearly instantaneous communication between users through messages and that has consequently become remarkably popular. However, the uncontrolled nature of the Internet has made e-mail the preferred vehicle for promoting illegal advertisements (such as those for drug sales), delivering phishing e-mails, propagating viruses and carrying out other forms of electronic scam (collectively called spam). Although the volume of spam deliveries fluctuates continuously, current statistics show that more than 60% of the e-mail transferred over the Internet is spam. This ratio is sustained by recent communication advances such as 4G networks, which ensure quick and easy Internet access almost everywhere. Under these circumstances, spam filtering services and products are the most effective mechanism for fighting spam. However, the massive number of e-mail deliveries per day (an average of 125 billion in 2015) has created the need to improve spam filtering services so that they meet current demands. This research work introduces a new filtering model able to enhance speed and accuracy while maintaining the same philosophy and anti-spam techniques used in the most popular anti-spam filtering systems. This goal has been achieved by improving several aspects: (i) the design and development of small technical improvements to enhance overall filter throughput, (ii) the application of genetic algorithms to enhance filter accuracy and, finally, (iii) the use of scheduling algorithms to increase filtering speed.

Over the last decade, the Internet has become an essential tool for communication between people. Its advantages were quickly exploited by millions of users to make services such as e-commerce, online banking and social networks a reality. The same environment was also exploited by those who wished to use the new technologies to market illegal or disreputable products, or to publish or send content that annoys network users. Thus spammers and spam content appeared, and spam now spreads across social networks, e-mail, forums, blogs, etc. Filtering and removing spam content requires software or services able to detect it. Today, spam removal is distributed as a service. It is now common and effective to contract anti-spam filtering services consisting of specific filtering software or hardware plus a service that updates the filter's behaviour, allowing it to adapt to variations in the e-mails being distributed. These filtering services are currently based on SpamAssassin, software whose characteristics allow filter behaviour to be modelled dynamically and the resulting filters to be distributed to the filtering software installed at client sites. The ability to model content filters was undoubtedly SpamAssassin's most valued feature, and it led to the solution being adopted even by large companies such as Symantec (Symantec Brightmail) and McAfee (McAfee SpamKiller).

Xunta de Galicia | Ref. 08TIC041E
Xunta de Galicia | Ref. 09TIC028

Investigo

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas
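
The thesis abstract above mentions rule-based filters in the style of SpamAssassin and the use of scheduling to speed up filtering. As a rough illustration only, the following Python sketch assumes a filter made of scored rules and a fixed spam threshold, runs the cheapest rules first, and stops once the remaining rules can no longer change the verdict. The rule names, scores, costs and the cheapest-first ordering are hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    score: float                  # positive scores push the message towards spam
    cost: float                   # hypothetical relative execution cost
    test: Callable[[str], bool]   # returns True when the rule fires


def classify(message: str, rules: List[Rule], threshold: float) -> Tuple[bool, List[str]]:
    """Return (is_spam, names of the rules that were actually executed)."""
    ordered = sorted(rules, key=lambda r: r.cost)  # cheapest rules first
    total = 0.0
    executed: List[str] = []
    for i, rule in enumerate(ordered):
        remaining = ordered[i:]
        max_gain = sum(r.score for r in remaining if r.score > 0)
        max_loss = sum(r.score for r in remaining if r.score < 0)
        if total + max_loss >= threshold:
            break  # verdict is spam whatever the remaining rules say
        if total + max_gain < threshold:
            break  # the threshold can no longer be reached
        executed.append(rule.name)
        if rule.test(message):
            total += rule.score
    return total >= threshold, executed


# Hypothetical rule set, for illustration only.
RULES = [
    Rule("SLOW_REMOTE_CHECK", 1.0, 10.0, lambda m: True),
    Rule("ALL_CAPS", 2.5, 2.0, lambda m: m.isupper()),
    Rule("CONTAINS_VIAGRA", 3.0, 1.0, lambda m: "viagra" in m.lower()),
]

if __name__ == "__main__":
    print(classify("BUY VIAGRA NOW", RULES, threshold=5.0))
    print(classify("hello, see you at lunch", RULES, threshold=5.0))
```

In both sample calls the expensive rule is never executed, which is the kind of saving an execution schedule aims for; a real filter would need measured costs and a more careful ordering.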

Wirebrush4SPAM Trend Filters

Author: David Ruano-Ordas (698733)

Contains two filter definitions (a positive and a negative trend filter) for the Wirebrush4SPAM filtering platform.

FigShare

Corpus 200 Emails

Author: David Ruano-Ordas (698733)

Corpus of 200 multilingual emails (Spanish, English and Portuguese), evenly split between ham and spam (100 each).

FigShare

Corpus 200 Emails

Author: David Ruano-Ordas (698733)

Corpus containing 200 multilingual emails (Spanish, English and Portuguese) structured according to the RFC 2822 specification.

FigShare
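
Messages structured according to RFC 2822, as in the corpus above, can be read with the `email` package from the Python standard library. The sketch below is only an illustration: the sample message and the `summarise` helper are made up for this example and are not part of the corpus.

```python
from email import message_from_string

# Made-up message in RFC 2822 layout: header lines, an empty line, then the body.
RAW = (
    "From: alice@example.com\r\n"
    "To: bob@example.com\r\n"
    "Subject: Lunch on Friday\r\n"
    "\r\n"
    "Are we still on for lunch?\r\n"
)


def summarise(raw: str) -> dict:
    """Extract a few header fields and the body of a single-part message."""
    msg = message_from_string(raw)
    return {
        "from": msg["From"],
        "to": msg["To"],
        "subject": msg["Subject"],
        "body": msg.get_payload().strip(),  # a plain string for non-multipart mail
    }


if __name__ == "__main__":
    print(summarise(RAW))
```

Multipart messages would need `msg.walk()` to visit each part; this sketch assumes a single text part.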

A new semantic-based feature selection method for spam filtering

Author: Cotos Yáñez Tomas Raimundo
Mendez Reboredo Jose Ramon
Ruano Ordas David Alfonso
Publication venue: 'Elsevier BV'
Publication date: 15/01/2019

The Internet has emerged as a powerful infrastructure for worldwide communication and interaction between people. Some unethical uses of this technology (for instance spam or viruses) have created challenges in developing mechanisms that guarantee an affordable and secure user experience. This study deals with the massive delivery of unwanted content or advertising campaigns without the consent of the target users (also known as spam). Currently, words (tokens) are selected using feature selection schemes and then used to create feature vectors for training different Machine Learning (ML) approaches. This study introduces a new feature selection method that takes advantage of a semantic ontology to group words into topics and uses those topics to build feature vectors. To this end, we have compared the performance of nine well-known ML approaches in conjunction with (i) Information Gain, the most popular feature selection method in the spam-filtering domain, (ii) Latent Dirichlet Allocation, a generative statistical model that allows sets of observations to be explained by unobserved groups that describe why some parts of the data are similar, and (iii) our semantic-based feature selection proposal. Results have shown the suitability and additional benefits of topic-driven methods for developing and deploying high-performance spam filters.

Xunta de Galicia | Ref. ED481B 2017/018
Xunta de Galicia | Ref. ED431C2016-040
Agencia Estatal de Investigación | Ref. MTM2017-89422-P
SMEIC/SRA/ERDF | Ref. TIN2017-84658-C2-1-

Investigo
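
The abstract above names Information Gain as the baseline feature selection method. As a minimal sketch of the textbook definition (the entropy of the class labels minus the entropy that remains once the presence of a token is known), the Python code below scores tokens over a tiny labelled set. The four example documents are invented, and the code does not reproduce the paper's experimental setup.

```python
import math
from collections import Counter
from typing import List, Set, Tuple

Doc = Tuple[Set[str], str]  # (tokens in the message, class label)


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs: List[Doc], token: str) -> float:
    """Reduction in label entropy obtained by knowing whether `token` occurs."""
    labels = [label for _, label in docs]
    present = [label for tokens, label in docs if token in tokens]
    absent = [label for tokens, label in docs if token not in tokens]
    conditional = sum(
        (len(part) / len(docs)) * entropy(part)
        for part in (present, absent)
        if part  # skip an empty side of the split
    )
    return entropy(labels) - conditional


# Invented toy data: two spam and two ham messages.
DOCS: List[Doc] = [
    ({"free", "money"}, "spam"),
    ({"free", "offer"}, "spam"),
    ({"meeting", "money"}, "ham"),
    ({"lunch"}, "ham"),
]

if __name__ == "__main__":
    for tok in ("free", "money"):
        print(tok, information_gain(DOCS, tok))
```

In this toy set "free" separates the two classes perfectly while "money" carries no information, so a selector that keeps the highest-scoring tokens would keep "free" and discard "money".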

Using Natural Language Preprocessing Architecture (NLPA) for Big Data Text Sources

Author: David Ruano-Ordas
Jose R. Méndez
María Novo-Lourés
Reyes Pavón
Rosalía Laza
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2020

In recent years, big data analysis has become a popular means of exploiting multiple (initially valueless) sources to find relevant knowledge about real domains. However, many big data sources provide unstructured textual data, and analysing them properly requires tools that adequately combine big data and text-analysis techniques. With this in mind, we combined a pipelining framework, BDP4J (Big Data Pipelining For Java), with implementations of a set of text preprocessing techniques to create NLPA (Natural Language Preprocessing Architecture), an extendable open-source plugin whose preprocessing steps can be easily combined into a pipeline. Additionally, NLPA can generate datasets using either a classical token-based representation or newer synset-based representations that can be further processed using semantic information (i.e., using ontologies). This work presents a case study of NLPA operation covering the transformation of raw heterogeneous big data into different dataset representations (synsets and tokens) and the use of the Weka application programming interface (API) to launch two well-known classifiers.

Directory of Open Access Journals
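
The abstract above describes preprocessing steps chained into a pipeline and two output representations, tokens and synsets. The Python sketch below illustrates that general idea only; it does not use BDP4J or NLPA (which are Java software), and every function name as well as the small token-to-synset table is hypothetical.

```python
import re
from collections import Counter
from typing import Callable, Dict, Iterable

Step = Callable[[str], str]  # a preprocessing step maps text to text


def strip_html(text: str) -> str:
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def lowercase(text: str) -> str:
    return text.lower()


def run_pipeline(text: str, steps: Iterable[Step]) -> str:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        text = step(text)
    return text


def to_tokens(text: str) -> Counter:
    """Token-based representation: count of each alphabetic word."""
    return Counter(re.findall(r"[a-z]+", text))


# Hypothetical lookup table; a real system would query an ontology instead.
SYNSETS: Dict[str, str] = {
    "car": "syn:vehicle",
    "automobile": "syn:vehicle",
    "cheap": "syn:inexpensive",
}


def to_synsets(tokens: Counter) -> Counter:
    """Synset-based representation: tokens with the same meaning are merged."""
    merged: Counter = Counter()
    for token, count in tokens.items():
        if token in SYNSETS:
            merged[SYNSETS[token]] += count
    return merged


if __name__ == "__main__":
    clean = run_pipeline("<p>Cheap CAR, cheap automobile!</p>", [strip_html, lowercase])
    tokens = to_tokens(clean)
    print(tokens)
    print(to_synsets(tokens))
```

Keeping each step as a plain text-to-text function is what makes the steps easy to reorder or swap; the synset view collapses "car" and "automobile" into one feature, which is the motivation for a semantic representation.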