39 research outputs found
Cyber-Physical Security with RF Fingerprint Classification through Distance Measure Extensions of Generalized Relevance Learning Vector Quantization
Radio frequency (RF) fingerprinting extracts fingerprint features from RF signals to protect against masquerade attacks by enabling reliable authentication of communication devices at the “serial number” level. Machine learning (ML) algorithms, which find meaningful statistical differences between measured data, facilitate this authentication. The Generalized Relevance Learning Vector Quantization-Improved (GRLVQI) classifier is one ML algorithm which has shown efficacy for RF fingerprinting device discrimination. GRLVQI extends the Learning Vector Quantization (LVQ) family of “winner take all” classifiers that develop prototype vectors (PVs) which represent data. In LVQ algorithms, distances are computed between exemplars and PVs, and PVs are iteratively moved to accurately represent the data. GRLVQI extends LVQ with a sigmoidal cost function, relevance learning, and PV update logic improvements. However, both LVQ and GRLVQI are limited by their reliance on squared Euclidean distance measures and by an algorithm structure that becomes seemingly complex if the underlying distance measure is changed. Herein, the authors (1) develop GRLVQI-D (distance), an extension of GRLVQI to consider alternative distance measures and (2) present the Cosine GRLVQI classifier using this framework. To evaluate this framework, the authors consider experimentally collected Z-Wave RF signals and develop RF fingerprints to identify devices. Z-Wave devices are low-cost, low-power communication technologies seen increasingly in critical infrastructure. Performance comparisons are made for both classification and verification (claimed identity) with the new Cosine GRLVQI algorithm. The results show more robust performance with the Cosine GRLVQI algorithm than with four algorithms from the literature. Additionally, the methodology used to create Cosine GRLVQI is generalizable to alternative measures.
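The role of the distance measure in a “winner take all” prototype classifier can be illustrated with a minimal sketch. This is plain LVQ1 with a pluggable distance function, not the authors' GRLVQI or GRLVQI-D: it has no relevance learning and no sigmoidal cost, and the update step is the simple attraction/repulsion rule rather than a gradient of the cosine measure. The function names, the learning rate, and the toy vectors are all assumptions made for illustration.

```python
import math

def cosine_distance(a, b):
    """Cosine dissimilarity 1 - cos(a, b); assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def squared_euclidean(a, b):
    """Squared Euclidean distance, the measure classic LVQ relies on."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_prototype(x, prototypes, distance):
    """Index of the winning prototype vector (PV) under the given distance."""
    return min(range(len(prototypes)), key=lambda i: distance(x, prototypes[i][0]))

def lvq1_step(x, label, prototypes, distance, lr=0.1):
    """One LVQ1 update: move the winner toward x if labels match, else away."""
    i = nearest_prototype(x, prototypes, distance)
    vec, proto_label = prototypes[i]
    sign = 1.0 if proto_label == label else -1.0
    updated = [p + sign * lr * (xi - p) for p, xi in zip(vec, x)]
    prototypes[i] = (updated, proto_label)
    return i

# Toy data (hypothetical): two PVs with very different magnitudes.
prototypes = [([10.0, 0.0], "A"), ([0.0, 1.0], "B")]
x = [1.0, 0.2]
winner_cosine = nearest_prototype(x, prototypes, cosine_distance)       # PV "A"
winner_euclid = nearest_prototype(x, prototypes, squared_euclidean)     # PV "B"
```

In this toy case the exemplar points in nearly the same direction as prototype A but lies much closer to prototype B in magnitude, so the two measures pick different winners. That is the kind of behavioral difference a change of distance measure introduces.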
Voting-based Classification for E-mail Spam Detection
The problem of spam e-mail has gained a tremendous amount of attention. Although entities tend to use e-mail spam filter applications to filter out received spam e-mails, marketing companies still tend to send unsolicited e-mails in bulk and users still receive a reasonable amount of spam e-mail despite those filtering applications. This work proposes a new method for classifying e-mails into spam and non-spam. First, several e-mail content features are extracted and then those features are used for classifying each e-mail individually. The classification results of three different classifiers (i.e. Decision Trees, Random Forests and k-Nearest Neighbor) are combined in various voting schemes (i.e. majority vote, average probability, product of probabilities, minimum probability and maximum probability) for making the final decision. To validate our method, two different spam e-mail collections were used
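The voting schemes named in the abstract can be sketched as follows. This is an illustrative reading, not the authors' implementation: it assumes each classifier outputs a spam probability for the e-mail, that the non-spam probability is its complement, and that the probability-based rules pick the class with the larger combined score. The function name `combine` and the example probabilities are hypothetical.

```python
from math import prod

def combine(spam_probs, scheme):
    """Combine per-classifier spam probabilities into one decision.

    spam_probs: one spam probability per classifier (assumed input format).
    scheme: 'majority', 'average', 'product', 'minimum', or 'maximum'.
    """
    if scheme == "majority":
        # Each classifier casts a hard vote; spam needs more than half.
        spam_votes = sum(1 for p in spam_probs if p > 0.5)
        return "spam" if spam_votes > len(spam_probs) / 2 else "non-spam"

    rules = {
        "average": lambda ps: sum(ps) / len(ps),
        "product": prod,
        "minimum": min,
        "maximum": max,
    }
    rule = rules[scheme]
    ham_probs = [1.0 - p for p in spam_probs]
    # Apply the same rule to both classes and keep the higher score.
    return "spam" if rule(spam_probs) > rule(ham_probs) else "non-spam"

# Hypothetical outputs of three classifiers for one e-mail.
probs = [0.6, 0.55, 0.05]
decisions = {s: combine(probs, s)
             for s in ("majority", "average", "product", "minimum", "maximum")}
```

With these example values, two weakly confident classifiers outvote one strongly confident classifier under majority voting, while the probability-based rules side with the confident one, which shows why the choice of scheme can change the final decision.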
Holistic Network Defense: Fusing Host and Network Features for Attack Classification
This work presents a hybrid network-host monitoring strategy, which fuses data from both the network and the host to recognize malware infections. This work focuses on three categories: Normal, Scanning, and Infected. The network-host sensor fusion is accomplished by extracting 248 features from network traffic using the Fullstats Network Feature generator and from the host using text mining, looking at the frequency of the 500 most common strings and analyzing them as word vectors. Improvements to detection performance are made by synergistically fusing network features obtained from IP packet flows and host features, obtained from text mining port, processor, logon information among others. In addition, the work compares three different machine learning algorithms and updates the script required to obtain network features. Hybrid method results outperformed host only classification by 31.7% and network only classification by 25%. The new approach also reduces the number of alerts while remaining accurate compared with the commercial IDS SNORT. These results make it such that even the most typical users could understand alert classification messages
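One simple way to picture the fusion step is to build a string-frequency vector from host text and append it to the network feature vector. The abstract does not say how the two feature sets are combined, so concatenation is an assumption here, as are the function names, the whitespace tokenization, and the toy log lines. The work itself uses 248 network features and the 500 most common strings; the sketch uses tiny sizes.

```python
from collections import Counter

def top_strings(host_logs, n):
    """The n most frequent whitespace-separated strings across all host logs."""
    counts = Counter(token for log in host_logs for token in log.split())
    return [string for string, _ in counts.most_common(n)]

def host_vector(log, vocabulary):
    """Frequency of each vocabulary string in a single host log."""
    counts = Counter(log.split())
    return [counts[string] for string in vocabulary]

def fuse(network_features, host_features):
    """Assumed fusion: concatenate network and host features into one vector."""
    return list(network_features) + list(host_features)

# Hypothetical host text (port, logon, and processor information).
logs = [
    "logon admin port 445",
    "logon guest port 80",
    "logon admin cpu 97",
]
vocabulary = top_strings(logs, 3)
fused = fuse([0.2, 13.0], host_vector(logs[0], vocabulary))
```

The fused vector can then be passed to any classifier that separates the Normal, Scanning, and Infected categories.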
Learning to Filter Text in Forum Malay Message using Naive Bayesian Technique
Applying the basic filtering technique in a forum application has been discussed in [1]. That paper explains the use of the basic naive Bayesian algorithm to classify forum messages as clean or bad, where a clean message has no bad words and a bad message contains at least one bad word. This Final Year Project paper discusses the application of the algorithm to filtering forum messages, in an attempt to apply learning to filter forum messages.
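A naive Bayesian message filter of the kind described can be sketched in a few lines. This is a generic multinomial naive Bayes with add-one smoothing, not the project's actual code. The training messages are invented English placeholders (the project works on Malay forum messages), and the sketch assumes both classes appear in the training data.

```python
import math
from collections import Counter

def train(messages):
    """messages: list of (text, label) pairs with label 'clean' or 'bad'."""
    word_counts = {"clean": Counter(), "bad": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["clean"]) | set(word_counts["bad"])
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log posterior."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # class prior
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

# Invented training data for illustration only.
model = train([
    ("thanks for the help", "clean"),
    ("see you at the meeting", "clean"),
    ("you are a stupid idiot", "bad"),
    ("stupid post", "bad"),
])
label_one = classify("what a stupid question", model)
label_two = classify("thanks for the meeting", model)
```

Unlike a fixed bad-word list, the learned word probabilities let the filter weigh every known word in a message, which is the sense in which learning is applied to the filtering task.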
Tackling Spam and Spoof Email
The loss of productivity due to spam has reached a critical limit. Spoof emails have dented people's confidence in communications from organisations. This is happening in an age where email has been recognised as a cost-effective way of communicating. Companies have to invest resources to increase the confidence of consumers rather than abandon the use of email. This leaves two avenues for pursuing the matter: either email vendors have to implement safeguards, or users have to implement technology and procedures. The paper looks at ways in which spam and spoof emails are being tackled and makes suggestions on how confidence can be raised by the use of hybrid approaches.
A Three-Stage Antispam System.
Since its conception in the late 1980s, the Internet has been consolidating itself as one of the most efficient means of exchanging information. Electronic mail, or email, has become the Internet's main tool for information exchange. Unfortunately, email has also become a target for opportunists, who take advantage of the tool's convenience and low cost to spread unwanted content across the network. Spam emails, or spams, are messages received without the prior consent of the recipients. Spams mostly carry advertising content aimed at promoting services, products, or events. They end up causing problems such as wasted network bandwidth and lost time and productivity for email servers and for the users themselves. This work proposes a three-stage antispam system. The first stage, preprocessing, analyzes the email content in search of known patterns and performs content eliminations and/or substitutions to simplify and standardize it. The second stage, feature selection, determines the most relevant features of the email with respect to two classes of emails, Ham and Spam. The third stage, classification, classifies the email. The antispam system is exhaustively tested on three public databases available on the Internet: SpamAssassin, LingSpam, and Trec. System performance is evaluated by the percentage of correct classifications in the two classes, Ham and Spam. The time spent training and testing the neural classifier is also evaluated, as are aspects related to handling the emails in the databases. The results obtained are quite promising. The antispam system performs very well on all three databases used.
Computing with Granular Words
Computational linguistics is a sub-field of artificial intelligence; it is an interdisciplinary field dealing with statistical and/or rule-based modeling of natural language from a computational perspective. Traditionally, fuzzy logic is used to deal with fuzziness among single linguistic terms in documents. However, linguistic terms may be related to other types of uncertainty. For instance, when different users search for ‘cheap hotel’ in a search engine, they may need distinct pieces of relevant hidden information such as shopping, transportation, weather, etc. Therefore, this research work focuses on studying granular words and developing new algorithms to process them to deal with uncertainty globally. To precisely describe the granular words, a new structure called Granular Information Hyper Tree (GIHT) is constructed. Furthermore, several technologies are developed to support computing with granular words in spam filtering and query recommendation. Based on simulation results, the GIHT-Bayesian algorithm achieves a more accurate spam filtering rate than the conventional Naive Bayesian and SVM methods; computing with granular words also generates better recommendation results, based on users’ assessment, when applied to a search engine.