39 research outputs found

    Cyber-Physical Security with RF Fingerprint Classification through Distance Measure Extensions of Generalized Relevance Learning Vector Quantization

    Radio frequency (RF) fingerprinting extracts fingerprint features from RF signals to protect against masquerade attacks by enabling reliable authentication of communication devices at the “serial number” level. Facilitating the reliable authentication of communication devices are machine learning (ML) algorithms which find meaningful statistical differences between measured data. The Generalized Relevance Learning Vector Quantization-Improved (GRLVQI) classifier is one ML algorithm which has shown efficacy for RF fingerprinting device discrimination. GRLVQI extends the Learning Vector Quantization (LVQ) family of “winner take all” classifiers that develop prototype vectors (PVs) which represent data. In LVQ algorithms, distances are computed between exemplars and PVs, and PVs are iteratively moved to accurately represent the data. GRLVQI extends LVQ with a sigmoidal cost function, relevance learning, and PV update logic improvements. However, both LVQ and GRLVQI are limited by their reliance on squared Euclidean distance measures and by an algorithm structure that seemingly becomes complex if the underlying distance measure is changed. Herein, the authors (1) develop GRLVQI-D (distance), an extension of GRLVQI to consider alternative distance measures and (2) present the Cosine GRLVQI classifier using this framework. To evaluate this framework, the authors consider experimentally collected Z-Wave RF signals and develop RF fingerprints to identify devices. Z-Wave devices are low-cost, low-power communication technologies seen increasingly in critical infrastructure. Performance comparisons are made for both classification and verification (claimed identity) with the new Cosine GRLVQI algorithm. The results show more robust performance when using the Cosine GRLVQI algorithm when compared with four algorithms in the literature. Additionally, the methodology used to create Cosine GRLVQI is generalizable to alternative measures
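
    The winner-take-all idea in the abstract above can be illustrated with a minimal sketch. This is plain LVQ1 with a pluggable distance function, not GRLVQI or the authors' Cosine GRLVQI: it has no sigmoidal cost function and no relevance learning, and it keeps the classic LVQ1 attract/repel update rather than a gradient derived from the cosine measure. All function names, the learning rate, and the data are assumptions made for illustration.

```python
import math

def cosine_distance(x, w):
    """1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(x, w))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in w))
    return 1.0 - dot / norm if norm > 0 else 1.0

def nearest(x, prototypes, dist):
    """Index of the winning (closest) prototype vector."""
    return min(range(len(prototypes)), key=lambda i: dist(x, prototypes[i]))

def lvq1_step(x, label, prototypes, proto_labels, lr=0.1, dist=cosine_distance):
    """One LVQ1 update (illustrative, not GRLVQI).

    The winner moves toward x when its label matches, away otherwise.
    The prototype list is modified in place; the winner index is returned.
    """
    winner = nearest(x, prototypes, dist)
    sign = 1.0 if proto_labels[winner] == label else -1.0
    prototypes[winner] = [w + sign * lr * (a - w)
                          for a, w in zip(x, prototypes[winner])]
    return winner

def predict(x, prototypes, proto_labels, dist=cosine_distance):
    """Label of the closest prototype under the chosen distance."""
    return proto_labels[nearest(x, prototypes, dist)]
```

    Passing a different `dist` callable swaps the distance measure without touching the update loop, which is the kind of separation the abstract describes as the goal of GRLVQI-D.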

    Voting-based Classification for E-mail Spam Detection

    The problem of spam e-mail has gained a tremendous amount of attention. Although entities tend to use e-mail spam filter applications to filter out received spam e-mails, marketing companies still tend to send unsolicited e-mails in bulk and users still receive a reasonable amount of spam e-mail despite those filtering applications. This work proposes a new method for classifying e-mails into spam and non-spam. First, several e-mail content features are extracted and then those features are used for classifying each e-mail individually. The classification results of three different classifiers (i.e. Decision Trees, Random Forests and k-Nearest Neighbor) are combined in various voting schemes (i.e. majority vote, average probability, product of probabilities, minimum probability and maximum probability) for making the final decision. To validate our method, two different spam e-mail collections were used
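
    The five voting schemes named in the abstract above can be sketched as follows. The abstract does not spell out the exact rules, so this sketch assumes the common definitions of these combiners for a two-class problem: each classifier reports a spam probability, the ham probability is its complement, and the class with the larger combined score wins. The function name `combine`, the 0.5 vote threshold, and the tie-breaking toward non-spam are assumptions.

```python
from math import prod
from statistics import mean

def combine(spam_probs, scheme):
    """Return True (spam) or False (non-spam) for one e-mail.

    spam_probs: one spam probability per classifier, each in [0, 1].
    scheme: "majority", "average", "product", "minimum" or "maximum".
    Ties are resolved as non-spam (an assumption of this sketch).
    """
    if scheme == "majority":
        # Each classifier casts a hard vote; spam needs more than half.
        votes = sum(p > 0.5 for p in spam_probs)
        return votes > len(spam_probs) / 2

    # Soft schemes: apply the same rule to both classes and compare.
    rules = {"average": mean, "product": prod, "minimum": min, "maximum": max}
    rule = rules[scheme]
    ham_probs = [1.0 - p for p in spam_probs]
    return rule(spam_probs) > rule(ham_probs)
```

    With hypothetical outputs `[0.55, 0.55, 0.05]` from three classifiers, the majority vote says spam while the four probability-based schemes say non-spam, because the one confident dissenting classifier dominates the soft scores.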

    Possibility Theory-Based Approach to Spam Email Detection


    Self-organizing maps in computer security



    Holistic Network Defense: Fusing Host and Network Features for Attack Classification

    This work presents a hybrid network-host monitoring strategy, which fuses data from both the network and the host to recognize malware infections. This work focuses on three categories: Normal, Scanning, and Infected. The network-host sensor fusion is accomplished by extracting 248 features from network traffic using the Fullstats Network Feature generator and from the host using text mining, looking at the frequency of the 500 most common strings and analyzing them as word vectors. Improvements to detection performance are made by synergistically fusing network features obtained from IP packet flows and host features, obtained from text mining port, processor, logon information among others. In addition, the work compares three different machine learning algorithms and updates the script required to obtain network features. Hybrid method results outperformed host only classification by 31.7% and network only classification by 25%. The new approach also reduces the number of alerts while remaining accurate compared with the commercial IDS SNORT. These results make it such that even the most typical users could understand alert classification messages

    Learning to Filter Text in Forum Malay Message using Naive Bayesian Technique

    Applying the basic filtering technique in a forum application has been discussed in [1]. The paper explains the use of the basic naive Bayesian algorithm to classify forum messages as clean or bad, where a clean message has no bad words, while a bad message contains at least one bad word. In this Final Year Project paper, the application of the algorithm to filtering forum messages is discussed in an attempt to apply learning to filter forum messages
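
    A naive Bayesian message filter of the kind described above might look like the sketch below. It is a generic multinomial naive Bayes with add-one (Laplace) smoothing, not the project's actual implementation. The whitespace tokenization, the function names, and the expectation that training data contains both "clean" and "bad" messages are assumptions.

```python
import math
from collections import Counter

LABELS = ("clean", "bad")

def train(messages):
    """Build word and document counts from (text, label) pairs.

    Assumes at least one training message per label.
    """
    word_counts = {label: Counter() for label in LABELS}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["clean"]) | set(word_counts["bad"])
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in LABELS:
        total_words = sum(word_counts[label].values())
        # Log prior plus the smoothed log likelihood of every word.
        score = math.log(doc_counts[label] / total_docs)
        for word in text.lower().split():
            count = word_counts[label][word] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)
```

    Unlike a fixed bad-word list, the learned model scores every word by how often it appeared in each class during training, so the set of flagged words follows the labelled examples.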

    Tackling Spam and Spoof Email

    The loss of productivity due to Spam has reached a critical limit. Spoof emails have dented confidence of people in communications from organisations. This is happening in an age where email has been recognised as a cost effective way of communicating. Companies have to invest resources to increase the confidence of consumers rather than abandoning the use of emails. This leaves two avenues of pursuing the matter, either email vendors have to implement safeguards or users have to implement technology and procedures. The paper will look at ways in which spam and spoof emails are being tackled and also make suggestions on how confidence can be raised by the use of hybrid approaches

    Um Sistema Antispam de Três Estágios (A Three-Stage Antispam System)

    Since its inception in the late 1980s, the Internet has established itself as one of the most efficient means of exchanging information. Electronic mail, or email, has become the Internet's main tool for information exchange. Unfortunately, email has also become a target for opportunists, who exploit the tool's convenience and low cost to spread unwanted content across the network. Spam emails, or spams, are messages received without the recipients' prior consent. Most spam carries advertising content that promotes services, products, or events. It causes problems such as wasted network bandwidth and lost time and productivity for email servers and for users themselves. This work proposes a three-stage antispam system. The first stage, preprocessing, scans the email content for known patterns and removes and/or replaces content to simplify and standardize it. The second stage, feature selection, determines the most relevant features of the email with respect to two classes of email, Ham and Spam. The third stage, classification, classifies the email. The antispam system is exhaustively tested on three public datasets available on the Internet: SpamAssassin, LingSpam, and Trec. System performance is evaluated by the percentage of correct classifications in the two classes, Ham and Spam. The time spent training and testing the neural classifier is also evaluated, as are aspects related to handling the emails in the datasets. The results obtained are quite promising. The antispam system performs very well on all three datasets used

    Computing with Granular Words

    Computational linguistics is a sub-field of artificial intelligence; it is an interdisciplinary field dealing with statistical and/or rule-based modeling of natural language from a computational perspective. Traditionally, fuzzy logic is used to deal with fuzziness among single linguistic terms in documents. However, linguistic terms may be related to other types of uncertainty. For instance, when different users search for ‘cheap hotel’ in a search engine, they may need distinct pieces of relevant hidden information such as shopping, transportation, weather, etc. Therefore, this research work focuses on studying granular words and developing new algorithms that process them to deal with uncertainty globally. To precisely describe the granular words, a new structure called Granular Information Hyper Tree (GIHT) is constructed. Furthermore, several technologies are developed to support computing with granular words in spam filtering and query recommendation. Based on simulation results, the GIHT-Bayesian algorithm achieves a more accurate spam filtering rate than the conventional Naive Bayesian and SVM methods; computing with granular words also generates better recommendation results, based on users’ assessments, when applied to a search engine