4 research outputs found

    Quiet in class: classification, noise and the dendritic cell algorithm

    Theoretical analyses of the Dendritic Cell Algorithm (DCA) have yielded several criticisms of its underlying structure and operation. As a result, several alterations and fixes have been suggested in the literature to address these findings. One contribution of this work is to investigate the effects of replacing the classification stage of the DCA (which is known to be flawed) with a traditional machine learning technique. This work goes on to question the merits of those unique properties of the DCA that are yet to be thoroughly analysed. If none of these properties can be found to have a benefit over traditional approaches, then “fixing” the DCA is arguably less efficient than simply creating a new algorithm. This work examines the dynamic filtering property of the DCA and questions the utility of this unique feature for the anomaly detection problem. It is found that this feature, while advantageous for noisy, time-ordered classification, is not as useful as a traditional static filter for processing a synthetic dataset. It is concluded that there are still unique features of the DCA left to investigate. Areas that may be of benefit to the Artificial Immune Systems community are suggested.

    Detecting Malicious Spam Mails: An Online Machine Learning Approach


    Using biased discriminant analysis for email filtering

    This paper reports on email filtering based on content features. We test the validity of a novel statistical feature extraction method, which relies on dimensionality reduction to retain the most informative and discriminative features from messages. The approach, named Biased Discriminant Analysis (BDA), aims at finding a feature space transformation that closely clusters positive examples while pushing away the negative ones. This method is an extension of Linear Discriminant Analysis (LDA), but it introduces a different transformation to improve the separation between classes, and it has not previously been applied to text mining tasks. We successfully test BDA under two schemas. The first is a traditional classification scenario using 10-fold cross-validation on four standard ground-truth corpora: LingSpam, SpamAssassin, the Phishing corpus and a subset of the TREC 2007 spam corpus. In the second schema we test the anticipatory properties of the statistical features with the TREC 2007 spam corpus. The contribution of this work is the evidence that BDA offers better discriminative features for email filtering, gives stable classification results regardless of the number of features chosen, and yields features that robustly retain their discriminative value over time.

    A DBN-Based Classifying Approach to Discover the Internet Water Army

    Part 3: Web Mining. The Internet water army (IWA) usually refers to hidden paid posters and collusive spammers, which already pose a serious threat to cyber security. Many researchers have begun to study how to identify the IWA effectively. Currently, most efforts to distinguish IWA from non-IWA in a data mining context rely on classification-based algorithms, including Bayesian networks, SVM and KNN. However, Bayesian networks need a strong conditional independence assumption and KNN has high computation costs, which may limit the effectiveness of these approaches in real industrial applications. Hence, neural-network-like deep approaches to IWA identification are gradually becoming an emerging and feasible direction. Unfortunately, one main problem remains: how to balance deep learning against computation costs in a hierarchical architecture. More specifically, combining learning-level heuristic training design with computing-level concurrent computation is a challenging issue. In this paper, we propose a collaborative hierarchical approach based on the deep belief network (DBN) for IWA identification. First, a DBN-based collaborative model with a hierarchical classifying mechanism is built. Then, targeting the Hadoop platform, Downpour stochastic gradient descent (Downpour SGD) is exploited for DBN pre-training. Finally, a dynamic workflow is designed to manage the whole learning-based classifying process. The experimental evaluation shows the validity of our approach.