8 research outputs found

    Hybrid approach for spam email detection

    In this era, email is a convenient way for users to communicate anywhere in the world with internet access, because it is an economical and fast method of communication. An email message can be sent to a single user or distributed to a group. The majority of users cannot imagine life without e-mail. For the same reason, email has also become a medium of communication for malicious actors. This project targets spam email detection. It concentrates on a hybrid approach, combining a Neural Network (NN) with Particle Swarm Optimization (PSO), designed to detect spam emails. The hybrid NN_PSO approach with the GA algorithm is compared against GA_NN and plain NN classifiers to determine the best performance for spam detection. The Spambase dataset used with these algorithms contains 1813 spam messages (39.40%) and 2788 non-spam messages (60.60%). Performance is compared in terms of accuracy, false positives, false negatives, precision, recall and F-measure. Feature selection is performed by applying the GA algorithm to remove redundant and irrelevant features. The F-measure results show that the hybrid NN_PSO, GA_NN and NN achieve 94.10%, 92.60% and 91.39% respectively. The results recommend the hybrid of NN_PSO with the GA algorithm as the best-performing method for spam email detection
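The abstract compares classifiers by accuracy, precision, recall and F-measure. A minimal sketch of how those metrics are derived from a confusion matrix is shown below; the toy labels are illustrative and not taken from the Spambase experiments.

```python
# Compute the evaluation metrics named in the abstract from raw
# true/predicted labels, where 1 = spam and 0 = non-spam.

def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    return tp, fp, fn, tn

def spam_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-measure for a spam classifier."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}

# Toy example: 8 messages, 4 of them spam.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(spam_metrics(y_true, y_pred))  # all four metrics equal 0.75 here
```

The same F-measure formula, 2PR/(P+R), underlies the 94.10%/92.60%/91.39% comparison reported for NN_PSO, GA_NN and NN.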

    Cost-sensitive spam detection using parameters optimization and feature selection

    E-mail spam is no longer a mere nuisance but a genuine risk, since it increasingly carries virus attachments and spyware agents that can compromise recipients' systems; there is therefore an emerging need for spam detection. Many spam detection techniques based on machine learning have been proposed. As the volume of spam sent with bulk-mailing tools has increased tremendously, spam detection techniques must keep pace with it. To cope with this, parameters optimization and feature selection have been used to reduce processing overheads while guaranteeing high detection rates. However, previous approaches have not taken into account feature variable importance and the optimal number of features. Moreover, to the best of our knowledge, no existing approach uses both parameters optimization and feature selection together for spam detection. In this paper, we propose a spam detection model enabling both parameters optimization and optimal feature selection; we optimize two parameters of detection models using Random Forests (RF) so as to maximize the detection rates. We provide the variable importance of each feature so that irrelevant features are easy to eliminate. Furthermore, we determine an optimal number of selected features using two methods: (i) a single parameters optimization during overall feature selection, and (ii) parameters optimization in every feature-elimination phase. Finally, we evaluate our spam detection model with cost-sensitive measures to avoid misclassification of legitimate messages, since the cost of classifying a legitimate message as spam far outweighs the cost of classifying a spam message as legitimate. We perform experiments on the Spambase dataset and show the feasibility of our approaches
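The cost-sensitive idea above is that a false positive (a legitimate message flagged as spam) should be penalised far more heavily than a false negative. A loose sketch of that weighting, and of tuning a decision threshold to minimise the total cost, is given below; the cost ratio and scores are illustrative assumptions, not values from the paper.

```python
# Cost-sensitive evaluation sketch: weight false positives lam times
# more than false negatives, then pick the score threshold that
# minimises the total weighted cost. 1 = spam, 0 = legitimate.

def weighted_cost(y_true, y_pred, lam=9):
    """Total misclassification cost, with false positives weighted lam:1."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return lam * fp + fn

def pick_threshold(y_true, scores, lam=9):
    """Return the (threshold, cost) pair with the lowest weighted cost."""
    best = None
    for thr in sorted(set(scores)):
        pred = [1 if s >= thr else 0 for s in scores]
        cost = weighted_cost(y_true, pred, lam)
        if best is None or cost < best[1]:
            best = (thr, cost)
    return best

# Toy classifier scores (e.g. spam probabilities from an RF model).
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.6, 0.8, 0.2, 0.4, 0.7]
print(pick_threshold(y_true, scores))  # -> (0.7, 0): perfect separation here
```

In practice the same weighted cost could serve as the objective when optimizing the RF parameters and the number of selected features.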

    Dynamic Data Mining: Methodology and Algorithms

    No full text
    Supervised data stream mining has become an important and challenging data mining task in modern organizations. The key challenges are threefold: (1) a possibly infinite number of streaming examples and time-critical analysis constraints; (2) concept drift; and (3) skewed data distributions. To address these three challenges, this thesis proposes the novel dynamic data mining (DDM) methodology, which effectively applies supervised ensemble models to data stream mining. DDM can be loosely defined as the categorization-organization-selection of supervised ensemble models. It is inspired by the idea that although the underlying concepts in a data stream are time-varying, their distinctions can be identified; the models trained on distinct concepts can therefore be dynamically selected to classify incoming examples of similar concepts. First, following the general paradigm of DDM, we examine the different concept-drifting stream mining scenarios and propose corresponding effective and efficient data mining algorithms.
    • To address concept drift caused merely by changes of variable distributions, which we term pseudo concept drift, base models built on categorized streaming data are organized and selected in line with their corresponding variable distribution characteristics.
    • To address concept drift caused by changes of variable and class joint distributions, which we term true concept drift, an effective data categorization scheme is introduced, and a group of working models is dynamically organized and selected to react to the drifting concept.
    Second, we introduce an integration stream mining framework, enabling the paradigm advocated by DDM to be widely applicable to other stream mining problems. This allows us to easily introduce six effective algorithms for mining data streams with skewed class distributions. In addition, we introduce a new ensemble model approach for batch learning, following the same methodology. Both theoretical and empirical studies demonstrate its effectiveness. Future work will target improving the effectiveness and efficiency of the proposed algorithms. Meanwhile, we will explore the possibilities of using the integration framework to solve other open stream mining research problems
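The categorization-organization-selection paradigm described above can be sketched loosely as follows: keep one model per previously seen concept, summarise each concept with a simple statistic, and classify each incoming example with the model of the nearest concept. The centroid summary and toy models below are illustrative choices, not the thesis algorithms.

```python
# Loose sketch of dynamic model selection under concept drift: each
# stored concept pairs a summary statistic (here, a feature centroid)
# with the model trained on that concept.

class ConceptPool:
    def __init__(self):
        self.concepts = []  # list of (centroid, model) pairs

    def add(self, centroid, model):
        """Organize a model trained on one identified concept."""
        self.concepts.append((centroid, model))

    def select(self, x):
        """Select the model whose concept centroid is closest to x."""
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(c, x))
        centroid, model = min(self.concepts, key=lambda cm: sq_dist(cm[0]))
        return model

    def classify(self, x):
        return self.select(x)(x)

pool = ConceptPool()
# Two toy concepts: in the second, the label relationship has drifted.
pool.add((0.0, 0.0), lambda x: 1 if x[0] > 0.5 else 0)
pool.add((5.0, 5.0), lambda x: 0 if x[0] > 5.5 else 1)
print(pool.classify((0.9, 0.1)))  # routed to the first concept's model
print(pool.classify((6.0, 5.0)))  # routed to the second (drifted) concept's model
```

Under pseudo concept drift only the centroids shift; under true concept drift the stored models themselves differ, which is why both the summaries and the models are kept per concept.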

    Novel techniques for modelling uncertain human reasoning in explainable artificial intelligence

    In recent years, there has been a growing need for intelligent systems that are not only able to provide reliable predictions but can also produce explanations for their outputs. The demand for increased explainability has led to the emergence of explainable artificial intelligence (XAI) as a specific research field. In this context, fuzzy logic systems represent a promising tool thanks to their inherently interpretable structure. The use of a rule base and linguistic terms, in fact, has allowed researchers to design models with a transparent decision process, from which it is possible to extract human-understandable explanations. The use of interval type-2 fuzzy logic in the XAI field, however, is limited: the improved performance of interval type-2 fuzzy systems and their ability to handle a higher degree of uncertainty come at the cost of increased complexity, which makes the semantic mapping between the inputs and outputs harder to understand intuitively. Type-reduction, in some contexts, fails to preserve the semantic value of the fuzzy sets and rules involved in the decision process. By semantic value, we specifically refer to the capacity to interpret the output of the fuzzy system with respect to the pre-defined, and thus understood, linguistic variables used for the antecedents and consequents of the system. A first attempt at increasing the explainability of interval type-2 fuzzy logic was made by Garibaldi and Guadarrama in 2011, with the introduction of constrained type-2 fuzzy sets. However, extensive work needs to be carried out to develop the algorithms necessary for their practical use in fuzzy systems. The aim of this thesis is to extend the initial work on constrained interval type-2 fuzzy sets into a framework that preserves the semantic value throughout the modelling and decision process.
Achieving this goal allows the creation of a new class of fuzzy systems with additional interpretable properties, and could further encourage the use of interval type-2 fuzzy logic in XAI. After the formal definition of the required components and theorems, different approaches are explored to develop inference algorithms that preserve the semantic value of the sets during the input-output mapping, while keeping reasonable run-times on modern computer hardware. The novel frameworks are then tested in a series of practical real-world applications, in order to assess their prediction performance and demonstrate the quality of the explanations these models can generate. Finally, the original definitions of constrained interval type-2 fuzzy sets are refined to produce a novel approach which combines uncertain data and represents them using intuitive constrained interval type-2 fuzzy sets. Overall, as a result of the work presented here, it is now possible to design constrained interval type-2 fuzzy systems that preserve the enhanced semantic value provided by constrained interval type-2 fuzzy sets throughout the inference, type-reduction and defuzzification stages. This characteristic is then used to improve the semantic interpretability of the system outputs, making constrained interval type-2 fuzzy systems a valuable alternative to interval type-2 fuzzy systems in XAI. The research presented here has resulted in three journal articles, two of which have already been published in IEEE Transactions on Fuzzy Systems, and four papers presented at the FUZZ-IEEE international conference between 2018 and 2020
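For readers unfamiliar with interval type-2 fuzzy sets, the core construct the thesis builds on can be sketched briefly: the membership of a crisp input is not a single grade but an interval bounded by a lower and an upper type-1 membership function (the footprint of uncertainty). The triangular parameters below are illustrative, and this toy does not model the additional restrictions that constrained interval type-2 sets place on the embedded sets.

```python
# Minimal interval type-2 fuzzy set sketch: membership of x is the
# interval [lower(x), upper(x)] between two type-1 membership functions.

def triangular(a, b, c):
    """Return a triangular type-1 membership function peaking at b."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu

# The upper membership function is wider than the lower one, so the
# region between them forms the footprint of uncertainty.
upper = triangular(0.0, 5.0, 10.0)
lower = triangular(2.0, 5.0, 8.0)

def it2_membership(x):
    """Interval membership grade of x in the IT2 set."""
    return (lower(x), upper(x))

print(it2_membership(5.0))  # at the shared peak: (1.0, 1.0)
print(it2_membership(3.0))  # elsewhere, a genuine interval grade
```

Type-reduction collapses such interval outputs back to a type-1 result before defuzzification, and it is exactly this collapse whose semantic effect the constrained framework aims to keep interpretable.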

    Special Issue on Hybrid Intelligent Systems 2007

    No full text
    Special Issue on Hybrid Intelligent Systems 2007. Neural Network World, Vol. 17, No. 6 (2007), pp. 505-688. The issue contains papers prepared specially for this issue by the authors of some of the best-evaluated papers presented at HIS'07 in Kaiserslautern, Germany, during September 17-19, 2007. Current research interests in HIS covered in this issue focus on the integration of different computing paradigms such as fuzzy logic, neuro-computation, evolutionary computation, probabilistic computing, intelligent agents, machine learning, and other intelligent computing frameworks. There is also a growing interest in the role of sensors and their integration and evaluation in such frameworks. The phenomenal growth of hybrid intelligent systems and related topics has obliged