1,024 research outputs found
Evolutionary design of nearest prototype classifiers
In pattern classification problems, many works have been carried out with the aim of designing good classifiers from different perspectives. These works achieve very good results in many domains. However, in general they are very dependent on some crucial parameters involved in the design. These parameters have to be found by a trial and error process or by some automatic methods, like heuristic search and genetic algorithms, that strongly decrease the performance of the method. For instance, in nearest prototype approaches, the main parameters are the number of prototypes to use, the initial set, and a smoothing parameter. In this work, an evolutionary approach based on the Nearest Prototype Classifier (ENPC) is introduced where no parameters are involved, thus overcoming the problems that classical methods have in tuning and searching for the appropriate values. The algorithm is based on the evolution of a set of prototypes that can execute several operators in order to increase their quality in a local sense, with a high classification accuracy emerging for the whole classifier. This new approach has been tested using four different classical domains, including artificial distributions such as spiral and uniformly distributed data sets, the Iris Data Set, and an application domain about diabetes. In all cases, the experiments show successful results, not only in classification accuracy, but also in the number and distribution of the prototypes achieved.
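As a rough illustration of the classification rule that nearest prototype approaches share, the sketch below assigns a point the label of its closest prototype. It is not the ENPC algorithm from the abstract (no evolution or operators are modelled); the function name `nearest_prototype` and the example prototypes are hypothetical.

```python
import math

def nearest_prototype(x, prototypes):
    """Return the label of the prototype closest to x (Euclidean distance)."""
    best_label, best_dist = None, math.inf
    for point, label in prototypes:
        d = math.dist(x, point)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label

# Hypothetical prototypes, each a (coordinates, class label) pair.
prototypes = [((0.0, 0.0), "A"), ((5.0, 5.0), "B")]
print(nearest_prototype((1.0, 0.5), prototypes))  # prints A
```

In this form the number and placement of prototypes are fixed by hand, which is exactly the kind of parameter the abstract says ENPC aims to avoid tuning.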
One-Class Classification: Taxonomy of Study and Review of Techniques
One-class classification (OCC) algorithms aim to build classification models
when the negative class is either absent, poorly sampled or not well defined.
This unique situation constrains the learning of efficient classifiers by
defining the class boundary using only knowledge of the positive class. The OCC
problem has been considered and applied under many research themes, such as
outlier/novelty detection and concept learning. In this paper we present a
unified view of the general problem of OCC by presenting a taxonomy of study
for OCC problems, which is based on the availability of training data,
algorithms used and the application domains applied. We further delve into each
of the categories of the proposed taxonomy and present a comprehensive
literature review of the OCC algorithms, techniques and methodologies with a
focus on their significance, limitations and applications. We conclude our
paper by discussing some open research problems in the field of OCC and present
our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
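To make the idea of learning a boundary from positive examples only more concrete, here is a minimal sketch of one very simple one-class model: a sphere around the centroid of the positive samples. This is only one of many possible OCC techniques and is not taken from the review; the names `fit_one_class` and `is_positive` and the `quantile` parameter are assumptions made for illustration.

```python
import math

def fit_one_class(positives, quantile=0.95):
    """Fit a centroid and a radius covering roughly `quantile` of the positives."""
    dim = len(positives[0])
    centroid = tuple(sum(p[i] for p in positives) / len(positives) for i in range(dim))
    dists = sorted(math.dist(p, centroid) for p in positives)
    idx = min(len(dists) - 1, int(quantile * len(dists)))
    return centroid, dists[idx]

def is_positive(x, model):
    """Accept x if it falls inside the fitted sphere."""
    centroid, radius = model
    return math.dist(x, centroid) <= radius

# Hypothetical positive-class samples; no negative samples are needed to fit.
model = fit_one_class([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
print(is_positive((0.5, 0.5), model), is_positive((5.0, 5.0), model))
```

Anything outside the sphere is treated as an outlier or novelty, which mirrors the outlier/novelty detection framing mentioned in the abstract.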
Intrusion detection in wi-fi networks by modular and optimized ensemble of classifiers
With the breakthrough of pervasive advanced networking infrastructures and paradigms such as 5G and IoT, cybersecurity has become an active and crucial field in recent years. Furthermore, machine learning techniques are gaining more and more attention as prospective tools for mining (possibly malicious) packet traces and for the automatic synthesis of network intrusion detection systems. In this work, we propose a modular ensemble of classifiers for spotting malicious attacks on Wi-Fi networks. Each classifier in the ensemble is tailored to characterize a given attack class and is individually optimized by means of a genetic algorithm wrapper with the dual goal of tuning hyper-parameters and retaining only the features relevant to a specific attack class. Our approach also includes a novel false alarm management procedure based on a suitable reliability measure. The proposed system has been tested on the well-known AWID dataset, showing performance comparable with other state-of-the-art works in terms of both accuracy and knowledge discovery capabilities. Our system is also characterized by a modular design of the classification model, which allows new attack classes to be included efficiently. Authors: Giuseppe Granato; Alessio Martino; Luca Baldini; Antonello Rizzi
Memetic micro-genetic algorithms for cancer data classification
Fast and precise medical diagnosis of human cancer is crucial for treatment decisions. Gene selection consists of identifying a set of informative genes from microarray data to allow high predictive accuracy in human cancer classification. This task is a combinatorial search problem, and optimisation methods can be applied for its resolution. In this paper, two memetic micro-genetic algorithms (MμV1 and MμV2) with different hybridisation approaches are proposed for feature selection of cancer microarray data. Seven gene expression datasets are used for experimentation. The comparison with stochastic state-of-the-art optimisation techniques concludes that problem-dependent local search methods combined with micro-genetic algorithms improve feature selection of cancer microarray data. Affiliation: Rojas, Matias Gabriel. Universidad Nacional de Lujan. Centro de Investigacion Docencia y Extension En Tecnologias de la Informacion y Las Comunicaciones.; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina. Affiliation: Olivera, Ana Carolina. Universidad Nacional de Cuyo. Facultad de Ingeniería; Argentina. Universidad Nacional de Lujan. Centro de Investigacion Docencia y Extension En Tecnologias de la Informacion y Las Comunicaciones.; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina. Affiliation: Carballido, Jessica Andrea. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Affiliation: Vidal, Pablo Javier. Universidad Nacional de Cuyo. Facultad de Ingeniería; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Mendoza; Argentina
An ensemble based approach for effective intrusion detection using majority voting
Network security research has recently taken center stage, given the vulnerability of the computing ecosystem as networked systems increasingly fall to hackers. Within network security, an intrusion detection system (IDS) is an essential tool for the timely detection of cyber-attacks. A designated set of reliable safety measures has been put in place to prevent severe damage to the network and the user base. Machine learning (ML) is frequently used to detect intrusions and helps intrusion detection systems minimize security threats. However, single classifiers have limitations and pose challenges to the development of an effective IDS. Against this backdrop, the current work proposes an ensemble approach to address the shortcomings of single classifiers: a highly scalable and constructive majority voting-based ensemble model that can be employed in real time to scrutinize network traffic and proactively warn about possible attacks. By taking into consideration the properties of existing machine learning algorithms, an effective model was developed, obtaining accuracies of 99%, 97.2%, 97.2%, and 93.2% for DoS, Probe, R2L, and U2R attacks, respectively. The proposed model is thus effective for identifying intrusions.
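The majority voting step described in this abstract can be sketched in a few lines. The snippet below is a generic hard-voting rule, not the authors' model; the three threshold-based classifiers and the `rate` and `retries` features are hypothetical stand-ins for trained base learners.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(predictions).most_common(1)[0][0]

def ensemble_predict(classifiers, x):
    """Ask every base classifier for a label, then take the majority."""
    return majority_vote([clf(x) for clf in classifiers])

# Hypothetical base classifiers: simple threshold rules on traffic features.
classifiers = [
    lambda x: "attack" if x["rate"] > 100 else "normal",
    lambda x: "attack" if x["retries"] > 5 else "normal",
    lambda x: "attack" if x["rate"] > 500 else "normal",
]
print(ensemble_predict(classifiers, {"rate": 250, "retries": 9}))  # prints attack
```

Using an odd number of base classifiers avoids ties in the two-class case, which is why three are used here.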
Parallelizing support vector machines for scalable image annotation
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. Machine learning techniques have facilitated image retrieval by automatically classifying and annotating images with keywords. Among them, Support Vector Machines (SVMs) are used extensively due to their generalization properties. However, SVM training is notably a computationally intensive process, especially when the training dataset is large.
In this thesis, distributed computing paradigms have been investigated to speed up SVM training by partitioning a large training dataset into small data chunks and processing each chunk in parallel, utilizing the resources of a cluster of computers. A resource aware parallel SVM algorithm is introduced for large scale image annotation on a cluster of computers. A genetic algorithm based load balancing scheme is designed to optimize the performance of the algorithm in heterogeneous computing environments.
SVM was initially designed for binary classifications. However, most classification problems arising in domains such as image annotation usually involve more than two classes. A resource aware parallel multiclass SVM algorithm for large scale image annotation in parallel using a cluster of computers is introduced.
The combination of classifiers leads to a substantial reduction of classification error in a wide range of applications. Among them, SVM ensembles with bagging have been shown to outperform a single SVM in terms of classification accuracy. However, SVM ensemble training is notably a computationally intensive process, especially when the number of replicated samples based on bootstrapping is large. A distributed SVM ensemble algorithm for image annotation is introduced which re-samples the training data based on bootstrapping and trains an SVM on each sample in parallel using a cluster of computers.
The above algorithms are evaluated in both experimental and simulation environments, showing that the distributed SVM algorithm, the distributed multiclass SVM algorithm, and the distributed SVM ensemble algorithm reduce the training time significantly while maintaining a high level of accuracy in classification.
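The bagging scheme in this abstract (bootstrap re-sampling, one model per sample, models trained in parallel) can be sketched on a single machine as follows. Two substitutions are assumed and should be read as such: a trivial nearest-centroid learner on one-dimensional features stands in for the SVM, and a thread pool stands in for the cluster of computers. All function names are hypothetical.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def train_centroids(sample):
    """Stand-in base learner (NOT an SVM): per-class mean of 1-D features."""
    sums, counts = {}, Counter()
    for x, y in sample:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] += 1
    return {y: sums[y] / counts[y] for y in sums}

def predict_centroids(model, x):
    """Predict the class whose centroid is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))

def bagged_models(data, n_models, seed=0):
    """Draw bootstrap samples (with replacement) and train one model per sample."""
    rng = random.Random(seed)
    samples = [[rng.choice(data) for _ in data] for _ in range(n_models)]
    with ThreadPoolExecutor() as pool:  # stand-in for a cluster of workers
        return list(pool.map(train_centroids, samples))

def bagged_predict(models, x):
    """Combine the base models by majority vote."""
    votes = Counter(predict_centroids(m, x) for m in models)
    return votes.most_common(1)[0][0]
```

Because each bootstrap sample is trained independently, the work splits cleanly across workers; for CPU-bound training in pure Python a process pool or a real cluster would be the more realistic choice than threads.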
Systematic Review on Missing Data Imputation Techniques with Machine Learning Algorithms for Healthcare
Missing data is one of the most common issues encountered in the data cleaning process, especially when dealing with medical datasets. A real collected dataset is prone to be incomplete, inconsistent, noisy and redundant due to potential reasons such as human errors, instrumental failures, and adverse death. Therefore, to deal accurately with incomplete data, sophisticated algorithms are proposed to impute the missing values. Many machine learning algorithms have been applied to impute missing data with plausible values. Among these, the KNN algorithm has been widely adopted for imputation due to its robustness and simplicity, and it is a promising method that may outperform other machine learning methods. This paper provides a comprehensive review of the different imputation techniques used to replace missing data. The goal of the review is to bring specific attention to potential improvements to existing methods and to provide readers with a better grasp of imputation technique trends.
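For readers unfamiliar with KNN imputation, the sketch below shows the basic idea under simplifying assumptions: missing entries are marked with `None`, neighbours are drawn only from fully observed rows, distance is computed over the columns observed in the incomplete row, and at least one complete row exists. The function name `knn_impute` is hypothetical and the code is not taken from the reviewed papers.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest complete rows."""
    complete = [r for r in rows if None not in r]  # assumed to be non-empty
    result = []
    for row in rows:
        if None not in row:
            result.append(list(row))
            continue
        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other, row=row, observed=observed):
            # Euclidean distance restricted to the columns observed in `row`.
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        neighbours = sorted(complete, key=dist)[:k]
        result.append([
            v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ])
    return result

# Hypothetical records; the last one is missing its second value.
rows = [[1.0, 2.0], [1.2, 2.2], [9.0, 9.5], [1.1, None]]
print(knn_impute(rows)[-1])
```

The missing value is filled from the two records most similar on the observed column, so the distant third record has no influence on the imputed value.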