1,518 research outputs found

    EPRENNID: An evolutionary prototype reduction based ensemble for nearest neighbor classification of imbalanced data

    Classification problems with an imbalanced class distribution have received increasing attention within the machine learning community over the last decade. They are encountered in a growing number of real-world situations and pose a challenge to standard machine learning techniques. We propose a new hybrid method specifically tailored to handle class imbalance, called EPRENNID. It performs an evolutionary prototype reduction focused on providing diverse solutions to prevent the method from overfitting the training set. It also allows us to explicitly reduce the underrepresented class, which the most common preprocessing solutions handling class imbalance usually protect. As part of the experimental study, we show that the proposed prototype reduction method outperforms state-of-the-art preprocessing techniques. The preprocessing step yields multiple prototype sets that are later used in an ensemble, performing a weighted voting scheme with the nearest neighbor classifier. EPRENNID is experimentally shown to significantly outperform previous proposals.
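
    The final ensemble step described in this abstract (several prototype sets, each queried with a nearest neighbor classifier, combined by weighted voting) can be sketched roughly as follows. This is a minimal illustration, not the EPRENNID implementation: the toy prototype sets, the weights, and the function names are hypothetical, and the abstract does not say how the weights are derived.

```python
import math
from collections import defaultdict

def nearest_label(prototypes, x):
    """Return the label of the prototype closest to x (1-NN, Euclidean)."""
    return min(prototypes, key=lambda p: math.dist(p[0], x))[1]

def weighted_vote(prototype_sets, weights, x):
    """Combine the 1-NN predictions of several prototype sets by weighted vote."""
    scores = defaultdict(float)
    for prototypes, w in zip(prototype_sets, weights):
        scores[nearest_label(prototypes, x)] += w
    return max(scores, key=scores.get)

# Hypothetical toy data: each prototype is (feature_vector, label).
set_a = [((0.0, 0.0), "majority"), ((1.0, 1.0), "minority")]
set_b = [((0.2, 0.1), "majority"), ((0.9, 0.8), "minority")]
set_c = [((0.6, 0.6), "majority"), ((2.0, 2.0), "minority")]

# Weights are assumed values chosen only for illustration.
prediction = weighted_vote([set_a, set_b, set_c], [0.5, 0.4, 0.3], (0.8, 0.8))
```

    Here two of the three prototype sets place the query nearest to a minority prototype, so the weighted vote favours the minority class even though the third set disagrees.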

    Fraud Detection in Telecommunications Industry: Bridging the Gap with Random Rough Subspace Based Neural Network Ensemble Method

    Fraud is common in society and affects private enterprises as well as public entities. Telecommunication companies worldwide suffer from customers who use the provided services without paying. Telecommunication fraud takes several forms, such as subscription fraud, clip-on fraud, call forwarding, cloning fraud, roaming fraud, and calling card fraud. Detecting and preventing these frauds is therefore a main target of the telecommunication industry. This paper reviews the various techniques for detecting fraud, gives the limitations of each, and proposes a random rough subspace-based neural network ensemble method for effective fraud detection. Keywords: Fraud, Fraud detection, Random rough subspace, Neural network, Telecommunication

    Efficient image retrieval by fuzzy rules from boosting and metaheuristic

    Fast content-based image retrieval is still a challenge for computer systems. We present a novel method aimed at classifying images by fuzzy rules and local image features. The fuzzy rule base is generated in the first stage by a boosting procedure. Boosting meta-learning is used to find the most representative local features. We briefly explore the utilization of metaheuristic algorithms for the various tasks of fuzzy systems optimization. We also provide a comprehensive description of the current best-performing DISH algorithm, which represents a powerful version of the differential evolution algorithm with effective embedded mechanisms for stronger exploration and preservation of the population diversity, designed for higher dimensional and complex optimization tasks. The algorithm is used to fine-tune the fuzzy rule base. The fuzzy rules can also be used to create a database index for quickly retrieving images similar to the query image. The proposed approach is tested on a state-of-the-art image dataset and compared with the bag-of-features image representation model combined with Support Vector Machine classification. The novel method gives better classification accuracy, and the training and testing time is significantly shorter. © 2020 Marcin Korytkowski et al., published by Sciendo. Program of the Polish Minister of Science and Higher Education under the name "Regional Initiative of Excellence" in the years 2019-2022 [020/RID/2018/19

    Water filtration by using apple and banana peels as activated carbon

    A water filter is an important device for reducing the contaminants in raw water. Activated carbon from charcoal is used to absorb the contaminants. Fruit peels are a suitable alternative carbon source to substitute for the charcoal. The main goal is to determine the role of fruit peel powder, from apple and banana peels, as activated carbon in a water filter. The peels are dried and blended into powder so that they can absorb the contaminants. The results for raw water before and after filtering are compared. After filtering the raw water, the pH reading was 6.8, which is within the normal range, and the recorded turbidity reading was 658 NTU. As for the colour, the water became clearer than the raw water. This study found that fruit peels such as banana and apple are an effective substitute for charcoal as a natural absorbent.

    Knowledge management overview of feature selection problem in high-dimensional financial data: Cooperative co-evolution and Map Reduce perspectives

    The term big data characterizes the massive amounts of data generated by advanced technologies in different domains, using the 4Vs (volume, velocity, variety, and veracity) to indicate the amount of data that can only be processed via computationally intensive analysis, the speed of their creation, the different types of data, and their accuracy. High-dimensional financial data, such as time-series and space-time data, contain a large number of features (variables) while having a small number of samples, which are used to measure various real-time business situations for financial organizations. Such datasets are normally noisy, complex correlations may exist between their features, and many domains, including finance, lack the analytic tools to mine the data for knowledge discovery because of the high dimensionality. Feature selection is an optimization problem: find a minimal subset of relevant features that maximizes the classification accuracy and reduces the computations. Traditional statistical feature selection approaches are not adequate to deal with the curse of dimensionality associated with big data. Cooperative co-evolution, a meta-heuristic algorithm and a divide-and-conquer approach, decomposes high-dimensional problems into smaller sub-problems. Further, MapReduce, a programming model, offers a ready-to-use distributed, scalable, and fault-tolerant infrastructure for parallelizing the developed algorithm. This article presents a knowledge management overview of evolutionary feature selection approaches, state-of-the-art cooperative co-evolution and MapReduce-based feature selection techniques, and future research directions.
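
    The divide-and-conquer idea behind cooperative co-evolution, as summarized in this abstract, can be illustrated with a small sketch: the feature mask is split into groups, and each group is improved in turn while the rest of the mask is held fixed as shared context. This is only an assumption-laden toy. A greedy bit-flip stands in for the evolutionary optimiser, no MapReduce parallelism is shown, and `toy_fitness` with its `relevant` features is a hypothetical stand-in for a real accuracy-versus-subset-size objective.

```python
def toy_fitness(mask, relevant=(0, 3)):
    """Hypothetical objective: reward selected relevant features, penalise subset size."""
    hits = sum(mask[i] for i in relevant)
    return 10 * hits - sum(mask)

def cooperative_search(n_features, n_groups, fitness, rounds=3):
    """Optimise a binary feature mask group by group against a shared context."""
    context = [1] * n_features  # start with every feature selected
    # Decompose the feature indices into n_groups interleaved sub-problems.
    groups = [list(range(g, n_features, n_groups)) for g in range(n_groups)]
    for _ in range(rounds):
        for group in groups:
            # Only this group's bits may change; all others act as fixed context.
            for i in group:
                trial = context[:]
                trial[i] = 1 - trial[i]
                if fitness(trial) > fitness(context):
                    context = trial
    return context

selected = cooperative_search(n_features=6, n_groups=2, fitness=toy_fitness)
```

    In a distributed setting, each group would typically be handled by a separate worker and the context merged between rounds; that coordination is omitted here.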

    Multiple Relevant Feature Ensemble Selection Based on Multilayer Co-Evolutionary Consensus MapReduce

    Although feature selection for large data has been intensively investigated in data mining, machine learning, and pattern recognition, the challenge is not just to invent new algorithms to handle noisy and uncertain large data in applications, but rather to link the multiple relevant feature sources, structured or unstructured, to develop an effective feature reduction method. In this paper, we propose a multiple relevant feature ensemble selection (MRFES) algorithm based on multilayer co-evolutionary consensus MapReduce (MCCM). We construct an effective MCCM model to handle feature ensemble selection of large-scale datasets with multiple relevant feature sources, and explore the unified consistency aggregation between the local solutions and global dominance solutions achieved by the co-evolutionary memeplexes, which participate in the cooperative feature ensemble selection process. This model attempts to reach a mutual decision agreement among co-evolutionary memeplexes, which calls for mechanisms to detect some noncooperative co-evolutionary behaviors and achieve better Nash equilibrium resolutions. Extensive experimental comparative studies on some well-known benchmark datasets substantiate the effectiveness of MRFES in solving large-scale dataset problems with complex noise and multiple relevant feature sources. The algorithm can greatly facilitate the selection of relevant feature subsets from the original feature space with better accuracy, efficiency, and interpretability. Moreover, we apply MRFES to human cerebral cortex-based classification prediction. Such successful applications are expected to significantly scale up classification prediction for large-scale and complex brain data in terms of efficiency and feasibility.

    IRS-BAG-Integrated Radius-SMOTE Algorithm with Bagging Ensemble Learning Model for Imbalanced Data Set Classification

    Imbalanced learning problems are a challenge faced by classifiers when data samples have an unbalanced distribution among classes. The Synthetic Minority Over-Sampling Technique (SMOTE) is one of the most well-known data pre-processing methods. Problems that arise when oversampling with SMOTE are noise, small disjunct samples, and overfitting due to a high imbalance ratio in a dataset. A high imbalance ratio and low variance cause the synthetic data to be concentrated in narrow areas and in conflicting regions among classes, which makes machine learning methods susceptible to overfitting during the learning process. Therefore, this research proposes a combination of Radius-SMOTE and the Bagging algorithm called the IRS-BAG model. For each sub-sample generated by bootstrapping, oversampling was done using Radius-SMOTE. Oversampling on the sub-sample was likely to overcome overfitting problems that might occur. Experiments were carried out by comparing the performance of the IRS-BAG model with various previous oversampling methods using imbalanced public datasets. The experimental results using three different classifiers showed that all classifiers gained a notable improvement when combined with the proposed IRS-BAG model compared with the previous state-of-the-art oversampling methods. DOI: 10.28991/ESJ-2023-07-05-04
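
    The overall pipeline in this abstract (bootstrap sub-samples, then oversample the minority class inside each sub-sample) can be sketched as below. This is a rough illustration under stated assumptions: plain SMOTE-style interpolation between two random minority points is used in place of Radius-SMOTE, whose details the abstract does not give, and the function names and toy data are hypothetical.

```python
import random

def bootstrap(data, rng):
    """Draw a sub-sample of the same size as data, with replacement."""
    return [rng.choice(data) for _ in data]

def oversample(bag, minority_label, rng):
    """Add synthetic minority points until the two classes in the bag are balanced."""
    minority = [x for x, y in bag if y == minority_label]
    if not minority:
        return list(bag)  # nothing to interpolate from in this bag
    needed = (len(bag) - len(minority)) - len(minority)
    synthetic = []
    for _ in range(max(0, needed)):
        a, b = rng.choice(minority), rng.choice(minority)
        t = rng.random()
        # Synthetic point on the segment between two minority samples.
        point = tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))
        synthetic.append((point, minority_label))
    return list(bag) + synthetic

def balanced_bags(data, minority_label, n_bags, seed=0):
    """Build n_bags bootstrap sub-samples, each oversampled independently."""
    rng = random.Random(seed)
    return [oversample(bootstrap(data, rng), minority_label, rng) for _ in range(n_bags)]

# Hypothetical toy data: (feature_vector, label), 8 majority vs 2 minority samples.
data = [((float(i), 0.0), 0) for i in range(8)] + [((1.0, 5.0), 1), ((2.0, 6.0), 1)]
bags = balanced_bags(data, minority_label=1, n_bags=5)
```

    A base classifier would then be trained on each bag and the predictions aggregated, as in ordinary bagging; that step is left out to keep the sketch short.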