
    Improving binary classification using filtering based on k-NN proximity graphs

    © 2020, The Author(s). One way of increasing recognition ability in a classification problem is removing outlier entries, as well as redundant and unnecessary features, from the training set. Filtering and feature selection can have a large impact on classifier accuracy and area under the curve (AUC), as noisy data can confuse a classifier and lead it to learn wrong patterns in the training data. A common approach to data filtering is the use of proximity graphs. However, the problem of selecting optimal filtering parameters is still insufficiently researched. In this paper, a filtering procedure based on the k-nearest-neighbours (k-NN) proximity graph is used. Filtering parameter selection is cast as an outlier-minimization problem: the k-NN proximity graph, the power of the distance, and the threshold parameters are selected to minimize the percentage of outliers in the training data. The performance of six commonly used classifiers (Logistic Regression, Naïve Bayes, Neural Network, Random Forest, Support Vector Machine, and Decision Tree) and one heterogeneous classifier combiner (DES-LA) is then compared with and without filtering. Dynamic ensemble selection (DES) systems work by estimating the competence of each classifier in a pool of classifiers; only the most competent ones are selected to classify a given test sample. This is achieved by defining a criterion to measure the competence of the base classifiers, such as their accuracy in local regions of the feature space around the query instance. In our case, the combiner is based on the local accuracy of the single classifiers, and its output is a linear combination of the single-classifier rankings. As a result of filtering, the accuracy of the DES-LA combiner increases substantially on low-accuracy datasets, but filtering does not have a significant impact on DES-LA performance on high-accuracy datasets.
The results are discussed, and the classifiers whose performance was most affected by the pre-processing filtering step are identified. The main contribution of the paper is the introduction of modifications to the DES-LA combiner, as well as a comparative analysis of the filtering impact on classifiers of various types. Testing the filtering algorithm on a real-world dataset (the Taiwan default credit card dataset) confirmed the efficiency of the automatic filtering approach.
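The proximity-graph filtering idea described above can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: it uses a plain Euclidean k-NN graph with a fixed `k` and a fixed disagreement threshold, whereas the paper tunes the graph, the distance power, and the threshold by minimizing the outlier percentage. The function name `knn_filter` and its defaults are purely illustrative.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import NearestNeighbors

def knn_filter(X, y, k=5, threshold=0.5):
    """Drop training points whose k nearest neighbours mostly carry a
    different class label (a simple proximity-graph outlier filter)."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)                     # idx[:, 0] is the point itself
    disagree = (y[idx[:, 1:]] != y[:, None]).mean(axis=1)
    keep = disagree <= threshold                  # keep points with enough same-class neighbours
    return X[keep], y[keep]

# Synthetic data with 10% label noise to give the filter something to remove.
X, y = make_classification(n_samples=200, n_informative=4, flip_y=0.1, random_state=0)
Xf, yf = knn_filter(X, y)
print(len(X), len(Xf))
```

Points sitting deep inside the opposite class's region get a high disagreement score and are removed, which is the same intuition the paper formalizes as outlier-percentage minimization.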

    A Novel Chimp Optimized Linear Kernel Regression (COLKR) Model for Call Drop Prediction in Mobile Networks

    Call failure can be caused by a variety of factors, including inadequate cellular infrastructure, poor system structuring, congested mobile phone towers, handovers between towers, and many more. Outdated equipment and networks worsen call failure, and installing more towers to improve coverage might harm regional ecosystems. In existing studies, a variety of machine learning algorithms have been applied to call drop prediction in mobile networks, but they suffer from high error rates, low prediction accuracy, system complexity, and long training times. Therefore, the proposed work develops a new framework, named Chimp Optimized Linear Kernel Regression (COLKR), for predicting call drops in mobile networks. For the analysis, Call Detail Records (CDRs) were collected and used in this framework. After preprocessing the attributes, a normalized dataset is constructed using a median regression-based filtering technique. To extract the most significant features for training the classifier with minimal processing complexity, the Chimp Optimization Algorithm (COA) is applied. Then, a machine learning model known as the Linear Kernel Regression Model (LKRM) is deployed to predict call drops with greater accuracy and less error. For the performance assessment of COLKR, several machine learning classifiers are compared with the proposed model using a variety of measures. With the proposed COLKR mechanism, the call drop detection accuracy improves to 99.4% and the error rate falls to 0.098%, demonstrating the efficiency and superiority of the proposed system.
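The overall pipeline (feature selection, then a linear-kernel regression thresholded into a call-drop decision) can be sketched as below. The abstract does not specify the internals of COA or LKRM, so this sketch substitutes stand-ins: a univariate filter (`SelectKBest`) in place of the chimp optimizer, scikit-learn's `KernelRidge` with a linear kernel in place of LKRM, and synthetic data in place of real CDRs. All of these substitutions are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split

# Synthetic stand-in for CDR features (the real framework uses Call Detail Records).
X, y = make_classification(n_samples=500, n_features=20, n_informative=6, random_state=0)

# Feature-selection step: a simple univariate filter standing in for COA.
X_sel = SelectKBest(f_classif, k=6).fit_transform(X, y)

X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)
model = KernelRidge(kernel="linear").fit(X_tr, y_tr)   # regression on 0/1 labels
pred = (model.predict(X_te) >= 0.5).astype(int)        # threshold into a drop / no-drop decision
accuracy = (pred == y_te).mean()
print(round(accuracy, 3))
```

The two-stage structure (shrink the feature set first, then fit a cheap linear-kernel model) is what gives the claimed low training time; the metaheuristic search only changes which features survive the first stage.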

    INFFC: An iterative class noise filter based on the fusion of classifiers with noise sensitivity control

    Supported by the Projects TIN2011-28488, TIN2013-40765-P, P10-TIC-06858 and P11-TIC-7765. J.A. Saez was supported by the EC under FP7, Coordination and Support Action, Grant Agreement Number 316097, ENGINE European Research Centre of Network Intelligence for Innovation Enhancement (http://engine.pwr.wroc.pl/). In classification, noise may deteriorate system performance and increase the complexity of the models built. To mitigate its consequences, several approaches have been proposed in the literature. Among them, noise filtering, which removes noisy examples from the training data, is one of the most widely used techniques. This paper proposes a new noise filtering method that combines several filtering strategies in order to increase the accuracy of the classification algorithms used after the filtering process. The filtering is based on the fusion of the predictions of several classifiers used to detect the presence of noise. We translate the idea behind multiple classifier systems, where the information gathered from different models is combined, to noise filtering: a combination of classifiers, instead of a single one, is used to detect noise. Additionally, the proposed method follows an iterative noise filtering scheme that avoids using detected noisy examples in each new iteration of the filtering process. Finally, we introduce a noisy score to control the filtering sensitivity, so that the amount of noisy examples removed in each iteration can be adapted to the practitioner's needs. The first two strategies (the use of multiple classifiers and iterative filtering) improve the filtering accuracy, whereas the last one (the noisy score) controls how conservative the filter is when removing potentially noisy examples. The validity of the proposed method is studied in an exhaustive experimental study.
We compare the new filtering method against several state-of-the-art methods for dealing with datasets with class noise and study its efficacy on three classifiers with different sensitivities to noise.

    A Comprehensive Survey on Rare Event Prediction

    Full text link
    Rare event prediction involves identifying and forecasting events with a low probability using machine learning and data analysis. Due to the imbalanced data distributions, where the frequency of common events vastly outweighs that of rare events, it requires using specialized methods within each step of the machine learning pipeline, i.e., from data processing to algorithms to evaluation protocols. Predicting the occurrences of rare events is important for real-world applications, such as Industry 4.0, and is an active research area in statistical and machine learning. This paper comprehensively reviews the current approaches for rare event prediction along four dimensions: rare event data, data processing, algorithmic approaches, and evaluation approaches. Specifically, we consider 73 datasets from different modalities (i.e., numerical, image, text, and audio), four major categories of data processing, five major algorithmic groupings, and two broader evaluation approaches. This paper aims to identify gaps in the current literature and highlight the challenges of predicting rare events. It also suggests potential research directions, which can help guide practitioners and researchers.Comment: 44 page
    • …
    corecore