8 research outputs found

    A conceptual model of enhanced undersampling technique

    Imbalanced datasets often degrade classifier performance. Undersampling is one of the approaches used to deal with the imbalanced-dataset problem. This paper discusses the advantages and disadvantages of several undersampling techniques. An enhanced distance-based undersampling technique is proposed to balance the imbalanced data that will be used for classification. Fuzzy logic has been integrated into the distance-based undersampling technique to resolve ambiguity and bias issues.
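
The abstract does not say which distance measure or selection rule the paper uses, so the following is only a minimal sketch of plain distance-based undersampling under stated assumptions: Euclidean distance, a centroid computed from the minority class, and a rule that keeps the majority points nearest to that centroid until both classes have the same size. The function names `centroid` and `distance_undersample` are hypothetical, and the fuzzy-logic component of the proposed technique is not modelled here.

```python
import math


def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    n = len(points)
    dims = len(points[0])
    return [sum(p[i] for p in points) / n for i in range(dims)]


def distance_undersample(majority, minority):
    """Keep len(minority) majority points, chosen by closeness to the minority centroid.

    Assumption (not from the paper): nearest-to-centroid selection with
    Euclidean distance. Other rules, such as keeping the farthest points,
    are equally possible readings of "distance-based".
    """
    c = centroid(minority)
    ranked = sorted(majority, key=lambda p: math.dist(p, c))
    return ranked[:len(minority)]


majority = [(0, 0), (1, 1), (5, 5), (8, 8), (10, 10)]
minority = [(4, 4), (6, 6)]
kept = distance_undersample(majority, minority)
print(kept)
```

After the call, `kept` and `minority` have the same number of examples, so a classifier trained on their union sees balanced classes.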

    Predicting the quantity of recycled end-of-life products using a hybrid SVR-based model

    End-of-life product recycling is crucial for achieving sustainability in circular supply chains and improving resource utilization. Forecasting the quantity of recycled end-of-life products is essential for planning and managing reverse supply chain operations. Decision-makers and practitioners can benefit from this information when designing reverse logistics networks, managing tactical disposal, planning capacity, and operational production. To address the challenge of small-sample data with multiple factors influencing the recycled quantity, and to deal with the randomness and nonlinearity of that quantity, a hybrid predictive model has been developed in this research. The model combines k-nearest neighbor mega-trend diffusion (KNNMTD), particle swarm optimization (PSO), and support vector regression (SVR), using data on end-of-life vehicles as a case study. Unlike existing literature, this research incorporates a data augmentation method to build an SVR-based model for end-of-life product recycling. The study shows that developing the predictive model using artificial virtual samples supported by the KNNMTD method is feasible, that the PSO algorithm effectively brings strong approximation ability to the SVR-based model, and that the KNNMTD-PSO-SVR model performs well in predicting the quantity of recycled end-of-life products. These research findings could be considered a fundamental component of a smart system for circular supply chains, which will enable the smart platform to achieve supply chain sustainability through resource allocation and regional industry deployment.

    Improved support vector machine classification for imbalanced medical datasets by novel hybrid sampling combining modified mega-trend-diffusion and bagging extreme learning machine model

    To handle imbalanced datasets in machine learning or deep learning models, some studies suggest sampling techniques that generate virtual examples of minority classes to improve the models' prediction accuracy. However, for kernel-based support vector machines (SVM), some sampling methods generate synthetic examples in the original data space rather than in the high-dimensional feature space. This may be ineffective in improving SVM classification for imbalanced datasets. To address this problem, we propose a novel hybrid sampling technique termed modified mega-trend-diffusion-extreme learning machine (MMTD-ELM) to effectively move the SVM decision boundary toward a region of the majority class. By this movement, the prediction of SVM for minority class examples can be improved. The proposed method combines the α-cut fuzzy number method for screening representative examples of the majority class and the MMTD method for creating new examples of the minority class. Furthermore, we construct a bagging ELM model to monitor the similarity between new examples and original data. In this paper, four datasets are used to test the efficiency of the proposed MMTD-ELM method in imbalanced data prediction. Additionally, we deployed two SVM models to compare the prediction performance of the proposed MMTD-ELM method with three state-of-the-art sampling techniques in terms of geometric mean (G-mean), F-measure (F1), index of balanced accuracy (IBA) and area under curve (AUC) metrics. A paired t-test is used to determine whether the suggested method differs to a statistically significant degree from the other sampling techniques on the four evaluation metrics. The experimental results demonstrate that the proposed method achieves the best average values in terms of G-mean, F1, IBA and AUC. Overall, the suggested MMTD-ELM method outperforms these sampling methods for imbalanced datasets.
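
Two of the metrics named in the abstract, G-mean and F1, can be computed directly from a binary confusion matrix. The sketch below uses the standard textbook definitions (G-mean as the square root of sensitivity times specificity, F1 as the harmonic mean of precision and recall) and treats the minority class as the positive class. The function name `imbalance_metrics` and the example counts are illustrative assumptions, not values from the paper.

```python
import math


def imbalance_metrics(tp, fn, fp, tn):
    """G-mean, F1 and accuracy from binary confusion-matrix counts.

    The positive class is assumed to be the minority class.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on the minority class
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on the majority class
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    g_mean = math.sqrt(sensitivity * specificity)
    if precision + sensitivity:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"g_mean": g_mean, "f1": f1, "accuracy": accuracy}


# A classifier that labels every example as the majority class:
# accuracy looks good, but G-mean and F1 collapse to zero.
always_majority = imbalance_metrics(tp=0, fn=10, fp=0, tn=90)

# A classifier that recovers most minority examples at the cost of false alarms.
balanced = imbalance_metrics(tp=8, fn=2, fp=10, tn=80)
print(always_majority, balanced)
```

The first case shows why accuracy alone is misleading on imbalanced data: it reaches 0.9 while both G-mean and F1 are 0.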

    A swarm intelligence-based ensemble learning model for optimizing customer churn prediction in the telecommunications sector

    In today's competitive market, predicting clients' behavior is crucial for businesses to meet their needs and prevent them from being attracted by competitors. This is especially important in industries like telecommunications, where the cost of acquiring new customers exceeds the cost of retaining existing ones. To achieve this, companies employ Customer Churn Prediction approaches to identify potential customer attrition and develop retention plans. Machine learning models are highly effective in identifying such customers; however, more effective techniques are needed to handle class imbalance and enhance prediction accuracy in complex churn datasets. To address these challenges, we propose a novel two-level stacking-mode ensemble learning model that utilizes the Whale Optimization Algorithm for feature selection and hyper-parameter optimization. We also introduce a method combining K-member clustering and Whale Optimization to effectively handle class imbalance in churn datasets. Extensive experiments conducted on well-known datasets, along with comparisons to other machine learning models and existing churn prediction methods, demonstrate the superiority of the proposed approach.

    An integrated approach to artificial neural network based process modelling

    ANN technology exploded into the world of process modelling and control in the late 1980s. The technology shows great promise and is seen as one that could provide models for most systems without the need to understand the fundamental behaviour or relationships among the process variables. Today, ANNs have been applied successfully in a number of areas of process modelling and control, with the best-established applications being in the area of inferential measurements or soft sensors. Unfortunately, ‘the free lunch did not have much meat’. Over time, people focused more on the true capabilities and power of ANN: the ability to model nonlinear relationships in data without having to define the form of the nonlinearity. However, there is often a tendency to merely plug in the data, turn the ANN training software on, and blindly accept the results. This is probably inevitable since, to date, there are no textbooks or scientific journal papers providing an integrated and systematic approach for ANN model development addressing the pre-modelling, training and post-modelling stages. Therefore, addressing issues in those three phases of ANN model development is essential to support and further improve applications of ANN technology in the area of process modelling and control. The model development issues in the pre-modelling and training phases were addressed by reviewing current practice and existing techniques. For each issue, a novel method was proposed to improve the performance of ANN models. The new approaches were tested in a variety of benchmarking studies using artificial samples and coal property datasets from power station boilers. The research work on the post-modelling stage, which emphasises taking the lid off the black-box model, proposes a novel technique to extract knowledge from the models and simultaneously obtain a better understanding of the process. Post-modelling phase issues were addressed thoroughly, including construction of prediction limits, sensitivity analysis and development of a mathematical representation of the trained ANN model. Confidence intervals of the ANN models were analysed to construct the prediction boundary of the model. This analysis provides useful information related to interpolation and extrapolation of the model. It also highlighted how well the ANN models can be used for extrapolation purposes. An approach based on sensitivity analysis of hidden layers is also proposed to understand the behaviour of the ANN models. Using this technique, knowledge and information are retrieved from the developed models. A comparative study of the proposed techniques and current practice was also presented. The last topic addressed in this thesis is knowledge extraction from ANN models using mathematical analysis of the hidden layers. The proposed analysis is applied in order to open the black box of the ANN models and is applied to simulated and real historical plant data so that useful information from those data and a better understanding of the process are obtained. All in all, efforts have been made in this thesis to minimise the use of abstract mathematical language and, in some cases, to simplify the language so that ANN modelling theory can be understood by a wider audience, especially new practitioners in ANN-based modelling and control. It is hoped that the insight provided in the dissertation will provide an integrated approach to the pre-modelling, training and post-modelling stages of ANN models. This ‘new guideline’ for ANN model development is unique and beneficial, providing a systematic framework for the preparation, design, evaluation and implementation of ANN models in process modelling and control in particular, and as a prediction and forecasting tool in general.

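    Lernfähiges Assistenzsystem zur Optimierung der Planung maritimer Großprojekte in der Anbahnungsphase (A learning assistance system for optimizing the planning of large-scale maritime projects in the initiation phase)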

    This dissertation develops a digital assistance system that supports and optimizes the planning processes in the initiation phase of large-scale maritime projects. To this end, an industry-specific simulation core is developed whose database is completed by means of machine learning, which makes its application in early project phases possible in the first place despite the comparatively poor data availability. In addition, a data interface is developed to ensure the integration of the subsystems into a holistic assistance system.

    Using machine learning technique to classify geographic areas with socioeconomic potential for broadband investment in Malaysia

    Telecommunication companies (TELCOs) in Malaysia commonly use the return on investment (ROI) model for techno-economic analysis to strategize their network investment plans in their intended markets. The number of subscribers and the average revenue per user (ARPU) are the two dominant contributors to a good ROI. Rural areas are lacking in both factors and thus very often fall outside the radar of TELCOs' investment plans. Government agencies, therefore, shoulder the responsibility of providing broadband services in rural areas through the implementation of national broadband initiatives, regulated policies and funding for universal service provision. This thesis outlines a machine learning framework which TELCOs and government agencies can use to plan broadband investments in Malaysia, especially for rural areas. The framework is implemented in four stages: data collection, machine learning, machine testing, and machine application. In this framework, a curve-fitting technique is applied to formulate an empirical model using prototyping data from the World Bank databank. The empirical model serves as a fitness function for a genetic algorithm (GA) that generates large numbers of virtual samples to train, validate and test the support vector machines (SVM). Real-life field data for geographic areas in Malaysia are then provided to the tested SVM to predict which areas have the socioeconomic potential for broadband investment. By using this technique as a policy tool, TELCOs and government agencies will be able to prioritize areas where broadband infrastructure can be implemented using a government-industry partnership approach. Both public and private parties can share the initial cost and collect future revenues appropriately as the socioeconomic correlation coefficient improves.