8 research outputs found

A conceptual model of enhanced undersampling technique

Author: Ku-Mahamud Ku Ruhana
Mohamed Din Aniza
Zorkeflee Maisarah
Publication date: 12/08/2014

Imbalanced datasets often degrade classifier performance. Undersampling is one of the approaches used to deal with the imbalanced-dataset problem. This paper discusses the advantages and disadvantages of several undersampling techniques. An enhanced distance-based undersampling technique is proposed to balance the imbalanced data that will be used for classification. Fuzzy logic has been integrated into the distance-based undersampling technique to resolve the ambiguity and bias issues
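
As a rough illustration of what distance-based undersampling can look like, the sketch below keeps only the majority-class points nearest to the minority-class centroid until both classes have the same size. This is a generic heuristic chosen for illustration, not the enhanced fuzzy technique proposed in the paper; the function name `undersample_by_distance`, the centroid rule, and the toy data are all assumptions.

```python
import math

def undersample_by_distance(majority, minority):
    """Keep len(minority) majority points closest to the minority centroid.

    Hypothetical selection rule for illustration only; the paper's
    fuzzy-logic method is not reproduced here.
    """
    dims = len(minority[0])
    # Centroid of the minority class, one coordinate at a time.
    centroid = [sum(p[d] for p in minority) / len(minority) for d in range(dims)]
    # Rank majority points by Euclidean distance to that centroid.
    ranked = sorted(majority, key=lambda p: math.dist(p, centroid))
    # Truncate so both classes end up the same size.
    return ranked[:len(minority)]

# Toy data (made up): five majority points, two minority points.
majority = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0), (2.0, 2.0)]
minority = [(1.0, 2.0), (2.0, 1.0)]
kept = undersample_by_distance(majority, minority)
```

Keeping the nearest points preserves the region where the classes meet; keeping the farthest points instead would be an equally valid assumption with a different effect on the decision boundary.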

UUM Repository

Predicting the quantity of recycled end-of-life products using a hybrid SVR-based model

Author: Han Ji
Milisavljevic-Syed Jelena
Xia Hanbing
Publication venue: American Society of Mechanical Engineers
Publication date: 21/11/2023

End-of-life product recycling is crucial for achieving sustainability in circular supply chains and improving resource utilization. Forecasting the quantity of recycled end-of-life products is essential for planning and managing reverse supply chain operations. Decision-makers and practitioners can benefit from this information when designing reverse logistics networks, managing tactical disposal, planning capacity, and operational production. To address the challenge of small sample data with multiple factors influencing the recycling number, and to deal with the randomness and nonlinearity of the recycling quantity, a hybrid predictive model has been developed in this research. The model is based on k-nearest neighbor mega-trend diffusion (KNNMTD), particle swarm optimization (PSO), and support vector regression (SVR), using data from the field of end-of-life vehicles as a case study. Unlike existing literature, this research incorporates the data augmentation method to build an SVR-based model for end-of-life product recycling. The study shows that developing the predictive model using artificial virtual samples supported by the KNNMTD method is feasible, that the PSO algorithm effectively brings strong approximation ability to the SVR-based model, and that the KNNMTD-PSO-SVR model performs well in predicting the quantity of recycled end-of-life products. These research findings could be considered a fundamental component of the smart system for circular supply chains, which will enable the smart platform to achieve supply chain sustainability through resource allocation and regional industry deployment

Cranfield CERES

Improved support vector machine classification for imbalanced medical datasets by novel hybrid sampling combining modified mega-trend-diffusion and bagging extreme learning machine model

Author: Chen-Huan Kao
Hao-Hsuan Chen
Hung-Yu Chen
Liang-Sian Lin
Yi-Jie Li
Publication venue: AIMS Press
Publication date: 01/09/2023

To handle imbalanced datasets in machine learning or deep learning models, some studies suggest sampling techniques to generate virtual examples of minority classes to improve the models' prediction accuracy. However, for kernel-based support vector machines (SVM), some sampling methods suggest generating synthetic examples in an original data space rather than in a high-dimensional feature space. This may be ineffective in improving SVM classification for imbalanced datasets. To address this problem, we propose a novel hybrid sampling technique termed modified mega-trend-diffusion-extreme learning machine (MMTD-ELM) to effectively move the SVM decision boundary toward a region of the majority class. By this movement, the prediction of SVM for minority class examples can be improved. The proposed method combines α-cut fuzzy number method for screening representative examples of majority class and MMTD method for creating new examples of the minority class. Furthermore, we construct a bagging ELM model to monitor the similarity between new examples and original data. In this paper, four datasets are used to test the efficiency of the proposed MMTD-ELM method in imbalanced data prediction. Additionally, we deployed two SVM models to compare prediction performance of the proposed MMTD-ELM method with three state-of-the-art sampling techniques in terms of geometric mean (G-mean), F-measure (F1), index of balanced accuracy (IBA) and area under curve (AUC) metrics. Furthermore, paired t-test is used to elucidate whether the suggested method has statistically significant differences from the other sampling techniques in terms of the four evaluation metrics. The experimental results demonstrated that the proposed method achieves the best average values in terms of G-mean, F1, IBA and AUC. Overall, the suggested MMTD-ELM method outperforms these sampling methods for imbalanced datasets
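
Two of the metrics named in this abstract, G-mean and F-measure (F1), can be computed directly from a binary confusion matrix. The sketch below uses the standard definitions, with the minority class treated as the positive class; the function name `imbalance_metrics` and the example counts are made up for illustration and are not results from the paper. It assumes every denominator is non-zero.

```python
import math

def imbalance_metrics(tp, fn, fp, tn):
    """Return (G-mean, F1) from binary confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall on the positive (minority) class
    specificity = tn / (tn + fp)  # recall on the negative (majority) class
    precision = tp / (tp + fp)
    # G-mean: geometric mean of the two per-class recalls.
    g_mean = math.sqrt(sensitivity * specificity)
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return g_mean, f1

# Hypothetical counts: 10 minority and 90 majority examples.
g_mean, f1 = imbalance_metrics(tp=8, fn=2, fp=10, tn=80)
```

Unlike plain accuracy, both scores drop sharply when the minority class is poorly recognised, which is why they are commonly reported for imbalanced data.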

Directory of Open Access Journals

A swarm intelligence-based ensemble learning model for optimizing customer churn prediction in the telecommunications sector

Author: Ali Taghizadeh Herat
Alireza Tamjid Yamcholo
Asghar Darigh
Bijan Moradi
Mehran Khalaj
Publication venue: AIMS Press
Publication date: 01/01/2024

In today's competitive market, predicting clients' behavior is crucial for businesses to meet their needs and prevent them from being attracted by competitors. This is especially important in industries like telecommunications, where the cost of acquiring new customers exceeds the cost of retaining existing ones. To achieve this, companies employ Customer Churn Prediction approaches to identify potential customer attrition and develop retention plans. Machine learning models are highly effective in identifying such customers; however, there is a need for more effective techniques to handle class imbalance in churn datasets and enhance prediction accuracy in complex churn prediction datasets. To address these challenges, we propose a novel two-level stacking-mode ensemble learning model that utilizes the Whale Optimization Algorithm for feature selection and hyper-parameter optimization. We also introduce a method combining K-member clustering and Whale Optimization to effectively handle class imbalance in churn datasets. Extensive experiments conducted on well-known datasets, along with comparisons to other machine learning models and existing churn prediction methods, demonstrate the superiority of the proposed approach

Directory of Open Access Journals

An integrated approach to artificial neural network based process modelling

Author: Saptoro Agus
Publication venue: Curtin University
Publication date: 01/01/2010

ANN technology exploded into the world of process modelling and control in the late 1980s. The technology shows great promise and is seen as a technology that could provide models for most systems without the need to understand the fundamental behaviour or relationships among the process variables. Today, ANNs have been applied successfully in a number of areas of process modelling and control, with the best-established applications being in the area of inferential measurements or soft sensors. Unfortunately, ‘the free lunch did not have much meat’. Over time, people focused more on the true capabilities and power of ANN, the ability to model nonlinear relationships in data without having to define the form of the nonlinearity. However, there is often a tendency to merely plug in the data, turn the ANN training software on, and blindly accept the results. This is probably inevitable since, to date, there are no textbooks or scientific journal papers providing an integrated and systematic approach for ANN model development addressing pre-modelling, training and post-modelling stages. Therefore, addressing issues in those three phases of ANN model development is essential to support and to improve further applications of ANN technology in the area of process modelling and control. The model development issues in pre-modelling and training phases were addressed by reviewing current practice and existing techniques. For each issue, a novel method was proposed to improve the performance of ANN models. The new approaches were tested in a variety of benchmarking studies using artificial samples and coal property datasets from power station boilers. The research work in the post-modelling stage analysis, which emphasises taking the lid off the black-box model, proposes a novel technique to extract knowledge from the models and simultaneously obtain better understanding of the process.
Post-modelling phase issues were addressed thoroughly, including construction of prediction limits, sensitivity analysis and development of a mathematical representation of the trained ANN model. Confidence intervals of the ANN models were analysed to construct the prediction boundary of the model. This analysis provides useful information related to interpolation and extrapolation of the model. It also highlighted how well the ANN models can be used for extrapolation purposes. An effort based on sensitivity analysis of hidden layers is also proposed to understand the behaviours of the ANN models. Using this technique, knowledge and information are retrieved from the developed models. A comparative study of the proposed techniques and the current practice was also presented. The last topic addressed in this thesis is knowledge extraction of ANN models using mathematical analysis of the hidden layers. The proposed analysis is applied in order to open the black box of the ANN models and is applied to simulated and real historical plant data so that useful information from those data and better understanding of the process are obtained. All in all, efforts have been made in this thesis to minimise the use of abstract mathematical language and, in some cases, simplify the language so that ANN modelling theory can be understood by a wider audience, especially new practitioners in ANN-based modelling and control. It is hoped that the insight provided in the dissertation will provide an integrated approach to pre-modelling, training and post-modelling stages of ANN models. This ‘new guideline’ of ANN model development is unique and beneficial, providing a systematic framework for the preparation, design, evaluation and implementation of ANN models in process modelling and control in particular and as a prediction/forecasting tool in general

espace@Curtin

Quantitative image analysis of peripheral nerves in whiplash injury patients

Author: Anantharaman Kamakshi Pradeep
Publication date: 21/03/2018

The research in this thesis has examined the use of texture and shape analysis to characterise Magnetic Resonance (MR) images of peripheral nerves in order to provide a potential quantitative tool for better diagnosis and treatments. Texture and shape can be considered as inherent properties of all surfaces and have the potential to provide sensitive information which cannot be quantitatively perceived by human vision. Texture analysis has been successfully used in image classification of aerial and satellite imagery and the diagnosis and prognosis of several types of cancer. However, to date, it has never been used in investigating peripheral nerve damage. In this thesis, we study the application of texture and shape analysis to the peripheral nerves in the upper extremities of patients suffering from Whiplash Associated Disorders (WAD). Specifically, quantitative texture analysis was performed on MR images of the carpal tunnel which contains the median nerve. The median nerve was studied to identify differences in textural patterns. Texture methods such as: first order features; co-occurrence matrices; run-length matrices and autocorrelation function were applied and their performance was assessed. Texture analysis was also performed to investigate nerve damage in the MR images of the brachial plexus, both in controls and patients. Further, spatial domain shape metrics were used to quantify and study the morphological differences of the median nerve in controls and patients. This highlighted that some significant differences exist between groups and thus could potentially be reliably used in combination with clinical scale metrics to identify possible nerve damage. As MR images contain noise, locating the median nerve accurately to perform image analysis is very important. 
Therefore, we further investigated the application of an enhanced correlation filtering method that could be trained on images of the median nerve and then applied to detect the median nerve in test images. The Optimal Trade-off Maximum Average Correlation Height (OT-MACH) filter includes the expected distortions in the target in the construction of the filter reference function. The OT-MACH filter was tuned in a bandpass to maximize the correlation peak and thereby successfully locate the position of the median nerve in the carpal tunnel. This study has successfully demonstrated that texture and shape analysis can be used to investigate possible peripheral nerve damage. Further research is required using larger datasets to establish a quantitative image analysis tool to support clinical decision making and thereby improve patient care and treatment outcome
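
Of the texture methods listed in this abstract, the co-occurrence matrix is easy to show on a toy example. The sketch below builds a normalised grey-level co-occurrence matrix for one pixel offset and derives the contrast feature from it. It is a minimal pure-Python illustration of the standard definition, not the thesis's pipeline; the function names `glcm` and `contrast`, the single horizontal offset, and the 4x4 test image are assumptions.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix for the pixel offset (dx, dy).

    image: 2-D list of integer grey levels in range(levels).
    Counts ordered pairs only (not symmetrised).
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            # Skip pairs whose neighbour falls outside the image.
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def contrast(p):
    """Contrast feature: pair probabilities weighted by squared level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

# Toy 4x4 image with four grey levels (illustrative data only).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]
p = glcm(image, levels=4)
texture_contrast = contrast(p)
```

Contrast is zero for a uniform region and grows as neighbouring pixels differ more in grey level; other co-occurrence features are computed from the same matrix `p`.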

Sussex Research Online

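Lernfähiges Assistenzsystem zur Optimierung der Planung maritimer Großprojekte in der Anbahnungsphase [Learning-capable assistance system for optimizing the planning of large maritime projects in the initiation phase]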

Author: Illgen Benjamin (gnd: 1261263243)
Publication venue: Universität Rostock Rostock

In this dissertation, a digital assistance system is developed that supports and optimizes the planning processes in the initiation phase of large maritime projects. To this end, an industry-specific simulation core is developed whose database is completed by means of machine learning, in order to make application in early project phases possible at all despite the comparatively poor data situation. In addition, a data interface is developed to ensure the integration of the subsystems into a holistic assistance system

Rostocker Dokumentenserver

Using machine learning technique to classify geographic areas with socioeconomic potential for broadband investment in Malaysia

Author: Tan Taik Guan
Publication date: 27/02/2022

Telecommunication companies (TELCOs) in Malaysia commonly use the return on investment (ROI) model for techno-economic analysis to strategize their network investment plans in their intended markets. The number of subscribers and average revenue per user (ARPU) are the two dominant contributors to a good ROI. Rural areas are lacking in both dominant factors and thus very often fall outside the radar of TELCOs' investment plans. Government agencies, therefore, shoulder the responsibility to provide broadband services in rural areas through the implementation of national broadband initiatives, regulated policies and funding for universal service provision. This thesis outlines a machine learning framework that TELCOs and government agencies can use to plan for broadband investments in Malaysia, especially for rural areas. The framework is implemented in four stages: data collection, machine learning, machine testing, and machine application. In this framework, a curve-fitting technique is applied to formulate an empirical model by using prototyping data from the World Bank databank. The empirical model serves as a fitness function for a genetic algorithm (GA) to generate large virtual samples to train, validate and test the support vector machines (SVM). Real-life field data for geographic areas in Malaysia are then provided to the tested SVM to predict which areas have the socioeconomic potential for broadband investment. By using this technique as a policy tool, TELCOs and government agencies will be able to prioritize areas where broadband infrastructure can be implemented using a government-industry partnership approach. Both public and private parties can share the initial cost and collect future revenues appropriately as the socioeconomic correlation coefficient improves

Nottingham eTheses