5 research outputs found

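    Application of Random Over-Under Sampling and Random Forest Methods for Credit Scoring Classification (Penerapan Metode Random Over-Under Sampling dan Random Forest Untuk Klasifikasi Penilaian Kredit)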

    Abstract: Credit scoring has become one of the main ways for a financial institution to assess credit risk, improve cash flow, reduce risk exposure, and make managerial decisions. One problem in credit scoring is the imbalanced class distribution of the datasets. Class imbalance can be addressed by resampling: oversampling, undersampling, or a hybrid that combines both. This study proposes applying Random Over-Under Sampling with Random Forest to improve classification accuracy for credit scoring on the German Credit dataset. The test results show that classification without resampling yields an average accuracy of 70% across all classifiers. Random Forest achieves better accuracy than the other methods compared, at 0.76 (76%). Applying Random Over-Under Sampling with Random Forest raises accuracy by 14.1 percentage points, to 0.901 (90.1%). The results indicate that resampling with Random Over-Under Sampling can effectively improve the accuracy of Random Forest on imbalanced classification for credit scoring on the German Credit dataset.
    Keywords: Class Imbalance, Credit Scoring, Random Forest, Classification, Resampling
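
The hybrid resampling idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `random_over_under_sample`, the choice of the mean class size as the target, and the toy 700/300 data are all assumptions made for illustration.

```python
import random
from collections import Counter


def random_over_under_sample(rows, labels, target_per_class=None, seed=0):
    """Resample so that every class ends up with the same number of rows.

    Classes larger than the target are randomly undersampled (without
    replacement); smaller classes are randomly oversampled (with replacement).
    """
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    if target_per_class is None:
        # Assumption: use the mean class size as the common target.
        target_per_class = round(len(rows) / len(by_class))
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        if len(members) >= target_per_class:
            chosen = rng.sample(members, target_per_class)
        else:
            missing = target_per_class - len(members)
            chosen = members + [rng.choice(members) for _ in range(missing)]
        out_rows.extend(chosen)
        out_labels.extend([label] * target_per_class)
    return out_rows, out_labels


# Toy data with a 70/30 class imbalance (hypothetical, one feature per row).
rows = [[i] for i in range(1000)]
labels = [0] * 700 + [1] * 300
new_rows, new_labels = random_over_under_sample(rows, labels)
print(Counter(new_labels))
```

The balanced rows would then be passed to any classifier, such as a random forest. In practice the resampling is usually applied to the training split only, so that duplicated rows do not leak into the evaluation data.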

    Development of a Data-Driven Soft Sensor for Multivariate Chemical Processes Using Concordance Correlation Coefficient Subsets Integrated with Parallel Inverse-Free Extreme Learning Machine

    Nonlinearity, complexity, and technological limitations make measurements difficult in multivariate chemical processes. To deal with these problems, a soft sensor based on concordance correlation coefficient subsets integrated with a parallel inverse-free extreme learning machine (CCCS-PIFELM) is proposed for multivariate chemical processes. Compared with a traditional extreme learning machine (ELM), i.e., a feedforward neural network with a single hidden layer, the CCCS-PIFELM approach has two notable features. First, the input variables are divided into two subsets according to the concordance correlation coefficient (CCC) values between input and output variables, so the impact of input variables on output variables can be assessed. Second, an inverse-free algorithm is used to reduce the computational load. To evaluate prediction performance, the Tennessee Eastman (TE) benchmark process is used as a case study in which the CCCS-PIFELM approach predicts product compositions. According to the simulation results, the proposed CCCS-PIFELM approach achieves higher prediction accuracy than traditional approaches.
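
As a rough illustration of the first feature, the sketch below computes Lin's concordance correlation coefficient and splits input variables into two groups. The abstract does not say how the two subsets are formed, so the rule used here (a cut-off on the absolute CCC), the `threshold` value, and the variable names are hypothetical.

```python
def concordance_correlation(x, y):
    """Lin's concordance correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population (1/n) moments, as in Lin's definition.
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def split_by_ccc(inputs, output, threshold=0.5):
    """Split input variables into (strong, weak) by |CCC| with the output.

    The thresholding rule is an assumption made for this sketch.
    """
    strong, weak = [], []
    for name, series in inputs.items():
        score = abs(concordance_correlation(series, output))
        (strong if score >= threshold else weak).append(name)
    return strong, weak


# Hypothetical measurements: x1 tracks the output closely, x2 does not.
output = [1, 2, 3, 4, 5]
inputs = {"x1": [1.1, 1.9, 3.2, 3.9, 5.1], "x2": [5, 1, 4, 2, 3]}
strong, weak = split_by_ccc(inputs, output)
print(strong, weak)
```

Unlike the Pearson correlation, the CCC also penalises differences in mean and scale, which is why the squared mean difference appears in the denominator.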

    Which Channel to Ask My Question? Personalized Customer Service Request Stream Routing using Deep Reinforcement Learning

    Customer services are critical to all companies, as they may directly affect brand reputation. Because of the large number of customers, e-commerce companies often employ multiple communication channels to answer customers' questions, for example, chatbot and hotline. On the one hand, each channel has limited capacity to respond to customers' requests; on the other hand, customers have different preferences over these channels. Current production systems are mainly built on business rules, which give only limited consideration to the tradeoff between resources and customers' satisfaction. To achieve the optimal tradeoff between resources and customers' satisfaction, we propose a new framework based on deep reinforcement learning, which directly takes both resources and a user model into account. In addition to the framework, we also propose a new deep-reinforcement-learning-based routing method: double dueling deep Q-learning with prioritized experience replay (PER-DoDDQN). We evaluate our proposed framework and method using both synthetic data and real customer service log data from a large financial technology company. We show that our proposed deep-reinforcement-learning-based framework is superior to the existing production system. Moreover, we also show that our proposed PER-DoDDQN performs better in practice than all other deep Q-learning variants tested, providing a better routing plan. These observations suggest that our proposed method can find the tradeoff at which both channel resources and customers' satisfaction are optimal. Comment: 13 pages, 7 figures
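
The method name combines three standard ingredients of deep Q-learning, which the following sketch shows in isolation with plain Python lists in place of neural networks. It is not the paper's implementation: the state, reward, network outputs, and the `alpha` and `eps` values are hypothetical, and only the two channel names come from the abstract.

```python
def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def double_q_target(reward, gamma, done, next_q_online, next_q_target):
    """Double Q-learning target: the online network picks the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


def replay_priority(td_error, alpha=0.6, eps=1e-6):
    """Proportional prioritized replay: priority grows with the absolute TD error."""
    return (abs(td_error) + eps) ** alpha


channels = ["chatbot", "hotline"]
# Hypothetical network outputs for the next state (value head, advantage head).
next_q_online = dueling_q_values(1.0, [0.5, -0.5])
next_q_target = dueling_q_values(0.8, [0.0, 0.4])
target = double_q_target(reward=1.0, gamma=0.9, done=False,
                         next_q_online=next_q_online,
                         next_q_target=next_q_target)
current_q = 1.2  # hypothetical Q(s, a) for the routed request
print(target, replay_priority(target - current_q))
```

Separating action selection from action evaluation is what reduces the overestimation bias of ordinary Q-learning. Transitions with larger temporal-difference errors are replayed more often.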

    Performance analysis and characterisation of a high concentrating solar photovoltaic receiver

    Solar energy is deemed to be one of the most efficient and clean energy resources for generating electricity. Photovoltaic technologies have a promising future in space and terrestrial applications. Photovoltaic concentration is a technique to increase the conversion efficiency of high-efficiency solar cells. Multi-junction solar cells are designed to exploit a larger range of solar spectrum photons and convert them to electricity. In this study, triple-junction III–V compound solar cells consisting of GaInP/GaInAs/Ge semiconductor materials are considered. This work investigates the performance characterisation of terrestrial multi-junction solar cells, which is important for the design of high concentration photovoltaic systems. The research has developed a model of a III–V solar cell operating at high flux conditions induced by light concentration. The thermal management of such an assembly is a focus of this work. This research also presents the effects of Air Mass (AM) on solar cell performance. This atmospheric parameter has a strong influence on the behaviour of high concentrating photovoltaic solar cells. As air mass increases, the corresponding Direct Normal Irradiance (DNI) and Cell Temperature (Tc) decrease. The effects of atmospheric changes in air mass (AM = 1–10D) on triple-junction solar cells have been assessed. In High Concentration Photovoltaic (HCPV) systems, concentrating light onto a relatively small solar cell area leads to high power densities. Effective thermal management is essential to avoid damaging high temperatures. A thermal model using a convergent iterative technique has been developed; the predicted convergent cell temperature limit is ≤ 80 °C. The proportion of the incident radiation not converted to electricity generates heat; this is a function of material temperature coefficients and current mismatch in variable atmospheric conditions, and it results in an increase in cell temperature.
The rate of heat loss by convective transfer is also considered for air mass values AM = 1.5, 4 and 8D. In addition, a Finite Element Method (FEM) model is developed in COMSOL Multiphysics® to predict the temperature distribution of the PV cells and the thermal behaviour of the receiver assembly. Furthermore, a transient model of the HCPV cell has been developed using MATLAB® Live-Link with COMSOL Multiphysics. To characterise the behaviour of a triple-junction solar cell, it is essential to find the transient cell operating temperature. The behaviour of the electrical parameters Jsc, Voc, FF and conversion efficiency is considered. In the proposed model, a dynamic efficiency is compared with a constant efficiency, and the error is about 12%. The research has given a better understanding of the overall daily and annual performance prediction of CPVs and is important for future system design in variable environmental conditions. At higher values of DNI and Tamb and at lower AM, enhanced or forced convection is needed to keep the cell at or below its safe operating temperature and to optimise energy yield. For long-term performance evaluation, the average of the monthly variations of atmospheric parameters throughout the year is considered. During the summer months, the atmospheric parameters reach higher values, which need more consideration. Over the year, the cell operating temperature exceeds 80 °C about 13% of the time, which occurs during the summer season. Cell temperatures between 65 and 70 °C predominate in the spring and autumn seasons and account for about 24% of the time (the highest frequency).

    Evolutionary multivariate time series prediction

    Multivariate time series (MTS) prediction plays a significant role in many practical data mining applications, such as the finance, energy supply, and medical care domains. Over the years, various prediction models have been developed to obtain robust and accurate predictions. However, this is not an easy task given a variety of key challenges. First, not all channels (each channel represents one time series) are informative (channel selection). Given the complexity of each selected time series, it is difficult to predefine the time window used for inputs. Second, since the selected time series may come from different domains and be collected with different devices, they may require different feature extraction techniques with suitable parameters to extract meaningful features (feature extraction), which in turn influences the selection and configuration of the predictor, i.e., prediction (configuration). The challenge arising from channel selection, feature extraction, and prediction (configuration) is to perform them jointly to improve prediction performance. Third, we resort to ensemble learning to solve the MTS prediction problem composed of the previously mentioned operations, where the challenge is to obtain a set of models that are both accurate and diverse. Each of these challenges leads to an NP-hard combinatorial optimization problem, which cannot be solved using traditional methods because it is non-differentiable. The evolutionary algorithm (EA), an efficient metaheuristic stochastic search technique that is highly competent at solving complex combinatorial optimization problems with mixed types of decision variables, may provide an effective way to address the challenges arising from MTS prediction. The main contributions are supported by the following investigations.
First, we propose a discrete evolutionary model that seeks the influential subset of MTS channels and the optimal time window for each selected channel for the MTS prediction task. A comprehensive experimental study on real-world electricity consumption data with auxiliary environmental factors demonstrates the efficiency and effectiveness of the proposed method in searching for the informative time series, their respective time windows, and the predictor parameters, in comparison to the result obtained through enumeration. Subsequently, we define the basic MTS prediction pipeline containing channel selection, feature extraction, and prediction (configuration). To perform these key operations, we propose an evolutionary model construction (EMC) framework that simultaneously seeks the optimal subset of MTS channels, suitable feature extraction methods and time windows for the selected channels, and the parameter settings of the predictor for the best prediction performance. To implement EMC, a two-step EA is proposed: the first-step EA mainly focuses on channel selection, while in the second step a specially designed EA works on feature extraction and prediction (configuration). A real-world electricity dataset with exogenous environmental information is used, and the whole dataset is also split into two further datasets according to holiday and non-holiday events. The performance of EMC is demonstrated on all three datasets in comparison to hybrid models and some existing methods. Then, based on the prediction pipeline defined previously, we propose an evolutionary multi-objective ensemble learning model (EMOEL) that employs a multi-objective evolutionary algorithm (MOEA) subject to two conflicting objectives, i.e., accuracy and model diversity.
The MOEA leads to a Pareto front (PF) composed of non-dominated optimal solutions, each of which represents an optimal subset of selected channels, the selected feature extraction methods and time windows, and the selected parameters of the predictor. To boost the final prediction accuracy, the models corresponding to these optimal solutions are linearly combined, with the combination coefficients optimized via a single-objective task-oriented EA. The superiority of EMOEL is shown on electricity consumption data with climate information in comparison to several state-of-the-art models. We also propose a multi-resolution selective ensemble learning model, where multiple resolutions are constructed from the minimal granularity using statistics. At the current time stamp, the preceding time series data is sampled at different time intervals (i.e., resolutions) to constitute the time windows. For each resolution, multiple base learners with different parameters are first trained. A feature selection technique is applied to search for the optimal set of trained base learners, and least squares regression is used to combine them. The performance of the proposed ensemble model is verified on the electricity consumption data for next-step and next-day prediction. Finally, based on EMOEL and the multi-resolution model, instead of only combining the models generated from each PF, we propose an evolutionary ensemble learning (EEL) framework, where multiple PFs are aggregated to produce a composite PF (CPF) after duplicate solutions in the PFs are removed and the remainder is sorted into different levels of non-dominated fronts (NDFs). Feature selection techniques are applied to find the optimal subset of models in the level-accumulated NDF, and least squares is used to combine the selected models. The performance of EEL, which uses three different predictors as base learners, is evaluated through a comprehensive analysis of parameter sensitivity.
The superiority of EEL is demonstrated in comparison to the best result from the single-objective EA, the best individual from the PF, and several state-of-the-art models across electricity consumption and air quality datasets, both of which use environmental factors from other domains as auxiliary factors. In summary, this thesis provides studies on how to build efficient and effective models for MTS prediction. The frameworks investigate the influential factors, consider the pipeline composed of channel selection, feature extraction, and prediction (configuration) simultaneously, and maintain good generalization and accuracy across different applications. The algorithms proposed to implement the frameworks use techniques from evolutionary computation (single-objective EA and MOEA), machine learning, and data mining. We believe that this research provides a significant step towards constructing robust and accurate models for solving MTS prediction problems. In addition, the case study on electricity consumption prediction will help decision-makers determine the trend of future energy consumption for scheduling and planning the operations of the energy supply system.
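
The notion of a Pareto front of non-dominated solutions used in this abstract can be sketched as follows. This is only the dominance filter, not the thesis's evolutionary algorithm; the candidate models and their scores are hypothetical, and diversity is negated so that both objectives are minimised.

```python
def dominates(a, b):
    """True when a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimised)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate models scored as (prediction error, -diversity).
candidates = [
    (0.10, -0.2),
    (0.12, -0.5),
    (0.15, -0.4),  # dominated by (0.12, -0.5)
    (0.20, -0.9),
    (0.11, -0.1),  # dominated by (0.10, -0.2)
]
front = pareto_front(candidates)
print(front)
```

Each surviving point represents a different tradeoff between the two objectives, which is why an ensemble can then be built from the models on the front rather than from a single best model.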