9 research outputs found

    Feature Selection Based on Multi-Filters for Classification of Mammogram Images to Look for Signs of Breast Cancer

    The accuracy of classification results on mammogram images plays a significant role in breast cancer diagnosis. Many stages therefore matter in finding a model with high accuracy and a low computing load, one of which is choosing the best features. This needs to be prioritized because the mammogram extraction process yields many features. Our research has four stages: feature extraction, multi-filter feature selection, classification, and performance evaluation. We propose an algorithm, feature selection based on multi-filters (FSbMF), that selects features of mammogram images by applying multiple filters simultaneously within the filter model. Six filter-based feature selection algorithms are used in this research: information gain, rule, relief, correlation, gini index, and chi-square. In tests using 10-fold cross-validation, the features produced by the FSbMF algorithm give the best performance, raising accuracy, recall, and precision from 72.63%, 70.38%, and 75.01% to 100%. Furthermore, the number of resulting features is minimal because it is obtained by intersecting the feature subsets produced by the multi-filter
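
The intersection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' FSbMF implementation: the top-k cut-off per filter, the feature names, and the scores are all assumptions made for the example, and only three of the six filters are mocked.

```python
def top_k(scores, k):
    """Return the names of the k highest-scoring features for one filter."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def intersect_filters(filter_scores, k):
    """Keep only the features that every filter places in its own top k."""
    subsets = [top_k(scores, k) for scores in filter_scores.values()]
    return set.intersection(*subsets)


# Hypothetical per-filter scores for four made-up image features.
filter_scores = {
    "information_gain": {"area": 0.9, "contrast": 0.7, "entropy": 0.2, "mean": 0.1},
    "gini_index": {"area": 0.8, "contrast": 0.3, "entropy": 0.6, "mean": 0.1},
    "chi_square": {"area": 0.7, "contrast": 0.6, "entropy": 0.5, "mean": 0.2},
}

selected = intersect_filters(filter_scores, k=2)
print(selected)
```

Because a feature survives only if all filters agree on it, the selected subset can never be larger than the smallest per-filter subset, which is consistent with the abstract's remark that the resulting feature count is minimal.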

    Average Weight Information Gain for Handling High-Dimensional Data Using the C4.5 Algorithm

    Abstract. In recent decades, companies and organizations have stored large amounts of data. Such data is useless unless it is processed into information that fits its intended use. The method used to process data into information is called data mining. A problem in data mining, especially in classification, is data with a large number of attributes, known as high-dimensional data, many of which are irrelevant. This study proposes an attribute weighting method that uses weight information gain and then calculates the average of the attribute weights. Once the average has been calculated, the attributes whose weights are above the average are selected. The selected attributes are then classified using the C4.5 algorithm; this method is named Average Weight Information Gain C4.5 (AWEIG-C4.5). The results show that AWEIG-C4.5 performs better than C4.5, with average accuracies of 0.906 and 0.898 respectively. A paired t-test shows a significant difference between AWEIG-C4.5 and C4.5. Keywords: data mining, high dimensional data, weight information gain, C4.5 algorithm
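
The above-average selection rule in this abstract can be illustrated with a short sketch. It assumes categorical attributes and plain (unnormalized) information gain as the attribute weight; the toy columns and labels are invented, and the subsequent C4.5 classification step is not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on one attribute."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def select_above_average(columns, labels):
    """Weight each attribute by information gain; keep those above the mean."""
    weights = {name: information_gain(vals, labels) for name, vals in columns.items()}
    mean_weight = sum(weights.values()) / len(weights)
    kept = [name for name, w in weights.items() if w > mean_weight]
    return kept, weights


# Hypothetical toy dataset: three categorical attributes, four records.
labels = ["yes", "yes", "no", "no"]
columns = {
    "outlook": ["sunny", "sunny", "rain", "rain"],
    "windy": ["t", "f", "t", "f"],
    "group": ["a", "a", "a", "b"],
}

kept, weights = select_above_average(columns, labels)
print(kept, weights)
```

Using the mean as the threshold avoids choosing a fixed number of attributes in advance; the cut-off adapts to however the weights happen to be distributed.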

    Multivariate Time Series Forecasting Of Crude Palm Oil Price Using Machine Learning Techniques

    The aim of this paper was to study the correlation between the crude palm oil (CPO) price, selected vegetable oil prices (soybean oil, coconut oil, olive oil, rapeseed oil, and sunflower oil), the crude oil price, and the monthly exchange rate. A comparative analysis was then performed on CPO price forecasting results obtained with machine learning techniques. Monthly CPO prices, selected vegetable oil prices, crude oil prices, and monthly exchange rate data from January 1987 to February 2017 were used. Preliminary analysis showed a high positive correlation between the CPO price and the soybean oil price, and between the CPO price and the crude oil price. Experiments were conducted using multi-layer perceptron, support vector regression, and Holt-Winters exponential smoothing techniques. The results were assessed using root mean square error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), and direction of accuracy (DA). Among the three techniques, support vector regression (SVR) with the sequential minimal optimization (SMO) algorithm showed relatively better results than the multi-layer perceptron and the Holt-Winters exponential smoothing method.
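
The four evaluation criteria named in this abstract can be computed as in the sketch below. The price series is invented, and the definition of DA used here (the share of steps where the forecast moves in the same direction as the actual series, both measured from the previous actual value) is one common convention and an assumption; the paper may define it differently.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean square error."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def direction_accuracy(actual, predicted):
    """Share of steps where forecast and actual move the same way (assumed definition)."""
    hits = 0
    for t in range(1, len(actual)):
        actual_move = actual[t] - actual[t - 1]
        predicted_move = predicted[t] - actual[t - 1]
        if actual_move * predicted_move > 0:
            hits += 1
    return hits / (len(actual) - 1)


# Hypothetical monthly prices and forecasts.
actual = [100.0, 110.0, 105.0, 120.0]
predicted = [100.0, 108.0, 107.0, 115.0]

print(rmse(actual, predicted), mae(actual, predicted))
print(mape(actual, predicted), direction_accuracy(actual, predicted))
```

RMSE penalizes large misses more heavily than MAE, MAPE makes errors comparable across price levels, and DA ignores magnitude entirely, which is why the four are usually reported together.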

    Machine Learning for Load Profile Data Analytics and Short-term Load Forecasting

    Short-term load forecasting (STLF) is a key issue for the operation and dispatch of the day-ahead energy market. It is a prerequisite for the economic operation of power systems and the basis for dispatching and for making startup-shutdown plans, and it plays a key role in the automatic control of power systems. Accurate power load forecasting not only helps users choose a more appropriate electricity consumption scheme and reduces electricity costs but is also conducive to optimizing the resources of power systems. This helps improve equipment utilization, which reduces production cost, improves economic benefit, and improves power supply capability, ultimately supporting an efficient demand response program. This thesis outlines several machine learning based, data-driven models for STLF in the smart grid. It also presents different policies and the current status of STLF, as well as future research directions for developing new STLF models. The thesis comprises three projects on load profile data analytics and machine learning based STLF models. The first project covers load profile classification and determining load demand variability, with the aim of estimating the load demand of a customer. In this project, load profile data collected from smart meters are classified using the recently developed extended nearest neighbor (ENN) algorithm. We calculate generalized class-wise statistics, which indicate the load demand variability of a customer. Finally, the load demand of a particular customer is estimated from the generalized class-wise statistics, the maximum load demand, and the minimum load demand. In the second project, a composite ENN model is proposed for STLF. The ENN model is proposed to improve the performance of STLF models based on the k-nearest neighbor (kNN) algorithm. In this project we develop three individual models to process weather data (i.e., temperature), social variables, and load demand data. The load demand is predicted separately for the different input variables. Finally, the load demand is forecast as the weighted average of the three models. The weights are determined based on the change in the generalized class-wise statistics. This project provides a significant improvement in load forecasting accuracy compared to kNN-based models. In the third project, an advanced data-driven model is developed. We propose a novel hybrid load forecasting model based on novel signal decomposition and correlation analysis. The hybrid model consists of improved empirical mode decomposition and T-Copula based correlation analysis. Finally, we employ a deep belief network to forecast load demand. The results are compared with previous studies, and they show a significant improvement in mean absolute percentage error (MAPE) and root mean square error (RMSE).
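
The weighted-average combination used in the second project can be sketched in a few lines. The forecasts and weights below are invented; in the thesis the weights come from the change in generalized class-wise statistics, which is not reproduced here.

```python
def combine_forecasts(forecasts, weights):
    """Weighted average of per-model forecasts; weights need not sum to one."""
    total = sum(weights.values())
    return sum(weights[name] * forecasts[name] for name in forecasts) / total


# Hypothetical next-hour load forecasts (kW) from the three input-specific models.
forecasts = {"temperature": 520.0, "social": 500.0, "load": 510.0}

# Hypothetical weights; the thesis derives them from class-wise statistics.
weights = {"temperature": 0.2, "social": 0.3, "load": 0.5}

combined = combine_forecasts(forecasts, weights)
print(combined)
```

Dividing by the weight total keeps the result on the same scale as the inputs even when the weights are not normalized.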

    Modeling Energy Demand: A Systematic Literature Review

    In this article, a systematic literature review of 419 articles on energy demand modeling, published between 2015 and 2020, is presented. This provides researchers with an exhaustive overview of the examined literature and a classification of techniques for energy demand modeling. Unlike existing literature reviews, this comprehensive study analyzes all of the following aspects of energy demand models: techniques, prediction accuracy, inputs, energy carrier, sector, temporal horizon, and spatial granularity. Readers benefit from easy access to a broad literature base and find decision support when choosing suitable data-model combinations for their projects. Results have been compiled in comprehensive figures and tables, providing a structured summary of the literature and containing direct references to the analyzed articles. Drawbacks of techniques are discussed, as well as countermeasures. The results show that among the articles, machine learning (ML) techniques are used the most, are mainly applied to short-term electricity forecasting on a regional level, and rely on historic load as their main data source. Engineering-based models are less dependent on historic load data and cover appliance consumption on long temporal horizons. Metaheuristic and uncertainty techniques are often used in hybrid models. Statistical techniques are frequently used for energy demand modeling as well and often serve as benchmarks for other techniques. Among the articles, the accuracy measured by mean absolute percentage error (MAPE) proved to be on similar levels for all techniques. 
This review eases the reader into the subject matter by presenting the emphases that have been made in the current literature, suggesting future research directions, and providing the basis for quantitative testing of hypotheses regarding applicability and dominance of specific methods for sub-categories of demand modeling. BMBF, 03SFK4T0, Verbundvorhaben ENavi: Energiewende-Navigationssystem zur Erfassung, Analyse und Simulation der systemischen Vernetzungen - Teilvorhaben T0; BMWi, 03ET4040C, Verbundvorhaben: Harmonisierung und Entwicklung von Verfahren zur regional und zeitlich aufgelösten Modellierung von Energienachfragen (DemandRegio), Teilvorhaben: Profile; DFG, 414044773, Open Access Publizieren 2021 - 2022 / Technische Universität Berli

    Evolutionary multivariate time series prediction

    Multivariate time series (MTS) prediction plays a significant role in many practical data mining applications, in domains such as finance, energy supply, and medical care. Over the years, various prediction models have been developed to obtain robust and accurate predictions. However, this is not an easy task given a number of key challenges. First, not all channels (each channel represents one time series) are informative (channel selection). Given the complexity of each selected time series, it is difficult to predefine the time window used for inputs. Second, since the selected time series may come from different domains and be collected with different devices, they may require different feature extraction techniques with suitable parameters to extract meaningful features (feature extraction), which influences the selection and configuration of the predictor, i.e., prediction (configuration). The challenge arising from channel selection, feature extraction, and prediction (configuration) is to perform them jointly to improve prediction performance. Third, we resort to ensemble learning to solve the MTS prediction problem composed of the previously mentioned operations, where the challenge is to obtain a set of models that are both accurate and diverse. Each of these challenges leads to an NP-hard combinatorial optimization problem, which cannot be solved using traditional methods since it is non-differentiable. The evolutionary algorithm (EA), an efficient metaheuristic stochastic search technique that is highly competent at solving complex combinatorial optimization problems with mixed types of decision variables, may provide an effective way to address the challenges arising from MTS prediction. The main contributions are supported by the following investigations. 
First, we propose a discrete evolutionary model, which mainly focuses on seeking the influential subset of MTS channels and the optimal time window for each selected channel for the MTS prediction task. A comprehensive experimental study on real-world electricity consumption data with auxiliary environmental factors demonstrates the efficiency and effectiveness of the proposed method in searching for the informative time series, the respective time windows, and the predictor parameters, in comparison to the result obtained through enumeration. Subsequently, we define the basic MTS prediction pipeline containing channel selection, feature extraction, and prediction (configuration). To perform these key operations, we propose an evolutionary model construction (EMC) framework that simultaneously seeks the optimal subset of MTS channels, suitable feature extraction methods and respective time windows for the selected channels, and parameter settings in the predictor for the best prediction performance. To implement EMC, a two-step EA is proposed, where the first-step EA mainly focuses on channel selection, while in the second step a specially designed EA works on feature extraction and prediction (configuration). A real-world electricity dataset with exogenous environmental information is used, and the whole dataset is further split into two datasets according to holiday and non-holiday events. The performance of EMC is demonstrated on all three datasets in comparison to hybrid models and some existing methods. Then, based on the prediction pipeline defined previously, we propose an evolutionary multi-objective ensemble learning model (EMOEL) that employs a multi-objective evolutionary algorithm (MOEA) subject to two conflicting objectives, i.e., accuracy and model diversity. 
The MOEA leads to a Pareto front (PF) composed of non-dominated optimal solutions, each of which represents an optimal combination of selected channels, selected feature extraction methods, selected time windows, and selected predictor parameters. To boost the ultimate prediction accuracy, the models corresponding to these optimal solutions are linearly combined, with the combination coefficients optimized via a single-objective task-oriented EA. The superiority of EMOEL is shown on electricity consumption data with climate information in comparison to several state-of-the-art models. We also propose a multi-resolution selective ensemble learning model, where multiple resolutions are constructed from the minimal granularity using statistics. At the current time stamp, the preceding time series data is sampled at different time intervals (i.e., resolutions) to constitute the time windows. For each resolution, multiple base learners with different parameters are first trained. A feature selection technique is applied to search for the optimal set of trained base learners, and least squares regression is used to combine them. The performance of the proposed ensemble model is verified on the electricity consumption data for next-step and next-day prediction. Finally, based on EMOEL and multi-resolution, instead of only combining the models generated from each PF, we propose an evolutionary ensemble learning (EEL) framework, where multiple PFs are aggregated to produce a composite PF (CPF) after duplicate solutions across PFs are removed and the remainder is sorted into different levels of non-dominated fronts (NDFs). Feature selection techniques are applied to find the optimal subset of models in the level-accumulated NDF, and least squares is used to combine the selected models. The performance of EEL, with three different predictors as base learners, is evaluated through a comprehensive analysis of parameter sensitivity. 
The superiority of EEL is demonstrated in comparison to the best result from single-objective EA and the best individual from the PF, and several state-of-the-art models across electricity consumption and air quality datasets, both of which use the environmental factors from other domains as the auxiliary factors. In summary, this thesis provides studies on how to build efficient and effective models for MTS prediction. The built frameworks investigate the influential factors, consider the pipeline composed of channel selection, feature extraction, and prediction (configuration) simultaneously, and keep good generalization and accuracy across different applications. The proposed algorithms to implement the frameworks use techniques from evolutionary computation (single-objective EA and MOEA), machine learning and data mining areas. We believe that this research provides a significant step towards constructing robust and accurate models for solving MTS prediction problems. In addition, with the case study on electricity consumption prediction, it will contribute to helping decision-makers in determining the trend of future energy consumption for scheduling and planning of the operations of the energy supply system
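
The notion of a Pareto front of non-dominated solutions, central to the multi-objective part of this abstract, can be illustrated with a small filter. This is only a sketch of the dominance test under the two objectives the abstract names (accuracy and diversity); the candidate values are invented, and the evolutionary search that would generate candidates is not shown.

```python
def dominates(a, b):
    """True if a dominates b. Each candidate is (error, diversity):
    error is minimized, diversity is maximized."""
    no_worse = a[0] <= b[0] and a[1] >= b[1]
    strictly_better = a[0] < b[0] or a[1] > b[1]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical (prediction error, model diversity) pairs for five models.
candidates = [(0.10, 0.2), (0.12, 0.5), (0.15, 0.4), (0.20, 0.9), (0.11, 0.1)]

front = pareto_front(candidates)
print(front)
```

The pairwise check is quadratic in the number of candidates, which is acceptable for a small illustration; sorting into successive non-dominated levels, as the abstract describes for the composite front, would repeat this filter on the remaining candidates.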

    Contributions to industrial process condition forecasting applied to copper rod manufacturing process

    Ensuring reliability and robustness of operation is one of the main concerns in industrial manufacturing processes, due to the ever-increasing demand for improvements in the cost and quality of the process outcome. In this regard, a deviation from the nominal operating behaviour implies a divergence from the optimal condition specification and a misalignment from the nominal product quality, causing rejection of the product and a critical loss of potential earnings. Indeed, for more than a decade the industrial sector has devoted considerable effort to deploying intelligent monitoring methodologies. These methods can extract information about the condition of the different machines and processes involved in manufacturing. However, the extracted information corresponds to the current state of the process. Obtaining information about the future condition of the process therefore represents a significant improvement, since it gains response time for detecting and correcting deviations in the operation of the process. Consequently, combining knowledge of the future behaviour of the process with the subsequent assessment of its condition is a goal for the next generation of industrial process monitoring systems. In this regard, this thesis aims to propose methodologies for assessing the current and future condition of industrial processes. Such a methodology must estimate the condition reliably and with high resolution. The thesis therefore seeks to extract information about the future condition from time-series-based modeling of the critical process signals and then, using nonlinear topology-preserving approaches, to fuse the forecast signals in order to determine the condition. The performance and quality of the proposed methodologies have been validated by applying them to a real industrial process, specifically with data from a copper rod manufacturing plant.