
    Meta learning of bounds on the Bayes classifier error

    Full text link
    Meta learning uses information from base learners (e.g. classifiers or estimators) as well as information about the learning problem to improve upon the performance of a single base learner. For example, the Bayes error rate of a given feature space, if known, can be used to aid in choosing a classifier, as well as in feature selection and model selection for the base classifiers and the meta classifier. Recent work in the field of f-divergence functional estimation has led to the development of simple and rapidly converging estimators that can be used to estimate various bounds on the Bayes error. We estimate multiple bounds on the Bayes error using an estimator that applies meta learning to slowly converging plug-in estimators to obtain the parametric convergence rate. We compare the estimated bounds empirically on simulated data and then estimate the tighter bounds on features extracted from an image patch analysis of sunspot continuum and magnetogram images. Comment: 6 pages, 3 figures, to appear in proceedings of the 2015 IEEE Signal Processing and SP Education Workshop.
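As a concrete illustration of one such bound, the Bhattacharyya bound on the Bayes error can be evaluated in closed form for the idealised case of two equal-variance, equal-prior Gaussian classes (this is a sketch of the bound itself, not of the paper's nonparametric meta-learning estimator):

```python
import math

def bhattacharyya_gaussians(mu1, mu2, sigma):
    # Bhattacharyya coefficient BC for two 1-D Gaussians N(mu1, sigma^2)
    # and N(mu2, sigma^2); BC = 1 means the classes coincide, BC -> 0
    # means they are well separated.
    return math.exp(-((mu1 - mu2) ** 2) / (8.0 * sigma ** 2))

def bayes_error_bounds(bc, p=0.5):
    # Classical bounds on the Bayes error Pe from BC, with priors p, q:
    #   0.5 * (1 - sqrt(1 - 4*p*q*BC^2)) <= Pe <= sqrt(p*q) * BC
    q = 1.0 - p
    lower = 0.5 * (1.0 - math.sqrt(1.0 - 4.0 * p * q * bc ** 2))
    upper = math.sqrt(p * q) * bc
    return lower, upper

def true_bayes_error(mu1, mu2, sigma):
    # Exact Bayes error for the equal-prior Gaussian pair:
    # Phi(-|mu1 - mu2| / (2 * sigma)), with Phi the standard normal CDF.
    d = abs(mu1 - mu2) / (2.0 * sigma)
    return 0.5 * (1.0 - math.erf(d / math.sqrt(2.0)))
```

In practice the densities are unknown, which is exactly why the plug-in divergence estimators discussed in the abstract are needed; the closed forms above only serve to check that a bound brackets the true error.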

    A Decision tree-based attribute weighting filter for naive Bayes

    Get PDF
    The naive Bayes classifier continues to be a popular learning algorithm for data mining applications due to its simplicity and linear run-time. Many enhancements to the basic algorithm have been proposed to help mitigate its primary weakness: the assumption that attributes are independent given the class. All of them improve the performance of naive Bayes at the expense (to a greater or lesser degree) of execution time and/or simplicity of the final model. In this paper we present a simple filter method for setting attribute weights for use with naive Bayes. Experimental results show that naive Bayes with attribute weights rarely degrades the quality of the model compared to standard naive Bayes and, in many cases, improves it dramatically. The main advantages of this method compared to other approaches for improving naive Bayes are its low run-time complexity and the fact that it maintains the simplicity of the final model.
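A minimal sketch of how attribute weights plug into naive Bayes: each conditional likelihood is raised to the attribute's weight, so a weight of 1 everywhere recovers standard naive Bayes. The filter in the paper derives weights from the depth at which each attribute is first tested in a decision tree; here the weights are simply passed in, and the toy data and smoothing choice are illustrative:

```python
import math
from collections import Counter, defaultdict

def train_nb(rows, labels):
    # Per-class prior counts and per-(attribute, class) value counts;
    # smoothing is applied at prediction time.
    classes = Counter(labels)
    cond = defaultdict(Counter)  # (attr_index, class) -> value counts
    for row, y in zip(rows, labels):
        for i, v in enumerate(row):
            cond[(i, y)][v] += 1
    return classes, cond

def predict_weighted_nb(model, row, weights):
    # Weighted naive Bayes: argmax_c  P(c) * prod_i P(x_i | c) ** w_i,
    # computed in log space for numerical stability.
    classes, cond = model
    n = sum(classes.values())
    best, best_score = None, float("-inf")
    for c, cnt in classes.items():
        score = math.log(cnt / n)
        for i, v in enumerate(row):
            counts = cond[(i, c)]
            # Laplace-style smoothing so unseen values get nonzero mass.
            p = (counts[v] + 1) / (cnt + len(counts) + 1)
            score += weights[i] * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best
```

Setting a weight below 1 softens an attribute's influence, which is how down-weighting redundant (dependence-violating) attributes mitigates the independence assumption without leaving the naive Bayes model family.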

    Analysis of the Nearest Neighbour with Generalised Exemplar method on noise domains

    Get PDF
    ABSTRACT: Nearest Neighbour (NN) is a classification method in data mining known to perform well on clean datasets, but it does not work well on data containing noise. Its performance on noisy data can be improved by extending it to the k-NN method, which introduces a value k used in a voting process over the k nearest neighbours. NN can also be extended to the Nearest Neighbour With Generalised Exemplar (NNGE) method, which incorporates the concept of hyperrectangles into the algorithm. Like NN, NNGE works well on clean datasets, particularly large ones, but performs poorly on noisy data because NNGE does not permit conflicts during the rectangle-forming process. This final project investigates incorporating the k-NN idea into NNGE, yielding a method referred to as k-NNGE. The results show that moving from NNGE to k-NNGE does increase accuracy on noise domains, but the gain is not as large as the accuracy improvement obtained when moving from NN to k-NN on the same noise domains. Keywords: noise domain, NNGE, k-NNGE
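The voting step that distinguishes k-NN from plain NN, and that the abstract credits with the noise robustness, can be sketched as follows (the Euclidean distance metric and the toy data are assumptions for illustration):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    # Majority vote over the k nearest training points; k = 1 recovers
    # plain nearest neighbour, which a single mislabelled (noisy)
    # neighbour can flip.
    dists = sorted((math.dist(x, query), y) for x, y in train)
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]
```

In the test below the point (0.1, 0.1) is deliberate label noise: 1-NN is fooled by it, while the 3-NN vote outweighs it, which is the behaviour the k-NNGE modification tries to transplant into NNGE's rectangle-forming stage.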

    Quality Prediction in Interlinked Manufacturing Processes based on Supervised & Unsupervised Machine Learning

    Get PDF
    Abstract: In the context of a rolling mill case study, this paper presents a methodical framework based on data mining for predicting the physical quality of intermediate products in interlinked manufacturing processes. In the first part, the implemented data preprocessing and feature extraction components of the Inline Quality Prediction System are introduced. The second part shows how the combination of supervised and unsupervised data mining methods can be applied to identify the most striking operational patterns, promising quality-related features and production parameters. The results indicate how sustainable and energy-efficient interlinked manufacturing processes can be achieved by the application of data mining.
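A toy sketch of the supervised/unsupervised combination: an unsupervised step clusters a process parameter into operating regimes, and a supervised step then fits a (deliberately simple) quality model per regime. The 1-D k-means, the per-cluster mean model, and all numbers are illustrative assumptions, not the framework from the paper:

```python
def assign(v, centers):
    # Index of the nearest cluster centre.
    return min(range(len(centers)), key=lambda j: abs(v - centers[j]))

def kmeans_1d(values, k=2, iters=20):
    # Tiny 1-D k-means: cluster a process parameter into k operating regimes.
    centers = sorted(values)[:: max(1, len(values) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            groups[assign(v, centers)].append(v)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def fit_per_cluster_mean(params, quality, centers):
    # Supervised step: predict quality as the mean quality observed
    # within each regime (stand-in for a real regression model).
    sums = [[0.0, 0] for _ in centers]
    for p, q in zip(params, quality):
        i = assign(p, centers)
        sums[i][0] += q
        sums[i][1] += 1
    return [s / n if n else 0.0 for s, n in sums]
```

The point of the split is that a single global model would blur the regimes together, while per-regime models expose which operational pattern drives which quality level.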

    Product-Driven Data Mining

    Get PDF
    Manifold Data Mining has developed innovative demographic and household spending pattern databases for six-digit postal codes in Canada. Their collection of information consists of both demographic and expenditure variables which are expressed through thousands of individually tracked factors. This large collection of information about consumer behaviour is typically referred to as a mine. Although very large in practice, for the purposes of this report the data mine consisted of m individuals and n factors, where m ~ 2000 and n ~ 50. Ideally, the first algorithm would identify a few factors in the data mine which would differentiate customers in terms of a particular product preference. Then the second algorithm would build on this information by looking for patterns in the data mine which would identify related areas of consumer spending. To test the algorithms two case studies were undertaken. The first study involved differentiating BMW and Honda car owners. The algorithms developed were reasonably successful at both finding questions that differentiate these two populations and identifying common characteristics amongst the groups of respondents. For the second case study it was hoped that the same algorithms could differentiate between consumers of two brands of beer. In this case the first algorithm was not as successful at differentiating between all groups; it showed some distinctions between beer drinkers and non-beer drinkers, but not as clearly defined as in the first case study. The second algorithm was then used successfully to further identify spending patterns once this distinction was made. In this second case study a deeper factor analysis could be used to identify a combination of factors which could be used in the first algorithm.
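One simple way to do the first algorithm's job of scoring factors by how well they separate two populations is a two-sample t-like statistic per factor (this is an illustrative stand-in, not Manifold's actual algorithm, and the data below are made up):

```python
import math

def rank_factors(group_a, group_b):
    # Rank factor indices by |difference in group means| divided by a
    # pooled standard error; a high score marks a factor that
    # differentiates the two consumer groups.
    n_factors = len(group_a[0])
    scores = []
    for j in range(n_factors):
        a = [row[j] for row in group_a]
        b = [row[j] for row in group_b]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
        vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
        se = math.sqrt(va / len(a) + vb / len(b)) or 1e-12
        scores.append((abs(ma - mb) / se, j))
    return [j for _, j in sorted(scores, reverse=True)]
```

With m ~ 2000 respondents and n ~ 50 factors, scoring every factor this way is cheap, and only the top-ranked handful would be passed to the pattern-finding second stage.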

    Rule-based classification approach for railway wagon health monitoring

    Get PDF
    Modern machine learning techniques have encouraged interest in the development of vehicle health monitoring systems that ensure secure and reliable operations of rail vehicles. In an earlier study, an energy-efficient data acquisition method was investigated to develop a monitoring system for railway applications using modern machine learning techniques, more specifically classification algorithms. A suitable classifier was proposed for railway monitoring based on relative weighted performance metrics. To improve the performance of the existing approach, a rule-based learning method using statistical analysis is proposed in this paper to select a unique classifier for the same application. The selected algorithm works more efficiently and improves the overall performance of the railway monitoring system. This study has been conducted using six classifiers, namely REPTree, J48, Decision Stump, IBK, PART and OneR, with twenty-five datasets. The Waikato Environment for Knowledge Analysis (WEKA) learning tool has been used in this study to develop the prediction models.
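A sketch of what selection by relative weighted performance metrics can look like: each classifier's metrics are min-max normalised across the candidates, flipped where lower is better, and combined as a weighted sum. The classifier names match the study, but the metric values, weights, and normalisation scheme are illustrative assumptions:

```python
def select_classifier(results, weights):
    # results: {classifier: {metric: value}}
    # weights: {metric: (weight, higher_is_better)}
    def norm(metric, value):
        vals = [r[metric] for r in results.values()]
        lo, hi = min(vals), max(vals)
        x = 0.5 if hi == lo else (value - lo) / (hi - lo)
        w, higher_better = weights[metric]
        return w * (x if higher_better else 1 - x)  # flip cost-type metrics
    scores = {
        name: sum(norm(m, metrics[m]) for m in weights)
        for name, metrics in results.items()
    }
    return max(scores, key=scores.get)
```

This makes the trade-off explicit: a classifier like IBK may top raw accuracy yet lose the selection once a run-time metric is weighted in.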

    Classification of Infrasound Events with Various Machine Learning Techniques

    Get PDF
    ABSTRACT: This paper presents classification results for infrasonic events using practically all well-known machine learning algorithms together with wavelet transforms for preprocessing. We show that there are great differences between different groups of classification algorithms and that nearest neighbor classifiers are superior to all others for accurate classification of infrasonic events.
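A minimal sketch of the preprocessing-plus-classification pipeline: a Haar wavelet transform turns each signal into multi-resolution features, and a 1-nearest-neighbour classifier compares signals in that feature space. The wavelet choice, the feature layout, and the toy signals are assumptions; the paper does not specify this exact setup:

```python
import math

def haar_level(signal):
    # One level of the Haar wavelet transform: pairwise scaled sums
    # (approximation) and differences (detail). Length must be even.
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_features(signal, levels=2):
    # Concatenate detail coefficients from each level plus the final
    # coarse approximation (signal length: a multiple of 2**levels).
    feats, cur = [], signal
    for _ in range(levels):
        cur, detail = haar_level(cur)
        feats.extend(detail)
    feats.extend(cur)
    return feats

def nn_classify(train, query):
    # 1-nearest-neighbour in wavelet-feature space.
    return min(train, key=lambda item: math.dist(haar_features(item[0]),
                                                 haar_features(query)))[1]
```

The detail coefficients separate fast oscillations from slow trends, which is why a step-like and an oscillating event end up far apart even when their raw samples overlap.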

    An Overview of the Algorithm Selection Problem

    Get PDF
    Users of machine learning algorithms need methods that can help them identify algorithms, or combinations of algorithms (workflows), that achieve the best potential performance. Selecting the best algorithm to solve a given problem has been the subject of many studies over the past four decades. This survey presents an overview of the contributions made in the area of algorithm selection problems. We present different methods for solving the algorithm selection problem and identify some future research challenges in this domain.
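One family of approaches covered by such surveys is meta-learning: characterise each dataset by meta-features and recommend the algorithm that performed best on the most similar past datasets. A nearest-neighbour sketch (the meta-features and algorithm names below are hypothetical):

```python
import math
from collections import Counter

def recommend(history, meta_features, k=1):
    # history: list of (meta_feature_vector, best_algorithm) pairs from
    # past datasets; meta_features describes the new dataset, e.g.
    # (number of instances, number of attributes).
    nearest = sorted(history, key=lambda h: math.dist(h[0], meta_features))[:k]
    return Counter(algo for _, algo in nearest).most_common(1)[0][0]
```

Real systems normalise the meta-features and use far richer descriptors (landmarkers, statistical and information-theoretic measures), but the lookup structure is the same.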

    Predicting vertical acceleration of railway wagons using regression algorithms

    Get PDF
    The performance of rail vehicles running on railway tracks is governed by the dynamic behaviors of railway bogies, particularly in cases of lateral instability and track irregularities. To ensure reliable, safe, and secure operation of railway systems, it is desirable to adopt intelligent monitoring systems for railway wagons. In this paper, a forecasting model is developed to investigate the vertical-acceleration behavior of railway wagons that are attached to a moving locomotive using modern machine-learning techniques. Both front- and rear-body vertical-acceleration conditions are predicted using popular regression algorithms. Different types of models can be built using a uniform platform to evaluate their performance. The performance of the estimation techniques has been measured using a set of metrics: correlation coefficient (CC), root mean square error (RMSE), mean absolute error (MAE), root relative squared error (RRSE), relative absolute error (RAE), and computational complexity for each of the algorithms. Statistical hypothesis analysis is applied to determine the most suitable regression algorithm for this application. Finally, spectral analysis of the front- and rear-body vertical condition is produced from the predicted data using the fast Fourier transform (FFT) and is used to generate precautionary signals and system status that can be used by a locomotive driver for necessary actions.
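The error measures listed above have standard definitions, sketched here; the relative measures (RAE, RRSE) compare the model against the trivial baseline that always predicts the mean of the actual values:

```python
import math

def regression_metrics(actual, predicted):
    # Standard regression evaluation measures: CC, MAE, RMSE, RAE, RRSE.
    n = len(actual)
    mean_a = sum(actual) / n
    err = [p - a for p, a in zip(predicted, actual)]
    mae = sum(abs(e) for e in err) / n
    rmse = math.sqrt(sum(e * e for e in err) / n)
    # Relative errors: ratio of the model's error to the mean-predictor's error.
    rae = sum(abs(e) for e in err) / sum(abs(a - mean_a) for a in actual)
    rrse = math.sqrt(sum(e * e for e in err) / sum((a - mean_a) ** 2 for a in actual))
    # Pearson correlation coefficient between actual and predicted.
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    cc = cov / math.sqrt(sum((a - mean_a) ** 2 for a in actual)
                         * sum((p - mean_p) ** 2 for p in predicted))
    return {"CC": cc, "MAE": mae, "RMSE": rmse, "RAE": rae, "RRSE": rrse}
```

A model with RAE or RRSE above 1 is doing worse than predicting the mean, which makes these two measures convenient for the kind of cross-algorithm comparison the paper performs.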