
    Learning a fuzzy decision tree from uncertain data

    © 2017 IEEE. Uncertainty in data exists when the value of a data item is not a precise value but rather an interval with a probability distribution function, or a probability distribution over multiple values. Because uncertain data differ intrinsically from certain data, it is difficult to handle them with traditional classification algorithms. In this paper, we therefore propose a fuzzy decision tree algorithm based on the classical ID3 algorithm; it integrates fuzzy set theory with ID3 to address the uncertain-data classification problem. We also propose a discretization algorithm that enables the proposed Fuzzy-ID3 algorithm to handle interval data. Experimental results show that Fuzzy-ID3 is a practical and robust solution to uncertain-data classification and that it outperforms some existing algorithms.
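To make the idea of a fuzzy ID3 split concrete, here is a minimal sketch of fuzzy information gain, assuming an equal-width triangular fuzzy partition of one numeric attribute (the names `memberships`, `fuzzy_entropy` and `fuzzy_information_gain` are hypothetical). Each sample contributes its membership degree, rather than a hard count of 1, to every branch it partially belongs to. The paper's interval discretization algorithm is not reproduced here; the sketch assumes point-valued attributes.

```python
import math

def memberships(x, peaks):
    """Triangular fuzzy partition with equally spaced peaks; inside the range the memberships sum to 1."""
    x = min(max(x, peaks[0]), peaks[-1])  # clamp so the edge terms act as shoulders
    width = peaks[1] - peaks[0]
    return [max(0.0, 1.0 - abs(x - p) / width) for p in peaks]

def fuzzy_entropy(class_weights):
    """Shannon entropy (bits) of fuzzy class counts, i.e. summed memberships per class."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    return -sum((w / total) * math.log2(w / total) for w in class_weights.values() if w > 0)

def fuzzy_information_gain(values, labels, peaks):
    """Gain of splitting a node on one numeric attribute into fuzzy terms (one branch per peak)."""
    parent = {}
    branches = [{} for _ in peaks]
    for x, y in zip(values, labels):
        parent[y] = parent.get(y, 0.0) + 1.0
        for k, mu in enumerate(memberships(x, peaks)):
            if mu > 0:
                branches[k][y] = branches[k].get(y, 0.0) + mu
    n = float(len(values))
    # Weight each branch by its fuzzy cardinality; cardinalities sum to n because memberships sum to 1
    children = sum(sum(b.values()) / n * fuzzy_entropy(b) for b in branches)
    return fuzzy_entropy(parent) - children

# Usage: an attribute that separates the classes versus one that does not
print(fuzzy_information_gain([0, 1, 9, 10], ["a", "a", "b", "b"], peaks=[0, 5, 10]))
print(fuzzy_information_gain([0, 10, 0, 10], ["a", "a", "b", "b"], peaks=[0, 5, 10]))
```

As in crisp ID3, the attribute with the highest gain would be chosen at each node; the difference is that boundary samples (here, 1 and 9) spread their weight across neighbouring branches instead of falling wholly into one.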

    Probabilistic Kernel Support Vector Machines

    We propose a probabilistic enhancement of standard kernel Support Vector Machines for binary classification, to address the case in which, along with the given data sets, a description of uncertainty (e.g., error bounds) may be available for each datum. In the present paper, we specifically consider Gaussian distributions to model uncertainty. Thus, our data consist of pairs $(x_i,\Sigma_i)$, $i\in\{1,\ldots,N\}$, along with an indicator $y_i\in\{-1,1\}$ declaring membership in one of two categories for each pair. These pairs may be viewed as the mean and covariance, respectively, of random vectors $\xi_i$ taking values in a suitable linear space (typically $\mathbb{R}^n$). Our setting may therefore also be viewed as a modification of Support Vector Machines to classify distributions, albeit, at present, only Gaussian ones. We outline the formalism that allows computing suitable classifiers via a natural modification of the standard "kernel trick." The main contribution of this work is to identify a suitable kernel function for applying Support Vector techniques to uncertain data for which a detailed uncertainty description is also available (herein, "Gaussian points"). Comment: 6 pages, 6 figures
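One well-known way to build a kernel on "Gaussian points" is to take the expectation of an ordinary RBF kernel over both random vectors. The sketch below uses that closed form; it is not necessarily the kernel proposed in the paper, and it assumes diagonal covariances $\Sigma_i$ for brevity. For independent $\xi \sim N(\mu_1, \Sigma_1)$ and $\eta \sim N(\mu_2, \Sigma_2)$, with $S = \Sigma_1 + \Sigma_2$, $E[\exp(-\|\xi-\eta\|^2/(2\sigma^2))] = |I + S/\sigma^2|^{-1/2} \exp(-\tfrac12 (\mu_1-\mu_2)^\top (S+\sigma^2 I)^{-1} (\mu_1-\mu_2))$. The bandwidth `sigma` and the helper names are hypothetical.

```python
import math

def expected_rbf_kernel(mu1, var1, mu2, var2, sigma):
    """E[exp(-||xi - eta||^2 / (2 sigma^2))] for independent xi ~ N(mu1, diag(var1)), eta ~ N(mu2, diag(var2))."""
    s2 = sigma ** 2
    value = 1.0
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        total = v1 + v2 + s2  # variance of the coordinate difference, widened by the kernel bandwidth
        value *= math.sqrt(s2 / total) * math.exp(-0.5 * (m1 - m2) ** 2 / total)
    return value

def gram_matrix(points, sigma):
    """points: list of (mean, diagonal variance) pairs, i.e. hypothetical 'Gaussian points'."""
    return [[expected_rbf_kernel(m1, v1, m2, v2, sigma) for (m2, v2) in points] for (m1, v1) in points]

points = [([0.0, 0.0], [0.0, 0.0]), ([1.0, 1.0], [0.0, 0.0]), ([0.0, 0.0], [0.5, 0.5])]
for row in gram_matrix(points, sigma=1.0):
    print([round(k, 3) for k in row])
```

With zero covariances this reduces to the standard RBF kernel, so certain and uncertain data can be mixed in one Gram matrix. Because it is an inner product of kernel mean embeddings, the resulting matrix stays positive semi-definite and could be passed to any SVM solver that accepts a precomputed kernel.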

    Associative classifier for uncertain data

    Associative classifiers are relatively easy for people to understand and often outperform decision tree learners on many classification problems. Existing associative classifiers, however, work only with certain data, whereas data uncertainty is prevalent in many real-world applications such as sensor networks, market analysis and medical diagnosis, and such uncertainty may render many conventional classifiers inapplicable to uncertain classification tasks. In this paper, building on the U-Apriori and CBA algorithms, we propose uCBA (uncertain Classification Based on Associations), an associative classifier that can classify both certain and uncertain data. The algorithm redefines the support, confidence, rule pruning and classification strategy of CBA. Experimental results on 21 datasets from the UCI Repository show that the proposed algorithm performs well and remains satisfactory even on highly uncertain data.
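In U-Apriori, each item in a transaction carries an existential probability, and the support of an itemset becomes an expected support: the sum, over transactions, of the product of its items' probabilities, assuming the items are independent. The sketch below computes that quantity and a hypothetical expected rule confidence (a ratio of expected supports). uCBA's actual redefinitions of confidence, pruning and classification are not reproduced; the item names and helpers are made up for illustration.

```python
from math import prod

def expected_support(itemset, transactions):
    """U-Apriori-style expected support: sum over transactions of the product of the
    items' existential probabilities (items assumed independent)."""
    return sum(prod(probs.get(item, 0.0) for item in itemset) for probs, _ in transactions)

def rule_confidence(itemset, label, transactions):
    """Hypothetical expected confidence of the rule itemset -> label: expected support of
    the itemset within transactions of that class, divided by its overall expected support."""
    body = expected_support(itemset, transactions)
    if body == 0:
        return 0.0
    matching = [t for t in transactions if t[1] == label]
    return expected_support(itemset, matching) / body

# Each transaction: ({item: existential probability}, certain class label)
transactions = [
    ({"fever": 0.9, "cough": 0.5}, "flu"),
    ({"fever": 0.4}, "cold"),
    ({"cough": 1.0}, "flu"),
]
print(expected_support({"fever"}, transactions))        # 0.9 + 0.4 + 0.0
print(rule_confidence({"fever"}, "flu", transactions))  # 0.9 / 1.3
```

When every probability is 0 or 1 these functions reduce to ordinary support counts and CBA-style confidence, which is one way to see how such a classifier can handle both certain and uncertain data.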

    Naive bayes classification of uncertain data

    Traditional machine learning algorithms assume that data are exact or precise. However, this assumption may not hold in some situations because of data uncertainty arising from measurement errors, data staleness, repeated measurements and similar causes. With uncertainty, the value of each data item is represented by a probability distribution function (pdf). In this paper, we propose a novel naive Bayes classification algorithm for uncertain data with a pdf. Our key solution is to extend the class conditional probability estimation in the Bayes model to handle pdfs. Extensive experiments on UCI datasets show that the accuracy of the naive Bayes model can be improved by taking the uncertainty information into account. © 2009 IEEE. The 9th IEEE International Conference on Data Mining (ICDM), Miami, FL, 6-9 December 2009. In Proceedings of the 9th ICDM, 2009, p. 944-94
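A minimal sketch of how class-conditional estimation can absorb a pdf, assuming every uncertain attribute value is Gaussian, $N(m, s^2)$, and every class-conditional model is Gaussian, $N(\mu, \sigma^2)$. Under these assumptions the expected class-conditional density has a closed form, $\int N(x; m, s^2)\,N(x; \mu, \sigma^2)\,dx = N(m; \mu, \sigma^2 + s^2)$, so the item's uncertainty simply widens the class variance. This is one formula-based variant, not necessarily the paper's estimator; the functions `fit` and `predict` and the moment-matched training variance are illustrative assumptions.

```python
import math

def log_gaussian(x, mean, var):
    return -0.5 * ((x - mean) ** 2 / var + math.log(2 * math.pi * var))

def fit(samples, labels):
    """samples: list of items; each item is a list of (mean, variance) pairs, one per attribute."""
    model = {}
    for c in set(labels):
        rows = [s for s, y in zip(samples, labels) if y == c]
        params = []
        for j in range(len(rows[0])):
            means = [r[j][0] for r in rows]
            mu = sum(means) / len(means)
            # Moment-matched variance of the class's mixture of item pdfs:
            # spread of the item means plus their average uncertainty
            var = sum((m - mu) ** 2 for m in means) / len(means) + sum(r[j][1] for r in rows) / len(rows)
            params.append((mu, max(var, 1e-9)))
        model[c] = (len(rows) / len(samples), params)
    return model

def predict(model, item):
    """Class maximising log prior plus the log expected class-conditional density of each attribute."""
    def score(c):
        prior, params = model[c]
        # The integral of N(x; m, s2) * N(x; mu, var) over x equals N(m; mu, var + s2)
        return math.log(prior) + sum(log_gaussian(m, mu, var + s2) for (m, s2), (mu, var) in zip(item, params))
    return max(model, key=score)

train = [[(0.0, 0.1)], [(1.0, 0.1)], [(10.0, 0.1)], [(11.0, 0.1)]]
model = fit(train, ["a", "a", "b", "b"])
print(predict(model, [(0.5, 2.0)]), predict(model, [(10.2, 0.5)]))
```

A test item with a large variance gets a flatter likelihood in every class, so noisy attributes automatically carry less weight in the decision than precise ones.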

    A hybrid constructive algorithm incorporating teaching-learning based optimization for neural network training

    In neural networks, simultaneously determining the optimum structure and weights is a challenge. This paper proposes combining the teaching-learning based optimization (TLBO) algorithm with a constructive algorithm (CA) to meet this challenge. In the literature, TLBO has been used to choose proper weights, while CA has been adopted to construct candidate structures from which the proper one is selected. In this study, the basic TLBO algorithm and an improved version of it are used to select network weights. Meanwhile, as the constructive algorithm, a novel modification of multiple operations using statistical tests (MOST) is applied and tested to choose the proper structure. The proposed combined algorithms are applied to ten classification problems and two time-series prediction problems as benchmarks. The results are evaluated in terms of training and testing error, network complexity and mean-square error. The experimental results show that the proposed hybrid of the modified MOST constructive algorithm and the improved TLBO (MCO-ITLBO) outperforms the other methods, a finding confirmed by Wilcoxon statistical tests. The proposed method achieves lower average error with a less complex network structure.
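For readers unfamiliar with TLBO, the sketch below implements its two basic phases with greedy acceptance: a teacher phase that shifts each learner by $r(X_{\text{teacher}} - T_F \cdot \text{mean})$ with teaching factor $T_F \in \{1, 2\}$, and a learner phase in which each learner moves toward a better random peer or away from a worse one. It minimizes a stand-in sphere function rather than a network's training error, and it covers only the basic TLBO; the paper's improved TLBO and the MOST constructive algorithm are not reproduced. Function names and parameter defaults are assumptions.

```python
import random

def sphere(x):
    """Stand-in objective; in the paper's setting this would be a network's training error."""
    return sum(v * v for v in x)

def tlbo(f, dim, pop_size=20, iters=50, lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(x) for x in pop]
    history = [min(fit)]

    def clip(v):
        return min(max(v, lo), hi)

    def try_replace(i, cand):
        # Greedy acceptance: keep the new solution only if it is better
        fc = f(cand)
        if fc < fit[i]:
            pop[i], fit[i] = cand, fc

    for _ in range(iters):
        # Teacher phase: shift each learner by r * (teacher - TF * class mean)
        teacher = pop[min(range(pop_size), key=fit.__getitem__)]
        mean = [sum(x[d] for x in pop) / pop_size for d in range(dim)]
        for i in range(pop_size):
            tf = rng.choice((1, 2))  # teaching factor
            try_replace(i, [clip(pop[i][d] + rng.random() * (teacher[d] - tf * mean[d])) for d in range(dim)])
        # Learner phase: move away from a worse random peer, or toward a better one
        for i in range(pop_size):
            j = rng.choice([k for k in range(pop_size) if k != i])
            step = 1.0 if fit[i] < fit[j] else -1.0
            try_replace(i, [clip(pop[i][d] + step * rng.random() * (pop[i][d] - pop[j][d])) for d in range(dim)])
        history.append(min(fit))
    best = min(range(pop_size), key=fit.__getitem__)
    return pop[best], fit[best], history

x_best, f_best, history = tlbo(sphere, dim=5)
print(f_best, history[0])
```

TLBO has no algorithm-specific tuning parameters beyond population size and iteration count, which is part of its appeal for weight optimization; to train a network, `f` would map a flattened weight vector to the training error.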

    Exploratory Models in a time of Big Data

    This paper aims to trigger discourse about the emergence of a new type of social scientific model, Exploratory Models, which draw on Big Data, computer modeling and interdisciplinary research to tackle complex social scientific processes. First, we define Exploratory Models with reference to Batty, and to Morgan and Morrison. We then present changes to the traditional modeling paradigm. We show how Exploratory Models circumvent challenges related to the idiosyncrasy, self-reflexivity and acceleration of social phenomena, which limit the predictive effectiveness of traditional models. We show that Exploratory Models are better equipped to tackle complex problems because of their capacity to process heterogeneous datasets. Having established that Exploratory Models are predominantly problem- and data-driven, we emphasize that scientific theory is indispensable to their progress. Finally, we suggest developing an integrative platform as a way of maximizing the benefits of this approach. The discussion concludes by flagging areas for further research.

    Novel gear fault diagnosis approach using native Bayes uncertain classification based on PSO algorithm

    Traditionally, gear faults have been classified while ignoring sample uncertainty. In this paper, a novel approach is proposed for diagnosing uncertain gear interval faults. First, an interval feature vector of statistical properties, composed of the mean, standard deviation, skewness, kurtosis, etc., is proposed. Then, naive Bayes uncertain classification (NBU) is used to diagnose these uncertain gear interval faults. Conventionally, NBU uses all attributes to distinguish fault types. However, each fault type attains a different classification accuracy depending on which feature vector attributes are used. Particle swarm optimization (PSO) is therefore used to select the optimal feature vector attributes for each fault type in NBU (NBU_PSO_EACH). The experimental results show that: (1) the proposed method is more accurate than NBU1, NBU2 or FBC; (2) it is also more accurate than the method that uses PSO to select the same optimal attributes for all fault types (NBU_PSO); and (3) it can reduce the physical size of the feature vectors.
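The abstract does not say how the statistical properties become intervals. One plausible construction, assumed here purely for illustration, is to split a vibration signal into equal segments, compute the mean, standard deviation, skewness and kurtosis of each segment, and record each statistic as a [min, max] interval across segments. The helper names are hypothetical; kurtosis is the non-excess (Pearson) form, and constant segments are assigned skewness and kurtosis of 0 by convention.

```python
import math

def moments(xs):
    """Mean, population standard deviation, skewness and (non-excess) kurtosis of one segment."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    if std == 0:
        return mean, 0.0, 0.0, 0.0  # skewness/kurtosis undefined for a constant segment; use 0
    skew = sum((x - mean) ** 3 for x in xs) / n / std ** 3
    kurt = sum((x - mean) ** 4 for x in xs) / n / std ** 4
    return mean, std, skew, kurt

def interval_features(signal, n_segments):
    """Interval feature vector: each statistic becomes (min, max) over equal-length segments.
    Trailing samples that do not fill a whole segment are dropped."""
    seg_len = len(signal) // n_segments
    stats = [moments(signal[k * seg_len:(k + 1) * seg_len]) for k in range(n_segments)]
    return [(min(col), max(col)) for col in zip(*stats)]

print(interval_features([0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 1.0, -1.0], n_segments=2))
```

Each (min, max) pair could then serve as one interval-valued attribute for an uncertain classifier such as NBU, and a per-fault-type feature selector (PSO in the paper) would choose which of these pairs to keep.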

    Clustering uncertain data using voronoi diagrams and R-tree index

    We study the problem of clustering uncertain objects whose locations are described by probability density functions (pdfs). We show that the UK-means algorithm, which generalizes the k-means algorithm to handle uncertain objects, is very inefficient. The inefficiency comes from the fact that UK-means computes expected distances (EDs) between objects and cluster representatives. For arbitrary pdfs, expected distances are computed by numerical integrations, which are costly operations. We propose pruning techniques that are based on Voronoi diagrams to reduce the number of expected distance calculations. These techniques are analytically proven to be more effective than the basic bounding-box-based technique previously known in the literature. We then introduce an R-tree index to organize the uncertain objects so as to reduce pruning overheads. We conduct experiments to evaluate the effectiveness of our novel techniques. We show that our techniques are additive and, when used in combination, significantly outperform previously known methods. © 2006 IEEE.
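The pruning idea can be sketched with an uncertain object represented by weighted sample points inside a bounding box (an assumption; the paper handles arbitrary pdfs). Because the expected distance is a weighted average of distances to points in the box, the minimum and maximum distances from a centroid to the box bound it. The bounding-box rule discards any centroid whose lower bound exceeds the smallest upper bound; the Voronoi check assigns the object outright when every box corner lies in one centroid's Voronoi cell, which works because that cell is convex. This sketch covers only these two basic checks, not the paper's full set of Voronoi pruning rules or its R-tree index, and all names are hypothetical.

```python
import math
from itertools import product

def expected_distance(samples, weights, c):
    """ED between an uncertain object, given as weighted sample points, and centroid c."""
    return sum(w * math.dist(p, c) for p, w in zip(samples, weights))

def min_dist(box_lo, box_hi, c):
    """Distance from c to the nearest point of the box (a lower bound on the ED)."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2 for l, h, x in zip(box_lo, box_hi, c)))

def max_dist(box_lo, box_hi, c):
    """Distance from c to the farthest point of the box (an upper bound on the ED)."""
    return math.sqrt(sum(max(abs(x - l), abs(x - h)) ** 2 for l, h, x in zip(box_lo, box_hi, c)))

def minmax_candidates(box_lo, box_hi, centroids):
    """Bounding-box pruning: keep only centroids that could still have the smallest ED."""
    bound = min(max_dist(box_lo, box_hi, c) for c in centroids)
    return [j for j, c in enumerate(centroids) if min_dist(box_lo, box_hi, c) <= bound]

def voronoi_owner(box_lo, box_hi, centroids):
    """Index of the centroid whose (convex) Voronoi cell contains every box corner, else None."""
    corners = list(product(*zip(box_lo, box_hi)))
    for i, ci in enumerate(centroids):
        if all(math.dist(p, ci) <= math.dist(p, cj)
               for p in corners for j, cj in enumerate(centroids) if j != i):
            return i
    return None

centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
print(minmax_candidates((1.0, 1.0), (2.0, 2.0), centroids), voronoi_owner((1.0, 1.0), (2.0, 2.0), centroids))
print(minmax_candidates((4.0, 0.0), (6.0, 1.0), centroids), voronoi_owner((4.0, 0.0), (6.0, 1.0), centroids))
```

Only the centroids that survive pruning need the costly expected-distance computation; in the second example two candidates remain and their EDs would still have to be evaluated.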

    Decision-Making System for Determining the General Functional Position (JFU) Class of Civil Servants (PNS) Using the Multi Rough Set Method and Fuzzification

    A civil servant (PNS) in a government agency is required to have the competency to perform work effectively and efficiently within the field and scope of the job. In practice, determining competency and position class strongly affects the placement of a civil servant in a General Functional Position (JFU). Because this process is still done manually, it takes a long time, and the results do not necessarily match the competencies the civil servant actually has. In this study, the Multi Rough Set Method is used to classify the competency and position class of civil servants whose competency is not yet known, and as a tool for evaluating the performance of employees who already hold a position. The Multi Rough Set Method divides the data set into several data sets with similar attributes. The results show that the Multi Rough Set Method is a good classifier for decision making on employee competency classification in General Functional Positions: the area under the Receiver Operating Characteristic (ROC) curve reaches 0.866. In addition, combining the Multi Rough Set Method with fuzzification-based decision making improved the average error significantly compared with the Single Rough Set Method, reducing the error for unclassified results from 28.75% to 0%.
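The rough set building block behind such classifiers is the pair of lower and upper approximations of a decision class. Objects whose attribute values are indiscernible are grouped into equivalence classes; a group entirely inside the target class belongs to the lower approximation (certainly in the class), and a group overlapping it belongs to the upper approximation (possibly in the class). The difference between the two is the boundary region, which corresponds to the "unclassified" cases mentioned above. The sketch shows classic Pawlak approximations on a made-up employee table; the Multi Rough Set partitioning into attribute groups and the fuzzification step are not reproduced, and all attribute names are hypothetical.

```python
from collections import defaultdict

def indiscernibility_classes(rows, attrs):
    """Group row indices by their values on attrs (equivalence classes of the indiscernibility relation)."""
    groups = defaultdict(set)
    for i, row in enumerate(rows):
        groups[tuple(row[a] for a in attrs)].add(i)
    return list(groups.values())

def approximations(rows, attrs, target):
    """Lower and upper approximations of the index set `target` with respect to `attrs`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(rows, attrs):
        if block <= target:
            lower |= block  # every object in the block is in the target class
        if block & target:
            upper |= block  # at least one object in the block is in the target class
    return lower, upper

rows = [
    {"education": "S1", "experience": "high", "class": "A"},
    {"education": "S1", "experience": "high", "class": "B"},
    {"education": "S2", "experience": "low", "class": "A"},
    {"education": "S1", "experience": "low", "class": "B"},
]
target = {i for i, r in enumerate(rows) if r["class"] == "A"}
lower, upper = approximations(rows, ["education", "experience"], target)
print(lower, upper, upper - lower)  # boundary = objects that cannot be classified with certainty
```

A multi-rough-set variant along the lines described in the abstract would call `approximations` once per group of similar attributes and combine the per-group decisions, for example with a fuzzy aggregation step.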