On information captured by neural networks: connections with memorization and generalization
Despite the popularity and success of deep learning, there is limited
understanding of when, how, and why neural networks generalize to unseen
examples. Since learning can be seen as extracting information from data, we
formally study information captured by neural networks during training.
Specifically, we start with viewing learning in presence of noisy labels from
an information-theoretic perspective and derive a learning algorithm that
limits label noise information in weights. We then define a notion of unique
information that an individual sample provides to the training of a deep
network, shedding some light on the behavior of neural networks on examples
that are atypical, ambiguous, or belong to underrepresented subpopulations. We
relate example informativeness to generalization by deriving nonvacuous
generalization gap bounds. Finally, by studying knowledge distillation, we
highlight the important role of data and label complexity in generalization.
Overall, our findings contribute to a deeper understanding of the mechanisms
underlying neural network generalization. Comment: PhD thesis.
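The per-example "unique information" idea above can be illustrated with a crude leave-one-out proxy: refit a simple model with and without each training example and measure how far the solution moves. This is only a hedged sketch on a toy ridge-regression problem, not the thesis's formal information-theoretic definition; all data and parameters below are illustrative.

```python
import numpy as np

def fit_ridge(X, y, lam=0.1):
    # Closed-form ridge regression solution.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def loo_influence(X, y, lam=0.1):
    # Distance the weight vector moves when each example is held out:
    # a rough proxy for that example's unique contribution to training.
    w_full = fit_ridge(X, y, lam)
    scores = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        scores.append(np.linalg.norm(w_full - fit_ridge(X[mask], y[mask], lam)))
    return np.array(scores)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
y[0] += 5.0                      # corrupt one label: an "atypical" example
scores = loo_influence(X, y)
print(scores.argmax())           # the corrupted example stands out
```

Typical, redundant examples barely move the solution, while the mislabeled example dominates, mirroring the intuition that atypical or mislabeled samples carry the most unique information.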
Applications of Digital Terrain Modeling to Address Problems in Geomorphology and Engineering Geology
This dissertation uses digital terrain modeling and computational methods to yield insight into three topics: 1) evaluating the influence of glacial topography on fluvial sediment transport in the Teton Range, WY, 2) integrating regional airborne lidar, UAV lidar, and structure from motion photogrammetry to characterize decadal-scale movement of slow-moving landslides in northern Kentucky, and 3) applying machine learning methods to surficial geologic mapping.
The role of topography as a boundary condition that controls the efficiency of fluvial erosion in the Teton Range, Wyoming, was investigated by using existing lidar data to delineate surficial geologic units, geometrically reconstruct the depth to bedrock, and estimate the sediment volume and sediment production rate in two catchments. This data was coupled with seismic reflection data in the bay into which these catchments drain. We found that while the sediment production rate of 0.17 ± 0.02 mm/yr is similar to the uplift rate of the Teton Range, only about 2.6% of the post-glacial sediment has been transported out of the catchments, and the denudation rate is just 0.004 ± 0.001 mm/yr. We conclude that once the topography has been altered by glaciers, which flatten the valley bottom and steepen the valley walls, rivers are incapable of evacuating the sediment effectively. Sediment will be trapped in the valleys until the next glacial advance, or until uplift steepens the system such that rivers can once again become efficient.
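The quoted budget numbers are internally consistent: scaling the sediment production rate by the exported fraction reproduces the stated denudation rate. A back-of-envelope check (the actual values come from the volume reconstructions described above):

```python
# Consistency check of the sediment budget quoted above.
production_rate = 0.17      # mm/yr, catchment-averaged sediment production
fraction_exported = 0.026   # ~2.6% of post-glacial sediment left the catchments

# If only 2.6% of produced sediment is exported, the effective denudation
# (export) rate is the production rate scaled by that fraction.
denudation_rate = production_rate * fraction_exported
print(round(denudation_rate, 3))   # 0.004 mm/yr, matching the quoted value
```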
Repeat digital terrain surveys can be used to quantify changes to the Earth’s surface. Challenges include determining the threshold of change that can be detected when combining topographic data acquired by different platforms and of varying quality. To quantify the threshold of detectable elevation change in a slow-moving colluvial landslide in northern Kentucky over 14 years using county-wide lidar, uncrewed aerial vehicle (UAV) structure-from-motion (SfM) surveys, and a UAV lidar survey, we used the statistics of noise from elevation difference maps in areas outside of the landslide. We found that the threshold of detectable elevation change ranges from 0.05 to 0.20 m, depending on the survey combination, and that detectable change in the landslide was found between all surveys, including those separated by only 2 weeks.
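A standard way to set such a detection threshold is to propagate each survey's vertical uncertainty in quadrature and apply a confidence multiplier. This is a hedged stand-in for the study's empirical noise-statistics approach; the sigma values below are illustrative, not the study's measured ones.

```python
import math

def min_detectable_change(sigma_a, sigma_b, k=1.96):
    # Minimum detectable elevation change (metres) between two surveys with
    # vertical uncertainties sigma_a and sigma_b, at ~95% confidence (k=1.96).
    return k * math.sqrt(sigma_a**2 + sigma_b**2)

# e.g. county lidar (~0.06 m) differenced against a UAV SfM survey (~0.04 m)
print(round(min_detectable_change(0.06, 0.04), 3))  # 0.141 m
```

The result falls inside the 0.05–0.20 m range reported above; better-matched, lower-noise survey pairs push the threshold toward the low end.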
For most users, geologic maps may convey a level of certainty that obscures the decisions and interpretations made by the mapper. The combination of machine learning and digital terrain data provides a new method for producing geologic maps which can also convey and preserve the underlying uncertainty. We test the performance of machine learning methods to accurately map the surficial geology of two quadrangles in Kentucky using 31 variables derived from lidar data, including surface roughness, slope, topographic position, and residual topography. The performance of eight machine learning methods was compared, and the importance of each variable was measured. The classifier with the highest accuracy using just the most important variables was used to produce surficial geologic maps in six areas, with resulting accuracies ranging from 0.795 to 0.931. The uncertainty resulting from the machine learning process is conveyed using gradations of color, which can be modified depending on the needs of the map user.
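The final step above, turning classifier output into a colour gradation, amounts to mapping per-pixel class probabilities to a scalar uncertainty. A hedged sketch using normalised predictive entropy (the probabilities below are made up; the study's classifier and colour scheme may differ):

```python
import numpy as np

def predictive_uncertainty(probs):
    """Normalised entropy in [0, 1]: 0 = fully confident, 1 = uniform."""
    probs = np.clip(probs, 1e-12, 1.0)
    entropy = -np.sum(probs * np.log(probs), axis=-1)
    return entropy / np.log(probs.shape[-1])

confident = np.array([0.97, 0.01, 0.01, 0.01])   # e.g. one clear map unit
ambiguous = np.array([0.4, 0.3, 0.2, 0.1])       # mixed signal between units
print(predictive_uncertainty(confident) < predictive_uncertainty(ambiguous))  # True
```

Pixels with high normalised entropy would then receive a washed-out colour, preserving the mapper's (here, the model's) uncertainty on the published map.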
Asylum and immigration policy, policy communities and the British news media: a case study in policy-making
This research investigation examines the policy communities and networks (PC&N) perspective as a tool for understanding the influence of the news media in shaping the policy agenda. It does so by examining the evolution of two case studies in a new policy arena, asylum and immigration, from policy initiative to policy reversal. In order to understand how the dynamics of discourse shape the development of the policy agenda, it is fundamental to first understand the nature of information flow in social settings. Policy communities and networks provide the appropriate social setting in which to explore the role of the news media, as they facilitate the flow in which information is constructed, distributed, and absorbed within them. Existing literature on the influence of the news media on the development of opinion making is extensive; however, literature on the influence of the news media on the development of policy making is emergent. By applying the PC&N perspective to understanding the role of the news media in issue definition, decision making, and policy change, this research investigation contributes to the literature on both, as well as to the emergent literature on the influence of the news media on immigration and asylum policy itself. In addition, through its empirical examination of the evolution of case-study asylum and immigration policy reversals, this research investigation utilises a new methodology, content analysis, to identify the existence, nature, and membership of policy communities and networks and of insider groups active within them. In providing strong evidence that the policy communities and networks perspective is a valid approach for understanding the nature of policymaking and the role of the news media in shaping policy agendas, it also provides an alternative approach to examining policy making in an emergent field of policy science research, asylum and immigration policy network analysis.
Computing Interpretable Representations of Cell Morphodynamics
Shape changes (morphodynamics) are one of the principal ways cells interact with their environments and perform key intrinsic behaviours like division. These dynamics arise from a myriad of complex signalling pathways that often organise with emergent simplicity to carry out critical functions including predation, collaboration and migration. A powerful method for analysis can therefore be to quantify this emergent structure, bypassing the low-level complexity. Enormous image datasets are now available to mine. However, it can be difficult to uncover interpretable representations of the global organisation of these heterogeneous dynamic processes. Here, such representations were developed for interpreting morphodynamics in two key areas: mode of action (MoA) comparison for drug discovery (developed using the economically devastating Asian soybean rust crop pathogen) and 3D migration of immune system T cells through extracellular matrices (ECMs). For MoA comparison, population development over a 2D space of shapes (morphospace) was described using two models with condition-dependent parameters: a top-down model of diffusive development over Waddington-type landscapes, and a bottom-up model of tip growth. A variety of landscapes were discovered, describing phenotype transitions during growth, and possible perturbations in the tip growth machinery that cause this variation were identified. For interpreting T cell migration, a new 3D shape descriptor that incorporates key polarisation information was developed, revealing low-dimensionality of shape, and the distinct morphodynamics of run-and-stop modes that emerge at minute timescales were mapped. Periodically oscillating morphodynamics that include retrograde deformation flows were found to underlie active translocation (run mode). Overall, it was found that highly interpretable representations could be uncovered while still leveraging the enormous discovery power of deep learning algorithms. 
The results show that whole-cell morphodynamics can be a convenient and powerful place to search for structure, with potentially life-saving applications in medicine and biocide discovery as well as immunotherapeutics.
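The low-dimensionality finding above (a few shape modes explaining most variation) is typically checked with PCA on the shape descriptors. A hedged sketch on synthetic "shape vectors" generated from two latent modes; the thesis's actual descriptors and deep-learning pipeline are more involved.

```python
import numpy as np

rng = np.random.default_rng(1)
n_cells, n_features, n_modes = 200, 40, 2
latent = rng.normal(size=(n_cells, n_modes))          # two true shape modes
mixing = rng.normal(size=(n_modes, n_features))
shapes = latent @ mixing + 0.05 * rng.normal(size=(n_cells, n_features))

# PCA via SVD of the centred descriptor matrix.
centred = shapes - shapes.mean(axis=0)
_, s, _ = np.linalg.svd(centred, full_matrices=False)
explained = s**2 / np.sum(s**2)
print(round(float(explained[:2].sum()), 2))  # first two modes dominate
```

When the first few components carry nearly all the variance, the population's morphodynamics can be interpreted in a low-dimensional morphospace, as described above.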
Efficient instance and hypothesis space revision in Meta-Interpretive Learning
Inductive Logic Programming (ILP) is a form of Machine Learning. The goal of ILP is to induce hypotheses, as logic programs, that generalise training examples. ILP is characterised by high expressivity, generalisation ability, and interpretability. Meta-Interpretive Learning (MIL) is a state-of-the-art sub-field of ILP. However, current MIL approaches have limited efficiency: the sample and learning complexity are, respectively, polynomial and exponential in the number of clauses. My thesis is that improvements in the sample and learning complexity can be achieved in MIL through instance and hypothesis space revision. Specifically, we investigate 1) methods that revise the instance space, 2) methods that revise the hypothesis space, and 3) methods that revise both the instance and the hypothesis spaces for achieving more efficient MIL.
First, we introduce a method for building training sets with active learning in Bayesian MIL. Instances are selected by maximising the entropy. We demonstrate that this method can reduce the sample complexity and supports efficient learning of agent strategies. Second, we introduce a new method for revising the MIL hypothesis space with predicate invention. Our method generates predicates bottom-up from the background knowledge related to the training examples. We demonstrate that this method is complete and can reduce the learning and sample complexity. Finally, we introduce a new MIL system called MIGO for learning optimal two-player game strategies. MIGO learns from playing: its training sets are built from the sequence of actions it chooses. Moreover, MIGO revises its hypothesis space with Dependent Learning: it first solves simpler tasks and can reuse any learned solution for solving more complex tasks. We demonstrate that MIGO significantly outperforms both classical and deep reinforcement learning. The methods presented in this thesis open exciting perspectives for efficiently learning theories with MIL in a wide range of applications including robotics, modelling of agent strategies, and game playing.
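The entropy-maximising selection step described above can be illustrated in miniature: under a uniform posterior over candidate hypotheses, query the instance whose predicted label is most uncertain. This is a hedged sketch with toy threshold hypotheses, not the thesis's Bayesian MIL machinery.

```python
import math

# Hypothesis class: threshold rules h_t(x) = [x >= t] for t in 0..10,
# standing in for candidate logic programs; posterior is uniform over them.
instances = list(range(10))
hypotheses = [lambda x, t=t: x >= t for t in range(11)]

def label_entropy(x, hs):
    # Entropy of the predicted label for instance x under the posterior.
    p = sum(h(x) for h in hs) / len(hs)        # P(label = 1)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

best = max(instances, key=lambda x: label_entropy(x, hypotheses))
print(best)  # a mid-range instance, which splits the hypothesis space most evenly
```

Labelling the selected instance eliminates roughly half the hypotheses, which is why entropy-based selection can reduce sample complexity.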
Signals and Images in Sea Technologies
Life below water is the 14th Sustainable Development Goal (SDG) envisaged by the United Nations and is aimed at conserving and sustainably using the oceans, seas, and marine resources for sustainable development. It is not difficult to argue that signal and image technologies may play an essential role in achieving the foreseen targets linked to SDG 14. Besides increasing the general knowledge of ocean health by means of data analysis, methodologies based on signal and image processing can be helpful in environmental monitoring, in protecting and restoring ecosystems, in finding new sensor technologies for green routing and eco-friendly ships, in providing tools for implementing best practices for sustainable fishing, as well as in defining frameworks and intelligent systems for enforcing sea law and making the sea a safer and more secure place. Imaging is also a key element in the exploration of the underwater world for various purposes, ranging from the predictive maintenance of sub-sea pipelines and other infrastructure projects to the discovery, documentation, and protection of sunken cultural heritage. The scope of this Special Issue encompasses investigations into techniques and ICT approaches and, in particular, the study and application of signal- and image-based methods and, in turn, exploration of the advantages of their application in the previously mentioned areas.
Recent Advances in Single-Particle Tracking: Experiment and Analysis
This Special Issue of Entropy, titled “Recent Advances in Single-Particle Tracking: Experiment and Analysis”, contains a collection of 13 papers concerning different aspects of single-particle tracking, a popular experimental technique that has deeply penetrated molecular biology and statistical and chemical physics. Presenting original research, yet written in an accessible style, this collection will be useful both for newcomers to the field and for more experienced researchers looking for a reference. Several papers are written by authorities in the field, and the topics cover aspects of experimental setups, analytical methods of tracking data analysis, a machine learning approach to data and, finally, some more general issues related to diffusion.
Distilling the neural correlates of conscious somatosensory perception
The ability to consciously perceive the world profoundly defines our lives as human beings. Somehow, our brains process information in a way that allows us to become aware of the images, sounds, touches, smells, and tastes surrounding us. Yet our understanding of the neurobiological processes that generate perceptual awareness is very limited. One of the most contested questions in the neuroscientific study of conscious perception is whether awareness arises from the activity of early sensory brain regions, or instead requires later processing in widespread supramodal networks. It has been suggested that the conflicting evidence supporting these two perspectives may be the result of methodological confounds in classical experimental tasks. In order to infer perceptual awareness in these tasks, participants need to report the contents of their perception. This means that the neural signals underlying the emergence of perceptual awareness often cannot be dissociated from pre- and postperceptual processes. Consequently, some of the previously observed effects may not be correlates of awareness after all but instead may have resulted from task requirements.
In this thesis, I investigate this possibility in the somatosensory modality. To scrutinise the task dependence of the neural correlates of somatosensory awareness, I developed an experimental paradigm that controls for the most common experimental confounds. In a somatosensory-visual matching task, participants were required to detect electrical target stimuli at ten different intensity levels. Instead of reporting their perception directly, they compared their somatosensory percepts to simultaneously presented visual cues that signalled stimulus presence or absence and then reported a match or mismatch accordingly. As a result, target detection was decorrelated from working memory and reports, the behavioural relevance of detected and undetected stimuli was equated, the influence of attentional processes was mitigated, and perceptual uncertainty was varied in a controlled manner. Results from a functional magnetic resonance imaging (fMRI) study and an electroencephalography (EEG) study showed that, when controlled for task demands, the neural correlates of somatosensory awareness were restricted to relatively early activity (~150 ms) in secondary somatosensory regions. In contrast, late activity (>300 ms) indicative of processing in frontoparietal networks occurred irrespective of stimulus awareness, and activity in anterior insular, anterior cingulate, and supplementary motor cortex was associated with processing perceptual uncertainty and reports. These results add novel evidence to the early-local vs. late-global debate and favour the view that perceptual awareness emerges at the level of modality-specific sensory cortices.
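The decorrelation at the heart of the matching task can be verified in simulation: when the visual cue is independent of the stimulus, the match/mismatch report is the logical XNOR of perception and cue, so the motor report carries no information about whether the target was perceived. A hedged sketch with illustrative detection probabilities:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 10_000
perceived = (rng.random(n) < 0.5).astype(float)    # did the participant detect a target?
cue_present = (rng.random(n) < 0.5).astype(float)  # independent visual cue: presence/absence
report_match = (perceived == cue_present).astype(float)  # "match" iff percept agrees with cue

# The report is statistically decorrelated from perception.
r = np.corrcoef(perceived, report_match)[0, 1]
print(abs(r) < 0.05)  # True: near-zero correlation
```

This is why neural activity tied to the report cannot masquerade as a correlate of awareness in this paradigm.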
Mixture Models in Machine Learning
Modeling with mixtures is a powerful method in the statistical toolkit that can be used for representing the presence of sub-populations within an overall population. In many applications ranging from financial models to genetics, a mixture model is used to fit the data. The primary difficulty in learning mixture models is that the observed data set does not identify the sub-population to which an individual observation belongs. Although mixture models have been studied for more than a century, their theoretical guarantees remain unknown for several important settings.
In this thesis, we look at three groups of problems. The first part is aimed at estimating the parameters of a mixture of simple distributions. We ask the following question: How many samples are necessary and sufficient to learn the latent parameters? We propose several approaches for this problem that include complex-analytic tools to connect statistical distances between pairs of mixtures with the characteristic function. We show sufficient sample complexity guarantees for mixtures of popular distributions (including Gaussian, Poisson, and Geometric). For many distributions, our results provide the first sample complexity guarantees for parameter estimation in the corresponding mixture. Using these techniques, we also provide improved lower bounds on the Total Variation distance between Gaussian mixtures with two components and demonstrate new results in some sequence reconstruction problems.
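For concreteness, the parameter-estimation problem above can be solved in the well-separated case by the textbook EM algorithm for a two-component 1-D Gaussian mixture. This is a hedged sketch of the standard estimator, not the thesis's characteristic-function technique; all data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)
# Latent sub-populations: N(-2, 1) and N(3, 1), mixed 50/50.
x = np.concatenate([rng.normal(-2.0, 1.0, 500), rng.normal(3.0, 1.0, 500)])

mu = np.array([-1.0, 1.0]); sigma = np.array([1.0, 1.0]); pi = np.array([0.5, 0.5])
for _ in range(50):
    # E-step: posterior responsibility of each component for each point
    # (the 1/sqrt(2*pi) constant cancels in the normalisation).
    dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / sigma
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted means, variances, and mixing weights
    nk = resp.sum(axis=0)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)
    pi = nk / len(x)

print(np.round(np.sort(mu), 1))  # close to the true means (-2, 3)
```

The sample-complexity question above asks how many such draws are needed before estimators like this one pin down the latent parameters; EM itself comes with no such guarantee in general.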
In the second part, we study Mixtures of Sparse Linear Regressions, where the goal is to learn the best set of linear relationships between the scalar responses (i.e., labels) and the explanatory variables (i.e., features). We focus on a scenario where a learner is able to choose the features to get the labels. To tackle the high dimensionality of data, we further assume that the linear maps are also sparse, i.e., have only a few prominent features among many. For this setting, we devise algorithms with sub-linear (as a function of the dimension) sample complexity guarantees that are also robust to noise.
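The mixture-of-regressions objective can be illustrated with alternating minimisation on two scalar regressions: assign each sample to the line that fits it best, refit each line, and repeat. This hedged sketch omits the sparsity and chosen-feature aspects central to the thesis; slopes and noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
x = rng.uniform(-1, 1, n)
z = rng.random(n) < 0.5                       # latent component of each sample
y = np.where(z, 2.0 * x, -3.0 * x) + 0.05 * rng.normal(size=n)

a = np.array([1.0, -1.0])                     # initial slope guesses
for _ in range(10):
    resid = np.abs(y[:, None] - x[:, None] * a)   # residual against each line
    assign = resid.argmin(axis=1)                 # hard assignment step
    for k in (0, 1):
        sel = assign == k
        a[k] = np.sum(x[sel] * y[sel]) / np.sum(x[sel] ** 2)  # least-squares refit

print(np.round(np.sort(a), 1))  # recovers slopes near -3 and 2
```

The difficulty the abstract refers to is visible here: the data never reveal which component generated each sample, so the algorithm must infer assignments and parameters jointly.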
In the final part, we study Mixtures of Sparse Linear Classifiers in the same setting as above. Given a set of features and the binary labels, the objective of this task is to find a set of hyperplanes in the space of features such that for any (feature, label) pair, there exists a hyperplane in the set that justifies the mapping. We devise efficient algorithms with sub-linear sample complexity guarantees for learning the unknown hyperplanes under similar sparsity assumptions as above. To that end, we propose several novel techniques that include tensor decomposition methods and combinatorial designs.
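The consistency condition above ("for any (feature, label) pair, there exists a hyperplane in the set that justifies the mapping") is easy to state as a check. A hedged sketch with illustrative data; the thesis's learning algorithms, which must find such a set, are far more involved.

```python
import numpy as np

def explains(hyperplanes, X, labels):
    # Sign of every hyperplane on every point: shape (n_points, n_planes).
    preds = np.sign(X @ hyperplanes.T)
    # Each point is justified if at least one hyperplane reproduces its label.
    return bool(np.all(np.any(preds == labels[:, None], axis=1)))

W = np.array([[1.0, 0.0],      # hyperplane through the origin: sign(x1)
              [0.0, 1.0]])     # hyperplane through the origin: sign(x2)
X = np.array([[2.0, -1.0],
              [-2.0, 1.0],
              [-1.0, -2.0]])
labels = np.array([1.0, 1.0, -1.0])
print(explains(W, X, labels))  # True: every pair is justified by some hyperplane
```

A point such as (1, 1) with label -1 would violate the condition for this pair of hyperplanes, since both classify it as +1.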