Rough sets theory and uncertainty into information system

Author: Jirava Pavel
Publication venue: 'Univerzita Pardubice'
Publication date: 01/01/2006

This article focuses on the rough set approach to expressing uncertainty in information systems. We assume that the data are presented in a decision table and that some attribute values are lost. First the theoretical background is described; after that, computations on real-life data are presented. In the computations we work with uncertainty coming from missing attribute values.

Digital Library of the University of Pardubice

HANDLING MISSING ATTRIBUTE VALUES IN DECISION TABLES USING VALUED TOLERANCE APPROACH

Author: Vasudevan Supriya
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2008

Rule induction is one of the key areas in data mining, as it is applied to a large number of real-life data sets. However, in such real-life data the information is incompletely specified most of the time. To induce rules from these incomplete data, more powerful algorithms are necessary. This research work mainly focuses on a probabilistic approach based on the valued tolerance relation. The thesis is divided into two parts. The first part describes the implementation of the valued tolerance relation. The induced rules are then evaluated based on the error rate due to incorrectly classified and unclassified examples. The second part compares the rules induced by the MLEM2 algorithm, which had been implemented previously, with the rules induced by the valued tolerance based approach implemented as part of this research. The error rates of the MLEM2 algorithm and the valued tolerance based approach are thus compared and the results are documented.

KU ScholarWorks
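
The abstract above does not spell out the valued tolerance relation, so the following is only a minimal sketch of the general idea: two cases are related to the degree that they could agree on every attribute. The `?` marker, the attribute names, and the assumption that missing values are independent and uniformly distributed over each attribute's domain are all choices of this sketch, not details taken from the thesis.

```python
from fractions import Fraction

MISSING = "?"  # assumed marker for a missing attribute value


def valued_tolerance(x, y, domain_sizes):
    """Probability that cases x and y agree on every attribute.

    Assumes missing values are independent and uniformly distributed
    over each attribute's domain (an assumption of this sketch).
    """
    degree = Fraction(1)
    for attribute, size in domain_sizes.items():
        vx, vy = x[attribute], y[attribute]
        if vx == MISSING or vy == MISSING:
            # At least one value is unknown: the two values coincide
            # with probability 1/|V_a| under the uniform assumption.
            degree *= Fraction(1, size)
        elif vx != vy:
            return Fraction(0)  # known values differ: cases cannot match
    return degree


# Hypothetical attributes with 3 and 2 possible values respectively.
domain_sizes = {"temperature": 3, "headache": 2}
case1 = {"temperature": "high", "headache": "yes"}
case2 = {"temperature": "high", "headache": MISSING}
case3 = {"temperature": MISSING, "headache": "no"}

print(valued_tolerance(case1, case2, domain_sizes))  # 1/2
print(valued_tolerance(case1, case3, domain_sizes))  # 0
print(valued_tolerance(case2, case3, domain_sizes))  # 1/6
```

Exact fractions are used so that the degrees can be compared without floating-point noise.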

Ontology-Based Quality Evaluation of Value Generalization Hierarchies for Data Anonymization

Author: Ayala-Rivera Vanessa
Cerqueus Thomas
McDonagh Patrick
Murphy Liam
Publication date: 29/01/2015

In privacy-preserving data publishing, approaches using Value Generalization Hierarchies (VGHs) form an important class of anonymization algorithms. VGHs play a key role in the utility of published datasets, as they dictate how the anonymization of the data occurs. For categorical attributes, it is imperative to preserve the semantics of the original data in order to achieve higher utility. Despite this, semantics have not been formally considered in the specification of VGHs. Moreover, there are no methods that allow users to assess the quality of their VGHs. In this paper, we propose a measurement scheme, based on ontologies, to quantitatively evaluate the quality of VGHs in terms of semantic consistency and taxonomic organization, with the aim of producing higher-quality anonymizations. We demonstrate, through a case study, how our evaluation scheme can be used to compare the quality of multiple VGHs and can help to identify faulty VGHs. Comment: 18 pages, 7 figures, presented at the Privacy in Statistical Databases Conference 2014 (Ibiza, Spain)

arXiv.org e-Print Archive

Research Repository UCD

Irish Universities
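
To make concrete how a VGH "dictates how the anonymization of the data occurs", here is a small sketch of value generalization along a hierarchy. The hierarchy, its labels, and the `generalize` helper are hypothetical illustrations; the paper's ontology-based quality measures are not reproduced here.

```python
# Hypothetical VGH for a "job" attribute, stored as child -> parent links.
VGH_PARENT = {
    "nurse": "medical",
    "surgeon": "medical",
    "teacher": "education",
    "medical": "professional",
    "education": "professional",
    "professional": "*",
}


def generalize(value, levels, parent=VGH_PARENT):
    """Replace value by its ancestor `levels` steps up, stopping at the root."""
    for _ in range(levels):
        if value not in parent:
            break  # already at the root of the hierarchy
        value = parent[value]
    return value


records = ["nurse", "surgeon", "teacher"]
print([generalize(v, 1) for v in records])  # ['medical', 'medical', 'education']
print([generalize(v, 2) for v in records])  # ['professional', 'professional', 'professional']
```

A hierarchy that grouped, say, "nurse" with "teacher" would generalize just as mechanically, which is why the semantic quality of the VGH itself matters for utility.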

A Comparison of the Quality of Rule Induction from Inconsistent Data Sets and Incomplete Data Sets

Author: Su Xiaomeng
Publication venue: 'Paleontological Institute at The University of Kansas'
Publication date: 01/01/2015

In data mining, decision rules induced from known examples are used to classify unseen cases. There are various rule induction algorithms, such as LEM1 (Learning from Examples Module version 1), LEM2 (Learning from Examples Module version 2) and MLEM2 (Modified Learning from Examples Module version 2). In the real world, many data sets are imperfect, either inconsistent or incomplete. The idea of lower and upper approximations, or more generally the probabilistic approximation, provides an effective way to induce rules from inconsistent and incomplete data sets. However, the accuracy of rule sets induced from imperfect data sets is expected to be lower. The objective of this project is to investigate which kind of imperfect data set (inconsistent or incomplete) is worse in terms of the quality of rule induction. Experiments were conducted on eight inconsistent data sets and eight incomplete data sets with lost values. We implemented the MLEM2 algorithm to induce certain and possible rules from inconsistent data sets, and a local probabilistic version of the MLEM2 algorithm to induce certain and possible rules from incomplete data sets. A program called Rule Checker was also developed to classify unseen cases with the induced rules and measure the classification error rate. Ten-fold cross validation was carried out and the average error rate was used as the criterion for comparison. Mann-Whitney nonparametric tests were performed to compare incompleteness with inconsistency, separately for certain and possible rules. The results show that there is no significant difference between inconsistent and incomplete data sets in terms of the quality of rule induction.

KU ScholarWorks
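
The lower and upper approximations mentioned in the abstract above can be illustrated on a tiny inconsistent decision table. This is a sketch of the standard rough set definitions only, not of MLEM2 or Rule Checker; the table, attribute names, and the `approximations` helper are made up for illustration.

```python
from collections import defaultdict


def approximations(cases, attributes, decision, concept_value):
    """Return (lower, upper) approximations of a concept as sets of case ids."""
    # Group cases that are indiscernible on the chosen attributes.
    blocks = defaultdict(set)
    for cid, case in cases.items():
        blocks[tuple(case[a] for a in attributes)].add(cid)

    concept = {cid for cid, c in cases.items() if c[decision] == concept_value}
    lower, upper = set(), set()
    for block in blocks.values():
        if block <= concept:   # block lies entirely inside the concept
            lower |= block
        if block & concept:    # block overlaps the concept
            upper |= block
    return lower, upper


# Cases 1 and 2 agree on all attributes but differ on the decision,
# which makes the table inconsistent.
cases = {
    1: {"temp": "high", "cough": "yes", "flu": "yes"},
    2: {"temp": "high", "cough": "yes", "flu": "no"},
    3: {"temp": "normal", "cough": "no", "flu": "no"},
    4: {"temp": "high", "cough": "no", "flu": "yes"},
}
lower, upper = approximations(cases, ["temp", "cough"], "flu", "yes")
print(sorted(lower))  # [4]
print(sorted(upper))  # [1, 2, 4]
```

Certain rules are induced from the lower approximation and possible rules from the upper approximation, which is why the two rule types are compared separately.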

Rough-set-based ADR signaling from spontaneous reporting data with missing values

Author: Huang Feng-Hsiung
Lan Lin
Lin Wen-Yang
Wang Min-Hsien
Publication venue: 'Elsevier BV'
Publication date: 31/12/2015

Spontaneous reporting systems for adverse drug events have been widely established in many countries to collect as many adverse drug events as possible and so facilitate the detection of suspected ADR signals via statistical or data mining methods. Unfortunately, due to privacy concerns or other reasons, reporters sometimes deliberately omit some attributes, leaving many missing values in the reporting database. Most research on ADR detection, and most methods applied in practice, simply adopt listwise deletion to eliminate all data with missing values. Very little work has noticed the possibility, or examined the effect, of including the missing data in the process of ADR detection. This paper represents our endeavor to explore this question. We aim to inspect the feasibility of applying rough set theory to the ADR detection problem. Based on the concept of using characteristic set based approximation to measure the strength of ADR signals, we propose twelve different rough set based measuring methods and show that only six of them are feasible for the purpose. Experimental results on the FARES database show that our rough-set-based approach exhibits a capability similar to that of the traditional method with deletion of missing data in timely warning of suspicious ADR signals, and can sometimes yield noteworthy measures earlier than the traditional method.

Elsevier - Publisher Connector
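
For readers unfamiliar with characteristic sets, the sketch below computes them for a toy table under the "lost value" interpretation, in which a case with a missing value on an attribute belongs to no attribute-value block of that attribute. The `?` marker, the report fields, and the helper name are assumptions of this sketch; the paper's twelve measuring methods are not reproduced.

```python
LOST = "?"  # assumed marker for a lost (missing) value


def characteristic_sets(cases, attributes):
    """Map each case id to its characteristic set (lost-value interpretation)."""
    # Attribute-value blocks: a case with a lost value on an attribute
    # is not placed in any block of that attribute.
    blocks = {}
    for cid, case in cases.items():
        for a in attributes:
            if case[a] != LOST:
                blocks.setdefault((a, case[a]), set()).add(cid)

    result = {}
    for cid, case in cases.items():
        k = set(cases)  # start from the whole universe
        for a in attributes:
            if case[a] != LOST:
                k &= blocks[(a, case[a])]  # keep only cases matching on a
        result[cid] = k
    return result


# Hypothetical reports with some omitted fields.
reports = {
    1: {"age": "adult", "sex": "F"},
    2: {"age": LOST, "sex": "F"},
    3: {"age": "adult", "sex": LOST},
    4: {"age": "child", "sex": "M"},
}
for cid, k in characteristic_sets(reports, ["age", "sex"]).items():
    print(cid, sorted(k))
# 1 [1]
# 2 [1, 2]
# 3 [1, 3]
# 4 [4]
```

Unlike listwise deletion, reports 2 and 3 stay in the table and still contribute to approximations built from these sets.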

The Bases of Association Rules of High Confidence

Author: Adaricheva Kira
Cabot-Miller Justin
Nation J. B.
Segal Oren
Sharafudinov Anuar
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 05/08/2018

We develop a new approach for distributed computing of the association rules of high confidence in a binary table. It is derived from the D-basis algorithm in K. Adaricheva and J.B. Nation (TCS 2017), which is performed on multiple sub-tables of a table obtained by removing several rows at a time. The set of rules is then aggregated using the same approach by which the D-basis is retrieved from a larger set of implications. This makes it possible to obtain a basis of association rules of high confidence, which can be used for ranking all attributes of the table with respect to a given fixed attribute using the relevance parameter introduced in K. Adaricheva et al. (Proceedings of ICFCA-2015). This paper focuses on the technical implementation of the new algorithm. Some test results on transaction data and medical data are reported. Comment: Presented at DTMN, Sydney, Australia, July 28, 201

arXiv.org e-Print Archive

Crossref
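
As a reminder of what "confidence" means for an association rule in a binary table, here is a minimal sketch that computes it directly from the rows. It does not implement the D-basis algorithm or the relevance parameter; the table and the `confidence` helper are hypothetical.

```python
from fractions import Fraction


def confidence(rows, premise, conclusion):
    """Confidence of the rule premise -> conclusion in a binary table.

    Defined as (rows satisfying premise and conclusion) / (rows satisfying
    premise). Returns None when no row satisfies the premise.
    """
    matching = [r for r in rows if all(r[a] == 1 for a in premise)]
    if not matching:
        return None  # the rule never fires, so confidence is undefined
    hits = sum(1 for r in matching if r[conclusion] == 1)
    return Fraction(hits, len(matching))


rows = [
    {"a": 1, "b": 1, "c": 1},
    {"a": 1, "b": 1, "c": 0},
    {"a": 1, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]
print(confidence(rows, ["a", "b"], "c"))  # 2/3
print(confidence(rows, ["b"], "a"))       # 1
```

A rule with confidence 1 is an implication that holds in every row; rules of high confidence relax this by tolerating a small share of exceptions.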