404 research outputs found

Post-processing of association rules.

Author: Baesens Bart
Vanthienen Jan
Viaene Stijn

In this paper, we situate and motivate the need for a post-processing phase to the association rule mining algorithm when plugged into the knowledge discovery in databases process. Major research effort has already been devoted to optimising the initially proposed mining algorithms. When it comes to effectively extrapolating the most interesting knowledge nuggets from the standard output of these algorithms, one is faced with an extreme challenge, since it is not uncommon to be confronted with a vast amount of association rules after running the algorithms. The sheer multitude of generated rules often clouds the perception of the interpreters. Rightful assessment of the usefulness of the generated output introduces the need to effectively deal with different forms of data redundancy and data being plainly uninteresting. In order to do so, we will give a tentative overview of some of the main post-processing tasks, taking into account the efforts that have already been reported in the literature.

Research Papers in Economics

Knowledge-based Systems and Interestingness Measures: Analysis with Clinical Datasets

Author: Jabez J. Christopher
Kannan Arputharaj
Khanna H. Nehemiah
Publication venue: Faculty of Electrical Engineering and Computing, Univ. of Zagreb
Publication date: 01/01/2016

Knowledge mined from clinical data can be used for medical diagnosis and prognosis. By improving the quality of the knowledge base, the prediction efficiency of a knowledge-based system can be enhanced. Designing accurate and precise clinical decision support systems that use the mined knowledge is still a broad area of research. This work analyses the variation in classification accuracy for such knowledge-based systems using different rule lists. The purpose of this work is not to improve the prediction accuracy of a decision support system, but to analyze the factors that influence the efficiency and design of the knowledge base in a rule-based decision support system. Three benchmark medical datasets are used. Rules are extracted using a supervised machine learning algorithm (PART). Each rule in the ruleset is validated using nine frequently used rule interestingness measures. After calculating the measure values, the rule lists are used for performance evaluation. Experimental results show variation in classification accuracy for different rule lists. Confidence and Laplace measures yield relatively superior accuracy: 81.188% for the heart disease dataset and 78.255% for the diabetes dataset. The accuracy of the knowledge-based prediction system is predominantly dependent on the organization of the ruleset. Rule length needs to be considered when deciding the rule ordering. A subset of a rule, or a combination of rule elements, may form a new rule and sometimes be a member of the rule list. Redundant rules should be eliminated. Prior knowledge about the domain will enable knowledge engineers to design a better knowledge base.
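To illustrate how the choice of interestingness measure can change rule ordering, the following is a minimal Python sketch of two of the measures named in the abstract above, confidence and the Laplace estimate. The formulas are the commonly used textbook definitions and are an assumption here; the paper's exact definitions may differ, and the rule names and coverage counts are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# (covered, correct): instances matched by the rule body, and how many of
# those the rule classifies correctly. All counts below are made up.
RuleStats = Tuple[int, int]


def confidence(covered: int, correct: int) -> float:
    """Fraction of covered instances that the rule classifies correctly."""
    return correct / covered if covered else 0.0


def laplace(covered: int, correct: int, n_classes: int = 2) -> float:
    """Laplace-corrected accuracy: (correct + 1) / (covered + n_classes)."""
    return (correct + 1) / (covered + n_classes)


def rank_rules(rules: Dict[str, RuleStats],
               measure: Callable[[int, int], float]) -> List[str]:
    """Return rule names ordered from highest to lowest measure value."""
    return sorted(rules, key=lambda name: measure(*rules[name]), reverse=True)


rules = {
    "narrow_rule": (2, 2),    # hypothetical: covers 2 instances, both correct
    "broad_rule": (50, 48),   # hypothetical: covers 50 instances, 48 correct
}

print(rank_rules(rules, confidence))  # narrow_rule first (1.0 vs 0.96)
print(rank_rules(rules, laplace))     # broad_rule first (about 0.94 vs 0.75)
```

In this toy case confidence favours the rule that covers only two instances, while the Laplace estimate penalises low coverage and places the broad rule first, so the same ruleset yields two different rule lists.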

Directory of Open Access Journals

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

I-prune: Item selection for associative classification

Author: Baralis
Coenen
Coenen
Guyon
Hall
Li
Quinlan
Rak
Tan
Wang
Wang
Zaïane
Publication venue: John Wiley & Sons, Inc.
Publication date: 01/01/2012

Associative classification is characterized by accurate models and high model generation time. Most time is spent in extracting and postprocessing a large set of irrelevant rules, which are eventually pruned. We propose I-prune, an item-pruning approach that selects uninteresting items by means of an interestingness measure and prunes them as soon as they are detected. Thus, the number of extracted rules is reduced and model generation time decreases correspondingly. A wide set of experiments on real and synthetic data sets has been performed to evaluate I-prune and select the appropriate interestingness measure. The experimental results show that I-prune allows a significant reduction in model generation time, while increasing (or at worst preserving) model accuracy. Experimental evaluation also points to the chi-square measure as the most effective interestingness measure for item pruning.
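The idea of scoring items with chi-square and dropping the uninteresting ones before rule extraction can be sketched as follows. This is not the I-prune algorithm itself, whose details are not given in the abstract; it is a simplified illustration that assumes a binary class, a 2x2 item/class contingency table, and a hypothetical threshold of 3.841 (the 5% critical value for one degree of freedom). The function names and the data are invented for the example.

```python
from typing import Hashable, List, Sequence, Set


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:          # an empty row or column carries no information
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def item_score(transactions: Sequence[Set[str]], labels: Sequence[Hashable],
               item: str, target: Hashable) -> float:
    """Chi-square between 'transaction contains item' and 'label is target'."""
    a = b = c = d = 0
    for items, label in zip(transactions, labels):
        has_item, is_target = item in items, label == target
        if has_item and is_target:
            a += 1
        elif has_item:
            b += 1
        elif is_target:
            c += 1
        else:
            d += 1
    return chi_square_2x2(a, b, c, d)


def prune_items(transactions: Sequence[Set[str]], labels: Sequence[Hashable],
                target: Hashable, threshold: float = 3.841) -> List[Set[str]]:
    """Remove every item whose chi-square score falls below the threshold."""
    all_items = set().union(*transactions)
    keep = {i for i in all_items
            if item_score(transactions, labels, i, target) >= threshold}
    return [t & keep for t in transactions]


# Hypothetical data: "x" occurs only in positive records, "y" occurs equally
# often in both classes and is therefore uninformative.
transactions = [{"x", "y"}, {"x", "y"}, {"x"}, {"x"}, {"x"},
                {"y"}, {"y"}, set(), set(), set()]
labels = ["pos"] * 5 + ["neg"] * 5

pruned = prune_items(transactions, labels, target="pos")
print(pruned[0])  # only "x" survives in the first record
```

Because pruned items never reach the rule miner, every rule that would have contained them is never generated, which is the source of the time saving the abstract describes.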

Crossref

Archivio istituzionale della ricerca - Politecnico di Milano

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Various Sequence Classification Mechanisms for Knowledge Discovery

Author: Goutami R. Mane, Suhas B. Bhagate
Publication venue: Auricle Global Society of Education and Research
Publication date: 30/11/2017

Sequence classification is an efficient task in data mining. The knowledge obtained from the training stage can be used for sequence classification, which assigns class labels to new sequences. Relevant patterns can be found by using sequential pattern mining, in which the values are represented in a sequential manner. The classification process relies on explicit features, but such features are not found in sequences. Feature selection techniques are sophisticated, but the dimensionality of the potential features may be very high, and it is hard to capture the sequential nature of the features. Sequence classification is therefore a more challenging task than feature vector classification. The sequence classification problem can be solved by rules that consist of interesting patterns. These patterns are found in datasets of sequences annotated with class labels. The cohesion and support of a pattern are used to define its interestingness: in a given class of sequences, the interestingness of a pattern can be measured by combining these two factors. Confident classification rules can be generated by using the discovered patterns. Two different approaches to building a classifier are used. The first classifier is an advanced form of a classification method based on association rules. In the second classifier, the value belonging to the new data object is first measured, and then the rules are ranked.

International Journal on Future Revolution in Computer Science & Communication Engineering

A survey of temporal knowledge discovery paradigms and methods

Author: Roddick John Francis
Spiliopoulou Myra
Publication venue: Institute of Electrical and Electronics Engineers Computer Society (IEEE Publishing)
Publication date: 01/01/2002

With the increase in the size of data sets, data mining has recently become an important research topic and is receiving substantial interest from both academia and industry. At the same time, interest in temporal databases has been increasing and a growing number of both prototype and implemented systems are using an enhanced temporal understanding to explain aspects of behavior associated with the implicit time-varying nature of the universe. This paper investigates the confluence of these two areas, surveys the work to date, and explores the issues involved and the outstanding problems in temporal data mining

Flinders Academic Commons

Doctor of Philosophy

Author: Welch Susan Rea
Publication venue: University of Utah
Publication date: 01/05/2011

With the growing national dissemination of the electronic health record (EHR), there are expectations that the public will benefit from biomedical research and discovery enabled by electronic health data. Clinical data are needed for many diseases and conditions to meet the demands of rapidly advancing genomic and proteomic research. Many biomedical research advancements require rapid access to clinical data as well as broad population coverage. A fundamental issue in the secondary use of clinical data for scientific research is the identification of study cohorts of individuals with a disease or medical condition of interest. The problem addressed in this work is the need for generalized, efficient methods to identify cohorts in the EHR for use in biomedical research. To approach this problem, an associative classification framework was designed with the goal of accurate and rapid identification of cases for biomedical research: (1) a set of exemplars for a given medical condition are presented to the framework, (2) a predictive rule set comprised of EHR attributes is generated by the framework, and (3) the rule set is applied to the EHR to identify additional patients that may have the specified condition. Based on this functionality, the approach was termed the 'cohort amplification' framework. The development and evaluation of the cohort amplification framework are the subject of this dissertation. An overview of the framework design is presented. Improvements to some standard associative classification methods are described and validated. A qualitative evaluation of predictive rules to identify diabetes cases and a study of the accuracy of identification of asthma cases in the EHR using framework-generated prediction rules are reported. The framework demonstrated accurate and reliable rules to identify diabetes and asthma cases in the EHR and contributed to methods for identification of biomedical research cohorts.

The University of Utah: J. Willard Marriott Digital Library

A Survey on Data Mining Algorithm for Market Basket Analysis

Author: Dr. M. Dhanabhakyam
Publication venue: Global Journals Inc. (US)
Publication date: 26/05/2011

Association rule mining identifies notable associations or relationships among a large set of data items. With huge quantities of data constantly being obtained and stored in databases, several industries are becoming interested in mining association rules from their databases. For example, the detection of interesting association relationships in large quantities of business transaction data can assist in catalog design, cross-marketing, loss-leader analysis, and various business decision-making processes. A typical example of association rule mining is market basket analysis. This method examines customer buying patterns by identifying associations among the various items that customers place in their shopping baskets. The identification of such associations can help retailers expand their marketing strategies by gaining insight into which items are frequently purchased together by customers. It helps to examine customer purchasing behavior and, by focusing on point-of-sale transaction data, assists in increasing sales and conserving inventory. This work is intended as a broad starting point for researchers who want to develop better data mining algorithms. This paper presents a survey of the existing data mining algorithms for market basket analysis.
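As a small illustration of what "identifying associations among items in shopping baskets" means in practice, the sketch below computes the three standard rule statistics (support, confidence and lift) for one candidate rule. It is not taken from the surveyed paper; the baskets, item names and function names are hypothetical, and a real miner such as Apriori would enumerate candidate itemsets instead of evaluating a single hand-picked rule.

```python
from typing import Iterable, Sequence, Set


def support(baskets: Sequence[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of baskets that contain every item in the itemset."""
    wanted = set(itemset)
    if not baskets:
        return 0.0
    return sum(1 for b in baskets if wanted <= b) / len(baskets)


def rule_confidence(baskets: Sequence[Set[str]],
                    antecedent: Iterable[str],
                    consequent: Iterable[str]) -> float:
    """Estimate of P(consequent | antecedent) from the baskets."""
    lhs, rhs = set(antecedent), set(consequent)
    lhs_support = support(baskets, lhs)
    return support(baskets, lhs | rhs) / lhs_support if lhs_support else 0.0


def lift(baskets: Sequence[Set[str]],
         antecedent: Iterable[str],
         consequent: Iterable[str]) -> float:
    """Confidence divided by the baseline frequency of the consequent."""
    rhs_support = support(baskets, consequent)
    if not rhs_support:
        return 0.0
    return rule_confidence(baskets, antecedent, consequent) / rhs_support


# Hypothetical point-of-sale data: one set of items per checkout.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))            # 0.5
print(rule_confidence(baskets, {"bread"}, {"milk"}))  # about 0.667
print(lift(baskets, {"bread"}, {"milk"}))             # about 0.889
```

A lift below 1, as in this toy data, indicates that buying bread makes milk slightly less likely than its baseline frequency, so the rule would not be a useful cross-marketing signal despite its moderate confidence.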

Global Journal of Computer Science and Technology (GJCST)

Robust and cost-effective approach for discovering action rules

Author: Kalanat N
Saraee MH
Shamsinejad P
Publication venue: IACSIT Press
Publication date: 01/10/2011

The main goal of Knowledge Discovery in Databases is to find interesting and usable patterns that are meaningful in their domain. Actionable Knowledge Discovery came into existence as a direct response to the need to find more usable patterns, called actionable patterns. Traditional data mining algorithms are often confined to delivering frequent patterns and fall short of suggesting how to make these patterns actionable. In this scenario the users are expected to act, yet they are not advised about what to do with the delivered patterns in order to make them usable. In this paper, we present an automated approach that focuses not only on creating rules but also on making the discovered rules actionable. Up to now, few works have been reported in this field, and those works lack comprehensibility to the user, overlook cost and do not provide rule generality. Here we attempt to present a method to resolve these issues. In this paper, the CEARDM method is proposed to discover cost-effective action rules from data. These rules suggest cost-effective changes for transferring low-profit instances to higher-profit ones. We also propose an idea for improving the CEARDM method.

University of Salford Institutional Repository