
856 research outputs found

Testing Interestingness Measures in Practice: A Large-Scale Analysis of Buying Patterns

Author: Amer-Yahia Sihem
Kirchgessner Martin
Leroy Vincent
Mishra Shashwat
Publication date: 15/03/2016

Understanding customer buying patterns is of great interest to the retail industry and has been shown to benefit a wide variety of goals, ranging from managing stocks to implementing loyalty programs. Association rule mining is a common technique for extracting correlations such as "people in the South of France buy rosé wine" or "customers who buy paté also buy salted butter and sour bread." Unfortunately, sifting through a large number of buying patterns is not useful in practice because popular products predominate in the top rules. As a result, a number of "interestingness" measures (over 30) have been proposed to rank rules. However, there is no agreement on which measures are most appropriate for retail data. Moreover, since pattern mining algorithms output thousands of association rules for each product, an analyst's ability to rely on ranking measures to identify the most interesting ones is crucial. In this paper, we develop CAPA (Comparative Analysis of PAtterns), a framework that provides analysts with the ability to compare the outcome of interestingness measures applied to buying patterns in the retail industry. We report on how we used CAPA to compare 34 measures applied to over 1,800 stores of Intermarché, one of the largest food retailers in France.
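The ranking problem described in the CAPA abstract can be illustrated with a minimal sketch. It assumes that transactions are plain Python sets and considers only single-item rules X → Y. The code computes support, confidence and lift for every rule and then ranks the rules by one chosen measure. The basket data is a hypothetical toy example, not Intermarché data, and `rule_measures` and `top_rules` are illustrative names, not part of CAPA.

```python
from collections import Counter
from itertools import combinations

def rule_measures(transactions, min_support=0.0):
    """Support, confidence and lift for every single-item rule X -> Y."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))  # unordered co-occurring pairs
    rules = []
    for (a, b), n_ab in pair_counts.items():
        support = n_ab / n
        if support < min_support:
            continue
        # Each unordered pair yields two directed rules: a -> b and b -> a.
        for x, y in ((a, b), (b, a)):
            confidence = n_ab / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append({"lhs": x, "rhs": y, "support": support,
                          "confidence": confidence, "lift": lift})
    return rules

def top_rules(rules, measure, k=3):
    """Return the k rules with the highest value of the given measure."""
    return sorted(rules, key=lambda r: r[measure], reverse=True)[:k]

# Hypothetical baskets; "bread" is the popular product.
baskets = [
    {"bread", "milk"},
    {"bread", "milk", "pate"},
    {"bread", "butter", "pate"},
    {"bread", "milk"},
    {"bread", "butter", "pate"},
    {"milk"},
]
rules = rule_measures(baskets)
for measure in ("confidence", "lift"):
    print(measure, [(r["lhs"], r["rhs"]) for r in top_rules(rules, measure)])
```

On this toy data, ranking by confidence places two rules whose right-hand side is the popular item "bread" among the top three. Ranking by lift instead puts the butter/paté pair first. This mirrors the popularity bias that the abstract describes.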

arXiv.org e-Print Archive

Crossref

Hal - Université Grenoble Alpes

New probabilistic interest measures for association rules

Author: Hahsler Michael
Hornik Kurt
Publication date: 07/02/2008

Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. In this paper, we start by presenting a simple probabilistic framework for transaction data, which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand side of rules and that lift performs poorly at filtering random noise in transaction data. Based on the probabilistic framework, we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic.
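Hyper-lift and hyper-confidence rest on a null model of independence. Under that model, the count C_XY of transactions containing both X and Y follows a hypergeometric distribution, given m transactions, c_X transactions containing X and c_Y transactions containing Y. The sketch below relies on our reading of the two definitions:

- hyper-lift = c_XY divided by the δ-quantile of that distribution;
- hyper-confidence = P[C_XY < c_XY].

It also assumes δ = 0.99. Check both definitions against the paper before relying on them. The function names and the counts are hypothetical.

```python
from math import comb

def hyper_pmf(i, m, c_x, c_y):
    """P[C_XY = i] under independence: the c_x transactions containing X are a
    hypergeometric draw from m transactions, c_y of which contain Y."""
    return comb(c_y, i) * comb(m - c_y, c_x - i) / comb(m, c_x)

def hyper_cdf(k, m, c_x, c_y):
    """P[C_XY <= k]; C_XY can never exceed min(c_x, c_y)."""
    upper = min(k, c_x, c_y)
    return sum(hyper_pmf(i, m, c_x, c_y) for i in range(0, upper + 1))

def hyper_quantile(delta, m, c_x, c_y):
    """Smallest q with P[C_XY <= q] >= delta."""
    cumulative = 0.0
    for q in range(0, min(c_x, c_y) + 1):
        cumulative += hyper_pmf(q, m, c_x, c_y)
        if cumulative >= delta:
            return q
    return min(c_x, c_y)  # guard against floating-point round-off

def hyper_lift(c_xy, m, c_x, c_y, delta=0.99):
    """Observed co-occurrences relative to the delta-quantile of the null model."""
    q = hyper_quantile(delta, m, c_x, c_y)
    return float("inf") if q == 0 else c_xy / q

def hyper_confidence(c_xy, m, c_x, c_y):
    """P[C_XY < c_xy] = P[C_XY <= c_xy - 1]."""
    return hyper_cdf(c_xy - 1, m, c_x, c_y)

def lift(c_xy, m, c_x, c_y):
    """Classic lift: observed over expected co-occurrences."""
    return c_xy * m / (c_x * c_y)

# Hypothetical counts: 1,000 transactions, X and Y each in 100 of them,
# so about 10 co-occurrences are expected by chance alone.
m, c_x, c_y = 1000, 100, 100
for c_xy in (10, 15, 30):
    print(c_xy, round(lift(c_xy, m, c_x, c_y), 2),
          round(hyper_lift(c_xy, m, c_x, c_y), 2),
          round(hyper_confidence(c_xy, m, c_x, c_y), 4))
```

With these counts, c_XY = 15 gives a lift of 1.5. Its hyper-lift, however, stays below 1, because 15 co-occurrences are not unusual under the null model. This is the kind of spurious rule that the abstract says lift fails to filter.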

arXiv.org e-Print Archive

CiteSeerX

Guided Interaction Exploration in Artifact-centric Process Models

Author: Sidorova Natalia
van der Aalst Wil M. P.
van Eck Maikel L.
Publication date: 01/01/2017

Artifact-centric process models aim to describe complex processes as a collection of interacting artifacts. Recent developments in process mining allow for the discovery of such models. However, the focus is often on the representation of the individual artifacts rather than their interactions. Based on event data, we can automatically discover composite state machines representing artifact-centric processes. Moreover, we provide ways of visualizing and quantifying interactions among different artifacts. For example, we are able to highlight strongly correlated behaviours in different artifacts. The approach has been fully implemented as a ProM plug-in; the CSM Miner provides an interactive artifact-centric process discovery tool focussing on interactions. The approach has been evaluated using real-life data sets, including the personal loan and overdraft process of a Dutch financial institution. Comment: 10 pages, 4 figures, to be published in proceedings of the 19th IEEE Conference on Business Informatics, CBI 201
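The idea of discovering one state machine per artifact from event data, together with a measure of how artifacts interact, can be sketched in a few lines. This is a simplified illustration, not the CSM Miner's algorithm. It makes three assumptions:

- each event is a hypothetical (case_id, artifact, timestamp, state) tuple;
- each case has one instance per artifact;
- a naive count of "a state change of artifact A is directly followed by one of artifact B" can stand in for the interaction measures the abstract mentions.

The example log is invented.

```python
from collections import Counter, defaultdict

def discover_state_machines(events):
    """Count state transitions per artifact.

    events: iterable of (case_id, artifact, timestamp, state) tuples,
    assuming one instance of each artifact per case (a simplification).
    Returns {artifact: Counter({(from_state, to_state): count})}.
    """
    lifecycles = defaultdict(list)
    for case_id, artifact, ts, state in events:
        lifecycles[(case_id, artifact)].append((ts, state))
    machines = defaultdict(Counter)
    for (_, artifact), steps in lifecycles.items():
        steps.sort()  # order each artifact instance's events by timestamp
        states = ["START"] + [state for _, state in steps]
        for src, dst in zip(states, states[1:]):
            machines[artifact][(src, dst)] += 1
    return machines

def interaction_counts(events):
    """Naive interaction measure: within a case, count how often a state
    change of one artifact is directly followed by one of another artifact."""
    by_case = defaultdict(list)
    for case_id, artifact, ts, state in events:
        by_case[case_id].append((ts, artifact, state))
    counts = Counter()
    for steps in by_case.values():
        steps.sort()
        for (_, a1, s1), (_, a2, s2) in zip(steps, steps[1:]):
            if a1 != a2:
                counts[((a1, s1), (a2, s2))] += 1
    return counts

# Invented loan-application log with two artifacts: the application and its offer.
events = [
    ("c1", "application", 1, "submitted"),
    ("c1", "offer", 2, "created"),
    ("c1", "application", 3, "accepted"),
    ("c1", "offer", 4, "sent"),
    ("c2", "application", 1, "submitted"),
    ("c2", "offer", 2, "created"),
    ("c2", "application", 3, "declined"),
]
for artifact, transitions in discover_state_machines(events).items():
    print(artifact, dict(transitions))
print(interaction_counts(events).most_common(1))
```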

arXiv.org e-Print Archive

Crossref

Repository TU/e

Pure OAI Repository

Knowledge-based Systems and Interestingness Measures: Analysis with Clinical Datasets

Author: Jabez J. Christopher
Kannan Arputharaj
Khanna H. Nehemiah
Publication venue: Faculty of Electrical Engineering and Computing, Univ. of Zagreb
Publication date: 01/01/2016

Knowledge mined from clinical data can be used for medical diagnosis and prognosis. By improving the quality of the knowledge base, the prediction efficiency of a knowledge-based system can be enhanced. Designing accurate and precise clinical decision support systems that use the mined knowledge is still a broad area of research. This work analyses the variation in classification accuracy for such knowledge-based systems using different rule lists. The purpose of this work is not to improve the prediction accuracy of a decision support system, but to analyze the factors that influence the efficiency and design of the knowledge base in a rule-based decision support system. Three benchmark medical datasets are used. Rules are extracted using a supervised machine learning algorithm (PART). Each rule in the ruleset is validated using nine frequently used rule interestingness measures. After the measure values are calculated, the rule lists are used for performance evaluation. Experimental results show variation in classification accuracy for different rule lists. The confidence and Laplace measures yield relatively superior accuracy: 81.188% for the heart disease dataset and 78.255% for the diabetes dataset. The accuracy of the knowledge-based prediction system depends predominantly on the organization of the ruleset. Rule length needs to be considered when deciding the rule ordering. A subset of a rule, or a combination of rule elements, may form new rules and sometimes be a member of the rule list. Redundant rules should be eliminated. Prior knowledge about the domain will enable knowledge engineers to design a better knowledge base.
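To see how the choice of measure can change a rule list's predictions, here is a minimal sketch of an ordered rule list ranked two ways: by confidence and by the Laplace measure. It assumes the common Laplace definition (n(A∧C) + 1) / (n(A) + k), where k is the number of classes. The abstract does not list its nine measures or their exact formulas, PART itself is not reproduced, and the rules, counts and attribute names below are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    conditions: frozenset  # (attribute, value) pairs that must all hold
    label: str             # predicted class
    covered: int           # n(A): training records matching the conditions
    correct: int           # n(A and C): matching records that carry this label

def confidence(rule):
    return rule.correct / rule.covered

def laplace(rule, num_classes=2):
    # Assumed Laplace estimate; it penalizes rules that cover few records.
    return (rule.correct + 1) / (rule.covered + num_classes)

def classify(record, ordered_rules, default):
    """Ordered rule list: the first rule whose conditions hold fires."""
    for rule in ordered_rules:
        if rule.conditions <= record:
            return rule.label
    return default

# Hypothetical rules for a two-class problem.
r1 = Rule(frozenset({("chest_pain", "typical")}), "disease", covered=2, correct=2)
r2 = Rule(frozenset({("age", "old")}), "disease", covered=50, correct=45)
r3 = Rule(frozenset({("age", "young")}), "healthy", covered=40, correct=34)
rules = [r1, r2, r3]

by_conf = sorted(rules, key=confidence, reverse=True)
by_lap = sorted(rules, key=laplace, reverse=True)
patient = frozenset({("chest_pain", "typical"), ("age", "young")})
print(classify(patient, by_conf, "healthy"), classify(patient, by_lap, "healthy"))
```

Here the rule with perfect confidence but tiny coverage (2 records) drops to last place under Laplace. As a result, the same record is classified differently depending on the ordering, which shows how the organization of the ruleset alone can change predictions.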

Crossref

Directory of Open Access Journals

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Mining Indirect Association Rules (Evolutionary Advancement in Fundamental Theories of Computer Science)

Author: Hamano Shinichi
Mukouchi Yasuhito
Sato Masako
Publication venue: Research Institute for Mathematical Sciences, Kyoto University (京都大学数理解析研究所)
Publication date: 01/05/2004

Kyoto University Research Information Repository

Visual grouping of association rules by clustering conditional probabilities for categorical data

Author: Ghosh Ranadhir
Ivkovic Sasha
Yearwood John
Publication venue: IGI Global
Publication date: 01/01/2005

We demonstrate the use of a visual data-mining tool for non-technical domain experts within organizations to facilitate the extraction of meaningful information and knowledge from in-house databases. The tool is mainly based on the basic notion of grouping association rules. Association rules are useful in discovering items that are frequently found together. However, in many applications, rules with lower frequencies are often interesting for the user. Grouping of association rules is one way to overcome the rare item problem. However, some groups of association rules are too large to be easily understood. In this chapter, we propose a method for clustering categorical data based on the conditional probabilities of association rules for data sets with large numbers of attributes. We argue that the proposed method provides non-technical users with a better understanding of discovered patterns in the data set.
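The abstract does not describe the chapter's clustering method in enough detail to reproduce it, so the following is only a generic illustration of the underlying idea. It describes each value of one categorical attribute by its conditional probabilities P(other value | this value). It then greedily groups values whose profiles differ by at most a threshold. The function names, the max-difference distance and the records are all hypothetical choices.

```python
from collections import Counter, defaultdict

def conditional_profiles(records, attribute):
    """For each value v of `attribute`, the vector of P(other_attr=w | attribute=v)
    over all (other_attr, w) pairs seen in the data. records: list of dicts."""
    value_counts = Counter()
    joint = defaultdict(Counter)
    for rec in records:
        v = rec[attribute]
        value_counts[v] += 1
        for other_attr, other_val in rec.items():
            if other_attr != attribute:
                joint[v][(other_attr, other_val)] += 1
    keys = sorted({k for counts in joint.values() for k in counts})
    profiles = {v: [joint[v][k] / value_counts[v] for k in keys]
                for v in sorted(value_counts)}
    return profiles, keys

def group_by_profile(profiles, threshold):
    """Greedy grouping: a value joins the first group whose representative
    profile differs from its own by at most `threshold` in every coordinate."""
    groups = []  # list of (representative_profile, member_values)
    for value, prof in profiles.items():
        for rep, members in groups:
            if max(abs(p - q) for p, q in zip(prof, rep)) <= threshold:
                members.append(value)
                break
        else:
            groups.append((prof, [value]))
    return [members for _, members in groups]

# Hypothetical purchase records.
records = [
    {"region": "north", "wine": "red", "cheese": "yes"},
    {"region": "north", "wine": "red", "cheese": "yes"},
    {"region": "east", "wine": "red", "cheese": "yes"},
    {"region": "east", "wine": "red", "cheese": "yes"},
    {"region": "south", "wine": "rose", "cheese": "no"},
    {"region": "south", "wine": "rose", "cheese": "no"},
    {"region": "west", "wine": "rose", "cheese": "no"},
    {"region": "west", "wine": "rose", "cheese": "yes"},
]
profiles, keys = conditional_profiles(records, "region")
print(keys)
print(group_by_profile(profiles, threshold=0.25))
print(group_by_profile(profiles, threshold=0.5))
```

The threshold controls group size. A tighter threshold keeps groups small enough to inspect, while a looser one merges more values into each group.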

Deakin Research Online

Federation ResearchOnline

A survey of temporal knowledge discovery paradigms and methods

Author: Roddick John Francis
Spiliopoulou Myra
Publication venue: Institute of Electrical and Electronics Engineers Computer Society (IEEE Publishing)
Publication date: 01/01/2002

With the increase in the size of data sets, data mining has recently become an important research topic and is receiving substantial interest from both academia and industry. At the same time, interest in temporal databases has been increasing, and a growing number of both prototype and implemented systems are using an enhanced temporal understanding to explain aspects of behavior associated with the implicit time-varying nature of the universe. This paper investigates the confluence of these two areas, surveys the work to date, and explores the issues involved and the outstanding problems in temporal data mining.

Flinders Academic Commons