747 research outputs found

    Statistical strategies for pruning all the uninteresting association rules

    We propose a general framework for formally describing the problem of capturing the intensity of implication of association rules through statistical metrics. Within this framework we present properties that influence the interestingness of a rule, analyze the conditions under which a measure performs a perfect prune in a single step, and define a final proper order for sorting the surviving rules. We discuss why none of the currently employed measures can capture objective interestingness on its own, and why only a combination of some of them, applied in a multi-step fashion, can be reliable. In contrast, we propose a simple new modification of the Pearson coefficient that meets all the necessary requirements. We statistically infer a convenient cut-off threshold for this new metric by empirically describing its distribution function through simulation. Final experiments demonstrate the ability of our proposal. Postprint (published version)
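
As a point of reference for the abstract above, the following is a minimal sketch of the standard Pearson (phi) coefficient computed for a rule A -> B from transaction counts. It is not the modified coefficient the paper proposes, which the abstract does not specify; the function name, parameter names, and counts are assumptions chosen for illustration.

```python
import math


def phi_coefficient(n, n_a, n_b, n_ab):
    """Standard Pearson (phi) correlation between the indicators of A and B.

    n    -- total number of transactions
    n_a  -- transactions containing the antecedent A
    n_b  -- transactions containing the consequent B
    n_ab -- transactions containing both A and B
    """
    p_a, p_b, p_ab = n_a / n, n_b / n, n_ab / n
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        # A or B occurs in none or all of the transactions: correlation undefined.
        return 0.0
    return (p_ab - p_a * p_b) / denom


# Hypothetical counts: 100 transactions, 40 with A, 50 with B, 30 with both.
print(phi_coefficient(100, 40, 50, 30))
```

A value near zero indicates that A and B co-occur about as often as independence would predict, which is the kind of rule a pruning threshold on such a metric would discard.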

    Case Teknos Group Oy Paint Store Transaction Data

    Companies operating in challenging business environments, characterized by the proliferation of disruptive technologies and intensifying competition, are obliged to re-evaluate their strategic approach. This has become the norm in the retail industry and for traditional brick-and-mortar stores. Local market players with scarce resources in particular are looking into alternative ways of delivering a unique customer experience in order to preserve their profitability. Customer experience has been an integral topic in academic research for decades and has also proven its value in practical contexts. Recent developments in this field have led to the establishment of customer experience management functions, which aim to take a holistic approach to the customer experience. This calls for a quantitative perspective that highlights the role of customer transaction data. Association analysis is one of the best-known methodologies for detecting underlying patterns hidden in large transaction data sets. It uses machine learning techniques first to identify frequently purchased product combinations and second to discover concealed associations among the products. The association rules derived and evaluated during the process can potentially reveal implicit yet interesting customer insight, which may translate into actionable implications. The practical consequences in the framework of this study are referred to as sales-increasing strategies, namely targeted marketing, cross-selling and space management. This thesis uses the Python programming language in Anaconda’s Jupyter Notebook environment to perform association analysis on customer transaction data provided by the case company. The Apriori algorithm is applied to construct the frequent itemsets and generate association rules between these itemsets. The interestingness and actionability of the rules are evaluated based on various scoring measures computed for each rule.
The outcomes of this study contribute to finding interesting customer insight and actionable recommendations for the case company to support its success in demanding market conditions. Furthermore, this research describes and discusses the relevant success factors from a theoretical point of view and demonstrates the process of association rule mining when applied to customer transaction data.
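
The two-step process described above (frequent itemsets first, then rules scored by several measures) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the thesis's code: the function names `apriori` and `rules`, the thresholds, and the toy baskets are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([i]) for t in transactions for i in t}
    frequent, k = {}, 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        current = {c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence, lift) tuples."""
    out = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for ante in map(frozenset, combinations(sorted(itemset), r)):
                cons = itemset - ante
                conf = supp / frequent[ante]   # subsets of a frequent set are frequent
                lift = conf / frequent[cons]
                if conf >= min_confidence:
                    out.append((ante, cons, supp, conf, lift))
    return out


# Hypothetical paint-store baskets, invented for illustration.
baskets = [
    {"paint", "brush"},
    {"paint", "brush", "tape"},
    {"paint", "roller"},
    {"brush", "tape"},
    {"paint", "brush", "roller"},
]
frequent = apriori(baskets, min_support=0.4)
for ante, cons, supp, conf, lift in rules(frequent, min_confidence=0.8):
    print(sorted(ante), "->", sorted(cons), round(supp, 2), round(conf, 2), round(lift, 2))
```

Support, confidence, and lift are only three of the scoring measures commonly computed per rule; the abstract does not say which measures the thesis uses.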

    Deriving Association between Student Comprehension and Facial Expressions using Class Association Rule Mining

    The scope of this study was to discover the association between the facial expressions of students in an academic lecture and the level of comprehension shown by their expressions. This study focused on finding the relationship between specific elements of a learner's behavior in different emotional states and the relevant expression that could be observed from individual students. The experimentation was done by surveying quantitative observations of lecturers in the classroom, in which the behavior of students was recorded and statistically analyzed. The main aim of this paper is to derive association rules that represent relationships between input conditions and results of domain experiments. Hence the relationship between the physical behaviors linked to emotional state and the student's comprehension is formulated as rules. We present the Predictive Apriori algorithm, which is able to find all valid class association rules with high accuracy. The rules derived by Predictive Apriori are pruned by objective and subjective measures.

    Novel Algorithms for Cross-Ontology Multi-Level Data Mining

    The widespread use of ontologies in many scientific areas creates a wealth of ontology-annotated data and necessitates the development of ontology-based data mining algorithms. We have developed generalization and mining algorithms for discovering cross-ontology relationships via ontology-based data mining. We present new interestingness measures to evaluate the discovered cross-ontology relationships. The methods presented in this dissertation employ generalization as an ontology traversal technique for the discovery of interesting and informative relationships at multiple levels of abstraction between concepts from different ontologies. The generalization algorithms combine ontological annotations with the structure and semantics of the ontologies themselves to discover interesting cross-ontology relationships. The first algorithm uses the depth of ontological concepts as a guide for generalization. The ontology annotations are translated to higher levels of abstraction one level at a time, accompanied by incremental association rule mining. The second algorithm generalizes ontology terms to all their ancestors via transitive ontology relations and then mines cross-ontology multi-level association rules from the generalized transactions. Our interestingness measures use implicit knowledge conveyed by the relation semantics of the ontologies to capture the usefulness of cross-ontology relationships. We describe the use of information-theoretic metrics to capture the interestingness of cross-ontology relationships and the specificity of ontology terms with respect to an annotation dataset. Our generalization and data mining algorithms are applied to the Gene Ontology and the postnatal Mouse Anatomy Ontology. The results presented in this work demonstrate that our generalization algorithms and interestingness measures discover more interesting and better-quality relationships than approaches that do not use generalization.
Our algorithms can be used by researchers and ontology developers to discover inter-ontology connections. Additionally, the cross-ontology relationships discovered using our algorithms can be used by researchers to understand different aspects of entities that interest them.
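
The generalization step of the second algorithm (expanding each annotated term to all of its ancestors before mining) might look roughly like the sketch below. The dissertation's actual data structures are not given in the abstract, so the `parents` mapping, the function names, and the term identifiers are hypothetical placeholders rather than real Gene Ontology or Mouse Anatomy terms.

```python
def ancestors(term, parents):
    """Collect every ancestor of `term` by following parent links transitively."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def generalize(transaction, parents):
    """Return the transaction extended with all ancestors of its terms."""
    generalized = set(transaction)
    for term in transaction:
        generalized |= ancestors(term, parents)
    return generalized


# Hypothetical two-ontology hierarchy (child -> list of parents).
parents = {
    "onto1:leaf": ["onto1:mid"],
    "onto1:mid": ["onto1:root"],
    "onto2:leaf": ["onto2:root"],
}
print(sorted(generalize({"onto1:leaf", "onto2:leaf"}, parents)))
```

The generalized transactions can then be passed to any association rule miner, so that rules may relate terms at different levels of abstraction across the two ontologies.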

    Experiences with knowledge discovery paradigms


    Objective novelty of association rules: measuring the confidence boost

    It is well known that the confidence of association rules is not really satisfactory as a measure of interest. Instead of replacing it with other measures (or using it jointly with other measures), we propose to evaluate the novelty of each rule by comparing its confidence with that of stronger rules found in the same dataset. That is, we consider a “relative” confidence threshold instead of the usual absolute threshold. This idea is made precise through the magnitude of the “confidence boost”, which measures the relative increase in confidence with respect to stronger rules. We prove that our proposal can replace the “confidence width” and the rule blocking used in earlier publications. Postprint (author’s final draft)
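
The idea of a relative confidence threshold can be illustrated with the simplified sketch below. It is an assumption-laden stand-in, not the paper's definition of confidence boost: here the "stronger rules" are restricted to rules with the same consequent and a proper subset of the antecedent, and the function names and toy transactions are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(ante, cons, transactions):
    """Confidence of the rule ante -> cons (0.0 if ante never occurs)."""
    s = support(ante, transactions)
    return support(ante | cons, transactions) / s if s else 0.0


def relative_boost(ante, cons, transactions):
    """Simplified boost: conf(ante -> cons) over the best confidence achieved
    by a rule with the same consequent and a proper subset of the antecedent."""
    ante, cons = frozenset(ante), frozenset(cons)
    rivals = [confidence(frozenset(sub), cons, transactions)
              for r in range(len(ante))          # sizes 0 .. len(ante) - 1
              for sub in combinations(ante, r)]
    best = max(rivals, default=0.0)
    base = confidence(ante, cons, transactions)
    return base / best if best else float("inf")


# Hypothetical transactions, invented for illustration.
data = [frozenset(t) for t in ("abc", "abc", "ac", "a", "b")]
print(relative_boost("ab", "c", data))
```

A ratio close to 1 means the longer antecedent adds little over a simpler rule, so the rule is not novel even if its absolute confidence is high.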

    Mining Closed Itemsets for Coherent Rules: An Inference Analysis Approach

    Past studies have argued that a frequent itemset mining algorithm should mine only the closed itemsets, because the result is a compact yet complete set and the mining is more efficient. However, most recent closed itemset mining algorithms rely on a candidate maintenance-and-test paradigm, which is costly in both runtime and space usage when the support threshold is low or the itemsets become long. Here we present PEPP with inference analysis, an efficient approach for mining closed sequences for coherent rules without candidate maintenance. It implements a novel sequence closure checking scheme with inference analysis, based on Sequence Graph protruding through an approach called Parallel Edge Projection and Pruning, or PEPP for short. We describe a novel inference analysis approach for pruning patterns that tends to derive coherent rules. A thorough evaluation on sparse and dense real-life data sets showed that PEPP with inference analysis outperforms older algorithms: it uses less memory and is faster than the algorithms frequently cited in the literature.
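
To make the notion of a closed itemset concrete: a frequent itemset is closed when no proper superset has the same support. The brute-force sketch below only illustrates that definition on toy data. It is not PEPP, it enumerates every candidate (exactly what the abstract says PEPP avoids), and its function names and data are assumptions.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def closed_itemsets(transactions, min_count):
    """Return {itemset: count} for frequent itemsets with no equal-support superset."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    counts = {}
    # Exhaustive enumeration: exponential in the number of items, toy data only.
    for r in range(1, len(items) + 1):
        for combo in combinations(items, r):
            itemset = frozenset(combo)
            count = support_count(itemset, transactions)
            if count >= min_count:
                counts[itemset] = count
    # A superset with equal support is itself frequent, so checking `counts` suffices.
    return {s: c for s, c in counts.items()
            if not any(s < t and c == ct for t, ct in counts.items())}


# Hypothetical transactions, invented for illustration.
data = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c"}]
for itemset, count in closed_itemsets(data, min_count=2).items():
    print(sorted(itemset), count)
```

In this toy data the single items `a` and `b` are frequent but not closed, because the pair containing both has the same support, which is why the closed set is smaller than the full frequent set without losing support information.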

    Generating High Precision Classification Rules for Screening of Irrelevant Studies in Systematic Review Literature Searches

    Systematic reviews aim to produce repeatable, unbiased, and comprehensive answers to clinical questions. Systematic reviews are an essential component of modern evidence-based medicine. However, because of the risk of omitting relevant research, they are highly time-consuming to create and are largely conducted manually. This thesis presents a novel framework for partial automation of systematic review literature searches. We exploit the ubiquitous multi-stage screening process by training the classifier using annotations made by reviewers in previous screening stages. Our approach has the benefit of integrating seamlessly with the existing screening process, minimising disruption to users. Ideally, classification models for systematic reviews should be easily interpretable by users. We propose a novel rule-based algorithm for use with our framework. A new approach for identifying redundant associations when generating rules is also presented. The proposed approach to redundancy seeks to exclude both redundant specialisations of existing rules (those with additional terms in their antecedent) and redundant generalisations (those with fewer terms in their antecedent). We demonstrate the ability of the proposed approach to improve the usability of the generated rules. The proposed rule-based algorithm is evaluated by simulated application to several existing systematic reviews. Workload savings of up to 10% are demonstrated. There is an increasing demand for systematic reviews related to a variety of clinical disciplines, such as diagnosis. We examine reviews of diagnosis and contrast them with more traditional systematic reviews of treatment. We demonstrate that existing challenges such as target class heterogeneity and high data imbalance are even more pronounced for this class of reviews.
The described algorithm accounts for this by seeking to label subsets of non-relevant studies with high precision, avoiding the need to generate a high-recall model of the minority class.