Search CORE

1,405 research outputs found

Evaluation and optimization of frequent association rule based classification

Author: Izwan Nizal Mohd Shaharanee
Jastini Jamil
Publication venue: 'Penerbit Universiti Kebangsaan Malaysia (UKM Press)'
Publication date: 01/06/2014

Deriving useful and interesting rules from a data mining system is an essential task. Common problems include the discovery of random, coincidental, or insignificant patterns and the generation of a large volume of rules from a database. Work on sustaining the interestingness of rules generated by data mining algorithms is being actively examined and developed. This paper presents a systematic way to evaluate the association rules discovered by frequent itemset mining algorithms, combining common data mining and statistical interestingness measures, and outlines an appropriate sequence for their use. The experiments are performed on a number of real-world datasets that represent diverse characteristics of data/items, and a detailed evaluation of the rule sets is provided. Empirical results show that, with a proper combination of data mining and statistical analysis, the framework is capable of eliminating a large number of non-significant, redundant, and contradictory rules while preserving relatively valuable rules with high accuracy and coverage when used in the classification problem. Moreover, the results reveal important characteristics of mining frequent itemsets and the impact of the confidence measure on the classification task.

UKM Journal Article Repository

Explainable subgraphs with surprising densities : a subgroup discovery approach

Author: De Bie Tijl
Deng Junning
Kang Bo
Lijffijt Jefrey
Publication date: 01/01/2019

The connectivity structure of graphs is typically related to the attributes of the nodes. In social networks, for example, the probability of a friendship between any pair of people depends on a range of attributes, such as their age, residence location, workplace, and hobbies. The high-level structure of a graph can thus possibly be described well by means of patterns of the form "the subgroup of all individuals with certain properties X are often (or rarely) friends with individuals in another subgroup defined by properties Y", in comparison to what is expected. Such rules present potentially actionable and generalizable insight into the graph. We present a method that finds node subgroup pairs between which the edge density is interestingly high or low, using an information-theoretic definition of interestingness. Additionally, the interestingness is quantified subjectively, to contrast with prior information an analyst may have about the connectivity. This view immediately enables iterative mining of such patterns. This is the first method aimed at graph connectivity relations between different subgroups. Our method generalizes prior work on dense subgraphs induced by a subgroup description. Although this setting has been studied already, we demonstrate for this special case considerable practical advantages of our subjective interestingness measure with respect to a wide range of (objective) interestingness measures.

arXiv.org e-Print Archive

Crossref

Ghent University Academic Bibliography

Principal investigator in a box: Version 1.2 documentation

Author: Adolph Jurine
Bhatnagar Rajiv
Colombano Silvano P.
Compton Michael
Frainer Richard
Groleau Nicolas
Holden Kritina
Lai Sen-Hao
Lam Chih-Chao
Manahan Meera

Principal Investigator (PI) in a box is a computer system designed to help optimize the scientific results of experiments that are performed in space. The system will assist the astronaut experimenters in the collection and analysis of experimental data, recognition and pursuit of 'interesting' results, optimal use of the time allocated to the experiment, and troubleshooting of the experiment apparatus. This document discusses the problems that motivate development of 'PI-in-a-box', and presents a high-level system overview and a detailed description of each of the modules that comprise the current version of the system.

NASA Technical Reports Server

Subjectively interesting connecting trees

Author: Adriaens Florian
De Bie Tijl
Lijffijt Jefrey
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

Crossref

Ghent University Academic Bibliography

On the Notion of Interestingness in Automated Mathematical Discovery

Author: Alan Bundy
Simon Colton
Toby Walsh
Publication venue: 'Elsevier BV'
Publication date: 01/01/2000

Deciding whether something is interesting or not is of central importance in automated mathematical discovery, as it helps determine both the search space and search strategy for finding and evaluating concepts and conjectures.

CiteSeerX

Crossref

Edinburgh Research Explorer

Spiral - Imperial College Digital Repository

Simple and Effective Visual Models for Gene Expression Cancer Diagnostics

Author: Bratko Ivan
Leban Gregor
Mramor Minca
Zupan Blaz
Publication date: 01/01/2005

In this paper we show that diagnostic classes in cancer gene expression data sets, which most often include thousands of features (genes), may be effectively separated with simple two-dimensional plots such as scatterplots and radviz graphs. The principal innovation proposed in the paper is a method called VizRank, which is able to score and identify the best among possibly millions of candidate projections for visualizations. Compared to techniques that have recently been widely applied in cancer genomics, including neural networks, support vector machines, and various ensemble-based approaches, VizRank is fast and finds visualization models that can be easily examined and interpreted by domain experts. Our experiments on a number of gene expression data sets show that VizRank was always able to find data visualizations with a small number of (two to seven) genes and excellent class separation. In addition to providing grounds for gene expression cancer diagnosis, VizRank and its visualizations also identify small sets of relevant genes, uncover interesting gene interactions, and point to outliers and potential misclassifications in cancer data sets.

ePrints.FRI

High Quality, Efficient Hierarchical Document Clustering Using Closed Interesting Itemsets

Author: Kender John R.
Malik Hassan H.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2006

High dimensionality remains a significant challenge for document clustering. Recent approaches used frequent itemsets and closed frequent itemsets to reduce dimensionality and to improve the efficiency of hierarchical document clustering. In this paper, we introduce the notion of "closed interesting" itemsets (i.e. closed itemsets with high interestingness). We provide heuristics such as "super item" to efficiently mine these itemsets and show that they provide significant dimensionality reduction over closed frequent itemsets. Using "closed interesting" itemsets, we propose a new hierarchical document clustering method that outperforms state-of-the-art agglomerative, partitioning, and frequent-itemset-based methods in terms of both FScore and Entropy, without requiring dataset-specific parameter tuning. We evaluate twenty interestingness measures on nine standard datasets and show that, when used to generate "closed interesting" itemsets and to select parent nodes, Mutual Information, Added Value, Yule's Q, and Chi-Square offer the best clustering performance, regardless of the characteristics of the underlying dataset. We also show that our method is more scalable and results in better run-time performance compared to leading approaches. On a dual-processor machine, our method scaled sub-linearly and was able to cluster 200K documents in about 40 seconds.

Crossref

Columbia University Academic Commons

Machine learning stochastic design models

Author: Lewis W.P.
Matthews P.C.
Samuel A.E.
Publication venue: 'The Design Society'
Publication date: 01/08/2005

Due to the fluid nature of the early stages of the design process, it is difficult to obtain deterministic product design evaluations. This is primarily due to the flexibility of the design at this stage, namely that there can be multiple interpretations of a single design concept. However, it is important for designers to understand how these design concepts are likely to fulfil the original specification, so that they can select, or bias their search towards, solutions with favourable outcomes. One approach is to create a stochastic model of the design domain. This paper tackles the issues of using a product database to induce a Bayesian model that represents the relationships between the design parameters and characteristics. A greedy learning algorithm is presented and illustrated using a simple case study.

Durham Research Online