Association Rule Based Classification

Author: Palanisamy Senthil Kumar
Publication venue: Digital WPI
Publication date: 03/05/2006

In this thesis, we focused on the construction of classification models based on association rules. Although association rules have been predominantly used for data exploration and description, the interest in using them for prediction has rapidly increased in the data mining community. In order to mine only rules that can be used for classification, we modified the well-known association rule mining algorithm Apriori to handle user-defined input constraints. We considered constraints that require the presence/absence of particular items, or that limit the number of items, in the antecedents and/or the consequents of the rules. We developed a characterization of those itemsets that will potentially form rules that satisfy the given constraints. This characterization allows us to prune, during itemset construction, those itemsets for which neither they nor any of their supersets can form valid rules. This improves the time performance of itemset construction. Using this characterization, we implemented a classification system based on association rules and compared the performance of several model construction methods, including CBA, and several model deployment modes to make predictions. Although the data mining community has dealt only with the classification of single-valued attributes, there are several domains in which the classification target is set-valued. Hence, we enhanced our classification system with a novel approach to handle the prediction of set-valued class attributes. Since the traditional classification accuracy measure is inappropriate in this context, we developed an evaluation method for set-valued classification based on the E-Measure. Furthermore, we enhanced our algorithm by not relying on the typical support/confidence framework, and instead mining for the best possible rules above a user-defined minimum confidence and within a desired range for the number of rules. This avoids long mining times that might produce large collections of rules with low predictive power. For this purpose, we developed a heuristic function to determine an initial minimum support and then adjusted it using a binary search strategy until a number of rules within the given range was obtained. We implemented all of the techniques described above in WEKA, an open source suite of machine learning algorithms. We used several datasets from the UCI Machine Learning Repository to test and evaluate our techniques.
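The binary search over minimum support described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's WEKA implementation: the function names, the toy dataset, the brute-force rule counter, and the fixed starting support of 0.1 (standing in for the thesis's heuristic, which the abstract does not specify) are all assumptions made for illustration.

```python
from itertools import combinations

def count_class_rules(rows, min_support, min_confidence, max_len=2):
    """Brute-force count of rules (antecedent itemset -> class label).

    Hypothetical stand-in for a real miner such as a constrained Apriori.
    """
    n = len(rows)
    items = sorted({i for feats, _ in rows for i in feats})
    labels = sorted({c for _, c in rows})
    count = 0
    for k in range(1, max_len + 1):
        for antecedent in combinations(items, k):
            a = set(antecedent)
            covered = [c for feats, c in rows if a <= feats]
            if not covered:
                continue
            for label in labels:
                both = covered.count(label)
                support = both / n
                confidence = both / len(covered)
                if support >= min_support and confidence >= min_confidence:
                    count += 1
    return count

def tune_min_support(count_rules, min_rules, max_rules, initial, max_iter=30):
    """Bisect on minimum support until the rule count falls in the range.

    Relies on the rule count being non-increasing as support grows.
    Returns the last (support, count) tried if no support hits the range.
    """
    low, high = 0.0, 1.0
    support = initial  # assumed starting guess, not the thesis's heuristic
    n_rules = count_rules(support)
    for _ in range(max_iter):
        if min_rules <= n_rules <= max_rules:
            break
        if n_rules > max_rules:
            low = support    # too many rules: raise the threshold
        else:
            high = support   # too few rules: lower the threshold
        support = (low + high) / 2
        n_rules = count_rules(support)
    return support, n_rules

# Toy data (invented): each row is (set of items, class label).
rows = [
    ({"a", "b"}, "x"), ({"a", "c"}, "x"), ({"a", "b", "c"}, "x"),
    ({"b", "c"}, "y"), ({"b"}, "y"), ({"c"}, "y"),
    ({"a"}, "x"), ({"b", "c"}, "y"),
]

support, n_rules = tune_min_support(
    lambda s: count_class_rules(rows, s, min_confidence=0.5),
    min_rules=2, max_rules=4, initial=0.1,
)
print(support, n_rules)
```

Because the rule count changes in steps, some target ranges may be unreachable for a given dataset; the sketch then simply returns the last threshold tried after `max_iter` iterations.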

DigitalCommons@WPI

Association Analysis Techniques for Discovering Functional Modules from Microarray Data

Author: Gaurav Pandey
Gowtham Atluri
Michael Steinbach
Vipin Kumar
Publication date: 13/08/2008

An application of great interest in microarray data analysis is the identification of groups of genes that show very similar patterns of expression in a data set and are expected to perform common or similar functions; such groups are also known as functional modules. Although clustering offers a natural solution to this problem, it suffers from the limitation that it uses all the conditions to compare two genes, whereas only a subset of them may be relevant. Association analysis offers an alternative route for finding such groups of genes that may be co-expressed only over a subset of the experimental conditions used to prepare the data set. The techniques in this field attempt to find groups of data objects that contain coherent values across a set of attributes, in an exhaustive and efficient manner. In this paper, we illustrate how a generalization of the techniques in association analysis for real-valued data can be utilized to extract coherent functional modules from large microarray data sets.
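The idea of genes being coherent over only a subset of conditions can be made concrete with a small Python sketch. The coherence test used here (the spread of values within a condition is at most a tolerance `tol`) is an assumption chosen for simplicity, not the measure used in the paper, and the gene names and expression values are invented.

```python
def coherent_conditions(expr, genes, tol):
    """Return the indices of conditions where the genes' values agree within tol.

    expr maps a gene name to its list of expression values, one per condition.
    """
    n_conditions = len(expr[genes[0]])
    keep = []
    for j in range(n_conditions):
        values = [expr[g][j] for g in genes]
        if max(values) - min(values) <= tol:
            keep.append(j)
    return keep

# Invented expression matrix: 3 genes measured under 5 conditions.
expr = {
    "g1": [2.0, 2.1, 0.1, 5.0, 1.9],
    "g2": [2.1, 2.0, 3.0, 0.2, 2.0],
    "g3": [1.9, 2.2, 1.5, 2.5, 2.1],
}

shared = coherent_conditions(expr, ["g1", "g2", "g3"], tol=0.5)
print(shared)  # conditions over which the three genes are co-expressed
```

A distance computed over all five conditions would be dominated by conditions 2 and 3, where the genes disagree, even though they track each other closely on the remaining three; this is the limitation of clustering that the abstract points out.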

Crossref

Nature Precedings

Datamining for Web-Enabled Electronic Business Applications

Author: Nayak Richi
Publication venue: Idea Group
Publication date: 01/01/2003

Web-enabled electronic business is generating massive amounts of data on customer purchases, browsing patterns, usage times and preferences at an increasing rate. Data mining techniques can be applied to the collected data to obtain useful information. This chapter presents issues associated with data mining for web-enabled electronic business.

Queensland University of Technology ePrints Archive

Data mining: a tool for detecting cyclical disturbances in supply networks.

Author: Chan F. T. S.
Chatfield C.
Davis T.
Devijver P. A.
Fayyad U. M.
Forrester J. W.
Han J.
Harding J. A.
Jolliffe I. T.
Kaufman L.
Klösgen W.
Koopmans L. H.
Mason-Jones R.
Monostori L.
Pyle D.
Witten I. H.
Publication venue: SAGE Publications
Publication date: 21/12/2007

Disturbances in supply chains may be either exogenous or endogenous. The ability to automatically detect, diagnose, and distinguish between the causes of disturbances is of prime importance to decision makers in order to avoid uncertainty. The spectral principal component analysis (SPCA) technique has been utilized to distinguish between real and rogue disturbances in a steel supply network. The data set used was collected from four different business units in the network and consists of 43 variables, each described by 72 data points. The present paper uses the same data set to test an alternative approach to SPCA for detecting the disturbances. The new approach employs statistical data pre-processing, clustering, and classification learning techniques to analyse the supply network data. In particular, the incremental k-means clustering and the RULES-6 classification rule-learning algorithms, developed by the present authors' team, have been applied to identify important patterns in the data set. Results show that the proposed approach can automatically detect and characterize network-wide cyclical disturbances and generate hypotheses about their root cause.

Crossref

Middlesex University Research Repository