
166 research outputs found

Modeling interestingness of streaming association rules as a benefit-maximizing classification problem

Author: Aydin T.
Guvenir H. A.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

Cataloged from PDF version of article. In a typical application of association rule learning from market basket data, a set of transactions for a fixed period of time is used as input to rule learning algorithms. For example, the well-known Apriori algorithm can be applied to learn a set of association rules from such a transaction set. However, learning association rules from a set of transactions is not a one-time process. For example, a market manager may perform the association rule learning process once every month over the set of transactions collected through the last month. For this reason, we will consider the problem where transaction sets are input to the system as a stream of packages. The sets of transactions may come in varying sizes and in varying periods. Once a set of transactions arrives, the association rule learning algorithm is executed on the last set of transactions, resulting in new association rules. Therefore, the set of association rules learned will accumulate and increase in number over time, making the mining of interesting ones out of this enlarging set of association rules impractical for human experts. We refer to this sequence of rules as "association rule set stream" or "streaming association rules" and the main motivation behind this research is to develop a technique to overcome the interesting rule selection problem. A successful association rule mining system should select and present only the interesting rules to the domain experts. However, the definition of interestingness of association rules on a given domain usually differs from one expert to another and also over time for a given expert. This paper proposes a post-processing method to learn a subjective model for the interestingness concept description of the streaming association rules.
The uniqueness of the proposed method is its ability to formulate the interestingness issue of association rules as a benefit-maximizing classification problem and obtain a different interestingness model for each user. In this new classification scheme, the determining features are the selective objective interestingness factors related to the interestingness of the association rules, and the target feature is the interestingness label of those rules. The proposed method works incrementally and employs user interactivity at a certain level. It is evaluated on a real market dataset. The results show that the model can successfully select the interesting ones. (C) 2008 Elsevier B.V. All rights reserved

Bilkent University Institutional Repository

Regression on feature projections

Author: Guvenir H. A.
Uysal I.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2000

This paper describes a machine learning method, called Regression on Feature Projections (RFP), for predicting a real-valued target feature, given the values of multiple predictive features. In RFP, training is based on simply storing the projections of the training instances on each feature separately. Prediction of the target value for a query point is obtained through two averaging procedures executed sequentially. The first averaging process finds the individual predictions of features by using the K-Nearest Neighbor (KNN) algorithm. The second averaging process combines the predictions of all features. During the first averaging step, each feature is associated with a weight in order to determine the prediction ability of the feature at the local query point. The weights, found for each local query point, are used in the second prediction step and give the method an adaptive, context-sensitive nature. We have compared RFP with KNN and rule-based regression algorithms. Results on real data sets show that RFP achieves better or comparable accuracy and is faster than both KNN and rule-based regression algorithms. (C) 2000 Elsevier Science B.V. All rights reserved
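The two averaging steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `rfp_predict` and the local weighting scheme (inverse variance of the neighbors' targets) are assumptions made for illustration only, since the abstract does not specify how the weights are computed.

```python
def rfp_predict(X, y, query, k=3):
    """Predict a real-valued target from per-feature nearest neighbors.

    X: list of training rows (lists of numbers), y: list of targets,
    query: one row to predict for, k: neighbors used on each feature.
    """
    preds, weights = [], []
    for f in range(len(query)):
        # First averaging step: the k training instances closest to the
        # query along this single feature, and the mean of their targets.
        nearest = sorted(range(len(X)), key=lambda i: abs(X[i][f] - query[f]))[:k]
        targets = [y[i] for i in nearest]
        mean = sum(targets) / len(targets)
        var = sum((t - mean) ** 2 for t in targets) / len(targets)
        preds.append(mean)
        # ASSUMPTION: weight a feature by how consistent its neighbors'
        # targets are near the query (inverse variance). Hypothetical choice.
        weights.append(1.0 / (var + 1e-9))
    # Second averaging step: weighted mean of the per-feature predictions.
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


X = [[1, 10], [2, 20], [3, 30], [4, 40]]
y = [1.0, 2.0, 3.0, 4.0]
prediction = rfp_predict(X, y, [2, 20], k=1)
```

Because each feature is handled on its own one-dimensional projection, the weights are recomputed for every query, which is one simple way to obtain the context-sensitive behavior the abstract mentions.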

Bilkent University Institutional Repository

A discretization method based on maximizing the area under receiver operating characteristic curve

Author: Guvenir H. A.
Kurtcephe M.
Publication venue: 'World Scientific Pub Co Pte Lt'
Publication date: 01/01/2013

Many machine learning algorithms require the features to be categorical. Hence, they require all numeric-valued data to be discretized into intervals. In this paper, we present a new discretization method based on the area under the receiver operating characteristic (ROC) curve (AUC) measure. Maximum area under ROC curve-based discretization (MAD) is a global, static and supervised discretization method. MAD uses the sorted order of the continuous values of a feature and discretizes the feature in such a way that the AUC based on that feature is maximized. The proposed method is compared with alternative discretization methods such as ChiMerge, Entropy-Minimum Description Length Principle (MDLP), Fixed Frequency Discretization (FFD), and Proportional Discretization (PD). FFD and PD have been recently proposed and are designed for Naive Bayes learning. Like MAD, ChiMerge is a merging discretization method. Evaluations are performed in terms of M-Measure, an AUC-based metric for multi-class classification, and accuracy values obtained from Naive Bayes and Aggregating One-Dependence Estimators (AODE) algorithms by using real-world datasets. Empirical results show that MAD is a strong candidate to be a good alternative to other discretization methods.

Bilkent University Institutional Repository

Ranking instances by maximizing the area under ROC curve

Author: Guvenir H. A.
Kurtcephe M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2013

In recent years, the problem of learning a real-valued function that induces a ranking over an instance space has gained importance in machine learning literature. Here, we propose a supervised algorithm that learns a ranking function, called ranking instances by maximizing the area under the ROC curve (RIMARC). Since the area under the ROC curve (AUC) is a widely accepted performance measure for evaluating the quality of ranking, the algorithm aims to maximize the AUC value directly. For a single categorical feature, we show the necessary and sufficient condition that any ranking function must satisfy to achieve the maximum AUC. We also sketch a method to discretize a continuous feature in a way to reach the maximum AUC as well. RIMARC uses a heuristic to extend this maximization to all features of a data set. The ranking function learned by the RIMARC algorithm is in a human-readable form; therefore, it provides valuable information to domain experts for decision making. Performance of RIMARC is evaluated on many real-life data sets against different state-of-the-art algorithms. Evaluations of the AUC metric show that RIMARC achieves significantly better performance compared to other similar methods.
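As a small illustration of the quantity this abstract optimizes, the sketch below computes the AUC of a ranking from pairwise comparisons and scores a single categorical feature by the fraction of positive instances observed for each value. The scoring choice and the function names are assumptions for illustration; the abstract does not state the condition it proves, and this sketch does not claim to reproduce RIMARC.

```python
from collections import defaultdict


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Labels are 1 for positive and 0 for negative.
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def positive_rate_scores(values, labels):
    """Score each instance by the positive rate of its categorical value.

    ASSUMPTION: illustrative scoring only, not taken from the abstract.
    """
    counts = defaultdict(lambda: [0, 0])  # value -> [positives, total]
    for v, l in zip(values, labels):
        counts[v][0] += l
        counts[v][1] += 1
    return [counts[v][0] / counts[v][1] for v in values]


values = ["a", "a", "b", "b", "c", "c"]
labels = [1, 1, 1, 0, 0, 0]
scores = positive_rate_scores(values, labels)
result = auc(scores, labels)
```

The pairwise definition makes it easy to see why the ordering of feature values matters: reversing the scores turns every correctly ranked pair into an incorrectly ranked one.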

Bilkent University Institutional Repository

Multicriteria inventory classification using a genetic algorithm

Author: Erel E.
Guvenir H. A.
Publication venue: 'Elsevier BV'
Publication date: 01/01/1998

One of the application areas of genetic algorithms is parameter optimization. This paper addresses the problem of optimizing a set of parameters that represent the weights of criteria, where the sum of all weights is 1. A chromosome represents the values of the weights, possibly along with some cut-off points. A new crossover operation, called continuous uniform crossover, is proposed, such that it produces valid chromosomes given that the parent chromosomes are valid. The new crossover technique is applied to the problem of multicriteria inventory classification. The results are compared with the classical inventory classification technique using the Analytical Hierarchy Process. (C) 1998 Elsevier Science B.V.
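The validity constraint mentioned in this abstract (weights that sum to 1) can be illustrated with a stand-in operator. The sketch below is a plain convex blend of two parents, not the paper's continuous uniform crossover, whose definition the abstract does not give; the name `blend_crossover` is hypothetical. A convex combination of two non-negative vectors that each sum to 1 is again non-negative and sums to 1, so the children stay valid.

```python
import random


def blend_crossover(parent_a, parent_b, rng=random):
    """Return two children as complementary convex blends of the parents.

    ASSUMPTION: a stand-in operator for illustration, not the operator
    proposed in the paper. rng is any object with a random() method.
    """
    s = rng.random()  # mixing coefficient in [0, 1)
    child_1 = [s * a + (1 - s) * b for a, b in zip(parent_a, parent_b)]
    child_2 = [(1 - s) * a + s * b for a, b in zip(parent_a, parent_b)]
    return child_1, child_2


rng = random.Random(0)
child_1, child_2 = blend_crossover([0.2, 0.3, 0.5], [0.6, 0.1, 0.3], rng)
```

Keeping validity inside the operator avoids a separate repair or renormalization step after each crossover.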

Bilkent University Institutional Repository

An expert system for the differential diagnosis of erythemato-squamous diseases

Author: Emeksiz N.
Guvenir H. A.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2000

This paper presents an expert system for differential diagnosis of erythemato-squamous diseases incorporating decisions made by three classification algorithms: nearest neighbor classifier, naive Bayesian classifier and voting feature intervals-5. This tool enables doctors to differentiate six types of erythemato-squamous diseases using clinical and histopathological parameters obtained from a patient. The program also gives explanations for the classifications of each classifier. The patient records are also maintained in a database for further reference. (C) 2000 Elsevier Science Ltd. All rights reserved

Bilkent University Institutional Repository

OpenMETU (Middle East Technical University)

Learning differential diagnosis of erythemato-squamous diseases using voting feature intervals

Author: Demiroz G.
Guvenir H. A.
Ilter N.
Publication venue: 'Elsevier BV'
Publication date: 01/01/1998

A new classification algorithm, called VFI5 (for Voting Feature Intervals), is developed and applied to the problem of differential diagnosis of erythemato-squamous diseases. The domain contains records of patients with known diagnoses. Given a training set of such records, the VFI5 classifier learns how to differentiate a new case in the domain. VFI5 represents a concept in the form of feature intervals on each feature dimension separately. Classification in the VFI5 algorithm is based on real-valued voting. Each feature equally participates in the voting process and the class that receives the maximum amount of votes is declared to be the predicted class. The performance of the VFI5 classifier is evaluated empirically in terms of classification accuracy and running time. (C) 1998 Elsevier Science B.V. All rights reserved
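The idea of per-feature intervals combined by equal, real-valued voting can be sketched in a heavily simplified form. The code below is not VFI5: it assumes one [min, max] range per class on each feature and splits a feature's single vote evenly among the classes whose range contains the query value. The function names are hypothetical.

```python
from collections import defaultdict


def train_ranges(X, y):
    """Store, for each feature and class, the range of values seen."""
    ranges = defaultdict(dict)  # feature index -> class -> (low, high)
    for row, label in zip(X, y):
        for f, value in enumerate(row):
            low, high = ranges[f].get(label, (value, value))
            ranges[f][label] = (min(low, value), max(high, value))
    return ranges


def classify(ranges, query):
    """Let every feature cast one vote, shared by the matching classes.

    ASSUMPTION: simplified voting for illustration, not the VFI5 rule.
    Returns None when no feature value falls inside any class range.
    """
    votes = defaultdict(float)
    for f, value in enumerate(query):
        matching = [c for c, (low, high) in ranges[f].items()
                    if low <= value <= high]
        for c in matching:
            votes[c] += 1.0 / len(matching)
    return max(votes, key=votes.get) if votes else None


X = [[1, 10], [2, 12], [8, 30], [9, 28]]
y = ["a", "a", "b", "b"]
ranges = train_ranges(X, y)
label = classify(ranges, [1.5, 11])
```

Since each feature is treated independently, a missing feature value could simply be skipped during voting, which is one practical appeal of this family of classifiers.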

Bilkent University Institutional Repository

Classification by feature partitioning

Author: Guvenir H.A.
Publication date: 01/01/1996

This paper presents a new form of exemplar-based learning, based on a representation scheme called feature partitioning, and a particular implementation of this technique called CFP (for Classification by Feature Partitioning). Learning in CFP is accomplished by storing the objects separately in each feature dimension as disjoint sets of values called segments. A segment is expanded through generalization or specialized by dividing it into sub-segments. Classification is based on a weighted voting among the individual predictions of the features, which are simply the class values of the segments corresponding to the values of a test instance for each feature. An empirical evaluation of CFP and its comparison with two other classification techniques that consider each feature separately are given. © 1996 Kluwer Academic Publishers.

Bilkent University Institutional Repository

Diagnosis of gastric carcinoma by classification on feature projections

Author: Emeksiz N.
Guvenir H. A.
Ikizler N.
Ormeci N.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2004

A new classification algorithm, called benefit maximizing classifier on feature projections (BCFP), is developed and applied to the problem of diagnosis of gastric carcinoma. The domain contains records of patients with known diagnoses through gastroscopy results. Given a training set of such records, the BCFP classifier learns how to differentiate a new case in the domain. BCFP represents a concept in the form of feature projections on each feature dimension separately. Classification in the BCFP algorithm is based on voting among the individual predictions made on each feature. In the gastric carcinoma domain, a lesion can be an indicator of one of nine different levels of gastric carcinoma, from early to late stages. The benefit of correct classification of early levels is much greater than that of late levels. Also, the costs of wrong classifications are not symmetric. In the training phase, the BCFP algorithm learns classification rules that maximize the benefit of classification. In the querying phase, using these rules, the BCFP algorithm tries to make a prediction maximizing the benefit. A genetic algorithm is applied to select the relevant features. The performance of the BCFP algorithm is evaluated in terms of accuracy and running time. The rules induced are verified by experts of the domain. (C) 2004 Elsevier B.V. All rights reserved
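The effect of asymmetric benefits on a prediction can be illustrated with the standard expected-benefit decision rule. The sketch below is generic and does not reproduce BCFP's rule learning or voting; the function name, the class probabilities, and the benefit values are all made up for illustration.

```python
def max_benefit_class(probs, benefit):
    """Pick the prediction with the highest expected benefit.

    probs: actual class -> estimated probability.
    benefit: predicted class -> actual class -> benefit (negative = cost).
    """
    def expected(predicted):
        return sum(p * benefit[predicted][actual]
                   for actual, p in probs.items())
    return max(benefit, key=expected)


# ASSUMPTION: hypothetical numbers, chosen so that catching an early-stage
# case is worth far more than any other outcome.
probs = {"early": 0.3, "late": 0.7}
benefit = {
    "early": {"early": 10.0, "late": -1.0},
    "late": {"early": -5.0, "late": 1.0},
}
choice = max_benefit_class(probs, benefit)
```

With these numbers the rule predicts "early" even though "late" is the more probable class, which is the kind of behavior an accuracy-maximizing classifier would not show.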

Bilkent University Institutional Repository