4,297 research outputs found
Exploiting the bin-class histograms for feature selection on discrete data
In machine learning and pattern recognition tasks, feature discretization techniques can offer several advantages. The discretized features may hold enough information for the learning task at hand, while ignoring minor fluctuations that are irrelevant or harmful to that task. The discretized features have more compact representations, which may yield both better accuracy and lower training time compared to the original features. However, in many cases, mainly with medium- and high-dimensional data, the large number of features usually implies some redundancy among them. Thus, we may further apply feature selection (FS) techniques on the discrete data, keeping the most relevant features while discarding the irrelevant and redundant ones. In this paper, we propose relevance and redundancy criteria for supervised feature selection on discrete data. These criteria are applied to the bin-class histograms of the discrete features. The experimental results, on public benchmark data, show that the proposed criteria can achieve better accuracy than widely used relevance and redundancy criteria such as mutual information and the Fisher ratio.
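A bin-class histogram (the table of co-occurrence counts between a feature's bins and the class labels) suffices to compute standard relevance scores such as mutual information. A minimal sketch, with illustrative function names not taken from the paper:

```python
import numpy as np

def bin_class_histogram(x_disc, y, n_bins, n_classes):
    """Count co-occurrences of feature bins and class labels."""
    hist = np.zeros((n_bins, n_classes))
    for b, c in zip(x_disc, y):
        hist[b, c] += 1
    return hist

def mutual_information(hist):
    """MI (in bits) between a discretized feature and the class,
    computed directly from its bin-class histogram."""
    p = hist / hist.sum()
    px = p.sum(axis=1, keepdims=True)   # bin marginal
    py = p.sum(axis=0, keepdims=True)   # class marginal
    nz = p > 0                          # skip empty cells
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

# toy example: a binary feature perfectly aligned with the class carries 1 bit
x = np.array([0, 0, 0, 1, 1, 1])
y = np.array([0, 0, 0, 1, 1, 1])
h = bin_class_histogram(x, y, n_bins=2, n_classes=2)
print(mutual_information(h))  # 1.0
```

The Fisher ratio can likewise be estimated from the same counts, using per-class means and variances of the bin indices, so both baseline criteria operate on the histogram alone.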
Hopfield Networks in Relevance and Redundancy Feature Selection Applied to Classification of Biomedical High-Resolution Micro-CT Images
We study filter-based feature selection methods for classification of biomedical images. For feature selection, we use two filters: a relevance filter, which measures the usefulness of individual features for target prediction, and a redundancy filter, which measures similarity between features. As a selection method that combines relevance and redundancy, we use a Hopfield network. We experimentally compare selection methods, running unitary redundancy and relevance filters, against a greedy algorithm with redundancy thresholds [9], the min-redundancy max-relevance integration [8,23,36], and our Hopfield network selection. We conclude that, on the whole, Hopfield selection was one of the most successful methods, outperforming min-redundancy max-relevance when more features are selected.
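For context, the min-redundancy max-relevance baseline compared above greedily adds the feature that best trades off relevance against average similarity to the already-selected set. A sketch of the difference form, assuming precomputed relevance and pairwise redundancy scores (the function name and inputs are illustrative, not from the paper):

```python
import numpy as np

def mrmr_select(relevance, redundancy, k):
    """Greedy min-redundancy max-relevance selection (difference form).

    relevance:  (n_features,) relevance score of each feature to the target
    redundancy: (n_features, n_features) pairwise similarity between features
    Returns the indices of the k selected features, in selection order.
    """
    selected = [int(np.argmax(relevance))]          # start with the most relevant
    candidates = set(range(len(relevance))) - set(selected)
    while len(selected) < k and candidates:
        # score = relevance minus mean redundancy with already-selected features
        scores = {j: relevance[j] - np.mean([redundancy[j, s] for s in selected])
                  for j in candidates}
        best = max(scores, key=scores.get)
        selected.append(best)
        candidates.remove(best)
    return selected

rel = np.array([0.9, 0.85, 0.3])
red = np.array([[1.0, 0.95, 0.1],
                [0.95, 1.0, 0.1],
                [0.1, 0.1, 1.0]])
print(mrmr_select(rel, red, k=2))  # [0, 2]: feature 1 is redundant with feature 0
```

The Hopfield-network alternative studied in the paper instead encodes relevance and redundancy in the network's weights and lets the dynamics settle on a feature subset, rather than selecting greedily.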
Feature Selection Using Different Mutual Information Estimation Methods
Thesis (M.Sc.) -- İstanbul Technical University, Institute of Science and Technology, 2010. In this study, the effect of different mutual information estimation methods on feature selection is examined. We attempt to improve the minimum-redundancy-maximum-relevance (mRMR) and mutual information filter feature selection methods by using mutual information estimation methods more advanced than binning, namely k-nearest-neighbour (KNN) based and kernel density estimation (KDE) based methods. In addition, the performance of these mutual information estimation methods on artificial and real data is measured, and we attempt to improve it through subset selection and combination. It is concluded that subset selection and combination do not improve performance, and that the KNN-based estimation method outperforms binning when used in the mutual information filter, but mRMR does not benefit from this.
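The baseline the thesis compares against, a plain binning (histogram) estimator of mutual information between a continuous feature and a discrete class, can be sketched as follows (function names are illustrative):

```python
import numpy as np

def binned_mi(x, y, n_bins=10):
    """Histogram (binning) estimate of I(X; Y), in nats,
    for a continuous feature x and discrete labels y."""
    # equal-width bins; interior edges only, so bin ids run 0..n_bins-1
    x_disc = np.digitize(x, np.histogram_bin_edges(x, bins=n_bins)[1:-1])
    joint = np.zeros((n_bins, len(np.unique(y))))
    for b, c in zip(x_disc, y):
        joint[b, c] += 1
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float((p[nz] * np.log(p[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(0)
y = rng.integers(0, 2, 2000)
x_informative = y + 0.3 * rng.standard_normal(2000)  # class-shifted feature
x_noise = rng.standard_normal(2000)                  # independent of the class
print(binned_mi(x_informative, y) > binned_mi(x_noise, y))  # True
```

For the more advanced estimators the thesis studies, scikit-learn's `mutual_info_classif` provides a nearest-neighbour-based estimate out of the box; KDE-based estimation would replace the histogram above with kernel density estimates of the joint and marginal densities.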
A Max-relevance-min-divergence Criterion for Data Discretization with Applications on Naive Bayes
In many classification models, data is discretized to better estimate its distribution. Existing discretization methods often target maximizing the discriminant power of the discretized data, while overlooking the fact that the primary target of data discretization in classification is to improve the generalization performance. As a result, the data tend to be over-split into many small bins, since the data without discretization retain the maximal discriminant information. Thus, we propose a Max-Dependency-Min-Divergence (MDmD) criterion that maximizes both the discriminant information and the generalization ability of the discretized data. More specifically, the Max-Dependency criterion maximizes the statistical dependency between the discretized data and the classification variable, while the Min-Divergence criterion explicitly minimizes the JS-divergence between the training data and the validation data for a given discretization scheme. The proposed MDmD criterion is technically appealing, but it is difficult to reliably estimate the high-order joint distributions of attributes and the classification variable. We hence further propose a more practical solution, the Max-Relevance-Min-Divergence (MRmD) discretization scheme, where each attribute is discretized separately, by simultaneously maximizing the discriminant information and the generalization ability of the discretized data. The proposed MRmD is compared with the state-of-the-art discretization algorithms under the naive Bayes classification framework on 45 machine-learning benchmark datasets. It significantly outperforms all the compared methods on most of the datasets.
Comment: Under major revision at Pattern Recognition
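The Min-Divergence idea can be illustrated numerically: over-splitting a feature into many small bins inflates the JS-divergence between the training and validation bin distributions, even when both are drawn from the same underlying distribution. A hedged sketch (not the authors' implementation; function names and the smoothing constant are illustrative):

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions, in nats."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)
    def kl(a, b):
        nz = a > 0
        return float((a[nz] * np.log(a[nz] / b[nz])).sum())
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def bin_counts(x, edges):
    """Histogram counts of x under a given discretization scheme (cut points)."""
    return np.histogram(x, bins=edges)[0] + 1e-12  # smooth empty bins

rng = np.random.default_rng(1)
train, valid = rng.standard_normal(1000), rng.standard_normal(1000)
coarse = np.linspace(-4, 4, 6)    # 5 bins
fine = np.linspace(-4, 4, 101)    # 100 bins: over-splitting
# sampling noise dominates in the fine scheme, so its train/valid divergence is larger
print(js_divergence(bin_counts(train, coarse), bin_counts(valid, coarse)) <
      js_divergence(bin_counts(train, fine), bin_counts(valid, fine)))  # True
```

Minimizing this divergence while maximizing relevance, as the MRmD criterion does per attribute, penalizes exactly the over-split schemes described in the abstract.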