13 research outputs found

    An Approach to Classifier Ensemble Design Using a Genetic Algorithm

    The paper proposes a new evolutionary approach to classifier ensemble design. The approach is based on a genetic algorithm with a modified realization scheme, applied to optimizing the decomposition of the feature set into subsets, which define the ensemble's individual classifiers and provide high classification accuracy. During optimization, both the parameters of the individual classifiers and those of the ensemble as a whole are determined. Using the approach, ensembles were designed for several datasets from a machine learning repository and for one real medical dataset. Comparative testing shows the advantages of the proposed approach for the analysis of multivariate data with a large number of features.
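The genetic search described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' exact algorithm: a chromosome assigns each feature to one of k ensemble members, and a caller-supplied fitness function scores the resulting partition (in the paper, ensemble classification accuracy). Truncation selection and point mutation are assumed operator choices.

```python
import random

def evolve_partition(n_features, k, fitness, generations=50,
                     pop_size=20, mutation_rate=0.1, seed=0):
    """Evolve an assignment of features to k ensemble members."""
    rng = random.Random(seed)
    pop = [[rng.randrange(k) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Keep the better half; the best chromosome always survives.
        elite = sorted(pop, key=fitness, reverse=True)[:pop_size // 2]
        children = []
        for parent in elite:
            child = parent[:]
            for i in range(n_features):
                if rng.random() < mutation_rate:
                    child[i] = rng.randrange(k)  # point mutation
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)
```

In a full implementation, the fitness call would train the ensemble members on their assigned feature subsets and return a validation accuracy.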

    Evolutionary Design of the Classifier Ensemble

    This paper presents two novel approaches to the evolutionary design of a classifier ensemble. The first poses the task of single-objective optimization of feature set partitioning, together with feature weighting, for the construction of the individual classifiers. The second approach deals with multi-objective optimization of the classifier ensemble design. The proposed approaches have been tested on two data sets from the machine learning repository and one real data set on transient ischemic attack. The experiments show the advantages of feature weighting, in terms of classification accuracy, when dealing with multivariate data sets, and the possibility of obtaining, in one run of the multi-objective genetic algorithm, non-dominated ensembles of different sizes, thereby skipping the tedious process of iteratively searching for the best ensemble of fixed size.

    Classification in high-dimensional feature spaces: Random subsample ensemble

    This paper presents the application of machine learning ensembles that randomly project the original high-dimensional feature space onto multiple lower-dimensional feature subspaces to classification problems with high-dimensional feature spaces. The motivation is to address challenges associated with algorithm scalability, data sparsity, and information loss due to the so-called curse of dimensionality. The original high-dimensional feature space is randomly projected onto a number of lower-dimensional feature subspaces. Each of these subspaces constitutes the domain of a classification subtask and is associated with a base learner within an ensemble machine-learner context. Such an ensemble conceptualization is called a random subsample ensemble. Simulation results on data sets with up to 20,000 features indicate that the random subsample ensemble classifier performs comparably to other benchmark machine learners in terms of prediction accuracy and CPU time. This finding establishes the feasibility of the ensemble and positions it to tackle classification problems with even higher-dimensional feature spaces.
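A minimal sketch of the idea follows. The nearest-centroid base learner and the majority vote are simplifying assumptions chosen for brevity, not the paper's benchmark learners: each ensemble member sees only a random subset of the feature indices.

```python
import random
from collections import Counter

def train_centroids(X, y, feats):
    """Per-class mean of the selected feature columns."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        proj = [row[f] for f in feats]
        if label not in sums:
            sums[label], counts[label] = [0.0] * len(feats), 0
        sums[label] = [s + v for s, v in zip(sums[label], proj)]
        counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def nearest_centroid(centroids, feats, row):
    proj = [row[f] for f in feats]
    return min(centroids, key=lambda c: sum(
        (a - b) ** 2 for a, b in zip(proj, centroids[c])))

def random_subsample_ensemble(X, y, n_members=11, subspace_dim=3, seed=0):
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        # Each member is trained on a random lower-dimensional subspace.
        feats = rng.sample(range(len(X[0])), subspace_dim)
        members.append((feats, train_centroids(X, y, feats)))
    def predict(row):
        votes = Counter(nearest_centroid(c, f, row) for f, c in members)
        return votes.most_common(1)[0][0]
    return predict
```

Because every member works in a small subspace, each subtask sidesteps the dimensionality of the original space, and the vote restores accuracy.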

    Fusing diverse monitoring algorithms for robust change detection


    Weight-Selected Attribute Bagging for Credit Scoring

    Assessment of credit risk is of great importance in financial risk management. In this paper, we propose an improved attribute bagging method, weight-selected attribute bagging (WSAB), to evaluate credit risk. Weights of attributes are first computed using attribute evaluation methods such as the linear support vector machine (LSVM) and principal component analysis (PCA). Subsets of attributes are then constructed according to the weights of the attributes: the larger an attribute's weight, the larger the probability with which it is selected into an attribute subset. Next, training samples and test samples are projected onto each attribute subset. A scoring model is then constructed from each set of newly produced training samples. Finally, all scoring models vote on the test instances. An individual model that uses only selected attributes is more accurate because some of the redundant and uninformative attributes are eliminated. In addition, selecting attributes by probability guarantees the diversity of the scoring models. Experimental results on two credit benchmark databases show that the proposed method, WSAB, is outstanding in both prediction accuracy and stability, as compared to analogous methods.
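The weight-proportional attribute sampling at the core of WSAB can be sketched as below. The attribute weights are assumed to come from an external evaluator (the paper uses LSVM or PCA); here they are simply passed in, and a roulette-wheel draw makes higher-weight attributes proportionally more likely to enter each subset.

```python
import random

def sample_attribute_subset(weights, subset_size, rng):
    """Draw distinct attribute indices with probability
    proportional to their weights."""
    remaining = list(range(len(weights)))
    chosen = []
    for _ in range(subset_size):
        # Roulette-wheel draw over the attributes not yet chosen.
        total = sum(weights[i] for i in remaining)
        r = rng.uniform(0, total)
        acc = 0.0
        for i in remaining:
            acc += weights[i]
            if r <= acc:
                chosen.append(i)
                remaining.remove(i)
                break
    return sorted(chosen)
```

Each sampled subset would then train one scoring model, and the models vote on the test instances; sampling by probability, rather than always taking the top-weighted attributes, is what keeps the models diverse.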

    Tackling Distribution Shift - Detection and Mitigation

    One of the biggest challenges of employing supervised deep learning approaches is their inability to perform well beyond standardized datasets in real-world applications. Abrupt changes in the form of outliers, or overall changes in the data distribution after model deployment, result in a performance drop. To address the distributional shifts induced by such changes, we propose two methodologies: the first detects these shifts, and the second adapts the model to overcome the low predictive performance they cause. The former usually refers to anomaly detection, the process of finding patterns in the data that do not resemble the expected behavior. Understanding the behavior of data by capturing their distribution can help us find rare and uncommon samples without the need for annotated data. In this thesis, we exploit the ability of generative adversarial networks (GANs) to capture latent representations in order to design a model that differentiates expected behavior from deviated samples. Furthermore, we integrate self-supervision into generative adversarial networks to improve the predictive performance of our proposed anomaly detection model. In addition to shift detection, we propose an ensemble approach that adapts a model under varied distributional shifts using domain adaptation. In summary, this thesis focuses on detecting shifts under the umbrella of anomaly detection, as well as mitigating the effect of several distributional shifts by adapting deep learning models using Bayesian and information-theoretic approaches.

    Ensemble-based Supervised Learning for Predicting Diabetes Onset

    The research presented in this thesis aims to address the issue of undiagnosed diabetes cases. The current state of knowledge is that one in seventy people in the United Kingdom is living with undiagnosed diabetes, and only one in a hundred people can identify the main signs of diabetes. Some of the tools available for predicting diabetes are too simplistic and/or rely on superficial data for inference. On the positive side, the National Health Service (NHS) is improving data recording in this domain by offering a health check to adults aged 40 - 70. Data from such a programme can be used to mitigate the issue of superficial data, and also to develop a predictive tool that facilitates a change from the current reactive care to proactive care. This thesis presents a tool based on a machine learning ensemble for predicting diabetes onset. Ensembles often perform better than a single classifier, and accuracy and diversity have been highlighted as the two vital requirements for constructing good ensemble classifiers. Experiments in this thesis explore the relationship between the diversity of heterogeneous ensemble classifiers and the accuracy of predictions, through feature subset selection, in order to predict diabetes onset. Data from a national health check programme (similar to the NHS health check) were used. The aim is to predict diabetes onset better than other similar studies in the literature. For the experiments, predictions from five base classifiers (Sequential Minimal Optimisation (SMO), Radial Basis Function (RBF), Naïve Bayes (NB), Repeated Incremental Pruning to Produce Error Reduction (RIPPER) and the C4.5 decision tree), performing the same task, are exploited in all possible combinations to construct 26 ensemble models. The training data feature space was searched to select the best feature subset for each classifier.
    The selected subsets are used to train the classifiers, and their predictions are combined using the k-Nearest Neighbours algorithm as a meta-classifier. Results are analysed using four performance metrics (accuracy, sensitivity, specificity and AUC) to determine (i) whether ensembles always perform better than a single classifier; and (ii) the impact of diversity (from heterogeneous classifiers) and accuracy (through feature subset selection) on ensemble performance. At the base classification level, RBF produced better results than the other four classifiers, with 78% accuracy, 82% sensitivity, 73% specificity and 85% AUC. A comparative study shows that the RBF model is more accurate than 9 ensembles, more sensitive than 13 ensembles, more specific than 9 ensembles, and produced a better AUC than 25 ensembles. This means that ensembles do not always perform better than their constituent classifiers. Of the ensembles that performed better than RBF, the combination of C4.5, RIPPER and NB produced the highest results, with 83% accuracy, 87% sensitivity, 79% specificity, and 86% AUC. Compared to the RBF model, this is a 5.37% improvement in accuracy, which is significant (p = 0.0332). The experiments show how data from medical health examinations can be utilised to address the issue of undiagnosed cases of diabetes. Models constructed with such data would facilitate the much-desired shift from reactive to proactive care for individuals at high risk of diabetes. From the machine learning viewpoint, it was established that ensembles constructed from diverse and accurate base learners have the potential to produce significant improvements in accuracy compared to their individual constituent classifiers. In addition, the ensemble presented in this thesis is at least 1% and at most 23% more accurate than similar research studies in the literature, which validates the superiority of the method implemented.
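The combination step described in the thesis can be sketched roughly as follows. Base-classifier predictions form a meta-feature vector per sample, and a k-nearest-neighbour meta-classifier votes over those vectors; the base classifiers are abstracted here as already-made prediction lists, which is a simplifying assumption.

```python
from collections import Counter

def knn_stack_predict(train_meta, train_labels, test_meta, k=3):
    """train_meta: per-sample vectors of base-classifier predictions;
    a kNN meta-classifier classifies each test meta-vector."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    out = []
    for m in test_meta:
        # Find the k training samples whose base predictions most
        # resemble this one, then majority-vote their true labels.
        nearest = sorted(range(len(train_meta)),
                         key=lambda i: dist(train_meta[i], m))[:k]
        votes = Counter(train_labels[i] for i in nearest)
        out.append(votes.most_common(1)[0][0])
    return out
```

In practice the train/test meta-vectors would come from cross-validated predictions of the five base classifiers, each trained on its own selected feature subset.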

    Decimated Input Ensembles for Improved Generalization

    Using an ensemble of classifiers instead of a single classifier has been demonstrated to improve generalization performance on many difficult problems. However, for this improvement to take place, the classifiers in an ensemble must be made complementary. In this paper, we highlight the need to reduce the correlation among the component classifiers and investigate one method for correlation reduction: input decimation. Input decimation uses the discriminating features of the inputs to decouple classifiers: by presenting different parts of the feature set to each individual classifier, it generates a diverse pool of classifiers. Experimental results confirm that combining via input decimation improves generalization performance.

    Decimated Input Ensembles for Improved Generalization

    Recently, many researchers have demonstrated that using classifier ensembles (e.g., averaging the outputs of multiple classifiers before reaching a classification decision) leads to improved performance on many difficult generalization problems. However, in many domains there are serious impediments to such "turnkey" improvements in classification accuracy. Most notable among these is the deleterious effect of highly correlated classifiers on ensemble performance. One solution to this problem is to generate "new" training sets by sampling the original one. However, with a finite number of patterns, this reduces the number of training patterns each classifier sees, often resulting in considerably worse generalization performance for each individual classifier (particularly in high-dimensional data domains). Generally, this drop in individual classifier accuracy more than offsets any potential gains from combining, unless diversity among the classifiers is actively promoted. In this work, we introduce a method that: (1) reduces the correlation among the classifiers; (2) reduces the dimensionality of the data, thus lessening the impact of the 'curse of dimensionality'; and (3) improves the classification performance of the ensemble.
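A hypothetical sketch of the feature-selection step behind input decimation: for each class, keep the features whose values are most strongly associated with membership in that class, and train that class's base classifier only on them. Plain Pearson correlation against a 0/1 class indicator is an assumed scoring choice, not necessarily the authors' exact criterion.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation; 0.0 for constant (zero-variance) inputs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)

def decimate(X, y, target_class, keep):
    """Indices of the `keep` features most correlated with the class."""
    indicator = [1.0 if label == target_class else 0.0 for label in y]
    scores = []
    for f in range(len(X[0])):
        col = [row[f] for row in X]
        scores.append((abs(pearson(col, indicator)), f))
    scores.sort(reverse=True)  # most discriminating features first
    return sorted(f for _, f in scores[:keep])
```

Because each classifier keeps a different, class-specific slice of the features, the members are decorrelated and each works in a lower-dimensional space, matching points (1) and (2) above.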