
    Pre-emphasizing Binarized Ensembles to Improve Classification Performance

    14th International Work-Conference on Artificial Neural Networks, IWANN 2017. Machine ensembles are learning architectures that offer high expressive capacity and, consequently, remarkable performance, owing to their large number of trainable parameters. In this paper, we explore and discuss whether binarization techniques are effective at improving standard diversification methods, and whether a simple additional trick, weighting the training examples, yields better results. Experimental results on three selected classification problems show that binarization allows standard direct diversification methods (bagging, in particular) to achieve better results, with even more significant performance improvements when the training samples are pre-emphasized. Some research avenues opened by this finding are mentioned in the conclusions. This work has been partly supported by research grants CASI-CAM-CM (S2013/ICE-2845, DGUI-CM and FEDER) and Macro-ADOBE (TEC2015-67719-P, MINECO).
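    The paper's datasets, emphasis function and binarization scheme are not given in the abstract; the Python sketch below only illustrates the general recipe under assumed choices: a one-vs-rest binarization of a toy multi-class problem, bagging of decision trees, and a hypothetical emphasis rule that makes bootstrap sampling favour examples a pilot model finds hard.

        # Minimal sketch (not the paper's code): bag decision trees on one-vs-rest
        # binarized subproblems, with bootstrap sampling pre-emphasized towards
        # examples a pilot model finds hard.  Data and emphasis rule are illustrative.
        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.linear_model import LogisticRegression
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X, y = load_iris(return_X_y=True)
        classes = np.unique(y)
        n_members = 25

        def emphasized_bagging(X, y_bin):
            """Bag decision trees, drawing bootstrap samples with emphasis weights."""
            pilot = LogisticRegression(max_iter=1000).fit(X, y_bin)
            p = pilot.predict_proba(X)[:, 1]
            w = 0.1 + np.abs(y_bin - p)          # hypothetical emphasis: grows with pilot error
            w /= w.sum()
            members = []
            for _ in range(n_members):
                idx = rng.choice(len(X), size=len(X), replace=True, p=w)
                members.append(DecisionTreeClassifier().fit(X[idx], y_bin[idx]))
            return members

        # One emphasized bag per one-vs-rest binary subproblem.
        ensembles = {c: emphasized_bagging(X, (y == c).astype(int)) for c in classes}

        def predict(X_new):
            # Average each binary ensemble's votes, then pick the strongest class.
            scores = np.column_stack([
                np.mean([m.predict(X_new) for m in ensembles[c]], axis=0) for c in classes
            ])
            return classes[scores.argmax(axis=1)]

        print("train accuracy:", (predict(X) == y).mean())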

    Error-Correcting Neural Sequence Prediction

    We propose a novel neural sequence prediction method based on error-correcting output codes (ECOC) that avoids exact softmax normalization and allows a trade-off between speed and performance. Instead of minimizing a divergence between the predicted probability distribution and the true distribution, we use error-correcting codes to represent both predictions and targets. Secondly, we propose several ways to improve accuracy and convergence rates by maximizing the separability between codes assigned to classes in proportion to word-embedding similarities. Lastly, we introduce our main contribution, Latent Variable Mixture Sampling, a technique for mitigating exposure bias that can be integrated into the training of latent-variable-based neural sequence predictors such as ECOC models. It mixes the latent codes of past predictions and past targets in one of two ways: (1) according to a predefined sampling schedule, or (2) via a differentiable sampling procedure in which the mixing probability is learned throughout training by replacing the greedy argmax operation with a smooth approximation. ECOC-NSP leads to consistent improvements on language modelling datasets, and the proposed Latent Variable Mixture Sampling methods are found to perform well on text generation tasks such as image captioning.
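    The abstract does not include the model itself; the numpy sketch below only illustrates the ECOC output-layer idea it describes, with an assumed random codebook and toy bit-flip noise: each token is represented by a short binary codeword, the network would be trained to predict the code bits (e.g. with a per-bit sigmoid loss) instead of a vocabulary-wide softmax, and decoding returns the nearest codeword.

        # Minimal sketch of the ECOC output-layer idea (not the paper's code): each
        # vocabulary item gets a binary codeword, the network predicts the code bits
        # with independent sigmoids instead of a full softmax, and decoding picks the
        # nearest codeword by Hamming distance.
        import numpy as np

        rng = np.random.default_rng(0)
        vocab_size, code_len = 10_000, 64                             # code_len << vocab_size
        codebook = rng.integers(0, 2, size=(vocab_size, code_len))    # assumed random codes

        def encode(token_id):
            """Bit vector the network would be trained to reproduce (e.g. per-bit BCE)."""
            return codebook[token_id]

        def decode(bit_logits):
            """Map predicted bit logits back to the most plausible token."""
            bits = (bit_logits > 0).astype(int)            # hard-threshold the sigmoid logits
            hamming = np.abs(codebook - bits).sum(axis=1)  # distance to every codeword
            return int(hamming.argmin())

        # Toy round trip: corrupt a few bits of a true codeword and still recover the token.
        true_token = 1234
        pseudo_logits = encode(true_token).astype(float) * 4 - 2   # map bits {0,1} to logits {-2,+2}
        pseudo_logits[:3] *= -1                                    # flip three bits
        print(decode(pseudo_logits) == true_token)                 # True: the code corrects the errors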

    Diversity creation methods: a survey and categorisation


    Designing mixture of deep experts

    Mixture of Experts (MoE) is a classical ensemble architecture in which each member is specialised in a given part of the input space, its area of expertise. Working in this manner, we aim to specialise the experts on smaller problems, solving the original problem through a divide-and-conquer approach. The goal of our research is first to reproduce the work of Collobert et al. [1] (2002) and then to extend it by using neural networks as experts on different datasets. Specialised representations are learned over different aspects of the problem, and the results of the different members are merged according to their specific expertise. This expertise can itself be learned by a network acting as a gating function. The MoE architecture is composed of N expert networks combined via a gating network, which partitions the input space accordingly; it follows a divide-and-conquer strategy supervised by the gating network. Using a specialised cost function, the experts specialise in their sub-spaces, and exploiting the discriminative power of the experts works much better than simply clustering. The gating network has to learn how to assign examples to the different specialists. Such models show promise for building larger networks that are still cheap to compute at test time and more parallelizable at training time. We were able to reproduce the original work and implemented a multi-class gater to classify images. Neural networks are known to perform best with large amounts of data, yet some of our experiments require dividing the dataset and training multiple neural networks. We observe that, in these data-deprived conditions, our MoE models are almost on par with ensembles trained on the complete data. Keywords: Machine Learning, Multi Layer Perceptrons, Mixture of Experts, Support Vector Machines, Divide and Conquer, Stochastic Gradient Descent, Optimization
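    The report's expert networks, gating cost and datasets are not reproduced here; the PyTorch sketch below shows only the generic soft-gated form the abstract describes, with illustrative sizes: N expert MLPs whose outputs are blended by a softmax gating network and trained end to end.

        # Minimal soft mixture-of-experts sketch (illustrative, not the report's code):
        # a gating network produces a softmax over the experts and the prediction is the
        # gate-weighted blend of the expert outputs; experts and gate train jointly.
        import torch
        import torch.nn as nn

        class MixtureOfExperts(nn.Module):
            def __init__(self, in_dim, n_classes, n_experts=4, hidden=64):
                super().__init__()
                self.experts = nn.ModuleList([
                    nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, n_classes))
                    for _ in range(n_experts)
                ])
                self.gate = nn.Linear(in_dim, n_experts)   # learns which expert handles which region

            def forward(self, x):
                gates = torch.softmax(self.gate(x), dim=-1)               # (batch, n_experts)
                outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, n_experts, n_classes)
                return (gates.unsqueeze(-1) * outs).sum(dim=1)            # gate-weighted blend

        # Toy usage: ordinary cross-entropy trains both the experts and the gate.
        model = MixtureOfExperts(in_dim=20, n_classes=3)
        x, y = torch.randn(32, 20), torch.randint(0, 3, (32,))
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()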

    Development of a Stope Stability Prediction Model Using Ensemble Learning Techniques - A Case Study

    The consequences of collapsed stopes can be dire in the mining industry. A collapse can lead to the revocation of a mining license in most jurisdictions, especially when the harm costs lives. It is therefore imperative for mine planning and technical services engineers to estimate the stability status of stopes. This study attempts to produce a stope stability prediction model, adapted from the stability graph method, using ensemble learning techniques. It was conducted using 472 case histories from 120 stopes of AngloGold Ashanti Ghana, Obuasi Mine. Random Forest, Gradient Boosting, Bootstrap Aggregating and Adaptive Boosting classification algorithms were used to produce the models. A comparative analysis was done using six classification performance metrics, namely Accuracy, Precision, Sensitivity, F1-score, Specificity and Matthews Correlation Coefficient (MCC), to determine which ensemble learning technique performed best in predicting the stability of a stope. The Bootstrap Aggregating model obtained the highest MCC score of 96.84%, while the Adaptive Boosting model obtained the lowest. The Specificity scores, in decreasing order of performance, were 98.95%, 97.89%, 96.32% and 95.26% for Bootstrap Aggregating, Gradient Boosting, Random Forest and Adaptive Boosting respectively. The Bootstrap Aggregating model achieved equal Accuracy, Precision, F1-score and Sensitivity scores of 97.89%; the same pattern held for Adaptive Boosting, Gradient Boosting and Random Forest with scores of 90.53%, 92.63% and 95.79% respectively. At a 95% confidence interval using the Wilson Score Interval, the Bootstrap Aggregating model produced the minimal error and was therefore selected as the alternative stope design tool for predicting the stability status of stopes. Keywords: Stope Stability, Ensemble Learning Techniques, Stability Graph, Machine Learning
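    The Obuasi case-history data is not available here, so the sketch below only illustrates the comparison protocol on placeholder data: the four scikit-learn ensembles are fitted and scored with the same six metrics, with specificity derived from the confusion matrix since scikit-learn has no direct scorer for it.

        # Illustrative comparison protocol on placeholder data (not the Obuasi case
        # histories): fit the four ensembles and score each with the study's six metrics.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                                      GradientBoostingClassifier, RandomForestClassifier)
        from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                                     matthews_corrcoef, precision_score, recall_score)
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=472, n_features=10, random_state=0)  # stand-in data
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

        models = {
            "Bootstrap Aggregating": BaggingClassifier(random_state=0),
            "Random Forest": RandomForestClassifier(random_state=0),
            "Gradient Boosting": GradientBoostingClassifier(random_state=0),
            "Adaptive Boosting": AdaBoostClassifier(random_state=0),
        }

        for name, model in models.items():
            pred = model.fit(X_tr, y_tr).predict(X_te)
            tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
            print(name,
                  f"acc={accuracy_score(y_te, pred):.3f}",
                  f"prec={precision_score(y_te, pred):.3f}",
                  f"sens={recall_score(y_te, pred):.3f}",
                  f"f1={f1_score(y_te, pred):.3f}",
                  f"spec={tn / (tn + fp):.3f}",             # specificity from the confusion matrix
                  f"mcc={matthews_corrcoef(y_te, pred):.3f}")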

    Modelling for the optimal product to offer a financial services customer

    A research report submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. Johannesburg, 2014. This study illustrates how various statistical classification models can be compared and used to resolve cross-selling problems encountered in a financial services environment. Various statistical classification algorithms were deployed to model the appropriate product to sell to a financial services customer in a multi-classifier setting. Four models were used, namely: multinomial logistic regression, multinomial bagging with logistic regression, multinomial random forests with decision trees, and error-correcting output coding. The models were compared in terms of predictive accuracy, generalisation, interpretability, ability to handle rare instances, and ease of use. A weighted score for each model was obtained based on these evaluation criteria, yielding an overall model ranking. In terms of the data, banked customers who only had a transactional account at the start of the observation period were used for the modelling process. Samples of customers were drawn at different time points, with the preceding six to twelve months of information used to derive the predictor variables and the following six months used to monitor product take-up. Error-correcting output coding performed the best in terms of predictive accuracy but did not perform as well on the other metrics. Overall, multinomial bagging with logistic regression proved to be the best model. All of the models struggled with the rare classes, so weighted classification was deployed to improve rare-class prediction accuracy. Classification accuracy showed significant limitations in the multi-classifier setting, as it tended to be biased towards the majority class. The area under the receiver operating characteristic curve (AUC), as proposed by Hand and Till (2001), proved to be a powerful metric for model evaluation.
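    The banking data is proprietary, so the sketch below only shows, on placeholder data with deliberately imbalanced classes, how the four model families named above can be fitted with scikit-learn and compared; scikit-learn's one-vs-one averaging in roc_auc_score follows the Hand and Till (2001) multi-class AUC formulation.

        # Illustrative comparison of the four model families on placeholder, imbalanced
        # data (the customer data itself is not available here).
        from sklearn.datasets import make_classification
        from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split
        from sklearn.multiclass import OutputCodeClassifier

        X, y = make_classification(n_samples=3000, n_features=15, n_informative=8,
                                   n_classes=4, weights=[0.7, 0.15, 0.1, 0.05],  # rare classes
                                   random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        models = {
            "multinomial logistic regression": LogisticRegression(max_iter=2000),
            "bagging with logistic regression": BaggingClassifier(LogisticRegression(max_iter=2000)),
            "random forest": RandomForestClassifier(random_state=0),
            "error-correcting output coding": OutputCodeClassifier(LogisticRegression(max_iter=2000),
                                                                   code_size=2, random_state=0),
        }

        for name, model in models.items():
            model.fit(X_tr, y_tr)
            if hasattr(model, "predict_proba"):
                # 'ovo' + 'macro' averaging follows the Hand & Till (2001) multi-class AUC.
                auc = roc_auc_score(y_te, model.predict_proba(X_te),
                                    multi_class="ovo", average="macro")
                print(f"{name}: multi-class AUC = {auc:.3f}")
            else:
                # OutputCodeClassifier exposes no probabilities; fall back to accuracy.
                print(f"{name}: accuracy = {model.score(X_te, y_te):.3f}")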

    Building well-performing classifier ensembles: model and decision level combination.

    There is a continuing drive for better, more robust generalisation performance from classification systems, and from prediction systems in general. Ensemble methods, or the combining of multiple classifiers, have become an accepted and successful tool for achieving this, though the reasons for their success are not always entirely understood. In this thesis, we review the multiple classifier literature and consider the properties an ensemble of classifiers - or collection of subsets - should have in order to be combined successfully. We find that the framework of Stochastic Discrimination (SD) provides a well-defined account of these properties, which are shown to be strongly encouraged, via differing algorithmic devices, in a number of the most popular and successful methods in the literature. This uncovers some interesting and basic links between these methods, and aids understanding of their success and operation in terms of a kernel induced on the training data whose form is particularly well suited to classification. One property that is desirable both in the SD framework and, via the ambiguity decomposition of the error, in a regression context is the de-correlation of individuals. This motivates the introduction of the Negative Correlation Learning (NCL) method, in which neural networks are trained in parallel in a way designed to encourage de-correlation of the individual networks. The training is controlled by a parameter λ governing the extent to which correlations are penalised. Theoretical analysis of the training dynamics yields an exact expression for the interval in which λ can be chosen while ensuring stability of the training, and a value λ∗ for which the training has some interesting optimality properties; these values depend only on the size N of the ensemble. Decision-level combination methods often result in difficult-to-interpret models, and NCL is no exception. In some applications, however, understandable decisions and interpretable models are needed. In response, we depart from the standard decision-level combination paradigm and introduce a number of model-level combination methods. As decision trees are among the most interpretable model structures used in classification, we combine structure from multiple individual trees to build a single combined model. We show that extremely compact, well-performing models can be built in this way; in particular, a generalisation of bottom-up pruning to a multiple-tree context produces good results in this regard. Finally, we develop a classification system for a real-world churn prediction problem, illustrating some of the concepts introduced in the thesis and a number of more practical considerations that are important when developing a prediction system for a specific problem.
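    The thesis' exact analysis is not reproduced in the abstract; the numpy sketch below only illustrates the commonly cited NCL update for a toy ensemble of linear regressors, where each member's gradient combines its own error with a λ-weighted pull away from the ensemble mean. Data, learning rate and λ are illustrative.

        # Minimal numpy sketch of Negative Correlation Learning for an ensemble of
        # linear regressors on toy data (the thesis trains neural networks).  Each
        # member i follows the commonly cited NCL error signal
        #     (f_i - y) - lam * (f_i - f_bar),
        # so a larger lam pushes members away from the ensemble mean (de-correlation).
        import numpy as np

        rng = np.random.default_rng(0)
        n, d, N = 200, 5, 5                        # samples, features, ensemble size
        X = rng.normal(size=(n, d))
        y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

        W = rng.normal(scale=0.1, size=(N, d))     # one weight vector per ensemble member
        lam, lr = 0.5, 0.05                        # illustrative; the admissible lam range depends on N

        for _ in range(500):
            F = X @ W.T                            # (n, N) individual predictions
            f_bar = F.mean(axis=1, keepdims=True)  # ensemble output
            delta = (F - y[:, None]) - lam * (F - f_bar)   # accuracy term minus de-correlation penalty
            W -= lr * (delta.T @ X) / n            # gradient step for every member at once

        print("ensemble MSE:", np.mean(((X @ W.T).mean(axis=1) - y) ** 2))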