
    Pre-emphasizing Binarized Ensembles to Improve Classification Performance

    14th International Work-Conference on Artificial Neural Networks, IWANN 2017. Machine ensembles are learning architectures that offer high expressive capacity and, consequently, remarkable performance, owing to their large number of trainable parameters. In this paper, we explore and discuss whether binarization techniques are effective at improving standard diversification methods, and whether a simple additional trick, weighting the training examples, yields better results. Experimental results on three selected classification problems show that binarization allows standard direct diversification methods (bagging, in particular) to achieve better results, with even more significant performance improvements when the training samples are pre-emphasized. Some research avenues opened by this finding are mentioned in the conclusions. This work has been partly supported by research grants CASI-CAM-CM (S2013/ICE-2845, DGUI-CM and FEDER) and Macro-ADOBE (TEC2015-67719-P, MINECO).
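    The paper's datasets, emphasis function and binarization scheme are not given in the abstract; the Python sketch below only illustrates the general recipe under assumed choices: a one-vs-rest binarization of a toy multi-class problem, bagging of decision trees, and a hypothetical emphasis rule that makes bootstrap sampling favour examples a pilot model finds hard.

        # Minimal sketch (not the paper's code): bag decision trees on one-vs-rest
        # binarized subproblems, with bootstrap sampling pre-emphasized towards
        # examples a pilot model finds hard.  Data and emphasis rule are illustrative.
        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.linear_model import LogisticRegression
        from sklearn.tree import DecisionTreeClassifier

        rng = np.random.default_rng(0)
        X, y = load_iris(return_X_y=True)
        classes = np.unique(y)
        n_members = 25

        def emphasized_bagging(X, y_bin):
            """Bag decision trees, drawing bootstrap samples with emphasis weights."""
            pilot = LogisticRegression(max_iter=1000).fit(X, y_bin)
            p = pilot.predict_proba(X)[:, 1]
            w = 0.1 + np.abs(y_bin - p)          # hypothetical emphasis: grows with pilot error
            w /= w.sum()
            members = []
            for _ in range(n_members):
                idx = rng.choice(len(X), size=len(X), replace=True, p=w)
                members.append(DecisionTreeClassifier().fit(X[idx], y_bin[idx]))
            return members

        # One emphasized bag per one-vs-rest binary subproblem.
        ensembles = {c: emphasized_bagging(X, (y == c).astype(int)) for c in classes}

        def predict(X_new):
            # Average each binary ensemble's votes, then pick the strongest class.
            scores = np.column_stack([
                np.mean([m.predict(X_new) for m in ensembles[c]], axis=0) for c in classes
            ])
            return classes[scores.argmax(axis=1)]

        print("train accuracy:", (predict(X) == y).mean())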

    Error-Correcting Neural Sequence Prediction

    We propose a novel neural sequence prediction method based on error-correcting output codes (ECOC) that avoids exact softmax normalization and allows a trade-off between speed and performance. Instead of minimizing a divergence between the predicted probability distribution and the true distribution, we use error-correcting codes to represent both predictions and targets. Secondly, we propose several ways to improve accuracy and convergence rates by maximizing the separability between codes assigned to classes in proportion to word-embedding similarities. Lastly, we introduce our main contribution, Latent Variable Mixture Sampling, a technique for mitigating exposure bias that can be integrated into the training of latent-variable-based neural sequence predictors such as ECOC models. It mixes the latent codes of past predictions and past targets in one of two ways: (1) according to a predefined sampling schedule, or (2) via a differentiable sampling procedure in which the mixing probability is learned throughout training by replacing the greedy argmax operation with a smooth approximation. ECOC-NSP leads to consistent improvements on language modelling datasets, and the proposed Latent Variable Mixture Sampling methods are found to perform well on text generation tasks such as image captioning.
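    The abstract does not include the model itself; the numpy sketch below only illustrates the ECOC output-layer idea it describes, with an assumed random codebook and toy bit-flip noise: each token is represented by a short binary codeword, the network would be trained to predict the code bits (e.g. with a per-bit sigmoid loss) instead of a vocabulary-wide softmax, and decoding returns the nearest codeword.

        # Minimal sketch of the ECOC output-layer idea (not the paper's code): each
        # vocabulary item gets a binary codeword, the network predicts the code bits
        # with independent sigmoids instead of a full softmax, and decoding picks the
        # nearest codeword by Hamming distance.
        import numpy as np

        rng = np.random.default_rng(0)
        vocab_size, code_len = 10_000, 64                             # code_len << vocab_size
        codebook = rng.integers(0, 2, size=(vocab_size, code_len))    # assumed random codes

        def encode(token_id):
            """Bit vector the network would be trained to reproduce (e.g. per-bit BCE)."""
            return codebook[token_id]

        def decode(bit_logits):
            """Map predicted bit logits back to the most plausible token."""
            bits = (bit_logits > 0).astype(int)            # hard-threshold the sigmoid logits
            hamming = np.abs(codebook - bits).sum(axis=1)  # distance to every codeword
            return int(hamming.argmin())

        # Toy round trip: corrupt a few bits of a true codeword and still recover the token.
        true_token = 1234
        pseudo_logits = encode(true_token).astype(float) * 4 - 2   # map bits {0,1} to logits {-2,+2}
        pseudo_logits[:3] *= -1                                    # flip three bits
        print(decode(pseudo_logits) == true_token)                 # True: the code corrects the errors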

    Diversity creation methods: a survey and categorisation


    Designing mixture of deep experts

    Mixture of Experts (MoE) is a classical ensemble architecture in which each member is specialised in a given part of the input space, its area of expertise. Working in this manner, we aim to specialise the experts on smaller problems, solving the original problem through a divide-and-conquer approach. The goal of our research is first to reproduce the work of Collobert et al. [1] (2002) and then to extend it by using neural networks as experts on different datasets. Specialised representations are learned over different aspects of the problem, and the results of the different members are merged according to their specific expertise. This expertise can itself be learned by a network acting as a gating function. The MoE architecture is composed of N expert networks combined via a gating network, which partitions the input space accordingly; it follows a divide-and-conquer strategy supervised by the gating network. Using a specialised cost function, the experts specialise in their sub-spaces, and exploiting the discriminative power of the experts works much better than simply clustering. The gating network has to learn how to assign examples to the different specialists. Such models show promise for building larger networks that are still cheap to compute at test time and more parallelizable at training time. We were able to reproduce the original work and implemented a multi-class gater to classify images. Neural networks are known to perform best with large amounts of data, yet some of our experiments require dividing the dataset and training multiple neural networks. We observe that, in these data-deprived conditions, our MoE models are almost on par with ensembles trained on the complete data. Keywords: Machine Learning, Multi Layer Perceptrons, Mixture of Experts, Support Vector Machines, Divide and Conquer, Stochastic Gradient Descent, Optimization
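    The report's expert networks, gating cost and datasets are not reproduced here; the PyTorch sketch below shows only the generic soft-gated form the abstract describes, with illustrative sizes: N expert MLPs whose outputs are blended by a softmax gating network and trained end to end.

        # Minimal soft mixture-of-experts sketch (illustrative, not the report's code):
        # a gating network produces a softmax over the experts and the prediction is the
        # gate-weighted blend of the expert outputs; experts and gate train jointly.
        import torch
        import torch.nn as nn

        class MixtureOfExperts(nn.Module):
            def __init__(self, in_dim, n_classes, n_experts=4, hidden=64):
                super().__init__()
                self.experts = nn.ModuleList([
                    nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                  nn.Linear(hidden, n_classes))
                    for _ in range(n_experts)
                ])
                self.gate = nn.Linear(in_dim, n_experts)   # learns which expert handles which region

            def forward(self, x):
                gates = torch.softmax(self.gate(x), dim=-1)               # (batch, n_experts)
                outs = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, n_experts, n_classes)
                return (gates.unsqueeze(-1) * outs).sum(dim=1)            # gate-weighted blend

        # Toy usage: ordinary cross-entropy trains both the experts and the gate.
        model = MixtureOfExperts(in_dim=20, n_classes=3)
        x, y = torch.randn(32, 20), torch.randint(0, 3, (32,))
        loss = nn.functional.cross_entropy(model(x), y)
        loss.backward()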

    Development of a Stope Stability Prediction Model Using Ensemble Learning Techniques - A Case Study

    The consequences of collapsed stopes can be dire in the mining industry. A collapse can lead to the revocation of a mining license in most jurisdictions, especially when the harm costs lives. It is therefore imperative for mine planning and technical services engineers to estimate the stability status of stopes. This study attempts to produce a stope stability prediction model, adapted from the stability graph method, using ensemble learning techniques. It was conducted using 472 case histories from 120 stopes of AngloGold Ashanti Ghana, Obuasi Mine. Random Forest, Gradient Boosting, Bootstrap Aggregating and Adaptive Boosting classification algorithms were used to produce the models. A comparative analysis was done using six classification performance metrics, namely Accuracy, Precision, Sensitivity, F1-score, Specificity and Matthews Correlation Coefficient (MCC), to determine which ensemble learning technique performed best in predicting the stability of a stope. The Bootstrap Aggregating model obtained the highest MCC score of 96.84%, while the Adaptive Boosting model obtained the lowest. The Specificity scores, in decreasing order of performance, were 98.95%, 97.89%, 96.32% and 95.26% for Bootstrap Aggregating, Gradient Boosting, Random Forest and Adaptive Boosting respectively. The Bootstrap Aggregating model achieved equal Accuracy, Precision, F1-score and Sensitivity scores of 97.89%; the same pattern held for Adaptive Boosting, Gradient Boosting and Random Forest with scores of 90.53%, 92.63% and 95.79% respectively. At a 95% confidence interval using the Wilson Score Interval, the Bootstrap Aggregating model produced the minimal error and was therefore selected as the alternative stope design tool for predicting the stability status of stopes. Keywords: Stope Stability, Ensemble Learning Techniques, Stability Graph, Machine Learning
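    The Obuasi case-history data is not available here, so the sketch below only illustrates the comparison protocol on placeholder data: the four scikit-learn ensembles are fitted and scored with the same six metrics, with specificity derived from the confusion matrix since scikit-learn has no direct scorer for it.

        # Illustrative comparison protocol on placeholder data (not the Obuasi case
        # histories): fit the four ensembles and score each with the study's six metrics.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                                      GradientBoostingClassifier, RandomForestClassifier)
        from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                                     matthews_corrcoef, precision_score, recall_score)
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=472, n_features=10, random_state=0)  # stand-in data
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

        models = {
            "Bootstrap Aggregating": BaggingClassifier(random_state=0),
            "Random Forest": RandomForestClassifier(random_state=0),
            "Gradient Boosting": GradientBoostingClassifier(random_state=0),
            "Adaptive Boosting": AdaBoostClassifier(random_state=0),
        }

        for name, model in models.items():
            pred = model.fit(X_tr, y_tr).predict(X_te)
            tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
            print(name,
                  f"acc={accuracy_score(y_te, pred):.3f}",
                  f"prec={precision_score(y_te, pred):.3f}",
                  f"sens={recall_score(y_te, pred):.3f}",
                  f"f1={f1_score(y_te, pred):.3f}",
                  f"spec={tn / (tn + fp):.3f}",             # specificity from the confusion matrix
                  f"mcc={matthews_corrcoef(y_te, pred):.3f}")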

    Modelling for the optimal product to offer a financial services customer

    A research report submitted to the Faculty of Science, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Master of Science. Johannesburg, 2014. This study illustrates how various statistical classification models can be compared and used to resolve cross-selling problems encountered in a financial services environment. Various statistical classification algorithms were deployed to model the appropriate product to sell to a financial services customer in a multi-classifier setting. Four models were used, namely: multinomial logistic regression, multinomial bagging with logistic regression, multinomial random forests with decision trees, and error-correcting output coding. The models were compared in terms of predictive accuracy, generalisation, interpretability, ability to handle rare instances, and ease of use. A weighted score for each model was obtained based on these evaluation criteria, yielding an overall model ranking. In terms of the data, banked customers who only had a transactional account at the start of the observation period were used for the modelling process. Samples of customers were drawn at different time points, with the preceding six to twelve months of information used to derive the predictor variables and the following six months used to monitor product take-up. Error-correcting output coding performed the best in terms of predictive accuracy but did not perform as well on the other metrics. Overall, multinomial bagging with logistic regression proved to be the best model. All of the models struggled with the rare classes, so weighted classification was deployed to improve rare-class prediction accuracy. Classification accuracy showed significant limitations in the multi-classifier setting, as it tended to be biased towards the majority class. The area under the receiver operating characteristic curve (AUC), as proposed by Hand and Till (2001), proved to be a powerful metric for model evaluation.
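    The banking data is proprietary, so the sketch below only shows, on placeholder data with deliberately imbalanced classes, how the four model families named above can be fitted with scikit-learn and compared; scikit-learn's one-vs-one averaging in roc_auc_score follows the Hand and Till (2001) multi-class AUC formulation.

        # Illustrative comparison of the four model families on placeholder, imbalanced
        # data (the customer data itself is not available here).
        from sklearn.datasets import make_classification
        from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score
        from sklearn.model_selection import train_test_split
        from sklearn.multiclass import OutputCodeClassifier

        X, y = make_classification(n_samples=3000, n_features=15, n_informative=8,
                                   n_classes=4, weights=[0.7, 0.15, 0.1, 0.05],  # rare classes
                                   random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        models = {
            "multinomial logistic regression": LogisticRegression(max_iter=2000),
            "bagging with logistic regression": BaggingClassifier(LogisticRegression(max_iter=2000)),
            "random forest": RandomForestClassifier(random_state=0),
            "error-correcting output coding": OutputCodeClassifier(LogisticRegression(max_iter=2000),
                                                                   code_size=2, random_state=0),
        }

        for name, model in models.items():
            model.fit(X_tr, y_tr)
            if hasattr(model, "predict_proba"):
                # 'ovo' + 'macro' averaging follows the Hand & Till (2001) multi-class AUC.
                auc = roc_auc_score(y_te, model.predict_proba(X_te),
                                    multi_class="ovo", average="macro")
                print(f"{name}: multi-class AUC = {auc:.3f}")
            else:
                # OutputCodeClassifier exposes no probabilities; fall back to accuracy.
                print(f"{name}: accuracy = {model.score(X_te, y_te):.3f}")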

    Building well-performing classifier ensembles: model and decision level combination.

    There is a continuing drive for better, more robust generalisation performance from classification systems, and from prediction systems in general. Ensemble methods, or the combining of multiple classifiers, have become an accepted and successful tool for achieving this, though the reasons for their success are not always entirely understood. In this thesis, we review the multiple classifier literature and consider the properties an ensemble of classifiers - or collection of subsets - should have in order to be combined successfully. We find that the framework of Stochastic Discrimination (SD) provides a well-defined account of these properties, which are shown to be strongly encouraged, via differing algorithmic devices, in a number of the most popular and successful methods in the literature. This uncovers some interesting and basic links between these methods, and aids understanding of their success and operation in terms of a kernel induced on the training data whose form is particularly well suited to classification. One property that is desirable both in the SD framework and, via the ambiguity decomposition of the error, in a regression context is the de-correlation of individuals. This motivates the introduction of the Negative Correlation Learning (NCL) method, in which neural networks are trained in parallel in a way designed to encourage de-correlation of the individual networks. The training is controlled by a parameter λ governing the extent to which correlations are penalised. Theoretical analysis of the training dynamics yields an exact expression for the interval in which λ can be chosen while ensuring stability of the training, and a value λ∗ for which the training has some interesting optimality properties; these values depend only on the size N of the ensemble. Decision-level combination methods often result in difficult-to-interpret models, and NCL is no exception. In some applications, however, understandable decisions and interpretable models are needed. In response, we depart from the standard decision-level combination paradigm and introduce a number of model-level combination methods. As decision trees are among the most interpretable model structures used in classification, we combine structure from multiple individual trees to build a single combined model. We show that extremely compact, well-performing models can be built in this way; in particular, a generalisation of bottom-up pruning to a multiple-tree context produces good results in this regard. Finally, we develop a classification system for a real-world churn prediction problem, illustrating some of the concepts introduced in the thesis and a number of more practical considerations that are important when developing a prediction system for a specific problem.
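    The thesis' exact analysis is not reproduced in the abstract; the numpy sketch below only illustrates the commonly cited NCL update for a toy ensemble of linear regressors, where each member's gradient combines its own error with a λ-weighted pull away from the ensemble mean. Data, learning rate and λ are illustrative.

        # Minimal numpy sketch of Negative Correlation Learning for an ensemble of
        # linear regressors on toy data (the thesis trains neural networks).  Each
        # member i follows the commonly cited NCL error signal
        #     (f_i - y) - lam * (f_i - f_bar),
        # so a larger lam pushes members away from the ensemble mean (de-correlation).
        import numpy as np

        rng = np.random.default_rng(0)
        n, d, N = 200, 5, 5                        # samples, features, ensemble size
        X = rng.normal(size=(n, d))
        y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

        W = rng.normal(scale=0.1, size=(N, d))     # one weight vector per ensemble member
        lam, lr = 0.5, 0.05                        # illustrative; the admissible lam range depends on N

        for _ in range(500):
            F = X @ W.T                            # (n, N) individual predictions
            f_bar = F.mean(axis=1, keepdims=True)  # ensemble output
            delta = (F - y[:, None]) - lam * (F - f_bar)   # accuracy term minus de-correlation penalty
            W -= lr * (delta.T @ X) / n            # gradient step for every member at once

        print("ensemble MSE:", np.mean(((X @ W.T).mean(axis=1) - y) ** 2))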