24,354 research outputs found

    Popular Ensemble Methods: An Empirical Study

    Full text link
    An ensemble consists of a set of individually trained classifiers (such as neural networks or decision trees) whose predictions are combined when classifying novel instances. Previous research has shown that an ensemble is often more accurate than any of the single classifiers in the ensemble. Bagging (Breiman, 1996c) and Boosting (Freund and Shapire, 1996; Shapire, 1990) are two relatively new but popular methods for producing ensembles. In this paper we evaluate these methods on 23 data sets using both neural networks and decision trees as our classification algorithm. Our results clearly indicate a number of conclusions. First, while Bagging is almost always more accurate than a single classifier, it is sometimes much less accurate than Boosting. On the other hand, Boosting can create ensembles that are less accurate than a single classifier -- especially when using neural networks. Analysis indicates that the performance of the Boosting methods is dependent on the characteristics of the data set being examined. In fact, further results show that Boosting ensembles may overfit noisy data sets, thus decreasing its performance. Finally, consistent with previous studies, our work suggests that most of the gain in an ensemble's performance comes in the first few classifiers combined; however, relatively large gains can be seen up to 25 classifiers when Boosting decision trees

    Generative Adversarial Networks for Financial Trading Strategies Fine-Tuning and Combination

    Get PDF
    Systematic trading strategies are algorithmic procedures that allocate assets aiming to optimize a certain performance criterion. To obtain an edge in a highly competitive environment, the analyst needs to proper fine-tune its strategy, or discover how to combine weak signals in novel alpha creating manners. Both aspects, namely fine-tuning and combination, have been extensively researched using several methods, but emerging techniques such as Generative Adversarial Networks can have an impact into such aspects. Therefore, our work proposes the use of Conditional Generative Adversarial Networks (cGANs) for trading strategies calibration and aggregation. To this purpose, we provide a full methodology on: (i) the training and selection of a cGAN for time series data; (ii) how each sample is used for strategies calibration; and (iii) how all generated samples can be used for ensemble modelling. To provide evidence that our approach is well grounded, we have designed an experiment with multiple trading strategies, encompassing 579 assets. We compared cGAN with an ensemble scheme and model validation methods, both suited for time series. Our results suggest that cGANs are a suitable alternative for strategies calibration and combination, providing outperformance when the traditional techniques fail to generate any alpha

    A low variance error boosting algorithm

    Get PDF
    This paper introduces a robust variant of AdaBoost, cw-AdaBoost, that uses weight perturbation to reduce variance error, and is particularly effective when dealing with data sets, such as microarray data, which have large numbers of features and small number of instances. The algorithm is compared with AdaBoost, Arcing and MultiBoost, using twelve gene expression datasets, using 10-fold cross validation. The new algorithm consistently achieves higher classification accuracy over all these datasets. In contrast to other AdaBoost variants, the algorithm is not susceptible to problems when a zero-error base classifier is encountered

    Oversampling for Imbalanced Learning Based on K-Means and SMOTE

    Full text link
    Learning from class-imbalanced data continues to be a common and challenging problem in supervised learning as standard classification algorithms are designed to handle balanced class distributions. While different strategies exist to tackle this problem, methods which generate artificial data to achieve a balanced class distribution are more versatile than modifications to the classification algorithm. Such techniques, called oversamplers, modify the training data, allowing any classifier to be used with class-imbalanced datasets. Many algorithms have been proposed for this task, but most are complex and tend to generate unnecessary noise. This work presents a simple and effective oversampling method based on k-means clustering and SMOTE oversampling, which avoids the generation of noise and effectively overcomes imbalances between and within classes. Empirical results of extensive experiments with 71 datasets show that training data oversampled with the proposed method improves classification results. Moreover, k-means SMOTE consistently outperforms other popular oversampling methods. An implementation is made available in the python programming language.Comment: 19 pages, 8 figure

    Coupling different methods for overcoming the class imbalance problem

    Get PDF
    Many classification problems must deal with imbalanced datasets where one class \u2013 the majority class \u2013 outnumbers the other classes. Standard classification methods do not provide accurate predictions in this setting since classification is generally biased towards the majority class. The minority classes are oftentimes the ones of interest (e.g., when they are associated with pathological conditions in patients), so methods for handling imbalanced datasets are critical. Using several different datasets, this paper evaluates the performance of state-of-the-art classification methods for handling the imbalance problem in both binary and multi-class datasets. Different strategies are considered, including the one-class and dimension reduction approaches, as well as their fusions. Moreover, some ensembles of classifiers are tested, in addition to stand-alone classifiers, to assess the effectiveness of ensembles in the presence of imbalance. Finally, a novel ensemble of ensembles is designed specifically to tackle the problem of class imbalance: the proposed ensemble does not need to be tuned separately for each dataset and outperforms all the other tested approaches. To validate our classifiers we resort to the KEEL-dataset repository, whose data partitions (training/test) are publicly available and have already been used in the open literature: as a consequence, it is possible to report a fair comparison among different approaches in the literature. Our best approach (MATLAB code and datasets not easily accessible elsewhere) will be available at https://www.dei.unipd.it/node/2357

    Learning Hybrid Neuro-Fuzzy Classifier Models From Data: To Combine or Not to Combine?

    Get PDF
    To combine or not to combine? Though not a question of the same gravity as the Shakespeareā€™s to be or not to be, it is examined in this paper in the context of a hybrid neuro-fuzzy pattern classifier design process. A general fuzzy min-max neural network with its basic learning procedure is used within six different algorithm independent learning schemes. Various versions of cross-validation, resampling techniques and data editing approaches, leading to a generation of a single classifier or a multiple classifier system, are scrutinised and compared. The classification performance on unseen data, commonly used as a criterion for comparing different competing designs, is augmented by further four criteria attempting to capture various additional characteristics of classifier generation schemes. These include: the ability to estimate the true classification error rate, the classifier transparency, the computational complexity of the learning scheme and the potential for adaptation to changing environments and new classes of data. One of the main questions examined is whether and when to use a single classifier or a combination of a number of component classifiers within a multiple classifier system
    • ā€¦
    corecore