17 research outputs found

    A Comparative Analysis of Ensemble Classifiers: Case Studies in Genomics

    The combination of multiple classifiers using ensemble methods is increasingly important for making progress in a variety of difficult prediction problems. We present a comparative analysis of several ensemble methods through two case studies in genomics, namely the prediction of genetic interactions and protein functions, to demonstrate their efficacy on real-world datasets and draw useful conclusions about their behavior. These methods include simple aggregation, meta-learning, cluster-based meta-learning, and ensemble selection using heterogeneous classifiers trained on resampled data to improve the diversity of their predictions. We present a detailed analysis of these methods across 4 genomics datasets and find the best of these methods offer statistically significant improvements over the state of the art in their respective domains. In addition, we establish a novel connection between ensemble selection and meta-learning, demonstrating how both of these disparate methods establish a balance between ensemble diversity and performance. Comment: 10 pages, 3 figures, 8 tables, to appear in Proceedings of the 2013 International Conference on Data Mining
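
    The abstract above names simple aggregation and ensemble selection without giving details. As a rough illustration only, the following sketch shows one widely used form of ensemble selection: greedy forward selection with replacement over averaged probabilities. It is not the authors' implementation; the model names, the toy numbers, and the choice of accuracy as the selection metric are all assumptions made for this example.

```python
from statistics import mean


def accuracy(probs, labels):
    """Fraction of examples where thresholding the probability at 0.5 matches the label."""
    return mean(1.0 if (p >= 0.5) == bool(y) else 0.0 for p, y in zip(probs, labels))


def average(columns):
    """Element-wise mean of several equally long prediction lists (simple aggregation)."""
    return [mean(values) for values in zip(*columns)]


def greedy_ensemble_selection(base_preds, labels, rounds=2):
    """Greedy forward selection with replacement.

    base_preds maps a (hypothetical) model name to its validation-set
    probabilities. Each round adds the model whose inclusion gives the
    best accuracy of the averaged ensemble; ties go to the name that
    sorts first.
    """
    chosen = []
    for _ in range(rounds):
        best_name = max(
            sorted(base_preds),
            key=lambda name: accuracy(
                average([base_preds[m] for m in chosen + [name]]), labels
            ),
        )
        chosen.append(best_name)
    return chosen


# Toy validation data; the model names and numbers are made up for illustration.
labels = [1, 0, 1, 1, 0, 0]
base_preds = {
    "forest": [0.9, 0.2, 0.4, 0.8, 0.6, 0.1],
    "logreg": [0.6, 0.4, 0.7, 0.3, 0.2, 0.3],
    "knn": [0.3, 0.7, 0.6, 0.6, 0.4, 0.6],
}
selected = greedy_ensemble_selection(base_preds, labels, rounds=2)
ensemble_probs = average([base_preds[m] for m in selected])
```

    On this toy data the procedure picks "logreg" first (the best single model) and then "forest", whose errors fall on different examples. This is the diversity-versus-performance balance the abstract refers to, shown here on invented numbers.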

    Sequential churn prediction and analysis of cellular network users - A multi-class, multi-label perspective

    We investigate the problem of churn detection and prediction using sequential cellular network data. We introduce a cleaning and preprocessing procedure that makes the dataset suitable for analysis. We compare the churn prediction results of state-of-the-art algorithms such as Gradient Boosting Trees, Random Forests, basic Long Short-Term Memory (LSTM) and Support Vector Machines (SVM). We achieve a significant performance boost by incorporating the sequential nature of the data, imputing missing information and analyzing the effects of various features. This in turn makes the classifier rigorous enough to give highly accurate results. We emphasize the sequential nature of the problem and seek algorithms that can track the variations in the data. We test and compare the performance of the proposed algorithms using performance measures on real-life cellular network data for churn detection. © 2017 IEEE

    A Model for Stock Market Value Forecasting using Ensemble Artificial Neural Network

    An Artificial Neural Network (ANN) is a model used to capture linear and non-linear relationships between input and output data. It has been used predominantly for the prediction and forecasting of market time series. However, ANN models such as the simple multi-layer perceptron suffer from low bias and high variance. This usually happens when training on large datasets. The objective of this work was to develop an efficient forecasting model using Ensemble ANN to unravel the market mysteries for accurate investment decisions. This paper employed the Ensemble ANN modeling technique, based on the theory of ensemble averaging, to tackle the high variance encountered when a simple multi-layer perceptron model is trained on stock market data. The Ensemble ANN model was developed and implemented using NeurophStudio and the Java programming language, then trained and tested using daily stock market prices from various banks over a period of 497 days. The agile methodology was adopted for this task. The output of the proposed predictive model was compared with four traditional multilayer perceptron algorithms, and the proposed model outperformed them. The proposed model gave an average to best predictive error for any day when compared with the other four traditional models.
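
    The ensemble averaging idea mentioned in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's NeurophStudio/Java model: the three "networks" are represented only by invented forecast lists, and mean squared error is an assumed error measure.

```python
from statistics import mean


def ensemble_average(member_forecasts):
    """Average the forecasts of several networks, one value per day."""
    return [mean(day_values) for day_values in zip(*member_forecasts)]


def mean_squared_error(forecast, actual):
    """Mean of the squared differences between forecast and actual values."""
    return mean((f - a) ** 2 for f, a in zip(forecast, actual))


# Made-up numbers: three hypothetical networks forecasting four days of prices.
actual = [10.0, 11.0, 12.0, 13.0]
member_forecasts = [
    [11.0, 10.0, 13.0, 12.0],
    [9.0, 12.0, 11.0, 14.0],
    [10.5, 11.5, 12.5, 12.5],
]
combined = ensemble_average(member_forecasts)
member_errors = [mean_squared_error(f, actual) for f in member_forecasts]
combined_error = mean_squared_error(combined, actual)
```

    Because squared error is convex, the error of the averaged forecast can never exceed the average of the members' errors. When the members err in different directions, as in these invented numbers, it can also fall below the best single member.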

    Intelligent IoT Traffic Classification Using Novel Search Strategy for Fast Based-Correlation Feature Selection in Industrial Environments

    Internet of Things (IoT) can be combined with machine learning in order to provide intelligent applications to the network nodes. Furthermore, IoT expands these advantages and technologies to the industry. In this paper, we propose a modification of one of the most popular algorithms for feature selection, fast-based-correlation feature (FCBF). The key idea is to split the feature space into fragments of the same size. By introducing this division, we can improve the correlation and, therefore, the machine learning applications that are operating on each node. This kind of IoT application for industry allows us to separate and prioritize the sensor data from the multimedia-related traffic. With this separation, the sensors are able to efficiently detect emergency situations and avoid both material and human damage. The results show the performance of the three FCBF-based algorithms for different problems and different classifiers, confirming the improvements achieved by our approach in terms of model accuracy and execution time.
    This paper was supported in part by the Ministerio de Economia y Competitividad del Gobierno de Espana and the Fondo de Desarrollo Regional within the project Inteligencia distribuida para el control y adaptacion de redes dinamicas definidas por software under Grant TIN2014-57991-C3-1-P, in part by the Ministerio de Educacion, Cultura y Deporte, through the Ayudas para contratos predoctorales de Formacion del Profesorado Universitario FPU (Convocatoria 2015) under Grant FPU15/06837, and in part by the Ministerio de Economia y Competitividad in the Programa Estatal de Fomento de la Investigacion Cientifica y Tecnica de Excelencia, Subprograma Estatal de Generacion de Conocimiento within the Project TIN2017-84802-C2-1-P. (Corresponding author: Jaime Lloret.)
    Egea, S.; Rego Mañez, A.; Carro, B.; Sánchez-Esguevillas, A.; Lloret, J. (2018). Intelligent IoT Traffic Classification Using Novel Search Strategy for Fast Based-Correlation Feature Selection in Industrial Environments. IEEE Internet of Things. 5(3):1616-1624. https://doi.org/10.1109/JIOT.2017.2787959
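
    The abstract does not spell out the proposed search strategy, so the following is only a hedged sketch of two ingredients it mentions. Symmetric uncertainty is the correlation measure used by standard FCBF. The fragment handling (fixed-size consecutive chunks, keeping the top-scoring feature of each chunk) is a hypothetical illustration and not the authors' algorithm. The feature names and values are invented.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging from 0 to 1."""
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0
    mutual_information = h_x + h_y - entropy(list(zip(x, y)))
    return 2.0 * mutual_information / (h_x + h_y)


def split_into_fragments(feature_names, fragment_size):
    """Cut the feature list into consecutive fragments (the last may be shorter)."""
    return [
        feature_names[i:i + fragment_size]
        for i in range(0, len(feature_names), fragment_size)
    ]


# Invented toy data: four discrete features and a binary class label.
labels = [0, 0, 1, 1]
features = {
    "f1": [0, 0, 1, 1],
    "f2": [0, 1, 0, 1],
    "f3": [1, 1, 0, 0],
    "f4": [0, 0, 0, 1],
}
fragments = split_into_fragments(sorted(features), 2)
# Hypothetical per-fragment step: keep the feature most correlated with the class.
best_per_fragment = [
    max(fragment, key=lambda name: symmetric_uncertainty(features[name], labels))
    for fragment in fragments
]
```

    Here "f1" and "f3" determine the label completely (symmetric uncertainty of 1), while "f2" is independent of it (symmetric uncertainty of 0), so each fragment keeps its fully informative feature.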

    Ensembling predictions of student post-test scores for an intelligent tutoring system.

    Over the last few decades, there has been a rich variety of approaches towards modeling student knowledge and skill within interactive learning environments. There have recently been several empirical comparisons as to which types of student models are better at predicting future performance, both within and outside of the interactive learning environment. A recent paper (Baker et al., in press) considers whether ensembling can produce better prediction than individual models, when ensembling is performed at the level of predictions of performance within the tutor. However, better performance was not achieved for predicting the post-test. In this paper, we investigate ensembling at the post-test level, to see if this approach can produce better prediction of post-test scores within the context of a Cognitive Tutor for Genetics. We find no improvement for ensembling over the best individual models and we consider possible explanations for this finding, including the limited size of the data set.

    Large-scale protein function prediction using heterogeneous ensembles [version 1; referees: 2 approved]

    Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor are unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this ability of a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred (https://github.com/GauravPandeyLab/LargeGOPred)

    Auto-Sklearn 2.0: The Next Generation

    Automated Machine Learning, which supports practitioners and researchers with the tedious task of manually designing machine learning pipelines, has recently achieved substantial success. In this paper we introduce new Automated Machine Learning (AutoML) techniques motivated by our winning submission to the second ChaLearn AutoML challenge, PoSH Auto-sklearn. For this, we extend Auto-sklearn with a new, simpler meta-learning technique, improve its way of handling iterative algorithms and enhance it with a successful bandit strategy for budget allocation. Furthermore, we go one step further and study the design space of AutoML itself and propose a solution towards truly hands-free AutoML. Together, these changes give rise to the next generation of our AutoML system, Auto-sklearn (2.0). We verify the improvement by these additions in a large experimental study on 39 AutoML benchmark datasets and conclude the paper by comparing to Auto-sklearn (1.0), reducing the regret by up to a factor of five.
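
    The abstract does not name the bandit strategy it uses for budget allocation. As an illustration of what such a strategy can look like, the sketch below implements successive halving, one well-known example, in plain Python. It is an assumption chosen for this example and does not use or reproduce the Auto-sklearn code; the toy loss function and the candidate learning rates are invented.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=2):
    """Keep the best 1/eta of the candidates each round while multiplying the budget by eta.

    evaluate(config, budget) must return a loss (lower is better).
    """
    survivors = list(configs)
    budget = min_budget
    while len(survivors) > 1:
        ranked = sorted(survivors, key=lambda config: evaluate(config, budget))
        survivors = ranked[:max(1, len(survivors) // eta)]
        budget *= eta
    return survivors[0]


# Hypothetical stand-in for training: the loss shrinks as the budget grows and
# depends on how far a made-up learning rate is from 0.1.
def toy_loss(learning_rate, budget):
    return abs(learning_rate - 0.1) + 1.0 / budget


candidates = [0.001, 0.01, 0.1, 0.5, 1.0, 0.05, 0.2, 0.3]
best = successive_halving(candidates, toy_loss)
```

    With eight candidates and eta=2, the loop evaluates eight, then four, then two configurations at budgets 1, 2 and 4, so most of the budget goes to the candidates that looked promising early.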