12 research outputs found

    A framework for increasing the value of predictive data-driven models by enriching problem domain characterization with novel features

    The need to leverage knowledge through data mining has driven enterprises to demand more data. However, there is a gap between the availability of data and the application of extracted knowledge to improve decision support. In fact, more data do not necessarily imply better predictive data-driven marketing models, since the problem domain often requires a deeper characterization. Aiming at such characterization, we propose a framework based on three feature selection strategies, where the goal is to unveil novel features that can effectively increase the value of data by providing a richer characterization of the problem domain. These strategies involve encompassing context (e.g., social and economic variables), evaluating past history, and disaggregating the main problem into smaller but interesting subproblems. The framework is evaluated through an empirical analysis of a real bank telemarketing application. The results show the benefits of the approach: the area under the receiver operating characteristic curve increased with each stage, improving on the previous model in terms of predictive performance. The work of P. Cortez was supported by FCT within the Project Scope UID/CEC/00319/2013. The authors would like to thank the anonymous reviewers for their helpful comments.
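The stage-by-stage comparison described in this abstract relies on the area under the ROC curve (AUC). As a minimal sketch, and not the authors' implementation, the AUC can be computed as the probability that a randomly chosen positive example is scored above a randomly chosen negative one. The function name `roc_auc`, the stage names, and all scores below are hypothetical toy values chosen for illustration; they are not results from the paper.

```python
from itertools import product


def roc_auc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) formulation.

    Counts the fraction of (positive, negative) pairs in which the positive
    example receives the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = client subscribed, 0 = client did not.
labels = [1, 0, 1, 0, 1, 0]

# Hypothetical model scores after each feature-enrichment stage.
stage_scores = {
    "baseline": [0.6, 0.7, 0.4, 0.3, 0.8, 0.5],
    "plus_context": [0.7, 0.6, 0.5, 0.3, 0.8, 0.55],
    "plus_history": [0.8, 0.4, 0.6, 0.2, 0.9, 0.5],
}

# AUC per stage, in the order the stages were added.
stage_auc = {name: roc_auc(labels, s) for name, s in stage_scores.items()}

for name, value in stage_auc.items():
    print(f"{name}: AUC = {value:.3f}")
```

The pairwise form is quadratic in the number of examples, which is fine for a small illustration; for real data sets a rank-based computation or a library routine would be the more practical choice.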

    A data mining approach for predicting academic success – a case study

    The present study puts forward a regression model based on the random forest algorithm, developed to predict, at an early stage, the global academic performance of the undergraduates of a polytechnic higher education institution. The study targets the whole student population of an institution composed of 5 schools, rather than following the usual procedure of restricting the prediction to a single degree course. Hence, we intend to provide the institution with a single tool capable of accommodating the heterogeneity of the student population as well as educational dynamics. A different approach to feature selection is proposed, which makes it possible to exclude entire categories of predictive variables, making the model useful in scenarios where not all of the data categories considered are collected. The introduced model can be used at a central level by the decision-makers who are entitled to design actions to mitigate academic failure. This work was supported by the Portuguese Foundation for Science and Technology (FCT) under Project UID/EEA/04131/2013. The authors would also like to thank the Polytechnic Institute of Bragança for making available the data analysed in this study.
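The category-level feature selection mentioned in this abstract can be illustrated with a small sketch. It assumes that predictive variables are grouped into named categories and that a whole category is either kept or dropped. The category names, column names, and helper functions below are hypothetical and do not come from the study.

```python
def select_columns(feature_groups, excluded):
    """Return the columns that remain after dropping whole categories."""
    unknown = set(excluded) - set(feature_groups)
    if unknown:
        raise KeyError(f"unknown categories: {sorted(unknown)}")
    return [
        col
        for group, cols in feature_groups.items()
        if group not in excluded
        for col in cols
    ]


def project(rows, columns):
    """Build a feature matrix (list of lists) restricted to the given columns."""
    return [[row[c] for c in columns] for row in rows]


# Hypothetical grouping of predictive variables into categories.
feature_groups = {
    "demographic": ["age"],
    "socioeconomic": ["scholarship"],
    "prior_grades": ["entry_grade", "first_semester_avg"],
}

# Hypothetical student records.
students = [
    {"age": 19, "scholarship": 1, "entry_grade": 14.5, "first_semester_avg": 12.0},
    {"age": 23, "scholarship": 0, "entry_grade": 12.0, "first_semester_avg": 10.5},
]

# Scenario: the institution does not collect socioeconomic data.
kept = select_columns(feature_groups, excluded={"socioeconomic"})
X = project(students, kept)
print(kept)
print(X)
```

The resulting matrix `X` could then be passed to any regressor, for example a random forest implementation such as scikit-learn's `RandomForestRegressor`, trained only on the categories that are actually available.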