
    Customer purchase behavior prediction in E-commerce: a conceptual framework and research agenda

    Digital retailers are handling a growing number of online transactions from their consumers, a consequence of the convenience of buying goods via E-commerce platforms. These interactions form complex behavioral patterns that can be analyzed through predictive analytics to help businesses understand consumer needs. Despite this abundance of big data and of tools to analyze it, a systematic review of the literature has been missing. This paper therefore presents a systematic literature review of recent research on customer purchase prediction in the E-commerce context. Its main contributions are a novel analytical framework and a research agenda for the field. The framework identifies three main tasks in the reviewed literature: the prediction of customer intents, of buying sessions, and of purchase decisions. Each task is then examined together with the predictive methodologies it employs, analyzed from three perspectives. Finally, the research agenda outlines the major open issues for further research on online purchase behavior prediction.

    Introducing artificial data generation in active learning for land use/land cover classification

    Fonseca, J., Douzas, G., & Bacao, F. (2021). Increasing the effectiveness of active learning: Introducing artificial data generation in active learning for land use/land cover classification. Remote Sensing, 13(13), 1-20. [2619]. https://doi.org/10.3390/rs13132619

    In remote sensing, Active Learning (AL) has become an important technique for collecting informative ground-truth data “on demand” for supervised classification tasks. Despite its effectiveness, AL still relies heavily on user interaction, which makes it both expensive and time-consuming to implement. Most of the current literature focuses on optimizing AL by modifying the selection criteria and the classifiers used. Although improvements in these areas lead to more effective data collection, the use of artificial data sources to reduce human–computer interaction remains unexplored. In this paper, we introduce a new component to the typical AL framework: the data generator, a source of artificial data that reduces the amount of user-labeled data required in AL. The proposed AL framework is implemented using Geometric SMOTE as the data generator. We compare the new AL framework with the original one, using similar acquisition functions and classifiers, on three AL-specific performance metrics across seven benchmark datasets. We show that this modification of the AL framework significantly reduces the cost and time required for a successful AL implementation in all of the datasets used in the experiment.
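
    To make the structure of the extended loop concrete, here is a minimal, hypothetical Python sketch of pool-based AL with a data-generation step before each training round. It is not the authors' implementation: a 1-nearest-neighbour classifier and a margin-based uncertainty score stand in for the paper's classifiers and acquisition functions, and plain SMOTE-style linear interpolation between same-class labeled points stands in for Geometric SMOTE. All function names (`generate`, `active_learning`, and so on) are our own. Only oracle queries count against the labeling budget; synthetic points are regenerated from the real labeled set each round.

```python
import math
import random
from collections import defaultdict


def class_distances(x, train_x, train_y):
    """Distance from x to the nearest training point of each class."""
    best = {}
    for p, y in zip(train_x, train_y):
        d = math.dist(x, p)
        if d < best.get(y, math.inf):
            best[y] = d
    return best


def predict(x, train_x, train_y):
    """1-nearest-neighbour prediction (classifier chosen only for brevity)."""
    best = class_distances(x, train_x, train_y)
    return min(best, key=best.get)


def margin(x, train_x, train_y):
    """Uncertainty score: a small gap between the two closest classes = uncertain."""
    d = sorted(class_distances(x, train_x, train_y).values())
    return d[1] - d[0] if len(d) > 1 else math.inf


def generate(xs, ys, n_per_class, rng):
    """Hypothetical data generator: plain SMOTE-style interpolation between two
    labeled points of the same class, standing in for Geometric SMOTE."""
    by_class = defaultdict(list)
    for x, y in zip(xs, ys):
        by_class[y].append(x)
    syn_x, syn_y = [], []
    for y, members in by_class.items():
        if len(members) < 2:
            continue  # nothing to interpolate between
        for _ in range(n_per_class):
            a, b = rng.sample(members, 2)
            lam = rng.random()
            syn_x.append([ai + lam * (bi - ai) for ai, bi in zip(a, b)])
            syn_y.append(y)
    return syn_x, syn_y


def active_learning(pool, oracle, seed_idx, budget, batch, n_syn, rng):
    """Pool-based AL loop with an extra data-generation step before training."""
    labeled = list(seed_idx)
    seen = set(labeled)
    unlabeled = [i for i in range(len(pool)) if i not in seen]

    def training_set():
        xs = [pool[i] for i in labeled]
        ys = [oracle(i) for i in labeled]
        sx, sy = generate(xs, ys, n_syn, rng)  # artificial data, no user cost
        return xs + sx, ys + sy

    while len(labeled) < budget and unlabeled:
        tx, ty = training_set()
        # Query the most uncertain unlabeled points first.
        unlabeled.sort(key=lambda i: margin(pool[i], tx, ty))
        take = min(batch, budget - len(labeled))
        labeled += unlabeled[:take]
        unlabeled = unlabeled[take:]
    return labeled, training_set()


# Usage: two Gaussian blobs, with the ground-truth labels acting as the "oracle".
rng = random.Random(0)
pool = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
pool += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(50)]
truth = [0] * 50 + [1] * 50
labeled, (tx, ty) = active_learning(
    pool, truth.__getitem__, seed_idx=[0, 1, 50, 51],
    budget=20, batch=4, n_syn=10, rng=rng)
acc = sum(predict(x, tx, ty) == t for x, t in zip(pool, truth)) / len(pool)
```

    The generator enlarges the training set without any extra oracle calls, which is the cost saving the framework targets. How much it helps depends on the generator and classifier; this toy setup says nothing about the paper's reported results.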

    Geometric SMOTE for imbalanced datasets with nominal and continuous features

    Fonseca, J., & Bacao, F. (2023). Geometric SMOTE for imbalanced datasets with nominal and continuous features. Expert Systems with Applications, 234(December), 1-9. [121053]. https://doi.org/10.1016/j.eswa.2023.121053

    This research was supported by research grants of the Portuguese Foundation for Science and Technology (“Fundação para a Ciência e a Tecnologia”), references SFRH/BD/151473/2021 and DSAIPA/DS/0116/2019, and by project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC).

    Imbalanced learning can be addressed in three ways: resampling, algorithmic modifications, and cost-sensitive solutions. Resampling, and oversampling in particular, is a more general approach than algorithmic and cost-sensitive methods. Since the Synthetic Minority Oversampling TEchnique (SMOTE) was proposed, various SMOTE variants and neural-network-based oversampling methods have been developed. However, the options for oversampling datasets with both nominal and continuous features are limited. We propose Geometric SMOTE for Nominal and Continuous features (G-SMOTENC), which combines G-SMOTE and SMOTENC. Our method modifies SMOTENC's encoding and generation mechanism for nominal features, while using G-SMOTE's data selection mechanism to determine the center observation and its k-nearest neighbors, and G-SMOTE's generation mechanism for continuous features. G-SMOTENC's performance is compared against SMOTENC's, along with two other baselines: a state-of-the-art oversampling method and no oversampling. The experiment was performed over 20 datasets with varying imbalance ratios, numbers of metric and non-metric features, and numbers of target classes. We found a significant improvement in classification performance when using G-SMOTENC as the oversampling method. An open-source implementation of G-SMOTENC is available in the Python programming language.
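
    As a rough illustration of how the two generation mechanisms can be combined for one synthetic row, the following hypothetical Python sketch draws the continuous features uniformly inside a hypersphere centered on a minority observation, with radius equal to the distance to a randomly chosen neighbour. This is the core idea of G-SMOTE, with its truncation and deformation factors omitted. Each nominal feature takes the most frequent value among the k nearest neighbours. That is the original SMOTENC rule, not the modified mechanism proposed in the paper, whose details the abstract does not give. Neighbours are drawn from the minority class only and ranked with a SMOTENC-style distance that penalizes each nominal mismatch by the median of the continuous features' standard deviations. Function names and parameters are our own assumptions.

```python
import math
import random
import statistics
from collections import Counter


def mixed_distance(a, b, cont, nom, med):
    """SMOTENC-style distance: Euclidean over continuous columns, plus med**2
    added to the squared distance for every mismatched nominal column."""
    sq = sum((a[j] - b[j]) ** 2 for j in cont)
    sq += sum(med ** 2 for j in nom if a[j] != b[j])
    return math.sqrt(sq)


def sample_in_ball(center, radius, rng):
    """Uniform random point inside a hypersphere: a random direction
    scaled by radius * u**(1/p), where p is the dimensionality."""
    p = len(center)
    direction = [rng.gauss(0, 1) for _ in range(p)]
    norm = math.sqrt(sum(v * v for v in direction)) or 1.0
    r = radius * rng.random() ** (1 / p)
    return [c + r * v / norm for c, v in zip(center, direction)]


def generate_mixed(minority, cont, nom, k, n_samples, rng):
    """Hypothetical mixed-type generator: continuous columns sampled in a
    hypersphere (G-SMOTE idea, no truncation/deformation), nominal columns
    set by majority vote over the k nearest neighbours (SMOTENC rule)."""
    med = statistics.median([statistics.pstdev([x[j] for x in minority]) for j in cont])
    synthetic = []
    for _ in range(n_samples):
        i = rng.randrange(len(minority))
        center = minority[i]
        others = [x for idx, x in enumerate(minority) if idx != i]
        neighbours = sorted(others, key=lambda x: mixed_distance(center, x, cont, nom, med))[:k]
        surface = rng.choice(neighbours)
        radius = math.dist([center[j] for j in cont], [surface[j] for j in cont])
        new_cont = sample_in_ball([center[j] for j in cont], radius, rng)
        row = list(center)
        for j, v in zip(cont, new_cont):
            row[j] = v
        for j in nom:
            row[j] = Counter(x[j] for x in neighbours).most_common(1)[0][0]
        synthetic.append(row)
    return synthetic


# Usage: five minority rows with two continuous columns and one nominal column.
rng = random.Random(42)
minority = [[1.0, 2.0, "red"], [1.5, 1.8, "red"], [0.8, 2.4, "blue"],
            [1.2, 2.1, "red"], [0.9, 1.7, "blue"]]
new_rows = generate_mixed(minority, cont=[0, 1], nom=[2], k=3, n_samples=4, rng=rng)
```

    Each synthetic row's continuous part lies within the chosen neighbour's distance of its center observation, and its nominal values are always ones already present in the minority class.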