    Hierarchical clustering of products using market-basket data

    The goal of this paper is to present a new method of clustering products based only on market-basket data from a retail store. The presented approach uses a special way of computing the dissimilarity matrix, to which Ward's hierarchical clustering method is then applied. The similarity matrix stems from the co-occurrence of products in the same basket as utility data. Products are considered similar if they have similar co-occurring products and, at the same time, are not often present in the same basket. Hence, the method requires neither identification of the customer nor data from a fixed time frame, which is an advantage over commonly used methods. The method is reasonably fast even on huge datasets of tens of millions of rows. The results are promising and easy to interpret.
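
The abstract does not give the exact dissimilarity formula, so the following is only a minimal sketch of the general idea: count product co-occurrences per basket, then compare two products by the cosine similarity of their co-occurrence profiles. The function names and the choice of cosine similarity are assumptions made for illustration, not the paper's method, and the sketch leaves out the paper's extra condition that similar products are rarely bought together.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cooccurrence_counts(baskets):
    """Count how often each unordered pair of products shares a basket."""
    counts = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        # De-duplicate so a product repeated in one basket counts once.
        for a, b in combinations(sorted(set(basket)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def profile_dissimilarity(counts, a, b):
    """Return 1 - cosine similarity of the co-occurrence profiles of a and b.

    The entries for a and b themselves are ignored, so two products can be
    close without ever appearing in the same basket (hypothetical choice).
    """
    keys = (set(counts[a]) | set(counts[b])) - {a, b}
    dot = sum(counts[a].get(k, 0) * counts[b].get(k, 0) for k in keys)
    norm_a = sqrt(sum(counts[a].get(k, 0) ** 2 for k in keys))
    norm_b = sqrt(sum(counts[b].get(k, 0) ** 2 for k in keys))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # no shared context at all: treat as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


baskets = [
    ["cola", "chips"],
    ["pepsi", "chips"],
    ["cola", "salsa"],
    ["pepsi", "salsa"],
    ["milk", "bread"],
]
counts = cooccurrence_counts(baskets)
print(profile_dissimilarity(counts, "cola", "pepsi"))  # close to 0
print(profile_dissimilarity(counts, "cola", "milk"))
```

Here "cola" and "pepsi" never share a basket but are bought with the same companions, so they come out as near-identical. A full pairwise matrix built this way could then be handed to a hierarchical clustering routine.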

    Implementation of RFM Method and K-Means Algorithm for Customer Segmentation in E-Commerce with Streamlit

    E-commerce is the selling and buying of goods through an online system. One business model, in which consumers sell products to other consumers, is the Customer to Customer (C2C) model. An important consideration in this model is knowing the level of customer loyalty. By knowing the level of customer loyalty, a company can treat different customers differently to maintain good relationships with them and increase revenue from product purchases. In this study, the author segments customers using data from e-commerce companies in Brazil, applying the K-Means clustering algorithm to RFM (Recency, Frequency, Monetary) features and displaying the result as a dashboard built with the Streamlit framework. The research proceeds in several stages: taking data from an open public data site (Kaggle), merging the data to select what is needed, understanding the data by displaying it in graphic form, and conducting data selection to choose features/attributes. The subsequent steps follow the proposed method: performing data preprocessing, creating a model to obtain the clusters, and finally displaying them as a dashboard using Streamlit. Based on the results of the research, the number of clusters is 4, and the model's silhouette score is 0.470.
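
As a rough illustration of the RFM features mentioned above, the sketch below derives recency, frequency and monetary values per customer from a list of orders. The record layout `(customer_id, order_date, amount)`, the function name and the `snapshot` reference date are assumptions for this example; the abstract does not describe the study's actual preprocessing.

```python
from datetime import date


def rfm_features(orders, snapshot):
    """Compute (recency_days, frequency, monetary) per customer.

    orders   -- iterable of (customer_id, order_date, amount) tuples
    snapshot -- reference date from which recency is measured
    """
    last_order, frequency, monetary = {}, {}, {}
    for customer, order_date, amount in orders:
        # Keep the most recent order date seen for this customer.
        last_order[customer] = max(last_order.get(customer, order_date), order_date)
        frequency[customer] = frequency.get(customer, 0) + 1
        monetary[customer] = monetary.get(customer, 0.0) + amount
    return {
        c: ((snapshot - last_order[c]).days, frequency[c], monetary[c])
        for c in last_order
    }


orders = [
    ("a", date(2024, 1, 1), 10.0),
    ("a", date(2024, 1, 10), 5.0),
    ("b", date(2024, 1, 5), 20.0),
]
print(rfm_features(orders, snapshot=date(2024, 1, 11)))
# {'a': (1, 2, 15.0), 'b': (6, 1, 20.0)}
```

Because the three features live on very different scales, they would normally be rescaled before being passed to a distance-based method such as K-Means.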

    An approach based on tunicate swarm algorithm to solve partitional clustering problem

    The tunicate swarm algorithm (TSA) is a newly proposed population-based swarm optimizer for solving global optimization problems. TSA uses the best solution in the population in order to improve the intensification and diversification of the tunicates. Thus, the possibility of finding a better position for the search agents is increased. The aim of clustering algorithms is to distribute the data instances into groups according to the similar and dissimilar features of the instances. Therefore, with a proper clustering algorithm the dataset will be separated into groups, and the similarity between groups is expected to be minimal. In this work, an approach based on TSA is first proposed for solving the partitional clustering problem. Then, TSA is applied to ten different clustering problems taken from the UCI Machine Learning Repository, and its clustering performance is compared with that of three well-known clustering algorithms: fuzzy c-means, k-means and k-medoids. The experimental results and comparisons show that the TSA-based approach is a highly competitive and robust optimizer for solving partitional clustering problems.
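
The abstract does not state which objective the swarm optimizes. A common choice when a population-based optimizer is applied to partitional clustering is to encode all centroids in one flat position vector and score it by the total distance from each point to its nearest centroid. The sketch below shows only that assumed fitness function, not TSA itself; the name `clustering_fitness` and the encoding are hypothetical.

```python
from math import dist


def clustering_fitness(position, data, k):
    """Sum of Euclidean distances from each point to its nearest centroid.

    position -- flat list of k * d coordinates encoding k centroids
                (assumed encoding, not taken from the paper)
    data     -- list of d-dimensional points
    k        -- number of clusters
    """
    d = len(data[0])
    if len(position) != k * d:
        raise ValueError("position must contain k * d coordinates")
    # Slice the flat search-agent position into k centroids of dimension d.
    centroids = [position[i * d:(i + 1) * d] for i in range(k)]
    return sum(min(dist(point, c) for c in centroids) for point in data)


data = [(0, 0), (0, 2), (10, 0), (10, 2)]
good = clustering_fitness([0, 1, 10, 1], data, k=2)  # one centroid per group
bad = clustering_fitness([5, 1, 5, 1], data, k=2)    # both centroids in the middle
print(good, bad)
```

Lower values mean tighter clusters, so an optimizer would search for the position vector that minimizes this score.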

    Classification of retail products: From probabilistic ranking to neural networks

    Food retailing is now on an accelerated path to a successful penetration of the digital market through new ways of value creation at all stages of the consumer decision process. One of the most important imperatives on this path is the availability of quality data to feed the whole digital transformation process. But data quality is far from guaranteed given the variety of products and suppliers in the grocery market. Within this context of digital transformation of the grocery industry, Midiadia is a Spanish data provider company that works on converting data from the retailers' products into knowledge with attributes and insights from the product labels, that is, maintaining quality data in a dynamic market with a high dispersion of products. Currently, they manually categorize products (groceries) according to the information extracted directly (text processing) from the product labelling and packaging. This paper introduces a solution to automatically categorize the constantly changing product catalogue into a 3-level food taxonomy. Our proposal studies three different approaches: a score-based ranking method, traditional machine learning algorithms, and deep neural networks. Thus, we provide four different classifiers that support a more efficient and less error-prone maintenance of groceries catalogues, the main asset of the company. Finally, we have compared the performance of these three alternatives, concluding that traditional machine learning algorithms perform better, closely followed by the score-based approach. Comment: 17 pages, 8 figures, journa

    A Comparative Study on Statistical and Machine Learning Forecasting Methods for an FMCG Company

    Demand forecasting has been an area of study among scholars and businessmen ever since the start of the industrial revolution and has gained further focus in recent years with the advancements in AI. Accurate forecasts are no longer a luxury, but a necessity for effective decisions in production planning and marketing. Many aspects of the business depend on demand, and this is particularly true for the Fast-Moving Consumer Goods industry, where high volume and demand volatility pose a challenge for planners trying to generate accurate forecasts as consumer demand complexity rises. Inaccurate demand forecasts lead to multiple issues, such as high holding costs on excess inventory, shortages of certain SKUs in the market leading to lost sales, and a significant impact on both the top line and the bottom line of the business. Researchers have compared the performance of statistical time series models with machine learning methods to evaluate their robustness, computational time and power. In this paper, a comparative study was conducted using statistical and machine learning techniques to generate an accurate forecast from the shipment data of an FMCG company. The Naïve method was used as a benchmark to evaluate the performance of other forecasting techniques, and was compared to exponential smoothing, ARIMA, KNN, Facebook Prophet and LSTM using the past 3 years of shipments. The methodology followed was CRISP-DM, from data exploration, pre-processing and transformation through to applying the different forecasting algorithms and evaluation. Moreover, secondary goals of this paper include understanding associations between SKUs through market basket analysis, and clustering using KNN based on brand, customer, order quantity and value to propose a product segmentation strategy. The results of both clustering and forecasting models are then evaluated to choose the optimal forecasting technique, and a visual representation of the forecast and the exploratory analysis conducted is displayed using R.
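
To make the benchmarking idea concrete, the sketch below scores a naive one-step-ahead forecast against simple exponential smoothing using mean absolute error. The series, the smoothing factor `alpha` and the choice of MAE are assumptions for illustration; the abstract does not say which error metric or parameters the study used.

```python
def naive_forecast(series):
    """One-step-ahead naive forecasts: each value is predicted by the previous one."""
    return list(series[:-1])


def ses_forecast(series, alpha):
    """One-step-ahead simple exponential smoothing forecasts for series[1:]."""
    level = series[0]  # initialise the level with the first observation
    forecasts = []
    for actual in series[1:]:
        forecasts.append(level)
        # Blend the newest observation into the running level.
        level = alpha * actual + (1 - alpha) * level
    return forecasts


def mae(actuals, forecasts):
    """Mean absolute error between aligned actuals and forecasts."""
    return sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)


shipments = [10, 12, 11, 13]  # made-up data
actuals = shipments[1:]
print("naive MAE:", mae(actuals, naive_forecast(shipments)))
print("SES MAE:  ", mae(actuals, ses_forecast(shipments, alpha=0.5)))
```

A candidate model is only worth keeping if it beats the naive score on held-out data; with `alpha=1.0` the smoothing model reduces to the naive forecast.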

    Review on recent advances in information mining from big consumer opinion data for product design

    In this paper, based on more than ten years of study in this dedicated research thrust, a comprehensive review of information mining from big consumer opinion data to assist product design is presented. First, the research background and the essential terminology regarding online consumer opinion data are introduced. Next, studies concerning information extraction from and information utilization of big consumer opinion data for product design are reviewed. Studies on information extraction are explained from various perspectives, including data acquisition, opinion target recognition, feature identification and sentiment analysis, opinion summarization and sampling, etc. Studies on information utilization for product design are reviewed in terms of how to extract critical customer needs from big consumer opinion data, how to connect the voice of the customers with product design, how to make effective comparisons and reasonable rankings of similar products, how to identify ever-evolving customer concerns efficiently, and so on. Furthermore, significant and practical aspects of research trends are highlighted for future studies. This survey will help researchers and practitioners understand the latest developments in relevant studies and applications centered on how big consumer opinion data can be processed, analyzed, and exploited in aiding product design.