
    A Comparative Study on Statistical and Machine Learning Forecasting Methods for an FMCG Company

    Demand forecasting has been studied by scholars and practitioners since the start of the industrial revolution, and it has gained renewed focus in recent years with advances in AI. Accurate forecasts are no longer a luxury but a necessity for effective production and marketing planning decisions. Many aspects of a business depend on demand, and this is particularly true for the Fast-Moving Consumer Goods (FMCG) industry, where high volume and demand volatility pose a challenge for planners as consumer demand complexity rises. Inaccurate demand forecasts lead to multiple issues: high holding costs on excess inventory, shortages of certain SKUs in the market leading to lost sales, and a significant impact on both the top line and the bottom line of the business. Researchers have compared the performance of statistical time series models with machine learning methods to evaluate their robustness, computational time, and power. In this paper, a comparative study was conducted using statistical and machine learning techniques to generate accurate forecasts from the shipment data of an FMCG company. The naïve method was used as a benchmark and compared with exponential smoothing, ARIMA, KNN, Facebook Prophet, and LSTM using the past three years of shipments. The methodology followed was CRISP-DM, covering data exploration, pre-processing, and transformation before applying the different forecasting algorithms and evaluating them. Secondary goals of this paper include understanding associations between SKUs through market basket analysis, and clustering using KNN based on brand, customer, order quantity, and value to propose a product segmentation strategy. The results of both the clustering and forecasting models are then evaluated to choose the optimal forecasting technique, and a visual representation of the forecast and the exploratory analysis is produced in R.
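
    As a rough illustration of the benchmarking step described above, the sketch below compares a naïve forecast against Holt-Winters exponential smoothing on a synthetic monthly series, scored with MAPE on a hold-out window. The series, horizon, and model settings are placeholders, not the paper's actual shipment data or chosen configuration.

```python
# Minimal sketch: naive benchmark vs. Holt-Winters exponential smoothing,
# evaluated with MAPE on a hold-out split. The series and settings are
# hypothetical, not the study's actual FMCG shipment data.
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Hypothetical 3 years of monthly shipments for one SKU.
rng = np.random.default_rng(0)
idx = pd.date_range("2020-01-01", periods=36, freq="MS")
y = pd.Series(1000 + 50 * np.sin(np.arange(36) / 6) + rng.normal(0, 30, 36),
              index=idx)

train, test = y[:-6], y[-6:]

def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((actual - forecast) / actual)) * 100)

# Naive benchmark: carry the last observed value across the horizon.
naive_fc = pd.Series(train.iloc[-1], index=test.index)

# Holt-Winters with additive trend and seasonality.
hw = ExponentialSmoothing(train, trend="add", seasonal="add",
                          seasonal_periods=12).fit()
hw_fc = hw.forecast(6)

print(f"Naive MAPE:        {mape(test, naive_fc):.1f}%")
print(f"Holt-Winters MAPE: {mape(test, hw_fc):.1f}%")
```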

    DIGITAL WINE: HOW PLATFORMS AND ALGORITHMS WILL RESHAPE THE WINE INDUSTRY

    The thesis aims to analyze how digitalization and data-driven approaches, in particular those that leverage artificial intelligence, are impacting the wine industry and generating new business models. The latter aspect will be explored through two case studies of digital platforms which, through different approaches, are helping to generate a virtuous digital ecosystem, with potential benefits for the entire value chain at the industry level.

    Psychographic And Behavioral Segmentation Of Food Delivery Application Customers To Increase Intention To Use

    Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. This study presents a framework for segmenting Food Delivery Application (FDA) customers based on psychographic and behavioral variables as an alternative to existing segmentations. Customer segments are proposed by applying clustering methods to primary data from an electronic survey. Psychographic and behavioral constructs are formulated as hypotheses based on the existing literature and then evaluated as segmentation variables with regard to their discriminatory power. The relevant variables detected are then used in clustering techniques to find adequate boundaries between customer groupings. The customer segments are characterized, and the characterization is enriched with the implications of the findings for FDA marketing strategies. This paper contributes to theory by providing new findings on segmentation that are relevant to an online context. In addition, it contributes to practice by detailing the implications of the customer segments for an online sales strategy, allowing marketing managers and FDA businesses to capitalize on this knowledge in their conversion funnel designs.
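
    The abstract does not name the specific clustering algorithm, so the sketch below assumes k-means with a silhouette criterion for choosing the number of segments; the survey constructs are placeholder variables, not the study's actual psychographic and behavioral measures.

```python
# Minimal sketch of the segmentation step: standardize hypothetical survey
# constructs, run k-means over a range of k, and keep the k with the best
# silhouette score. Construct names and data are placeholders.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
# Hypothetical respondents x constructs (e.g., convenience orientation,
# price sensitivity, ordering frequency, average order value).
X = rng.normal(size=(300, 4))
X_std = StandardScaler().fit_transform(X)

best_k, best_score, best_labels = None, -1.0, None
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_std)
    score = silhouette_score(X_std, labels)
    if score > best_score:
        best_k, best_score, best_labels = k, score, labels

print(f"Chosen k = {best_k}, silhouette = {best_score:.3f}")
```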

    Essays on Structural Econometric Modeling and Machine Learning

    This dissertation is composed of three independent chapters relating theory and empirical methodology in economics to machine learning and important topics in the information age. The first chapter raises an important problem in structural estimation and provides a solution by incorporating a practice from machine learning. The second chapter investigates a problem of statistical discrimination in the big data era. The third chapter studies the implications of information uncertainty in the security software market.

    Structural estimation is a widely used methodology in empirical economics, and a large class of structural econometric models are estimated through the generalized method of moments (GMM). Traditionally, the model to be estimated is chosen by researchers based on their intuition, and the structural estimation itself does not directly test that choice against the data. In other words, insufficient attention has been paid to devising a principled method for verifying such intuition. In the first chapter, we propose a model selection procedure for GMM based on cross-validation, which is widely used in the machine learning and statistics communities, and we prove its consistency. The empirical properties of the proposed model selection are compared with those of existing model selection methods through Monte Carlo simulations of a linear instrumental variable regression and an oligopoly pricing model. In addition, we propose a way to apply our method within the Mathematical Programming with Equilibrium Constraints (MPEC) approach. Finally, we apply our method to online-retail sales data to compare a dynamic model with a static model.

    In the second chapter, we study a fair machine learning algorithm that avoids statistical discrimination when making decisions. Algorithmic decision-making now affects many aspects of our lives. Standard tools for machine learning, such as classification and regression, are subject to bias in the data, and thus the direct application of such off-the-shelf tools can lead to a specific group being statistically discriminated against. Removing sensitive variables such as race or gender from the data does not solve this problem, because a disparate impact can arise when non-sensitive variables are correlated with sensitive ones. This problem becomes more severe as bigger data is utilized, so it is of particular importance to devise an algorithmic solution. Inspired by the two-stage least squares method that is widely used in economics, we propose a two-stage algorithm that removes bias from the training data. The proposed algorithm is conceptually simple. Unlike most existing fair algorithms, which are designed for classification tasks, the proposed method is able to (i) deal with regression tasks, (ii) combine explanatory variables to remove reverse discrimination, and (iii) deal with numerical sensitive variables. The performance and fairness of the proposed algorithm are evaluated in simulations with synthetic and real-world datasets.

    The third chapter examines the issue of information uncertainty in the context of information security. Many users lack the ability to correctly estimate the true quality of the security software they purchase, as evidenced by anecdotes and even some academic research, yet most analytical research assumes otherwise. Hence, we incorporate this “false sense of security” behavior into a game-theoretic model and study its implications for welfare parameters. Our model features two segments of consumers, well-informed and ill-informed, and a monopolistic software vendor. Well-informed consumers observe the true quality of the security software, while ill-informed ones overestimate it. While the proportions of the two segments are known to the software vendor, consumers are uncertain about which segment they belong to. We find that, in fact, the level of uncertainty is not necessarily harmful to society. Furthermore, there exist some extreme circumstances in which society and consumers would be better off if the security software did not exist. Interestingly, we also find that the case where consumers know the information structure and weigh their expectations accordingly does not always lead to optimal social welfare. These results contrast with the conventional wisdom and are crucially important for developing appropriate policies in this context.
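
    The second chapter's two-stage idea can be sketched by analogy with two-stage least squares: residualize each non-sensitive feature on the sensitive variable, then fit the outcome model on the residuals. The sketch below is one plausible reading of the abstract on synthetic data, not the dissertation's exact algorithm.

```python
# Sketch of a 2SLS-inspired debiasing pipeline: stage one removes the part
# of each feature explained by the sensitive variable; stage two fits the
# outcome model on the residualized features, so its predictions carry no
# linear dependence on the sensitive variable. Synthetic data; one plausible
# reading of the chapter, not its exact algorithm.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 1000
s = rng.binomial(1, 0.5, n).astype(float)        # sensitive variable
x = 0.8 * s[:, None] + rng.normal(size=(n, 2))   # features correlated with s
y = x @ np.array([1.0, -0.5]) + 0.5 * s + rng.normal(size=n)

# Stage 1: regress each feature on s and keep the residuals.
stage1 = LinearRegression().fit(s.reshape(-1, 1), x)
x_resid = x - stage1.predict(s.reshape(-1, 1))

# Stage 2: fit the outcome model on the residualized features.
model = LinearRegression().fit(x_resid, y)
pred = model.predict(x_resid)

# Predictions should now be (nearly) uncorrelated with s.
print("corr(pred, s) =", round(float(np.corrcoef(pred, s)[0, 1]), 3))
```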

    Essays on the Influence of Review and Reviewer Attributes on Online Review Helpfulness: Attribution Theory Perspective

    With the emergence of digital technology and the increasing availability of information on the internet, customers rely heavily on online reviews to inform their purchasing decisions. However, not all online reviews are helpful, and the factors that contribute to their helpfulness are complex and multifaceted. This dissertation addresses this gap in the literature by examining the antecedents of online review helpfulness through the lens of attribution theory. The dissertation consists of three essays. The first essay examines the impact of authenticity (a review attribute) on review helpfulness, showing that the expressive authenticity of a review enhances its helpfulness. The second essay investigates the relationships among reviewer attributes, i.e., motivation, activity, and goals, in online reviews, employing various machine learning techniques to investigate the influence of these factors on reviewers' goal attainment. The third essay explores how reviewer attributes relate to the helpfulness of online reviews. The dissertation offers significant theoretical and practical implications. Theoretically, it provides new insights into novel review and reviewer attributes. The study proposes a taxonomy of online reviews using means-ends fusion theory, offering a framework for understanding the relationships between different components of online reviewer attributes and their contribution to the attainment of specific goals, such as emotional satisfaction. The study also highlights the importance of understanding the motivations and activities of online reviewers in predicting emotional satisfaction, and the conditional effects of complaining behavior on emotional satisfaction. The findings inform decision-making by review platform owners, business owners, reviewers, and prospective consumers. For review platform owners, the findings help separate helpful reviews from the vast number of reviews by determining review authenticity. For business owners, the findings can aid in understanding consumer behavior and taking the actions needed to provide better service to customers. For reviewers, this dissertation can act as a guideline for writing helpful reviews and determining their helpfulness. Finally, for consumers and review readers, this dissertation provides an understanding of helpful reviews, allowing them to make better product or service purchase decisions.
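
    The essays do not publish their models, but the general setup of predicting helpfulness from review text can be sketched as below, using TF-IDF features and logistic regression on a toy labeled corpus; the data, labels, and feature choice are illustrative assumptions only.

```python
# Minimal sketch: predict review helpfulness from text using TF-IDF features
# and logistic regression. The corpus and labels are toy placeholders; the
# dissertation's actual models also draw on reviewer attributes and
# authenticity measures beyond raw text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reviews = [
    "Detailed comparison with photos; battery life measured over a week.",
    "Great!!!",
    "Explains sizing, fabric, and how it held up after ten washes.",
    "Bad product do not buy",
]
helpful = [1, 0, 1, 0]  # hypothetical helpfulness labels

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reviews, helpful)

print(clf.predict(["Thorough review covering setup, noise level, and cost."]))
```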

    Streaming Infrastructure and Natural Language Modeling with Application to Streaming Big Data

    Streaming data are produced at great velocity and in diverse variety. The vision of this research is to build an end-to-end system that handles the collection, curation, and analysis of streaming data; the streaming data used in this thesis contain both numeric and text data. First, for data collection, we design and evaluate a data delivery framework that handles the real-time nature of streaming data, using streaming data from the automotive domain since it is suitable for testing and evaluating our delivery system. Second, for data curation, we use a language model to analyze two online automotive forums as an example of streaming text-data curation. Last but not least, we present our approach to automated query expansion on Twitter data as an example of streaming social media data analysis. This thesis thus provides a holistic view of the end-to-end system we have designed, built, and analyzed.

    In the automotive domain, a complex and massive amount of data is collected from the on-board sensors of operational connected vehicles (CVs), infrastructure data sources such as roadway sensors and traffic signals, mobile data sources such as cell phones, social media sources such as Twitter, and news and weather data services. Unfortunately, these data create a bottleneck at data centers for the processing and retrieval of collected data, and they require the deployment of additional message transfer infrastructure between data producers and consumers to support diverse CV applications. In the first part of this dissertation, we present a strategy for creating an efficient and low-latency distributed message delivery system for CV systems. This strategy enables large-scale ingestion, curation, and transformation of unstructured data (roadway traffic-related and non-traffic-related data) into labeled and customized topics for a large number of subscribers or consumers, such as CVs, mobile devices, and data centers. We evaluate the performance of this strategy by developing a prototype infrastructure using Apache Kafka, an open-source message delivery system, and compare its performance with the latency requirements of CV applications. We present experimental results for the message delivery infrastructure on two different distributed computing testbeds at Clemson University, measuring the latency of the message delivery system under a variety of testing scenarios. These experiments reveal that the measured latencies are below the U.S. Department of Transportation's recommended latency requirements for CV applications, which provides evidence that the system is capable of managing CV-related data distribution tasks.
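
    The latency measurement idea can be sketched with the kafka-python client: embed a send timestamp in each message and compute end-to-end latency at the consumer. The topic name and broker address are placeholders, a running Kafka broker is assumed, and this is not the testbed configuration used in the experiments.

```python
# Minimal sketch: measure end-to-end delivery latency by embedding a send
# timestamp in each message. Topic name and broker address are placeholders;
# a running Kafka broker is assumed.
import json
import time
from kafka import KafkaConsumer, KafkaProducer

TOPIC = "cv-roadway-events"  # hypothetical topic name

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"sensor_id": 42, "sent_at": time.time()})
producer.flush()

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for msg in consumer:
    latency_ms = (time.time() - msg.value["sent_at"]) * 1000
    print(f"end-to-end latency: {latency_ms:.1f} ms")
    break
```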
    Human-generated streaming data are large in volume and noisy in content, and direct acquisition of their full scope is often ineffective, so we look for an alternative resource for studying such data. Common Crawl is a massive, multi-petabyte dataset hosted by Amazon containing archived HTML web page data from 2008 to date, and it has been widely used for text mining. Using data extracted from Common Crawl has several advantages over a direct crawl of web data, among them removing the likelihood of a user's home IP address being blacklisted for accessing a given web site too frequently. However, Common Crawl is a data sample, so questions arise about its quality as a representative sample of the original data. We perform systematic tests of the similarity between topics estimated from Common Crawl and topics estimated from the full data of online forums. Our target is online discussions from a user forum for car enthusiasts, but our research strategy can be applied to other domains and samples to evaluate the representativeness of topic models. We show that topic proportions estimated from Common Crawl are not significantly different from those estimated on the full data. We also show that the topics are similar in terms of their word compositions, and no worse than the topic similarity obtained under true random sampling, which we simulate through a series of experiments. This research will be of interest to analysts who wish to use Common Crawl to study topics in user forum data, and to analysts applying topic models to other data samples.

    Twitter data is another example of high-velocity streaming data, and we use it to study query expansion in streaming social media analysis. Query expansion is the problem of gathering more relevant documents covering a certain topic from a given set. In this thesis we outline a number of tools for a query expansion system that allow its user to gather more relevant documents (in this case, tweets from Twitter) while discriminating against irrelevant documents. These tools include a method for triggering query expansion using a Jaccard similarity threshold between keywords, and a query expansion method that uses archived news reports to create a vector space of novel keywords. By the nature of streaming data, the Twitter stream contains emerging events that change constantly and therefore cannot be captured by static queries, since the keywords used in a static query often mismatch the words used around emerging events. To solve this problem, our approach to automated query expansion first detects the emerging events and then combines local and global analysis methods to generate queries that capture the emerging topics. Experimental results show that by combining global and local analysis, our approach can capture the semantic information in emerging events with high efficiency.
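
    The Jaccard-threshold trigger mentioned above is concrete enough to sketch: compare the keywords of the current query with keywords extracted from newly collected tweets, and trigger expansion when the overlap drops below a threshold. The threshold value and keyword sets below are assumptions for illustration.

```python
# Minimal sketch of the Jaccard-threshold trigger for query expansion:
# low overlap between the static query's keywords and keywords from newly
# collected tweets suggests topic drift, so expansion is triggered. The
# 0.3 threshold is an assumed value, not the thesis's tuned setting.
def jaccard(a: set, b: set) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def should_expand(query_keywords: set, emerging_keywords: set,
                  threshold: float = 0.3) -> bool:
    return jaccard(query_keywords, emerging_keywords) < threshold

query = {"connected", "vehicle", "traffic"}
emerging = {"road", "closure", "accident", "detour"}
print(should_expand(query, emerging))  # True: overlap is 0.0
```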