215 research outputs found

    Predictive User Modeling with Actionable Attributes

    Different machine learning techniques have been proposed and used for modeling individual and group user needs, interests, and preferences. In traditional predictive modeling, instances are described by observable variables called attributes, and the goal is to learn a model for predicting the target variable for unseen instances. For example, for marketing purposes a company may profile a new user based on her observed web browsing behavior, referral keywords, or other relevant information. In many real-world applications, the values of some attributes are not only observable but can be actively set by a decision maker. Furthermore, in some such applications the decision maker is interested not only in generating accurate predictions but in maximizing the probability of the desired outcome. For example, a direct marketing manager can choose which type of special offer to send to a client (an actionable attribute), hoping that the right choice will result in a positive response with higher probability. We study how to learn to choose the value of an actionable attribute in order to maximize the probability of a desired outcome in predictive modeling. We emphasize that not all instances are equally sensitive to changes in actions. Accurate choice of an action is critical for instances on the borderline (e.g. users who do not have a strong opinion one way or the other). We formulate three supervised learning approaches for learning to select the value of an actionable attribute at the instance level. We also introduce a focused training procedure that puts more emphasis on the situations where varying the action is most likely to take effect. A proof-of-concept experimental validation on two real-world case studies in the web analytics and e-learning domains highlights the potential of the proposed approaches.
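    The action-selection step described above can be sketched as follows; the scoring function, feature names, and action set here are hypothetical stand-ins, not the authors' models.

```python
# Hypothetical sketch: choosing the value of an actionable attribute
# (e.g. offer type) that maximizes the predicted probability of a
# positive outcome. `predict_proba` stands in for any trained model.

def predict_proba(features, action):
    # Toy stand-in model: each action shifts a base score differently.
    base = 0.3 if features["engaged"] else 0.1
    uplift = {"discount": 0.25, "newsletter": 0.05, "none": 0.0}
    return min(1.0, base + uplift[action])

def best_action(features, actions):
    """Return the action with the highest predicted success probability."""
    return max(actions, key=lambda a: predict_proba(features, a))

client = {"engaged": True}
print(best_action(client, ["discount", "newsletter", "none"]))  # discount
```

    In the paper's terms, borderline instances are exactly those where the probabilities across actions are close, so the choice of action matters most.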

    Leveraging Advanced Analytics for Backorder Prediction and Optimization of Business Operations in the Supply Chain

    Businesses can unlock valuable insights by leveraging advanced analytics techniques to optimize supply chain processes and address backorders. Backorders occur when a customer order cannot be fulfilled immediately due to a lack of available supply. Root causes of backorders range from supply chain complications and manufacturing miscalculations to logistical challenges. While a surge in demand might initially seem beneficial, backorders come with inherent costs, leading to supply chain disruptions, dissatisfied customers, and lost sales. This research aimed to assess the efficacy of predictive analytics in detecting early signs of backorders and to understand how parameter tuning influences the performance of these predictive models. The foundation of this study was laid through an exhaustive literature review. In-depth Exploratory Data Analysis (EDA) was used to investigate the datasets, followed by rigorous preprocessing steps, including data cleaning, feature engineering, scaling, and resampling. Machine learning models were subsequently trained, tuned, and assessed using appropriate evaluation metrics. Findings from this research underscored the value of predictive analytics in early backorder identification. They also offered a comparative analysis of machine learning algorithms and highlighted the significance of parameter tuning. Additionally, they established the necessity of multi-metric evaluations for imbalanced datasets. Thus, the study provides a fundamental framework that can serve as a basis for future research endeavors.
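    A minimal illustration of why the study stresses multi-metric evaluation on imbalanced data; the confusion-matrix counts below are invented, not the study's results.

```python
# Sketch of why multi-metric evaluation matters for imbalanced backorder
# data: a classifier that never predicts "backorder" scores high accuracy
# but zero recall. Counts are illustrative only.

def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# An "always negative" model on a 1%-positive dataset catches no backorders.
naive = metrics(tp=0, fp=0, fn=10, tn=990)
print(naive["accuracy"])  # 0.99 despite missing every backorder
print(naive["recall"])    # 0.0
```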

    Predicting Account Receivables Outcomes with Machine-Learning

    Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence. The Accounts Receivable (AR) of a company are considered an important determinant of its Cash Flow, the backbone of a company's financial performance or health. It has been shown that by efficiently managing the money owed by customers for goods and services (AR), a company can avoid financial difficulties and even stabilize results in moments of extreme volatility. The aim of this project is to use machine learning and data visualization techniques to predict invoice outcomes and provide useful information and an analytics-based solution to the collection management team. Specifically, this project demonstrates how supervised learning models can classify with high accuracy whether a newly created invoice will be paid earlier than, on time, or later than the contracted due date. We also study how to predict the magnitude of delayed payments by classifying them into delay categories of interest to the business: up to 1 month late, 1 to 3 months late, and more than 3 months late. The developed models use real-life data from a multinational company in the manufacturing and automation industries and can predict payments with higher accuracy than the baseline achieved by the business.
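    The delay categories above can be sketched as a simple bucketing function; the 30- and 90-day thresholds are assumptions approximating one and three months, not the project's exact definitions.

```python
from datetime import date

# Hypothetical bucketing of invoice outcomes into the delay categories
# described above: early, on-time, and three lateness buckets.

def payment_category(due: date, paid: date) -> str:
    delay = (paid - due).days
    if delay < 0:
        return "early"
    if delay == 0:
        return "on-time"
    if delay <= 30:
        return "late <= 1 month"
    if delay <= 90:
        return "late 1-3 months"
    return "late > 3 months"

# Invoice due Jan 31, paid Mar 15: 44 days late.
print(payment_category(date(2024, 1, 31), date(2024, 3, 15)))
```

    A classifier would then be trained to predict these labels for newly created invoices before the payment date is known.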

    APPLYING MACHINE LEARNING MODELS TO DIAGNOSE FAILURES IN ELECTRICAL SUBMERSIBLE PUMPS

    Electrical Submersible Pump (ESP) failures are unanticipated but common occurrences in oil and gas wells. It is necessary to detect the onset of failures early and prevent excessive downtime. This study proposes a novel approach utilizing multi-class classification machine learning models to predict various ESP specific failure modes (SFMs). A comprehensive dataset and various machine learning algorithms are utilized. Prediction periods ranging from 3 hours to 7 days before failure are evaluated to minimize false alarms while capturing true events. The ML models are based on field data gathered from surface and downhole ESP monitoring equipment over five years of production from 10 wells. The dataset includes the failure cause, duration of downtime, the corresponding high-frequency pump data, and well production data. According to these data, most ESP operational failures are characterized as electrical failures. Four modeling designs are used to handle the data and transform them into actionable information for predicting various ESP failure modes at different prediction periods. Several ML models are tested and evaluated using precision, recall, and F1-score performance measures. The K-Nearest Neighbor (KNN) model outperforms the other algorithms in forecasting ESP failures; other tested models include Random Forest (RF), Decision Tree (DT), and Multilayer Perceptron (MLP) Neural Network. The findings of these ML models reveal that as the prediction period extends beyond three days, it becomes more challenging to predict the true failures. Furthermore, all tested designs show similarly good performance in predicting ESP specific failures. The design that integrates the impacts of gas presence and pump efficiency while minimizing the number of input variables is suggested for general use. Based on the field data, a Weibull model is built to estimate the probability of failure.
The mean time between failure (MTBF) values are utilized as inputs to the Weibull analysis. The Weibull shape and scale parameters are estimated using Median Rank Regression. The Weibull probability plots are then generated with high R2 values (86.5-99.4%) and a low p-value for all wells. The results show increases in pump unreliability with time for all the wells. By integrating the outcomes of the ESP failure prediction ML model with the Weibull unreliability model, a powerful tool is provided. This tool allows engineers to detect failures early, diagnose potential causes, and propose preventive actions. It is crucial in aiding operators in transitioning from reactive to proactive and predictive maintenance of artificial lift operations.
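    The Median Rank Regression step can be sketched as follows, using Benard's median-rank approximation and a least-squares fit of the linearized Weibull CDF; the failure times below are invented, not the field data.

```python
import math

# Sketch of Median Rank Regression for a two-parameter Weibull fit.
# Linearized CDF: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta).

def weibull_mrr(times):
    """Estimate Weibull shape (beta) and scale (eta) by least squares
    on the linearized CDF, using Benard's median-rank approximation
    F_i ~ (i - 0.3) / (n + 0.4)."""
    t = sorted(times)
    n = len(t)
    xs = [math.log(ti) for ti in t]
    ys = [math.log(-math.log(1 - (i - 0.3) / (n + 0.4)))
          for i in range(1, n + 1)]
    mx, my = sum(xs) / n, sum(ys) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
    eta = math.exp(mx - my / beta)
    return beta, eta

beta, eta = weibull_mrr([120, 190, 250, 300, 420, 510])  # made-up MTBFs
unreliability = 1 - math.exp(-(200 / eta) ** beta)  # P(failure by t = 200)
```

    Unreliability grows monotonically with time for any positive shape parameter, which is the behavior the study reports across all wells.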

    Large Language Models for Forecasting and Anomaly Detection: A Systematic Literature Review

    This systematic literature review comprehensively examines the application of Large Language Models (LLMs) in forecasting and anomaly detection, highlighting the current state of research, inherent challenges, and prospective future directions. LLMs have demonstrated significant potential in parsing and analyzing extensive datasets to identify patterns, predict future events, and detect anomalous behavior across various domains. However, this review identifies several critical challenges that impede their broader adoption and effectiveness, including the reliance on vast historical datasets, issues with generalizability across different contexts, the phenomenon of model hallucinations, limitations within the models' knowledge boundaries, and the substantial computational resources required. Through detailed analysis, this review discusses potential solutions and strategies to overcome these obstacles, such as integrating multimodal data, advancements in learning methodologies, and emphasizing model explainability and computational efficiency. Moreover, this review outlines critical trends that are likely to shape the evolution of LLMs in these fields, including the push toward real-time processing, the importance of sustainable modeling practices, and the value of interdisciplinary collaboration. In conclusion, this review underscores the transformative impact LLMs could have on forecasting and anomaly detection while emphasizing the need for continuous innovation, ethical considerations, and practical solutions to realize their full potential.

    Fairness-aware Machine Learning in Educational Data Mining

    Fairness is an essential requirement of every educational system, and it is reflected in a variety of educational activities. With the extensive use of Artificial Intelligence (AI) and Machine Learning (ML) techniques in education, researchers and educators can analyze educational (big) data and propose new (technical) methods to support teachers, students, or administrators of (online) learning systems in the organization of teaching and learning. Educational data mining (EDM) is the result of applying and developing data mining (DM) and ML techniques to deal with educational problems, such as student performance prediction and student grouping. However, ML-based decisions in education can depend on protected attributes, such as race or gender, leading to discrimination against individual students or subgroups of students. Therefore, ensuring fairness in ML models also contributes to equity in educational systems. On the other hand, bias can also appear in the data obtained from learning environments. Hence, bias-aware exploratory educational data analysis is important to support unbiased decision-making in EDM. In this thesis, we address the aforementioned issues and propose methods that mitigate discriminatory outcomes of ML algorithms in EDM tasks. Specifically, we make the following contributions: We perform bias-aware exploratory analysis of educational datasets using Bayesian networks to identify the relationships among attributes in order to understand bias in the datasets. We focus the exploratory data analysis on features having a direct or indirect relationship with the protected attributes w.r.t. prediction outcomes. We perform a comprehensive evaluation of the sufficiency of various group fairness measures in predictive models for student performance prediction problems.
A variety of experiments on various educational datasets with different fairness measures are performed to provide users with a broad view of unfairness from diverse aspects. We deal with the student grouping problem in collaborative learning. We introduce the fair-capacitated clustering problem, which takes into account cluster fairness and cluster cardinalities, and propose two approaches, hierarchical clustering and partitioning-based clustering, to obtain fair-capacitated clusterings. We introduce the multi-fair capacitated (MFC) students-topics grouping problem, which satisfies students' preferences while ensuring balanced group cardinalities and maximizing the diversity of members with regard to the protected attribute. We propose three approaches: a greedy heuristic approach, a knapsack-based approach using a vanilla maximal 0-1 knapsack formulation, and an MFC knapsack approach based on a group fairness knapsack formulation. In short, the findings described in this thesis demonstrate the importance of fairness-aware ML in educational settings. We show that bias-aware data analysis, fairness measures, and fairness-aware ML models are essential aspects for ensuring fairness in EDM and the educational environment.
Ministry of Science and Culture of Lower Saxony/LernMINT/51410078/E
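    One widely used group fairness measure of the kind evaluated in the thesis, statistical parity difference, can be sketched as follows; the attribute values and predictions are illustrative, not the thesis's data.

```python
# Sketch of statistical parity difference: the gap between the positive-
# prediction rate for a protected group and the rate for everyone else.
# Records are (protected_attribute_value, predicted_positive) pairs.

def statistical_parity_difference(records, protected_value):
    group = [y for a, y in records if a == protected_value]
    rest = [y for a, y in records if a != protected_value]
    rate = lambda ys: sum(ys) / len(ys)
    return rate(group) - rate(rest)

preds = [("female", 1), ("female", 0), ("female", 1),
         ("male", 1), ("male", 1), ("male", 1), ("male", 0)]
spd = statistical_parity_difference(preds, "female")
# spd near 0 suggests parity; a negative value means the protected
# group receives positive predictions less often.
```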

    Predicting Factors of Re-Hospitalization After Medically Managed Intensive Inpatient Services in Opioid Use Disorder

    Introduction: Opioid use disorder has continued to rise in prevalence across the United States, with an estimated 2.5 million Americans ailing from the condition (NIDA, 2020). Medically managed detoxification incurs substantial costs and, when used independently, may not be effective in preventing relapse (Kosten & Baxter, 2019). While numerous studies have focused on predicting the factors of developing opioid use disorder, few have identified predictors of readmission to medically managed withdrawal at an inpatient level of care. Utilizing a high-fidelity dataset from a large multi-site behavioral health hospital, these predictors are explored. Methods: Patients diagnosed with opioid use disorder and hospitalized at the inpatient level of care were analyzed to identify readmission predictors. Factors including patient demographics, patient-reported outcome measures, and post-discharge treatment interventions were included. Patients re-hospitalized at the inpatient level of care were binary-labeled in the dataset, and various machine learning algorithms were tested, including random forest, gradient boosting, and deep learning techniques. Evaluation statistics included specificity, accuracy, precision, and the Matthews Correlation Coefficient. Results: Overall, there was wide variation in correctly predicting the class of patients that would readmit to a medically managed level of inpatient detoxification. Of the six models evaluated, three did not converge and thus did not produce a viable feature ranking. Of the three models that did converge, the deep learning model produced almost perfect classification, with an accuracy of 0.98. AdaBoost and the logistic regression model produced accuracies of 0.97 and 0.61, respectively. Each of these models produced a similar set of features that were important in predicting which patient profile would readmit to medically managed inpatient detoxification.
Conclusions: The results indicate that overall reduction in the Quick Inventory of Depressive Symptomatology, discharge disposition, age, length of stay, and a patient's total number of diagnoses were important features in predicting readmission. Additionally, deep learning algorithms vastly outperformed the other machine learning algorithms.
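    The Matthews Correlation Coefficient named among the evaluation statistics can be computed directly from confusion-matrix counts; the counts below are illustrative, not the study's results.

```python
import math

# Sketch of the Matthews correlation coefficient (MCC). Unlike raw
# accuracy, it stays informative when the readmitted class is rare.

def mcc(tp, fp, fn, tn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return ((tp * tn - fp * fn) / denom) if denom else 0.0

print(mcc(tp=45, fp=5, fn=5, tn=45))   # balanced, strong classifier: 0.8
print(mcc(tp=0, fp=0, fn=10, tn=90))   # "always negative" model: 0.0
```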

    Analyzing Granger causality in climate data with time series classification methods

    Attribution studies in climate science aim to scientifically ascertain the influence of natural or anthropogenic factors on climatic variations. Many of these studies adopt the concept of Granger causality to infer statistical cause-effect relationships, utilizing traditional autoregressive models. In this article, we investigate the potential of state-of-the-art time series classification techniques to enhance causal inference in climate science. We conduct a comparative experimental study of different types of algorithms on a large test suite comprising a unique collection of datasets from the area of climate-vegetation dynamics. The results indicate that specialized time series classification methods are able to improve existing inference procedures. Substantial differences are observed among the methods that were tested.
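    The Granger-causality baseline the article builds on can be sketched with one-lag autoregressive models: x "Granger-causes" y if adding lagged x reduces the residual error of a model of y. The closed-form least squares and synthetic series below are simplifications for illustration, not the article's procedure.

```python
# Compare the sum of squared errors (SSE) of a restricted model
# y_t = a * y_{t-1} against an unrestricted model
# y_t = a * y_{t-1} + b * x_{t-1}.

def sse_restricted(y):
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    a = num / den
    return sum((y[t] - a * y[t - 1]) ** 2 for t in range(1, len(y)))

def sse_unrestricted(y, x):
    # Solve the 2x2 normal equations for (a, b) in closed form.
    syy = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    sxx = sum(x[t - 1] ** 2 for t in range(1, len(y)))
    sxy = sum(y[t - 1] * x[t - 1] for t in range(1, len(y)))
    cy = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    cx = sum(y[t] * x[t - 1] for t in range(1, len(y)))
    det = syy * sxx - sxy ** 2
    a = (cy * sxx - cx * sxy) / det
    b = (cx * syy - cy * sxy) / det
    return sum((y[t] - a * y[t - 1] - b * x[t - 1]) ** 2
               for t in range(1, len(y)))

# y is built to follow lagged x, so the unrestricted model fits better.
x = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5, 5.0]
y = [0.0] + [0.5 * xi for xi in x[:-1]]
print(sse_unrestricted(y, x) <= sse_restricted(y))  # True
```

    A full Granger test would add an F-statistic over the two SSEs and multiple lags; the article's contribution is replacing this autoregressive machinery with time series classifiers.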

    Next best action – a data-driven marketing approach

    Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. The Next Best Action (NBA) is a framework built to assign to each client three (or more) actions considered the best to perform with that client. These actions can range from product offers to proactive retention actions and upselling recommendations. It can be a useful tool for generating leads for ongoing campaigns, but also an excellent tool for analysis and a driver for the creation of new campaigns, making it a key element of Customer Relationship Management (CRM) as a data-driven marketing approach. Initially planned as a joint collaboration between a Bank and an Insurance Company to improve the Bancassurance business model, three versions of the NBA were built, with the first two being tested in a campaign setting and showing promising results. The last version, NBA 3.0, later became a sole project of the Insurance Company due to GDPR compliance policies and, owing to time constraints, could not be evaluated.
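    The per-client assignment of three best actions can be sketched as a ranking over model scores; the action names and scores below are hypothetical, not the project's models.

```python
# Hypothetical sketch of the NBA idea: rank candidate actions for a
# client by a predicted-value score and keep the top k.

def next_best_actions(scores, k=3):
    """scores: {action: predicted value of performing it for this client}."""
    return [a for a, _ in sorted(scores.items(),
                                 key=lambda kv: kv[1], reverse=True)[:k]]

client_scores = {"upsell_home": 0.62, "retention_call": 0.55,
                 "cross_sell_auto": 0.48, "newsletter": 0.20}
print(next_best_actions(client_scores))
```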