
    Predictive Modelling of Retail Banking Transactions for Credit Scoring, Cross-Selling and Payment Pattern Discovery

    Evaluating transactional payment behaviour offers a competitive advantage in the modern payment ecosystem, not only for confirming the presence of good credit applicants or unlocking the cross-selling potential between the respective product and service portfolios of financial institutions, but also for ruling out bad credit applicants precisely in transactional payment streams. In a diagnostic test for analysing payment behaviour, I have used a hybrid approach combining supervised and unsupervised learning algorithms to discover behavioural patterns. Supervised learning algorithms can compute a range of credit scores and cross-sell candidates, although the applied methods only discover limited behavioural patterns across the payment streams. Moreover, the performance of the applied supervised learning algorithms varies across the different data models and their optimisation is inversely related to the pre-processed dataset. Subsequently, the research experiments conducted suggest that the Two-Class Decision Forest is an effective algorithm to determine both the cross-sell candidates and the creditworthiness of customers. In addition, a deep-learning model using a neural network has been considered, with a meaningful interpretation of future payment behaviour through categorised payment transactions, in particular by providing additional deep insights through graph-based visualisations. However, the research shows that unsupervised learning algorithms play a central role in evaluating the transactional payment behaviour of customers: discovering associations using market basket analysis based on previous payment transactions, finding the frequent transaction categories, and developing interesting rules when each transaction category is performed on the same payment stream.
The research also reveals that transactional payment behaviour analysis is multifaceted in the financial industry for assessing the diagnostic ability of promotion candidates and classifying bad credit applicants from among the entire customer base. The developed predictive models can also be commonly used to estimate the credit risk of any credit applicant based on his/her transactional payment behaviour profile, combined with deep insights from the categorised payment transactions analysis. The research study provides a full review of the performance characteristic results from the different developed data models. Thus, the demonstrated data science approach is a possible proof of how machine learning models can be turned into cost-sensitive data models.
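As a rough illustration of the market basket analysis mentioned in this abstract, the sketch below computes support and confidence for pairwise association rules over categorised payment transactions. The function name `pair_rules`, the category names, the sample streams and the thresholds are all assumptions made for illustration; none of them come from the thesis.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5, min_confidence=0.6):
    """Return sorted (antecedent, consequent, support, confidence) tuples for item pairs."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one stream
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of streams containing both categories
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical categorised payment streams, one list per customer.
streams = [
    ["groceries", "fuel", "insurance"],
    ["groceries", "fuel"],
    ["groceries", "rent"],
    ["fuel", "insurance"],
]
rules = pair_rules(streams)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On this toy data, every stream that contains "insurance" also contains "fuel", so the rule insurance -> fuel reaches a confidence of 1.0.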

    Machine Learning-Driven Decision Making based on Financial Time Series

    The abstract is in the attachment.

    Augmented Cross-Selling Through Explainable AI—A Case From Energy Retailing

    The advance of Machine Learning (ML) has led to a strong interest in this technology to support decision making. While complex ML models provide predictions that are often more accurate than those of traditional tools, such models often hide the reasoning behind the prediction from their users, which can lead to lower adoption and lack of insight. Motivated by this tension, research has put forth Explainable Artificial Intelligence (XAI) techniques that uncover patterns discovered by ML. Despite the high hopes for both ML and XAI, there is little empirical evidence of the benefits to traditional businesses. To this end, we analyze data on 220,185 customers of an energy retailer, predict cross-purchases with up to 86% correctness (AUC), and show that the XAI method SHAP provides explanations that hold for actual buyers. We further outline implications for research in information systems, XAI, and relationship marketing.

    Click-through rate prediction : a comparative study of ensemble techniques in real-time bidding

    Dissertation presented as a partial requirement for obtaining the Master’s degree in Information Management, with a specialization in Business Intelligence and Knowledge Management. Real-Time Bidding is an automated mechanism to buy and sell ads in real time that uses data collected from internet users to accurately deliver the right audience to the best-matched advertisers. It goes beyond contextual advertising by motivating bidding that focuses on user data, and it differs from the sponsored search auction, where the bid price is associated with keywords. There is extensive literature regarding the classification and prediction of performance metrics such as click-through rate, impression rate and bidding price. However, there is limited research on the application of advanced machine learning techniques, such as ensemble methods, to predicting the click-through rate of real-time bidding campaigns. This paper presents an in-depth analysis of predicting click-through rate in real-time bidding campaigns by comparing the classification results from six traditional classification models (Linear Discriminant Analysis, Logistic Regression, Regularized Regression, Decision Trees, k-Nearest Neighbors and Support Vector Machines) with two popular ensemble learning techniques (Voting and Bootstrap Aggregation). The goal of our research is to determine whether ensemble methods can accurately predict click-through rate and how they compare to standard classifiers. Results showed that ensemble techniques outperformed the simple classifiers. The results also highlight the excellent performance of linear algorithms (Linear Discriminant Analysis and Regularized Regression).
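To make the Voting technique named in this abstract concrete, here is a minimal sketch of hard (majority) voting over the outputs of several base classifiers. The three prediction lists are invented stand-ins for trained models, and `majority_vote` is a hypothetical helper name; the dissertation's actual models and data are not reproduced here, and Bootstrap Aggregation is not shown.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by hard voting."""
    combined = []
    # zip(*predictions) yields, for each sample, the labels from all models.
    for labels in zip(*predictions):
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


# Hypothetical click (1) / no-click (0) predictions from three base models.
model_a = [1, 0, 0, 1, 0]
model_b = [1, 1, 0, 0, 0]
model_c = [0, 1, 0, 1, 0]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

With an odd number of binary voters there are no ties, so each ensemble label is simply the one that at least two of the three models agree on.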

    A survey of outlier detection methodologies

    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, and so purify the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of Computer Science and Statistics. In this paper, we present a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.

    Machine Learning and Alternative Data Analytics for Fashion Finance

    This dissertation investigates the application of Machine Learning, Natural Language Processing and computational finance to a novel area, Fashion Finance: specifically, identifying investment opportunities within the Apparel industry using influential alternative data sources such as Instagram. Fashion investment is challenging due to the ephemeral nature of the industry and the difficulty faced by investors who lack an understanding of how to analyze trend-driven consumer brands. Unstructured online data (e-commerce stores, social media, online blogs, news, etc.) introduce new opportunities for extracting investment signals. We focus on how trading signals can be generated from Instagram data and from events reported in news articles. Part of this research work was done in collaboration with Arabesque Asset Management. Farfetch, the online luxury retailer, and Living Bridge Private Equity provided industry advice.
    Research Datasets
    The datasets used for this research are collected from various sources and include the following types of data:
    - Financial data: daily stock prices of 50 U.S. and European Apparel and Footwear equities, daily U.S. Retail Trade and U.S. Consumer Non-Durables sector indices, Form 10-K reports.
    - Instagram data: daily Instagram profile followers for 11 fashion companies.
    - News data: 0.5 million news articles that mention the selected 50 equities.
    Research Experiments
    The thesis consists of the following studies:
    1. Relationship between Instagram Popularity and Stock Prices. This study investigates a link between the changes in a company's popularity (daily follower counts) on Instagram and its stock price and revenue movements. We use cross-correlation analysis to find whether the signals derived from the followers' data could help to infer a company's future financial performance. Two hypothetical trading strategies are designed to test whether the changes in a company's Instagram popularity could improve the returns. To test the hypotheses, the Wilcoxon signed-rank test is used.
    2. Dynamic Density-based News Clustering. The aim of this study is twofold: 1) analyse the characteristics of relevant news event articles and how they differ from noisy/irrelevant news; 2) using the insights, design an unsupervised framework that clusters news articles and identifies event clusters without predefined parameters or expert knowledge. The framework incorporates the density-based clustering algorithm DBSCAN, where the clustering parameters are selected dynamically with a Gaussian Mixture Model and by maximizing the inter-cluster Information Entropy.
    3. ALGA: Automatic Logic Gate Annotator for Event Detection. We design a news classification model for detecting fashion events that are likely to impact a company's stock price. The articles are represented by the following text embeddings: TF-IDF, Doc2Vec and BERT (Transformer Neural Network). The study comprises two parts: 1) we design a domain-specific automatic news labelling framework, ALGA. The framework incorporates topic extraction (Latent Dirichlet Allocation) and clustering (DBSCAN) algorithms in addition to other filters to annotate the dataset; 2) using the labelled dataset, we train a Logistic Regression classifier for identifying financially relevant news. The model shows state-of-the-art results in the domain-specific financial event detection problem.
    Contribution to Science
    This research work presents the following contributions to science:
    - Introducing original work in Machine Learning and Natural Language Processing application for analysing alternative data on ephemeral fashion assets.
    - Introducing new metrics to measure and track a fashion brand's popularity for investment decision making.
    - Designing the dynamic news events clustering framework that finds event clusters of various sizes in news articles without predefined parameters.
    - Presenting the original Automatic Logic Gate Annotator framework (ALGA) for automatic labelling of news articles for the financial event detection task.
    - Designing the Apparel and Footwear news events classifier using the datasets generated by the ALGA framework and showing state-of-the-art performance in a domain-specific financial event detection task.
    - Building the Fashion Finance Dictionary, which contains 320 phrases related to various financially relevant events in the Apparel and Footwear industry.
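The cross-correlation analysis described in the first study can be sketched as follows: correlate a follower-change series with a price-change series shifted by a range of lags, and look for the lag with the strongest correlation. The two series below are synthetic (the price series is built to echo the follower series two days later), and the helper names are assumptions for illustration only; this is not the dissertation's data or code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def lagged_correlations(signal, target, max_lag):
    """Correlate signal[t] with target[t + lag] for lag = 0..max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        x = signal[: len(signal) - lag]  # drop the last `lag` signal points
        y = target[lag:]                 # drop the first `lag` target points
        result[lag] = pearson(x, y)
    return result


# Synthetic daily changes: the price series repeats the follower series
# with a two-day delay (an assumption made purely for this example).
followers_change = [1, 3, 2, 5, 4, 6, 3, 7]
price_change = [0, 0] + followers_change[:-2]

corr = lagged_correlations(followers_change, price_change, max_lag=3)
best_lag = max(corr, key=corr.get)
print(best_lag, round(corr[best_lag], 3))
```

Because the synthetic price series is an exact delayed copy, the correlation peaks at a lag of two days; real follower and price data would give a much noisier picture.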

    A novel approach for cross-selling insurance products using positive unlabelled learning

    Successful cross-selling of products is a key goal of companies operating within the insurance industry. Choosing the right customer to approach for cross-purchase opportunities has a direct effect on both decreasing customer churn rate and increasing revenue. Unlike sales data of general products, insurance sales data typically contains only a few products (e.g., private medical insurance, life insurance, etc.), is highly imbalanced, with the vast majority of customers having no cross-purchasing information, is highly noisy due to varying purchase behaviour between customers, and has no ground truth for knowing whether the majority of customers are truly non-cross-sell customers or missed opportunities. These data challenges render the building of machine learning models for accurately identifying potential cross-sell customers extremely difficult. This paper proposes a novel approach to solve this challenging problem of cross-sell customer identification using Positive Unlabelled (PU) learning in conjunction with advanced feature engineering on customer demographic data and unstructured customer question-response texts through topic modelling. We implement a bagging approach to iteratively learn the positive samples (the confirmed cross-sells) alongside random sub-samples of the unlabelled set. The proposed approach is extensively evaluated on real insurance data that has been newly collected from a leading insurance company for this study. Evaluation results demonstrate that our approach can successfully identify new potential opportunities for likely cross-sell customers.
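The bagging idea in this abstract can be sketched in a few lines: in each round, a random sub-sample of the unlabelled set is treated as provisional negatives, a classifier is fitted against the known positives, and the unlabelled points left out of the sub-sample are scored. The abstract does not name the base learner, so a nearest-centroid rule stands in for it here; the feature vectors, the function names and the number of rounds are likewise assumptions for illustration.

```python
import random


def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabelled, rounds=50, seed=0):
    """Fraction of out-of-bag rounds in which each unlabelled point looks positive."""
    rng = random.Random(seed)
    votes = [0] * len(unlabelled)
    seen = [0] * len(unlabelled)
    bag_size = min(len(positives), len(unlabelled) - 1)
    pos_c = centroid(positives)
    for _ in range(rounds):
        # Treat a random sub-sample of the unlabelled set as provisional negatives.
        bag = set(rng.sample(range(len(unlabelled)), bag_size))
        neg_c = centroid([unlabelled[i] for i in bag])
        for i, point in enumerate(unlabelled):
            if i in bag:
                continue  # score only points that were not used for fitting
            seen[i] += 1
            if sq_dist(point, pos_c) < sq_dist(point, neg_c):
                votes[i] += 1
    return [v / s if s else 0.0 for v, s in zip(votes, seen)]


# Hypothetical two-feature customers: confirmed cross-sells cluster near (5, 5).
positives = [(5.0, 5.2), (4.8, 5.1), (5.3, 4.9)]
# The unlabelled set hides one likely cross-sell customer at index 4.
unlabelled = [(0.1, 0.2), (0.3, -0.1), (-0.2, 0.0), (0.0, 0.4), (5.1, 5.0), (0.2, 0.1)]

scores = pu_bagging_scores(positives, unlabelled)
print([round(s, 2) for s in scores])
```

Averaging over many random sub-samples is what makes the scores robust to hidden positives that occasionally land in the provisional negative set; on this toy data the hidden customer at index 4 is the only one with a high score.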