752 research outputs found

    Adaptive algorithms for real-world transactional data mining.

    The accurate identification of the right customer to target with the right product at the right time, through the right channel, to satisfy the customer's evolving needs, is a key performance driver and enhancer for businesses. Data mining is an analytic process designed to explore usually large amounts of data (typically business or market related) in search of consistent patterns and/or systematic relationships between variables for the purpose of generating explanatory/predictive data models from the detected patterns. It provides an effective and established mechanism for accurate identification and classification of customers. Data models derived from the data mining process can aid in effectively recognizing the status and preference of customers, individually and as a group. Such data models can be incorporated into the business market segmentation, customer targeting and channelling decisions with the goal of maximizing the total customer lifetime profit. However, due to costs, privacy and/or data protection reasons, the customer data available for data mining is often restricted to verified and validated data (in most cases, only the business-owned transactional data is available). Transactional data is a valuable resource for generating such data models. Transactional data can be electronically collected and readily made available for data mining in large quantity at minimum extra cost. Transactional data is, however, inherently sparse and skewed. These inherent characteristics give rise to the poor performance of data models built using customer data based on transactional data. Data models for identifying, describing, and classifying customers, constructed using evolving transactional data, thus need to effectively handle the inherent sparseness and skewness of evolving transactional data in order to be efficient and accurate.
Using real-world transactional data, this thesis presents the findings and results from the investigation of data mining algorithms for analysing, describing, identifying and classifying customers with evolving needs. In particular, methods for handling the issues of scalability, uncertainty and adaptation whilst mining evolving transactional data are analysed and presented. A novel application of a new framework for integrating transactional data binning and classification techniques is presented alongside an effective prototype selection algorithm for efficient transactional data model building. A new change mining architecture for monitoring, detecting and visualizing the change in customer behaviour using transactional data is proposed and discussed as an effective means for analysing and understanding the change in customer buying behaviour over time. Finally, the challenging problem of discerning between the change in the customer profile (which may necessitate the effective change of the customer's label) and the change in performance of the model(s) (which may necessitate changing or adapting the model(s)) is introduced and discussed by way of a novel flexible and efficient architecture for classifier model adaptation and customer profiles class relabeling.

    Predictive Modelling of Retail Banking Transactions for Credit Scoring, Cross-Selling and Payment Pattern Discovery

    Evaluating transactional payment behaviour offers a competitive advantage in the modern payment ecosystem, not only for confirming the presence of good credit applicants or unlocking the cross-selling potential between the respective product and service portfolios of financial institutions, but also for ruling out bad credit applicants precisely in transactional payment streams. In a diagnostic test for analysing the payment behaviour, I have used a hybrid approach comprising a combination of supervised and unsupervised learning algorithms to discover behavioural patterns. Supervised learning algorithms can compute a range of credit scores and cross-sell candidates, although the applied methods only discover limited behavioural patterns across the payment streams. Moreover, the performance of the applied supervised learning algorithms varies across the different data models and their optimisation is inversely related to the pre-processed dataset. Subsequently, the research experiments conducted suggest that the Two-Class Decision Forest is an effective algorithm to determine both the cross-sell candidates and the creditworthiness of customers. In addition, a deep-learning model using a neural network has been considered with a meaningful interpretation of future payment behaviour through categorised payment transactions, in particular by providing additional deep insights through graph-based visualisations. However, the research shows that unsupervised learning algorithms play a central role in evaluating the transactional payment behaviour of customers to discover associations using market basket analysis based on previous payment transactions, finding the frequent transaction categories, and developing interesting rules when each transaction category is performed on the same payment stream.
Current research also reveals that transactional payment behaviour analysis is multifaceted in the financial industry, both for assessing the diagnostic ability of promotion candidates and for classifying bad credit applicants from among the entire customer base. The developed predictive models can also be commonly used to estimate the credit risk of any credit applicant based on his/her transactional payment behaviour profile, combined with deep insights from the categorised payment transactions analysis. The research study provides a full review of the performance characteristic results from the different developed data models. Thus, the demonstrated data science approach is a possible proof of how machine learning models can be turned into cost-sensitive data models.
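
The market basket analysis mentioned in this abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function name `pairwise_rules`, the thresholds and the payment categories are hypothetical, and it only considers rules between pairs of categories, using the standard definitions of support and confidence.

```python
from collections import Counter
from itertools import combinations


def pairwise_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for rules between category pairs.

    Each transaction is an iterable of payment categories seen together
    (hypothetical data; the abstract does not specify the representation).
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicates inside one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of baskets containing both categories
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical payment streams, one list of categories per customer period.
streams = [
    ["groceries", "fuel"],
    ["groceries", "fuel", "rent"],
    ["groceries", "rent"],
    ["fuel"],
]
for lhs, rhs, support, confidence in pairwise_rules(streams):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

A full analysis would normally use a frequent-itemset algorithm such as Apriori to handle itemsets larger than pairs; the pairwise version is kept here only to make the two measures concrete.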

    Cross channel fraud detection framework in financial services using recurrent neural networks

    The reliability and performance of real-time fraud detection techniques have been a major concern for financial institutions, as traditional fraud detection models could not cope with the emerging new and innovative attacks that deceive banks. The problems are further exacerbated by evolving customer behaviour, as existing fraud detection models are unable to cope with the class imbalance problem and a longer feedback loop. This thesis takes a holistic view of fraud detection and proposes a conceptual fraud detection framework that can detect anomalous transactions quickly and accurately, as well as dynamically evolve to maintain efficiency with minimum input from subject matter experts. The framework is used to analyse Internet Banking (IB) transactions and contextual information to reduce false positives and improve fraud detection rates. Based on the proposed framework, a Long Short-Term Memory (LSTM) based Recurrent Neural Network model for detecting fraud in remote banking is implemented, and its performance is evaluated against Support Vector Machine (SVM) and Markov models. The main research element is to model events as state vectors so that sequence-based learning can be applied, followed by a weak classifier to deal with noise. Firstly, the study focuses on feature engineering where, alongside raw attributes such as IP address and amount, two novel features for remote banking fraud are evaluated, i.e., the time spent on a page and the time between page transitions. The second focus is on modelling, which is performed on an anonymised real-life dataset provided by a large financial institution in Europe. The results of the modelling demonstrate that, given the labelled dataset, all models can detect payment fraud with acceptable accuracy. Various tests showed that the LSTM model achieves an F1 score of 97.7%, whereas the SVM and Markov models achieve 93.5% and 95.0% respectively.
Over time, the LSTM model's performance improved significantly as the sequence of events became longer. As the dataset grows, the time it takes to train traditional models becomes a bottleneck. This supports the hypothesis that events across banking channels can be modelled as time series data, and that sequence-based learners such as Recurrent Neural Networks (RNN) can then be applied to reduce the False Positive Rate (FPR) and False Negative Rate (FNR).
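
The F1 score, FPR and FNR quoted in this abstract follow from a confusion matrix. The sketch below uses the standard definitions; the function name `fraud_metrics` and the counts in the usage example are hypothetical and are not taken from the thesis.

```python
def fraud_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion counts.

    tp: fraud flagged as fraud        fp: genuine flagged as fraud
    tn: genuine passed as genuine     fn: fraud passed as genuine
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fpr = fp / (fp + tn) if fp + tn else 0.0  # genuine payments wrongly blocked
    fnr = fn / (fn + tp) if fn + tp else 0.0  # fraud that slips through
    return {"precision": precision, "recall": recall,
            "f1": f1, "fpr": fpr, "fnr": fnr}


# Hypothetical counts for an imbalanced test set of 1,000 transactions.
metrics = fraud_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because fraud data is heavily imbalanced, accuracy alone can look high even when most fraud is missed, which is one reason F1, FPR and FNR are the more informative figures to compare across models.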

    Automation of Smart Grid operations through spatio-temporal data-driven systems

    An association rule dynamics and classification approach to event detection and tracking in Twitter.

    Twitter is a microblogging application used for sending and retrieving instant on-line messages of not more than 140 characters. There has been a surge in Twitter activities since its launch in 2006, as well as a steady increase in event detection research on Twitter data (tweets) in recent years. With 284 million monthly active users, Twitter has continued to grow both in size and activity. The network is rapidly changing the way global audiences source information and is influencing the process of journalism [Newman, 2009]. Twitter is now perceived as an information network in addition to being a social network. This explains why traditional news media follow activities on Twitter to enhance their news reports and news updates. Knowing the significance of the network as an information dissemination platform, news media subscribe to Twitter accounts where they post their news headlines and include the link to their on-line news where the full story may be found. Twitter users, in some cases, post breaking news on the network before such news is published by traditional news media. This can be ascribed to Twitter subscribers' nearness to the location of events. The use of Twitter as a network for information dissemination as well as for opinion expression by different entities is now common. This has also brought with it the computational challenge of extracting newsworthy content from noisy Twitter data. Considering the enormous volume of data Twitter generates, users append the hashtag (#) symbol as a prefix to keywords in tweets. Hashtag labels describe the content of tweets. The use of hashtags also makes it easy to search for and read tweets of interest. The volume of Twitter streaming data makes it imperative to derive Topic Detection and Tracking methods to extract newsworthy topics from tweets.
Since hashtags describe and enhance the readability of tweets, this research is developed to show how the appropriate use of hashtag keywords in tweets can demonstrate temporal evolvements of related topics in real life and consequently enhance Topic Detection and Tracking on the Twitter network. We chose to apply our method to the Twitter network because of the restricted number of characters per message and because it is a network that allows data to be shared publicly. More importantly, our choice was based on the fact that hashtags are an inherent component of Twitter. To this end, the aim of this research is to develop, implement and validate a new approach that extracts newsworthy topics from tweets' hashtags of real-life topics over a specified period using Association Rule Mining. We termed our novel methodology Transaction-based Rule Change Mining (TRCM). TRCM is a system built on top of the Apriori method of Association Rule Mining to extract patterns of Association Rule changes in tweets' hashtag keywords at different periods of time and to map the extracted keywords to a related real-life topic or scenario. To the best of our knowledge, the adoption of dynamics of Association Rules of hashtag co-occurrences has not been explored as a Topic Detection and Tracking method on Twitter. The application of Apriori to hashtags present in tweets at two consecutive periods t and t + 1 produces two association rulesets, which represent rule evolvement in the context of this research. A change in rules is discovered by matching every rule in the ruleset at time t with those in the ruleset at time t + 1. The changes are grouped under four identified rules, namely 'New' rules, 'Unexpected Consequent' and 'Unexpected Conditional' rules, 'Emerging' rules and 'Dead' rules. The four rules represent different levels of real-life topic evolvement.
For example, the emerging rule represents a very important occurrence such as breaking news, while unexpected rules represent an unexpected twist of events in an on-going topic. The new rule represents dissimilarity in rules in the rulesets at time t and t + 1. Finally, the dead rule represents a topic that is no longer present on the Twitter network. TRCM reveals the dynamics of Association Rules present in tweets and demonstrates the linkage between the different types of rule dynamics and targeted real-life topics/events. In this research, we conducted experimental studies on tweets from different domains, such as sports and politics, to test the performance effectiveness of our method. We validated our method, TRCM, with carefully chosen ground truth. The outcomes of our research experiments include: identification of four rule dynamics in tweets' hashtags, namely New rules, Emerging rules, Unexpected rules and 'Dead' rules, using Association Rule Mining (these rules signify how news and events evolved in real-life scenarios); identification of rule evolvements on the Twitter network using Rule Trend Analysis and Rule Trace; detection and tracking of topic evolvements on Twitter using Transaction-based Rule Change Mining (TRCM); and identification of how the peculiar features of each TRCM rule affect its performance effectiveness on real datasets.
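
The rule matching between the rulesets at t and t + 1 can be sketched as follows. This is a simplified illustration, not the TRCM implementation: the abstract does not give the matching criteria, so the use of Jaccard similarity, the 0.5 threshold, the priority order among labels, and all names (`classify_rule_changes`, `rule`, the hashtags) are assumptions.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# A rule is (antecedent hashtags, consequent hashtags).
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def rule(lhs: Iterable[str], rhs: Iterable[str]) -> Rule:
    """Build a rule from two collections of hashtags."""
    return frozenset(lhs), frozenset(rhs)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity (assumed measure; not stated in the abstract)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def classify_rule_changes(rules_t: List[Rule], rules_t1: List[Rule],
                          threshold: float = 0.5) -> Dict[str, List[Rule]]:
    """Label each rule at t + 1 against the ruleset at t; collect dead rules."""
    priority = ["emerging", "unexpected_consequent", "unexpected_conditional"]
    changes: Dict[str, List[Rule]] = {k: [] for k in priority + ["new", "dead"]}
    matched_old = set()
    for new_rule in rules_t1:
        labels = set()
        for i, old_rule in enumerate(rules_t):
            lhs_similar = jaccard(old_rule[0], new_rule[0]) >= threshold
            rhs_similar = jaccard(old_rule[1], new_rule[1]) >= threshold
            if lhs_similar and rhs_similar:
                labels.add("emerging")                # both sides persist
            elif lhs_similar:
                labels.add("unexpected_consequent")   # same condition, new outcome
            elif rhs_similar:
                labels.add("unexpected_conditional")  # same outcome, new condition
            else:
                continue
            matched_old.add(i)
        label = next((p for p in priority if p in labels), "new")
        changes[label].append(new_rule)
    # Rules at t with no similar rule at t + 1 are treated as dead.
    changes["dead"] = [r for i, r in enumerate(rules_t) if i not in matched_old]
    return changes


# Hypothetical hashtag rules for two consecutive periods.
rules_t = [rule({"#teamA"}, {"#final"}), rule({"#budget"}, {"#tax"})]
rules_t1 = [rule({"#teamA"}, {"#final"}), rule({"#teamA"}, {"#injury"}),
            rule({"#storm"}, {"#flood"})]
for label, found in classify_rule_changes(rules_t, rules_t1).items():
    print(label, [(sorted(l), sorted(r)) for l, r in found])
```

In practice the two rulesets would come from running an Apriori implementation on the hashtags of each period; they are written out by hand here to keep the sketch self-contained.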

    Low-Coders, No-Coders, and Citizen Developers in Demand: Examining Knowledge, Skills, and Abilities Through a Job Market Analysis

    The emergence of low-code/no-code (LCNC) platform technologies and the resulting increase in citizen development programs are facilitating the democratization of the design, development, and deployment of digital solutions. Citizen developers, non-technical employees who leverage LCNC platforms, are at the heart of this trend. While many firms perceive LCNC and citizen development as a crucial component of their digital transformation strategy, little is known about the evolving roles in this field or the necessary knowledge, skills, and abilities (KSA). To address this knowledge gap, we processed 113,106 job postings published on Indeed.com. Our topic modeling methodology identified 34 KSA topics and classified them into three domains: platform, business, and technology. We contribute to research by empirically demonstrating which competencies are required to successfully work in the LCNC field. Our findings can guide individual professionals and organizations alike.

    Information visualisation and data analysis using web mash-up systems

    A thesis submitted in partial fulfilment for the degree of Doctor of Philosophy. The arrival of E-commerce systems has contributed greatly to the economy and has played a vital role in collecting a huge amount of transactional data. It is becoming more difficult day by day to analyse business and consumer behaviour with the production of such a colossal volume of data. Enterprise 2.0 has the ability to store and create an enormous amount of transactional data; the purpose for which data was collected could quite easily be disassociated as the essential information goes unnoticed in large and complex data sets. The information overflow is a major contributor to the dilemma. In the current environment, where hardware systems have the ability to store such large volumes of data and software systems have the capability of substantial data production, data exploration problems are on the rise. The problem is not with the production or storage of data but with the effectiveness of the systems and techniques by which essential information could be retrieved from complex data sets in a comprehensive and logical approach as the data questions are asked. Using the existing information retrieval systems and visualisation tools, the more specific the questions asked, the more definitive and unambiguous the visualised results that could be attained, but when it comes to complex and large data sets there are no elementary or simple questions. Therefore a profound information visualisation model and system is required to analyse complex data sets through data analysis and information visualisation, to make it possible for decision makers to identify the expected and discover the unexpected. In order to address complex data problems, a comprehensive and robust visualisation model and system is introduced.
The visualisation model consists of four major layers: (i) acquisition and data analysis, (ii) data representation, (iii) user and computer interaction and (iv) results repositories. There are major contributions in all four layers but particularly in data acquisition and data representation. Multiple attribute and dimensional data visualisation techniques are identified in Enterprise 2.0 and Web 2.0 environments. Transactional tagging and linked data are unearthed, which is a novel contribution in information visualisation. The visualisation model and system is first realised as a tangible software system, which is then validated through large data sets of different types in three experiments. The first experiment is based on the large Royal Mail postcode data set. The second experiment is based on a large transactional data set in an enterprise environment, while the same data set is processed in a non-enterprise environment. The system interaction facilitated through new mashup techniques enables users to interact more fluently with data and the representation layer. The results are exported into various reusable formats and retrieved for further comparison and analysis purposes. The information visualisation model introduced in this research is a compact process for any size and type of data set, which is a major contribution in information visualisation and data analysis. Advanced data representation techniques are employed using various web mashup technologies. New visualisation techniques have emerged from the research, such as transactional tagging visualisation and linked data visualisation. The information visualisation model and system is extremely useful in addressing complex data problems with strategies that are easy to interact with and integrate.