    Using customer lifetime value and neural networks to improve the prediction of bank deposit subscription in telemarketing campaigns

    Customer lifetime value (LTV) uses client characteristics such as recency, frequency and monetary value to describe the value of a client over time in terms of profitability. We present the concept of LTV applied to telemarketing for improving return on investment, using a recent (2008 to 2013) real case study of bank campaigns to sell long-term deposits. The goal was to exploit the history of past contacts to extract additional knowledge. A total of twelve LTV input variables were tested under a forward selection method and a realistic rolling windows scheme, highlighting the validity of five new LTV features. The results achieved by our LTV data-driven approach using neural networks allowed an improvement of up to 4 pp in the cumulative Lift curve for targeting deposit subscribers when compared with a baseline model (with no history data). Explanatory knowledge was also extracted from the proposed model, revealing two highly relevant LTV features: the last result of the previous campaign to sell the same product and the frequency of past client successes. The obtained results are particularly valuable for contact center companies, which can improve predictive performance without having to ask the companies they serve for more information.
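
As an illustration of how history-based features of this kind can be derived, the following is a minimal sketch. The record layout, the feature names and the function `ltv_features` are assumptions made for the example; they are not the twelve variables tested in the study, and the monetary component is left out because the abstract does not describe how it was measured.

```python
from datetime import date

# Hypothetical contact history for one client:
# (contact date, campaign id, subscribed?)
history = [
    (date(2010, 3, 1), "c1", False),
    (date(2011, 6, 15), "c2", True),
    (date(2012, 9, 30), "c3", True),
]

def ltv_features(history, today):
    """Derive simple history-based features from past contacts (illustrative)."""
    if not history:
        return {"recency_days": None, "frequency": 0,
                "success_frequency": 0, "last_result": None}
    ordered = sorted(history, key=lambda contact: contact[0])
    last_date, _, last_result = ordered[-1]
    return {
        "recency_days": (today - last_date).days,               # days since last contact
        "frequency": len(ordered),                              # number of past contacts
        "success_frequency": sum(1 for c in ordered if c[2]),   # past successes
        "last_result": last_result,                             # outcome of the latest past campaign
    }

features = ltv_features(history, today=date(2013, 1, 1))
print(features)
```

A client with no history gets neutral values, which mirrors the baseline situation in which no past contact data is available.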

    Applying Convolutional-GRU for Term Deposit Likelihood Prediction

    Banks normally offer two kinds of deposit accounts: deposits such as current or savings accounts, and term deposits such as fixed or recurring deposits. From both the bank's and the customer's perspective, term deposits can increase profit and strengthen the financial sector. This paper focuses on the likelihood that customers subscribe to a term deposit. Bank campaign efforts and customer details can influence the chances of subscription. The paper presents an automated system that predicts the likelihood of term deposit investment in advance. It proposes a deep-learning-based hybrid model that stacks convolutional layers and recurrent neural network (RNN) layers as the predictive model, using the Gated Recurrent Unit (GRU) for the RNN layers. The proposed model is then compared with benchmark classifiers: k-nearest neighbor (k-NN), decision tree (DT), and multi-layer perceptron (MLP). The experimental study concludes that the proposed model attains an accuracy of 89.59% and an MSE of 0.1041, outperforming the baseline models.
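
The two reported figures sum to one (0.8959 + 0.1041 = 1). That is what one would expect if the MSE was computed on hard 0/1 predictions, since each squared error is then 1 for a misclassification and 0 otherwise. This is an inference from the numbers, not something the abstract states. The sketch below checks the identity on made-up labels; `y_true` and `y_pred` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error between labels and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical binary labels and hard (0/1) predictions.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 0, 1, 1, 0]

acc = accuracy(y_true, y_pred)
err = mse(y_true, y_pred)
print(acc, err, acc + err)
```

If the MSE had instead been computed on predicted probabilities, the two metrics would not in general sum to one.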

    The impact of social media in brand building

    The impact of publications on social networks directly affects brand building through customers’ perceptions of the brand. This research presents a data mining approach for predicting the impact of posts published on a Facebook page. Twelve post performance metrics extracted from a cosmetic company’s page, comprising 791 publications, were modeled, with the two best results achieving a mean absolute percentage error of around 27%. One of them, the “Lifetime Post Consumers” model, was assessed using sensitivity analysis to understand how each of the seven input features influenced it. The type of content was considered the most relevant feature for the model, with a relevance of 36%. A status post captures around twice the attention of the remaining three types. Seasonality was also observed regarding the month of publication. Such knowledge is valuable for content managers making informed decisions on whether or not to publish a post. JEL classification: M31 Marketing; M37 Advertising.
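
For reference, the mean absolute percentage error quoted in this abstract can be computed as in the sketch below. The values in `actual` and `predicted` are invented for the example and do not come from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero, since each error is divided by it.
    """
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)

# Hypothetical post metric values and model predictions.
actual = [100, 200, 400]
predicted = [80, 250, 300]

error = mape(actual, predicted)
print(round(error, 2))
```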

    A divide-and-conquer strategy using feature relevance and expert knowledge for enhancing a data mining approach to bank telemarketing

    The discovery of knowledge through data mining provides a valuable asset for addressing decision-making problems. Although a list of features may characterize a problem, a subset of those features often has a stronger influence on a certain group of events, which constitutes a sub-problem within the original problem. We propose a divide-and-conquer strategy for data mining that uses both data-based sensitivity analysis, for extracting feature relevance, and expert evaluation, for splitting the problem of characterizing telemarketing contacts to sell bank deposits. As a result, the call direction (inbound/outbound) was considered the most suitable candidate feature. Re-evaluating the inbound telemarketing sub-problem led to a large increase in targeting performance, confirming the benefits of such an approach given the importance of telemarketing for business, in particular in bank marketing.
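
The idea of ranking features by how much the model output moves when one input is varied can be sketched as follows. This is a simplified one-dimensional sensitivity analysis in the spirit of the data-based variant, where responses are averaged over data samples. The `model` function, the feature names and the data are hypothetical stand-ins, not the fitted model from the study.

```python
def model(row):
    # Hypothetical stand-in for a fitted model: returns a success score.
    return 0.6 * row["inbound"] + 0.02 * row["duration"] + 0.0 * row["age"]

# Hypothetical sample of contacts.
data = [
    {"inbound": 0, "duration": 5, "age": 30},
    {"inbound": 1, "duration": 10, "age": 45},
    {"inbound": 0, "duration": 15, "age": 60},
]

def relevance(model, data, levels=5):
    """Share of the total output range attributable to each feature.

    Assumes at least one feature changes the output (total range > 0).
    """
    ranges = {}
    for name in data[0]:
        lo = min(r[name] for r in data)
        hi = max(r[name] for r in data)
        grid = [lo + (hi - lo) * k / (levels - 1) for k in range(levels)]
        # Average response over the data samples at each level of the feature.
        responses = [
            sum(model({**r, name: v}) for r in data) / len(data) for v in grid
        ]
        ranges[name] = max(responses) - min(responses)
    total = sum(ranges.values())
    return {name: rng / total for name, rng in ranges.items()}

rel = relevance(model, data)
print(rel)
```

In this toy setup the call direction dominates, which is the kind of signal that, combined with expert judgement, could suggest a feature on which to split the problem.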

    Anticipating the duration of public administration employees' future absences

    Absenteeism affects state-owned companies, which are obliged to undertake strategies to prevent it, be efficient and conduct effective human resource (HR) management. This paper aims to understand the reasons for Public Administration Employees’ (PAE) absenteeism and predict future employee absences. Data on the 2016 absences of 17,600 PAE were collected from seven public databases, and the Recency, Frequency and Monetary (RFM) and Support Vector Machine (SVM) algorithms were used for modeling the absence duration, backed up with a 10-fold cross-validation scheme. Results revealed that the worker profile is less relevant than the absence characteristics. The most concerning employee profile was uncovered, and a set of scenarios is provided, each with the expected number of future days of absence. The veracity of the absence motives could not be proven, so they are not totally reliable. In addition, the number of one-day absence records was disproportionate to the other records. The findings are of value to the Human Capital Management department for supporting decisions regarding the allocation of workers and productivity management, and for use in the recruitment process. Until now, little has been known about the characteristics that affect PAE absenteeism, which reinforces the need for further understanding of this matter.
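
A 10-fold cross-validation scheme partitions the records into ten folds and uses each fold once for testing while training on the other nine. The sketch below shows only the index bookkeeping; the helper `k_fold_indices` is an illustrative name, and in practice the records would usually be shuffled first, which is omitted here.

```python
def k_fold_indices(n, k=10):
    """Yield (train, test) index lists for k roughly equal, contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size

# Example with 25 records: five folds of 3 and five folds of 2.
folds = list(k_fold_indices(25, k=10))
for train, test in folds[:2]:
    print(len(train), test)
```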

    Factors influencing hotels’ online prices

    Digital corporations are creating new paths of business driven by consumers empowered by social media. Understanding the role that each feature drawn from online platforms has on price fluctuation is vital for leveraging decision making. In this study, 5603 simulations of online reservations from 23 Portuguese cities were gathered, including characterizing features from social media, web visibility and hotel amenities, from four renowned online sources: Booking.com, TripAdvisor, Google, and Facebook. After data preparation, including removal of irrelevant features in terms of modeling and outlier cleaning, a tuned dataset of 3137 simulations and 30 features (including the price charged per day) was used first for evaluating the modeling performance of an ensemble of multilayer perceptrons, and then for extracting valuable knowledge through the data-based sensitivity analysis. Findings show that all features from the encompassed factors (social media, online reservation, hotel characteristics, web visibility and city) play a significant role in price.

    Mutual information and sensitivity analysis for feature selection in customer targeting: a comparative study

    WOS:000454945400004. Feature selection is a highly relevant task in any data-driven knowledge discovery project. The present research analyses the advantages and disadvantages of using mutual information (MI) and data-based sensitivity analysis (DSA) for feature selection in classification problems, by applying both to a bank telemarketing case. A logistic regression model is built on the tuned set of features that each of the two techniques identifies as most influential on the success of a telemarketing contact: a total of 13 features for MI and 9 for DSA. The latter performs better for lower false-positive rates, while the former is slightly better for higher false-positive rates. Thus, MI becomes a better choice if the intention is to reduce the cost of contacts slightly without risking the loss of a high number of successes. However, DSA achieved good prediction results with fewer features.
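
To make the MI criterion concrete, the sketch below estimates the mutual information (in bits) between a discrete feature and a binary outcome from empirical frequencies. The toy columns are invented; how the study discretized features or chose a cut-off is not stated in the abstract.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Empirical mutual information, in bits, between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Hypothetical contact outcome and two candidate features.
outcome = [1, 1, 0, 0, 1, 0, 1, 0]
informative = ["a", "a", "b", "b", "a", "b", "a", "b"]    # mirrors the outcome
uninformative = ["x", "y", "x", "y", "y", "x", "x", "y"]  # independent of it

mi_informative = mutual_information(informative, outcome)
mi_uninformative = mutual_information(uninformative, outcome)
print(mi_informative, mi_uninformative)
```

Features would then be ranked by their MI with the target and the top-ranked ones kept, whereas DSA ranks features by their effect on a fitted model's output.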

    Factors influencing charter flight departure delay

    This study aims to identify the main factors leading to charter flight departure delay through data mining. The data sample analysed consists of 5,484 flights operated by a European airline between 2014 and 2017. The tuned dataset of 33 features was used for modelling departure delay (e.g., whether the flight was delayed by more than 15 minutes). The results proved the value of the proposed approach, with an area under the receiver operating characteristic curve of 0.831, and supported knowledge extraction through the data-based sensitivity analysis. The features related to previous flight delay information were considered the most influential on whether the current flight is delayed, which is consistent with the propagating effect of flight delays. However, neither the reason for the previous delay nor its duration accounted for the most relevance. Instead, a computed feature indicating whether there were two or more registered reasons accounted for 33% of relevance. The contributions also include using a broader data mining approach, supported by an extensive data understanding and preparation stage that used both proprietary and open access data sources to build a comprehensive dataset.
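
The area under the ROC curve can be read as the probability that a randomly chosen delayed flight receives a higher score than a randomly chosen on-time flight. The sketch below computes it directly from that pairwise definition; the labels and scores are hypothetical, and the quadratic loop is only meant for small examples.

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical flights: 1 = delayed more than 15 minutes, with model scores.
delayed = [1, 0, 1, 0, 0, 1]
score = [0.9, 0.3, 0.6, 0.7, 0.2, 0.8]

area = auc(delayed, score)
print(round(area, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which puts the reported 0.831 in context.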

    Feature selection strategies for improving data-driven decision support in bank telemarketing

    The use of data mining techniques to unveil previously undiscovered knowledge has been applied in recent years to a wide range of domains, including banking and marketing. Raw data is the basic ingredient for successfully detecting interesting patterns. A key aspect of raw data manipulation is feature engineering, which concerns the correct characterization or selection of relevant features (or variables) that conceal relations with the target goal. This study focuses on feature engineering, aiming at unfolding the features that best characterize the problem of selling long-term bank deposits through telemarketing campaigns. For the experimental setup, a case study from a Portuguese bank was addressed, covering the 2008-2013 period and encompassing the recent global financial crisis. To assess the relevance of the problem, a novel literature analysis using text mining and the latent Dirichlet allocation algorithm was conducted, confirming the existence of a research gap for bank telemarketing. Starting from a dataset containing typical telemarketing contacts and client information, the research followed three different and complementary strategies: first, enriching the dataset with social and economic context features; then, including customer lifetime value related features; finally, applying a divide-and-conquer strategy for splitting the problem into smaller fractions, leading to optimized sub-problems. Each of the three approaches improved on previous results in terms of model metrics related to prediction performance. The relevance of the proposed features was evaluated, confirming the obtained models as credible and valuable for telemarketing campaign managers.

    Unveiling the features of successful ebay sellers of smartphones: a data mining sales predictive model

    eBay is one of the largest online retailing corporations worldwide, providing numerous ways for customers to give feedback on registered sellers. With the advent of Web 2.0 and online shopping, an immense amount of data is collected from manifold devices. This data is often unstructured, which inevitably calls for further treatment that allows classification, discovery of patterns and trends, or prediction of outcomes. That treatment implies the use of increasingly complex and combined statistical tools as the size of datasets grows. Nowadays, datasets may extend to several exabytes, which can be transformed into knowledge using adequate methods. The aim of the present study is to evaluate which seller and product attributes, such as feedback ratings and price, influence sales of smartphones on eBay, and in what way, using a data mining framework and techniques. The methods include SVM algorithms for modelling the sales of smartphones by eBay sellers, combined with a 10-fold cross-validation scheme to ensure model robustness; the MAE, RAE and NMAE metrics for gauging prediction accuracy; and sensitivity analysis to assess the influence of individual features on sales. The methods were considered effective for both modelling evaluation and knowledge extraction, reaching positive results, although with some discrepancies between different prediction accuracy metrics. Lastly, it was discovered that the number of items in auction, the average price and the variety of products available from a given seller were the most significant attributes, i.e., the largest contributors to sales. JEL classification: M310; C380.
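
The three error metrics named in this abstract can be sketched as below, using common definitions: MAE as the mean absolute error, RAE as the total absolute error relative to that of always predicting the mean, and NMAE as the MAE divided by the range of the target. The abstract does not spell out its formulas, so these definitions are an assumption, and the sales figures are made up.

```python
def mae(y, yhat):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(y, yhat)) / len(y)

def rae(y, yhat):
    """Relative absolute error (%), versus always predicting the mean of y."""
    mean_y = sum(y) / len(y)
    baseline = sum(abs(a - mean_y) for a in y)
    return 100 * sum(abs(a - p) for a, p in zip(y, yhat)) / baseline

def nmae(y, yhat):
    """MAE normalized by the range of y, in percent."""
    return 100 * mae(y, yhat) / (max(y) - min(y))

# Hypothetical sales counts and predictions.
sales = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]

print(mae(sales, predicted), rae(sales, predicted), nmae(sales, predicted))
```

Because the three metrics normalize the error differently, they can rank the same model differently, which is one way discrepancies between accuracy metrics can arise.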