15 research outputs found
Using customer lifetime value and neural networks to improve the prediction of bank deposit subscription in telemarketing campaigns
Customer lifetime value (LTV) uses client characteristics, such as recency, frequency and monetary value, to describe the value of a client through time in terms of profitability. We present the concept of LTV applied to telemarketing for improving the return on investment, using a recent (2008 to 2013) real case study of bank campaigns to sell long-term deposits. The goal was to benefit from the history of past contacts to extract additional knowledge. A total of twelve LTV input variables were tested, under a forward selection method and using a realistic rolling windows scheme, highlighting the validity of five new LTV features. The results achieved by our LTV data-driven approach using neural networks allowed an improvement of up to 4 pp in the cumulative Lift curve for targeting the deposit subscribers when compared with a baseline model (with no history data). Explanatory knowledge was also extracted from the proposed model, revealing two highly relevant LTV features: the last result of the previous campaign to sell the same product and the frequency of past client successes. The obtained results are particularly valuable for contact center companies, which can improve predictive performance without even having to ask the companies they serve for more information.
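As an illustration of the kind of history-based inputs described above, the following minimal sketch derives recency, frequency, number of past successes and last result from a client's contact history. The function name, field names and sample dates are hypothetical and are not taken from the paper, which tested twelve LTV variables that are not listed in this abstract.

```python
from datetime import date

def history_features(past_contacts, today):
    """Derive simple RFM-style features for one client.

    past_contacts: list of (contact_date, succeeded) tuples (hypothetical layout).
    """
    if not past_contacts:
        # No history: the client looks like a baseline (no-history) record.
        return {"recency_days": None, "frequency": 0,
                "successes": 0, "last_result": "none"}
    ordered = sorted(past_contacts, key=lambda c: c[0])
    last_date, last_ok = ordered[-1]
    return {
        "recency_days": (today - last_date).days,         # days since last contact
        "frequency": len(ordered),                        # number of past contacts
        "successes": sum(1 for _, ok in ordered if ok),   # past successful contacts
        "last_result": "success" if last_ok else "failure",
    }

# Illustrative data only.
contacts = [(date(2012, 3, 1), False), (date(2012, 9, 15), True)]
features = history_features(contacts, today=date(2013, 1, 15))
```

Features like these can be appended to the baseline inputs of a classifier; because they are computed from data the contact center already holds, no extra information is needed from the client company.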
Applying Convolutional-GRU for Term Deposit Likelihood Prediction
Banks normally offer two kinds of deposit accounts: deposits such as current/savings accounts, and term deposits such as fixed or recurring deposits. Term deposits can strengthen finances and increase profit from both the bank's and the customer's perspective. This paper focuses on the likelihood of customers subscribing to a term deposit. Bank campaign efforts and analysis of customer details can influence the chances of a term deposit subscription. The paper presents an automated system that predicts term deposit investment possibilities in advance. It proposes a deep learning based hybrid model that stacks convolutional layers and Recurrent Neural Network (RNN) layers as the predictive model, employing the Gated Recurrent Unit (GRU) for the RNN. The proposed model is then compared with benchmark classifiers such as k-Nearest Neighbor (k-NN), Decision Tree (DT), and Multi-layer Perceptron (MLP) classifiers. The experimental study concludes that the proposed model attains an accuracy of 89.59% and an MSE of 0.1041, outperforming the other baseline models.
The impact of social media in brand building
Project / JEL classification system: M31 Marketing; M37 Advertising
The impact of publications on social networks directly affects brand building through
customers’ perceptions of the brand. This research presents a data mining approach for
predicting the impact of posts published on a Facebook page. Twelve post performance
metrics extracted from a cosmetic company’s page including 791 publications were modeled,
with the two best results achieving a mean absolute percentage error of around 27%. One of
them, the “Lifetime Post Consumers” model, was assessed using sensitivity analysis to
understand how each of the seven input features influenced it. The type of content was
considered the most relevant feature for the model, with a relevance of 36%. A status post
captures around twice the attention of the remaining three types. Also, seasonality was
observed regarding the month of the publication. Such knowledge is valuable for content
managers to make informed decisions on whether or not to publish a post.
Publications on social networks directly influence brand building insofar as they
affect consumers' perception of the brand. This study presents a data mining approach
that predicts the impact of publications on a brand's Facebook page. A total of 791
publications were modeled through 12 performance metrics, with the two best results
reaching a mean error of around 27%. One of these variables, “Lifetime Post Consumers”,
was examined through a sensitivity analysis to understand how each of the seven input
variables influences it. The type of content was considered the most relevant, with a
relevance of 36%. A “Status” publication captures twice the attention of consumers
compared with the other two types of publication. High seasonality was also observed
according to the month of publication.
Such conclusions are useful for helping managers decide whether or not to publish and
in what form
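The mean absolute percentage error reported above can be computed as in the sketch below. This is the standard textbook definition; the sample values are invented for illustration and are not data from the study.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Pairs whose actual value is zero are skipped, since the percentage
    error is undefined for them (a common convention, assumed here).
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)

# Illustrative values only: relative errors of 20%, 25% and 0%.
error = mape([100, 200, 400], [80, 250, 400])
```

A MAPE of around 27% means that, on average, the predicted value of a metric deviates from the observed value by roughly a quarter of the observed value.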
A divide-and-conquer strategy using feature relevance and expert knowledge for enhancing a data mining approach to bank telemarketing
The discovery of knowledge through data mining provides a valuable asset for addressing decision making problems. Although a list of features may characterize a problem, it is often the case that a subset of those features has more influence on a certain group of events, which constitutes a sub-problem within the original problem. We propose a divide-and-conquer strategy for data mining that uses both data-based sensitivity analysis for extracting feature relevance and expert evaluation for splitting the problem of characterizing telemarketing contacts to sell bank deposits. As a result, the call direction (inbound/outbound) was considered the most suitable candidate feature. The re-evaluation of the inbound telemarketing sub-problem led to a large increase in targeting performance, confirming the benefits of such an approach, which matter given the importance of telemarketing for business, in particular in bank marketing
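The feature relevance step mentioned above can be sketched roughly as follows. This is a simplified, one-feature-at-a-time variant written for illustration, not the authors' implementation: each feature is swept over a few levels across its observed range while the other inputs keep their values from a data sample, and the spread of the mean prediction is normalized into a relevance share. The function name, the use of the range as the spread measure, and the toy model are all assumptions.

```python
import random

def dsa_relevance(predict, rows, levels=5, sample_size=50, seed=0):
    """Simplified data-based sensitivity analysis.

    predict: callable taking one feature list and returning a number.
    rows: list of numeric feature lists; levels must be at least 2.
    Returns one relevance share per feature (shares sum to 1 when any
    feature changes the prediction).
    """
    rng = random.Random(seed)
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    spreads = []
    for j in range(len(rows[0])):
        lo = min(r[j] for r in rows)
        hi = max(r[j] for r in rows)
        level_means = []
        for k in range(levels):
            value = lo + (hi - lo) * k / (levels - 1)
            # Replace feature j with this level in every sampled row.
            preds = [predict(r[:j] + [value] + r[j + 1:]) for r in sample]
            level_means.append(sum(preds) / len(preds))
        spreads.append(max(level_means) - min(level_means))
    total = sum(spreads)
    return [s / total if total else 0.0 for s in spreads]

# Toy model: the first feature matters three times as much as the second,
# and the third is ignored.
toy_rows = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.5, 0.2, 0.9]]
relevance = dsa_relevance(lambda x: 3 * x[0] + x[1], toy_rows)
```

In a divide-and-conquer setting, a highly ranked feature that experts also consider meaningful, such as call direction, is a natural candidate for splitting the data into sub-problems.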
Anticipating the duration of public administration employees' future absences
Absenteeism affects state-owned companies, which are obliged to undertake strategies
to prevent it, be efficient and conduct effective human resource (HR) management. This
paper aims to understand the reasons for Public Administration Employees’ (PAE) absenteeism and predict future employee absences. Data on the 2016 absences of 17,600 PAE were collected from seven public databases, and the Recency, Frequency and Monetary (RFM) model and the Support Vector Machine (SVM) algorithm were used for modeling the absence duration, backed up with a 10-fold cross-validation scheme. Results revealed that the worker profile is less relevant than the absence characteristics. The most concerning employee profile was uncovered, and a set of scenarios is provided regarding the expected days of absence in the future for each scenario. The veracity of the absence motives could not be proven, and thus they cannot be considered totally reliable. In addition, the number of records of one absence day was disproportionate to the other records. The findings are of value to the Human Capital Management department, supporting its decisions regarding the allocation of workers and productivity management and providing insights for the recruitment process. Until now, little has been known about the characteristics that affect PAE absenteeism, which reinforces the need for further understanding of this matter.
Factors influencing hotels’ online prices
Digital corporations are creating new paths of business driven by consumers empowered by social media. Understanding the role that each feature drawn from online platforms has on price fluctuation is vital for leveraging decision making.
In this study, 5603 simulations of online reservations from 23 Portuguese cities were gathered, including characterizing features from social media, web visibility and hotel amenities, from four renowned online sources: Booking.com, TripAdvisor, Google, and Facebook. After data preparation, including removal of irrelevant features in terms of modeling and outlier cleaning, a tuned dataset of 3137 simulations and 30 features (including the price charged per day) was used first for evaluating the modeling performance of an ensemble of multilayer perceptrons, and then for extracting valuable knowledge through the data-based sensitivity analysis.
Findings show that all features from the encompassed factors (social media, online reservation, hotel characteristics, web visibility and city) play a significant role in price.
Mutual information and sensitivity analysis for feature selection in customer targeting: a comparative study
WOS:000454945400004
Feature selection is a highly relevant task in any data-driven knowledge discovery project. The present research focuses on analysing the advantages and disadvantages of using mutual information (MI) and data-based sensitivity analysis (DSA) for feature selection in classification problems, by applying both to a bank telemarketing case. A logistic regression model is built on the tuned set of features identified by each of the two techniques as the most influential on the success of a telemarketing contact, for a total of 13 features for MI and 9 for DSA. The latter performs better for lower values of false positives, while the former is slightly better for a higher false-positive ratio. Thus, MI becomes a better choice if the intention is to reduce slightly the cost of contacts without risking the loss of a high number of successes. However, DSA achieved good prediction results with fewer features.
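For readers unfamiliar with mutual information as a feature selection score, the sketch below computes it for one discrete feature against a class label using the standard definition. The variable names and sample values are illustrative assumptions; continuous features would need to be discretized first, and the abstract does not say how the study handled them.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )

# Illustrative data: the first feature determines the outcome exactly,
# the second is statistically independent of it.
outcome = [1, 1, 0, 0]
mi_informative = mutual_information(["in", "in", "out", "out"], outcome)
mi_independent = mutual_information(["a", "b", "a", "b"], outcome)
```

Ranking features by this score and keeping the top ones gives a filter-style selection that does not depend on any fitted model, whereas DSA ranks features by how much a fitted model's output responds to them.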
Factors influencing charter flight departure delay
This study aims to identify the main factors leading to charter flight departure delay through data mining. The data sample analysed consists of 5,484 flights operated by a European airline between 2014 and 2017. The tuned dataset of 33 features was used for modelling departure delay (e.g., whether the flight was delayed by more than 15 minutes). The results proved the value of the proposed approach, with an area under the receiver operating characteristic curve of 0.831, and supported knowledge extraction through the data-based sensitivity analysis. The features related to previous flight delay information were considered the most influential on whether the current flight is delayed, which is consistent with the propagating effect of flight delays. However, neither the reason for the previous delay nor the delay duration accounted for the most relevance. Instead, a computed feature indicating whether there were two or more registered reasons accounted for 33% of relevance. The contributions also include using a broader data mining approach supported by an extensive data understanding and preparation stage, using both proprietary and open access data sources to build a comprehensive dataset.
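The area under the ROC curve quoted above can be read as the probability that a randomly chosen delayed flight receives a higher score than a randomly chosen on-time flight. A minimal sketch of that pairwise definition follows; the labels and scores are invented for illustration and the function name is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))

# Illustrative data: 1 = delayed by more than 15 minutes, 0 = not delayed.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.7, 0.6, 0.2, 0.4, 0.4]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of records, which is acceptable for a few thousand flights; a rank-based formula gives the same value more efficiently on larger data.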
Feature selection strategies for improving data-driven decision support in bank telemarketing
The use of data mining techniques to unveil previously undiscovered knowledge has
been applied in past years to a wide range of domains, including banking and marketing. Raw
data is the basic ingredient for successfully detecting interesting patterns. A key aspect of raw
data manipulation is feature engineering, which concerns the correct characterization or
selection of relevant features (or variables) that conceal relations with the target goal.
This study is particularly focused on feature engineering, aiming at unfolding the
features that best characterize the problem of selling long-term bank deposits through
telemarketing campaigns. For the experimental setup, a case study from a Portuguese bank,
spanning the 2008-2013 period and encompassing the recent global financial crisis, was
addressed. To assess the relevance of such a problem, a novel literature analysis using text
mining and the latent Dirichlet allocation algorithm was conducted, confirming the existence of a
research gap for bank telemarketing.
Starting from a dataset containing typical telemarketing contacts and client information,
the research followed three different and complementary strategies: first, enriching the dataset
with social and economic context features; then, including customer lifetime value related
features; finally, applying a divide-and-conquer strategy for splitting the problem into smaller
fractions, leading to optimized sub-problems. Each of the three approaches improved previous
results in terms of model metrics related to prediction performance. The relevance of the
proposed features was evaluated, confirming the obtained models as credible and valuable for
telemarketing campaign managers.
The use of data mining techniques for knowledge discovery has been applied in recent
years to a wide variety of domains, including banking and marketing. Data in its raw
state is the basic ingredient for detecting information patterns. A key aspect of raw
data manipulation is "feature engineering", which comprises the correct definition and
selection of relevant features (or variables) that relate to the target of knowledge
discovery.
This work focuses on a "feature engineering" approach to define the variables that best
characterize the problem of selling bank term deposits through telemarketing campaigns.
As an empirical study, it used a case study from a Portuguese bank covering the
2008-2013 period, which includes the effects of the international financial crisis. To
gauge the importance of this problem, a novel literature analysis was carried out using
text mining and the latent Dirichlet allocation algorithm, confirming the existence of a
gap in this area.
Starting from a dataset of telemarketing contacts and client information, three
different and complementary strategies were proposed: first, the data were enriched with
socioeconomic features; next, features associated with customer lifetime value were
added; finally, the problem was divided into more specific problems, allowing optimized
approaches to each sub-problem. Each approach improved the metrics associated with the
model's predictive capability. In addition, the relevance of the features was evaluated,
confirming the obtained models as credible and valuable for telemarketing campaign managers
Unveiling the features of successful ebay sellers of smartphones: a data mining sales predictive model
JEL Classification guidelines (M310); (C380).
eBay is one of the largest online retailing corporations worldwide, providing numerous
ways for customer feedback on registered sellers. Accordingly, with the advent of Web
2.0 and online shopping, an immense amount of data is collected from manifold devices. This
data is often unstructured, which inevitably calls for some form of further treatment that
allows classification, discovery of patterns and trends or prediction of outcomes. That
treatment implies the use of increasingly complex and combined statistical tools as the
size of datasets builds up. Nowadays, datasets may extend to several exabytes, which can
be transformed into knowledge using adequate methods. The aim of the present study is
to evaluate and analyse which seller and product attributes, such as feedback ratings
and price, influence sales of smartphones on eBay, and in what way, using a data mining
framework and techniques. The methods used include SVM algorithms for modelling the
sales of smartphones by eBay sellers, combined with a 10-fold cross-validation scheme
to ensure model robustness, the metrics MAE, RAE and NMAE
to gauge prediction accuracy, and sensitivity analysis to assess
the influence of individual features on sales. The methods were considered effective for
both modelling evaluation and knowledge extraction, reaching positive results although
with some discrepancies between different prediction accuracy metrics. Lastly, it was
discovered that the number of items in auction, average price and the variety of products
available from a given seller were the most significant attributes, i.e., the largest
contributors to sales.
eBay is one of the largest online retail platforms and offers countless
opportunities for extracting consumer feedback data on various sellers. Accordingly,
the advent of Web 2.0 and online shopping is strongly associated with the generation
of abundant data and the possibility of collecting it through various devices and
platforms. This data is often unstructured, which inevitably reveals the need for its
normalization and more in-depth treatment in order to enable tasks of classification,
discovery of patterns and trends, or prediction. The complexity of the statistical
methods applied to perform these tasks increases along with the size of the databases.
Currently, there are databases reaching several exabytes, which constitute
opportunities for knowledge extraction provided that appropriate and tailored methods
are used. The present study therefore intends to quantify and analyse which seller and
product characteristics influence smartphone sales on eBay, and in what way, drawing on
the conceptual framework and techniques of data mining. The methods used include
support vector machines (SVMs) for modelling smartphone sales by eBay sellers, combined
with 10-fold cross-validation to ensure model robustness and with the performance
metrics mean absolute error (MAE), relative absolute error (RAE) and normalized mean
absolute error (NMAE) to ensure the accuracy of the predictive model. Sensitivity
analysis is then implemented to assess the individual contribution of each attribute
to sales. The methods are considered effective both in model evaluation and in
knowledge extraction, since they yield positive results, although discrepancies are
observed between the estimates for different performance metrics. Finally, it was
found that the number of items in auction, the average price and the variety of
products offered by each seller were the most significant attributes, i.e., those that
contributed most to sales
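The three error metrics named above can be sketched as follows. The definitions used here are common ones and are an assumption, since the abstract does not spell them out: MAE is the mean absolute error, RAE divides the total absolute error by that of a predictor that always outputs the mean, and NMAE divides MAE by the range of the observed values. Both ratios are returned as fractions rather than percentages, and the sample values are invented.

```python
def regression_errors(actual, predicted):
    """Return (MAE, RAE, NMAE) for paired numeric sequences."""
    n = len(actual)
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mae = sum(abs_errors) / n
    mean_actual = sum(actual) / n
    # Total absolute error of the naive predictor that always outputs the mean.
    naive_error = sum(abs(a - mean_actual) for a in actual)
    value_range = max(actual) - min(actual)
    if naive_error == 0 or value_range == 0:
        raise ValueError("RAE and NMAE are undefined for constant targets")
    return mae, sum(abs_errors) / naive_error, mae / value_range

# Illustrative values only.
mae, rae, nmae = regression_errors([10, 20, 30, 40], [12, 18, 33, 37])
```

Because RAE is scaled by the target's dispersion around its mean and NMAE by its full range, a skewed sales distribution can make the two disagree, which is one plausible source of the discrepancies between metrics that the abstract mentions.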