    A divide-and-conquer strategy using feature relevance and expert knowledge for enhancing a data mining approach to bank telemarketing

    The discovery of knowledge through data mining provides a valuable asset for addressing decision-making problems. Although a list of features may characterize a problem, a subset of those features often has a stronger influence on a certain group of events that constitutes a sub-problem within the original problem. We propose a divide-and-conquer strategy for data mining that uses both data-based sensitivity analysis, for extracting feature relevance, and expert evaluation to split the problem of characterizing telemarketing contacts to sell bank deposits. As a result, the call direction (inbound/outbound) was considered the most suitable candidate feature. Re-evaluating the inbound telemarketing sub-problem led to a large increase in targeting performance, confirming the benefits of such an approach given the importance of telemarketing for business, in particular in bank marketing.
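
    The data-based sensitivity analysis mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a simple variant in which each feature is swept over a few levels between its observed minimum and maximum, the remaining features keep their values from the data samples, and relevance is the normalized range of the averaged model response. The function name `dsa_relevance`, the toy model, and the sample values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def dsa_relevance(predict, samples, levels=5):
    """Sketch of a data-based sensitivity analysis.

    predict: callable taking one row (list of numbers) and returning a number.
    samples: list of rows drawn from the data set.
    Returns one relevance share per feature (shares sum to 1 when any
    feature changes the response).
    """
    n_features = len(samples[0])
    ranges = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        lo, hi = min(column), max(column)
        grid = [lo + (hi - lo) * k / (levels - 1) for k in range(levels)]
        responses = []
        for value in grid:
            # Hold feature j at this level in every sample, keep the rest as observed.
            perturbed = (row[:j] + [value] + row[j + 1:] for row in samples)
            responses.append(mean(predict(r) for r in perturbed))
        # Sensitivity measure (assumption): range of the averaged response.
        ranges.append(max(responses) - min(responses))
    total = sum(ranges)
    return [r / total if total else 0.0 for r in ranges]


# Hypothetical stand-in for a fitted model: the third feature is ignored.
def toy_model(row):
    return 0.8 * row[0] + 0.2 * row[1]


samples = [[0.0, 1.0, 5.0], [1.0, 0.0, 3.0], [0.5, 0.5, 9.0], [0.2, 0.9, 1.0]]
relevance = dsa_relevance(toy_model, samples)
```

    In a divide-and-conquer setting, a ranking like this would only be a starting point; the abstract states that expert evaluation was also used to choose the splitting feature.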

    Identification of common city characteristics influencing room occupancy

    Purpose: National tourism offices worldwide implement marketing strategies to influence tourists' choices. However, there is more than meets the eye when it comes to choosing a city as a tourism destination. The purpose of this paper is to identify which characteristics play a key role in room occupancy. Design/methodology/approach: Diverse characteristics such as the city offer, demographics, natural amenities (e.g. number of beaches) and also politics (e.g. type of government) are combined into a decision tree model to unveil the relevance of each in determining room occupancy. The empirical experiments used data known in 2015 from 43 cities in Europe and the rest of the world to model the room occupancy rate in 2016. Findings: While the seasonality effect plays the most significant role, other less studied features, such as the type of political party in power prior to the current government, were found to have an impact on room occupancy. Originality/value: This study unveiled that center-right and right governments are generally more sensitive to promoting their cities as tourism destinations.

    On the statistical comparison of feature selection methods and the role of experts: the case of Las Vegas strip

    A statistical comparison of feature selection methods is performed. Feature selection is an important issue in Data Mining and Data Science, and comparing the results obtained from different methods is hard. The evaluation of metrics and ways of comparing methods is therefore an important matter of study. Our study is performed on a real dataset, previously analyzed in the literature, that contains a small number of records. This draws attention to how conclusions should be applied when only poor levels of statistical significance can be obtained because of a relatively low number of samples. The use of inter-rater agreement coefficients is introduced as a novel approach extending a previous study. As shown, Boruta and tree-based methodologies perform rather well even on small data. Our metrics can be used to guide the expert opinion when taking the final decision. This work extends the results obtained in a previous analysis performed on the same dataset. Sociedad Argentina de Informátic
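
    As an illustration of how an inter-rater agreement coefficient can compare two feature selection methods, the sketch below treats each method as a "rater" that marks every candidate feature as selected (1) or not selected (0). Cohen's kappa is used here as an assumption; the abstract does not say which coefficients the study applied, and the two selection vectors are made-up values.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    n = len(rater_a)
    if n == 0 or n != len(rater_b):
        raise ValueError("both raters must label the same non-empty item set")
    # Observed agreement: share of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    labels = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n)
        for label in labels
    )
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical selections over eight candidate features (1 = selected).
method_a = [1, 1, 0, 0, 1, 0, 1, 0]
method_b = [1, 1, 0, 0, 0, 0, 1, 1]
kappa = cohen_kappa(method_a, method_b)
print(f"kappa = {kappa:.2f}")  # prints "kappa = 0.50"
```

    Here the two methods agree on six of eight features (0.75 observed agreement) against 0.5 expected by chance, which gives a kappa of 0.5.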

    Predicción del éxito del telemarketing bancario mediante el uso de árboles de decisión

    Telemarketing is an interactive direct marketing technique in which a telemarketing agent solicits potential customers over the phone to sell merchandise or a service. One of the major problems in telemarketing is identifying the list of clients with the highest probability of buying the product on offer. In this article, we propose a personalized decision support system that can automatically predict the decision of the target audience after a telemarketing call, in order to increase the effectiveness of direct advertising campaigns and consequently reduce campaign cost and time. The artificial intelligence method used in this work is the decision tree, evaluated with the metrics of precision, accuracy and recall. After applying the method, we obtain an accuracy, precision and recall greater than 80%. The team concludes that, to improve the decision tree model, it is important to carry out a prior analysis of the data using statistical techniques or diagrams, in order to gain an overview of the data, and to apply balancing techniques to obtain the best possible model.
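
    The three evaluation metrics named in the abstract above (accuracy, precision and recall, the latter being "exhaustividad" in the Spanish text) can be computed from the confusion counts of a binary classifier. The sketch below is a generic illustration; the labels are made-up values and do not reproduce the paper's data or results.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical outcomes: 1 = client bought the product, 0 = client declined.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
```

    On imbalanced telemarketing data, accuracy alone can look high while recall on buyers stays low, which is one reason the abstract's recommendation of balancing techniques matters.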

    Application of Artificial Neural Networks for Classification of Drilling Operations: The deepwater wells case of exploration and production

    The application of automatic methods for classifying unstructured text is extremely valuable for the Oil & Gas industry. Drilling is an operation that incurs high costs, which are proportional to the duration of the activities. Classifying the various operations carried out during drilling is very important for generating duration assumptions for the design of new wells. For this article, two independent procedures were carried out to identify the best model for NPT (Non-Productive Time) and PT (Productive Time). The conclusions point to the Multi-layer Perceptron (MLP) as the best model. The classification system can be used to produce an accurate and detailed report on the activities carried out while drilling a well. This work shows that the daily drilling reports currently available are a rich source of information and can be used to improve the oil well construction process.

    Unfolding the characteristics of incentivized online reviews

    The rapid growth of social media in the last decades led e-commerce into a new era of value co-creation between the seller and the consumer. Since there is no contact with the product, people have to rely on the seller's description, knowing that it may sometimes be biased and not entirely true. Therefore, review systems emerged to provide more trustworthy sources of information, since customer opinions may be less biased. However, the need to control consumers' opinions increased once sellers realized the importance of reviews and their direct impact on sales. One of the methods often used was to offer customers a specific product in exchange for an honest review. Yet these incentivized reviews bias results and skew the overall rating of the products. The current study uses a data mining approach to predict whether a newly published review was incentivized, based on several review features such as the overall rating, the helpfulness rate, and the review length, among others. Additionally, the model was enriched with sentiment score features of the reviews computed through the VADER algorithm. The results provide an in-depth understanding of the phenomenon by identifying the most relevant features that make it possible to differentiate an incentivized from a non-incentivized review, thus providing users and companies with a simple set of rules to identify reviews that are biased without any disclaimer. Such rules include the length of a review, its helpfulness rate, and the overall sentiment polarity score.

    The impact of in-game advertising on brand recall and recognition within non-linear video games

    Video games have changed throughout the years, and new game releases have shown a shift toward non-linear video games, where players are free to choose what to do without the game forcing them to make a specific choice. This can change the effectiveness of advertisements. Recall and recognition are two variables that have been studied over the past years and are crucial for measuring the success of an advertisement, including advertisements integrated in a game. In-game advertising has been studied recently by researchers, with most analyzing the factors that impact recall and recognition levels. However, most studies tend to rely on extremely controlled scenarios where player action and freedom are not allowed. The purpose of this study is to test recall and recognition levels in a non-linear multiplayer video game where players can freely roam the map, almost depicting a real-life scenario. Results suggested that area population and consumer brand involvement are significant predictors of brand recall and recognition, but advertisement size seemed insignificant. Using decision trees, individual player factors proved to have the same importance (sometimes more) as area population and involvement for predicting recall and recognition. Repetition, measured by the number of times a player saw the advertisement fully on screen, was the most important predictor. The results are in line with previous research, but in a non-linear video game context. Businesses should take area population into consideration when placing advertisements in games but should also think about player characteristics.

    Feature selection strategies for improving data-driven decision support in bank telemarketing

    Data mining techniques for unveiling previously undiscovered knowledge have been applied in past years to a wide range of domains, including banking and marketing. Raw data is the basic ingredient for successfully detecting interesting patterns. A key aspect of raw data manipulation is feature engineering, which concerns the correct characterization or selection of relevant features (or variables) that conceal relations with the target goal. This study focuses on feature engineering, aiming to unfold the features that best characterize the problem of selling long-term bank deposits through telemarketing campaigns. For the experimental setup, a case study from a Portuguese bank was addressed, spanning 2008 to 2013 and encompassing the recent global financial crisis. To assess the relevance of this problem, a novel literature analysis using text mining and the latent Dirichlet allocation algorithm was conducted, confirming the existence of a research gap for bank telemarketing. Starting from a dataset containing typical telemarketing contacts and client information, the research followed three different and complementary strategies: first, enriching the dataset with social and economic context features; then, including customer lifetime value related features; finally, applying a divide-and-conquer strategy to split the problem into smaller fractions, leading to optimized sub-problems. Each of the three approaches improved previous results in terms of model metrics related to prediction performance. The relevance of the proposed features was evaluated, confirming the obtained models as credible and valuable for telemarketing campaign managers.

    Unfolding the drivers for academic success: The case of ISCTE-IUL

    Predicting the success of academic students is a major topic in the higher education research community. This study presents a data mining approach to predict academic success at a Portuguese university, ISCTE-IUL, unveiling the features that best explain failures. A dataset covering 10 curricular years of bachelor's degrees was analysed. Feature selection resulted in a characterising set of 68 features, encompassing socio-demographic, social origin, previous education, special statutes and educational path information. Because features are collected at different times, distinct predictions were conducted: three data models were proposed and tested, based on the entrance date and on the end of the first and second curricular semesters. An additional model was designed for outlier degrees (i.e., a 4-year Bachelor). Six algorithms were tested for modelling. A support vector machine (SVM) model achieved the best overall performance and was selected to conduct a data-based sensitivity analysis. A review of relevance and impact allowed meaningful knowledge to be extracted. This approach showed that previous evaluation performance, study gaps and age-related features play a major role in explaining failures at the entrance stage. For subsequent stages, current evaluation performance features reveal their predictive power. It should also be noted that most of the feature groups are represented among each model's most relevant features, revealing that academic success is a combination of a wide range of distinct factors. These and many other findings, such as the increasing impact of age-related features at the end of the first curricular semester, set a baseline for success improvement recommendations and for easier data mining adoption by higher education institutions. Suggested guidelines include providing study support groups to risk profiles and creating monitoring frameworks. From a practical standpoint, a data-driven decision-making framework based on these models can be used to promote academic success.