272 research outputs found

    Telemarketing outcome prediction using an Ensemble-based machine learning technique

    Business organisations often use telemarketing, a form of direct marketing, to reach a wide range of customers within a short time. However, such campaigns need to target an appropriate subset of customers rather than contacting everyone, as people often become annoyed and disengaged when they receive unsolicited communication. Machine learning techniques can aid in this scenario by selecting the customers who are likely to respond positively to a telemarketing campaign. Business organisations can use their CRM-based customer information and embed machine learning techniques in the data analysis process to develop an automated decision-making system that recommends the set of customers to be contacted. A few works in the literature have used machine learning techniques to predict the outcome of telemarketing; however, the majority used a single classifier algorithm or only a balanced dataset. To address this issue, this article proposes an ensemble-based machine learning technique to predict the outcome of telemarketing, which works well even with an imbalanced dataset and achieves 90.29% accuracy.
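    The ensemble idea can be illustrated with a minimal hard-voting sketch. The abstract does not name the base learners, so the three classifiers' outputs below are hypothetical:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier predictions column-wise by majority vote."""
    return [Counter(col).most_common(1)[0][0] for col in zip(*predictions)]

# Hypothetical outputs of three base classifiers for five customers
# (1 = predicted to respond positively to the campaign)
clf_a = [1, 0, 1, 0, 1]
clf_b = [1, 1, 1, 0, 0]
clf_c = [0, 0, 1, 0, 1]
print(majority_vote([clf_a, clf_b, clf_c]))  # [1, 0, 1, 0, 1]
```

    In practice the base learners would be trained models (and imbalance would be handled, e.g., by resampling or class weighting); the voting rule itself is this simple.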

    Identifying Prospective Clients for Long-Term Bank Deposit

    The numerous characteristics of customers are often kept in bank databases, which are utilized to understand who they are. In recent years it has been found, using various Data Mining and Feature Selection (PCA) methods, that customer traits and other factors connected to bank services have a large influence on consumers' decisions. Business analytics is an approach to conducting business that uses an organization's transactional data to learn how business operations can be enhanced, employing data mining methods to determine existing patterns that a firm can incorporate to make significant data-driven choices and to select significant variables. In this project, we apply data mining techniques to the prediction of long-term bank deposits using a well-known bank data collection. From PCA it may appear that customers' income level, poutcome, pdays, and previous (first PC) have the highest impact on prospective clients, but this is in fact not the case. Rather, the bank's prior campaign and the social attributes of the clients (Age, Marital Status, Education, Campaign, Duration) are essential compared to other variables. Finally, k-means clustering is applied to the PCA-reduced data to determine groups of potential customers, which yields an accuracy score of 87.76%.
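    The PCA-then-k-means pipeline can be sketched in a few lines. The data below is synthetic, standing in for the bank customer matrix, and the number of components, k, and iteration count are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the bank customer matrix: two latent groups
X = np.vstack([rng.normal(0, 1, (50, 5)), rng.normal(4, 1, (50, 5))])

# PCA via SVD: project the centred data onto the top-2 principal components
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:2].T                      # reduced data, shape (100, 2)

# Plain k-means (k=2) on the PCA-reduced data
centers = Z[rng.choice(len(Z), 2, replace=False)]
for _ in range(20):
    labels = ((Z[:, None] - centers) ** 2).sum(-1).argmin(1)
    centers = np.array([Z[labels == j].mean(0) if (labels == j).any()
                        else centers[j] for j in range(2)])
```

    The "accuracy" reported in the abstract would then come from comparing cluster assignments against known deposit outcomes, a step that requires the labelled bank dataset.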

    Factors influencing charter flight departure delay

    This study aims to identify the main factors leading to charter flight departure delay through data mining. The data sample analysed consists of 5,484 flights operated by a European airline between 2014 and 2017. The tuned dataset of 33 features was used for modelling departure delay (i.e., whether the flight was delayed by more than 15 minutes). The results proved the value of the proposed approach, with an area under the receiver operating characteristic curve of 0.831, and supported knowledge extraction through data-based sensitivity analysis. The features related to previous flight delay information were considered the most influential toward whether the current flight is delayed, which is consistent with the propagating effect of flight delays. However, it was neither the reason for the previous delay nor the delay duration that accounted for the most relevance; instead, a computed feature indicating whether there were two or more registered reasons accounted for 33% of relevance. The contributions also include using a broader data mining approach, supported by an extensive data understanding and preparation stage that drew on both proprietary and open-access data sources to build a comprehensive dataset.
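    The reported 0.831 can be read as follows: AUC is the probability that a randomly chosen delayed flight receives a higher model score than a randomly chosen on-time flight. A minimal, dependency-free sketch (labels and scores below are made up):

```python
def auc(labels, scores):
    """Area under the ROC curve via the Mann-Whitney rank formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]   # delayed flights
    neg = [s for y, s in zip(labels, scores) if y == 0]   # on-time flights
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = delayed > 15 min) and model scores
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

    An AUC of 0.5 corresponds to random scoring and 1.0 to perfect ranking, so 0.831 indicates a model that ranks delayed flights above on-time ones most of the time.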

    Seleção de atributos usando árvores de decisão não-binárias (Attribute selection using non-binary decision trees)

    Master's in Electronics and Informatics Engineering; public examination held on 22 May 2018.
    Machine learning, a field within artificial intelligence, has as its main objective the creation and development of methods and algorithms with abilities commonly associated with humans, such as the acquisition and discovery of new facts or knowledge. When compared to humans, the main advantages of implementing these methods are usually associated with savings in time and money. To this end, there are several models/algorithms, such as decision trees, neural networks and support vector machines, performing tasks that can also differ, such as classification and attribute selection. To overcome limitations inherent to ID3 decision trees, regarding the handling of continuous variables and viability testing, an adaptation of the original algorithm was developed and implemented in this work, using the same metrics while allowing its application to datasets with continuous variables. This work presents a study of attribute/feature selection and prediction/classification capacity applied to the monitoring of cutting tool conditions (tool wear) and the classification of potential new clients for banking services (bank telemarketing), using ID3 decision trees with the ability to handle continuous variables. The results show that, for small datasets, this algorithm presents the best performance in comparison to conventional decision trees, namely the C4.5, CART and Random Forest algorithms, with an improvement of 12.5% to 25%. For large datasets, despite having the lowest classification score, the difference is negligible (-2%). The developed algorithm stands out because it allows a detailed analysis, contrary to C4.5 and CART, which allow only a general analysis; this is due to the way the algorithms perform splits when working with continuous variables. The attribute selection performed by the adapted algorithm proved to be an asset, whether for later classification with the developed algorithm or with other reference algorithms in machine learning. The results obtained on the tool wear and bank telemarketing datasets show a reduction from 15 to 5 and from 19 to 15 attributes, respectively.
    The applicability of decision trees has been proven both in the monitoring of multi-sensor processes and in the classification of new clients with continuous variables. This approach also revealed that decision trees can be applied for attribute selection in a simple and transparent way, even in the presence of noisy data.
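    The key adaptation, letting an ID3-style tree split on a continuous attribute, amounts to scanning candidate thresholds for the one with maximal information gain. This is a generic sketch of that standard technique, not the thesis's actual implementation:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n)
                for c in (labels.count(v) for v in set(labels)))

def best_threshold(values, labels):
    """Pick the binary split threshold on a continuous attribute that
    maximizes information gain (midpoints between sorted values)."""
    pairs = sorted(zip(values, labels))
    best = (None, -1.0)
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for v, y in pairs[:i]]
        right = [y for v, y in pairs[i:]]
        gain = entropy(labels) - (len(left) * entropy(left)
                                  + len(right) * entropy(right)) / len(labels)
        if gain > best[1]:
            best = (t, gain)
    return best

print(best_threshold([1, 2, 8, 9], [0, 0, 1, 1]))  # (5.0, 1.0)
```

    A tree builder would call this per continuous attribute at each node and split on the attribute/threshold pair with the highest gain, which is how a nominal-only ID3 is extended to numeric data.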

    ReConTab: Regularized Contrastive Representation Learning for Tabular Data

    Representation learning stands as one of the critical machine learning techniques across various domains. Through the acquisition of high-quality features, pre-trained embeddings significantly reduce input space redundancy, benefiting downstream pattern recognition tasks such as classification, regression, or detection. Nonetheless, in the domain of tabular data, feature engineering and selection still heavily rely on manual intervention, leading to time-consuming processes and necessitating domain expertise. In response to this challenge, we introduce ReConTab, a deep automatic representation learning framework with regularized contrastive learning. Agnostic to any type of modeling task, ReConTab constructs an asymmetric autoencoder based on the same raw features from model inputs, producing low-dimensional representative embeddings. Specifically, regularization techniques are applied for raw feature selection. Meanwhile, ReConTab leverages contrastive learning to distill the most pertinent information for downstream tasks. Experiments conducted on extensive real-world datasets substantiate the framework's capacity to yield substantial and robust performance improvements. Furthermore, we empirically demonstrate that pre-trained embeddings can seamlessly integrate as easily adaptable features, enhancing the performance of various traditional methods such as XGBoost and Random Forest
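    The contrastive component can be illustrated with an InfoNCE-style loss over two embedded views of the same rows; this is a generic sketch, not ReConTab's exact regularized objective:

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    """InfoNCE-style contrastive loss: row i of z1 and row i of z2 are the
    positive pair; all other rows of z2 serve as in-batch negatives."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = z1 @ z2.T / tau                               # cosine similarities
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                  # -log softmax of positives

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 4))           # hypothetical row embeddings
loss_matched = info_nce(z, z)         # positives aligned: low loss
loss_shuffled = info_nce(z, z[::-1])  # positives misaligned: higher loss
```

    In ReConTab's framework this kind of objective is paired with an asymmetric autoencoder reconstruction loss and regularization on the raw features; the sketch above shows only the contrastive term.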

    From Theory to Practice: A Data Quality Framework for Classification Tasks

    Data preprocessing is an essential step in knowledge discovery projects. Experts affirm that preprocessing tasks take between 50% and 70% of the total time of the knowledge discovery process, and several authors consider data cleaning one of the most cumbersome and critical tasks. Failure to provide high data quality in the preprocessing stage will significantly reduce the accuracy of any data analytics project. In this paper, we propose a framework, DQF4CT, to address data quality issues in classification tasks. Our approach is composed of: (i) a conceptual framework that guides the user on how to deal with data problems in classification tasks; and (ii) an ontology that represents knowledge about data cleaning and suggests the proper data cleaning approaches. We present two case studies on real datasets: physical activity monitoring (PAM) and occupancy detection of an office room (OD). To evaluate our proposal, the datasets cleaned by DQF4CT were used to train the same algorithms used by the authors of PAM and OD in their classification tasks. Additionally, we evaluated DQF4CT on datasets from the Repository of Machine Learning Databases of the University of California, Irvine (UCI); 84% of the models trained on datasets cleaned by DQF4CT outperform the dataset authors' models.
    This work has also been supported by: Project: “Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca”, Convocatoria 03-2018 Publicación de artículos en revistas de alto impacto; Project: “Alternativas Innovadoras de Agricultura Inteligente para sistemas productivos agrícolas del departamento del Cauca soportado en entornos de IoT - ID 4633”, financed by Convocatoria 04C–2018 “Banco de Proyectos Conjuntos UEES-Sostenibilidad” of Project “Red de formación de talento humano para la innovación social y productiva en el Departamento del Cauca InnovAcción Cauca”; and the Spanish Ministry of Economy, Industry and Competitiveness (Projects TRA2015-63708-R and TRA2016-78886-C3-1-R).

    The Role of the Management Sciences in Research on Personalization

    We present a review of research studies that deal with personalization. We synthesize current knowledge about these areas and identify issues that we envision will be of interest to researchers working in the management sciences. We take an interdisciplinary approach that spans the areas of economics, marketing, information technology, and operations. We present an overarching framework for personalization that allows us to identify the key players in the personalization process, as well as the key stages of personalization. The framework enables us to examine the strategic role of personalization in the interactions between a firm and the other key players in the firm's value system. We review extant literature on the strategic behavior of firms and discuss opportunities for analytical and empirical research in this regard. Next, we examine how a firm can learn a customer's preferences, which is one of the key components of the personalization process. We use a utility-based approach to formalize such preference functions and to understand how these preference functions could be learnt from a customer's interactions with a firm. We identify well-established techniques in the management sciences that can be gainfully employed in future research on personalization.
    Keywords: CRM, Personalization, Marketing, e-commerce.
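    One way to make the utility-based idea concrete: assume a linear utility over item attributes and recover its weights from noisy interaction feedback. All the numbers below are synthetic, and the linear form is a simplifying assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(40, 3))           # 40 items, 3 attributes (synthetic)
w_true = np.array([2.0, -1.0, 0.5])    # the customer's latent utility weights
ratings = A @ w_true + rng.normal(scale=0.1, size=40)  # noisy observed feedback

# Least-squares estimate of the preference function from interactions
w_hat, *_ = np.linalg.lstsq(A, ratings, rcond=None)
```

    In practice preferences are more often revealed through clicks or purchases than numeric ratings, so logistic or ranking losses would replace least squares, but the estimation logic is the same.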

    Benchmarking insider threat intrusion detection systems

    viii, 97 leaves : ill. ; 29 cm. Includes abstract. Includes bibliographical references (leaves 88-97).
    An intrusion detection system (IDS) generally detects unwanted manipulations of computer systems. In recent years, this technology has been used to protect personal information after it has been collected by an organization. Selecting an appropriate IDS is an important decision for system security administrators seeking to keep authorized employees from abusing their access to the system to exploit sensitive information. To date, little work has been done to create a benchmark for small and mid-size organizations to measure and compare the capability of different insider threat IDSs that are based on user profiling. This motivates us to create a benchmark that enables organizations to compare these IDSs. The benchmark is used to produce useful comparisons of the accuracy and overhead of two key research implementations of future insider threat intrusion algorithms, both based on user behavior.

    Financial revolution: a systemic analysis of artificial intelligence and machine learning in the banking sector

    This paper reviews the advances, challenges, and approaches of artificial intelligence (AI) and machine learning (ML) in the banking sector. The use of these technologies is accelerating in various industries, including banking; however, the literature on banking is scattered, making a global understanding difficult. This study reviewed the main approaches in terms of applications and algorithmic models, as well as the benefits and challenges associated with their implementation in banking. It also includes a bibliometric analysis of variables related to the distribution of publications and the most productive countries, along with an analysis of the co-occurrence and dynamics of keywords. Following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) framework, forty articles were selected for review. The results indicate that these technologies are used in the banking sector for customer segmentation, credit risk analysis, recommendation, and fraud detection. Credit analysis and fraud detection are the most implemented areas, using algorithms such as random forests (RF), decision trees (DT), support vector machines (SVM), and logistic regression (LR), among others. Their use brings significant benefits for decision-making and for optimizing banking operations; however, handling substantial amounts of data with these technologies poses ethical challenges.