
    Modeling the Telemarketing Process using Genetic Algorithms and Extreme Boosting: Feature Selection and Cost-Sensitive Analytical Approach

    Currently, almost all direct marketing activities take place virtually rather than in person, weakening interpersonal skills at an alarming pace. Furthermore, businesses have been striving to sense and foster the tendency of their clients to accept a marketing offer. The digital transformation and the increased virtual presence have forced firms to seek novel marketing research approaches. This research aims at leveraging the power of telemarketing data to model the willingness of clients to make a term deposit and to identify the most significant client characteristics. Real-world data from a Portuguese bank and national socio-economic metrics are used to model the telemarketing decision-making process. The research makes two key contributions. First, it proposes a novel genetic-algorithm-based classifier that selects the best discriminating features and tunes classifier parameters simultaneously. Second, it builds an explainable prediction model. The best classification models were intensively validated using 50-times-repeated 10-fold stratified cross-validation, and the selected features were analyzed. The models significantly outperform related work on the class of interest, attaining an average geometric mean of 89.07% and an average type I error of 0.059. The model is expected to maximize the potential profit margin at the least possible cost and to provide more insights to support marketing decision-making.
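
    As a hedged illustration of the approach described in this abstract, the sketch below jointly evolves a feature mask and two XGBoost hyper-parameters with a small genetic algorithm, scoring candidates by cross-validated geometric mean with a cost-sensitive class weight. The synthetic data, genome layout, candidate grids and GA settings are assumptions for illustration, not the authors' implementation; the paper's 50-times-repeated 10-fold protocol is scaled down here for speed.

```python
# Hedged sketch: a GA-style wrapper that jointly selects features and tunes two
# XGBoost hyper-parameters, scored by cross-validated geometric mean.
# Data, genome layout and GA settings are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, weights=[0.88], random_state=0)

DEPTHS = [2, 3, 4, 6]          # candidate max_depth values (assumed grid)
LRS = [0.05, 0.1, 0.2, 0.3]    # candidate learning_rate values (assumed grid)

def g_mean(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return np.sqrt(tp / (tp + fn) * tn / (tn + fp))

scorer = make_scorer(g_mean)
# The paper reports 50x repeated 10-fold stratified CV; kept small here for speed.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

def fitness(genome):
    mask, d_idx, lr_idx = genome[:-2].astype(bool), int(genome[-2]), int(genome[-1])
    if not mask.any():
        return 0.0
    clf = XGBClassifier(n_estimators=100, max_depth=DEPTHS[d_idx],
                        learning_rate=LRS[lr_idx],
                        scale_pos_weight=(y == 0).sum() / (y == 1).sum(),  # cost-sensitive weight
                        eval_metric="logloss")
    return cross_val_score(clf, X[:, mask], y, scoring=scorer, cv=cv).mean()

def random_genome():
    return np.concatenate([rng.integers(0, 2, X.shape[1]),
                           [rng.integers(len(DEPTHS)), rng.integers(len(LRS))]])

pop = [random_genome() for _ in range(12)]
for gen in range(5):                                   # few generations for the sketch
    scores = np.array([fitness(g) for g in pop])
    parents = [pop[max(rng.choice(len(pop), 3), key=lambda i: scores[i])]
               for _ in range(len(pop))]               # tournament selection
    children = []
    for a, b in zip(parents[::2], parents[1::2]):
        cut = rng.integers(1, len(a))                  # one-point crossover
        for child in (np.concatenate([a[:cut], b[cut:]]),
                      np.concatenate([b[:cut], a[cut:]])):
            flip = rng.random(X.shape[1]) < 0.05       # bit-flip mutation on the feature mask
            child[:X.shape[1]] = np.where(flip, 1 - child[:X.shape[1]], child[:X.shape[1]])
            children.append(child)
    pop = children

best = max(pop, key=fitness)
print("selected features:", np.flatnonzero(best[:-2]), "g-mean:", round(fitness(best), 3))
```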

    Feature selection strategies for improving data-driven decision support in bank telemarketing

    The usage of data mining techniques to unveil previously undiscovered knowledge has been applied in recent years to a wide range of domains, including banking and marketing. Raw data is the basic ingredient for successfully detecting interesting patterns. A key aspect of raw data manipulation is feature engineering, which concerns the correct characterization or selection of relevant features (or variables) that hold relations with the target goal. This study focuses on feature engineering, aiming to unveil the features that best characterize the problem of selling long-term bank deposits through telemarketing campaigns. For the experimental setup, a case study from a Portuguese bank, spanning the 2008-2013 period and encompassing the recent global financial crisis, was addressed. To assess the relevance of this problem, a novel literature analysis using text mining and the latent Dirichlet allocation algorithm was conducted, confirming the existence of a research gap for bank telemarketing. Starting from a dataset containing typical telemarketing contacts and client information, the research followed three different and complementary strategies: first, enriching the dataset with social and economic context features; then, including customer lifetime value related features; and finally, applying a divide-and-conquer strategy to split the problem into smaller fractions, leading to optimized sub-problems. Each of the three approaches improved previous results in terms of model metrics related to prediction performance. The relevance of the proposed features was evaluated, confirming the obtained models as credible and valuable for telemarketing campaign managers.
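
    The first strategy above, enriching contact records with social and economic context features, can be pictured with the hedged sketch below: monthly indicators are joined onto a contacts table and a cross-validated AUC is compared with and without them. All data here is synthetic and the column names (year_month, euribor_3m, unemployment_rate, and so on) are assumptions, not the bank's actual schema.

```python
# Hedged sketch of context-feature enrichment on synthetic, assumed-schema data.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 1000
months = pd.period_range("2008-01", "2013-06", freq="M").astype(str).to_numpy()

contacts = pd.DataFrame({                         # one row per telemarketing call (assumed)
    "year_month": rng.choice(months, n),
    "age": rng.integers(18, 90, n),
    "campaign_calls": rng.integers(1, 10, n),
})
context = pd.DataFrame({                          # monthly socio-economic indicators (assumed)
    "year_month": months,
    "euribor_3m": rng.normal(2.0, 1.0, len(months)),
    "unemployment_rate": rng.normal(10.0, 2.0, len(months)),
})
data = contacts.merge(context, on="year_month", how="left")

# Synthetic target loosely driven by the context indicator, so enrichment should help here.
y = (data["euribor_3m"] + rng.normal(0, 1.5, n) < 1.5).astype(int)

for name, cols in [("baseline", ["age", "campaign_calls"]),
                   ("with context", ["age", "campaign_calls", "euribor_3m", "unemployment_rate"])]:
    auc = cross_val_score(GradientBoostingClassifier(), data[cols], y,
                          cv=10, scoring="roc_auc").mean()
    print(f"{name:>12}: mean 10-fold AUC = {auc:.3f}")
```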

    Framework for data quality in knowledge discovery tasks

    The creation and consumption of data continue to grow by leaps and bounds. Due to advances in Information and Communication Technologies (ICT), the data explosion in the digital universe is now a clear trend, and Knowledge Discovery in Databases (KDD) has gained importance because of the abundance of data. A successful knowledge discovery process requires careful data preparation. Experts affirm that the preprocessing phase takes 50% to 70% of the total time of a knowledge discovery process. Software tools based on knowledge discovery methodologies offer algorithms for data preprocessing. According to the 2018 Gartner Magic Quadrant for Data Science and Machine Learning Platforms, KNIME, RapidMiner, SAS, Alteryx and H2O.ai are the leading tools for knowledge discovery. These tools provide different techniques and facilitate the evaluation of datasets; however, they lack any kind of guidance as to which techniques can or should be used in which contexts. Consequently, selecting suitable data cleaning techniques is a headache for inexpert users, who have no idea which methods can be confidently used and often resort to trial and error. This thesis presents three contributions to address these problems: (i) a conceptual framework that guides the user in addressing data quality issues in knowledge discovery tasks, (ii) a case-based reasoning system that recommends suitable algorithms for data cleaning, and (iii) an ontology that represents knowledge about data quality issues and data cleaning methods. This ontology also supports the case-based reasoning system in the case representation and reuse phases.
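
    A minimal, hedged sketch of the case-based reasoning idea: past cases pair a simple data-quality profile with the cleaning algorithm that worked, and a new dataset is matched to its nearest case to obtain a recommendation. The profile features, case base and method names are illustrative assumptions, not the thesis' ontology-backed case representation.

```python
# Hedged sketch of nearest-case retrieval for data-cleaning recommendations.
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Each case: [missing_rate, outlier_rate, duplicate_rate] -> cleaning step that worked (assumed)
case_profiles = np.array([
    [0.30, 0.02, 0.01],
    [0.02, 0.15, 0.01],
    [0.01, 0.01, 0.20],
])
case_solutions = ["impute_missing_knn", "winsorize_outliers", "drop_duplicates"]

retriever = NearestNeighbors(n_neighbors=1).fit(case_profiles)

new_dataset_profile = np.array([[0.25, 0.03, 0.02]])   # profile of an incoming dataset (assumed)
_, idx = retriever.kneighbors(new_dataset_profile)
print("recommended cleaning step:", case_solutions[idx[0][0]])   # -> impute_missing_knn
```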

    Employer training pilots : first year evaluation report


    Ensemble learning with dynamic weighting for response modeling in direct marketing

    Response modeling, a key to successful direct marketing, has become increasingly prevalent in recent years. However, in practice it suffers from class imbalance: the number of responding (target) customers is often much smaller than the number of non-responding customers. This issue results in a response model that is biased toward the majority class, leading to low prediction accuracy on the responding customers. In this study, we develop an Ensemble Learning with Dynamic Weighting (ELDW) approach to address this problem. The proposed ELDW includes two stages. In the first stage, all the minority class instances are combined with different majority class instances to form a number of training subsets, and a base classifier is trained on each subset. In the second stage, the results of the base classifiers are dynamically integrated, taking two factors into account: the cross entropy of neighbors in each subset, and the feature similarity to the minority class instances. To evaluate the performance of ELDW, we conduct experimental studies on 10 imbalanced benchmark datasets. The results show that, compared with other state-of-the-art imbalance classification algorithms, ELDW achieves higher accuracy on the minority class. Finally, we apply ELDW to a direct marketing activity of an insurance company to identify target customers under a limited budget.
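
    The two-stage structure can be pictured with the hedged sketch below. Stage one follows the abstract: every minority instance plus a different random slice of the majority class trains one base classifier. Stage two here uses only a simplified, distance-based weight (closeness of the test point to each subset's data); the paper's dynamic weighting also involves the cross entropy of neighbors, which is not reproduced.

```python
# Hedged sketch of an undersampling ensemble with a simplified dynamic weighting stage.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, weights=[0.93], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

rng = np.random.default_rng(0)
minority = X_tr[y_tr == 1]
majority = X_tr[y_tr == 0]

models, centroids = [], []
for _ in range(10):                                    # 10 balanced training subsets
    sample = majority[rng.choice(len(majority), len(minority), replace=False)]
    Xs = np.vstack([minority, sample])
    ys = np.concatenate([np.ones(len(minority)), np.zeros(len(sample))])
    models.append(DecisionTreeClassifier(max_depth=5, random_state=0).fit(Xs, ys))
    centroids.append(Xs.mean(axis=0))                  # crude summary of each subset

# Simplified dynamic integration: weight each base model by the test point's closeness
# to its subset (a stand-in for the paper's neighbor-based weighting).
centroids = np.array(centroids)
dists = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
weights = 1.0 / (dists + 1e-9)
weights /= weights.sum(axis=1, keepdims=True)

probs = np.array([m.predict_proba(X_te)[:, 1] for m in models]).T   # (n_test, n_models)
y_pred = ((probs * weights).sum(axis=1) >= 0.5).astype(int)

recall_minority = (y_pred[y_te == 1] == 1).mean()
print(f"minority-class recall: {recall_minority:.3f}")
```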

    Preventing Discriminatory Decision-making in Evolving Data Streams

    Bias in machine learning has rightly received significant attention over the last decade. However, most fair machine learning (fair-ML) work to address bias in decision-making systems has focused solely on the offline setting. Despite the wide prevalence of online systems in the real world, work on identifying and correcting bias in the online setting is severely lacking. The unique challenges of the online environment make addressing bias more difficult than in the offline setting. First, Streaming Machine Learning (SML) algorithms must deal with the constantly evolving real-time data stream. Second, they need to adapt to changing data distributions (concept drift) to make accurate predictions on new incoming data. Adding fairness constraints to this already complicated task is not straightforward. In this work, we focus on the challenges of achieving fairness in biased data streams while accounting for the presence of concept drift, accessing one sample at a time. We present Fair Sampling over Stream (FS^2), a novel fair rebalancing approach capable of being integrated with SML classification algorithms. Furthermore, we devise the first unified performance-fairness metric, Fairness Bonded Utility (FBU), to efficiently evaluate and compare the trade-off between performance and fairness of different bias mitigation methods. FBU simplifies the comparison of fairness-performance trade-offs of multiple techniques through one unified and intuitive evaluation, allowing model designers to easily choose a technique. Overall, extensive evaluations show that our measures surpass those of other fair online techniques previously reported in the literature.
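
    As a hedged illustration of the streaming setting only, the sketch below runs a prequential (test-then-train) loop over a synthetic biased stream, up-weights under-represented (group, label) combinations when updating the model, and reports accuracy and a demographic-parity gap. This is not the FS^2 algorithm or the FBU metric from the abstract; all data and weighting choices are assumptions for illustration.

```python
# Hedged sketch of a one-sample-at-a-time (prequential) loop with naive rebalancing.
# NOT FS^2 or FBU; it only illustrates the streaming, sample-by-sample setting.
import numpy as np
from collections import Counter
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(loss="log_loss")
seen = Counter()                                      # counts of (group, label) seen so far
correct, preds_by_group = 0, {0: [], 1: []}

n = 5000
for t in range(n):
    group = int(rng.random() < 0.3)                   # synthetic sensitive attribute
    x = rng.normal(size=3) + (0.8 if group else 0.0)  # toy features, biased by group
    y = int(x.sum() + rng.normal() > 1.2)
    X = x.reshape(1, -1)

    if t > 0:                                         # test first ...
        p = int(model.predict(X)[0])
        correct += p == y
        preds_by_group[group].append(p)

    # ... then train, with extra weight if this (group, label) combination is rare so far.
    seen[(group, y)] += 1
    weight = max(seen.values()) / seen[(group, y)]
    model.partial_fit(X, [y], classes=[0, 1], sample_weight=[weight])

dp_gap = abs(np.mean(preds_by_group[0]) - np.mean(preds_by_group[1]))
print(f"prequential accuracy: {correct / (n - 1):.3f}, demographic-parity gap: {dp_gap:.3f}")
```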

    Unveiling the features of successful ebay sellers of smartphones: a data mining sales predictive model

    JEL Classification: M310, C380. eBay is one of the largest online retail corporations worldwide, providing numerous ways for customers to give feedback on registered sellers. With the advent of Web 2.0 and online shopping, an immense amount of data is collected from manifold devices. This data is often unstructured, which inevitably calls for further treatment that enables classification, discovery of patterns and trends, or prediction of outcomes. That treatment implies the usage of increasingly complex and combined statistical tools as the size of datasets grows. Nowadays, datasets may extend to several exabytes, which can be transformed into knowledge using adequate methods. The aim of the present study is to evaluate and analyse which seller and product attributes, such as feedback ratings and price, influence sales of smartphones on eBay, and in what way, using a data mining framework and techniques. The methods include support vector machine (SVM) algorithms for modelling the sales of smartphones by eBay sellers, combined with a 10-fold cross-validation scheme to ensure model robustness, the metrics MAE, RAE and NMAE to gauge prediction accuracy, and a sensitivity analysis to assess the influence of individual features on sales. The methods were considered effective for both model evaluation and knowledge extraction, reaching positive results, although with some discrepancies between the different prediction accuracy metrics. Lastly, it was found that the number of items in auction, the average price and the variety of products available from a given seller were the most significant attributes, i.e., the largest contributors to sales.
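
    A hedged sketch of this pipeline shape: an SVM regressor scored by 10-fold cross-validation with MAE, RAE and NMAE, followed by a one-at-a-time sensitivity sweep per feature. The data and feature names (items in auction, average price, feedback ratio) are synthetic stand-ins, not the study's eBay dataset, and NMAE is assumed here to be MAE normalised by the target range.

```python
# Hedged sketch: SVR + 10-fold CV with MAE/RAE/NMAE and a simple sensitivity analysis.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
n = 400
X = np.column_stack([
    rng.integers(1, 50, n),        # items listed in auction (assumed feature)
    rng.normal(300, 80, n),        # average price (assumed feature)
    rng.uniform(0.8, 1.0, n),      # positive-feedback ratio (assumed feature)
])
sales = 2.0 * X[:, 0] - 0.05 * X[:, 1] + 100 * X[:, 2] + rng.normal(0, 5, n)  # synthetic target

model = make_pipeline(StandardScaler(), SVR(C=10.0))
pred = cross_val_predict(model, X, sales, cv=10)

mae = np.mean(np.abs(sales - pred))
rae = np.sum(np.abs(sales - pred)) / np.sum(np.abs(sales - sales.mean()))
nmae = mae / (sales.max() - sales.min())              # normalised by the target range (assumed)
print(f"MAE={mae:.2f}  RAE={rae:.3f}  NMAE={nmae:.3f}")

# One-at-a-time sensitivity: sweep each feature over its observed range, others at their mean.
model.fit(X, sales)
for j, name in enumerate(["items_in_auction", "avg_price", "feedback_ratio"]):
    grid = np.tile(X.mean(axis=0), (50, 1))
    grid[:, j] = np.linspace(X[:, j].min(), X[:, j].max(), 50)
    swing = np.ptp(model.predict(grid))               # response range attributable to feature j
    print(f"sensitivity of predicted sales to {name}: {swing:.1f}")
```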

    LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

    The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning -- which distinguish between its many forms -- correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables. (Comment: 143 pages, 79 tables, 4 figures)
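
    A hedged sketch of the kind of harness such an evaluation needs: format a task instance into a few-shot prompt, query a model, and score exact-match accuracy against the gold label. The example items, label set and the classify_with_llm stub are illustrative placeholders, not actual LegalBench tasks or the authors' evaluation code.

```python
# Hedged sketch of a prompt-and-score harness; items and the model stub are placeholders.
from typing import Callable

FEW_SHOT = (
    "Classify the mark as generic, descriptive, suggestive, arbitrary, or fanciful.\n"
    "Mark: 'Apple' for computers. Label: arbitrary\n"
)

examples = [  # placeholder instances in a LegalBench-style classification format
    {"text": "Mark: 'Salt' for table salt.", "label": "generic"},
    {"text": "Mark: 'Kodak' for cameras.", "label": "fanciful"},
]

def evaluate(classify_with_llm: Callable[[str], str]) -> float:
    """Exact-match accuracy of a prompt->label function over the task examples."""
    hits = 0
    for ex in examples:
        prompt = FEW_SHOT + ex["text"] + " Label:"
        hits += classify_with_llm(prompt).strip().lower() == ex["label"]
    return hits / len(examples)

# Stand-in "model" so the harness runs end to end; a real run would call an LLM here.
print("accuracy:", evaluate(lambda prompt: "generic"))
```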

    Factors influencing marketing strategy formulation for small and medium enterprises in Polokwane

    The aim of the study was to investigate the factors influencing marketing strategy formulation for small and medium enterprises (SMEs) in Polokwane. SMEs worldwide are regarded as a cornerstone of economic development. However, SMEs face business marketing constraints that can lead to their downfall. The study objectives were set and led to the formulation of the hypotheses. The study adopted a quantitative approach and collected data from a sample of 412 SMEs in Polokwane using non-probability sampling methods, namely convenience and snowball sampling. A survey questionnaire was used to collect responses. A pilot test was conducted, and its responses were used to eliminate unnecessary and confusing statements. Respondents were required to indicate their agreement or disagreement with statements on a five-point Likert scale. Descriptive statistical analysis, factor analysis, ANOVA and regression analysis were performed to determine whether the objectives of the study were achieved and to test the hypotheses. SMEs were found to be using sales promotions, digital marketing, business branding, personal selling and email communication as part of their marketing communication strategies. Product strategies used are product quality and packaging, branding and collaborative product development. The findings show that SMEs' marketing communication and product strategies have a positive influence on their performance. The main challenges experienced by SMEs are lack of understanding of marketing research, lack of finance, lack of business planning, inexperienced employees and, the challenge experienced least, lack of customer demand. It was further found that demographic factors (business operation/maturity, business training and annual turnover) have a significant influence on the marketing communication strategies adopted and on the challenges facing SMEs. SMEs that have been in existence for over 10 years perceive marketing challenges differently from SMEs with less than 10 years of operation. It is necessary for the government to put in place progressive policies that can assist SMEs to improve their marketing strategies. It was recommended that SMEs attend marketing-related training to be equipped with marketing and business operations knowledge and so minimise the marketing challenges they experience. Training will enable SMEs to do better in marketing communication and/or product strategies.
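
    The analysis step (regression and ANOVA) can be pictured with the hedged sketch below, which regresses a performance score on strategy adoption and runs a one-way ANOVA across maturity groups. The variable names and the synthetic Likert-style data are assumptions, not the study's survey instrument; only the sample size of 412 is taken from the abstract.

```python
# Hedged sketch: OLS regression of performance on strategy scores plus a one-way ANOVA.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 412                                             # sample size reported in the abstract
df = pd.DataFrame({
    "comm_strategy": rng.integers(1, 6, n),         # 5-point Likert scores (assumed)
    "product_strategy": rng.integers(1, 6, n),
    "maturity": rng.choice(["under_10y", "over_10y"], n),
})
df["performance"] = (0.4 * df["comm_strategy"] + 0.3 * df["product_strategy"]
                     + rng.normal(0, 1, n))

ols = smf.ols("performance ~ comm_strategy + product_strategy", data=df).fit()
print(ols.params)                                   # positive coefficients ~ positive influence

anova = sm.stats.anova_lm(smf.ols("performance ~ C(maturity)", data=df).fit())
print(anova)                                        # does the maturity group matter?
```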