Modeling the Telemarketing Process using Genetic Algorithms and Extreme Boosting: Feature Selection and Cost-Sensitive Analytical Approach
Currently, almost all direct marketing activities take place virtually rather
than in person, weakening interpersonal skills at an alarming pace.
Furthermore, businesses have been striving to sense and foster the tendency of
their clients to accept a marketing offer. The digital transformation and the
increased virtual presence forced firms to seek novel marketing research
approaches. This research aims at leveraging the power of telemarketing data in
modeling the willingness of clients to make a term deposit and finding the most
significant characteristics of the clients. Real-world data from a Portuguese
bank and national socio-economic metrics are used to model the telemarketing
decision-making process. This research makes two key contributions. First, we
propose a novel genetic algorithm-based classifier that selects the most
discriminating features and tunes classifier parameters simultaneously. Second,
we build an explainable prediction model. The best-generated classification models
were intensively validated using 50 times repeated 10-fold stratified
cross-validation, and the selected features were analyzed. The models
significantly outperform related work in terms of class-of-interest
accuracy: they attained an average geometric mean of 89.07% and a type I error
of 0.059. The model is expected to maximize the
potential profit margin at the least possible cost and provide more insights to
support marketing decision-making.
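The two figures quoted in the abstract above can be illustrated with a minimal sketch. It assumes the usual definitions, which the abstract does not spell out: geometric mean as the square root of sensitivity times specificity, and type I error as the false positive rate. The function name and the confusion-matrix counts are hypothetical.

```python
import math

def g_mean_and_type1(tp, fn, fp, tn):
    """Return (geometric mean, type I error) from confusion-matrix counts.

    Assumed definitions (not taken from the paper):
      sensitivity  = TP / (TP + FN)
      specificity  = TN / (TN + FP)
      g-mean       = sqrt(sensitivity * specificity)
      type I error = FP / (FP + TN), i.e. the false positive rate
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    g_mean = math.sqrt(sensitivity * specificity)
    type1_error = fp / (fp + tn)
    return g_mean, type1_error

# Hypothetical counts for one validation fold.
g, t1 = g_mean_and_type1(tp=90, fn=10, fp=5, tn=95)
print(f"g-mean={g:.4f}, type I error={t1:.4f}")
```

Under 50 times repeated 10-fold cross-validation, such values would be computed on each of the 500 held-out folds and then averaged.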
Feature selection strategies for improving data-driven decision support in bank telemarketing
Data mining techniques for unveiling previously undiscovered knowledge have
been applied in recent years to a wide range of domains, including banking and marketing. Raw
data is the basic ingredient for successfully detecting interesting patterns. A key aspect of raw
data manipulation is feature engineering, which concerns the correct characterization or
selection of relevant features (or variables) that conceal relations with the target goal.
This study focuses on feature engineering, aiming to uncover the
features that best characterize the problem of selling long-term bank deposits through
telemarketing campaigns. For the experimental setup, a case study from a Portuguese bank,
spanning the 2008-2013 period and encompassing the recent global financial crisis, was
addressed. To assess the relevance of this problem, a novel literature analysis using text
mining and the latent Dirichlet allocation algorithm was conducted, confirming the existence of a
research gap for bank telemarketing.
Starting from a dataset containing typical telemarketing contacts and client information,
the research followed three different and complementary strategies: first, enriching the dataset
with social and economic context features; then, including customer lifetime value related
features; finally, applying a divide-and-conquer strategy that splits the problem into smaller
parts, leading to optimized sub-problems. Each of the three approaches improved on previous
results in terms of model metrics related to prediction performance. The relevance of the
proposed features was evaluated, confirming the obtained models as credible and valuable for
telemarketing campaign managers.
Framework for data quality in knowledge discovery tasks
The creation and consumption of data continue to grow by leaps and bounds. Due
to advances in Information and Communication Technologies (ICT), the
data explosion in the digital universe is now a trend. Knowledge Discovery
in Databases (KDD) has gained importance due to the abundance of data. A successful
knowledge discovery process requires data preparation. Experts
state that the preprocessing phase takes 50% to 70% of the total time of a
knowledge discovery process.
Software tools based on knowledge discovery methodologies offer algorithms
for data preprocessing. According to the Gartner 2018 Magic Quadrant for
Data Science and Machine Learning Platforms, KNIME, RapidMiner, SAS, Alteryx
and H2O.ai are the leading tools for knowledge discovery. These software
tools provide various techniques that facilitate the evaluation of datasets;
however, they lack any kind of guidance as to which techniques
can or should be used in which contexts. Consequently, choosing suitable data
cleaning techniques is difficult for inexpert users, who do not know which
methods can be confidently used and often resort to trial and error.
This thesis presents three contributions to address these problems:
(i) a conceptual framework that guides the user in addressing data quality
issues in knowledge discovery tasks, (ii) a case-based reasoning system that
recommends suitable algorithms for data cleaning, and (iii) an ontology that
represents knowledge about data quality issues and data cleaning methods. In addition,
this ontology supports the case-based reasoning system in case representation
and the reuse phase. Programa Oficial de Doctorado en Ciencia y Tecnología Informática. President: Fernando Fernández Rebollo. Secretary: Gustavo Adolfo Ramírez. Committee member: Juan Pedro Caraça-Valente Hernánde
Ensemble learning with dynamic weighting for response modeling in direct marketing
Response modeling, a key to successful direct marketing, has become increasingly prevalent in recent years. In practice, however, it suffers from class imbalance: the number of responding (target) customers is often much smaller than the number of non-responding customers. This imbalance results in a response model biased toward the majority class, leading to low prediction accuracy on the responding customers. In this study, we develop an Ensemble Learning with Dynamic Weighting (ELDW) approach to address this problem. The proposed ELDW includes two stages. In the first stage, all the minority class instances are combined with different majority class instances to form a number of training subsets, and a base classifier is trained on each subset. In the second stage, the results of the base classifiers are dynamically integrated, with two factors considered. The first factor is the cross entropy of neighbors in each subset, and the second is the feature similarity to the minority class instances. To evaluate the performance of ELDW, we conduct experimental studies on 10 imbalanced benchmark datasets. The results show that, compared with other state-of-the-art imbalance classification algorithms, ELDW achieves higher accuracy on the minority class. Finally, we apply ELDW to a direct marketing activity of an insurance company to identify the target customers under a limited budget.
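The first stage described above (every training subset contains all minority instances plus a different group of majority instances) might be sketched as follows. This is an illustration only: the abstract does not say how the majority instances are divided, so the disjoint random partition used here is an assumption, and the function name and toy data are hypothetical. The dynamic weighting of the second stage is not shown.

```python
import random

def make_training_subsets(minority, majority, n_subsets, seed=0):
    """Pair all minority instances with a different slice of the majority class.

    Assumption: the majority class is shuffled and split into n_subsets
    disjoint parts; the paper may select majority instances differently.
    """
    if not 1 <= n_subsets <= len(majority):
        raise ValueError("n_subsets must be between 1 and len(majority)")
    rng = random.Random(seed)
    shuffled = list(majority)
    rng.shuffle(shuffled)
    subsets = []
    for i in range(n_subsets):
        majority_part = shuffled[i::n_subsets]  # every n-th item, offset by i
        subsets.append(list(minority) + majority_part)
    return subsets

# Toy data: 3 responders (minority) and 12 non-responders (majority).
responders = [("resp", i) for i in range(3)]
non_responders = [("non", i) for i in range(12)]
subsets = make_training_subsets(responders, non_responders, n_subsets=4)
print([len(s) for s in subsets])
```

A base classifier would then be trained on each subset, so that each one sees a far less skewed class ratio than the full dataset.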
Preventing Discriminatory Decision-making in Evolving Data Streams
Bias in machine learning has rightly received significant attention over the
last decade. However, most fair machine learning (fair-ML) work to address bias
in decision-making systems has focused solely on the offline setting. Despite
the wide prevalence of online systems in the real world, work on identifying
and correcting bias in the online setting is severely lacking. The unique
challenges of the online environment make addressing bias more difficult than
in the offline setting. First, Streaming Machine Learning (SML) algorithms must
deal with the constantly evolving real-time data stream. Second, they need to
adapt to changing data distributions (concept drift) to make accurate
predictions on new incoming data. Adding fairness constraints to this already
complicated task is not straightforward. In this work, we focus on the
challenges of achieving fairness in biased data streams while accounting for
the presence of concept drift, accessing one sample at a time. We present Fair
Sampling over Stream, a novel fair rebalancing approach capable of
being integrated with SML classification algorithms. Furthermore, we devise the
first unified performance-fairness metric, Fairness Bonded Utility (FBU), to
evaluate and compare the trade-off between performance and fairness of
different bias mitigation methods efficiently. FBU simplifies the comparison of
fairness-performance trade-offs of multiple techniques through one unified and
intuitive evaluation, allowing model designers to easily choose a technique.
Overall, extensive evaluations show our measures surpass those of other fair
online techniques previously reported in the literature.
Unveiling the features of successful ebay sellers of smartphones: a data mining sales predictive model
JEL Classification guidelines (M310); (C380).
eBay is one of the largest online retailing corporations worldwide, providing numerous
ways for customers to give feedback on registered sellers. With the advent of Web
2.0 and online shopping, an immense amount of data is collected from manifold devices. This
data is often unstructured, which inevitably calls for some form of further treatment that
allows classification, discovery of patterns and trends, or prediction of outcomes. That
treatment implies the use of increasingly complex and combined statistical tools as the
size of datasets grows. Nowadays, datasets may extend to several exabytes, which can
be transformed into knowledge using adequate methods. The aim of the present study is
to evaluate and analyse which seller and product attributes, such as
feedback ratings and price, influence sales of smartphones on eBay, and in what way, using a data mining
framework and techniques. The methods include SVM algorithms for modelling the
sales of smartphones by eBay sellers, combined with a 10-fold cross-validation scheme
to ensure model robustness. The metrics MAE, RAE and NMAE were used
to gauge prediction accuracy, followed by a sensitivity analysis to assess
the influence of individual features on sales. The methods were considered effective for
both model evaluation and knowledge extraction, reaching positive results although
with some discrepancies between different prediction accuracy metrics. Lastly, it was
discovered that the number of items in auction, the average price and the variety of products
available from a given seller were the most significant attributes, i.e., the largest
contributors to sales.
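The three error metrics named in the abstract above (MAE, RAE and NMAE) can be sketched as below. The abstract gives no formulas, so the definitions are assumptions based on common usage: RAE divides the total absolute error by that of a predictor that always outputs the mean, and NMAE divides MAE by the range of the observed values. The function name and sample numbers are hypothetical.

```python
def regression_errors(actual, predicted):
    """Return (MAE, RAE, NMAE) for two equal-length sequences of numbers.

    Assumed definitions (not taken from the study):
      MAE  = mean(|y - y_hat|)
      RAE  = sum(|y - y_hat|) / sum(|y - mean(y)|)
      NMAE = MAE / (max(y) - min(y))
    The actual values must not all be identical, or the ratios are undefined.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    abs_errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mae = sum(abs_errors) / n
    mean_actual = sum(actual) / n
    baseline = sum(abs(a - mean_actual) for a in actual)
    rae = sum(abs_errors) / baseline
    nmae = mae / (max(actual) - min(actual))
    return mae, rae, nmae

# Hypothetical sales counts and model predictions.
mae, rae, nmae = regression_errors([1, 2, 3, 4], [1, 2, 3, 6])
print(mae, rae, round(nmae, 4))
```

Because the three metrics normalise the same absolute errors differently, they can rank models differently, which is consistent with the discrepancies between metrics that the abstract reports.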
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
The advent of large language models (LLMs) and their adoption by the legal
community has given rise to the question: what types of legal reasoning can
LLMs perform? To enable greater study of this question, we present LegalBench:
a collaboratively constructed legal reasoning benchmark consisting of 162 tasks
covering six different types of legal reasoning. LegalBench was built through
an interdisciplinary process, in which we collected tasks designed and
hand-crafted by legal professionals. Because these subject matter experts took
a leading role in construction, tasks either measure legal reasoning
capabilities that are practically useful, or measure reasoning skills that
lawyers find interesting. To enable cross-disciplinary conversations about LLMs
in the law, we additionally show how popular legal frameworks for describing
legal reasoning -- which distinguish between its many forms -- correspond to
LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary.
This paper describes LegalBench, presents an empirical evaluation of 20
open-source and commercial LLMs, and illustrates the types of research
explorations LegalBench enables. Comment: 143 pages, 79 tables, 4 figures
Factors influencing marketing strategy formulation for small and medium enterprises in Polokwane
The aim of the study was to investigate the factors influencing marketing strategy formulation for small and medium enterprises (SMEs) in Polokwane. SMEs, worldwide, are regarded as the cornerstone for economic development. However, SMEs are faced with business marketing constraints that lead to their downfall.
The study objectives were set and led to the formation of the hypotheses. The study adopted a quantitative approach and collected data from a sample of 412 SMEs in Polokwane using two non-probability sampling methods, convenience and snowball sampling. A survey questionnaire was used to collect responses. A pilot test was conducted, and its responses were used to eliminate unnecessary and confusing statements. Respondents were asked to indicate their agreement or disagreement with statements on a five-point Likert scale. Descriptive statistical analysis, factor analysis, ANOVA and regression analysis were performed to determine whether the objectives of the study were achieved and to test the hypotheses.
SMEs were found to use sales promotions, digital marketing, business branding, personal selling and email communication as part of their marketing communication strategies. The product strategies used are product quality and packaging, branding and collaborative product development. The findings show that SMEs' marketing communication and product strategies have a positive influence on their performance. The main challenges experienced by SMEs are a lack of understanding of marketing research, lack of finance, lack of business planning, inexperienced employees and, the least experienced challenge, lack of customer demand. It was further found that demographic factors (business operation/maturity, business training and annual turnover) have a significant influence on the marketing communication strategies adopted and the challenges facing SMEs. SMEs that have been in existence for over 10 years perceive marketing challenges differently from SMEs with less than 10 years of operation.
It is necessary for the government to put in place progressive policies that can assist SMEs in improving their marketing strategy. It was recommended that SMEs attend marketing-related training to gain the marketing and business operations knowledge needed to minimise the marketing challenges they experience. Training will enable SMEs to do better in marketing communication and/or product strategies. Business Management. M. Com. (Business Management)