3,288 research outputs found
On intelligible multimodal visual analysis
Analyzing data is becoming an important skill in an increasingly digital world. Yet many users face knowledge barriers that prevent them from conducting their data analysis independently. To tear down some of these barriers, multimodal interaction for visual analysis has been proposed. Multimodal interaction through speech and touch enables not only experts but also novice users to interact effortlessly with such technology. However, current approaches do not take differences between users into account. In fact, whether visual analysis is intelligible ultimately depends on the user.
To close this research gap, this dissertation explores how multimodal visual analysis can be personalized. To do so, it takes a holistic view. First, an intelligible task space of visual analysis tasks is defined with personalization potentials in mind. This task space provides an initial basis for understanding how effective personalization in visual analysis can be approached. Second, empirical analyses of speech commands in visual analysis, as well as of visualizations used in scientific publications, reveal further patterns and structures. These behavior-based findings help to better understand expectations towards multimodal visual analysis. Third, a technical prototype is designed based on the previous findings. It enriches the visual analysis with a persistent dialogue and transparency about the underlying computations; the conducted user studies show not only advantages but also underline the relevance of considering the user's characteristics. Finally, both communication channels, visualizations and dialogue, are personalized. Leveraging linguistic theory and reinforcement learning, the results highlight a positive effect of adjusting to the user. Especially when the user's knowledge is exceeded, personalization helps to improve the user experience.
Overall, this dissertation not only confirms the importance of considering the user's characteristics in multimodal visual analysis, but also provides insights into how an intelligible analysis can be achieved. By understanding the use of input modalities, a system can focus on the user's needs. By understanding preferences regarding the output modalities, the system can better adapt to the user. Combining both directions improves the user experience and contributes towards an intelligible multimodal visual analysis.
Machine Learning
Machine learning can be defined in various ways, all relating to a scientific domain concerned with the design and development of theoretical and implementation tools for building systems that exhibit some human-like intelligent behavior. More specifically, machine learning addresses the ability of a system to improve automatically through experience.
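As a minimal sketch of "improving through experience" (not from the abstract; the synthetic data and model choice are illustrative assumptions), a linear classifier's held-out accuracy rises as it is fed more training examples:

```python
# Illustrative only: a learner whose accuracy improves with experience.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Two Gaussian blobs as a toy binary classification task.
X = rng.normal(size=(2000, 2)) + np.repeat([[0, 0], [2, 2]], 1000, axis=0)
y = np.repeat([0, 1], 1000)
idx = rng.permutation(2000)
X, y = X[idx], y[idx]
X_train, y_train, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

clf = SGDClassifier(loss="log_loss", random_state=0)
prev = 0
for n in (10, 100, 1500):  # "experience" = number of examples seen so far
    clf.partial_fit(X_train[prev:n], y_train[prev:n], classes=[0, 1])
    prev = n
    print(f"after {n:4d} examples: test accuracy = {clf.score(X_test, y_test):.2f}")
```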
Governance of environmental risk: New approaches to managing stakeholder involvement
Disputes concerning industrial legacies such as the disposal of toxic wastes illustrate changing pressures on corporations and governments. Business and governments are now confronted with managing the expectations of a society increasingly aware of the social and environmental impacts and risks associated with economic development, and demanding more equitable distribution and democratic management of such risks. The closed managerialist decision-making of the powerful bureaucracies and corporations of the industrial era is informed by traditional management theory, which cannot provide a framework for the adequate governance of these risks. Recent socio-political theories have conceptualised some key themes that must be addressed in a more fitting approach to governance. We identify more recent management and governance theory which addresses these themes and develop a process-based approach to governance of environmental disputes that allows for the evolving nature of stakeholder relations in a highly complex multiple-stakeholder arena.
Expressive and modular rule-based classifier for data streams
Advances in computing software, hardware, connected devices and wireless communication infrastructure in recent years have created the need to work with streaming data sources. Yet the number of techniques, approaches and algorithms that can work with data from a streaming source is still very limited compared with those for batch data. Although data mining has been a well-studied topic of knowledge discovery for decades, many unique properties of and challenges in learning from a data stream have not been considered properly, despite the presence of and real need to mine information from streaming data sources. This thesis aims to contribute to knowledge by developing a rule-based algorithm that learns classification rules directly from data streams, where the learned rules are expressive enough that a human user can easily interpret the concept and rationale behind the predictions of the created model. There are two main structures for representing a classification model: the 'tree-based' structure and the 'rule-based' structure, as sketched below. Even though both forms of representation are popular and well known in traditional data mining, they differ in interpretability and model quality in certain circumstances.
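As a rough illustration of what a modular, human-readable classification rule might look like (a sketch, not the thesis's actual data structures; all names and values are hypothetical):

```python
# Hypothetical minimal representation of modular classification rules.
# Each rule stands alone ("modular"), unlike the paths of a decision tree,
# which all share the same root test.
from dataclasses import dataclass

@dataclass
class Term:
    feature: str
    op: str          # one of "<=", ">", "=="
    value: object

    def matches(self, example: dict) -> bool:
        x = example[self.feature]
        if self.op == "<=":
            return x <= self.value
        if self.op == ">":
            return x > self.value
        return x == self.value

@dataclass
class Rule:
    terms: list      # conjunction of Term objects
    label: str

    def covers(self, example: dict) -> bool:
        return all(t.matches(example) for t in self.terms)

# IF petal_len > 2.5 AND petal_wid <= 1.7 THEN class = versicolor
rule = Rule([Term("petal_len", ">", 2.5), Term("petal_wid", "<=", 1.7)], "versicolor")
print(rule.covers({"petal_len": 4.0, "petal_wid": 1.2}))  # True
```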
The first part of this thesis analyses background work and relevant topics in learning classification rules from data streams. This study identifies the essential requirements for producing high-quality classification rules from data streams and shows why many systems, algorithms and techniques designed for classifying a static dataset are not applicable in a streaming environment.
The second part of the thesis investigates a new technique to improve the efficiency and accuracy of learning heuristics over numeric features from a streaming data source. Computational cost is one of the important factors for an effective and practical learning algorithm or system, because of the need to learn from continuously arriving data examples sequentially and to discard examples once they have been seen. If the computational cost is too high, the learner may not be able to keep pace with high-velocity and potentially unbounded data streams. The proposed technique is first discussed in the context of using the Gaussian distribution as a heuristic for building rule terms on numeric features. Empirical evaluation then shows the successful integration of the proposed technique into eRules, an existing rule-based algorithm for data streams.
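A minimal sketch of how per-class Gaussian statistics over a numeric feature can be maintained incrementally in a single pass, which is what a stream learner needs in order to score candidate rule terms without storing the examples (an assumed illustration using Welford's online update, not the thesis's actual implementation):

```python
import math

class OnlineGaussian:
    """Single-pass mean/variance estimate (Welford's algorithm)."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x: float) -> None:
        # Each example is seen once and can be discarded afterwards.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def pdf(self, x: float) -> float:
        var = self.m2 / (self.n - 1) if self.n > 1 else 1.0
        return math.exp(-(x - self.mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# One estimator per (class, numeric feature); the fitted densities can then
# score candidate rule terms such as "petal_len > 2.5" for each class.
g = OnlineGaussian()
for x in [4.1, 3.9, 4.4, 4.0, 4.2]:   # values of one feature for one class
    g.update(x)
print(round(g.mean, 2), round(g.pdf(4.0), 2))
```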
Continuing the topic of rule-based algorithms for classifying data streams, the use of Hoeffding's Inequality addresses another problem in learning from a data stream: how much data should be seen before learning starts, and how the model should be kept up to date over time. By incorporating Hoeffding's Inequality, this study presents the Hoeffding Rules algorithm, which can induce modular rules directly from a streaming data source with dynamic window sizes throughout the learning period, ensuring efficiency and robustness towards concept drift. Concept drift is another challenge unique to mining data streams, in which the underlying concept of the data can change either gradually or abruptly over time, and the learner should adapt to these changes as quickly as possible.
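For context, Hoeffding's Inequality gives a distribution-free bound on how far an observed mean of n samples can deviate from the true mean: with probability at least 1 - delta, the deviation is at most epsilon = sqrt(R^2 ln(1/delta) / (2n)), where R is the range of the averaged quantity. A small sketch of using the bound to decide when enough stream examples have been seen (the threshold and constants here are illustrative assumptions, not the thesis's exact procedure):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon such that |observed mean - true mean| <= epsilon
    holds with probability at least 1 - delta after n samples."""
    return math.sqrt(value_range**2 * math.log(1.0 / delta) / (2.0 * n))

# Decide whether the observed gap between two candidate rule terms'
# heuristic scores is statistically reliable yet.
delta, score_range = 1e-6, 1.0        # scores assumed to lie in [0, 1]
observed_gap = 0.08                   # best score minus second-best score
for n in (100, 1_000, 10_000):
    eps = hoeffding_bound(score_range, delta, n)
    decided = observed_gap > eps
    print(f"n={n:6d}  eps={eps:.4f}  confident choice: {decided}")
```

As n grows, epsilon shrinks, so the learner can commit to a rule term as soon as the observed advantage exceeds the bound, rather than waiting for a fixed amount of data.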
This research focuses on the development of a rule-based algorithm, Hoeffding Rules, which treats streaming environments as the primary data source and addresses several challenges unique to learning rules from data streams, such as concept drift and computational efficiency. It responds to the need for an interpretable machine learning model, applying new studies to improve the ability to mine useful insights from potentially high-velocity, high-volume and unbounded data streams. More broadly, this research complements the study of learning classification rules from data streams, addressing some of the challenges they pose compared with conventional batch data and providing the knowledge necessary to systematically and effectively learn expressive and modular classification rules from data streams.
Feature selection strategies for improving data-driven decision support in bank telemarketing
Data mining techniques for unveiling previously undiscovered knowledge have been applied in recent years to a wide range of domains, including banking and marketing. Raw data is the basic ingredient for successfully detecting interesting patterns. A key aspect of raw data manipulation is feature engineering, which concerns the correct characterization or selection of relevant features (or variables) that conceal relations with the target goal.
This study focuses particularly on feature engineering, aiming to unfold the features that best characterize the problem of selling long-term bank deposits through telemarketing campaigns. For the experimental setup, a case study from a Portuguese bank, spanning the 2008-2013 period and encompassing the recent global financial crisis, was addressed. To assess the relevance of this problem, a novel literature analysis using text mining and the latent Dirichlet allocation algorithm was conducted, confirming the existence of a research gap for bank telemarketing.
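As a rough sketch of the kind of topic-model analysis described above (assumed tooling; the study's actual corpus and parameters are not given here), latent Dirichlet allocation over paper abstracts might look like this with scikit-learn:

```python
# Hypothetical sketch: LDA topic modelling over a small corpus of abstracts.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = [
    "bank telemarketing campaign deposit client call outcome",
    "credit risk scoring model loan default prediction",
    "direct marketing customer response targeting campaign",
]  # stand-in documents; the real study mined published literature

vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-4:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```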
Starting from a dataset containing typical telemarketing contacts and client information, the research followed three different and complementary strategies: first, enriching the dataset with social and economic context features; then, including customer-lifetime-value-related features; and finally, applying a divide-and-conquer strategy that splits the problem into smaller fractions, leading to optimized sub-problems. Each of the three approaches improved previous results in terms of model metrics related to prediction performance. The relevance of the proposed features was evaluated, confirming the obtained models as credible and valuable for telemarketing campaign managers.
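A hedged sketch of the three strategies in pandas terms (the column names, values, and split criterion are invented for illustration; the study's actual features and sub-problem definitions are not reproduced here):

```python
# Hypothetical sketch of the three enrichment/splitting strategies.
import pandas as pd

contacts = pd.DataFrame({
    "client_id": [1, 2, 3],
    "call_month": ["2008-11", "2010-05", "2013-02"],
    "outcome": [0, 1, 0],
})

# 1) Enrich with social and economic context features (invented values).
context = pd.DataFrame({
    "call_month": ["2008-11", "2010-05", "2013-02"],
    "euribor_3m": [4.2, 0.7, 0.2],
    "unemployment": [7.8, 10.8, 17.5],
})
data = contacts.merge(context, on="call_month", how="left")

# 2) Add a customer-lifetime-value-related feature (invented proxy).
data["n_prev_contacts"] = [0, 3, 1]

# 3) Divide and conquer: split into sub-problems modelled separately,
#    e.g., first-time contacts vs. previously contacted clients.
first_time = data[data["n_prev_contacts"] == 0]
repeated = data[data["n_prev_contacts"] > 0]
print(len(first_time), len(repeated))
```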
- …