3,288 research outputs found

    On intelligible multimodal visual analysis

    Analyzing data is becoming an important skill in an increasingly digital world. Yet many users face knowledge barriers that prevent them from independently conducting their own data analysis. To tear down some of these barriers, multimodal interaction for visual analysis has been proposed. Multimodal interaction through speech and touch enables not only experts but also novice users to interact effortlessly with this kind of technology. However, current approaches do not take user differences into account; in fact, whether a visual analysis is intelligible ultimately depends on the user. To close this research gap, this dissertation explores how multimodal visual analysis can be personalized, taking a holistic view. First, an intelligible task space of visual analysis tasks is defined with personalization potential in mind. This task space provides an initial basis for understanding how effective personalization in visual analysis can be approached. Second, empirical analyses of speech commands in visual analysis, as well as of visualizations used in scientific publications, reveal further patterns and structures. These behavior-based findings help to better understand expectations towards multimodal visual analysis. Third, a technical prototype is designed based on the previous findings. It enriches the visual analysis with a persistent dialogue and transparency about the underlying computations; the conducted user studies show not only advantages, but also underline the relevance of considering the user's characteristics. Finally, both communication channels, visualizations and dialogue, are personalized. Leveraging linguistic theory and reinforcement learning, the results highlight a positive effect of adjusting to the user: especially when the user's knowledge is exceeded, personalization helps to improve the user experience. Overall, this dissertation not only confirms the importance of considering the user's characteristics in multimodal visual analysis, but also provides insights into how an intelligible analysis can be achieved. By understanding the use of input modalities, a system can focus on the user's needs; by understanding preferences for output modalities, it can better adapt to the user. Combining both directions improves the user experience and contributes towards an intelligible multimodal visual analysis.
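
    As a rough illustration of the kind of adaptation the abstract describes, personalizing an output channel can be framed as a bandit problem: try candidate presentation styles, observe a user-experience signal, and shift towards what works for that user. The sketch below is a minimal epsilon-greedy version; the class name, the style labels, and the reward signal are all hypothetical, not taken from the dissertation.

```python
import random

# Each "arm" is a candidate presentation style for the dialogue or
# visualization channel; the reward is a user-experience signal
# (e.g. an explicit rating) used to adapt output to the individual user.
class OutputPersonalizer:
    def __init__(self, styles, epsilon=0.1):
        self.styles = styles
        self.epsilon = epsilon
        self.counts = {s: 0 for s in styles}    # times each style was shown
        self.values = {s: 0.0 for s in styles}  # running mean reward per style

    def choose(self):
        # Explore occasionally, otherwise exploit the best-known style.
        if random.random() < self.epsilon:
            return random.choice(self.styles)
        return max(self.styles, key=lambda s: self.values[s])

    def update(self, style, reward):
        # Incremental mean: v += (r - v) / n
        self.counts[style] += 1
        self.values[style] += (reward - self.values[style]) / self.counts[style]

personalizer = OutputPersonalizer(["concise_text", "detailed_text", "chart_first"])
style = personalizer.choose()
personalizer.update(style, reward=0.8)  # e.g. the user rated the response positively
```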

    Machine Learning

    Machine learning can be defined in various ways. Broadly, it is a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some human-like intelligent behavior. More specifically, machine learning addresses the ability of such systems to improve automatically through experience.
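
    "Improving through experience" can be made concrete with a tiny example: an online perceptron adjusts its weights after each labelled example it sees, so its predictions improve as more data arrives. This is a minimal sketch; the data and learning rate are illustrative, not from the text.

```python
# Online perceptron: predict, then nudge the weights when wrong.
def perceptron_update(w, b, x, y, lr=0.1):
    pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1
    if pred != y:
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b = b + lr * y
    return w, b

w, b = [0.0, 0.0], 0.0
experience = [([1.0, 2.0], 1), ([2.0, -1.0], -1), ([1.5, 1.0], 1)]
for x, y in experience:   # each labelled example is more "experience"
    w, b = perceptron_update(w, b, x, y)
print(w, b)
```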

    Governance of environmental risk: New approaches to managing stakeholder involvement

    Disputes concerning industrial legacies, such as the disposal of toxic wastes, illustrate changing pressures on corporations and governments. Business and government are now confronted with managing the expectations of a society increasingly aware of the social and environmental impacts and risks associated with economic development, and demanding a more equitable distribution and more democratic management of such risks. The closed, managerialist decision-making of the powerful bureaucracies and corporations of the industrial era is informed by traditional management theory, which cannot provide a framework for the adequate governance of these risks. Recent socio-political theories have conceptualised some key themes that must be addressed in a more fitting approach to governance. We identify more recent management and governance theory which addresses these themes, and develop a process-based approach to the governance of environmental disputes that allows for the evolving nature of stakeholder relations in a highly complex, multiple-stakeholder arena. © 2008 Elsevier Ltd. All rights reserved.

    Expressive and modular rule-based classifier for data streams

    Advances in computing software, hardware, connected devices and wireless communication infrastructure in recent years have led to the desire to work with streaming data sources. Yet the number of techniques, approaches and algorithms that can work with data from a streaming source is still very limited compared with batch data. Although data mining has been a well-studied topic of knowledge discovery for decades, many unique properties of, and challenges in, learning from a data stream have not been considered properly, despite the growing presence of streaming data sources and the real need to mine information from them. This thesis contributes to the field by developing a rule-based algorithm that learns classification rules directly from data streams, where the learned rules are expressive enough that a human user can easily interpret the concept and rationale behind the model's predictions. There are two main structures for representing a classification model: the tree-based structure and the rule-based structure. Even though both forms of representation are popular and well known in traditional data mining, they differ in interpretability and in model quality under certain circumstances.

    The first part of this thesis analyses background work and topics relevant to learning classification rules from data streams. This study identifies the essential requirements for producing high-quality classification rules from data streams, and shows why many systems, algorithms and techniques designed for classifying a static dataset are not applicable in a streaming environment.

    The second part of the thesis investigates a new technique to improve the efficiency and accuracy of learning heuristics for numeric features from a streaming data source. Computational cost is an important factor for an effective and practical learning algorithm, because such a system must learn from continuous, sequential arrivals of data examples and then discard them; if the computational cost is too high, the learner cannot keep pace with high-velocity and possibly unbounded data streams. The proposed technique is first discussed in the context of using the Gaussian distribution as a heuristic for building rule terms on numeric features. Empirical evaluation then shows its successful integration into eRules, an existing rule-based algorithm for data streams.

    Continuing on the topic of rule-based classification of data streams, the use of Hoeffding's Inequality addresses another problem in learning from a data stream: how much data must be seen before learning can start, and how to keep the model updated over time. Building on Hoeffding's Inequality, this study presents the Hoeffding Rules algorithm, which induces modular rules directly from a streaming data source, using dynamic window sizes throughout the learning period to ensure efficiency and robustness to concept drift. Concept drift is another challenge unique to mining data streams, in which the underlying concept of the data can change either gradually or abruptly over time, and the learner should adapt to these changes as quickly as possible.
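
    The two ingredients named above can be sketched concretely. Hoeffding's Inequality says that after n observations, the sample mean of a variable with range R deviates from its true mean by more than eps with probability at most delta, where eps = sqrt(R^2 ln(1/delta) / (2n)); a streaming learner can keep observing until eps is small enough to commit to a decision. The Gaussian statistics for numeric features can be maintained incrementally so each example is seen once and discarded (Welford's method is one standard choice; using it here is an assumption, since the abstract does not specify the thesis's exact formulation).

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Radius eps such that the sample mean of n observations of a variable
    with the given range deviates from the true mean by more than eps with
    probability at most delta (Hoeffding's Inequality)."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

class StreamingGaussian:
    """Incremental mean/variance (Welford's method) for a numeric feature,
    so each example can be processed once and discarded, as a stream requires."""
    def __init__(self):
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# Example: decide whether enough examples have been seen to trust an estimate.
stats = StreamingGaussian()
for x in [4.1, 3.9, 4.3, 4.0, 4.2]:
    stats.update(x)
eps = hoeffding_bound(value_range=1.0, delta=0.05, n=stats.n)
print(stats.mean, stats.variance, eps)  # commit a rule term only once eps is small
```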
    This research focuses on the development of a rule-based algorithm, Hoeffding Rules, which treats streaming environments as the primary data source and addresses several challenges unique to learning rules from data streams, such as concept drift and computational efficiency. It underlines the need for, and importance of, interpretable machine learning models, applying new techniques to improve the ability to mine useful insights from potentially high-velocity, high-volume and unbounded data streams. More broadly, this research complements the study of learning classification rules from data streams, addressing some of the challenges that streams pose compared with conventional batch data and providing the knowledge necessary to systematically and effectively learn expressive and modular classification rules from them.

    Feature selection strategies for improving data-driven decision support in bank telemarketing

    The use of data mining techniques to unveil previously undiscovered knowledge has been applied in recent years to a wide range of domains, including banking and marketing. Raw data is the basic ingredient for successfully detecting interesting patterns. A key aspect of raw data manipulation is feature engineering, which concerns the correct characterization or selection of relevant features (or variables) that bear a relation to the target goal. This study focuses on feature engineering, aiming to unfold the features that best characterize the problem of selling long-term bank deposits through telemarketing campaigns. For the experimental setup, a case study from a Portuguese bank was addressed, spanning the 2008-2013 period and encompassing the recent global financial crisis. To assess the relevance of the problem, a novel literature analysis using text mining and the latent Dirichlet allocation algorithm was conducted, confirming the existence of a research gap for bank telemarketing. Starting from a dataset containing typical telemarketing contacts and client information, the research followed three different and complementary strategies: first, enriching the dataset with social and economic context features; then, including customer lifetime value related features; finally, applying a divide-and-conquer strategy that splits the problem into smaller fractions, leading to optimized sub-problems. Each of the three approaches improved on previous results in terms of prediction performance metrics. The relevance of the proposed features was evaluated, confirming the obtained models as credible and valuable for telemarketing campaign managers.
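
    As an illustrative sketch of the kind of literature analysis the abstract describes, latent Dirichlet allocation can be run over a bag-of-words representation of paper abstracts to surface research themes. The study's actual corpus and parameters are not given here, so the mini-corpus and topic count below are hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical mini-corpus standing in for the abstracts the study mined.
abstracts = [
    "bank telemarketing campaign predicting long-term deposit subscription",
    "feature selection for credit scoring in retail banking",
    "data mining of marketing campaign response with decision trees",
    "economic context indicators and customer lifetime value modelling",
]

# Bag-of-words representation, then LDA with an illustrative topic count.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(abstracts)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X)

# Print the top words per topic to inspect the discovered themes.
terms = vectorizer.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {', '.join(top)}")
```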