14 research outputs found

    Visual diagnosis of tree boosting methods

    Tree boosting, which combines weak learners (typically decision trees) to generate a strong learner, is a highly effective and widely used machine learning method. However, the development of a high-performance tree boosting model is a time-consuming process that requires numerous trial-and-error experiments. To tackle this issue, we have developed a visual diagnosis tool, BOOSTVis, to help experts quickly analyze and diagnose the training process of tree boosting. In particular, we have designed a temporal confusion matrix visualization, and combined it with a t-SNE projection and a tree visualization. These visualization components work together to provide a comprehensive overview of a tree boosting model, and enable an effective diagnosis of an unsatisfactory training process. Two case studies that were conducted on the Otto Group Product Classification Challenge dataset demonstrate that BOOSTVis can provide informative feedback and guidance to improve understanding and diagnosis of tree boosting algorithms.
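As a rough illustration of the data behind a temporal confusion matrix (this is not BOOSTVis's implementation, which the abstract does not detail), one confusion matrix can be computed per boosting iteration and the sequence inspected over time. The function names and the toy predictions below are hypothetical; in practice the per-iteration predictions would come from the boosting library in use.

```python
from typing import List, Sequence

def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     n_classes: int) -> List[List[int]]:
    """Rows are true classes, columns are predicted classes."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix

def temporal_confusion_matrices(y_true: Sequence[int],
                                preds_per_iteration: Sequence[Sequence[int]],
                                n_classes: int) -> List[List[List[int]]]:
    """One confusion matrix per boosting iteration, in training order."""
    return [confusion_matrix(y_true, preds, n_classes)
            for preds in preds_per_iteration]

# Hypothetical toy data: 6 samples, 3 classes, predictions after 3 iterations.
y_true = [0, 0, 1, 1, 2, 2]
preds_per_iteration = [
    [0, 1, 1, 2, 2, 0],  # after iteration 1
    [0, 0, 1, 2, 2, 2],  # after iteration 2
    [0, 0, 1, 1, 2, 2],  # after iteration 3
]

matrices = temporal_confusion_matrices(y_true, preds_per_iteration, n_classes=3)
for step, matrix in enumerate(matrices, start=1):
    correct = sum(matrix[c][c] for c in range(3))
    print(f"iteration {step}: {correct}/{len(y_true)} correct, matrix={matrix}")
```

Stacking the matrices by iteration is what makes it possible to see when a particular pair of classes starts or stops being confused during training.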

    The state of the art in integrating machine learning into visual analytics

    Visual analytics systems combine machine learning or other analytic techniques with interactive data visualization to promote sensemaking and analytical reasoning. It is through such techniques that people can make sense of large, complex data. While progress has been made, the tactful combination of machine learning and data visualization is still under-explored. This state-of-the-art report presents a summary of the progress that has been made by highlighting and synthesizing select research advances. Further, it presents opportunities and challenges to enhance the synergy between machine learning and visual analytics for impactful future research directions.

    Embedding transparency in artificial intelligence machine learning models: managerial implications on predicting and explaining employee turnover

    Employee turnover (ET) is a major issue faced by firms in all business sectors. Artificial intelligence (AI) machine learning (ML) prediction models can help to classify the likelihood of employees voluntarily departing from employment using historical employee datasets. However, output responses generated by these AI-based ML models lack transparency and interpretability, making it difficult for HR managers to understand the rationale behind the AI predictions. If managers do not understand how and why responses are generated by AI models based on the input datasets, these models are unlikely to augment data-driven decision-making and bring value to the organisations. The main purpose of this article is to demonstrate the capability of the Local Interpretable Model-Agnostic Explanations (LIME) technique to intuitively explain the ET predictions generated by AI-based ML models for a given employee dataset to HR managers. From a theoretical perspective, we contribute to the International Human Resource Management literature by presenting a conceptual review of AI algorithmic transparency and then discussing its significance to sustain competitive advantage by using the principles of resource-based view theory. We also offer a transparent AI implementation framework using LIME which will provide a useful guide for HR managers to increase the explainability of the AI-based ML models, and therefore mitigate trust issues in data-driven decision-making.

    Methodologies in Predictive Visual Analytics

    Predictive analytics embraces an extensive area of techniques from statistical modeling to machine learning to data mining and is applied in business intelligence, public health, disaster management and response, and many other fields. To date, visualization has been broadly used to support tasks in the predictive analytics pipeline under the underlying assumption that a human-in-the-loop can aid the analysis by integrating domain knowledge that might not be broadly captured by the system. Primary uses of visualization in the predictive analytics pipeline have focused on data cleaning, exploratory analysis, and diagnostics. More recently, numerous visual analytics systems for feature selection, incremental learning, and various prediction tasks have been proposed to support the growing use of complex models, agent-specific optimization, and comprehensive model comparison and result exploration. Such work is being driven by advances in interactive machine learning and the desire of end-users to understand and engage with the modeling process. However, despite the numerous and promising applications of visual analytics to predictive analytics tasks, work to assess the effectiveness of predictive visual analytics is lacking. This thesis studies the current methodologies in predictive visual analytics. It first defines the scope of predictive analytics and presents a predictive visual analytics (PVA) pipeline. Following the proposed pipeline, a predictive visual analytics framework is developed to be used to explore under what circumstances a human-in-the-loop prediction process is most effective. This framework combines sentiment analysis, feature selection mechanisms, similarity comparisons and model cross-validation through a variety of interactive visualizations to support analysts in model building and prediction. To test the proposed framework, an instantiation for movie box-office prediction is developed and evaluated. Results from small-scale user studies are presented and discussed, and a generalized user study is carried out to assess the role of predictive visual analytics under a movie box-office prediction scenario. Dissertation/Thesis, Doctoral Dissertation, Engineering 201

    Real-time analytics and monitoring of ML-applications using visual analytics

    In the quest for scientific development and advancement in society, machine learning applications are becoming part of almost every industrial process. The world is heading towards the utilization of experience gained by machines. What if the experience is gained from a faulty dataset, or the predictions of the machine learning algorithm are wrong for some other reason? This would corrupt the entire system and lead to an enormous loss of time and money. So, it is important to take note of the performance of a machine learning application before deploying it. At present, understanding machine learning models remains an active field of research. The visual analytics approach, which involves interactive visualization and explorative analysis of a dataset, can be exploited in the model development process, as it integrates human knowledge with the power of machines. In this thesis, a contribution is made in this area: monitoring a machine learning application in real time and analyzing it using visual analytics to address the problem of concept drift. An approach is designed and a software implementation is provided for its demonstration. Interactive visualizations have been provided for the actual dataset and the predictions obtained from the machine learning model. A simulation of the continuous arrival of data streams has been developed. The idea is to recognize the right point in time at which a new model needs to be trained. Tools have been integrated for further interactive analysis of this dataset. As data from a news agency has been used, analysis of textual data and its visualization form a significant part of the visual explorative analysis.
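The abstract does not say how the thesis decides when retraining is due. One common and simple heuristic, shown here only as a sketch under that assumption, is to track accuracy over a sliding window of recent labelled predictions and raise a flag when it falls below a threshold. The class name, window size, threshold, and toy stream are all hypothetical.

```python
from collections import deque

class RetrainMonitor:
    """Flags possible concept drift when windowed accuracy drops too low."""

    def __init__(self, window_size: int = 5, min_accuracy: float = 0.5) -> None:
        # Keeps only the most recent `window_size` correctness outcomes.
        self.window = deque(maxlen=window_size)
        self.min_accuracy = min_accuracy

    def update(self, prediction: int, label: int) -> bool:
        """Record one labelled prediction; return True if retraining is suggested."""
        self.window.append(prediction == label)
        if len(self.window) < self.window.maxlen:
            return False  # not enough evidence yet
        accuracy = sum(self.window) / len(self.window)
        return accuracy < self.min_accuracy

# Hypothetical simulated stream of (prediction, true label) pairs;
# the model is accurate at first, then starts to fail.
stream = [(1, 1), (0, 0), (1, 1), (1, 1), (0, 0),
          (1, 0), (0, 1), (1, 0), (0, 0), (1, 0)]

monitor = RetrainMonitor(window_size=5, min_accuracy=0.5)
flags = [monitor.update(pred, label) for pred, label in stream]
first_alarm = flags.index(True) if True in flags else None
print("retraining first suggested at stream position:", first_alarm)
```

A fixed threshold like this is easy to visualize alongside the stream, but it only works when true labels arrive with little delay; other drift detectors make different trade-offs.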