256 research outputs found

vivid: An R package for Variable Importance and Variable Interactions Displays for Machine Learning Models

Author: Hurley Catherine
Inglis Alan
Parnell Andrew
Publication date: 23/06/2023

We present vivid, an R package for visualizing variable importance and variable interactions in machine learning models. The package provides a range of displays, including heatmap and graph-based displays for viewing variable importance and interaction jointly, and partial dependence plots in both a matrix layout and an alternative layout emphasizing important variable subsets. With the intention of increasing a machine learning model's interpretability and making the work applicable to a wider readership, we discuss the design choices behind our implementation by focusing on the package structure and providing an in-depth look at the package functions and key features. We also provide a practical illustration of the software in use on a data set. Comment: 15 pages, 7 figures

arXiv.org e-Print Archive

Choropleth Maps Can Convey Absolute Magnitude Through the Range of the Accompanying Colour Legend

Author: Bradley Duncan
Jay Caroline
Stewart Andrew
Zhang Boshuo
Publication date: 06/10/2023

The University of Manchester - Institutional Repository

Dynamic Modulation of Local Population Activity by Rhythm Phase in Human Occipital Cortex During a Visual Search Task

Author: Adam O Hebb
Christopher J Honey
Dora Hermes
Eberhard E Fetz
Eric C Leuthardt
Jeffrey G Ojemann
Kai J Miller
Marcel Den Nijs
Mohit Sharma
Rajesh P N Rao
Scott Makeig
Terrence J Sejnowski
Publication venue: Frontiers Research Foundation
Publication date: 01/01/2010

Brain rhythms are more than just passive phenomena in visual cortex. For the first time, we show that the physiology underlying brain rhythms actively suppresses and releases cortical areas on a second-to-second basis during visual processing. Furthermore, their influence is specific at the scale of individual gyri. We quantified the interaction between broadband spectral change and brain rhythms on a second-to-second basis in electrocorticographic (ECoG) measurement of brain surface potentials in five human subjects during a visual search task. Comparison of visual search epochs with a blank screen baseline revealed changes in the raw potential, the amplitude of rhythmic activity, and in the decoupled broadband spectral amplitude. We present new methods to characterize the intensity and preferred phase of coupling between broadband power and band-limited rhythms, and to estimate the magnitude of rhythm-to-broadband modulation on a trial-by-trial basis. These tools revealed numerous coupling motifs between the phase of low-frequency (δ, θ, α, β, and γ band) rhythms and the amplitude of broadband spectral change. In the θ and β ranges, the coupling of phase to broadband change is dynamic during visual processing, decreasing in some occipital areas and increasing in others, in a gyrally specific pattern. Finally, we demonstrate that the rhythms interact with one another across frequency ranges, and across cortical sites

Crossref

Directory of Open Access Journals

PubMed Central

Frontiers - Publisher Connector

Digital Commons@Becker

eScholarship - University of California

Communicating Uncertainty and Risk in Air Quality Maps

Author: Ma Kwan-Liu
Preston Annie
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 03/05/2022

Environmental sensors provide crucial data for understanding our surroundings. For example, air quality maps based on sensor readings help users make decisions to mitigate the effects of pollution on their health. Standard maps show readings from individual sensors or colored contours indicating estimated pollution levels. However, showing a single estimate may conceal uncertainty and lead to underestimation of risk, while showing sensor data yields varied interpretations. We present several visualizations of uncertainty in air quality maps, including a frequency-framing "dotmap" and small multiples, and we compare them with standard contour and sensor-based maps. In a user study, we find that including uncertainty in maps has a significant effect on how much users would choose to reduce physical activity, and that people make more cautious decisions when using uncertainty-aware maps. Additionally, we analyze think-aloud transcriptions from the experiment to understand more about how the representation of uncertainty influences people's decision-making. Our results suggest ways to design maps of sensor data that can encourage certain types of reasoning, yield more consistent responses, and convey risk better than standard maps

arXiv.org e-Print Archive

Statistical Analysis in Art Conservation Research

Author: Chandra L. Reedy
Terry J. Reedy
Publication venue: J. Paul Getty Trust
Publication date: 01/01/1988

Evaluates all components of data analysis and shows that statistical methods in conservation are vastly underutilized. Also offers specific examples of possible improvements

IssueLab

Visualisation Techniques for Interpreting Machine Learning Models

Author: Inglis Alan
Publication date: 01/01/2022

With the increase of complex Machine Learning (ML) models making decisions in everyday life in a wide range of fields from economics to healthcare, the demand for Interpretable Machine Learning (IML) techniques has grown. One method to broaden the understanding of the behaviour of a fitted ML model is through the use of informative visualisations. Visualisations can aid in interpretation and can provide a more thorough examination into the nature of the predictions generated from an ML model. This is of particular importance when using so-called blackbox models, such as random forests or Bayesian Additive Regression Trees (BART) models. In this thesis, various IML approaches are proposed through the use of novel visualisations for displaying different metrics and model summaries which can be used for examining the behaviour of a fitted ML model. First, we present flexible methods for investigating variable importance, interactions, and variable effects by presenting a suite of visualisations that can aid in the interpretation of statistical and ML models through the use of model-specific and agnostic methods. Following from this, motivated in part by the lack of existing visualisation methods and by the rise in popularity of this particular model, we develop novel visualisations for examining BART models that include examining the tree structures and, through the posterior distribution, the uncertainty surrounding predictions. Lastly, we demonstrate and discuss our implementation of the R package software vivid (Variable Importance and Variable Interaction Displays) which is used to explore the behaviour of fitted ML models. Here, we focus on key package features and general architectural principles used in vivid when designing informative IML visualisations and provide a practical illustration of the package in use

MURAL - Maynooth University Research Archive Library

Design of Progressive Visualization Systems for Large-Scale Data Exploration

Author: 조재민
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2020

Doctoral thesis, Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2020. 서진욱.

Understanding data through interactive visualization, also known as visual analytics, is a common and necessary practice in modern data science. However, as data sizes have increased at unprecedented rates, the computation latency of visualization systems has become a significant hurdle to visual analytics. The goal of this dissertation is to design a series of systems for progressive visual analytics (PVA), a visual analytics paradigm that can provide intermediate results during computation and allow visual exploration of these results, to address the scalability hurdle. To support the interactive exploration of data with billions of records, we first introduce SwiftTuna, an interactive visualization system with scalable visualization and computation components. Our performance benchmark demonstrates that it can handle data with four billion records, giving responsive feedback every few seconds without precomputation. Second, we present PANENE, a progressive algorithm for the Approximate k-Nearest Neighbor (AKNN) problem. PANENE brings useful machine learning methods into visual analytics, which has been challenging due to their long initial latency resulting from AKNN computation. In particular, we accelerate t-Distributed Stochastic Neighbor Embedding (t-SNE), a popular non-linear dimensionality reduction technique, which enables the responsive visualization of data with a few hundred columns. These two contributions aim to address the scalability issues stemming from a large number of rows and columns in data, respectively. Third, from the users' perspective, we focus on improving the trustworthiness of intermediate knowledge gained from uncertain results in PVA. We propose a novel PVA concept, Progressive Visual Analytics with Safeguards, and introduce PVA-Guards, safeguards people can leave on uncertain intermediate knowledge that needs to be verified. We also present a proof-of-concept system, ProReveal, designed and developed to integrate seven safeguards into progressive data exploration. Our user study demonstrates that people not only successfully created PVA-Guards on ProReveal but also voluntarily used PVA-Guards to manage the uncertainty of their knowledge. Finally, summarizing the three studies, we discuss design challenges for progressive systems as well as future research agendas for PVA.

CHAPTER 1. Introduction
1.1 Background and Motivation
1.2 Thesis Statement and Research Questions
1.3 Thesis Contributions
1.3.1 Responsive and Incremental Visual Exploration of Large-scale Multidimensional Data
1.3.2 Progressive Computation of Approximate k-Nearest Neighbors and Responsive t-SNE
1.3.3 Progressive Visual Analytics with Safeguards
1.4 Structure of Dissertation
CHAPTER 2. Related Work
2.1 Progressive Visual Analytics
2.1.1 Definitions
2.1.2 System Latency and Human Factors
2.1.3 Users, Tasks, and Models
2.1.4 Techniques, Algorithms, and Systems
2.1.5 Uncertainty Visualization
2.2 Approaches for Scalable Visualization Systems
2.3 The k-Nearest Neighbor (KNN) Problem
2.4 t-Distributed Stochastic Neighbor Embedding
CHAPTER 3. SwiftTuna: Responsive and Incremental Visual Exploration of Large-scale Multidimensional Data
3.1 The SwiftTuna Design
3.1.1 Design Considerations
3.1.2 System Overview
3.1.3 Scalable Visualization Components
3.1.4 Visualization Cards
3.1.5 User Interface and Interaction
3.2 Responsive Querying
3.2.1 Querying Pipeline
3.2.2 Prompt Responses
3.2.3 Incremental Processing
3.3 Evaluation: Performance Benchmark
3.3.1 Study Design
3.3.2 Results and Discussion
3.4 Implementation
3.5 Summary
CHAPTER 4. PANENE: A Progressive Algorithm for Indexing and Querying Approximate k-Nearest Neighbors
4.1 Approximate k-Nearest Neighbor
4.1.1 A Sequential Algorithm
4.1.2 An Online Algorithm
4.1.3 A Progressive Algorithm
4.1.4 Filtered AKNN Search
4.2 k-Nearest Neighbor Lookup Table
4.3 Benchmark
4.3.1 Online and Progressive k-d Trees
4.3.2 k-Nearest Neighbor Lookup Tables
4.4 Applications
4.4.1 Progressive Regression and Density Estimation
4.4.2 Responsive t-SNE
4.5 Implementation
4.6 Discussion
4.7 Summary
CHAPTER 5. ProReveal: Progressive Visual Analytics with Safeguards
5.1 Progressive Visual Analytics with Safeguards
5.1.1 Definition
5.1.2 Examples
5.1.3 Design Considerations
5.2 ProReveal
5.3 Evaluation
5.4 Discussion
5.5 Summary
CHAPTER 6. Discussion
6.1 Lessons Learned
6.2 Limitations
CHAPTER 7. Conclusion
7.1 Thesis Contributions Revisited
7.2 Future Research Agenda
7.3 Final Remarks
Abstract (Korean)
Acknowledgments (Korean)

SNU Open Repository and Archive