256 research outputs found

    vivid: An R package for Variable Importance and Variable Interactions Displays for Machine Learning Models

    We present vivid, an R package for visualizing variable importance and variable interactions in machine learning models. The package provides a range of displays, including heatmap and graph-based displays for viewing variable importance and interaction jointly, and partial dependence plots in both a matrix layout and an alternative layout that emphasizes important variable subsets. To increase the interpretability of machine learning models and make the work accessible to a wider readership, we discuss the design choices behind our implementation, focusing on the package structure and providing an in-depth look at the package functions and key features. We also provide a practical illustration of the software in use on a data set. Comment: 15 pages, 7 figures
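
    Friedman's H-statistic is one widely used model-agnostic way to quantify a pairwise interaction of the kind shown in the heatmap and graph displays: it compares the joint partial dependence of two variables with the sum of their individual partial dependences. The Python sketch below is an illustration only. vivid itself is an R package, and the names `partial_dependence`, `h_statistic`, and `interaction_matrix` are hypothetical, not vivid's API. It assumes `predict` maps an (n, p) NumPy array to n predictions, and it uses a brute-force O(n²) evaluation that is only practical for small n.

```python
import itertools

import numpy as np


def partial_dependence(predict, X, cols):
    """Centred partial dependence of `predict` on the columns `cols`,
    evaluated at every row of X (brute force: n model calls on n rows)."""
    n = X.shape[0]
    pd = np.empty(n)
    for i in range(n):
        X_mod = X.copy()
        # Fix the chosen columns at row i's values for all rows, then average.
        X_mod[:, cols] = X[i, cols]
        pd[i] = predict(X_mod).mean()
    return pd - pd.mean()


def h_statistic(predict, X, j, k):
    """Friedman's H^2 for the pair (j, k): the share of the joint partial
    dependence variance not explained by the two one-way effects."""
    pd_jk = partial_dependence(predict, X, [j, k])
    pd_j = partial_dependence(predict, X, [j])
    pd_k = partial_dependence(predict, X, [k])
    return np.sum((pd_jk - pd_j - pd_k) ** 2) / np.sum(pd_jk ** 2)


def interaction_matrix(predict, X):
    """Symmetric p x p matrix of pairwise H^2 values (zero diagonal)."""
    p = X.shape[1]
    H = np.zeros((p, p))
    for j, k in itertools.combinations(range(p), 2):
        H[j, k] = H[k, j] = h_statistic(predict, X, j, k)
    return H
```

    For a purely additive model such as `f(x) = x0 + x1 + x2`, every off-diagonal entry is essentially zero; for `f(x) = x0 * x1 + x2`, the (0, 1) entry is large while pairs involving `x2` stay at zero. A matrix like this is the natural input for a heatmap or a network-style display.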

    Dynamic Modulation of Local Population Activity by Rhythm Phase in Human Occipital Cortex During a Visual Search Task

    Brain rhythms are more than just passive phenomena in visual cortex. For the first time, we show that the physiology underlying brain rhythms actively suppresses and releases cortical areas on a second-to-second basis during visual processing. Furthermore, their influence is specific at the scale of individual gyri. We quantified the interaction between broadband spectral change and brain rhythms on a second-to-second basis in electrocorticographic (ECoG) measurements of brain surface potentials in five human subjects during a visual search task. Comparison of visual search epochs with a blank-screen baseline revealed changes in the raw potential, in the amplitude of rhythmic activity, and in the decoupled broadband spectral amplitude. We present new methods to characterize the intensity and preferred phase of coupling between broadband power and band-limited rhythms, and to estimate the magnitude of rhythm-to-broadband modulation on a trial-by-trial basis. These tools revealed numerous coupling motifs between the phase of low-frequency (δ, θ, α, β, and γ band) rhythms and the amplitude of broadband spectral change. In the θ and β ranges, the coupling of phase to broadband change is dynamic during visual processing, decreasing in some occipital areas and increasing in others, in a gyrally specific pattern. Finally, we demonstrate that the rhythms interact with one another across frequency ranges and across cortical sites.
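
    The abstract does not spell out its coupling measure, so the sketch below uses a generic, commonly used alternative rather than the authors' method: the mean vector length, which summarizes how strongly a broadband amplitude envelope A(t) concentrates at one phase φ(t) of a rhythm, and at which phase. It assumes the phase series (for example, from a Hilbert transform of a band-passed signal) and a non-negative amplitude series have already been extracted; `coupling` is a hypothetical name.

```python
import cmath
import math


def coupling(phases, amplitudes):
    """Mean-vector-length coupling between a rhythm's phase and an amplitude envelope.

    Returns (strength, preferred_phase). strength = |mean(A * e^{i*phi})| / mean(A),
    which lies in [0, 1] for non-negative A; preferred_phase (radians, in (-pi, pi])
    is the phase at which the amplitude tends to peak.
    """
    if not phases or len(phases) != len(amplitudes):
        raise ValueError("need equal-length, non-empty inputs")
    # Each sample is a vector of length A(t) pointing at angle phi(t); average them.
    z = sum(a * cmath.exp(1j * p) for p, a in zip(phases, amplitudes)) / len(phases)
    mean_amp = sum(amplitudes) / len(amplitudes)
    return abs(z) / mean_amp, cmath.phase(z)


# Usage: amplitude peaks at phase pi/3 with 50% modulation depth.
N = 1000
demo_phases = [2 * math.pi * k / N for k in range(N)]
demo_amps = [1 + 0.5 * math.cos(p - math.pi / 3) for p in demo_phases]
strength, preferred = coupling(demo_phases, demo_amps)
```

    Normalizing by the mean amplitude keeps the strength comparable across electrodes with different overall power; computing it per trial would give a trial-by-trial modulation estimate in the spirit of the abstract, though not necessarily by the same formula.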

    Communicating Uncertainty and Risk in Air Quality Maps

    Environmental sensors provide crucial data for understanding our surroundings. For example, air quality maps based on sensor readings help users make decisions to mitigate the effects of pollution on their health. Standard maps show readings from individual sensors or colored contours indicating estimated pollution levels. However, showing a single estimate may conceal uncertainty and lead to underestimation of risk, while showing raw sensor data yields varied interpretations. We present several visualizations of uncertainty in air quality maps, including a frequency-framing "dotmap" and small multiples, and we compare them with standard contour and sensor-based maps. In a user study, we find that including uncertainty in maps has a significant effect on how much users would choose to reduce physical activity, and that people make more cautious decisions when using uncertainty-aware maps. Additionally, we analyze think-aloud transcriptions from the experiment to understand how the representation of uncertainty influences people's decision-making. Our results suggest ways to design maps of sensor data that can encourage certain types of reasoning, yield more consistent responses, and convey risk better than standard maps.
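
    A frequency-framing display such as the dotmap restates a probability as a count of equally likely outcomes (for example, "16 of 100 dots exceed the threshold"). The sketch below assumes a normal predictive distribution with positive standard deviation for the pollution level at one map location, and a threshold chosen for illustration; the paper's own procedure for placing and coloring dots may differ, and the function names are hypothetical.

```python
from statistics import NormalDist


def dots_above_threshold(mean, sd, threshold, n_dots=100):
    """How many of `n_dots` equally likely outcomes exceed `threshold`,
    assuming the pollution level is Normal(mean, sd) with sd > 0."""
    p_exceed = 1.0 - NormalDist(mean, sd).cdf(threshold)
    return round(p_exceed * n_dots)


def dot_labels(mean, sd, threshold, n_dots=100):
    """Per-dot labels for drawing one location's dot cluster:
    'above' dots first, then 'below' dots."""
    k = dots_above_threshold(mean, sd, threshold, n_dots)
    return ["above"] * k + ["below"] * (n_dots - k)
```

    With an estimated mean of 30, a standard deviation of 5, and a threshold of 35, `dots_above_threshold(30, 5, 35)` gives 16 of 100 dots. A contour map would color that location "below threshold", whereas the dots keep the residual risk visible.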

    Statistical Analysis in Art Conservation Research

    Evaluates all components of data analysis and shows that statistical methods in conservation are vastly underutilized. Also offers specific examples of possible improvements.

    Visualisation Techniques for Interpreting Machine Learning Models

    With the increase of complex Machine Learning (ML) models making decisions in everyday life in a wide range of fields from economics to healthcare, the demand for Interpretable Machine Learning (IML) techniques has grown. One way to broaden the understanding of the behaviour of a fitted ML model is through informative visualisations. Visualisations can aid interpretation and allow a more thorough examination of the nature of the predictions generated by an ML model. This is particularly important for so-called black-box models, such as random forests or Bayesian Additive Regression Trees (BART). In this thesis, various IML approaches are proposed through novel visualisations that display different metrics and model summaries for examining the behaviour of a fitted ML model. First, we present flexible methods for investigating variable importance, interactions, and variable effects, in the form of a suite of visualisations that can aid the interpretation of statistical and ML models using both model-specific and model-agnostic methods. Next, motivated in part by the lack of existing visualisation methods and by the rising popularity of BART, we develop novel visualisations for examining BART models, covering both their tree structures and, through the posterior distribution, the uncertainty surrounding predictions. Lastly, we demonstrate and discuss our implementation of the R package vivid (Variable Importance and Variable Interaction Displays), which is used to explore the behaviour of fitted ML models. Here, we focus on key package features and the general architectural principles used in vivid to design informative IML visualisations, and provide a practical illustration of the package in use.
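
    Permutation importance is one standard model-agnostic importance measure of the kind the thesis refers to: shuffle one input column, re-predict, and record how much the error grows. The sketch below is in Python for illustration (vivid is an R package, and the thesis may use different measures). `permutation_importance_mse` is a hypothetical helper unrelated to any library function, the loss is assumed to be mean squared error, and `predict` is assumed to map an (n, p) NumPy array to n predictions.

```python
import numpy as np


def permutation_importance_mse(predict, X, y, n_repeats=5, seed=0):
    """Increase in mean squared error when each column of X is shuffled,
    averaged over `n_repeats` independent shuffles."""
    rng = np.random.default_rng(seed)
    base = np.mean((y - predict(X)) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            X_perm = X.copy()
            # Shuffling column j breaks its link to y but keeps its marginal distribution.
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            scores[j] += np.mean((y - predict(X_perm)) ** 2) - base
        scores[j] /= n_repeats
    return scores
```

    Because the measure only calls `predict`, it applies unchanged to random forests, BART posterior-mean predictions, or any other fitted model; its cost is `p * n_repeats` extra prediction passes over the data.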

    Design of Progressive Visualization Systems for Large-Scale Data Exploration (대용량 데이터 탐색을 위한 점진적 시각화 시스템 설계)

    Doctoral dissertation, Seoul National University Graduate School, College of Engineering, Department of Computer Science and Engineering, February 2020 (advisor: Jinwook Seo).

    Understanding data through interactive visualization, also known as visual analytics, is a common and necessary practice in modern data science. However, as data sizes have increased at unprecedented rates, the computation latency of visualization systems has become a significant hurdle to visual analytics. The goal of this dissertation is to design a series of systems for progressive visual analytics (PVA), a visual analytics paradigm that provides intermediate results during computation and allows visual exploration of these results, to address the scalability hurdle. To support the interactive exploration of data with billions of records, we first introduce SwiftTuna, an interactive visualization system with scalable visualization and computation components. Our performance benchmark demonstrates that it can handle data with four billion records, giving responsive feedback every few seconds without precomputation. Second, we present PANENE, a progressive algorithm for the Approximate k-Nearest Neighbor (AKNN) problem. PANENE brings useful machine learning methods into visual analytics, which has been challenging because of their long initial latency resulting from AKNN computation. In particular, we accelerate t-Distributed Stochastic Neighbor Embedding (t-SNE), a popular non-linear dimensionality reduction technique, which enables the responsive visualization of data with a few hundred columns. These two contributions address the scalability issues stemming from a large number of rows and a large number of columns, respectively. Third, from the users' perspective, we focus on improving the trustworthiness of intermediate knowledge gained from uncertain results in PVA. We propose a novel PVA concept, Progressive Visual Analytics with Safeguards, and introduce PVA-Guards, safeguards people can leave on uncertain intermediate knowledge that needs to be verified. We also present a proof-of-concept system, ProReveal, designed and developed to integrate seven safeguards into progressive data exploration. Our user study demonstrates that participants not only successfully created PVA-Guards in ProReveal but also voluntarily used them to manage the uncertainty of their knowledge. Finally, summarizing the three studies, we discuss design challenges for progressive systems as well as future research agendas for PVA.

    Korean abstract (translated): Understanding data through interactive visualization is one of the essential analysis methods in modern data science. However, as data sizes have recently grown explosively, the latency caused by data size has become a major obstacle to interactive visual analysis. To solve this scalability problem, this research designs and develops a series of systems that support Progressive Visual Analytics. Such systems can mitigate the latency caused by data size by providing users with intermediate analysis results even before data processing is complete. First, we propose SwiftTuna, a system for visually exploring data with billions of rows. Developed with the goal of scalable data processing and visual representation, the system was shown to update visualizations of data with about four billion rows every few seconds without preprocessing. Second, we propose PANENE, an algorithm that computes the Approximate k-Nearest Neighbor problem progressively. Although the approximate k-nearest neighbor problem is used in many machine learning techniques, its long initial computation time has made it difficult to apply in interactive systems. PANENE dramatically reduces this initial computation time, so that a variety of machine learning techniques can be used in visual analysis. In particular, it accelerates t-Distributed Stochastic Neighbor Embedding, a useful nonlinear dimensionality reduction technique, so that data with hundreds of dimensions can be projected quickly. Whereas the first system and algorithm address scalability problems caused by the number of rows or columns, the third system aims to improve the trustworthiness of progressive visual analytics. Intermediate results given to users in progressive visual analytics are approximations of the final result and therefore carry uncertainty. We propose a new concept, Progressive Visual Analytics with Safeguards, which lets users leave safeguards on the uncertain intermediate knowledge they encounter during progressive exploration, so that the accuracy of that knowledge can be verified later. We also introduce ProReveal, a system that implements this concept. In a user study with ProReveal, users not only created safeguards successfully but also used them voluntarily to manage the uncertainty of intermediate knowledge. Finally, synthesizing the results of the three studies, we explore the design challenges of building progressive visual analytics systems and directions for future research.

    Contents:
    CHAPTER 1. Introduction
        1.1 Background and Motivation
        1.2 Thesis Statement and Research Questions
        1.3 Thesis Contributions
            1.3.1 Responsive and Incremental Visual Exploration of Large-scale Multidimensional Data
            1.3.2 Progressive Computation of Approximate k-Nearest Neighbors and Responsive t-SNE
            1.3.3 Progressive Visual Analytics with Safeguards
        1.4 Structure of Dissertation
    CHAPTER 2. Related Work
        2.1 Progressive Visual Analytics
            2.1.1 Definitions
            2.1.2 System Latency and Human Factors
            2.1.3 Users, Tasks, and Models
            2.1.4 Techniques, Algorithms, and Systems
            2.1.5 Uncertainty Visualization
        2.2 Approaches for Scalable Visualization Systems
        2.3 The k-Nearest Neighbor (KNN) Problem
        2.4 t-Distributed Stochastic Neighbor Embedding
    CHAPTER 3. SwiftTuna: Responsive and Incremental Visual Exploration of Large-scale Multidimensional Data
        3.1 The SwiftTuna Design
            3.1.1 Design Considerations
            3.1.2 System Overview
            3.1.3 Scalable Visualization Components
            3.1.4 Visualization Cards
            3.1.5 User Interface and Interaction
        3.2 Responsive Querying
            3.2.1 Querying Pipeline
            3.2.2 Prompt Responses
            3.2.3 Incremental Processing
        3.3 Evaluation: Performance Benchmark
            3.3.1 Study Design
            3.3.2 Results and Discussion
        3.4 Implementation
        3.5 Summary
    CHAPTER 4. PANENE: A Progressive Algorithm for Indexing and Querying Approximate k-Nearest Neighbors
        4.1 Approximate k-Nearest Neighbor
            4.1.1 A Sequential Algorithm
            4.1.2 An Online Algorithm
            4.1.3 A Progressive Algorithm
            4.1.4 Filtered AKNN Search
        4.2 k-Nearest Neighbor Lookup Table
        4.3 Benchmark
            4.3.1 Online and Progressive k-d Trees
            4.3.2 k-Nearest Neighbor Lookup Tables
        4.4 Applications
            4.4.1 Progressive Regression and Density Estimation
            4.4.2 Responsive t-SNE
        4.5 Implementation
        4.6 Discussion
        4.7 Summary
    CHAPTER 5. ProReveal: Progressive Visual Analytics with Safeguards
        5.1 Progressive Visual Analytics with Safeguards
            5.1.1 Definition
            5.1.2 Examples
            5.1.3 Design Considerations
        5.2 ProReveal
        5.3 Evaluation
        5.4 Discussion
        5.5 Summary
    CHAPTER 6. Discussion
        6.1 Lessons Learned
        6.2 Limitations
    CHAPTER 7. Conclusion
        7.1 Thesis Contributions Revisited
        7.2 Future Research Agenda
        7.3 Final Remarks
    Abstract (Korean)
    Acknowledgments (Korean)
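
    The progressive paradigm and the idea of a safeguard can be illustrated together with a toy example: a mean is computed chunk by chunk, each intermediate estimate carries an approximate 95% interval, and a user's claim about the mean is re-checked as the interval narrows. This is a hypothetical sketch, not SwiftTuna's or ProReveal's implementation. `progressive_mean` and `MeanAboveSafeguard` are illustrative names, the interval uses a normal approximation with a finite-population correction, and the data are assumed to be processed in random order so that each prefix behaves like a random sample.

```python
import math


def progressive_mean(data, chunk_size):
    """Yield (rows_seen, running_mean, half_width) after each chunk.

    half_width is an approximate 95% half-width for the mean of the full data,
    using a normal approximation and a finite-population correction.
    Assumes `data` is already in random order.
    """
    N = len(data)
    n, total, total_sq = 0, 0.0, 0.0
    for start in range(0, N, chunk_size):
        for x in data[start:start + chunk_size]:
            n += 1
            total += x
            total_sq += x * x
        mean = total / n
        if n < 2:
            yield n, mean, math.inf
            continue
        sample_var = max((total_sq - n * mean * mean) / (n - 1), 0.0)
        # The (1 - n/N) factor shrinks the interval to zero once every row is seen.
        half = 1.96 * math.sqrt(sample_var / n * (1 - n / N))
        yield n, mean, half


class MeanAboveSafeguard:
    """Hypothetical safeguard recording the claim 'mean > threshold'."""

    def __init__(self, threshold):
        self.threshold = threshold

    def status(self, mean, half_width):
        # Compare the whole interval [mean - half, mean + half] with the threshold.
        if mean - half_width > self.threshold:
            return "holds"
        if mean + half_width < self.threshold:
            return "violated"
        return "uncertain"
```

    Because of the finite-population correction, a safeguard that reads "uncertain" on an early chunk is eventually checked against the exact final mean, which mirrors the idea of leaving a claim on intermediate knowledge and verifying it later.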