Design of Progressive Visualization Systems for Exploring Large-scale Data
Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, 2020. 2. Jinwook Seo.
Understanding data through interactive visualization, also known as visual analytics, is a common and necessary practice in modern data science. However, as data sizes have increased at unprecedented rates, the computation latency of visualization systems has become a significant hurdle to visual analytics. The goal of this dissertation is to design a series of systems for progressive visual analytics (PVA), a visual analytics paradigm that can provide intermediate results during computation and allow visual exploration of these results, to address the scalability hurdle. To support the interactive exploration of data with billions of records, we first introduce SwiftTuna, an interactive visualization system with scalable visualization and computation components. Our performance benchmark demonstrates that it can handle data with four billion records, giving responsive feedback every few seconds without precomputation. Second, we present PANENE, a progressive algorithm for the Approximate k-Nearest Neighbor (AKNN) problem. PANENE brings useful machine learning methods into visual analytics, which has been challenging due to their long initial latency resulting from AKNN computation. In particular, we accelerate t-Distributed Stochastic Neighbor Embedding (t-SNE), a popular non-linear dimensionality reduction technique, which enables the responsive visualization of data with a few hundred columns. These two contributions address the scalability issues stemming from a large number of rows and a large number of columns in data, respectively. Third, from the users' perspective, we focus on improving the trustworthiness of intermediate knowledge gained from uncertain results in PVA. We propose a novel PVA concept, Progressive Visual Analytics with Safeguards, and introduce PVA-Guards, safeguards people can leave on uncertain intermediate knowledge that needs to be verified.
We also present a proof-of-concept system, ProReveal, designed and developed to integrate seven safeguards into progressive data exploration. Our user study demonstrates that people not only successfully created PVA-Guards on ProReveal but also voluntarily used PVA-Guards to manage the uncertainty of their knowledge. Finally, summarizing the three studies, we discuss design challenges for progressive systems as well as future research agendas for PVA.
CHAPTER 1. Introduction
1.1 Background and Motivation
1.2 Thesis Statement and Research Questions
1.3 Thesis Contributions
1.3.1 Responsive and Incremental Visual Exploration of Large-scale Multidimensional Data
1.3.2 Progressive Computation of Approximate k-Nearest Neighbors and Responsive t-SNE
1.3.3 Progressive Visual Analytics with Safeguards
1.4 Structure of Dissertation
CHAPTER 2. Related Work
2.1 Progressive Visual Analytics
2.1.1 Definitions
2.1.2 System Latency and Human Factors
2.1.3 Users, Tasks, and Models
2.1.4 Techniques, Algorithms, and Systems
2.1.5 Uncertainty Visualization
2.2 Approaches for Scalable Visualization Systems
2.3 The k-Nearest Neighbor (KNN) Problem
2.4 t-Distributed Stochastic Neighbor Embedding
CHAPTER 3. SwiftTuna: Responsive and Incremental Visual Exploration of Large-scale Multidimensional Data
3.1 The SwiftTuna Design
3.1.1 Design Considerations
3.1.2 System Overview
3.1.3 Scalable Visualization Components
3.1.4 Visualization Cards
3.1.5 User Interface and Interaction
3.2 Responsive Querying
3.2.1 Querying Pipeline
3.2.2 Prompt Responses
3.2.3 Incremental Processing
3.3 Evaluation: Performance Benchmark
3.3.1 Study Design
3.3.2 Results and Discussion
3.4 Implementation
3.5 Summary
CHAPTER 4. PANENE: A Progressive Algorithm for Indexing and Querying Approximate k-Nearest Neighbors
4.1 Approximate k-Nearest Neighbor
4.1.1 A Sequential Algorithm
4.1.2 An Online Algorithm
4.1.3 A Progressive Algorithm
4.1.4 Filtered AKNN Search
4.2 k-Nearest Neighbor Lookup Table
4.3 Benchmark
4.3.1 Online and Progressive k-d Trees
4.3.2 k-Nearest Neighbor Lookup Tables
4.4 Applications
4.4.1 Progressive Regression and Density Estimation
4.4.2 Responsive t-SNE
4.5 Implementation
4.6 Discussion
4.7 Summary
CHAPTER 5. ProReveal: Progressive Visual Analytics with Safeguards
5.1 Progressive Visual Analytics with Safeguards
5.1.1 Definition
5.1.2 Examples
5.1.3 Design Considerations
5.2 ProReveal
5.3 Evaluation
5.4 Discussion
5.5 Summary
CHAPTER 6. Discussion
6.1 Lessons Learned
6.2 Limitations
CHAPTER 7. Conclusion
7.1 Thesis Contributions Revisited
7.2 Future Research Agenda
7.3 Final Remarks
Abstract (Korean)
Acknowledgments (Korean)
Docto