9 research outputs found

    Visual Analysis of Popping in Progressive Visualization

    Progressive visualization allows users to examine intermediate results while they are further refined in the background. This makes it increasingly popular when dealing with large data and computationally expensive tasks. The characteristics of how preliminary visualizations evolve over time are crucial for efficient analysis; in particular, unexpected disruptive changes between iterations can significantly hamper the user experience. This paper proposes a visualization framework to analyze the refinement behavior of progressive visualization. We particularly focus on sudden significant changes between the iterations, which we denote as popping artifacts, in reference to undesirable visual effects in the context of level-of-detail representations in computer graphics. Our visualization approach conveys where in image space and when during the refinement popping artifacts occur. It allows comparison across different runs of stochastic processes, and supports parameter studies for gaining further insights and tuning the algorithms under consideration. We demonstrate the application of our framework and its effectiveness via two diverse use cases with underlying stochastic processes: adaptive image space sampling, and the generation of grid layouts.

    SHARQL: Shape Analysis of Recursive SPARQL Queries

    We showcase SHARQL, a system that allows users to navigate SPARQL query logs, inspect complex queries by visualizing their shape, and use it as a back-end to flexibly produce statistics about the logs. Even though SPARQL query logs are increasingly available and have become public recently, their navigation and analysis is hampered by the lack of appropriate tools. SPARQL queries are sometimes hard to understand, and their inherent properties, such as their shape, their hypertree properties, and their property paths, are even more difficult to identify and render properly. In SHARQL, we show how the analysis and exploration of several hundred million queries is possible. We offer edge rendering that works with complex hyperedges, regular edges, and property paths of SPARQL queries. The underlying database stores more than one hundred attributes per query and is therefore extremely flexible for exploring the query logs and as a back-end to compute and display analytical properties of the entire logs or parts thereof.

    Improving UMAP for Fast and Accurate Dimensionality Reduction

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2021. Hyung-Kwon Ko.
    One effective way of understanding the characteristics of high-dimensional data is to embed it onto a low-dimensional space. Among many existing dimensionality reduction algorithms, Uniform Manifold Approximation and Projection (UMAP) has gained the most attention because of its fast and stable projection results. However, it is still too slow to be adopted for an interactive visual analytics system, as it takes a few minutes to embed even a toy dataset (e.g., MNIST). Moreover, UMAP is vulnerable to different configurations of hyperparameters, especially the initialization method and the number of epochs, which can seriously bias the insights mined from the embedding result. To achieve responsiveness, we propose a progressive algorithm for UMAP, called Progressive UMAP, for the exploration of datasets by updating the embedding with a batch of points through a progressive computation. Next, to guarantee less bias and more robustness in the embedding, we present a novel dimensionality reduction algorithm called Uniform Manifold Approximation with Two-phase Optimization (UMATO). We discover that the vulnerability comes from the approximation of the cross-entropy loss function. UMATO, instead, takes a two-phase optimization approach: global optimization to obtain the overall skeleton of the data, and local optimization to identify regional characteristics of a local area. In our experiment with one synthetic and three real-world datasets, UMATO outperformed widely used baseline algorithms, such as PCA, t-SNE, UMAP, topological autoencoders, and Anchor t-SNE, in terms of global quality metrics and 2D projection results. We further examine a case study of UMATO on real-world biological data and the extension to multi-phase optimization. Our work makes original contributions to the field of dimensionality reduction, as well as to progressive visual analytics. Lastly, the thesis discusses future research directions for improving the proposed algorithms.
    Contents:
    Chapter 1. Introduction: 1.1 Motivation; 1.2 Research Questions and Approaches (1.2.1 Progressive Algorithm for UMAP; 1.2.2 Less Biased and Robust Dimensionality Reduction Algorithm); 1.3 Contributions; 1.4 Thesis Overview.
    Chapter 2. Background: UMAP: 2.1 Graph Construction; 2.2 Layout Optimization.
    Chapter 3. Progressive UMAP: A Progressive Algorithm for UMAP: 3.1 Introduction; 3.2 Related Work (3.2.1 Progressive Visual Analytics); 3.3 Progressive UMAP (3.3.1 Computing Ni; 3.3.2 Computing ρi and σi; 3.3.3 Layout Initialization; 3.3.4 Layout Optimization); 3.4 Evaluation and Discussion; 3.5 Summary.
    Chapter 4. UMATO: A Less Biased and Robust Dimensionality Reduction Algorithm Based on UMAP: 4.1 Introduction; 4.2 Related Work (4.2.1 Dimensionality Reduction; 4.2.2 Hubs, landmarks, and anchors); 4.3 The Meaning of Using Different Loss Functions in Dimensionality Reduction (4.3.1 t-SNE); 4.4 UMATO (4.4.1 Points Classification; 4.4.2 Global Optimization; 4.4.3 Local Optimization; 4.4.4 Outliers Arrangement); 4.5 Experiments (4.5.1 Quantitative and Qualitative Evaluation of UMATO Compared to Six Baseline Algorithms; 4.5.2 Case Study: UMATO on Real-world Biological Data); 4.6 Discussion; 4.7 Summary.
    Chapter 5. Discussion: 5.1 Lessons Learned; 5.2 Limitations.
    Chapter 6. Conclusion.
    Abstract (Korean).

    Progressive Data Science: Potential and Challenges

    Data science requires time-consuming iterative manual activities. In particular, activities such as data selection, preprocessing, transformation, and mining depend heavily on iterative trial-and-error processes that could be sped up significantly by providing quick feedback on the impact of changes. The idea of progressive data science is to compute the results of changes in a progressive manner, returning a first approximation of results quickly and allowing iterative refinements until converging to a final result. Enabling the user to interact with the intermediate results allows an early detection of erroneous or suboptimal choices, the guided definition of modifications to the pipeline, and their quick assessment. In this paper, we discuss the progressiveness challenges arising in different steps of the data science pipeline. We describe how changes in each step of the pipeline impact the subsequent steps and outline why progressive data science will help to make the process more effective. Computing progressive approximations of outcomes resulting from changes creates numerous research challenges, especially if the changes are made in the early steps of the pipeline. We discuss these challenges and outline first steps towards progressiveness, which, we argue, will ultimately help to significantly speed up the overall data science process.

    Designing Progressive Visualization Systems for Exploring Large-scale Data

    Thesis (Ph.D.) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, February 2020. Jinwook Seo.
    Understanding data through interactive visualization, also known as visual analytics, is a common and necessary practice in modern data science. However, as data sizes have increased at unprecedented rates, the computation latency of visualization systems has become a significant hurdle to visual analytics. The goal of this dissertation is to design a series of systems for progressive visual analytics (PVA), a visual analytics paradigm that can provide intermediate results during computation and allow visual exploration of these results, to address the scalability hurdle. To support the interactive exploration of data with billions of records, we first introduce SwiftTuna, an interactive visualization system with scalable visualization and computation components. Our performance benchmark demonstrates that it can handle data with four billion records, giving responsive feedback every few seconds without precomputation. Second, we present PANENE, a progressive algorithm for the Approximate k-Nearest Neighbor (AKNN) problem. PANENE brings useful machine learning methods into visual analytics, which has been challenging due to their long initial latency resulting from AKNN computation. In particular, we accelerate t-Distributed Stochastic Neighbor Embedding (t-SNE), a popular non-linear dimensionality reduction technique, which enables the responsive visualization of data with a few hundred columns. These two contributions aim to address the scalability issues stemming from a large number of rows or columns in data, respectively. Third, from the users' perspective, we focus on improving the trustworthiness of intermediate knowledge gained from uncertain results in PVA. We propose a novel PVA concept, Progressive Visual Analytics with Safeguards, and introduce PVA-Guards, safeguards people can leave on uncertain intermediate knowledge that needs to be verified. We also present a proof-of-concept system, ProReveal, designed and developed to integrate seven safeguards into progressive data exploration. Our user study demonstrates that people not only successfully created PVA-Guards on ProReveal but also voluntarily used PVA-Guards to manage the uncertainty of their knowledge. Finally, summarizing the three studies, we discuss design challenges for progressive systems as well as future research agendas for PVA.
    Contents:
    Chapter 1. Introduction: 1.1 Background and Motivation; 1.2 Thesis Statement and Research Questions; 1.3 Thesis Contributions (1.3.1 Responsive and Incremental Visual Exploration of Large-scale Multidimensional Data; 1.3.2 Progressive Computation of Approximate k-Nearest Neighbors and Responsive t-SNE; 1.3.3 Progressive Visual Analytics with Safeguards); 1.4 Structure of Dissertation.
    Chapter 2. Related Work: 2.1 Progressive Visual Analytics (2.1.1 Definitions; 2.1.2 System Latency and Human Factors; 2.1.3 Users, Tasks, and Models; 2.1.4 Techniques, Algorithms, and Systems; 2.1.5 Uncertainty Visualization); 2.2 Approaches for Scalable Visualization Systems; 2.3 The k-Nearest Neighbor (KNN) Problem; 2.4 t-Distributed Stochastic Neighbor Embedding.
    Chapter 3. SwiftTuna: Responsive and Incremental Visual Exploration of Large-scale Multidimensional Data: 3.1 The SwiftTuna Design (3.1.1 Design Considerations; 3.1.2 System Overview; 3.1.3 Scalable Visualization Components; 3.1.4 Visualization Cards; 3.1.5 User Interface and Interaction); 3.2 Responsive Querying (3.2.1 Querying Pipeline; 3.2.2 Prompt Responses; 3.2.3 Incremental Processing); 3.3 Evaluation: Performance Benchmark (3.3.1 Study Design; 3.3.2 Results and Discussion); 3.4 Implementation; 3.5 Summary.
    Chapter 4. PANENE: A Progressive Algorithm for Indexing and Querying Approximate k-Nearest Neighbors: 4.1 Approximate k-Nearest Neighbor (4.1.1 A Sequential Algorithm; 4.1.2 An Online Algorithm; 4.1.3 A Progressive Algorithm; 4.1.4 Filtered AKNN Search); 4.2 k-Nearest Neighbor Lookup Table; 4.3 Benchmark (4.3.1 Online and Progressive k-d Trees; 4.3.2 k-Nearest Neighbor Lookup Tables); 4.4 Applications (4.4.1 Progressive Regression and Density Estimation; 4.4.2 Responsive t-SNE); 4.5 Implementation; 4.6 Discussion; 4.7 Summary.
    Chapter 5. ProReveal: Progressive Visual Analytics with Safeguards: 5.1 Progressive Visual Analytics with Safeguards (5.1.1 Definition; 5.1.2 Examples; 5.1.3 Design Considerations); 5.2 ProReveal; 5.3 Evaluation; 5.4 Discussion; 5.5 Summary.
    Chapter 6. Discussion: 6.1 Lessons Learned; 6.2 Limitations.
    Chapter 7. Conclusion: 7.1 Thesis Contributions Revisited; 7.2 Future Research Agenda; 7.3 Final Remarks.
    Abstract (Korean). Acknowledgments (Korean).

    Pivotal Visualization: A Design Method to Enrich Visual Exploration


    Progressive Data Analysis and Visualization (Dagstuhl Seminar 18411)

    We live in an era where data is abundant and growing rapidly; databases storing big data sprawl past memory and computation limits, and across distributed systems. New hardware and software systems have been built to sustain this growth in terms of storage management and predictive computation. However, these infrastructures, while good for data at scale, do not support exploratory data analysis (EDA) well, as it is commonly used, for instance, in Visual Analytics. EDA allows human users to make sense of data with little or no known model of this data and is essential in many application domains, from network security and fraud detection to epidemiology and preventive medicine. Data exploration is done through an iterative loop where analysts interact with data through computations that return results, usually shown with visualizations, which in turn are interacted with by the analyst again. Due to human cognitive constraints, exploration needs short system response times: at 500 ms, users change their querying behavior; past five or ten seconds, users abandon tasks or lose attention. As datasets grow and computations become more complex, response time suffers. To address this problem, a new computation paradigm has emerged in the last decade under several names: online aggregation in the database community; progressive, incremental, or iterative visualization in other communities. It consists of splitting long computations into a series of approximate results improving with time; in this process, partial or approximate results are rapidly returned to the user and can be interacted with in a fluent and iterative fashion. With the increasing growth in data, such progressive data analysis approaches will become one of the leading paradigms for data exploration systems, but they will also require major changes in algorithms, data structures, and visualization tools.
This Dagstuhl Seminar set out to discuss and address these challenges by bringing together researchers from the different research communities involved: database, visualization, and machine learning. Thus far, these communities have often been divided by a gap hindering joint efforts in dealing with forthcoming challenges in progressive data analysis and visualization. The seminar gave a platform for these researchers and practitioners to exchange their ideas, experience, and visions, jointly develop strategies to deal with challenges, and create a deeper awareness of the implications of this paradigm shift. The implications are technical, but also human (both perceptual and cognitive), and the seminar provided a holistic view of the problem by gathering specialists from all the communities.