
    Methods for Improving UMAP for Fast and Accurate Dimensionality Reduction

    Thesis (Master's) -- Seoul National University Graduate School: College of Engineering, Department of Computer Science and Engineering, August 2021. Hyung-Kwon Ko (고형권).

    One effective way to understand the characteristics of high-dimensional data is to embed it in a low-dimensional space. Among the many existing dimensionality reduction algorithms, Uniform Manifold Approximation and Projection (UMAP) has gained the most attention because of its fast and stable projection results. However, it is still too slow to be adopted in an interactive visual analytics system, as it takes a few minutes to embed even a toy dataset (e.g., MNIST). Moreover, UMAP is vulnerable to different configurations of hyperparameters, especially the initialization method and the number of epochs, which can introduce serious bias when mining insights from the embedding result. To achieve responsiveness, we propose a progressive algorithm for UMAP, called Progressive UMAP, which supports the exploration of datasets by updating the embedding with a batch of points through progressive computation. Next, to guarantee less bias and more robustness in the embedding, we present a novel dimensionality reduction algorithm called Uniform Manifold Approximation with Two-phase Optimization (UMATO). We discover that the vulnerability comes from the approximation of the cross-entropy loss function. UMATO instead takes a two-phase optimization approach: global optimization to obtain the overall skeleton of the data, and local optimization to identify the regional characteristics of local areas. In our experiments with one synthetic and three real-world datasets, UMATO outperformed widely used baseline algorithms, such as PCA, t-SNE, UMAP, topological autoencoders, and Anchor t-SNE, in terms of global quality metrics and 2D projection results. We further examine a case study of UMATO on real-world biological data and the extension to multi-phase optimization.
Our work makes original contributions to the field of dimensionality reduction as well as to progressive visual analytics. Lastly, the thesis discusses future research directions for improving the proposed algorithms.

Abstract (translated from Korean): One effective way to understand the characteristics of high-dimensional data is to embed it in a low-dimensional space. Although many dimensionality reduction algorithms exist, Uniform Manifold Approximation and Projection (UMAP) has received much attention for its speed and stable projection results. However, the current UMAP is too slow to be adopted in an interactive visual analytics system: it takes several minutes even on MNIST, an experimental dataset. UMAP is also vulnerable to changes in hyperparameter settings (in particular, the initialization method and the number of epochs), which can lead to serious errors when drawing insights from the embedding result. To make UMAP immediately responsive, we propose Progressive UMAP, a progressive algorithm for UMAP. It enables progressive computation in which the embedding result is updated each time a batch of data is added. Next, to ensure less bias and a robust embedding, we propose UMATO. We first show that this vulnerability arises in the process of approximating the optimization. Unlike UMAP, UMATO uses a two-phase optimization that first captures the overall structure and then identifies local characteristics. Through experiments, we show that UMATO is better than existing algorithms such as PCA, t-SNE, UMAP, topological autoencoders, and Anchor t-SNE in terms of global structure metrics and 2D embedding results. We additionally examine multi-phase optimization and the stability of the embedding through experiments.
This work makes original contributions not only to dimensionality reduction but also to the field of progressive visualization. Finally, we explore directions for future research.

Table of contents:

CHAPTER 1 Introduction
  1.1 Motivation
  1.2 Research Questions and Approaches
    1.2.1 Progressive Algorithm for UMAP
    1.2.2 Less Biased and Robust Dimensionality Reduction Algorithm
  1.3 Contributions
  1.4 Thesis Overview
CHAPTER 2 Background: UMAP
  2.1 Graph Construction
  2.2 Layout Optimization
CHAPTER 3 Progressive UMAP: A Progressive Algorithm for UMAP
  3.1 Introduction
  3.2 Related Work
    3.2.1 Progressive Visual Analytics
  3.3 Progressive UMAP
    3.3.1 Computing N_i
    3.3.2 Computing ρ_i and σ_i
    3.3.3 Layout Initialization
    3.3.4 Layout Optimization
  3.4 Evaluation and Discussion
  3.5 Summary
CHAPTER 4 UMATO: A Less Biased and Robust Dimensionality Reduction Algorithm Based on UMAP
  4.1 Introduction
  4.2 Related Work
    4.2.1 Dimensionality Reduction
    4.2.2 Hubs, landmarks, and anchors
  4.3 The Meaning of Using Different Loss Functions in Dimensionality Reduction
    4.3.1 t-SNE
  4.4 UMATO
    4.4.1 Points Classification
    4.4.2 Global Optimization
    4.4.3 Local Optimization
    4.4.4 Outliers Arrangement
  4.5 Experiments
    4.5.1 Quantitative and Qualitative Evaluation of UMATO Compared to Six Baseline Algorithms
    4.5.2 Case Study: UMATO on Real-world Biological Data
  4.6 Discussion
  4.7 Summary
CHAPTER 5 Discussion
  5.1 Lessons Learned
  5.2 Limitations
CHAPTER 6 Conclusion
Abstract (Korean)
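The abstract attributes UMAP's sensitivity to the approximation of its cross-entropy loss. As a rough illustration of what the unapproximated loss looks like, the sketch below computes the fuzzy-set cross entropy over all point pairs with NumPy. The function names, the choice a = b = 1 for the low-dimensional membership curve, and the dense O(n^2) evaluation are assumptions made for this example and are not taken from the thesis. UMAP implementations commonly approximate the repulsive term by negative sampling; the abstract does not say whether that is exactly the approximation the thesis analyzes.

```python
import numpy as np


def low_dim_membership(Y, a=1.0, b=1.0):
    """Pairwise membership w_ij = 1 / (1 + a * d_ij^(2b)) for an embedding Y.

    a = b = 1 is a simplifying assumption for this sketch.
    """
    diff = Y[:, None, :] - Y[None, :, :]      # shape (n, n, dim)
    sq_dist = np.sum(diff ** 2, axis=-1)      # squared Euclidean distances
    return 1.0 / (1.0 + a * sq_dist ** b)


def fuzzy_cross_entropy(V, W, eps=1e-12):
    """Full (unapproximated) cross entropy between memberships V and W.

    Sums v*log(v/w) + (1-v)*log((1-v)/(1-w)) over all pairs i != j.
    The first term attracts neighbors; the second repels non-neighbors.
    """
    off_diag = ~np.eye(V.shape[0], dtype=bool)   # skip self-pairs
    v = np.clip(V[off_diag], eps, 1.0 - eps)
    w = np.clip(W[off_diag], eps, 1.0 - eps)
    attract = v * np.log(v / w)
    repel = (1.0 - v) * np.log((1.0 - v) / (1.0 - w))
    return float(np.sum(attract + repel))


# Hypothetical usage: compare a target membership matrix with a candidate layout.
rng = np.random.default_rng(0)
V_target = low_dim_membership(rng.normal(size=(6, 2)))
W_candidate = low_dim_membership(rng.normal(size=(6, 2)))
loss = fuzzy_cross_entropy(V_target, W_candidate)
```

The loss is zero when the two membership matrices agree and positive otherwise, so it can serve as a reference value when checking how far a sampled approximation drifts from the full sum on small inputs.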