Search CORE

1 research outputs found

빠르고 정확한 차원 축소를 위한 UMAP 개선 방법

Author: 고형권
Publication venue: 서울대학교 대학원
Publication date: 01/08/2021
Field of study

학위논문(석사) -- 서울대학교대학원 : 공과대학 컴퓨터공학부, 2021.8. 고형권.One e ective way of understanding the characteristics of high-dimensional data is to embed it onto a low-dimensional space. Among many existing dimensionality reduction algorithms, Uniform Manifold Approximation and Projection (UMAP) has gained the most attention because of its fast and stable projection result. However, still it is too slow to be adopted for an interactive visual analytics system as it takes for a few minutes to embed even for a toy dataset (e.g., MNIST). Moreover, UMAP is vulnerable to di erent configurations of yperparameters, especially to the initialization methods and the number of epochs, which can bring about a serious bias mining insights from the embedding result. To achieve the responsiveness, we propose a progressive algorithm for UMAP, called Progressive UMAP, for the exploration of datasets by updating the embedding with a batch of points through a progressive computation. Next, to guarantee less biases and the robustness in the embedding, we present a novel dimensionality reduction algorithm called Uniform Manifold Approximation with Twophase Optimization (UMATO). We discover that the vulnerability comes from the approximation of cross-entropy loss function. UMATO, instead, takes a two-phase optimization approach: global optimization to obtain the overall skeleton of data, and local optimization to identify regional characteristics of a local area. In our experiment with one synthetic and three real-world datasets, UMATO outperformed widely-used baseline algorithms, such as PCA, t-SNE, UMAP, topological autoencoders and Anchor t-SNE, in terms of global quality metrics and 2D projection results. We further examine a case study of UMATO on real-world biological data and the extension to multi-phase optimization. Our work makes the original contributions to the field of dimensionality reduction, as well as the progressive visual analytics. Lastly, the thesis discusses the future research directions for improving the proposed algorithms.고차원 데이터의 특성을 파악하는 효과적인 방법 중 하나는 저차원 공간에 임베딩을 하는 것이다. 많은 차원 축소 알고리즘이 있지만, 균일 매니폴드 근사 및 투영법 (UMAP)은 빠른 속도와 안정적인 투영 결과로 인해 많은 주목을 받았다. 그러나 현재의 UMAP은 실험용 데이터 셋인 MNIST에도 수 분이 걸리는 등, 인터랙티브 시각적 분석 시스템에 도입되기에는 너무 느리다. 또한 UMAP은 하이퍼파라미터 설정이 (특히, 초기화 방법과 epoch 수) 달라지는 것에 취약한데, 이것은 임베딩 결과로 부터 통찰을 얻는 과정에서 큰 오류를 범할 수 있게 한다. UMAP의 즉각적인 반응성을 얻기 위해서, UMAP의 점진적인 알고리즘인 Progressive UMAP을 제안한다. 이로써 한 배치의 데이터를 추가할 때마다 임베딩 결과를 업데이트 하게되는 점진적인 계산이 가능해진다. 다음으로 적은 편향과 강건한 임베딩을 보장하기 위해 UMATO를 제안한다. 먼저 우리는 이러한 취약함이 최적화를 근사하는 과정에서 일어나는 것을 밝힌다. UMATO는, UMAP과 다르게, 두 단계에 걸친 최적화를 통해서 처음으로 전체적인 구조를 잡고, 그 다음 지역적 특성을 파악한다. 실험을 통해 UMATO가 PCA, t-SNE, UMAP, topological autoencoders 그리고 Anchort-SNE와 같은 기존 알고리즘에 비해 전체 구조 평가 지표와 2차원 임베딩 결과에서 더 나음을 보인다. 추가적으로 여러 단계로 최적화 하는 것과 임베딩의 안정성 역시 실험으로 파악한다. 이 연구는 차원 축소뿐만 아니라 점진적 시각화 분야에도 독창적인 공헌을 한다. 마지막으로 연구의 향후 연구 방향을 도모한다.CHAPTER 1 Introduction 1 1.1 Motivation 1 1.2 Research Questions and Approaches 2 1.2.1 Progressive Algorithm for UMAP 3 1.2.2 Less Biased and Robust Dimensionality Reduction Algorithm 4 1.3 Contributions 4 1.4 Thesis Overview 5 CHAPTER 2 Background: UMAP 6 2.1 Graph Construction 6 2.2 Layout Optimization 7 CHAPTER 3 Progressive UMAP: A Progressive Algorithm for UMAP 10 3.1 Introduction 10 3.2 Related Work 11 3.2.1 Progressive Visual Analytics 11 3.3 Progressive UMAP 13 3.3.1 Computing Ni 14 3.3.2 Computing ρi and σi 14 3.3.3 Layout Initialization 14 3.3.4 Layout Optimization 15 3.4 Evaluation and Discussion 15 3.5 Summary 18 CHAPTER 4 UMATO: A Less Biased and Robust Dimensionality Reduction Algorithm Based on UMAP 19 4.1 Introduction 19 4.2 Related Work 22 4.2.1 Dimensionality Reduction 22 4.2.2 Hubs, landmarks, and anchors 23 4.3 The Meaning of Using Di erent Loss Functions in Dimensionality Reduction 25 4.3.1 t-SNE 25 4.4 UMATO 27 4.4.1 Points Classification 28 4.4.2 Global Optimization 29 4.4.3 Local Optimization 30 4.4.4 Outliers Arrangement 32 4.5 Experiments 33 4.5.1 Quantitative and Qualitative Evaluation of UMATO Compared to Six Baseline Algorithms 33 4.5.2 Case Study: UMATO on Real-world Biological Data 39 4.6 Discussion 41 4.7 Summary 46 CHAPTER 5 Discussion 48 5.1 Lessons Learned 48 5.2 Limitations 49 CHAPTER 6 Conclusion 50 Abstract (Korean) 58석

SNU Open Repository and Archive