Improving UMAP for Fast and Accurate Dimensionality Reduction
Thesis (Master's) -- Seoul National University Graduate School : College of Engineering, Department of Computer Science and Engineering, 2021.8. Hyung-Kwon Ko.
One effective way of understanding the characteristics of high-dimensional data is to embed it in a low-dimensional space. Among the many existing dimensionality reduction algorithms, Uniform Manifold Approximation and Projection (UMAP) has gained the most attention because of its fast and stable projection results. However, it is still too slow to be adopted in an interactive visual analytics system, as it takes a few minutes to embed even a toy dataset (e.g., MNIST). Moreover, UMAP is vulnerable to different hyperparameter configurations, especially the initialization method and the number of epochs, which can introduce serious bias when mining insights from the embedding result. To achieve responsiveness, we propose a progressive algorithm for UMAP, called Progressive UMAP, which supports dataset exploration by updating the embedding with each new batch of points through progressive computation. Next, to guarantee less bias and more robustness in the embedding, we present a novel dimensionality reduction algorithm called Uniform Manifold Approximation with Two-phase Optimization (UMATO). We find that the vulnerability comes from the approximation of the cross-entropy loss function. UMATO instead takes a two-phase optimization approach: global optimization to obtain the overall skeleton of the data, and local optimization to identify the regional characteristics of local areas. In our experiments with one synthetic and three real-world datasets, UMATO outperformed widely used baseline algorithms, such as PCA, t-SNE, UMAP, topological autoencoders, and Anchor t-SNE, in terms of global quality metrics and 2D projection results. We further examine a case study of UMATO on real-world biological data and its extension to multi-phase optimization. Our work makes original contributions to the field of dimensionality reduction as well as to progressive visual analytics.
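The abstract traces UMAP's sensitivity to how its cross-entropy loss is approximated. For reference, the sketch below evaluates the exact (non-sampled) fuzzy-set cross-entropy that UMAP's objective is built on, comparing high-dimensional memberships p_ij with low-dimensional similarities q_ij = 1 / (1 + a·d^(2b)). The function name is hypothetical, and the defaults a = b = 1 are an illustrative assumption (UMAP fits a and b from its min_dist setting); this is not the thesis's code or the umap-learn API.

```python
import numpy as np

def fuzzy_cross_entropy(P, Y, a=1.0, b=1.0, eps=1e-12):
    """Exact fuzzy-set cross-entropy between a symmetric membership matrix P
    (n x n, entries in [0, 1]) and an embedding Y (n x d).

    Low-dimensional similarity: q_ij = 1 / (1 + a * ||y_i - y_j||^(2b)).
    Loss per pair: p*log(p/q) + (1-p)*log((1-p)/(1-q)).
    a = b = 1 is an illustrative default (assumption); UMAP fits a, b from min_dist.
    """
    Y = np.asarray(Y, dtype=float)
    n = len(Y)
    i, j = np.triu_indices(n, k=1)            # each undirected pair counted once
    dist = np.linalg.norm(Y[i] - Y[j], axis=1)
    q = 1.0 / (1.0 + a * dist ** (2.0 * b))
    q = np.clip(q, eps, 1.0 - eps)            # keep both logarithms finite
    p = np.asarray(P, dtype=float)[i, j]
    # Attractive term: penalizes neighbors (p > 0) placed far apart (small q).
    attract = p * np.log(np.maximum(p, eps) / q)
    # Repulsive term: penalizes non-neighbors (p < 1) placed close together (q near 1).
    repel = (1.0 - p) * np.log(np.maximum(1.0 - p, eps) / (1.0 - q))
    return float((attract + repel).sum())
```

In practice UMAP does not evaluate this sum over all pairs: its stochastic gradient descent samples edges in proportion to their membership strength and approximates the repulsive term with negative sampling, so the objective it actually optimizes departs from this exact expression.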
Lastly, the thesis discusses future research directions for improving the proposed algorithms.
One effective way to understand the characteristics of high-dimensional data is to embed it in a low-dimensional space. Although many dimensionality reduction algorithms exist, Uniform Manifold Approximation and Projection (UMAP) has received much attention for its speed and stable projection results. However, UMAP is currently too slow to be adopted in interactive visual analytics systems; for example, it takes several minutes even on MNIST, a toy dataset. UMAP is also vulnerable to changes in hyperparameter settings (in particular, the initialization method and the number of epochs), which can lead to serious errors when drawing insights from the embedding result. To give UMAP immediate responsiveness, we propose Progressive UMAP, a progressive algorithm for UMAP. It enables progressive computation that updates the embedding result each time a batch of data is added. Next, to guarantee less bias and a robust embedding, we propose UMATO. We first show that this vulnerability arises in the process of approximating the optimization. Unlike UMAP, UMATO uses a two-phase optimization that first captures the overall structure and then identifies local characteristics. Experiments show that UMATO outperforms existing algorithms such as PCA, t-SNE, UMAP, topological autoencoders, and Anchor t-SNE in global structure metrics and 2D embedding results. We additionally examine multi-phase optimization and embedding stability experimentally. This work makes original contributions not only to dimensionality reduction but also to progressive visualization. Finally, we outline future research directions.
CHAPTER 1 Introduction
1.1 Motivation
1.2 Research Questions and Approaches
1.2.1 Progressive Algorithm for UMAP
1.2.2 Less Biased and Robust Dimensionality Reduction Algorithm
1.3 Contributions
1.4 Thesis Overview
CHAPTER 2 Background: UMAP
2.1 Graph Construction
2.2 Layout Optimization
CHAPTER 3 Progressive UMAP: A Progressive Algorithm for UMAP
3.1 Introduction
3.2 Related Work
3.2.1 Progressive Visual Analytics
3.3 Progressive UMAP
3.3.1 Computing Ni
3.3.2 Computing ρi and σi
3.3.3 Layout Initialization
3.3.4 Layout Optimization
3.4 Evaluation and Discussion
3.5 Summary
CHAPTER 4 UMATO: A Less Biased and Robust Dimensionality Reduction Algorithm Based on UMAP
4.1 Introduction
4.2 Related Work
4.2.1 Dimensionality Reduction
4.2.2 Hubs, landmarks, and anchors
4.3 The Meaning of Using Different Loss Functions in Dimensionality Reduction
4.3.1 t-SNE
4.4 UMATO
4.4.1 Points Classification
4.4.2 Global Optimization
4.4.3 Local Optimization
4.4.4 Outliers Arrangement
4.5 Experiments
4.5.1 Quantitative and Qualitative Evaluation of UMATO Compared to Six Baseline Algorithms
4.5.2 Case Study: UMATO on Real-world Biological Data
4.6 Discussion
4.7 Summary
CHAPTER 5 Discussion
5.1 Lessons Learned
5.2 Limitations
CHAPTER 6 Conclusion
Abstract (Korean)
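Section 3.3.2 in the outline above lists computing ρi and σi, the per-point quantities UMAP uses when building its neighborhood graph. The sketch below follows the formulation in the original UMAP paper: ρi is the distance to the nearest neighbor, and σi is found by binary search so that Σj exp(−max(0, dij − ρi)/σi) = log2 k. The function name `rho_sigma` and the iteration and tolerance defaults are assumptions for illustration; umap-learn's implementation handles additional edge cases not shown here, and the thesis may compute these quantities differently in its progressive setting.

```python
import numpy as np

def rho_sigma(knn_dists, n_iter=64, tol=1e-5):
    """Per-point rho_i and sigma_i, following the original UMAP paper.

    knn_dists: (n, k) array of distances from each point to its k nearest
    neighbors, excluding the point itself.
    rho_i:   smallest positive neighbor distance.
    sigma_i: found by binary search so that
             sum_j exp(-max(0, d_ij - rho_i) / sigma_i) == log2(k).
    """
    knn_dists = np.asarray(knn_dists, dtype=float)
    n, k = knn_dists.shape
    target = np.log2(k)
    rho = np.zeros(n)
    sigma = np.zeros(n)
    for i in range(n):
        positive = knn_dists[i][knn_dists[i] > 0]
        rho[i] = positive.min() if positive.size else 0.0
        shifted = np.maximum(knn_dists[i] - rho[i], 0.0)
        lo, hi, mid = 0.0, np.inf, 1.0
        for _ in range(n_iter):
            total = np.exp(-shifted / mid).sum()
            if abs(total - target) < tol:
                break
            if total > target:
                # The sum grows with sigma, so a sum that is too large means sigma is too big.
                hi = mid
                mid = (lo + hi) / 2.0
            else:
                # Sum too small: grow sigma (double until an upper bound is known).
                lo = mid
                mid = mid * 2.0 if np.isinf(hi) else (lo + hi) / 2.0
        sigma[i] = mid
    return rho, sigma
```

The resulting directed memberships wij = exp(−max(0, dij − ρi)/σi) are what UMAP then symmetrizes with the fuzzy union wij + wji − wij·wji to obtain its graph.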
Recommended from our members
Exploration and visualization of design spaces with applications to negative stiffness metamaterials
Engineering design problems are commonly hierarchical and multilevel, which requires coordination between models at each scale. If the models are computationally expensive or highly nonlinear, as in many materials design applications, identifying an optimal design may be exceptionally difficult. Alternatives to optimization-based methods include set-based methods that classify and track sets or ensembles of high-performance designs. By relaxing the requirement for an optimal design, it is often possible to identify promising, high-performance regions of the design space efficiently. Bayesian network classifiers (BNCs) are one such approach: they can identify these regions of promising designs in the presence of nonlinear relationships and mixed variables. When the promising designs identified by the BNC approach are manufactured, the physical embodiment may not match the intended design because of manufacturing variations. These variations may alter the performance of the design, leading to unsatisfactory results and products. To facilitate the selection of designs that are not only high-performing but also reliably manufacturable, a method for incorporating manufacturing variation, modeled as a joint probability distribution, is presented for the BNC approach. The approach uses a dual classification strategy that identifies design regions that are likely to perform well within statistical confidence. These design regions can be high-dimensional, which makes it very difficult to identify and visualize clusters of promising designs and leads to a lack of understanding of the design space. To enhance the designer's knowledge of the design space, this work presents a method, based on spectral clustering, that can identify high-performance regions in a high-dimensional space. Furthermore, it presents a method for visualizing each individual design region using t-Distributed Stochastic Neighbor Embedding.
Through these three tasks (incorporating manufacturing variation, clustering, and visualizing), a novel design methodology will be developed and then applied to identify satisfactory designs for a negative stiffness metamaterials design problem; these designs will be manufactured and tested.
Mechanical Engineering
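The abstract above mentions using spectral clustering to identify high-performance regions in a high-dimensional design space. As a generic illustration only, the sketch below implements the standard normalized spectral clustering recipe: Gaussian affinities, symmetric normalization, the top eigenvectors, and then k-means on the row-normalized eigenvectors. The RBF width `gamma`, the farthest-point initialization for k-means, and the function name are assumptions chosen for this example; they are not taken from the work described above.

```python
import numpy as np

def spectral_clustering(X, n_clusters, gamma=1.0, n_iter=100):
    """Normalized spectral clustering (Ng-Jordan-Weiss style) on the rows of X."""
    X = np.asarray(X, dtype=float)
    # 1. Gaussian (RBF) affinities between all pairs of points; no self-loops.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-gamma * sq)
    np.fill_diagonal(W, 0.0)
    # 2. Symmetric normalization D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(W.sum(axis=1), 1e-12))
    M = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # 3. Eigenvectors for the n_clusters largest eigenvalues (eigh sorts ascending),
    #    then normalize each row to unit length.
    _, vecs = np.linalg.eigh(M)
    U = vecs[:, -n_clusters:]
    U = U / np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    # 4. k-means on the rows of U, seeded with farthest-point initialization.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new_centers = np.array([
            U[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

Here each row of X would be one candidate design; the returned labels group designs into candidate regions, each of which could then be visualized separately, for example with t-SNE as the abstract describes.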
Projection-Based Clustering through Self-Organization and Swarm Intelligence
It covers aspects of unsupervised machine learning used for knowledge discovery in data science and introduces a data-driven approach to cluster analysis, the Databionic swarm (DBS). DBS consists of the 3D landscape visualization and clustering of data. The 3D landscape enables 3D printing of high-dimensional data structures. The clustering and the number of clusters, or the absence of a cluster structure, can be verified at a glance from the 3D landscape. DBS is the first swarm-based technique that shows emergent properties while exploiting concepts of swarm intelligence, self-organization, and the Nash equilibrium concept from game theory. As a result, it eliminates the need for a global objective function and for setting parameters. Once the R package is downloaded, DBS can be applied to data drawn from diverse research fields and used even by non-professionals in the field of data mining.
Projection-Based Clustering through Self-Organization and Swarm Intelligence: Combining Cluster Analysis with the Visualization of High-Dimensional Data
Cluster Analysis; Dimensionality Reduction; Swarm Intelligence; Visualization; Unsupervised Machine Learning; Data Science; Knowledge Discovery; 3D Printing; Self-Organization; Emergence; Game Theory; Advanced Analytics; High-Dimensional Data; Multivariate Data; Analysis of Structured Data
The World We Want to Live In
Digitalisation, digital networks, and artificial intelligence are fundamentally changing our lives! We must understand the various developments and assess how they interact and how they affect our regular, analogue lives. What are the consequences of such changes for me personally and for our society? Digital networks and artificial intelligence are seminal innovations that are going to permeate all areas of society and trigger a comprehensive, disruptive structural change that will evoke numerous new advances in research and development in the coming years. Even though there are numerous books on this subject matter, most of them cover only specific aspects of the profound and multifaceted effects of the digital transformation. An overarching assessment is missing. In 2016, the Federation of German Scientists (VDW) founded a study group to assess the technological impacts of digitalisation holistically. Now we present this compendium to you. We address the interrelations and feedbacks of digital innovation on policy, law, economics, science, and society from various scientific perspectives. Please consider this book an invitation to reflect, with others and with us, on what kind of world we want to live in.
Optimization and Energy Maximizing Control Systems for Wave Energy Converters
The book "Optimization and Energy Maximizing Control Systems for Wave Energy Converters" presents eleven contributions on the latest scientific advancements of 2020-2021 in wave energy technology optimization and control, including holistic techno-economic optimization, the inclusion of nonlinear effects, and real-time implementations of estimation and control algorithms.