

Double Data Piling Phenomenon in Heterogeneous Covariance Models

Author: 김태현
Publication venue: Graduate School, Seoul National University
Publication date: 01/08/2023 (August 2023)

Thesis (Master's) -- Graduate School, Seoul National University: Department of Statistics, College of Natural Sciences, August 2023. Advisor: 정성규.

In this work, we characterize two data piling phenomena for a high-dimensional binary classification problem with heterogeneous covariance models. Data piling refers to the phenomenon where the projections of the training data onto a direction vector take exactly two distinct values, one for each class. The first data piling phenomenon occurs for any data when the dimension p is larger than the sample size n. We show that the second data piling phenomenon, which refers to the data piling of independent test data, can occur in an asymptotic regime where p grows while n is fixed. We further show that a second maximal data piling direction, which gives an asymptotically maximal distance between the two piles of independent test data, can be obtained by projecting the first maximal data piling direction onto the orthogonal complement of the common leading eigenspace. Based on the second data piling phenomenon, we propose novel linear classification rules that ensure perfect classification of high-dimension, low-sample-size data under generalized heterogeneous spiked covariance models.
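
The abstract describes the two directions only in words. The NumPy sketch below illustrates them under stated assumptions. It takes the first maximal data piling direction to be the class-mean difference projected onto the orthogonal complement of the column space of the within-class scatter matrix (a standard construction assumed here, not quoted from the thesis). It also treats the common leading eigenspace V as known, whereas Chapter 4 of the thesis concerns estimating it. The simulated one-component spiked model, the helper names maximal_data_piling_direction, second_direction and sample, and all parameter values are hypothetical.

```python
import numpy as np


def maximal_data_piling_direction(X1, X2):
    """Unit vector along the class-mean difference projected onto the orthogonal
    complement of the within-class scatter's column space (assumed construction).

    X1, X2: arrays of shape (n_k, p), one row per observation.
    """
    # Columns of C are the within-class centered observations; they span the
    # column space of the within-class scatter matrix S_W = C @ C.T.
    C = np.hstack([(X1 - X1.mean(axis=0)).T, (X2 - X2.mean(axis=0)).T])
    # Orthonormal basis Q of that column space from a thin SVD, dropping
    # numerically zero singular values (the rank is at most n - 2).
    U, s, _ = np.linalg.svd(C, full_matrices=False)
    Q = U[:, s > s.max() * 1e-10]
    d = X1.mean(axis=0) - X2.mean(axis=0)
    w = d - Q @ (Q.T @ d)  # remove the part of d lying in col(S_W)
    return w / np.linalg.norm(w)


def second_direction(w1, V):
    """Project w1 onto the orthogonal complement of span(V) and renormalize.

    V: (p, m) with orthonormal columns; assumed known here (hypothetical input).
    """
    w2 = w1 - V @ (V.T @ w1)
    return w2 / np.linalg.norm(w2)


# Hypothetical one-component spiked model: common leading eigenvector u,
# class-specific spike variances, isotropic unit noise, p much larger than n.
rng = np.random.default_rng(0)
p, n1, n2 = 500, 10, 12
u = np.ones(p) / np.sqrt(p)
mu1 = np.zeros(p)
mu2 = 0.2 * np.where(np.arange(p) % 2 == 0, 1.0, -1.0)  # orthogonal to u


def sample(mu, n, spike_var):
    # mean + (random spike along u) + isotropic noise, one row per observation
    z = rng.standard_normal(n) * np.sqrt(spike_var)
    return mu + np.outer(z, u) + rng.standard_normal((n, p))


X1 = sample(mu1, n1, spike_var=p)
X2 = sample(mu2, n2, spike_var=2 * p)

w1 = maximal_data_piling_direction(X1, X2)
w2 = second_direction(w1, u[:, None])

# Training projections on w1: one value per class (first data piling).
print(np.round(X1 @ w1, 6))
print(np.round(X2 @ w1, 6))
print(abs(u @ w2))  # w2 is orthogonal to the leading eigenvector u
```

The training projections onto w1 collapse to a single value per class, which is the first data piling phenomenon, and w2 is orthogonal to the assumed leading eigenvector u. Whether independent test data also pile on w2 is an asymptotic statement (p growing, n fixed) that this finite example does not attempt to verify.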

Contents:
Chapter 1 Introduction
Chapter 2 Heterogeneous Covariance Models
Chapter 3 Data Piling of Independent Test Data
  3.1 One-component Covariance Model
  3.2 Main Theorem
Chapter 4 Estimation of Second Maximal Data Piling Direction
Chapter 5 Simulation
Chapter 6 Discussion
Appendix A Asymptotic Properties of High-dimensional Sample Within-scatter Matrix
  A.1 Proof of Lemma 3
  A.2 Proof of Lemma 4
Appendix B Technical Details of Main Results
  B.1 Proof of Theorem 5
  B.2 Proof of Theorem 6
  B.3 Proof of Theorem 7
  B.4 Proof of Theorem 8
  B.5 Proof of Theorem 9
Korean Abstract

SNU Open Repository and Archive