
    이질적 곡뢄산 λͺ¨ν˜•μ—μ„œμ˜ 이쀑 데이터 파일링 ν˜„μƒ

    Thesis (Master's) -- Seoul National University Graduate School: College of Natural Sciences, Department of Statistics, August 2023. 정성규.

    In this work, we characterize two data piling phenomena for a high-dimensional binary classification problem with heterogeneous covariance models. Data piling refers to the phenomenon where projections of the training data onto a direction vector take exactly two distinct values, one for each class. The first data piling phenomenon occurs for any data when the dimension p is larger than the sample size n. We show that the second data piling phenomenon, which refers to data piling of independent test data, can occur in an asymptotic context where p grows while n is fixed. We further show that a second maximal data piling direction, which gives an asymptotic maximal distance between the two piles of independent test data, can be obtained by projecting the first maximal data piling direction onto the nullspace of the common leading eigenspace. Based on the second data piling phenomenon, we propose novel linear classification rules that ensure perfect classification of high-dimension, low-sample-size data under generalized heterogeneous spiked covariance models.

    Table of contents:
    Chapter 1 Introduction
    Chapter 2 Heterogeneous Covariance Models
    Chapter 3 Data Piling of Independent Test Data
        3.1 One-component Covariance Model
        3.2 Main Theorem
    Chapter 4 Estimation of Second Maximal Data Piling Direction
    Chapter 5 Simulation
    Chapter 6 Discussion
    Appendix A Asymptotic Properties of High-dimensional Sample Within-scatter Matrix
        A.1 Proof of Lemma 3
        A.2 Proof of Lemma 4
    Appendix B Technical Details of Main Results
        B.1 Proof of Theorem 5
        B.2 Proof of Theorem 6
        B.3 Proof of Theorem 7
        B.4 Proof of Theorem 8
        B.5 Proof of Theorem 9
    Abstract in Korean