A Study on the Asymptotic Properties of Machine Learning Algorithms
Thesis (Ph.D.) -- Seoul National University Graduate School : Department of Statistics, College of Natural Sciences, 2020. 2. Yongdai Kim.

In this thesis, we study the asymptotic properties of three machine learning algorithms: two supervised learning algorithms with deep neural networks and a Bayesian learning method for high-dimensional factor models.
The first research problem concerns deep neural network (DNN) classifiers. We derive fast convergence rates for a DNN classifier learned with the hinge loss. We consider several cases for the true probability model and show that the DNN classifier achieves fast convergence rates in every case, provided its architecture is carefully selected.
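As an illustration of the learning problem (a minimal sketch, not the estimator analyzed in the thesis), training a small ReLU network under the hinge loss on synthetic binary data might look as follows; the architecture, data-generating model, and step size are illustrative choices:

```python
# Minimal sketch: subgradient descent on the empirical hinge risk
#   R(f) = (1/n) sum_i max(0, 1 - y_i f(x_i)),  y_i in {-1, +1},
# for a one-hidden-layer ReLU network, in plain NumPy.
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data: the label is the sign of a noisy linear signal.
n, d = 200, 2
X = rng.normal(size=(n, d))
y = np.where(X[:, 0] + 0.1 * rng.normal(size=n) > 0, 1.0, -1.0)

# Network f(x) = w2 . relu(W1 x + b1) + b2.
h = 16
W1 = rng.normal(scale=0.5, size=(h, d)); b1 = np.zeros(h)
w2 = rng.normal(scale=0.5, size=h);      b2 = 0.0

def forward(X):
    Z = X @ W1.T + b1          # pre-activations, shape (n, h)
    A = np.maximum(Z, 0.0)     # ReLU
    return Z, A, A @ w2 + b2   # scores f(x), shape (n,)

def hinge_risk(f, y):
    return np.mean(np.maximum(0.0, 1.0 - y * f))

lr = 0.1
for _ in range(1000):
    Z, A, f = forward(X)
    # Subgradient of the empirical hinge risk w.r.t. the scores:
    # -y/n on margin-violating points (y * f < 1), zero elsewhere.
    g = -(y * (y * f < 1.0)) / n
    gw2 = A.T @ g; gb2 = g.sum()
    gA = np.outer(g, w2) * (Z > 0.0)   # backprop through ReLU
    gW1 = gA.T @ X; gb1 = gA.sum(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    w2 -= lr * gw2; b2 -= lr * gb2

f = forward(X)[2]
accuracy = np.mean(np.sign(f) == y)
print(f"final hinge risk {hinge_risk(f, y):.3f}, "
      f"training accuracy {accuracy:.2f}")
```

The hinge loss drives scores past the margin y f(x) >= 1 rather than toward calibrated probabilities, which is the property exploited in the fast-rate analysis.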
The second research topic is learning sparse DNNs. We propose a sparse learning algorithm that minimizes the penalized empirical risk with a novel sparsity-inducing penalty. We establish an oracle inequality for the excess risk of the proposed sparse DNN estimator and derive convergence rates for several learning tasks. In particular, the proposed sparse DNN estimator adaptively attains minimax-optimal convergence rates for nonparametric regression problems.
The third part of the thesis is devoted to Bayesian nonparametric learning for high-dimensional factor models. We propose a prior distribution based on the two-parameter Indian buffet process, which is computationally tractable. We prove that the resulting posterior distribution concentrates on the true factor dimensionality and contracts to the true covariance matrix at a near-optimal rate.
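The two-parameter Indian buffet process admits a simple sequential (buffet) sampling scheme. The following sketch simulates it under the standard parameterization (an assumption of this sketch, not the thesis's exact prior construction):

```python
# Minimal sketch: sampling a feature-allocation matrix from the
# two-parameter Indian buffet process IBP(alpha, beta). Customer n takes
# an existing dish k with probability m_k / (beta + n - 1), where m_k is
# the number of earlier customers who took dish k, then tries
# Poisson(alpha * beta / (beta + n - 1)) new dishes. With beta = 1 this
# reduces to the one-parameter IBP.
import numpy as np

def sample_ibp(n_customers, alpha=2.0, beta=1.0, rng=None):
    """Return a binary matrix Z of shape (customers, dishes)."""
    rng = np.random.default_rng(rng)
    counts = []   # m_k for each dish introduced so far
    rows = []
    for n in range(1, n_customers + 1):
        row = [rng.random() < m / (beta + n - 1) for m in counts]
        counts = [m + int(z) for m, z in zip(counts, row)]
        k_new = rng.poisson(alpha * beta / (beta + n - 1))
        counts.extend([1] * k_new)
        row.extend([True] * k_new)
        rows.append(row)
    Z = np.zeros((n_customers, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(50, alpha=2.0, beta=1.0, rng=0)
# The number of sampled dishes (columns of Z) plays the role of the
# latent dimensionality; with beta = 1 it grows like alpha * log(n).
print("customers:", Z.shape[0], "active features:", Z.shape[1])
```

In the factor-model setting, the columns of Z index latent factors, so a prior of this type puts mass on every finite factor dimensionality while favoring small ones.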
Introduction 1
0.1 Motivation 1
0.2 Outline and contributions 2
1 Fast convergence rates of deep neural networks for classification 5
1.1 Introduction 5
1.1.1 Notation 7
1.2 Estimation of the classifier with DNNs 8
1.2.1 About the hinge loss 8
1.2.2 Learning DNN with the hinge loss 10
1.3 Fast convergence rates of DNN classifiers with the hinge loss 12
1.3.1 Case 1: Smooth conditional class probability 12
1.3.2 Case 2: Smooth boundary 13
1.3.3 Case 3: Margin condition 17
1.4 Adaptive estimation 18
1.5 Use of the logistic loss 21
1.6 Concluding remarks 24
1.7 Proofs 25
1.7.1 Complexity of a class of DNNs 25
1.7.2 Convergence rate of the excess surrogate risk for general surrogate losses 26
1.7.3 Generic convergence rate for the hinge loss 31
1.7.4 Proof of Theorem 1.3.1 33
1.7.5 Proof of Theorem 1.3.2 35
1.7.6 Proof of Theorem 1.3.3 37
1.7.7 Proof of Theorem 1.3.4 40
1.7.8 Proof of Theorem 1.4.1 42
1.7.9 Proof of Theorem 1.5.1 47
1.7.10 Proof of Proposition 1.7.9 50
2 Rate-optimal sparse learning for deep neural networks 53
2.1 Introduction 53
2.1.1 Notation 54
2.1.2 Deep neural networks 55
2.1.3 Empirical risk minimization algorithm with a sparsity constraint and its nonadaptiveness 56
2.1.4 Outline 57
2.2 Learning sparse deep neural networks with the clipped L1 penalty 57
2.3 Main results 59
2.3.1 Nonparametric regression 59
2.3.2 Classification with strictly convex losses 65
2.4 Implementation 67
2.5 Numerical studies 69
2.5.1 Regression with simulated data 69
2.5.2 Classification with real data 71
2.6 Conclusion 73
2.7 Proofs 74
2.7.1 Covering numbers of classes of DNNs 74
2.7.2 Proofs of Theorem 2.3.1 and Theorem 2.3.3 77
2.7.3 Proofs of Theorem 2.3.2 and Theorem 2.3.4 84
3 Posterior consistency of the factor dimensionality in high-dimensional sparse factor models 87
3.1 Introduction 87
3.1.1 Notation 89
3.2 Assumptions and prior distribution 90
3.2.1 Assumptions 90
3.2.2 Prior distribution and its properties 92
3.2.2.1 Induced distribution of the factor dimensionality 93
3.2.2.2 Induced distribution of the sparsity 94
3.2.2.3 Prior concentration near the true loading matrix 94
3.3 Asymptotic properties of the posterior distribution 96
3.3.1 Posterior contraction rate for covariance matrix 96
3.3.2 Posterior consistency of the factor dimensionality 97
3.4 Numerical results 98
3.4.1 MCMC algorithm 99
3.4.2 Simulation study 101
3.5 Discussions about adaptive priors 103
3.6 Concluding remarks 105
3.7 Proofs 106
3.7.1 Proofs of lemmas and corollary in Section 3.2 106
3.7.2 Proofs of theorems in Section 3.3 112
3.7.3 Proof of Theorem 3.5.1 118
3.7.4 Auxiliary lemmas 121
Appendix A Smooth function approximation by deep neural networks with general activation functions 129
A.1 Introduction 129
A.1.1 Notation 130
A.2 Deep neural networks 131
A.3 Classes of activation functions 132
A.3.1 Piecewise linear activation functions 132
A.3.2 Locally quadratic activation functions 133
A.4 Approximation of Hölder smooth functions by deep neural networks 135
A.5 Application to statistical learning theory 139
A.5.1 Application to regression 141
A.5.2 Application to binary classification 142
A.6 Proofs 144
A.6.1 Proof of Theorem A.4.1 for piecewise linear activation functions 144
A.6.2 Proof of Theorem A.4.1 for locally quadratic activation functions 146
A.6.3 Proof of Proposition A.5.1 154
A.6.4 Proof of Theorem A.5.2 155
A.6.5 Proof of Theorem A.5.3 157
Appendix B Poisson mixture of finite feature models 159
B.1 Overview 159
B.1.1 Equivalence classes 160
B.1.2 Notation 161
B.2 Equivalent representations 161
B.2.1 Urn schemes 161
B.2.2 Hierarchical representation 163
B.3 Application to sparse Bayesian factor models 165
B.3.1 Model and prior 165
B.3.2 Assumptions on the true distribution 166
B.3.3 Preliminary results 167
B.3.4 Asymptotic properties 169
B.4 Proofs 170
B.4.1 Proofs of results in Section B.2 170
B.4.2 Proofs of results in Section B.3.3 174
B.4.3 Proof of Theorem B.3.5 177
Bibliography 181