
    B-μŠ€ν”ŒλΌμΈ κ³Όμ™„λΉ„ 체계λ₯Ό μ΄μš©ν•œ λΉ„λͺ¨μˆ˜ 베이즈 νšŒκ·€ λͺ¨ν˜• 연ꡬ

    Doctoral dissertation, Seoul National University, Department of Statistics, College of Natural Sciences, August 2021. Advisor: Jaeyong Lee. In this dissertation, we propose the LΓ©vy Adaptive B-Spline regression (LABS) model, an extension of the LARK model, to estimate functions with varying degrees of smoothness. The LABS model is a LARK model with B-spline bases as generating kernels. By changing the degree of the B-spline basis, LABS adapts systematically to the smoothness of functions, including jump discontinuities and sharp peaks. Simulation studies and real data examples show that the model captures not only smooth regions but also jumps and sharp peaks of functions, and it attains the best performance in almost all examples.
    We also provide theoretical results showing that the mean function of the LABS model belongs to specific Besov spaces determined by the degree of the B-spline basis, and that the prior of the model has full support on those Besov spaces. Furthermore, we develop a multivariate version of the LABS model, named Multivariate LΓ©vy Adaptive B-Spline regression (MLABS), by introducing tensor products of B-spline bases. The MLABS model has comparable performance on both regression and classification problems. In particular, empirical results demonstrate that MLABS has more stable and accurate predictive ability than state-of-the-art nonparametric regression models on relatively low-dimensional data.
    Contents:
    1 Introduction
      1.1 Nonparametric regression model
      1.2 Literature review
        1.2.1 Literature review of nonparametric function estimation
        1.2.2 Literature review of multivariate nonparametric regression
      1.3 Outline
    2 Bayesian nonparametric function estimation using overcomplete systems with B-spline bases
      2.1 Introduction
      2.2 LΓ©vy adaptive regression kernels
      2.3 LΓ©vy adaptive B-spline regression
        2.3.1 B-spline basis
        2.3.2 Model specification
        2.3.3 Support of LABS model
      2.4 Algorithm
      2.5 Simulation studies
        2.5.1 Simulation 1: DJ test functions
        2.5.2 Simulation 2: Smooth functions with jumps and peaks
      2.6 Real data applications
        2.6.1 Example 1: Minimum legal drinking age
        2.6.2 Example 2: Bitcoin prices on Bitstamp
        2.6.3 Example 3: Fine particulate matter in Seoul
      2.7 Discussion
    3 Bayesian multivariate nonparametric regression using overcomplete systems with tensor products of B-spline bases
      3.1 Introduction
      3.2 Multivariate LΓ©vy adaptive B-spline regression
        3.2.1 Model specifications
        3.2.2 Comparisons between basis functions of MLABS and MARS
        3.2.3 Posterior inference
        3.2.4 Binomial regressions for MLABS
      3.3 Simulation studies
        3.3.1 Surface examples
        3.3.2 Friedman's examples
      3.4 Real data applications
        3.4.1 Regression examples
        3.4.2 Classification examples
      3.5 Discussion
    4 Concluding Remarks
    A Appendix
      A.1 Appendix for Chapter 2
        A.1.1 Proof of Theorem 2.3.1
        A.1.2 Proof of Theorem 2.3.2
        A.1.3 Proof of Theorem 2.3.3
        A.1.4 Full simulation results for Simulation 1
        A.1.5 Derivation of the full conditionals for LABS
    Bibliography
    Abstract in Korean
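The key building block above is the family of B-spline bases of varying degree: degree-0 bases are piecewise constant (so they can represent jumps), while higher degrees are increasingly smooth. A minimal sketch of just this ingredient, not the LABS model itself, using SciPy's `BSpline.design_matrix` (SciPy β‰₯ 1.8; the helper name `bspline_design` is illustrative):

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_design(x, n_inner, degree):
    """Design matrix of B-spline basis functions of a given degree on [0, 1]."""
    inner = np.linspace(0.0, 1.0, n_inner)
    # Open (clamped) knot vector: repeat each boundary knot `degree` extra times.
    knots = np.concatenate([np.zeros(degree), inner, np.ones(degree)])
    return BSpline.design_matrix(x, knots, degree).toarray()

x = np.linspace(0.0, 1.0, 200, endpoint=False)
for degree in (0, 1, 3):
    B = bspline_design(x, 6, degree)
    # Degree 0: piecewise-constant bases (can encode jump discontinuities).
    # Degree 3: C^2-smooth cubic bases. In all cases the bases are
    # nonnegative and form a partition of unity (each row sums to 1).
    print(degree, B.shape)
```

Mixing degrees within one expansion is what lets a model of this kind adapt to jumps, peaks, and smooth stretches simultaneously.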

    Bayesian nonparametric multivariate convex regression

    Full text link
    In many applications, such as economics, operations research and reinforcement learning, one often needs to estimate a multivariate regression function f subject to a convexity constraint. For example, in sequential decision processes the value of a state under optimal subsequent decisions may be known to be convex or concave. We propose a new Bayesian nonparametric multivariate approach based on characterizing the unknown regression function as the max of a random collection of unknown hyperplanes. This specification induces a prior with large support in a Kullback-Leibler sense on the space of convex functions, while also leading to strong posterior consistency. Although we assume that f is defined over R^p, we show that this model has a convergence rate of log(n)^{-1} n^{-1/(d+2)} under the empirical L2 norm when f actually maps a d-dimensional linear subspace to R. We design an efficient reversible jump MCMC algorithm for posterior computation and demonstrate the methods through application to value function approximation.
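The representation at the heart of this abstract is easy to state concretely: a pointwise maximum of affine functions is always convex. A minimal NumPy sketch of evaluating such a "max of hyperplanes" function (the name `max_affine` and the random coefficients are illustrative; this is the function class, not the paper's reversible jump sampler):

```python
import numpy as np

def max_affine(X, slopes, intercepts):
    """Evaluate f(x) = max_k (slopes[k] . x + intercepts[k]).

    A pointwise max of affine functions is convex, which is what makes
    this class a natural support for a prior on convex functions.
    """
    return np.max(X @ slopes.T + intercepts, axis=-1)

rng = np.random.default_rng(0)
K, p = 6, 3                          # number of hyperplanes, input dimension
slopes = rng.normal(size=(K, p))
intercepts = rng.normal(size=K)

# Spot-check convexity: f(t*u + (1-t)*v) <= t*f(u) + (1-t)*f(v).
u = rng.normal(size=(1, p))
v = rng.normal(size=(1, p))
t = 0.3
lhs = max_affine(t * u + (1 - t) * v, slopes, intercepts)[0]
rhs = t * max_affine(u, slopes, intercepts)[0] + (1 - t) * max_affine(v, slopes, intercepts)[0]
assert lhs <= rhs + 1e-12
```

In the Bayesian treatment, the number of hyperplanes K and their coefficients are random, which is why posterior computation calls for a reversible jump sampler.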

    Biometrika

    Get PDF
    We consider shape-restricted nonparametric regression on a closed set [Formula: see text], where it is reasonable to assume the function has no more than a fixed number of local extrema interior to [Formula: see text]. Following a Bayesian approach, we develop a nonparametric prior over a novel class of local extremum splines. This approach is shown to be consistent when modeling any continuously differentiable function within the class considered, and it is used to develop methods for testing hypotheses on the shape of the curve. Sampling algorithms are developed, and the method is applied in simulation studies and data examples where the shape of the curve is of interest.
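The shape constraint here is a bound on the number of interior local extrema. On a grid, that count is just the number of sign changes in the successive differences, which gives a simple way to check whether a sampled curve satisfies the constraint. A sketch of that check only (the function name is illustrative; this is not the paper's spline construction):

```python
import numpy as np

def count_interior_extrema(y):
    """Count interior local extrema of a curve sampled on a grid.

    Each interior extremum appears as a sign change in the successive
    differences; exactly-flat steps are ignored.
    """
    s = np.sign(np.diff(y))
    s = s[s != 0]                      # drop flat stretches
    return int(np.sum(s[1:] != s[:-1]))

x = np.linspace(0.0, 1.0, 401)
print(count_interior_extrema(np.sin(2 * np.pi * x)))  # one peak and one trough
print(count_interior_extrema(x ** 2))                 # monotone: none
```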

    Bayesian methods in bioinformatics

    This work is directed towards developing flexible Bayesian statistical methods in the semi- and nonparametric regression modeling framework, with special focus on analyzing data from biological and genetic experiments. This dissertation attempts to solve two such problems in this area. In the first part, we study penalized regression splines (P-splines), which are low-order basis splines with a penalty to avoid undersmoothing. Such P-splines are typically not spatially adaptive, and hence can have trouble when functions vary rapidly. We model the penalty parameter inherent in the P-spline method as a heteroscedastic regression function. We develop a full Bayesian hierarchical structure to do this and use Markov chain Monte Carlo techniques to draw random samples from the posterior for inference. We show that the approach achieves very competitive performance compared to other methods. The second part focuses on modeling DNA microarray data. Microarray technology enables us to monitor the expression levels of thousands of genes simultaneously and hence to obtain a better picture of the interactions between the genes. In order to understand the biological structure underlying these gene interactions, we present a hierarchical nonparametric Bayesian model based on Multivariate Adaptive Regression Splines (MARS) to capture the functional relationship between genes and also between genes and disease status. The novelty of the approach lies in the attempt to capture the complex nonlinear dependencies between the genes which could otherwise be missed by linear approaches. The Bayesian model is flexible enough to identify significant genes of interest as well as model the functional relationships between the genes. The effectiveness of the proposed methodology is illustrated on leukemia and breast cancer datasets.
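The P-spline setup this abstract builds on is the classic penalized least-squares form: a B-spline design matrix B with a penalty on k-th order differences of the coefficients, minimizing ||y - Bc||Β² + Ξ»||D_k c||Β². A minimal sketch with a single global Ξ» (the dissertation's contribution is to make the penalty itself a heteroscedastic regression function, which this sketch does not attempt; requires SciPy β‰₯ 1.8 for `BSpline.design_matrix`, and the helper name is illustrative):

```python
import numpy as np
from scipy.interpolate import BSpline

def pspline_fit(x, y, n_basis=20, degree=3, lam=1.0, pen_order=2):
    """Classic P-spline fit: minimize ||y - B c||^2 + lam * ||D_k c||^2,
    where D_k is the k-th order difference matrix on the coefficients."""
    inner = np.linspace(x.min(), x.max(), n_basis - degree + 1)
    knots = np.concatenate([np.full(degree, x.min()), inner,
                            np.full(degree, x.max())])
    B = BSpline.design_matrix(x, knots, degree).toarray()
    D = np.diff(np.eye(n_basis), n=pen_order, axis=0)
    # Normal equations of the penalized least-squares problem.
    coef = np.linalg.solve(B.T @ B + lam * (D.T @ D), B.T @ y)
    return B @ coef

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 200)
truth = np.sin(2 * np.pi * x)
y = truth + rng.normal(scale=0.2, size=x.size)
fit = pspline_fit(x, y, lam=1.0)
```

Because Ξ» is a single global constant here, the fit cannot be rough in one region and smooth in another; letting the penalty vary over x is exactly what restores spatial adaptivity.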

    Penalized spline models and applications

    Penalized spline regression models are a popular statistical tool for curve fitting problems due to their flexibility and computational efficiency. In particular, penalized cubic spline functions have received a great deal of attention. Cubic splines have good numerical properties and have proven extremely useful in a variety of applications. Typically, splines are represented as linear combinations of basis functions. However, such representations can lack numerical stability or be difficult to manipulate analytically. The current thesis proposes a different parametrization for cubic spline functions that is intuitive and simple to implement. Moreover, integral-based penalty functionals have simple, interpretable expressions in terms of the components of the parametrization. Also, the curvature of the function is not constrained to be continuous everywhere on its domain, which adds flexibility to the fitting process. We consider not only models where smoothness is imposed by means of a single penalty functional, but also a generalization where a combination of different measures of roughness is built in order to specify the adequate limit of shrinkage for the problem at hand. The proposed methodology is illustrated in two distinct regression settings.

    Locally adaptive smoothing with Markov random fields and shrinkage priors

    We present a locally adaptive nonparametric curve fitting method that operates within a fully Bayesian framework. This method uses shrinkage priors to induce sparsity in order-k differences in the latent trend function, providing a combination of local adaptation and global control. Using a scale mixture of normals representation of shrinkage priors, we make explicit connections between our method and k-th order Gaussian Markov random field smoothing. We call the resulting processes shrinkage prior Markov random fields (SPMRFs). We use Hamiltonian Monte Carlo to approximate the posterior distribution of model parameters because this method provides superior performance in the presence of the high dimensionality and strong parameter correlations exhibited by our models. We compare the performance of three prior formulations using simulated data and find the horseshoe prior provides the best compromise between bias and precision. We apply SPMRF models to two benchmark data examples frequently used to test nonparametric methods. We find that this method is flexible enough to accommodate a variety of data generating models and offers the adaptive properties and computational tractability to make it a useful addition to the Bayesian nonparametric toolbox. Comment: 38 pages, to appear in Bayesian Analysis.
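The GMRF connection mentioned above has a closed form in the Gaussian special case: a normal prior on the k-th order differences of the latent trend f, with y ~ N(f, I), gives a posterior mean that solves a ridge-type system. A sketch of that baseline, with an optional per-difference weight vector standing in for the local scales that a heavy-tailed shrinkage prior (e.g. the horseshoe) would supply (the function name and `weights` argument are illustrative, not the paper's sampler):

```python
import numpy as np

def gmrf_posterior_mean(y, lam=20.0, order=2, weights=None):
    """Posterior mean of a latent trend f under y ~ N(f, I) and a Gaussian
    prior on the k-th order differences of f (a k-th order GMRF):
    minimizes ||y - f||^2 + lam * sum_j w_j * (D_k f)_j^2.

    Equal weights give global smoothing; shrinkage priors act like local
    weights w_j that can drop toward 0, allowing abrupt changes in f.
    """
    n = len(y)
    D = np.diff(np.eye(n), n=order, axis=0)   # k-th order difference matrix
    w = np.ones(D.shape[0]) if weights is None else np.asarray(weights)
    return np.linalg.solve(np.eye(n) + lam * D.T @ (w[:, None] * D), y)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 100)
y = np.sin(2 * np.pi * t) + rng.normal(scale=0.3, size=t.size)
f = gmrf_posterior_mean(y, lam=50.0, order=2)
```

The smoothed trend always has smaller k-th difference roughness than the data itself, since f = y is a feasible candidate in the penalized objective.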