157 research outputs found
A Study on Nonparametric Bayesian Regression Models Using Overcomplete Systems with B-Spline Bases
Doctoral dissertation (Ph.D.) -- Department of Statistics, College of Natural Sciences, Seoul National University Graduate School, August 2021. Advisor: Jaeyong Lee.
In this dissertation, we propose the Lévy Adaptive B-Spline regression (LABS) model, an extension of the LARK models, to estimate functions with varying degrees of smoothness. The LABS model is a LARK model with B-spline bases as generating kernels. By changing the degree of the B-spline basis, LABS can systematically adapt the smoothness of functions, i.e., jump discontinuities, sharp peaks, etc. Results of simulation studies and real data examples support that this model captures not only smooth areas but also jumps and sharp peaks of functions, and that it has the best performance in almost all examples. We also provide theoretical results showing that the mean function of the LABS model belongs to specific Besov spaces determined by the degree of the B-spline basis, and that the prior of the model has full support on those Besov spaces.
Furthermore, we develop a multivariate version of the LABS model, named Multivariate Lévy Adaptive B-Spline regression (MLABS), by introducing tensor products of B-spline bases. The MLABS model has comparable performance to state-of-the-art models on both regression and classification problems. In particular, empirical results demonstrate that MLABS has more stable and accurate predictive ability than state-of-the-art nonparametric regression models on relatively low-dimensional data.
1 Introduction
1.1 Nonparametric regression model
1.2 Literature Review
1.2.1 Literature review of nonparametric function estimation
1.2.2 Literature review of multivariate nonparametric regression
1.3 Outline
2 Bayesian nonparametric function estimation using overcomplete systems with B-spline bases
2.1 Introduction
2.2 Lévy adaptive regression kernels
2.3 Lévy adaptive B-spline regression
2.3.1 B-spline basis
2.3.2 Model specification
2.3.3 Support of LABS model
2.4 Algorithm
2.5 Simulation studies
2.5.1 Simulation 1: DJ test functions
2.5.2 Simulation 2: Smooth functions with jumps and peaks
2.6 Real data applications
2.6.1 Example 1: Minimum legal drinking age
2.6.2 Example 2: Bitcoin prices on Bitstamp
2.6.3 Example 3: Fine particulate matter in Seoul
2.7 Discussion
3 Bayesian multivariate nonparametric regression using overcomplete systems with tensor products of B-spline bases
3.1 Introduction
3.2 Multivariate Lévy adaptive B-spline regression
3.2.1 Model specifications
3.2.2 Comparisons between basis functions of MLABS and MARS
3.2.3 Posterior inference
3.2.4 Binomial regressions for MLABS
3.3 Simulation studies
3.3.1 Surface examples
3.3.2 Friedman's examples
3.4 Real data applications
3.4.1 Regression examples
3.4.2 Classification examples
3.5 Discussion
4 Concluding Remarks
A Appendix
A.1 Appendix for Chapter 2
A.1.1 Proof of Theorem 2.3.1
A.1.2 Proof of Theorem 2.3.2
A.1.3 Proof of Theorem 2.3.3
A.1.4 Full simulation results for Simulation 1
A.1.5 Derivation of the full conditionals for LABS
Bibliography
Abstract in Korean
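The LABS representation described above, a mean function expressed as a finite sum of B-spline basis functions, can be sketched in a few lines of plain Python. This is a minimal illustration, not the dissertation's implementation: the knot vector, degree, and coefficients below are hypothetical, and a real LABS fit places a prior on the number and sizes of these coefficients.

```python
def bspline_basis(i, p, t, x):
    """Cox-de Boor recursion: the i-th B-spline basis function of
    degree p with knot vector t, evaluated at x (0/0 treated as 0)."""
    if p == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = right = 0.0
    if t[i + p] != t[i]:
        left = (x - t[i]) / (t[i + p] - t[i]) * bspline_basis(i, p - 1, t, x)
    if t[i + p + 1] != t[i + 1]:
        right = (t[i + p + 1] - x) / (t[i + p + 1] - t[i + 1]) * bspline_basis(i + 1, p - 1, t, x)
    return left + right

def labs_mean(x, coefs, knots, degree):
    """Mean function f(x) = sum_j beta_j * B_j(x), the form used by LABS."""
    return sum(b * bspline_basis(j, degree, knots, x)
               for j, b in enumerate(coefs))

# Hypothetical clamped knot vector and coefficients, for illustration only.
knots = [0.0, 0.0, 0.0, 1.0, 2.0, 3.0, 3.0, 3.0]
degree = 2                            # quadratic B-splines
coefs = [0.5, -1.0, 2.0, 0.0, 1.5]   # len(coefs) = len(knots) - degree - 1
print(labs_mean(1.3, coefs, knots, degree))
```

Varying the degree is what drives the smoothness adaptation in the abstract: degree 0 yields piecewise-constant fits (jump discontinuities), degree 1 piecewise-linear fits (sharp peaks), and higher degrees smooth curves.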
Bayesian nonparametric multivariate convex regression
In many applications, such as economics, operations research and
reinforcement learning, one often needs to estimate a multivariate regression
function f subject to a convexity constraint. For example, in sequential
decision processes the value of a state under optimal subsequent decisions may
be known to be convex or concave. We propose a new Bayesian nonparametric
multivariate approach based on characterizing the unknown regression function
as the max of a random collection of unknown hyperplanes. This specification
induces a prior with large support in a Kullback-Leibler sense on the space of
convex functions, while also leading to strong posterior consistency. Although
we assume that f is defined over R^p, we show that this model has a convergence
rate of log(n)^{-1} n^{-1/(d+2)} under the empirical L2 norm when f actually
maps a d dimensional linear subspace to R. We design an efficient reversible
jump MCMC algorithm for posterior computation and demonstrate the methods
through application to value function approximation.
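The max-of-hyperplanes construction is easy to make concrete. The sketch below uses a fixed, hypothetical collection of hyperplanes, whereas the paper draws their number and parameters from a prior; it also checks the midpoint-convexity property the specification guarantees.

```python
def max_hyperplanes(x, planes):
    """f(x) = max_j (a_j + b_j . x): the pointwise max of affine
    functions, which is convex in x by construction."""
    return max(a + sum(bi * xi for bi, xi in zip(b, x))
               for a, b in planes)

# A hypothetical collection of (intercept, slope-vector) pairs; the
# paper treats the number and parameters of these as random.
planes = [(1.0, [2.0, -1.0]), (0.0, [-1.0, 0.5]), (-0.5, [0.0, 3.0])]

# Midpoint convexity check: f((x0+x1)/2) <= (f(x0) + f(x1)) / 2.
x0, x1 = [0.0, 0.0], [1.0, 1.0]
mid = [(a + b) / 2.0 for a, b in zip(x0, x1)]
lhs = max_hyperplanes(mid, planes)
rhs = 0.5 * (max_hyperplanes(x0, planes) + max_hyperplanes(x1, planes))
print(lhs <= rhs)  # True: the max of affine functions is convex
```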
Biometrika
We consider shape restricted nonparametric regression on a closed set [Formula: see text], where it is reasonable to assume the function has no more than | local extrema interior to [Formula: see text]. Following a Bayesian approach we develop a nonparametric prior over a novel class of local extremum splines. This approach is shown to be consistent when modeling any continuously differentiable function within the class considered, and is used to develop methods for testing hypotheses on the shape of the curve. Sampling algorithms are developed, and the method is applied in simulation studies and data examples where the shape of the curve is of interest.
Bayesian methods in bioinformatics
This work is directed towards developing flexible Bayesian statistical methods
in the semi- and nonparametric regression modeling framework with special focus on
analyzing data from biological and genetic experiments. This dissertation attempts to
solve two such problems in this area. In the first part, we study penalized regression
splines (P-splines), which are low-order basis splines with a penalty to avoid
undersmoothing. Such P-splines are typically not spatially adaptive, and hence can have
trouble when functions are varying rapidly. We model the penalty parameter inherent
in the P-spline method as a heteroscedastic regression function. We develop a full
Bayesian hierarchical structure to do this and use Markov Chain Monte Carlo
techniques for drawing random samples from the posterior for inference. We show that
the approach achieves very competitive performance as compared to other methods.
The second part focuses on modeling DNA microarray data. Microarray technology
enables us to monitor the expression levels of thousands of genes simultaneously and
hence to obtain a better picture of the interactions between the genes. In order to
understand the biological structure underlying these gene interactions, we present a
hierarchical nonparametric Bayesian model based on Multivariate Adaptive Regression Splines (MARS) to capture the functional relationship between genes and also
between genes and disease status. The novelty of the approach lies in the attempt to
capture the complex nonlinear dependencies between the genes which could otherwise
be missed by linear approaches. The Bayesian model is flexible enough to identify
significant genes of interest as well as model the functional relationships between the
genes. The effectiveness of the proposed methodology is illustrated on leukemia and
breast cancer datasets.
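The P-spline penalty discussed in the first part can be written as a discrete difference penalty on adjacent basis coefficients. The sketch below uses a single global penalty parameter lam, whereas the work above goes further and models the penalty as a heteroscedastic regression function; the function name is illustrative.

```python
def pspline_penalty(beta, order=2, lam=1.0):
    """Discrete P-spline roughness penalty: lam times the sum of
    squared order-d differences of adjacent spline coefficients."""
    d = list(beta)
    for _ in range(order):
        d = [b - a for a, b in zip(d, d[1:])]  # forward differences
    return lam * sum(v * v for v in d)

# Coefficients that are linear in their index have zero second
# differences, so a second-order penalty leaves them unshrunk.
print(pspline_penalty([1.0, 2.0, 3.0, 4.0, 5.0]))  # 0.0
print(pspline_penalty([1.0, 4.0, 2.0, 5.0, 3.0]))  # 75.0
```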
Penalized spline models and applications
Penalized spline regression models are a popular statistical tool for curve fitting
problems due to their flexibility and computational efficiency. In particular, penalized
cubic spline functions have received a great deal of attention. Cubic splines
have good numerical properties and have proven extremely useful in a variety of
applications. Typically, splines are represented as linear combinations of basis functions.
However, such representations can lack numerical stability or be difficult to
manipulate analytically.
The current thesis proposes a different parametrization for cubic spline functions
that is intuitive and simple to implement. Moreover, integral based penalty
functionals have simple interpretable expressions in terms of the components of the
parametrization. Also, the curvature of the function is not constrained to be continuous
everywhere on its domain, which adds flexibility to the fitting process.
We consider not only models where smoothness is imposed by means of a single
penalty functional, but also a generalization where a combination of different measures
of roughness is built in order to specify the adequate limit of shrinkage for the
problem at hand.
The proposed methodology is illustrated in two distinct regression settings.
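The integral-based penalty functionals mentioned above typically take the form J(f) = integral of f''(x)^2 over the domain. A crude numerical version, assuming nothing about the thesis's actual parametrization, shows the key behavior: straight lines incur no penalty, curved functions do.

```python
def curvature_penalty(f, a, b, n=1000):
    """Approximate the roughness penalty J(f) = integral of f''(x)^2
    over [a, b], using central second differences on a uniform grid."""
    h = (b - a) / n
    total = 0.0
    for k in range(1, n):
        x = a + k * h
        second = (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)
        total += second * second * h
    return total

# A straight line has zero curvature, so its penalty vanishes;
# the parabola x**2 has f'' = 2, so J is about 4 * (b - a).
print(curvature_penalty(lambda x: 3.0 * x + 1.0, 0.0, 1.0))  # ~0
print(curvature_penalty(lambda x: x * x, 0.0, 1.0))          # ~4
```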
Locally adaptive smoothing with Markov random fields and shrinkage priors
We present a locally adaptive nonparametric curve fitting method that
operates within a fully Bayesian framework. This method uses shrinkage priors
to induce sparsity in order-k differences in the latent trend function,
providing a combination of local adaptation and global control. Using a scale
mixture of normals representation of shrinkage priors, we make explicit
connections between our method and kth order Gaussian Markov random field
smoothing. We call the resulting processes shrinkage prior Markov random fields
(SPMRFs). We use Hamiltonian Monte Carlo to approximate the posterior
distribution of model parameters because this method provides superior
performance in the presence of the high dimensionality and strong parameter
correlations exhibited by our models. We compare the performance of three prior
formulations using simulated data and find the horseshoe prior provides the
best compromise between bias and precision. We apply SPMRF models to two
benchmark data examples frequently used to test nonparametric methods. We find
that this method is flexible enough to accommodate a variety of data generating
models and offers the adaptive properties and computational tractability to
make it a useful addition to the Bayesian nonparametric toolbox. (Comment: 38 pages, to appear in Bayesian Analysis.)
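The two ingredients of the SPMRF construction, order-k differences of the latent trend and a shrinkage prior expressed as a scale mixture of normals, can both be sketched briefly. The function names are illustrative, not the authors' code; the half-Cauchy local scale gives the horseshoe's standard mixture representation.

```python
import math
import random

def order_k_diff(theta, k):
    """Order-k forward differences of a latent trend vector; SPMRFs
    place shrinkage priors on these instead of iid normal increments."""
    d = list(theta)
    for _ in range(k):
        d = [b - a for a, b in zip(d, d[1:])]
    return d

def horseshoe_draw(rng, tau=1.0):
    """One horseshoe draw via its scale mixture of normals:
    beta | lam ~ N(0, (tau*lam)^2) with lam ~ half-Cauchy(0, 1)."""
    lam = abs(math.tan(math.pi * (rng.random() - 0.5)))  # folded Cauchy quantile
    return rng.gauss(0.0, tau * lam)

rng = random.Random(1)
increments = [horseshoe_draw(rng) for _ in range(4)]  # mostly small, occasionally large
print(order_k_diff([0.0, 1.0, 4.0, 9.0, 16.0], 2))  # quadratic trend -> [2.0, 2.0, 2.0]
```

The heavy tail of the half-Cauchy scale is what lets a few order-k differences stay large (local adaptation) while most are shrunk toward zero (global control).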
- β¦