Adaptive greedy algorithm for moderately large dimensions in kernel conditional density estimation
This paper studies the estimation of the conditional density f(x, ·) of Y_i given X_i = x from the observation of an i.i.d. sample (X_i, Y_i) ∈ R^d × R, i = 1, ..., n. We assume that f depends only on r unknown components, with typically r ≪ d. We provide an adaptive, fully nonparametric strategy based on kernel rules to estimate f. To select the bandwidth of our kernel rule, we propose a new fast iterative algorithm, inspired by the Rodeo algorithm (Wasserman and Lafferty, 2006), to detect the sparsity structure of f. More precisely, in the minimax setting, our pointwise estimator, which is adaptive to both the regularity and the sparsity, achieves the quasi-optimal rate of convergence. Its computational complexity is only O(dn log n).
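A plain, non-adaptive kernel rule of the kind this estimator refines can be sketched as follows for d = 1; the Gaussian kernel and the fixed bandwidth h are illustrative stand-ins for the paper's Rodeo-style adaptive bandwidth selection:

```python
import numpy as np

def kernel_conditional_density(x, y, X, Y, h=0.3):
    """Nadaraya-Watson-type estimate of the conditional density f(y | X = x):

        f_hat(y | x) = sum_i K_h(x - X_i) K_h(y - Y_i) / sum_i K_h(x - X_i)

    with Gaussian kernel K_h(u) = exp(-u^2 / (2 h^2)) / (h sqrt(2 pi)).
    """
    Kx = np.exp(-0.5 * ((x - X) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    Ky = np.exp(-0.5 * ((y - Y) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return float(np.sum(Kx * Ky) / np.sum(Kx))

rng = np.random.default_rng(0)
X = rng.normal(size=2000)
Y = 2.0 * X + rng.normal(size=2000)                 # true f(. | x) is N(2x, 1)
peak = kernel_conditional_density(1.0, 2.0, X, Y)   # near the mode of N(2, 1)
tail = kernel_conditional_density(1.0, 6.0, X, Y)   # far in the right tail
```

The adaptive algorithm in the paper additionally shrinks the bandwidth only along the r relevant coordinates, which is what yields the O(dn log n) complexity.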
Valuing travel time variability: Characteristics of the travel time distribution on an urban road
Fosgerau and Karlstrom [The value of reliability. Transportation Research Part B, Vol. 43 (8-9), pp. 813-820, 2010] presented a derivation of the value of travel time variability (VTTV) with a number of desirable properties. This definition of the VTTV depends on certain properties of
the distribution of random travel times that require empirical verification. This paper therefore provides a detailed empirical investigation of the distribution of travel times on an urban road. Applying a range of nonparametric statistical techniques to data giving minute-by-minute travel times for a congested urban road over a period of five months, we show that the standardized
travel time is roughly independent of the time of day as required by the theory. Except for the extreme right tail, a stable distribution seems to fit the data well. The travel time distributions on consecutive links seem to share a common stability parameter such that the travel time distribution for a sequence of links is also a stable distribution. The parameters of the travel time distribution
for a sequence of links can then be derived analytically from the link-level distributions.
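The standardization step described above, removing the time-of-day mean and spread so that what remains can be compared across the day, can be sketched as follows (synthetic data; hourly bin labels stand in for the paper's minute-by-minute records):

```python
import numpy as np

def standardize_by_time_of_day(times, bins):
    """Remove time-of-day effects: within each bin b, return (T - mu_b) / sigma_b.

    Under the theory referenced above, the standardized travel time should have
    (roughly) the same distribution in every time-of-day bin.
    """
    times = np.asarray(times, dtype=float)
    out = np.empty_like(times)
    for b in np.unique(bins):
        m = bins == b
        out[m] = (times[m] - times[m].mean()) / times[m].std()
    return out

rng = np.random.default_rng(1)
bins = np.repeat(np.arange(24), 500)                        # hour-of-day labels
base = rng.gamma(shape=4.0, scale=1.0, size=bins.size)      # right-skewed times
times = base * (1.0 + 0.5 * np.sin(bins / 24 * 2 * np.pi))  # congestion pattern
z = standardize_by_time_of_day(times, bins)
```

In the paper's setting one would then test whether the distribution of z is the same across bins and whether a stable law fits it; this sketch only performs the standardization.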
Inverse Density as an Inverse Problem: The Fredholm Equation Approach
In this paper we address the problem of estimating the ratio q/p, where p is a density function and q is another density or, more generally, an arbitrary function. Knowing or approximating this ratio is needed in various problems of inference and integration, in particular when one needs to average a function with respect to one probability distribution, given a sample from another. It is often referred to as {\it importance sampling} in statistical inference and is also closely related to the problem of {\it covariate shift} in transfer learning as well as to various MCMC methods. It may also be useful for separating the underlying geometry of a space, say a manifold, from the density function defined on it.
Our approach is based on reformulating the problem of estimating q/p as an inverse problem in terms of an integral operator
corresponding to a kernel, and thus reducing it to an integral equation, known
as the Fredholm problem of the first kind. This formulation, combined with the
techniques of regularization and kernel methods, leads to a principled
kernel-based framework for constructing algorithms and for analyzing them
theoretically.
The resulting family of algorithms (FIRE, for Fredholm Inverse Regularized
Estimator) is flexible, simple and easy to implement.
We provide detailed theoretical analysis including concentration bounds and
convergence rates for the Gaussian kernel in the case of densities defined on R^d, compact domains in R^d, and smooth d-dimensional sub-manifolds of the Euclidean space.
We also show experimental results including applications to classification
and semi-supervised learning within the covariate shift framework and
demonstrate some encouraging experimental comparisons. We also show how the
parameters of our algorithms can be chosen in a completely unsupervised manner.
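The central reduction, posing the problem as a Fredholm integral equation of the first kind and stabilizing the ill-posed inversion with Tikhonov regularization, can be illustrated on a discretized toy problem. This is a generic regularized solver, not the FIRE estimator itself; the Gaussian kernel width and the regularization parameter are illustrative choices:

```python
import numpy as np

# Discretize the first-kind Fredholm equation  int k(x, y) f(y) dy = g(x)
# on a uniform grid of [0, 1]; K is then a severely ill-conditioned matrix.
n = 200
t = np.linspace(0.0, 1.0, n)
dx = t[1] - t[0]
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.05) ** 2) * dx

f_true = np.sin(2 * np.pi * t) + 1.5
g = K @ f_true                                   # noiseless right-hand side

# Tikhonov-regularized least squares: f = argmin ||K f - g||^2 + lam ||f||^2
lam = 1e-6
f_hat = np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ g)
err = float(np.max(np.abs(f_hat - f_true)))
```

The regularization term is what makes the inversion stable: without it, K.T @ K is numerically singular and the solve amplifies rounding error instead of recovering f.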
Renewable Composite Quantile Method and Algorithm for Nonparametric Models with Streaming Data
We are interested in renewable estimations and algorithms for nonparametric
models with streaming data. In our method, the nonparametric function of
interest is expressed through a functional depending on a weight function and a
conditional distribution function (CDF). The CDF is estimated by renewable
kernel estimations combined with function interpolations, based on which we
propose the method of renewable weighted composite quantile regression (WCQR).
Then we fully use the model structure and obtain new selectors for the weight
function, such that the WCQR can achieve asymptotic unbiasedness when estimating
specific functions in the model. We also propose practical bandwidth selectors
for streaming data and find the optimal weight function minimizing the
asymptotic variance. The asymptotic results show that our estimator is almost equivalent to the oracle estimator obtained from the entire data.
Moreover, our method enjoys adaptiveness to error distributions, robustness to outliers, and efficiency in both estimation and computation. Simulation studies and real data analyses further confirm our theoretical findings.
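The renewable principle, keeping only low-dimensional summaries that each new batch updates so the raw stream never needs to be stored, can be sketched for a kernel-smoothed marginal CDF on a fixed grid (a simplified stand-in for the paper's conditional-CDF estimator; the grid and bandwidth are illustrative choices):

```python
import numpy as np
from scipy.stats import norm

class RenewableKernelCDF:
    """Streaming kernel-smoothed CDF kept as running sums on a fixed grid.

    F_hat(y) = (1/n) sum_i Phi((y - Y_i) / h); each batch only updates the
    accumulated sums, so the raw observations can be discarded after use.
    """
    def __init__(self, grid, h=0.1):
        self.grid = np.asarray(grid, dtype=float)
        self.h = h
        self.sums = np.zeros_like(self.grid)
        self.n = 0

    def update(self, batch):
        batch = np.asarray(batch, dtype=float)
        self.sums += norm.cdf((self.grid[:, None] - batch[None, :]) / self.h).sum(axis=1)
        self.n += batch.size

    def cdf(self):
        return self.sums / self.n

rng = np.random.default_rng(2)
est = RenewableKernelCDF(grid=np.linspace(-3.0, 3.0, 61))
for _ in range(20):                      # twenty streaming batches, never stored
    est.update(rng.normal(size=500))
F = est.cdf()
```

The paper's estimator additionally interpolates between grid points and conditions on covariates; the update-only-summaries structure shown here is the part that makes it renewable.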
Laplacian regularization in the dual space for SVMs
Máster Universitario en Investigación e Innovación en Inteligencia Computacional y Sistemas Interactivos.
Nowadays, Machine Learning (ML) is a field with a great impact because of its usefulness in solving
many types of problems. However, large amounts of data are handled today, and traditional learning methods can therefore be severely limited in performance. To address this problem, Regularized Learning (RL) is used: the objective is to make the model as flexible as possible while preserving its generalization properties, so that overfitting is avoided.
There are many models that use regularization in their formulations, such as Lasso, or models that
use intrinsic regularization, such as the Support Vector Machine (SVM). In this model, the margin of
a separating hyperplane is maximized, resulting in a solution that depends only on a subset of the
samples called support vectors.
This Master Thesis aims to develop an SVM model with Laplacian regularization in the dual space,
under the intuitive idea that close patterns should have similar coefficients. To construct the Laplacian
term, we build on the Fused Lasso model, which penalizes the differences between consecutive coefficients; in our case, however, we penalize the differences between every pair of samples, using
the elements of the kernel matrix as weights.
This thesis presents the different phases carried out in the implementation of the new proposal,
starting from the standard SVM, followed by the comparative experiments between the new model and
the original method. As a result, we see that Laplacian regularization is very useful, since the new
proposal outperforms the standard SVM in most of the datasets used, both in classification and regression.
Furthermore, we observe that if we only consider the Laplacian term and we set the parameter
C (upper bound for the coefficients) as if it were infinite, we also obtain better performance than the
standard SVM method.
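The penalty at the heart of the proposal, pushing coefficients of nearby samples together with kernel entries as pairwise weights, can be illustrated with a least-squares stand-in (not the thesis' hinge-loss SVM): the pairwise term sum_ij K_ij (a_i - a_j)^2 equals 2 a' L a for the graph Laplacian L = D - K:

```python
import numpy as np

def laplacian_regularized_fit(K, y, lam=1e-2, gamma=1e-2):
    """Kernel least squares with a Laplacian penalty on the dual coefficients.

    Minimizes ||y - K a||^2 + lam ||a||^2 + gamma * a' L a, where L = D - K is
    the graph Laplacian built from the kernel matrix, so nearby samples (large
    K_ij) are pushed toward similar coefficients.  This is an illustrative
    least-squares stand-in for the hinge-loss SVM described above.
    """
    n = K.shape[0]
    L = np.diag(K.sum(axis=1)) - K
    A = K.T @ K + lam * np.eye(n) + gamma * L
    return np.linalg.solve(A, K.T @ y)

rng = np.random.default_rng(3)
X = rng.uniform(-2.0, 2.0, size=80)
y = np.sin(X) + 0.1 * rng.normal(size=80)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)   # RBF kernel matrix
alpha = laplacian_regularized_fit(K, y)
pred = K @ alpha
```

Since K has nonnegative entries, L is positive semidefinite and the linear system stays well posed; in the thesis the same Laplacian term is added to the SVM dual instead.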
Nonparametric circular methods for density and regression
The goal of this dissertation is to introduce nonparametric methods for density and regression estimation for circular data, analyzing their performance through simulation studies and illustrating their use in real data applications. In addition, the proposed methods are implemented in the R library NPCirc.
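A basic circular density estimator of the kind the dissertation studies replaces the Euclidean kernel with a von Mises kernel; a minimal sketch follows (the concentration kappa plays the role of an inverse bandwidth and is an illustrative choice):

```python
import numpy as np

def vonmises_kde(theta, data, kappa=10.0):
    """Circular KDE: f_hat(theta) = (1/n) sum_i vM(theta; theta_i, kappa),
    where vM is the von Mises density with concentration kappa."""
    theta = np.atleast_1d(theta)
    dens = np.exp(kappa * np.cos(theta[:, None] - data[None, :]))
    return dens.sum(axis=1) / (data.size * 2 * np.pi * np.i0(kappa))

rng = np.random.default_rng(4)
data = rng.vonmises(0.0, 4.0, size=1000)           # angles concentrated at 0
grid = np.linspace(-np.pi, np.pi, 181)
f = vonmises_kde(grid, data)
mass = float(f[:-1].sum() * (grid[1] - grid[0]))   # Riemann sum over the circle
```

Because the kernel is periodic, the estimate is automatically well defined on the circle, which a Euclidean Gaussian kernel on angles would not be.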
A study on Bayesian nonparametric regression models using B-spline overcomplete systems
Doctoral dissertation, Seoul National University, Department of Statistics, August 2021.
In this dissertation, we propose the Lévy Adaptive B-Spline regression (LABS) model, an extension of the LARK models, to estimate functions with varying degrees of smoothness. LABS is a LARK model with B-spline bases as generating kernels. By changing the degree of the B-spline basis, LABS can systematically adapt to the smoothness of functions, i.e., jump discontinuities, sharp peaks, etc. Results of simulation studies and real data examples support that this model catches not only smooth areas but also jumps and sharp peaks of functions. The LABS model has the best performance in almost all examples. We also provide theoretical results showing that the mean function of the LABS model belongs to specific Besov spaces determined by the degree of the B-spline basis, and that the prior of the model has full support on those Besov spaces.
Furthermore, we develop a multivariate version of the LABS model by introducing tensor products of B-spline bases, named Multivariate Lévy Adaptive B-Spline regression (MLABS). The MLABS model has comparable performance on both regression and classification problems. In particular, empirical results demonstrate that MLABS has more stable and accurate predictive ability than state-of-the-art nonparametric regression models on relatively low-dimensional data.
1 Introduction
1.1 Nonparametric regression model
1.2 Literature Review
1.2.1 Literature review of nonparametric function estimation
1.2.2 Literature review of multivariate nonparametric regression
1.3 Outline
2 Bayesian nonparametric function estimation using overcomplete systems with B-spline bases
2.1 Introduction
2.2 Lévy adaptive regression kernels
2.3 Lévy adaptive B-spline regression
2.3.1 B-spline basis
2.3.2 Model specification
2.3.3 Support of LABS model
2.4 Algorithm
2.5 Simulation studies
2.5.1 Simulation 1: DJ test functions
2.5.2 Simulation 2: Smooth functions with jumps and peaks
2.6 Real data applications
2.6.1 Example 1: Minimum legal drinking age
2.6.2 Example 2: Bitcoin prices on Bitstamp
2.6.3 Example 3: Fine particulate matter in Seoul
2.7 Discussion
3 Bayesian multivariate nonparametric regression using overcomplete systems with tensor products of B-spline bases
3.1 Introduction
3.2 Multivariate Lévy adaptive B-spline regression
3.2.1 Model specifications
3.2.2 Comparisons between basis functions of MLABS and MARS
3.2.3 Posterior inference
3.2.4 Binomial regressions for MLABS
3.3 Simulation studies
3.3.1 Surface examples
3.3.2 Friedman's examples
3.4 Real data applications
3.4.1 Regression examples
3.4.2 Classification examples
3.5 Discussion
4 Concluding Remarks
A Appendix
A.1 Appendix for Chapter 2
A.1.1 Proof of Theorem 2.3.1
A.1.2 Proof of Theorem 2.3.2
A.1.3 Proof of Theorem 2.3.3
A.1.4 Full simulation results for Simulation 1
A.1.5 Derivation of the full conditionals for LABS
Bibliography
Abstract in Korean
- …
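The building block shared by LABS and MLABS, regression on a B-spline basis, can be sketched with a fixed degree-1 (hat-function) basis and ordinary least squares; the Lévy-adaptive prior, the degree adaptation, and the posterior sampling are what the dissertation adds on top of this:

```python
import numpy as np

def hat_basis(x, knots):
    """Degree-1 B-spline (hat function) design matrix, one column per knot."""
    delta = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / delta)

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0.0, 1.0, size=400))
f_true = np.where(x < 0.5, np.sin(2 * np.pi * x), 2.0)   # smooth part + a jump
y = f_true + 0.05 * rng.normal(size=x.size)

knots = np.linspace(0.0, 1.0, 41)
B = hat_basis(x, knots)                       # 400 x 41 design matrix
coef, *_ = np.linalg.lstsq(B, y, rcond=None)  # least-squares basis coefficients
fit = B @ coef
mse = float(np.mean((fit - f_true) ** 2))
```

A fixed basis like this blurs the jump over one knot interval; adapting the number, location, and degree of the basis functions, as LABS does, is what allows jumps and peaks to be captured sharply.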