Adaptive greedy algorithm for moderately large dimensions in kernel conditional density estimation
This paper studies the estimation of the conditional density f(x, ·) of Y_i given X_i = x from the observation of an i.i.d. sample (X_i, Y_i) ∈ R^d × R, i = 1, ..., n. We assume that f depends only on r unknown components, with typically r ≪ d. We provide an adaptive, fully nonparametric strategy based on kernel rules to estimate f. To select the bandwidth of our kernel rule, we propose a new fast iterative algorithm, inspired by the Rodeo algorithm (Wasserman and Lafferty, 2006), to detect the sparsity structure of f. More precisely, in the minimax setting, our pointwise estimator, which is adaptive to both the regularity and the sparsity, achieves the quasi-optimal rate of convergence. Its computational complexity is only O(dn log n).
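A plain, non-adaptive kernel rule of the kind this estimator refines can be sketched as follows for d = 1; the Gaussian kernel and the fixed bandwidth h are illustrative stand-ins for the paper's Rodeo-style adaptive bandwidth selection:

```python
import numpy as np

def kernel_conditional_density(x, y, X, Y, h=0.3):
    """Nadaraya-Watson-type estimate of the conditional density f(y | X = x):

        f_hat(y | x) = sum_i K_h(x - X_i) K_h(y - Y_i) / sum_i K_h(x - X_i)

    with Gaussian kernel K_h(u) = exp(-u^2 / (2 h^2)) / (h sqrt(2 pi)).
    """
    Kx = np.exp(-0.5 * ((x - X) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    Ky = np.exp(-0.5 * ((y - Y) / h) ** 2) / (h * np.sqrt(2 * np.pi))
    return float(np.sum(Kx * Ky) / np.sum(Kx))

rng = np.random.default_rng(0)
X = rng.normal(size=2000)
Y = 2.0 * X + rng.normal(size=2000)                 # true f(. | x) is N(2x, 1)
peak = kernel_conditional_density(1.0, 2.0, X, Y)   # near the mode of N(2, 1)
tail = kernel_conditional_density(1.0, 6.0, X, Y)   # far in the right tail
```

The adaptive algorithm in the paper additionally shrinks the bandwidth only along the r relevant coordinates, which is what yields the O(dn log n) complexity.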
Valuing travel time variability: Characteristics of the travel time distribution on an urban road
Fosgerau and Karlstrom [The value of reliability. Transportation Research Part B, Vol. 43 (8-9), pp. 813-820, 2010] presented a derivation of the value of travel time variability (VTTV) with a number of desirable properties. This definition of the VTTV depends on certain properties of
the distribution of random travel times that require empirical verification. This paper therefore provides a detailed empirical investigation of the distribution of travel times on an urban road. Applying a range of nonparametric statistical techniques to data giving minute-by-minute travel times for a congested urban road over a period of five months, we show that the standardized
travel time is roughly independent of the time of day as required by the theory. Except for the extreme right tail, a stable distribution seems to fit the data well. The travel time distributions on consecutive links seem to share a common stability parameter such that the travel time distribution for a sequence of links is also a stable distribution. The parameters of the travel time distribution
for a sequence of links can then be derived analytically from the link-level distributions.
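The standardization step described above, removing the time-of-day mean and spread so that what remains can be compared across the day, can be sketched as follows (synthetic data; hourly bin labels stand in for the paper's minute-by-minute records):

```python
import numpy as np

def standardize_by_time_of_day(times, bins):
    """Remove time-of-day effects: within each bin b, return (T - mu_b) / sigma_b.

    Under the theory referenced above, the standardized travel time should have
    (roughly) the same distribution in every time-of-day bin.
    """
    times = np.asarray(times, dtype=float)
    out = np.empty_like(times)
    for b in np.unique(bins):
        m = bins == b
        out[m] = (times[m] - times[m].mean()) / times[m].std()
    return out

rng = np.random.default_rng(1)
bins = np.repeat(np.arange(24), 500)                        # hour-of-day labels
base = rng.gamma(shape=4.0, scale=1.0, size=bins.size)      # right-skewed times
times = base * (1.0 + 0.5 * np.sin(bins / 24 * 2 * np.pi))  # congestion pattern
z = standardize_by_time_of_day(times, bins)
```

In the paper's setting one would then test whether the distribution of z is the same across bins and whether a stable law fits it; this sketch only performs the standardization.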
Inverse Density as an Inverse Problem: The Fredholm Equation Approach
In this paper we address the problem of estimating the ratio q/p, where p is a density function and q is another density or, more generally, an arbitrary function. Knowing or approximating this ratio is needed in various problems of inference and integration, in particular when one needs to average a function with respect to one probability distribution, given a sample from another. It is often referred to as {\it importance sampling} in statistical inference and is also closely related to the problem of {\it covariate shift} in transfer learning as well as to various MCMC methods. It may also be useful for separating the underlying geometry of a space, say a manifold, from the density function defined on it.
Our approach is based on reformulating the problem of estimating q/p as an inverse problem in terms of an integral operator
corresponding to a kernel, and thus reducing it to an integral equation, known
as the Fredholm problem of the first kind. This formulation, combined with the
techniques of regularization and kernel methods, leads to a principled
kernel-based framework for constructing algorithms and for analyzing them
theoretically.
The resulting family of algorithms (FIRE, for Fredholm Inverse Regularized
Estimator) is flexible, simple and easy to implement.
We provide detailed theoretical analysis including concentration bounds and
convergence rates for the Gaussian kernel in the case of densities defined on R^d, compact domains in R^d, and smooth d-dimensional sub-manifolds of the Euclidean space.
We also show experimental results including applications to classification
and semi-supervised learning within the covariate shift framework and
demonstrate some encouraging experimental comparisons. We also show how the
parameters of our algorithms can be chosen in a completely unsupervised manner.
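The central reduction, posing the problem as a Fredholm integral equation of the first kind and stabilizing the ill-posed inversion with Tikhonov regularization, can be illustrated on a discretized toy problem. This is a generic regularized solver, not the FIRE estimator itself; the Gaussian kernel width and the regularization parameter are illustrative choices:

```python
import numpy as np

# Discretize the first-kind Fredholm equation  int k(x, y) f(y) dy = g(x)
# on a uniform grid of [0, 1]; K is then a severely ill-conditioned matrix.
n = 200
t = np.linspace(0.0, 1.0, n)
dx = t[1] - t[0]
K = np.exp(-0.5 * ((t[:, None] - t[None, :]) / 0.05) ** 2) * dx

f_true = np.sin(2 * np.pi * t) + 1.5
g = K @ f_true                                   # noiseless right-hand side

# Tikhonov-regularized least squares: f = argmin ||K f - g||^2 + lam ||f||^2
lam = 1e-6
f_hat = np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ g)
err = float(np.max(np.abs(f_hat - f_true)))
```

The regularization term is what makes the inversion stable: without it, K.T @ K is numerically singular and the solve amplifies rounding error instead of recovering f.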
Renewable Composite Quantile Method and Algorithm for Nonparametric Models with Streaming Data
We are interested in renewable estimations and algorithms for nonparametric
models with streaming data. In our method, the nonparametric function of
interest is expressed through a functional depending on a weight function and a
conditional distribution function (CDF). The CDF is estimated by renewable
kernel estimations combined with function interpolations, based on which we
propose the method of renewable weighted composite quantile regression (WCQR).
Then we fully use the model structure and obtain new selectors for the weight
function, such that the WCQR can achieve asymptotic unbiasedness when estimating
specific functions in the model. We also propose practical bandwidth selectors
for streaming data and find the optimal weight function minimizing the
asymptotic variance. The asymptotic results show that our estimator is almost equivalent to the oracle estimator obtained from the entire data.
Moreover, our method enjoys adaptiveness to error distributions, robustness to outliers, and efficiency in both estimation and computation. Simulation studies and real data analyses further confirm our theoretical findings.
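The renewable principle, keeping only low-dimensional summaries that each new batch updates so the raw stream never needs to be stored, can be sketched for a kernel-smoothed marginal CDF on a fixed grid (a simplified stand-in for the paper's conditional-CDF estimator; the grid and bandwidth are illustrative choices):

```python
import numpy as np
from scipy.stats import norm

class RenewableKernelCDF:
    """Streaming kernel-smoothed CDF kept as running sums on a fixed grid.

    F_hat(y) = (1/n) sum_i Phi((y - Y_i) / h); each batch only updates the
    accumulated sums, so the raw observations can be discarded after use.
    """
    def __init__(self, grid, h=0.1):
        self.grid = np.asarray(grid, dtype=float)
        self.h = h
        self.sums = np.zeros_like(self.grid)
        self.n = 0

    def update(self, batch):
        batch = np.asarray(batch, dtype=float)
        self.sums += norm.cdf((self.grid[:, None] - batch[None, :]) / self.h).sum(axis=1)
        self.n += batch.size

    def cdf(self):
        return self.sums / self.n

rng = np.random.default_rng(2)
est = RenewableKernelCDF(grid=np.linspace(-3.0, 3.0, 61))
for _ in range(20):                      # twenty streaming batches, never stored
    est.update(rng.normal(size=500))
F = est.cdf()
```

The paper's estimator additionally interpolates between grid points and conditions on covariates; the update-only-summaries structure shown here is the part that makes it renewable.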
Laplacian regularization in the dual space for SVMs
Máster Universitario en Investigación e Innovación en Inteligencia Computacional y Sistemas Interactivos.
Nowadays, Machine Learning (ML) is a field with a great impact because of its usefulness in solving
many types of problems. However, large amounts of data are handled today, and traditional learning methods can therefore be severely limited in performance. To address this problem, Regularized Learning (RL) is used: the objective is to make the model as flexible as possible while preserving its generalization properties, so that overfitting is avoided.
There are many models that use regularization in their formulations, such as Lasso, or models that
use intrinsic regularization, such as the Support Vector Machine (SVM). In this model, the margin of
a separating hyperplane is maximized, resulting in a solution that depends only on a subset of the
samples called support vectors.
This Master Thesis aims to develop an SVM model with Laplacian regularization in the dual space,
under the intuitive idea that close patterns should have similar coefficients. To construct the Laplacian
term, we build on the Fused Lasso model, which penalizes the differences between consecutive coefficients; in our case, however, we penalize the differences between every pair of samples, using
the elements of the kernel matrix as weights.
This thesis presents the different phases carried out in the implementation of the new proposal,
starting from the standard SVM, followed by the comparative experiments between the new model and
the original method. As a result, we see that Laplacian regularization is very useful, since the new
proposal outperforms the standard SVM in most of the datasets used, both in classification and regression.
Furthermore, we observe that if we only consider the Laplacian term and we set the parameter
C (upper bound for the coefficients) as if it were infinite, we also obtain better performance than the
standard SVM method.
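The penalty at the heart of the proposal, pushing coefficients of nearby samples together with kernel entries as pairwise weights, can be illustrated with a least-squares stand-in (not the thesis' hinge-loss SVM): the pairwise term sum_ij K_ij (a_i - a_j)^2 equals 2 a' L a for the graph Laplacian L = D - K:

```python
import numpy as np

def laplacian_regularized_fit(K, y, lam=1e-2, gamma=1e-2):
    """Kernel least squares with a Laplacian penalty on the dual coefficients.

    Minimizes ||y - K a||^2 + lam ||a||^2 + gamma * a' L a, where L = D - K is
    the graph Laplacian built from the kernel matrix, so nearby samples (large
    K_ij) are pushed toward similar coefficients.  This is an illustrative
    least-squares stand-in for the hinge-loss SVM described above.
    """
    n = K.shape[0]
    L = np.diag(K.sum(axis=1)) - K
    A = K.T @ K + lam * np.eye(n) + gamma * L
    return np.linalg.solve(A, K.T @ y)

rng = np.random.default_rng(3)
X = rng.uniform(-2.0, 2.0, size=80)
y = np.sin(X) + 0.1 * rng.normal(size=80)
K = np.exp(-0.5 * (X[:, None] - X[None, :]) ** 2)   # RBF kernel matrix
alpha = laplacian_regularized_fit(K, y)
pred = K @ alpha
```

Since K has nonnegative entries, L is positive semidefinite and the linear system stays well posed; in the thesis the same Laplacian term is added to the SVM dual instead.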
Nonparametric circular methods for density and regression
The goal of this dissertation is to introduce nonparametric methods for density and regression estimation for circular data, analyzing their performance through simulation studies and illustrating their use in real data applications. In addition, the proposed methods are implemented in the R library NPCirc.
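A basic circular density estimator of the kind the dissertation studies replaces the Euclidean kernel with a von Mises kernel; a minimal sketch follows (the concentration kappa plays the role of an inverse bandwidth and is an illustrative choice):

```python
import numpy as np

def vonmises_kde(theta, data, kappa=10.0):
    """Circular KDE: f_hat(theta) = (1/n) sum_i vM(theta; theta_i, kappa),
    where vM is the von Mises density with concentration kappa."""
    theta = np.atleast_1d(theta)
    dens = np.exp(kappa * np.cos(theta[:, None] - data[None, :]))
    return dens.sum(axis=1) / (data.size * 2 * np.pi * np.i0(kappa))

rng = np.random.default_rng(4)
data = rng.vonmises(0.0, 4.0, size=1000)           # angles concentrated at 0
grid = np.linspace(-np.pi, np.pi, 181)
f = vonmises_kde(grid, data)
mass = float(f[:-1].sum() * (grid[1] - grid[0]))   # Riemann sum over the circle
```

Because the kernel is periodic, the estimate is automatically well defined on the circle, which a Euclidean Gaussian kernel on angles would not be.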
A study on Bayesian nonparametric regression models using B-spline overcomplete systems
Doctoral dissertation, Seoul National University, Department of Statistics, August 2021.
In this dissertation, we propose the Lévy Adaptive B-Spline regression (LABS) model, an extension of the LARK models, to estimate functions with varying degrees of smoothness. LABS is a LARK model with B-spline bases as generating kernels. By changing the degree of the B-spline basis, LABS can systematically adapt to the smoothness of functions, i.e., jump discontinuities, sharp peaks, etc. Results of simulation studies and real data examples support that this model catches not only smooth areas but also jumps and sharp peaks of functions. The LABS model has the best performance in almost all examples. We also provide theoretical results showing that the mean function of the LABS model belongs to specific Besov spaces determined by the degree of the B-spline basis, and that the prior of the model has full support on those Besov spaces.
Furthermore, we develop a multivariate version of the LABS model by introducing tensor products of B-spline bases, named Multivariate Lévy Adaptive B-Spline regression (MLABS). The MLABS model has comparable performance on both regression and classification problems. In particular, empirical results demonstrate that MLABS has more stable and accurate predictive ability than state-of-the-art nonparametric regression models on relatively low-dimensional data.
1 Introduction
1.1 Nonparametric regression model
1.2 Literature Review
1.2.1 Literature review of nonparametric function estimation
1.2.2 Literature review of multivariate nonparametric regression
1.3 Outline
2 Bayesian nonparametric function estimation using overcomplete systems with B-spline bases
2.1 Introduction
2.2 Lévy adaptive regression kernels
2.3 Lévy adaptive B-spline regression
2.3.1 B-spline basis
2.3.2 Model specification
2.3.3 Support of LABS model
2.4 Algorithm
2.5 Simulation studies
2.5.1 Simulation 1: DJ test functions
2.5.2 Simulation 2: Smooth functions with jumps and peaks
2.6 Real data applications
2.6.1 Example 1: Minimum legal drinking age
2.6.2 Example 2: Bitcoin prices on Bitstamp
2.6.3 Example 3: Fine particulate matter in Seoul
2.7 Discussion
3 Bayesian multivariate nonparametric regression using overcomplete systems with tensor products of B-spline bases
3.1 Introduction
3.2 Multivariate Lévy adaptive B-spline regression
3.2.1 Model specifications
3.2.2 Comparisons between basis functions of MLABS and MARS
3.2.3 Posterior inference
3.2.4 Binomial regressions for MLABS
3.3 Simulation studies
3.3.1 Surface examples
3.3.2 Friedman's examples
3.4 Real data applications
3.4.1 Regression examples
3.4.2 Classification examples
3.5 Discussion
4 Concluding Remarks
A Appendix
A.1 Appendix for Chapter 2
A.1.1 Proof of Theorem 2.3.1
A.1.2 Proof of Theorem 2.3.2
A.1.3 Proof of Theorem 2.3.3
A.1.4 Full simulation results for Simulation 1
A.1.5 Derivation of the full conditionals for LABS
Bibliography
Abstract in Korean
- …
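The building block shared by LABS and MLABS, regression on a B-spline basis, can be sketched with a fixed degree-1 (hat-function) basis and ordinary least squares; the Lévy-adaptive prior, the degree adaptation, and the posterior sampling are what the dissertation adds on top of this:

```python
import numpy as np

def hat_basis(x, knots):
    """Degree-1 B-spline (hat function) design matrix, one column per knot."""
    delta = knots[1] - knots[0]
    return np.maximum(0.0, 1.0 - np.abs(x[:, None] - knots[None, :]) / delta)

rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0.0, 1.0, size=400))
f_true = np.where(x < 0.5, np.sin(2 * np.pi * x), 2.0)   # smooth part + a jump
y = f_true + 0.05 * rng.normal(size=x.size)

knots = np.linspace(0.0, 1.0, 41)
B = hat_basis(x, knots)                       # 400 x 41 design matrix
coef, *_ = np.linalg.lstsq(B, y, rcond=None)  # least-squares basis coefficients
fit = B @ coef
mse = float(np.mean((fit - f_true) ** 2))
```

A fixed basis like this blurs the jump over one knot interval; adapting the number, location, and degree of the basis functions, as LABS does, is what allows jumps and peaks to be captured sharply.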