The Discrepancy Principle for Choosing Bandwidths in Kernel Density Estimation
We investigate the discrepancy principle for choosing smoothing parameters
for kernel density estimation. The method is based on the distance between the
empirical and estimated distribution functions. We prove some new positive and
negative results on L_1-consistency of kernel estimators with bandwidths chosen
using the discrepancy principle. Consistency crucially depends on a rather weak
Hölder condition on the distribution function. We also unify and extend
previous results on the behaviour of the chosen bandwidth under more strict
smoothness assumptions. Furthermore, we compare the discrepancy principle to
standard methods in a simulation study. Surprisingly, some of the proposals
work reasonably well over a large set of different densities and sample sizes,
and the performance of the methods at least up to n=2500 can be quite different
from their asymptotic behavior.
Comment: 17 pages, 3 figures. Section on histograms removed, new (positive and negative) consistency results for kernel density estimators added.
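The principle above admits a simple sketch for a Gaussian kernel: the estimated distribution function is an average of normal CDFs, and the bandwidth is taken as large as possible while the Kolmogorov distance to the empirical distribution function stays within a kappa/sqrt(n) band. The threshold constant `kappa`, the candidate grid, and the function names below are illustrative choices, not the paper's:

```python
import numpy as np
from math import erf, sqrt

def phi(z):
    # Standard normal CDF.
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def kde_cdf(x, data, h):
    # CDF of a Gaussian-kernel density estimate, evaluated at x.
    return np.mean([phi((x - xi) / h) for xi in data])

def ecdf(x, data):
    # Empirical distribution function at x.
    return np.mean(data <= x)

def discrepancy_bandwidth(data, candidates, kappa=1.0):
    """Pick the largest candidate h whose Kolmogorov distance between the
    empirical and estimated distribution functions stays below kappa/sqrt(n).
    Illustrative sketch of a discrepancy-principle rule, not the paper's code."""
    n = len(data)
    thresh = kappa / np.sqrt(n)
    best = None
    for h in sorted(candidates):
        # Sup-distance approximated over the data points.
        d = max(abs(ecdf(x, data) - kde_cdf(x, data, h)) for x in data)
        if d <= thresh:
            best = h  # keep the largest admissible bandwidth
    return best
```

Small bandwidths always track the empirical CDF closely, so the rule effectively finds the point where smoothing starts to distort the distribution function beyond the sampling-noise band.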
K-rank: an evolution of y-rank for the multiple solutions problem
Y-rank can present faults when dealing with non-linear problems. A methodology is proposed to improve the selection of data in situations where y-rank is fragile. The proposed alternative, called k-rank, consists of splitting the data set into clusters using the k-means algorithm and then applying y-rank to the generated clusters. Models were calibrated and tested with subsets split by y-rank and k-rank. For the Heating Tank case study, models calibrated with k-rank subsets achieved better results in 59% of the simulations. For the Propylene/Propane Separation Unit case, when dealing with a small number of sample points, the y-rank models had errors on the test subset almost three times higher than the k-rank models, meaning that the fitted model could not deal properly with new, unseen data. The proposed methodology was successful in splitting the data, especially in cases with a limited number of samples.
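As a rough illustration of the k-rank idea (cluster with k-means, then apply a y-rank-style systematic split inside each cluster), the sketch below uses a minimal Lloyd's-algorithm k-means and an "every m-th ranked sample goes to the test set" rule. The function names and the `every` parameter are hypothetical, not taken from the paper:

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Minimal Lloyd's-algorithm k-means; a library implementation would also do.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def y_rank_split(y, every=4):
    # y-rank-style split: sort samples by response, send every `every`-th to test.
    order = np.argsort(y)
    test = order[::every]
    train = np.setdiff1d(order, test)
    return train, test

def k_rank_split(X, y, k=3, every=4):
    # k-rank: cluster first, then apply the y-rank split inside each cluster.
    labels = kmeans(X, k)
    train, test = [], []
    for j in range(k):
        idx = np.where(labels == j)[0]
        tr, te = y_rank_split(y[idx], every)
        train.extend(idx[tr])
        test.extend(idx[te])
    return np.array(train), np.array(test)
```

Splitting per cluster guarantees that every region of the input space found by k-means contributes points to both the calibration and the test subsets, which is the failure mode of plain y-rank on multi-modal data.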
Optimal cross-validation in density estimation with the L_2-loss
We analyze the performance of cross-validation (CV) in the density estimation
framework with two purposes: (i) risk estimation and (ii) model selection. The
main focus is given to the so-called leave-p-out CV procedure (Lpo), where p
denotes the cardinality of the test set. Closed-form expressions are settled
for the Lpo estimator of the risk of projection estimators. These expressions
provide a great improvement upon V-fold cross-validation in terms of
variability and computational complexity. From a theoretical point of view,
closed-form expressions also make it possible to study the Lpo performance in
terms of risk estimation. The optimality of leave-one-out (Loo), that is, Lpo
with p = 1, is proved among CV procedures used for risk estimation. Two model
selection frameworks are also considered: estimation, as opposed to
identification. For estimation with finite sample size n, optimality is
achieved for p large enough (with p/n = o(1)) to balance the overfitting
resulting from the structure of the model collection. For identification,
model selection consistency is settled for Lpo as long as p/n is conveniently
related to the rate of convergence of the best estimator in the collection:
p/n must tend to 1 with a parametric rate, while a suitably slower choice of
p/n suffices with some nonparametric estimators. These theoretical results are
validated by simulation
experiments.
Comment: Published at http://dx.doi.org/10.1214/14-AOS1240 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
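The flavour of such closed-form CV expressions can be seen in the simplest special case: leave-one-out (p = 1) for a regular histogram, where the L_2 risk estimator reduces to the classical least-squares CV formula in the bin counts, with no refitting per held-out point. This is only an illustration of the p = 1 case, not the paper's general Lpo expressions:

```python
import numpy as np

def loo_risk_histogram(data, n_bins):
    """Closed-form leave-one-out estimate of the L_2 risk of a regular
    histogram (classical least-squares CV; the p = 1 case of Lpo)."""
    n = len(data)
    lo, hi = data.min(), data.max()
    h = (hi - lo) / n_bins                       # bin width
    counts, _ = np.histogram(data, bins=n_bins, range=(lo, hi))
    p_hat = counts / n                           # empirical bin probabilities
    # Standard least-squares CV formula in the bin counts.
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * np.sum(p_hat ** 2)

def best_n_bins(data, candidates):
    # Scan candidate bin counts and keep the risk minimizer.
    risks = [loo_risk_histogram(data, b) for b in candidates]
    return candidates[int(np.argmin(risks))]
```

The whole scan costs one histogram per candidate, which is exactly the variability/complexity advantage over V-fold resampling that the abstract refers to.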
Constructing irregular histograms by penalized likelihood
We propose a fully automatic procedure for the construction of irregular histograms. For a given number of bins, the maximum likelihood histogram is known to be the result of a dynamic programming algorithm. To choose the number of bins, we propose two different penalties motivated by recent work in model selection by Castellan [6] and Massart [26]. We give a complete description of the algorithm and a proper tuning of the penalties. Finally, we compare our procedure to other existing proposals for a wide range of different densities and sample sizes.
Keywords: irregular histogram, density estimation, penalized likelihood, dynamic programming
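Since the histogram log-likelihood is additive over bins, the maximum-likelihood D-bin histogram over a fixed grid of T candidate cut points can indeed be found by dynamic programming in O(D·T^2). The sketch below shows the recursion; the grid choice and function names are illustrative, not the paper's code:

```python
import numpy as np

def best_histogram(data, grid, D):
    """Maximum-likelihood histogram with exactly D bins whose cut points are
    chosen among `grid`, via dynamic programming. Illustrative sketch."""
    data = np.sort(data)
    n = len(data)
    T = len(grid) - 1
    # cum[i] = number of data points <= grid[i]
    cum = np.searchsorted(data, grid, side="right")

    def score(i, j):
        # Log-likelihood contribution of one bin (grid[i], grid[j]].
        N = cum[j] - cum[i]
        if N == 0:
            return 0.0
        return N * np.log(N / (n * (grid[j] - grid[i])))

    best = np.full((D + 1, T + 1), -np.inf)
    back = np.zeros((D + 1, T + 1), dtype=int)
    best[0][0] = 0.0
    for d in range(1, D + 1):            # number of bins used so far
        for j in range(d, T + 1):        # right edge of the d-th bin
            for i in range(d - 1, j):    # left edge of the d-th bin
                cand = best[d - 1][i] + score(i, j)
                if cand > best[d][j]:
                    best[d][j], back[d][j] = cand, i
    # Recover the cut points of the optimal D-bin histogram.
    cuts, j = [grid[T]], T
    for d in range(D, 0, -1):
        j = back[d][j]
        cuts.append(grid[j])
    return cuts[::-1], best[D][T]
```

Running this for each D and adding a penalty term pen(D) to `best[D][T]` is exactly the shape of the penalized-likelihood selection described above.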
On efficient estimators of the proportion of true null hypotheses in a multiple testing setup
We consider the problem of estimating the proportion of true null hypotheses
in a multiple testing context. The setup is classically modeled through a
semiparametric mixture with two components: a uniform distribution on the
interval [0,1] with prior probability θ, and a nonparametric density f. We
discuss asymptotic efficiency results and establish that two different cases
occur depending on whether f vanishes on a set with non-null Lebesgue measure
or not. In the first case, we exhibit estimators converging at parametric
rate, compute the optimal asymptotic variance, and conjecture that no
estimator is asymptotically efficient (i.e. attains the optimal asymptotic
variance). In the second case, we prove that the quadratic risk of any
estimator does not converge at parametric rate. We illustrate these results on
simulated data.
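A concrete baseline in this mixture setup is a Storey-type estimator: if the nonparametric component f is small above a threshold λ, p-values exceeding λ come essentially from the uniform null component, giving θ̂(λ) = #{p_i > λ} / (n(1−λ)). This is a standard illustration of the model, not necessarily one of the estimators analyzed in the paper:

```python
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """Storey-type estimate of the proportion of true nulls: p-values above
    `lam` are assumed to come (mostly) from the uniform null component."""
    pvalues = np.asarray(pvalues)
    n = len(pvalues)
    # Fraction above lam, rescaled by the uniform mass of (lam, 1].
    return min(1.0, np.sum(pvalues > lam) / (n * (1.0 - lam)))
```

When f does not vanish above λ, the alternative p-values in (λ, 1] inflate the count, so the estimator is biased upward, which is one intuition for why the problem is genuinely harder when f is everywhere positive.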
Theoretical analysis of cross-validation for estimating the risk of the k-Nearest Neighbor classifier
The present work aims at deriving theoretical guarantees on the behavior of
some cross-validation procedures applied to the k-nearest neighbors (kNN)
rule in the context of binary classification. Here we focus on the
leave-p-out cross-validation (LpO) used to assess the performance of the
kNN classifier. Remarkably, this LpO estimator can be efficiently computed
in this context using closed-form formulas derived by
\cite{CelisseMaryHuard11}. We describe a general strategy to derive moment and
exponential concentration inequalities for the LpO estimator applied to the
kNN classifier. Such results are obtained first by exploiting the connection
between the LpO estimator and U-statistics, and second by making an intensive
use of the generalized Efron-Stein inequality applied to the LpO estimator.
One other important contribution is made by deriving new quantifications of the
discrepancy between the LpO estimator and the classification error/risk of
the kNN classifier. The optimality of these bounds is discussed by means of
several lower bounds as well as simulation experiments.
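For reference, the quantity those closed-form formulas compute at p = 1 can always be obtained by brute force: classify each point by its k nearest neighbors among the remaining n − 1 points. The sketch below does exactly that for binary labels; it is illustrative only and does not reproduce the closed-form route of \cite{CelisseMaryHuard11}:

```python
import numpy as np

def loo_error_knn(X, y, k):
    """Leave-one-out classification error of the kNN rule, computed by brute
    force (the closed-form formulas avoid this per-point refitting)."""
    n = len(X)
    errors = 0
    # Pairwise squared Euclidean distances.
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(D, np.inf)  # a point may not vote for itself
    for i in range(n):
        nn = np.argsort(D[i])[:k]            # k nearest among the others
        vote = np.round(y[nn].mean())        # majority vote, labels in {0, 1}
        errors += int(vote != y[i])
    return errors / n
```

Using an odd k avoids voting ties; the brute-force cost is O(n^2 log n) here, which is what makes the closed-form computation of the LpO estimator attractive.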
Nonparametric density estimation by exact leave-p-out cross-validation
The problem of density estimation is addressed by minimization of the L-2-risk for both histogram and kernel estimators. This quadratic risk is estimated by leave-p-out cross-validation (LPO), which is made possible thanks to closed formulas, contrary to common belief. The potential gain in the use of LPO with respect to V-fold cross-validation (V-fold) in terms of the bias-variance trade-off is highlighted. An exact quantification of this extra variability, induced by the preliminary random partition of the data in the V-fold, is proposed. Furthermore, exact expressions are derived for both the bias and the variance of the risk estimator with histograms. Plug-in estimates of these quantities are provided, while their accuracy is assessed thanks to concentration inequalities. An adaptive selection procedure for p in the case of histograms is subsequently presented. This relies on minimization of the mean square error of the LPO risk estimator. Finally a simulation study is carried out which first illustrates the higher reliability of the LPO with respect to the V-fold, and then assesses the behavior of the selection procedure. For instance optimality of leave-one-out (LOO) is shown, at least empirically, in the context of regular histograms