Testing the order of a model
This paper deals with order identification for nested models in the i.i.d.
framework. We study the asymptotic efficiency of two generalized likelihood
ratio tests of the order. They are based on two estimators which are proved to
be strongly consistent. A version of Stein's lemma yields an optimal
underestimation error exponent. The lemma also implies that the overestimation
error exponent is necessarily trivial. Our tests admit nontrivial
underestimation error exponents. The optimal underestimation error exponent is
achieved in some situations. The overestimation error can decay exponentially
with respect to a positive power of the number of observations. These results
are proved under mild assumptions by relating the underestimation (resp.
overestimation) error to large (resp. moderate) deviations of the
log-likelihood process. In particular, it is not necessary that the classical
Cram\'{e}r condition be satisfied; namely, the log-densities are not
required to admit every exponential moment. Three benchmark examples with
specific difficulties (location mixture of normal distributions, abrupt changes
and various regressions) are detailed so as to illustrate the generality of our
results.
Comment: Published at http://dx.doi.org/10.1214/009053606000000344 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
Faster Rates for Policy Learning
This article improves the existing proven rates of regret decay in optimal
policy estimation. We give a margin-free result showing that the regret decay
for estimating a within-class optimal policy is second-order for empirical risk
minimizers over Donsker classes, with regret decaying at a faster rate than the
standard error of an efficient estimator of the value of an optimal policy. We
also give a result from the classification literature that shows that faster
regret decay is possible via plug-in estimation provided a margin condition
holds. Four examples are considered. In these examples, the regret is expressed
in terms of either the mean value or the median value; the number of possible
actions is either two or finitely many; and the sampling scheme is either
independent and identically distributed or sequential, where the latter
represents a contextual bandit sampling scheme.
Classification in postural style
This article contributes to the search for a notion of postural style,
focusing on the issue of classifying subjects in terms of how they maintain
posture. Longer term, the hope is to make it possible to determine on a
case-by-case basis which sensory information is prevalent in postural control, and to
improve/adapt protocols for functional rehabilitation among those who show
deficits in maintaining posture, typically seniors. Here, we specifically
tackle the statistical problem of classifying subjects sampled from a two-class
population. Each subject (enrolled in a cohort of 54 participants) undergoes
four experimental protocols which are designed to evaluate potential deficits
in maintaining posture. These protocols result in four complex trajectories,
from which we can extract four small-dimensional summary measures. Because
undergoing several protocols can be unpleasant, and sometimes painful, we try
to limit the number of protocols needed for the classification. Therefore, we
first rank the protocols by decreasing order of relevance, then we derive four
plug-in classifiers which involve the best (i.e., most informative), the two
best, the three best and all four protocols. This two-step procedure relies on
the cutting-edge methodologies of targeted maximum likelihood learning (a
methodology for robust and efficient inference) and super-learning (a machine
learning procedure for aggregating various estimation procedures into a single
better estimation procedure). A simulation study is carried out. The
performances of the procedure applied to the real data set (and evaluated by
the leave-one-out rule) go as high as an 87% rate of correct classification (47
out of 54 subjects correctly classified), using only the best protocol.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS542 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org).
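The 87% figure above is evaluated by the leave-one-out rule. As a minimal sketch of that evaluation scheme (function names are illustrative, not from the paper): each subject is classified by a procedure trained on all remaining subjects, and accuracy is the fraction classified correctly.

```python
def leave_one_out_accuracy(data, labels, fit_and_predict):
    """Leave-one-out evaluation: classify each observation with a model
    trained on all the others; return the fraction classified correctly.

    fit_and_predict(train_data, train_labels, x) -> predicted label.
    """
    n = len(data)
    correct = 0
    for i in range(n):
        # Hold out observation i, train on the rest.
        train_x = data[:i] + data[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if fit_and_predict(train_x, train_y, data[i]) == labels[i]:
            correct += 1
    return correct / n
```

Any classifier (here, anything matching the `fit_and_predict` signature) can be plugged in; the paper's classifiers are built by targeted maximum likelihood learning and super-learning.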
Asymptotically Optimal Algorithms for Budgeted Multiple Play Bandits
We study a generalization of the multi-armed bandit problem with multiple
plays where there is a cost associated with pulling each arm and the agent has
a budget at each time that dictates how much she can expect to spend. We derive
an asymptotic regret lower bound for any uniformly efficient algorithm in our
setting. We then study a variant of Thompson sampling for Bernoulli rewards and
a variant of KL-UCB for both single-parameter exponential families and bounded,
finitely supported rewards. We show these algorithms are asymptotically
optimal, both in rate and leading problem-dependent constants, including in the
thick margin setting where multiple arms fall on the decision boundary.
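The Thompson-sampling variant mentioned above can be sketched for Bernoulli rewards as follows. This is an illustrative assumption, not the paper's exact algorithm: the function name and the greedy reward-per-cost selection rule are ours, and the paper's budget constraint is in expectation rather than the hard per-round cap used here.

```python
import random

def run_budgeted_ts(probs, costs, budget, horizon, seed=0):
    """Thompson sampling with per-arm pull costs and a per-round budget.

    Each round, sample a mean from each arm's Beta posterior, then greedily
    pull the arms with the highest sampled reward-to-cost ratio that still
    fit within the budget. Returns the total reward collected.
    """
    rng = random.Random(seed)
    k = len(probs)
    succ = [1] * k  # Beta(1, 1) priors on each arm's mean
    fail = [1] * k
    total_reward = 0
    for _ in range(horizon):
        theta = [rng.betavariate(succ[i], fail[i]) for i in range(k)]
        # Greedy knapsack heuristic: best sampled reward per unit cost first.
        order = sorted(range(k), key=lambda i: theta[i] / costs[i], reverse=True)
        spent = 0.0
        for i in order:
            if spent + costs[i] <= budget:
                spent += costs[i]
                r = 1 if rng.random() < probs[i] else 0
                total_reward += r
                succ[i] += r
                fail[i] += 1 - r
    return total_reward
```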
Practical targeted learning from large data sets by survey sampling
We address the practical construction of asymptotic confidence intervals for
smooth (i.e., path-wise differentiable), real-valued statistical parameters by
targeted learning from independent and identically distributed data in contexts
where sample size is so large that it poses computational challenges. We
observe some summary measure of all data and select a sub-sample from the
complete data set by Poisson rejective sampling with unequal inclusion
probabilities based on the summary measures. Targeted learning is carried out
from the easier to handle sub-sample. We derive a central limit theorem for the
targeted minimum loss estimator (TMLE) which enables the construction of the
confidence intervals. The inclusion probabilities can be optimized to reduce
the asymptotic variance of the TMLE. We illustrate the procedure with two
examples where the parameters of interest are variable importance measures of
an exposure (binary or continuous) on an outcome. We also conduct a simulation
study and comment on its results.
Keywords: semiparametric inference; survey sampling; targeted minimum loss
estimation (TMLE).
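As a minimal sketch of the sampling-and-weighting idea above (assumed names; the paper uses the rejective variant of Poisson sampling, which conditions on a fixed sample size, whereas this sketch is plain Poisson sampling): units are included independently with unequal probabilities, and estimation from the sub-sample reweights each unit by its inverse inclusion probability, Horvitz-Thompson style.

```python
import random

def poisson_sample(incl_probs, seed=0):
    """Poisson sampling: include unit i independently with
    probability incl_probs[i]; return the sampled indices."""
    rng = random.Random(seed)
    return [i for i, p in enumerate(incl_probs) if rng.random() < p]

def horvitz_thompson_mean(values, incl_probs, sample):
    """Estimate the population mean from the sampled indices, weighting
    each sampled value by the inverse of its inclusion probability
    (unbiased over the sampling design)."""
    n = len(values)
    return sum(values[i] / incl_probs[i] for i in sample) / n
```

The TMLE analysis in the paper replaces this simple mean with a targeted estimator, but the inverse-probability weighting of the sub-sample plays the same role.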
Microcirculation in patients in septic shock at two mean arterial pressure targets of 65 and 85 mmHg
Septic shock is characterized by both macro- and microcirculatory abnormalities. Near-infrared spectroscopy (NIRS) makes it possible to assess the microcirculation at the thenar eminence. This technique measures muscle oxygen saturation (StO2) and the resaturation slope during a vascular occlusion test, which is of both physiological interest (as an indicator of microcirculatory reserve) and prognostic value. The main objective of this study was to compare the microcirculatory state, and in particular the resaturation slope, of patients in septic shock at two target MAP levels: 65 mmHg and 85 mmHg. Twenty-two patients were included in the early phase of septic shock, after a MAP of 65 mmHg had been restored. Overall, we found an improvement in the StO2 resaturation slope at a MAP of 85 mmHg vs. 65 mmHg (2.6 [1.5] vs. 3 [1.4]; p=0.021). There was no significant difference in the other microcirculatory variables (StO2, occlusion slope, hyperemia area) between the two MAP levels. However, in some patients the resaturation slope was clearly better (higher) at a MAP of 65 mmHg vs. 85 mmHg. We could not identify the characteristics of this subpopulation. In all, targeting a MAP of 85 mmHg is overall associated with a better microcirculatory state than a MAP of 65 mmHg, especially in terms of the StO2 resaturation slope. There is strong inter-individual variability, arguing for an individualized assessment of the microcirculation in order to better define the MAP level that resuscitation of septic shock with fluid loading and vasopressors should target.
Flood forecasting for the Oued Dis (Sebaou) watershed by the FDTF method
Rainfall-runoff modelling in the case of flood forecasting may be studied by the first difference of the transfer function (FDTF) method, which is an extension of the unit hydrograph approach. In contrast to other methods, the FDTF method simultaneously provides both the transfer function, through its first difference, and an excess precipitation series. The principal advantage of the difference formulation is the reduction of the autocorrelation between successive flow data and between the transfer function coefficients. The algorithm proceeds iteratively by alternately solving a multi-event system, which identifies the transfer function, and a deconvolution system, which estimates the excess precipitation series, this time event by event. Initialization uses the total precipitation as a first approximation of the excess precipitation, since the results depend only on flow variations. The convergence of the algorithm is easily established when the various constraints are applied (positive values for the transfer function coefficients and the excess precipitation; normalization of the transfer function). Thus, the FDTF method only requires total precipitation and flow data in order to identify the transfer function and quantify the excess precipitation. It does not require that the production function be specified. Once the transfer function is calibrated and the excess precipitation estimated, the production function is fitted afterwards by solving an input-output problem. The method's properties had previously been confirmed fairly rigorously on synthetic data; in the present study, it is applied to the Oued Dis (Sebaou) watershed in order to test its performance on real data.
The transfer function identification results proved satisfactory, but those related to the production function adjustment were less so, which directly degraded the quality of the validation results.
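The input-output relation underlying unit-hydrograph methods such as the one above can be sketched in a few lines (an illustrative assumption on our part, not the FDTF identification algorithm itself): runoff is the discrete convolution of the excess precipitation series with a nonnegative, normalized transfer function.

```python
def convolve_runoff(excess_rain, transfer):
    """Discrete convolution q[t + k] += transfer[k] * excess_rain[t]:
    each time step of excess rainfall is spread over the following
    steps according to the transfer function (unit hydrograph)."""
    n, m = len(excess_rain), len(transfer)
    q = [0.0] * (n + m - 1)
    for t in range(n):
        for k in range(m):
            q[t + k] += transfer[k] * excess_rain[t]
    return q
```

When the transfer function sums to one (the normalization constraint mentioned in the abstract), total runoff volume equals total excess rainfall; identifying `transfer` and `excess_rain` from observed `q` is the deconvolution problem the FDTF iterations solve.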
Quantile Super Learning for independent and online settings with application to solar power forecasting
Estimating quantiles of an outcome conditional on covariates is of
fundamental interest in statistics with broad application in probabilistic
prediction and forecasting. We propose an ensemble method for conditional
quantile estimation, Quantile Super Learning, that combines predictions from
multiple candidate algorithms based on their empirical performance measured
with respect to a cross-validated empirical risk of the quantile loss function.
We present theoretical guarantees for both iid and online data scenarios. The
performance of our approach for quantile estimation and in forming prediction
intervals is tested in simulation studies. Two case studies related to solar
energy are used to illustrate Quantile Super Learning: in an iid setting, we
predict the physical properties of perovskite materials for photovoltaic cells,
and in an online setting we forecast ground solar irradiance based on output
from dynamic weather ensemble models.
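The empirical risk referred to above is built from the quantile ("pinball") loss. As a minimal sketch (names are ours, and this is the discrete step of super-learning, which selects the single best candidate; the paper's ensemble can combine candidates more generally): score each candidate quantile predictor on held-out data and keep the one with the smallest empirical quantile risk.

```python
def pinball_loss(y, pred, tau):
    """Quantile loss at level tau: tau * (y - pred) when the prediction
    is below y, (1 - tau) * (pred - y) otherwise."""
    diff = y - pred
    return tau * diff if diff >= 0 else (tau - 1) * diff

def discrete_super_learner(candidates, valid_pairs, tau):
    """Return the name of the candidate predictor with the smallest
    empirical quantile risk on held-out (x, y) pairs."""
    risks = {
        name: sum(pinball_loss(y, f(x), tau) for x, y in valid_pairs)
        / len(valid_pairs)
        for name, f in candidates.items()
    }
    return min(risks, key=risks.get)
```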
AdaptiveConformal: An R Package for Adaptive Conformal Inference
Conformal Inference (CI) is a popular approach for generating finite sample
prediction intervals based on the output of any point prediction method when
data are exchangeable. Adaptive Conformal Inference (ACI) algorithms extend CI
to the case of sequentially observed data, such as time series, and exhibit
strong theoretical guarantees without having to assume exchangeability of the
observed data. The common thread that unites algorithms in the ACI family is
that they adaptively adjust the width of the generated prediction intervals in
response to the observed data. We provide a detailed description of five ACI
algorithms and their theoretical guarantees, and test their performance in
simulation studies. We then present a case study of producing prediction
intervals for influenza incidence in the United States based on black-box point
forecasts. Implementations of all the algorithms are released as an open-source
R package, AdaptiveConformal, which also includes tools for visualizing and
summarizing conformal prediction intervals.
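The adaptive width adjustment that unites the ACI family can be sketched with the basic recursion of the original ACI algorithm (one of the five in the package; the function name is ours): the working miscoverage level is nudged down after a miss and up after a covered step, so empirical coverage tracks the 1 - alpha target.

```python
def aci_update(alpha_t, covered, target_alpha, gamma):
    """One step of the basic ACI recursion:
    err_t = 0 if the last interval covered y_t, else 1;
    alpha_{t+1} = alpha_t + gamma * (target_alpha - err_t).

    A smaller working alpha widens the next interval, so misses
    (err_t = 1) push toward wider intervals and vice versa.
    """
    err = 0.0 if covered else 1.0
    return alpha_t + gamma * (target_alpha - err)
```

Over a long run, the average of `target_alpha - err_t` is driven toward zero, which is the source of the coverage guarantees the abstract mentions.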