23 research outputs found

    The Discrepancy Principle for Choosing Bandwidths in Kernel Density Estimation

    We investigate the discrepancy principle for choosing smoothing parameters in kernel density estimation. The method is based on the distance between the empirical and estimated distribution functions. We prove some new positive and negative results on the L_1-consistency of kernel estimators with bandwidths chosen using the discrepancy principle. Consistency crucially depends on a rather weak Hölder condition on the distribution function. We also unify and extend previous results on the behaviour of the chosen bandwidth under stricter smoothness assumptions. Furthermore, we compare the discrepancy principle to standard methods in a simulation study. Surprisingly, some of the proposals work reasonably well over a large set of different densities and sample sizes, and the performance of the methods at least up to n = 2500 can be quite different from their asymptotic behaviour. (Comment: 17 pages, 3 figures. Section on histograms removed; new positive and negative consistency results for kernel density estimators added.)
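
    As a concrete illustration of the principle, the following R sketch picks the largest (smoothest) bandwidth whose kernel-smoothed distribution function stays within a tolerance of the empirical one. The Gaussian kernel, the bandwidth grid and the threshold tau are illustrative assumptions, not the exact proposals analysed in the paper.

```r
## Illustrative sketch of the discrepancy principle for KDE bandwidths.
## Gaussian kernel, bandwidth grid and threshold tau are assumptions.
discrepancy_bandwidth <- function(x, bandwidths, tau = 1 / sqrt(length(x))) {
  n <- length(x)
  xs <- sort(x)
  emp <- (1:n) / n                       # empirical CDF at the sorted data
  ## distribution function of a Gaussian KDE with bandwidth h
  kde_cdf <- function(h) sapply(xs, function(t) mean(pnorm((t - x) / h)))
  ## take the largest (smoothest) bandwidth whose estimated CDF
  ## stays within tau of the empirical CDF
  ok <- sapply(bandwidths, function(h) max(abs(kde_cdf(h) - emp)) <= tau)
  if (!any(ok)) min(bandwidths) else max(bandwidths[ok])
}

set.seed(1)
x <- rnorm(200)
h <- discrepancy_bandwidth(x, bandwidths = seq(0.05, 1, by = 0.05))
plot(density(x, bw = h))
```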

    A note on the geometry of the multiresolution criterion

    Several recent developments in nonparametric regression are based on the concept of data approximation: they aim at finding the simplest model that is an adequate approximation to the data. Approximations are regarded as adequate iff the residuals 'look like noise'. This is usually checked with the so-called multiresolution criterion. We show that this criterion is related to a special norm (the 'multiresolution norm') and point out some important differences between this norm and the p-norms often used to measure the size of residuals. We also treat an important approximation problem with respect to this norm that can be solved using linear programming. Finally, we give sharp upper and lower bounds for the multiresolution norm in terms of p-norms.
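
    A small sketch of the norm itself may help. In a common variant, the multiresolution norm of a residual vector is the maximum over all index intervals of the absolute interval sum, scaled by the square root of the interval length; some implementations restrict attention to dyadic intervals. The quadratic loop below favours clarity over speed.

```r
## Multiresolution norm of a residual vector r: the maximum over all
## index intervals [i, j] of |sum(r[i:j])| / sqrt(j - i + 1).
## (Some variants restrict to dyadic intervals; this sketch uses all.)
mr_norm <- function(r) {
  n <- length(r)
  cs <- c(0, cumsum(r))                  # cs[j + 1] - cs[i] = sum(r[i:j])
  best <- 0
  for (i in 1:n) {
    for (j in i:n) {
      best <- max(best, abs(cs[j + 1] - cs[i]) / sqrt(j - i + 1))
    }
  }
  best
}

set.seed(1)
mr_norm(rnorm(100))    # pure noise: typically close to sqrt(2 * log(100))
```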

    Constructing irregular histograms by penalized likelihood

    We propose a fully automatic procedure for the construction of irregular histograms. For a given number of bins, the maximum likelihood histogram is known to be the result of a dynamic programming algorithm. To choose the number of bins, we propose two different penalties motivated by recent work in model selection by Castellan [1] and Massart [2]. We give a complete description of the algorithm and a proper tuning of the penalties. Finally, we compare our procedure to other existing proposals for a wide range of different densities and sample sizes. [1] Castellan, G., 1999. Modified Akaike's criterion for histogram density estimation. Technical Report 99.61, Université de Paris-Sud. [2] Massart, P., 2007. Concentration inequalities and model selection. Lecture Notes in Mathematics Vol. 1896, Springer, New York. Keywords: irregular histogram, density estimation, penalized likelihood, dynamic programming.
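
    The dynamic program is straightforward to sketch in R. In the toy version below, the equispaced finest grid of candidate cut points and the simple linear penalty are illustrative placeholders rather than the properly tuned penalties derived in the paper, and only the chosen number of bins is returned (backtracking through the table would recover the cut points).

```r
## Toy sketch of the maximum-likelihood irregular histogram via dynamic
## programming. Finest grid and penalty are illustrative assumptions.
irregular_hist <- function(x, G = 20, penalty = function(D) D) {
  n <- length(x)
  g <- seq(min(x), max(x), length.out = G + 1)    # finest candidate grid
  cnt <- hist(x, breaks = g, plot = FALSE)$counts
  ccnt <- c(0, cumsum(cnt))            # ccnt[j] - ccnt[i] = count in [g[i], g[j])
  ## log-likelihood contribution of one bin [g[i], g[j])
  binll <- function(i, j) {
    c_ij <- ccnt[j] - ccnt[i]
    if (c_ij == 0) return(0)                      # 0 * log(0) = 0 convention
    c_ij * log(c_ij / (n * (g[j] - g[i])))
  }
  ## best[d, j]: max log-likelihood of a d-bin histogram on [g[1], g[j]]
  best <- matrix(-Inf, nrow = G, ncol = G + 1)
  for (j in 2:(G + 1)) best[1, j] <- binll(1, j)
  for (d in 2:G) for (j in (d + 1):(G + 1))
    best[d, j] <- max(sapply(d:(j - 1), function(i) best[d - 1, i] + binll(i, j)))
  ## choose the number of bins by penalized likelihood
  which.max(best[, G + 1] - penalty(1:G))
}

set.seed(1)
irregular_hist(rnorm(500))    # chosen number of bins
```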

    The benchden package

    This article describes the benchden package, which implements a set of 28 example densities for nonparametric density estimation in R. In addition to the usual functions that evaluate the density, distribution and quantile functions or generate random variates, a function specifically designed for use in larger simulation studies has been added. After describing the set of densities and the usage of the package, a small toy example of a simulation study conducted using the benchden package is given.
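
    A minimal usage sketch, assuming the package's documented interface in which rberdev() and dberdev() take a density number dnum between 1 and 28:

```r
library(benchden)

set.seed(1)
dnum <- 23                                # one of the 28 benchmark densities
x <- rberdev(n = 500, dnum = dnum)        # sample from benchmark density dnum

## compare a kernel density estimate with the true density
plot(density(x), main = paste("benchden density", dnum))
curve(dberdev(x, dnum = dnum), add = TRUE, lty = 2)
```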

    Simon Rogers and Mark Girolami: A first course in machine learning

    Acquired as part of the Swiss National Licences (http://www.nationallizenzen.ch)

    The Discrepancy Principle in Nonparametric Curve Estimation

    This dissertation investigates the discrepancy principle, a method for choosing smoothing parameters in nonparametric curve estimation. The method originates from the theory of inverse problems, where it is one of the best-known approaches to choosing regularization parameters. The basic idea is to smooth as much as possible subject to a side condition on the fit to the data. Although this principle can also be applied to nonparametric curve estimation, it is relatively unknown in statistics. For kernel density estimators, the previously known results on the behaviour of the chosen bandwidth are placed in a common framework and complemented by new ones. It is also shown that bandwidth selection via the discrepancy principle can lead to inconsistent estimates for certain densities. Analogous results are derived for the choice of the number of bins in regular histograms. In simulation studies, the individual variants of the discrepancy principle are compared both with each other and with standard methods. The versions of the discrepancy principle proposed so far for nonparametric regression are mostly based on the residual sum of squares or on the so-called multiresolution criterion. The latter is examined in detail in this thesis, where it is shown that the criterion can be formulated in terms of a special norm of the residual vector. Some geometric properties of this norm are derived. A further simulation study investigates the use of the discrepancy principle for Nadaraya-Watson kernel estimators and cubic smoothing splines.
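
    To make the idea of maximal smoothing under a fit constraint concrete, here is a minimal R sketch of a residual-sum-of-squares version for cubic smoothing splines. The spar grid and the assumption of a known noise level sigma are illustrative simplifications, not the dissertation's exact procedure.

```r
## Discrepancy principle for a smoothing spline: among a grid of smoothing
## parameters, take the smoothest fit whose residual sum of squares does
## not exceed n * sigma^2 (sigma assumed known here for simplicity).
discrepancy_spline <- function(x, y, sigma, spars = seq(0.1, 1.5, by = 0.05)) {
  n <- length(y)
  rss <- sapply(spars, function(s) {
    fit <- smooth.spline(x, y, spar = s)
    sum((y - predict(fit, x)$y)^2)
  })
  ## largest spar (smoothest fit) still consistent with the noise level
  ok <- rss <= n * sigma^2
  if (!any(ok)) min(spars) else max(spars[ok])
}

set.seed(1)
x <- seq(0, 1, length.out = 200)
y <- sin(2 * pi * x) + rnorm(200, sd = 0.3)
s <- discrepancy_spline(x, y, sigma = 0.3)
plot(x, y)
lines(smooth.spline(x, y, spar = s), lwd = 2)
```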

    Assessing keyness using permutation tests

    We propose a resampling-based approach for assessing keyness in corpus linguistics, based on suggestions by Gries (2006, 2022). Traditional approaches based on hypothesis tests (e.g. the log-likelihood ratio, LLR) model the corpora as independent, identically distributed samples of tokens. This model does not account for the often observed uneven distribution of occurrences of a word across a corpus. When occurrences of a word are concentrated in a few documents, large values of LLR and similar scores are in fact much more likely than the token-by-token sampling model suggests, leading to false positives. We replace the token-by-token sampling model by a model in which corpora are samples of documents rather than tokens, which is much closer to the way corpora are actually assembled. We then use a permutation approach to approximate the distribution of a given keyness score under the null hypothesis of equal frequencies and obtain p-values for assessing significance. We do not need any assumption on how the tokens are organized within or across documents, and the approach works with essentially *any* keyness score. Hence, apart from obtaining more accurate p-values for scores like LLR, we can also assess significance for, e.g., the log ratio, which has been proposed as a measure of effect size. An efficient implementation of the proposed approach is provided in the `R` package `keyperm`, available from GitHub.
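
    The document-level permutation scheme is easy to sketch. The toy R version below is not the `keyperm` implementation: the single-word LLR score and the demo data are illustrative, and the actual package interface differs.

```r
## log-likelihood ratio for one word: k1 of n1 tokens vs k2 of n2 tokens
llr <- function(k1, n1, k2, n2) {
  xlogy <- function(x, y) ifelse(x == 0, 0, x * log(y))  # 0 * log(0) = 0
  ll <- function(k, n, p) xlogy(k, p) + xlogy(n - k, 1 - p)
  p0 <- (k1 + k2) / (n1 + n2)
  2 * (ll(k1, n1, k1 / n1) + ll(k2, n2, k2 / n2) -
       ll(k1, n1, p0) - ll(k2, n2, p0))
}

## counts[i]: occurrences of the target word in document i
## sizes[i]:  tokens in document i;  corpus[i]: 1 or 2
keyness_perm_test <- function(counts, sizes, corpus, B = 999) {
  score <- function(lab) {
    llr(sum(counts[lab == 1]), sum(sizes[lab == 1]),
        sum(counts[lab == 2]), sum(sizes[lab == 2]))
  }
  obs <- score(corpus)
  perm <- replicate(B, score(sample(corpus)))   # permute document labels
  (1 + sum(perm >= obs)) / (B + 1)              # one-sided permutation p-value
}

set.seed(1)
counts <- c(rpois(20, 5), rpois(20, 1))   # word concentrated in corpus 1
sizes  <- rep(1000, 40)
corpus <- rep(c(1, 2), each = 20)
keyness_perm_test(counts, sizes, corpus)
```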

    Facilitating flexible learning by replacing classroom time with an online learning environment : a systematic review of blended learning in higher education

    Higher education institutions are trying to provide more flexibility and individualization, which is mainly realized through the use of new technologies and implemented in online or blended learning designs. This systematic review investigates the impact of replacing classroom time with an online learning environment. The meta-analysis (k = 21 effect sizes) applied strict inclusion criteria concerning research design, measurement of learning outcomes and implementation of blended learning. The estimated effect size (Hedges' g) was positive but not significantly different from zero, with a confidence interval of [-0.13, 0.25], suggesting that overall differences between blended and conventional classroom learning are small and that, at best, very small negative or moderate positive effects are plausible. This means that despite a reduction in classroom time of between 30 and 79 per cent, equivalent learning outcomes were found. Consequently, blended learning with reduced classroom time is not systematically more or less effective than conventional classroom learning.

    Steering energy demand : recommendations for the use of smart meter data

    Digital electricity meters (smart meters) can be used to motivate electricity customers to adopt energy-efficient behaviour. Together with experts from research and practice, the saguf working group Energiezukunft (Energy Future) has developed recommendations for the use of smart meter data and for optimizing electricity consumption.