
    On visual distances for spectrum-type functional data

    A functional distance (H), based on the Hausdorff metric between the function hypographs, is proposed for the space Ɛ of non-negative real upper semicontinuous functions on a compact interval. The main goal of the paper is to show that the space (Ɛ,H) is particularly suitable for some statistical problems with functional data which involve functions with very wiggly graphs and narrow, sharp peaks. A typical example is given by spectrograms, obtained either by magnetic resonance or by mass spectrometry. On the theoretical side, we show that (Ɛ,H) is a complete, separable, locally compact space and that the H-convergence of a sequence of functions implies the convergence of the respective maximum values of these functions. The probabilistic and statistical implications of these results are discussed, in particular regarding the consistency of k-NN classifiers for supervised classification problems with functional data in H. On the practical side, we provide the results of a small simulation study and also check the performance of our method in two real data problems of supervised classification involving mass spectra. A. Cuevas and R. Fraiman have been partially supported by Spanish Grant MTM2013-44045-
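
    As an illustration of the hypograph distance H described above, the following minimal sketch (not code from the paper) approximates H for two discretized non-negative functions by turning each hypograph {(t, y) : 0 ≤ y ≤ f(t)} into a 2-D point cloud and computing the Hausdorff distance between the clouds with SciPy. The helper names, grid sizes and the two spiky test functions are illustrative assumptions, not choices taken from the paper.

        import numpy as np
        from scipy.spatial.distance import directed_hausdorff

        def hypograph_points(f, t_grid, y_step=0.02):
            """Grid points (t, y) with 0 <= y <= f(t)."""
            pts = [(t, y)
                   for t in t_grid
                   for y in np.arange(0.0, f(t) + y_step, y_step)]
            return np.array(pts)

        def hypograph_hausdorff(f, g, t_grid):
            """Symmetric Hausdorff distance between the two discretized hypographs."""
            A = hypograph_points(f, t_grid)
            B = hypograph_points(g, t_grid)
            return max(directed_hausdorff(A, B)[0], directed_hausdorff(B, A)[0])

        # Two spiky, non-negative "spectrum-like" functions whose narrow peaks are
        # slightly shifted: the sup-norm gap is close to 1, while H stays roughly
        # the size of the horizontal shift (about 0.04).
        t_grid = np.linspace(0.0, 1.0, 200)
        f = lambda t: np.exp(-((t - 0.30) / 0.02) ** 2)
        g = lambda t: np.exp(-((t - 0.34) / 0.02) ** 2)
        print(hypograph_hausdorff(f, g, t_grid))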

    Detection of low dimensionality and data denoising via set estimation techniques

    This work is closely related to the theories of set estimation and manifold estimation. Our object of interest is a, possibly lower-dimensional, compact set S ⊂ ℝ^d. The general aim is to identify (via stochastic procedures) some qualitative or quantitative features of S, of geometric or topological character. The available information is just a random sample of points drawn on S. The term “to identify” means here to achieve a correct answer almost surely (a.s.) when the sample size tends to infinity. More specifically, the paper aims at giving some partial answers to the following questions: is S full dimensional? Is S “close to a lower-dimensional set” M? If so, can we estimate M or some functionals of M (in particular, the Minkowski content of M)? As an important auxiliary tool for answering these questions, a denoising procedure is proposed in order to partially remove the noise in the original data. The theoretical results are complemented with some simulations and graphical illustrations. © 2017, Institute of Mathematical Statistics.
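
    The Minkowski content mentioned above has a simple Monte Carlo illustration. The sketch below (a toy example under strong simplifying assumptions, not the estimator or the denoising procedure of the paper) estimates the length of a curve M in ℝ² from points sampled on it, using the heuristic vol(B(sample, r)) / (2r), where B(sample, r) is the union of r-balls centred at the sample points. The function name, the radius r and the Monte Carlo size are illustrative choices.

        import numpy as np
        from scipy.spatial import cKDTree

        def minkowski_content_2d(sample, r, n_mc=200_000, seed=0):
            """Monte Carlo estimate of area({x : dist(x, sample) <= r}) / (2 r)."""
            rng = np.random.default_rng(seed)
            lo, hi = sample.min(axis=0) - r, sample.max(axis=0) + r
            box_area = float(np.prod(hi - lo))
            u = rng.uniform(lo, hi, size=(n_mc, 2))        # uniform points in a bounding box
            dist, _ = cKDTree(sample).query(u, k=1)        # distance to the nearest sample point
            dilation_area = box_area * np.mean(dist <= r)  # area of the r-dilation of the sample
            return dilation_area / (2.0 * r)

        # Example: points drawn on the unit circle, whose length (Minkowski content) is 2*pi.
        theta = np.random.default_rng(0).uniform(0.0, 2.0 * np.pi, 2000)
        sample = np.column_stack([np.cos(theta), np.sin(theta)])
        print(minkowski_content_2d(sample, r=0.05))        # roughly 6.28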

    On semi-supervised learning

    Major efforts have been made, mostly in the machine learning literature, to construct good predictors combining unlabelled and labelled data. These methods are known as semi-supervised. They deal with the problem of how to take advantage, if possible, of a huge amount of unlabelled data to perform classification in situations where there are few labelled data. This is not always feasible: it depends on the possibility of inferring the labels from the distribution of the unlabelled data. Nevertheless, several algorithms have been proposed recently. In this work, we present a new method that, under almost necessary conditions, attains asymptotically the performance of the best theoretical rule when the size of the unlabelled sample goes to infinity, even if the size of the labelled sample remains fixed. Its performance and computational time are assessed through simulations and on the well-known “Isolet” phoneme data, where a strong dependence on the choice of the initial training sample is shown. The main focus of this work is to elucidate when and why semi-supervised learning works in the asymptotic regime described above. The set of necessary assumptions, although reasonable, shows that semi-supervised methods only attain consistency for very well-conditioned problems. Affiliations: Cholaquidis, A. (Universidad de la República, Uruguay); Fraiman, R. (Universidad de la República, Uruguay); Sued, Raquel Mariela (Universidad de Buenos Aires, Facultad de Ciencias Exactas y Naturales, Instituto de Cálculo, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Argentina).
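
    The abstract does not spell out the authors' algorithm, so the sketch below is only a generic illustration of the semi-supervised setting it describes: a small labelled sample together with a large unlabelled sample (marked with the label -1), fed to scikit-learn's off-the-shelf SelfTrainingClassifier on a synthetic two-moons dataset rather than the Isolet data. It is a baseline for the setting, not the method of the paper.

        import numpy as np
        from sklearn.datasets import make_moons
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.semi_supervised import SelfTrainingClassifier

        # Synthetic data: 2000 points, of which only 20 keep their labels.
        X, y = make_moons(n_samples=2000, noise=0.1, random_state=0)
        rng = np.random.default_rng(0)
        y_train = np.full_like(y, -1)                    # -1 marks "unlabelled"
        labelled = rng.choice(len(y), size=20, replace=False)
        y_train[labelled] = y[labelled]

        # Self-training: iteratively pseudo-label the unlabelled points the base
        # classifier is confident about, then refit.
        model = SelfTrainingClassifier(KNeighborsClassifier(n_neighbors=5), threshold=0.8)
        model.fit(X, y_train)
        print("accuracy on all points:", model.score(X, y))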