165 research outputs found

    Découpage de courbes de densité : Application au dépistage du cancer

    Current screening for bronchopulmonary cancer is carried out using a chest X-ray, a thoracic CT scan and a cytological examination of sputum. Automated sputum cytology is a method that allows computer analysis of the cells of a sputum sample on a microscope slide. Since a person is represented by the whole set of cells on their slide, it seemed natural to us to use the probability density as the statistical unit. Functional data modelling, in which the statistical unit takes its values in an infinite-dimensional space, is well suited to this statistical problem since, by definition, a probability density is a function. In this talk we present the supervised classification method for density curves that we developed to discriminate between people with cancer and healthy people, and we report some results obtained on real data.
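As an illustration of the density-as-statistical-unit idea, here is a minimal sketch (not the authors' pipeline; the function name and the simulated "cell measurements" are ours) that turns one subject's raw measurements into a single density curve evaluated on a shared grid:

```python
import numpy as np

def density_curve(measurements, grid, h):
    """Gaussian kernel density estimate on a common grid (bandwidth h):
    turns one subject's set of per-cell measurements into a single density
    curve, which then serves as the statistical unit for classification."""
    u = (grid[:, None] - measurements[None, :]) / h
    return np.exp(-0.5 * u ** 2).mean(axis=1) / (h * np.sqrt(2.0 * np.pi))

# one subject = one curve on a grid shared by all subjects
grid = np.linspace(-5.0, 5.0, 201)
cells = np.random.default_rng(0).normal(size=2000)  # simulated cell features
f = density_curve(cells, grid, h=0.3)
```

Classifying subjects then amounts to comparing such curves, e.g. through a semi-metric between densities.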

    Utilisation de tests de structure en régression sur variable fonctionnelle.

    This work is concerned with the construction and use of structural tests in regression on a functional variable. In general terms, we propose to build our test statistic from an estimator specific to the particular model whose validity we wish to test, together with functional kernel estimation methods. A theoretical result shows, under general assumptions, the asymptotic normality of our test statistic under the null hypothesis (that is, when the assumption on the structure of the model holds) and its divergence under local alternatives. This result allows the construction of structural tests of very varied nature, making it possible, for example, to test whether the explanatory variable has no effect, whether this effect is linear, or whether the effect of the functional explanatory variable reduces to the effect of a few real-valued characteristics associated with it. Several resampling methods are proposed to compute the critical value of the test. The method that performs best (in simulations) is then used in a study of spectrometric data. The various tests built from the proposed approach provide answers to concrete questions raised by these data. Finally, we discuss points that could be improved and briefly present promising perspectives offered by structural tests within procedures aimed at extracting, from the explanatory curve, the characteristics that matter most for prediction, as well as at choosing the semi-metric.
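A toy version of a resampling-calibrated "no effect" test can be sketched as follows. This is our own illustrative permutation variant, not the resampling schemes studied in the work; the quadratic kernel and the precomputed distance matrix are assumptions:

```python
import numpy as np

def nw_fit(D, y, h):
    # kernel smoother from a precomputed matrix D of semi-metric distances
    W = np.maximum(1.0 - (D / h) ** 2, 0.0)  # quadratic kernel, support [0, h]
    return W @ y / W.sum(axis=1)

def no_effect_pvalue(D, y, h, B=200, seed=0):
    """Resampling calibration of a 'no effect' structural test: the statistic
    measures the distance between the kernel fit and the constant fit, and
    permuting y mimics the null hypothesis that the covariate has no effect."""
    rng = np.random.default_rng(seed)
    stat = lambda v: float(np.mean((nw_fit(D, v, h) - v.mean()) ** 2))
    t0 = stat(y)
    null = np.array([stat(rng.permutation(y)) for _ in range(B)])
    return float(np.mean(null >= t0))

# toy check: curves indexed by a scalar a, response driven by a -> tiny p-value
a = np.random.default_rng(1).uniform(size=100)
D = np.abs(a[:, None] - a[None, :])   # stand-in for a functional semi-metric
p = no_effect_pvalue(D, a, h=0.2)
```

When the response truly depends on the covariate, the observed statistic dominates the permuted ones and the p-value is essentially zero.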

    Weak pointwise consistency of the cross validatory window estimate in non parametric regression estimation


    Advances on nonparametric regression for functional variables

    We consider the problem of predicting a real random variable from a functional explanatory variable. The problem is attacked by means of a nonparametric kernel approach which has recently been adapted to this functional context. We derive theoretical results through a deep asymptotic study of the behaviour of the estimate, including mean squared convergence (with rates and precise evaluation of the constant terms) as well as the asymptotic distribution. Practical use of these results relies on the ability to estimate these constants. Some perspectives in this direction are discussed, including the presentation of a functional version of bootstrapping ideas.
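The functional kernel estimate discussed above is, in its simplest form, a Nadaraya-Watson weighted mean of the responses. The following sketch is our own illustration (the quadratic kernel and the discrete L2 semi-metric are assumptions, not the paper's exact choices):

```python
import numpy as np

def functional_nw(X, Y, x0, h):
    """Functional Nadaraya-Watson predictor: a locally weighted mean of the
    responses, with weights K(d(x0, Xi)/h) built from a discrete L2
    semi-metric between curves sampled on a common grid."""
    dist = np.sqrt(((X - x0) ** 2).mean(axis=1))  # d(x0, Xi)
    w = np.maximum(1.0 - (dist / h) ** 2, 0.0)    # quadratic kernel
    return float(np.sum(w * Y) / w.sum()) if w.sum() > 0 else float("nan")

# toy example: curves Xi(t) = ai * t, responses Yi = ai**2
t = np.linspace(0.0, 1.0, 50)
a = np.random.default_rng(2).uniform(size=300)
X = a[:, None] * t[None, :]
Y = a ** 2
pred = functional_nw(X, Y, x0=0.5 * t, h=0.1)
```

Since the curve 0.5 * t corresponds to a = 0.5, the prediction should be close to 0.25.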

    Choosing the most relevant level sets for depicting a sample of densities

    The final publication is available at link.springer.com. When exploring a sample composed of bivariate density functions, visualising the data raises the question of choosing the relevant level set(s). The approach proposed in this paper consists in defining the optimal level set(s) as the one(s) allowing the best reconstitution of the whole density. A fully data-driven procedure is developed to estimate the link between the level set(s) and their corresponding density, to construct optimal level set(s), and to choose automatically the number of relevant level set(s). The method is based on recent advances in functional data analysis where both response and predictors are functional. After a detailed description of the methodology, finite sample studies are presented (including both real and simulated data), while theoretical results are deferred to a final appendix.

    Optimal level sets for representing a bivariate density function

    We deal with the problem of representing a bivariate density function by level sets. The choice of which levels are used in this representation is commonly arbitrary (the most usual choices being those with probability contents .25, .5 and .75). Choosing which level is (or which levels are) of most interest is an important practical question which depends on the kind of problem one has to deal with, as well as on the kind of feature one wishes to highlight in the density. The approach we develop is based on minimum distance ideas.
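For a density known on a grid, the level with a given probability content can be computed by accumulating mass over the highest-density cells. This is a generic sketch of that notion (not the paper's minimum-distance method; the function name is ours):

```python
import numpy as np

def level_for_probability_content(f_grid, cell_area, p):
    """Smallest threshold c such that the level set {f >= c} has probability
    content (enclosed mass) approximately p, computed on a discretization."""
    vals = np.sort(f_grid.ravel())[::-1]      # density values, descending
    mass = np.cumsum(vals) * cell_area        # mass of the highest cells
    idx = min(np.searchsorted(mass, p), len(vals) - 1)
    return float(vals[idx])

# check on a standard bivariate normal, where c = (1 - p) / (2 * pi) exactly
x = np.linspace(-4.0, 4.0, 401)
xx, yy = np.meshgrid(x, x)
f = np.exp(-0.5 * (xx ** 2 + yy ** 2)) / (2.0 * np.pi)
c50 = level_for_probability_content(f, (x[1] - x[0]) ** 2, 0.5)
```

For the standard bivariate normal the .5 probability-content level is (1 - .5) / (2π) ≈ 0.0796, which the grid computation recovers closely.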

    Optimal level sets for bivariate density representation

    In bivariate density representation there is an extensive literature on level set estimation when the level is fixed, but much less on choosing which level is (or which levels are) of most interest. This is an important practical question which depends on the kind of problem one has to deal with, as well as on the kind of feature one wishes to highlight in the density; answering it requires both a definition of what the optimal level is and a method for finding it. We consider two scenarios for this problem. The first corresponds to situations in which one has just a single density function to be represented. However, as a result of technical progress in data collection, problems are emerging in which one has to deal with a sample of densities. In these situations the need arises to develop a joint representation for all these densities, and this is the second scenario considered in this paper. For each case, we provide consistency results for the estimated levels and present extensive Monte Carlo experiments illustrating the interest and feasibility of the proposed method.

    Evaluating the complexity of some families of functional data

    In this paper we study the complexity of a functional data set by means of a two-step approach. The first step considers a new graphical tool for assessing which family the data belong to: the main aim is to detect whether a sample comes from a monomial or an exponential family. This first tool is based on a nonparametric kNN estimation of the small ball probability. Once the family is specified, the second step consists in evaluating the extent of complexity by estimating some specific indexes related to the assigned family. It turns out that the developed methodology is entirely free from assumptions on the model, the distribution, and the dominating measure. This flexibility ensures the wide applicability of the methodology. Computational issues are investigated by means of simulations, and finally the method is applied to analyse some real financial curve datasets.
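The kNN small ball probability idea can be illustrated on finite-dimensional data, where the monomial exponent is simply the dimension. The helper name is ours, and Euclidean points stand in for curves; this is a sketch of the estimation principle, not the paper's graphical tool:

```python
import numpy as np

def sbp_slope(dist_to_x0, ks):
    """kNN estimate of the small ball probability exponent: the k-th nearest
    neighbour radius h_k satisfies P(ball of radius h_k) ~ k/n, so under a
    monomial family P(h) ~ C * h**tau the slope of log(k/n) against
    log(h_k) estimates tau."""
    d = np.sort(dist_to_x0)
    n = len(d)
    log_h = np.log([d[k - 1] for k in ks])
    log_p = np.log([k / n for k in ks])
    return float(np.polyfit(log_h, log_p, 1)[0])  # regression slope

# sanity check on 2-d data, where the exponent is the dimension (tau = 2)
pts = np.random.default_rng(3).uniform(-1.0, 1.0, size=(5000, 2))
tau = sbp_slope(np.linalg.norm(pts, axis=1), range(50, 501, 50))
```

For genuinely functional data the same log-log diagnostic distinguishes a polynomial decay of the small ball probability from an exponential one.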

    Modeling functional data: a test procedure

    The paper deals with a test procedure able to assess the compatibility of observed data with a reference model, using an estimate of the volumetric part in the small-ball probability factorization, which plays the role of a real-valued complexity index. As a preliminary by-product we state some asymptotics for a new estimator of the complexity index. A suitable test statistic is derived and, using U-statistics theory, its asymptotic null distribution is obtained. A study of the level and power of the test for finite sample sizes, and a comparison with a competitor, are carried out by Monte Carlo simulations. The test procedure is finally applied to a financial time series.

    Corpus annotation within the French FrameNet: a domain-by-domain methodology

    This paper reports on the development of a French FrameNet, within the ASFALDA project. While the first phase of the project focused on the development of a French set of frames and corresponding lexicon (Candito et al., 2014), this paper concentrates on the subsequent corpus annotation phase, which focused on four notional domains (commercial transactions, cognitive stances, causality and verbal communication). Given that full coverage is not attainable for a relatively "new" FrameNet project, we advocate that focusing on specific notional domains allowed us to obtain full lexical coverage for the frames of these domains, while partially reflecting word sense ambiguities. Furthermore, as frames and roles were annotated on two French treebanks (the French Treebank (Abeillé and Barrier, 2004) and the Sequoia Treebank (Candito and Seddah, 2012)), we were able to extract a syntactico-semantic lexicon from the annotated frames. In its current state, the resource contains 98 frames, 662 frame-evoking words, 872 senses, and about 13,000 annotated frames, with their semantic roles assigned to portions of text. The French FrameNet is freely available at alpage.inria.fr/asfalda.