171 research outputs found
Partitioning density curves: application to cancer screening (original title: Découpage de courbes de densité : Application au dépistage du cancer)
Lung cancer screening currently relies on a chest X-ray, a thoracic CT scan and a cytological examination of sputum. Automated sputum cytology is a method for the computer analysis of the cells of a sputum sample on a microscope slide. Since a person is represented by the set of cells on their slide, it seemed worthwhile to use the probability density as the statistical unit. Functional data modelling, in which the statistical unit takes values in an infinite-dimensional space, is well suited to this statistical problem since, by definition, a probability density is a function. In this talk we present the supervised classification method for density curves that we developed to discriminate between people with cancer and healthy people, and we give some results obtained from real data.
Advances on nonparametric regression for functional variables
We consider the problem of predicting a real random variable from a
functional explanatory variable. The problem is tackled by means of a
nonparametric kernel approach which has been recently adapted to this
functional context. We derive theoretical results by giving a deep asymptotic
study of the behaviour of the estimate, including mean squared convergence
(with rates and precise evaluation of the constant terms) as well as asymptotic
distribution. Practical use of these results relies on the ability to
estimate these constants. Some perspectives in this direction are discussed,
including the presentation of a functional version of bootstrapping ideas.
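As a rough illustration of the kernel approach mentioned in this abstract, the sketch below computes a Nadaraya-Watson type estimate of a scalar response from a curve-valued predictor. It is not taken from the paper: the L2 distance on a regular grid, the quadratic kernel, and the names `l2_distance` and `functional_kernel_regression` are all assumptions made for the example.

```python
import math


def l2_distance(x, y, dt=1.0):
    """Approximate L2 distance between two curves sampled on a common regular grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) * dt)


def functional_kernel_regression(curves, responses, new_curve, bandwidth, dt=1.0):
    """Kernel-weighted average of the responses whose curves lie close to new_curve."""
    weights = []
    for curve in curves:
        u = l2_distance(curve, new_curve, dt) / bandwidth
        # Quadratic kernel supported on [0, 1): curves farther than the bandwidth get weight 0.
        weights.append(1.0 - u * u if u < 1.0 else 0.0)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no training curve within the bandwidth")
    return sum(w * y for w, y in zip(weights, responses)) / total


# Hypothetical toy data: three discretised curves and their scalar responses.
train_curves = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [5.0, 5.0, 5.0]]
train_responses = [0.0, 2.0, 10.0]
estimate = functional_kernel_regression(
    train_curves, train_responses, [0.5, 0.5, 0.5], bandwidth=2.0
)
```

In this toy call the third curve lies outside the bandwidth, so the estimate is an equally weighted average of the first two responses. In practice the bandwidth and the (semi-)metric would have to be chosen from the data.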
A kNN procedure in semiparametric functional data analysis
A fast and flexible kNN procedure is developed for dealing with a
semiparametric functional regression model involving both partial-linear and
single-index components. Rates of uniform consistency are presented. Simulated
experiments highlight the advantages of the kNN procedure. A real data
analysis is also shown. Comment: 14 pages, 1 figure, 6 tables
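As a rough illustration of nearest-neighbour smoothing with a functional covariate, the sketch below predicts a scalar response by averaging the responses of the k closest training curves. It covers only a plain nonparametric nearest-neighbour rule, not the partial-linear single-index model of the abstract; the Euclidean distance between discretised curves and the name `knn_functional_regression` are assumptions made for the example.

```python
import math


def curve_distance(x, y):
    """Euclidean distance between two curves sampled on the same grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def knn_functional_regression(curves, responses, new_curve, k):
    """Average the responses of the k training curves nearest to new_curve."""
    if not 1 <= k <= len(curves):
        raise ValueError("k must be between 1 and the number of training curves")
    # Pair each distance with its response, then keep the k smallest distances.
    ranked = sorted(
        (curve_distance(curve, new_curve), y) for curve, y in zip(curves, responses)
    )
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical toy data.
toy_curves = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
toy_responses = [1.0, 3.0, 100.0]
prediction = knn_functional_regression(toy_curves, toy_responses, [0.2, 0.2], k=2)
```

Unlike a fixed bandwidth, the number of neighbours k adapts the size of the local window to how densely the training curves surround the new curve.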
Use of structural tests in regression on a functional variable (original title: Utilisation de tests de structure en régression sur variable fonctionnelle)
This work is concerned with the construction and use of structural tests in regression on a functional variable. In general terms, we propose to build the test statistic from an estimator specific to the particular model whose validity we wish to test, combined with functional kernel estimation methods. A theoretical result shows, under general assumptions, the asymptotic normality of the test statistic under the null hypothesis (that is, when the hypothesis on the structure of the model holds) and its divergence under local alternatives. This result makes it possible to construct structural tests of very varied kinds, for example to test whether the explanatory variable has no effect, whether this effect is linear, or whether the effect of the functional explanatory variable reduces to the effect of a few real-valued characteristics associated with it. Several resampling methods are proposed for computing the critical value of the test. The most suitable method (in light of simulations) is then used in a study of spectrometric data. Using different tests built from the proposed approach provides partial answers to concrete questions related to these data. Finally, we discuss the points that could be improved and briefly present interesting perspectives opened up by structural tests, both in procedures that extract from the explanatory curve the features that matter for prediction and in the choice of the semi-metric.
Choosing the most relevant level sets for depicting a sample of densities
The final publication is available at link.springer.com. When exploring a sample composed of a set of bivariate density functions, visualising the data raises the question of which level set(s) to display. The approach proposed in this paper defines the optimal level set(s) as the one(s) allowing the best reconstruction of the whole density. A fully data-driven procedure is developed to estimate the link between the level set(s) and their corresponding density, to construct the optimal level set(s), and to choose automatically the number of relevant level sets. The method is based on recent advances in functional data analysis when both response and predictors are functional. After a detailed description of the methodology, finite-sample studies are presented (on both real and simulated data), while the theoretical results are deferred to a final appendix. Peer reviewed. Postprint (author's final draft)
Sparse semiparametric regression when predictors are mixture of functional and high-dimensional variables
This paper addresses dimensionality reduction in the regression setting
when the predictors are a mixture of a functional variable and a
high-dimensional vector. A flexible model, combining sparse linear ideas with
semiparametrics, is proposed. A wide scope of asymptotic results is provided,
covering both rates of convergence of the estimators and the asymptotic
behaviour of the variable selection procedure. Practical issues are analysed
through finite sample simulated experiments, while an application to Tecator's
data illustrates the usefulness of our methodology. Comment: 40 pages, 7 figures, 5 tables
Fast and efficient algorithms for sparse semiparametric bi-functional regression
A new sparse semiparametric model is proposed, which incorporates the
influence of two functional random variables on a scalar response in a flexible
and interpretable manner. One of the functional covariates is included through
a single-index structure, while the other is included linearly through the
high-dimensional vector formed by its discretised observations. For this model,
two new algorithms are presented for selecting relevant variables in the linear
part and estimating the model. Both procedures utilise the functional origin of
linear covariates. Finite sample experiments demonstrated the scope of
application of both algorithms: the first method is a fast algorithm that
provides a solution (without loss in predictive ability) for the significant
computational time required by standard variable selection methods for
estimating this model, and the second algorithm completes the set of relevant
linear covariates provided by the first, thus improving its predictive
efficiency. Some asymptotic results theoretically support both procedures. A
real data application demonstrated the applicability of the presented
methodology from a predictive perspective in terms of the interpretability of
outputs and low computational cost. Comment: 33 pages, 6 figures, 10 tables
Variable selection in functional regression models: a review
Despite various similar features, Functional Data Analysis and
High-Dimensional Data Analysis are two major fields in Statistics that have
recently developed almost independently of each other. The aim of this paper is
to provide a survey of methodological advances for variable selection in
functional regression, which is typically a question where functional
and multivariate ideas intersect. More than a simple survey, this paper aims
to promote further links between both areas. Comment: 22 pages
Optimal level sets for bivariate density representation
In bivariate density representation there is an extensive literature on level set estimation when the level is fixed, but much less on choosing which level is (or which levels are) of most interest. This is an important practical question which depends on the kind of problem one has to deal with as well as the kind of feature one wishes to highlight in the density. Answering it requires both a definition of what the optimal level is and the construction of a method for finding it. We consider two scenarios for this problem. The first one corresponds to situations in which one has just a single density function to be represented. However, as a result of technical progress in data collection, problems are emerging in which one has to deal with a sample of densities. In these situations, the need arises to develop a joint representation for all these densities, and this is the second scenario considered in this paper. For each case, we provide consistency results for the estimated levels and present extensive Monte Carlo simulation experiments illustrating the interest and feasibility of the proposed method. (C) 2015 Elsevier Inc. All rights reserved. Peer reviewed. Postprint (author's final draft)
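To make the notion of a level set concrete, the sketch below handles only the fixed-target case: for a known bivariate density evaluated on a grid, it finds the level c such that the region {f >= c} carries roughly a prescribed probability. This is a standard highest-density-region computation, not the optimality criterion of the paper; the standard Gaussian density, the grid, and the name `level_for_probability` are assumptions made for the example.

```python
import math


def gaussian_density(x, y):
    """Standard bivariate normal density, used here as a stand-in example."""
    return math.exp(-0.5 * (x * x + y * y)) / (2.0 * math.pi)


def level_for_probability(density, grid, cell_area, prob):
    """Return a level c such that {f >= c} holds approximately `prob` of the mass."""
    # Visit grid cells from highest to lowest density, accumulating their mass.
    values = sorted((density(x, y) for x, y in grid), reverse=True)
    mass = 0.0
    for value in values:
        mass += value * cell_area
        if mass >= prob:
            return value
    return values[-1]


# Regular grid on [-5, 5] x [-5, 5] with step 0.05 (hypothetical choice).
step = 0.05
axis = [-5.0 + i * step for i in range(201)]
grid = [(x, y) for x in axis for y in axis]
level = level_for_probability(gaussian_density, grid, step * step, prob=0.5)
```

For the standard bivariate normal the exact level for probability p is (1 - p) / (2 * pi), which gives a simple check on the grid approximation.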