34 research outputs found

    Improving the performance of survival analyses in prevention trials, with application to Alzheimer's disease

    No effective curative treatment currently exists for Alzheimer's disease, making its prevention a priority. To date, the few published prevention trials for dementia that measured dementia incidence as their primary outcome have all been negative. The statistical analysis of these trials relies on the logrank test, which is known to be optimal under the proportional hazards model. It may therefore be inadequate for prevention trials, which may require a long period of exposure to an intervention before an effect can be detected: the proportional hazards condition of optimality is unrealistic in this setting. To address this problem, we suggest more powerful tests for detecting a late effect (weighted logrank and weighted Kaplan-Meier tests). Theoretical tools for comparing these tests, such as consistency and asymptotic efficiency, are introduced. If the existence of the late effect is known a priori, a methodology is proposed for choosing the best weight; if the form of the effect is not known a priori, a new "maximum"-type statistic is introduced. Finally, we apply this methodology to real data from the GuidAge trial.
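
    The weighted tests mentioned above can be illustrated with a small sketch. Below is a minimal, self-contained implementation of the two-sample Fleming-Harrington G(rho, gamma) weighted logrank statistic (a standard weight family for late effects; the thesis may use a different one), where rho = gamma = 0 recovers the classical logrank test and gamma > 0 up-weights late differences.

```python
import numpy as np

def weighted_logrank(time, event, group, rho=0.0, gamma=1.0):
    """Two-sample Fleming-Harrington G(rho, gamma) weighted logrank statistic.

    rho = gamma = 0 gives the standard logrank test; rho = 0, gamma = 1
    up-weights late differences, as suited to delayed treatment effects.
    Returns the standardized statistic z (approximately N(0,1) under H0).
    """
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    group = np.asarray(group, int)  # 0/1 group labels
    order = np.argsort(time)
    time, event, group = time[order], event[order], group[order]

    n = len(time)
    s_pooled = 1.0          # left-continuous pooled Kaplan-Meier S(t-)
    num, var = 0.0, 0.0
    i = 0
    while i < n:
        t = time[i]
        at_risk = n - i                      # everyone with time >= t
        at_risk1 = int(np.sum(group[i:] == 1))
        d = d1 = 0                           # events (total / group 1) at t
        j = i
        while j < n and time[j] == t:
            if event[j]:
                d += 1
                d1 += group[j]
            j += 1
        if d > 0:
            w = s_pooled ** rho * (1.0 - s_pooled) ** gamma
            num += w * (d1 - d * at_risk1 / at_risk)
            if at_risk > 1:
                p1 = at_risk1 / at_risk      # hypergeometric variance term
                var += w ** 2 * d * p1 * (1 - p1) * (at_risk - d) / (at_risk - 1)
            s_pooled *= 1.0 - d / at_risk
        i = j
    return num / np.sqrt(var) if var > 0 else 0.0
```

    With gamma = 1, early event times, where the pooled survival is still near 1, receive almost no weight, which is what makes the test sensitive to a delayed effect.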

    Data maturity and follow-up in time-to-event analyses

    NHMR

    S-estimation in linear models with structured covariance matrices

    We provide a unified approach to S-estimation in balanced linear models with structured covariance matrices. Of main interest are S-estimators for linear mixed effects models, but our approach also covers S-estimators in several other standard multivariate models, such as multiple regression, multivariate regression, and multivariate location and scatter. We provide sufficient conditions for the existence of S-functionals and S-estimators, establish asymptotic properties such as consistency and asymptotic normality, and derive their robustness properties in terms of breakdown point and influence function. All the results are obtained for general identifiable covariance structures and are established under mild conditions on the distribution of the observations, which go far beyond models with elliptically contoured densities. Some of our results are new and others are more general than existing ones in the literature; in this way the manuscript completes and improves results on S-estimation in a wide variety of multivariate models. We illustrate our results by means of a simulation study and an application to data from a trial on the treatment of lead-exposed children.
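
    As a toy illustration of the S-estimation principle in the simplest possible setting, the sketch below computes a univariate S-estimator of location: the value of mu minimizing a Tukey-biweight M-scale of the residuals x - mu. The tuning constants c = 1.547, b = 0.5 are the standard 50%-breakdown choice; the paper's balanced-model, structured-covariance setting is far more general.

```python
import numpy as np

def rho_biweight(u, c=1.547):
    """Tukey biweight rho function, normalized so rho(u) = 1 for |u| >= c."""
    v = np.minimum(np.abs(u) / c, 1.0)
    return 1.0 - (1.0 - v ** 2) ** 3

def m_scale(r, b=0.5, c=1.547, tol=1e-10, max_iter=500):
    """M-scale s solving mean(rho(r/s)) = b, the building block of S-estimation.
    b = 0.5 with c = 1.547 yields a 50% breakdown point."""
    r = np.asarray(r, float)
    s = np.median(np.abs(r))
    if s == 0:
        return 0.0
    for _ in range(max_iter):
        s_new = s * np.sqrt(np.mean(rho_biweight(r / s, c)) / b)
        if abs(s_new - s) < tol * s:
            return s_new
        s = s_new
    return s

def s_location(x, n_grid=2001):
    """Univariate S-estimator of location: minimize the M-scale of x - mu.
    A simple grid search keeps the sketch free of local-minimum issues."""
    x = np.asarray(x, float)
    grid = np.linspace(x.min(), x.max(), n_grid)
    scales = np.array([m_scale(x - mu) for mu in grid])
    return grid[np.argmin(scales)]
```

    On data with a cluster of gross outliers, this estimator stays near the bulk of the observations, unlike the sample mean.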

    Efficient approximations of the Fisher matrix in neural networks using Kronecker product singular value decomposition

    We design four novel approximations of the Fisher Information Matrix (FIM), which plays a central role in natural-gradient descent methods for neural networks. The newly proposed approximations aim to improve on Martens and Grosse's Kronecker-factored block-diagonal (KFAC) approximation. They rely on a direct minimization problem whose solution can be computed via the Kronecker product singular value decomposition technique. Experimental results on three standard deep auto-encoder benchmarks showed that they provide more accurate approximations to the FIM. Furthermore, they outperform KFAC and state-of-the-art first-order methods in terms of optimization speed.
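
    The core computational tool named in the abstract, the Kronecker product singular value decomposition, reduces to an ordinary SVD after Van Loan's block rearrangement. A minimal sketch of the rank-one case, finding the nearest single Kronecker product B ⊗ C to a matrix A in Frobenius norm (the paper builds richer FIM approximations on top of this idea):

```python
import numpy as np

def nearest_kronecker(A, shape_B, shape_C):
    """Best Frobenius-norm approximation A ~ B kron C via Van Loan's
    rearrangement followed by a rank-1 SVD."""
    m1, n1 = shape_B
    m2, n2 = shape_C
    assert A.shape == (m1 * m2, n1 * n2)
    # Rearrange the (m2 x n2) blocks of A into rows of R, so that
    # ||A - B kron C||_F = ||R - vec(B) vec(C)^T||_F.
    R = A.reshape(m1, m2, n1, n2).transpose(0, 2, 1, 3).reshape(m1 * n1, m2 * n2)
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    # The best rank-1 factors of R give the Kronecker factors.
    B = np.sqrt(s[0]) * U[:, 0].reshape(m1, n1)
    C = np.sqrt(s[0]) * Vt[0].reshape(m2, n2)
    return B, C
```

    When A is exactly a Kronecker product, the rearranged matrix R has rank one and the reconstruction is exact (up to a harmless simultaneous sign flip of B and C).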

    Optimal transport for data integration

    Statistical matching methods integrate two or more data sources, related to the same target population, that share a subset of covariates while each source has its own distinct subset of variables. The aim is to derive a single synthetic data set in which all the variables from the different sources are jointly available. A method based on optimal transport theory was proposed by Garès and Omer (2020) for the case where the distinct variables in the different data sources are categorical: the joint distribution of shared and distinct variables is transported between the data sources. The approach proposed here also transports the distribution of shared and distinct variables, but additionally estimates a function to predict the missing variables in the other source. Its performance is assessed through a Monte Carlo simulation study.
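
    The underlying computational problem is discrete optimal transport between two categorical distributions. A minimal sketch (not the paper's full matching algorithm) that solves for the transport plan as a linear program:

```python
import numpy as np
from scipy.optimize import linprog

def discrete_ot_plan(a, b, C):
    """Optimal transport plan between discrete distributions a (n,) and b (m,)
    with cost matrix C (n, m), solved as a linear program."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    n, m = C.shape
    # Equality constraints on vec(P): row sums equal a, column sums equal b.
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0      # i-th row sum
    for j in range(m):
        A_eq[n + j, j::m] = 1.0               # j-th column sum
    b_eq = np.concatenate([a, b])
    res = linprog(np.asarray(C, float).ravel(), A_eq=A_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.x.reshape(n, m)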

    Regularized optimal transport of covariates and outcomes in data recoding

    When databases are constructed from heterogeneous sources, it is not unusual that different encodings are used for the same outcome. In such a case, it is necessary to recode the outcome variable before merging two databases. The method proposed for the recoding is an application of optimal transportation in which we search for a bijective mapping between the distributions of this variable in the two databases. In this article, we build upon the work by Garès et al. [9], where the distributions of categorical outcomes are transported under the assumption that they are distributed equally in the two databases. Here, we extend the scope of the model to treat all situations where the covariates explain the outcomes similarly in the two databases; in particular, we do not require that the outcomes be distributed equally. For this, we propose a model in which joint distributions of outcomes and covariates are transported. We also propose to enrich the model by relaxing the constraints on marginal distributions and adding an L1 regularization term. The performances of the models are evaluated in a simulation study, and they are applied to a real dataset.
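
    The relaxation described above can be sketched as a linear program in which the marginal constraints are replaced by L1 penalties. The formulation below is an illustrative assumption of that idea, not the paper's exact model:

```python
import numpy as np
from scipy.optimize import linprog

def relaxed_ot_plan(a, b, C, lam=1.0):
    """Transport plan minimizing <C, P> + lam * (||P 1 - a||_1 + ||P^T 1 - b||_1)
    subject to P >= 0 and total mass 1: an OT problem with L1-relaxed marginals."""
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    C = np.asarray(C, float)
    n, m = C.shape
    nv = n * m
    # Row-sum and column-sum operators acting on vec(P).
    R = np.zeros((n, nv))
    S = np.zeros((m, nv))
    for i in range(n):
        R[i, i * m:(i + 1) * m] = 1.0
    for j in range(m):
        S[j, j::m] = 1.0
    # Variables: [vec(P), u (n), v (m)]; u and v bound the L1 deviations.
    c = np.concatenate([C.ravel(), lam * np.ones(n), lam * np.ones(m)])
    A_ub = np.block([
        [ R, -np.eye(n), np.zeros((n, m))],   #  (P 1 - a)   <= u
        [-R, -np.eye(n), np.zeros((n, m))],   # -(P 1 - a)   <= u
        [ S, np.zeros((m, n)), -np.eye(m)],   #  (P^T 1 - b) <= v
        [-S, np.zeros((m, n)), -np.eye(m)],   # -(P^T 1 - b) <= v
    ])
    b_ub = np.concatenate([a, -a, b, -b])
    A_eq = np.concatenate([np.ones(nv), np.zeros(n + m)])[None, :]  # total mass 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=(0, None), method="highs")
    return res.x[:nv].reshape(n, m)
```

    Once a plan is computed, a simple recoding rule maps each source category i to `plan[i].argmax()`, the target category receiving most of its mass.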

    Closed-form variance estimators for weighted and stratified dose-response function estimators using generalized propensity score

    Propensity score methods are widely used in observational studies for evaluating marginal treatment effects. The generalized propensity score (GPS) is an extension of the propensity score framework, historically developed for binary exposures, to quantitative or continuous exposures. In this paper, we propose variance estimators for treatment effect estimators on continuous outcomes. Dose-response functions (DRF) were estimated through weighting on the inverse of the GPS, or using stratification. Variance estimators were evaluated using Monte Carlo simulations. Despite the use of stabilized weights, the variability of the weighted estimator of the DRF was particularly high, and none of the variance estimators (a bootstrap-based estimator, a closed-form estimator specially developed to take into account the estimation step of the GPS, and a sandwich estimator) was able to adequately capture this variability, resulting in coverage below the nominal value, particularly when the proportion of the variation in the quantitative exposure explained by the covariates was large. The stratified estimator was more stable, and its variance estimators (a bootstrap-based estimator, a pooled linearized estimator, and a pooled model-based estimator) were more efficient at capturing the empirical variability of the parameters of the DRF. The pooled variance estimators tended to overestimate the variance, whereas the bootstrap estimator, which intrinsically takes into account the estimation step of the GPS, resulted in correct variance estimates and coverage rates. These methods were applied to a real data set with the aim of assessing the effect of maternal body mass index on newborn birth weight.
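
    The weighting step described above can be sketched as follows, assuming a normal linear working model for the exposure given the covariates (a common choice; the paper's exact specification may differ). The helper name and interface are illustrative.

```python
import numpy as np

def stabilized_gps_weights(treatment, covariates):
    """Stabilized inverse-GPS weights for a continuous exposure T, assuming
    the working model T | X ~ N(X beta, sigma^2). The stabilized weight is
    f(T) / f(T | X): marginal density over conditional density (the GPS)."""
    T = np.asarray(treatment, float)
    X = np.column_stack([np.ones_like(T), np.asarray(covariates, float)])
    # Fit the exposure model by ordinary least squares.
    beta, *_ = np.linalg.lstsq(X, T, rcond=None)
    resid = T - X @ beta
    sigma2 = resid.var()
    # Denominator: conditional density f(T | X), the generalized propensity score.
    gps = np.exp(-resid ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)
    # Numerator: marginal density f(T), which stabilizes the weights.
    mu, s2 = T.mean(), T.var()
    f_marg = np.exp(-(T - mu) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)
    return f_marg / gps
```

    A weighted regression of the outcome on the exposure using these weights then estimates the dose-response function; as the abstract notes, the variance of that weighted estimator is the delicate part, and a bootstrap over the whole procedure (including the GPS estimation step) is the option that gave correct coverage.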
