19 research outputs found

Modélisation des données d'enquêtes cas-cohorte par imputation multiple (Application en épidémiologie cardio-vasculaire.)

Author: CHAVANCE Michel
MARTI SOLER Helena
Publication date: 01/01/2012

The weighted estimators generally used to analyze case-cohort studies are not fully efficient. However, case-cohort surveys are a special case of incomplete data in which the observation process is controlled by the study organizers. Methods for analyzing Missing At Random (MAR) data may therefore be appropriate, in particular multiple imputation, which uses all the available information and makes it possible to approximate the partial maximum likelihood estimator. This approach is based on the generation of several plausible completed data sets that take into account the different levels of uncertainty about the missing values. It makes it easy to adapt any statistical tool available for cohort data, for instance estimators of the predictive ability of a model or of an additional variable, which raise specific problems with case-cohort data. We have shown that the imputation model must be estimated on all the completely observed subjects (cases and non-cases), including the case indicator among the explanatory variables. We validated this approach with several sets of simulations: 1) completely simulated data, where the true parameter values were known; 2) case-cohort data simulated from the PRIME cohort, where no phase-1 variable (observed on all subjects) strongly predictive of the phase-2 variable (incompletely observed) was available; 3) case-cohort data simulated from the NWTS cohort, where a phase-1 variable strongly predictive of the phase-2 variable was available. These simulations showed that multiple imputation generally provided unbiased estimates of the risk ratios. For the phase-1 variables, they were almost as precise as the estimates provided by the full cohort, slightly more precise than the calibrated estimator of Breslow et al. and, above all, more precise than the classical weighted estimators. For the phase-2 variables, the multiple imputation estimator was generally unbiased, with a precision better than that of the classical weighted estimators and similar to that of the calibrated estimator of Breslow et al. The simulations performed with the NWTS cohort data gave less satisfactory results for the effects involving the phase-2 variable: the multiple imputation estimators were slightly biased and less precise than the weighted estimators. This can be explained by the interaction terms involving the phase-2 variable in the analysis model, which made it necessary to estimate specific imputation models in different strata of the cohort, some of which included too few cases to satisfy the asymptotic conditions. We advocate the use of multiple imputation to improve the precision of the risk ratio estimates, while making sure that they are similar to the weighted estimates. Our simulations also showed that multiple imputation provided estimates of the predictive value of a model (Harrell's C) or of an additional variable (difference of C indices, NRI or IDI) similar to those obtained from the full cohort.

OpenGrey Repository

Smart Sensor Control and Monitoring of an Automated Cell Expansion Process

Author: Costa Miquel
Egan Joseph R
Goldrick Stephen
Horna David
Hort Simon
König Niels
Marti-Soler Helena
Marí-Buyé Núria
Nettleton David F
R. Reyes Aldo
Rafiq Qasim A
Schmitt Robert H
Vallejo Benítez-Cano Elia
Publication venue: 'MDPI AG'
Publication date: 07/12/2023

Immune therapy for cancer patients is a new and promising area that in the future may complement traditional chemotherapy. The cell expansion phase is a critical part of the process chain to produce a large number of high-quality, genetically modified immune cells from an initial sample from the patient. Smart sensors augment the ability of the control and monitoring system of the process to react in real-time to key control parameter variations, adapt to different patient profiles, and optimize the process. The aim of the current work is to develop and calibrate smart sensors for their deployment in a real bioreactor platform, with adaptive control and monitoring for diverse patient/donor cell profiles. A set of contrasting smart sensors has been implemented and tested on automated cell expansion batch runs, which incorporate advanced data-driven machine learning and statistical techniques to detect variations and disturbances of the key system features. Furthermore, a ‘consensus’ approach is applied to the six smart sensor alerts as a confidence factor which helps the human operator identify significant events that require attention. Initial results show that the smart sensors can effectively model and track the data generated by the Aglaris FACER bioreactor, anticipate events within a 30 min time window, and mitigate perturbations in order to optimize the key performance indicators of cell quantity and quality. In quantitative terms for event detection, the consensus for sensors across batch runs demonstrated good stability: the AI-based smart sensors (Fuzzy and Weighted Aggregation) gave 88% and 86% consensus, respectively, whereas the statistically based (Stability Detector and Bollinger) gave 25% and 42% consensus, respectively, the average consensus for all six being 65%. The different results reflect the different theoretical approaches. 
Finally, the consensus of batch runs across sensors gave even higher stability, ranging from 57% to 98% with an average consensus of 80%.

UCL Discovery

Neighbors' use of water and sanitation facilities can affect children's health: a cohort study in Mozambique using a spatial approach

Author: Cano Jorge
Casellas Aina
Giné Ricard
Giorgi Emanuele
Grau-Pujol Berta
Marti-Soler Helena
Muñoz Jose
Nhacolo Ariel
Quintó Llorenç
Sacoor Charfudin
Saute Francisco
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 16/05/2022

Background: Impact evaluations of most water, sanitation and hygiene (WASH) interventions in health are user-centered. However, recent research has discussed WASH herd protection: community WASH coverage could protect neighboring households. We evaluated the effect of water and sanitation used in the household and by household neighbors on children's morbidity and mortality using recorded health data. Methods: We conducted a retrospective cohort study including 61,333 children from a district in Mozambique during 2012-2015. We obtained household water and sanitation data and morbidity data from the Manhiça Health Research Centre surveillance system. To evaluate herd protection, we estimated the density of household neighbors with improved facilities using a kernel density estimator. We fitted adjusted negative binomial regression models to assess the minimum children-based incidence rates for every morbidity indicator, and Cox regression models for mortality. Results: Household use of unimproved water and sanitation was associated with a higher rate of outpatient visits, diarrhea, malaria, and anemia. Households with unimproved water and sanitation surrounded by neighbors with high coverage of improved water and sanitation were associated with a lower rate of outpatient visits, malaria, anemia, and malnutrition. Conclusion: Household and neighbors' access to improved water and sanitation can affect children's health. Accounting for household WASH and herd protection in the evaluation of interventions could foster stakeholders' investment and improve the control of WASH-related diseases.

PubMed Central

Lancaster E-Prints

Modélisation des données d'enquêtes cas-cohorte par imputation multiple : application en épidémiologie cardio-vasculaire

Author: Marti Soler Helena
Publication venue: HAL CCSD
Publication date: 04/05/2012

The weighted estimators generally used to analyze case-cohort studies are not fully efficient. However, case-cohort surveys are a special case of incomplete data in which the observation process is controlled by the study organizers. Methods for analyzing Missing At Random (MAR) data may therefore be appropriate, in particular multiple imputation, which uses all the available information and makes it possible to approximate the partial maximum likelihood estimator. This approach is based on the generation of several plausible completed data sets that take into account the different levels of uncertainty about the missing values. It makes it easy to adapt any statistical tool available for cohort data, for instance estimators of the predictive ability of a model or of an additional variable, which raise specific problems with case-cohort data. We have shown that the imputation model must be estimated on all the completely observed subjects (cases and non-cases), including the case indicator among the explanatory variables. We validated this approach with several sets of simulations: 1) completely simulated data, where the true parameter values were known; 2) case-cohort data simulated from the PRIME cohort, where no phase-1 variable (observed on all subjects) strongly predictive of the phase-2 variable (incompletely observed) was available; 3) case-cohort data simulated from the NWTS cohort, where a phase-1 variable strongly predictive of the phase-2 variable was available. These simulations showed that multiple imputation generally provided unbiased estimates of the risk ratios. For the phase-1 variables, they were almost as precise as the estimates provided by the full cohort, slightly more precise than the calibrated estimator of Breslow et al. and, above all, more precise than the classical weighted estimators. For the phase-2 variables, the multiple imputation estimator was generally unbiased, with a precision better than that of the classical weighted estimators and similar to that of the calibrated estimator of Breslow et al. The simulations performed with the NWTS cohort data gave less satisfactory results for the effects involving the phase-2 variable: the multiple imputation estimators were slightly biased and less precise than the weighted estimators. This can be explained by the interaction terms involving the phase-2 variable in the analysis model, which made it necessary to estimate specific imputation models in different strata of the cohort, some of which included too few cases to satisfy the asymptotic conditions. We advocate the use of multiple imputation to improve the precision of the risk ratio estimates, while making sure that they are similar to the weighted estimates. Our simulations also showed that multiple imputation provided estimates of the predictive value of a model (Harrell's C) or of an additional variable (difference of C indices, NRI or IDI) similar to those obtained from the full cohort.

Modélisation des données d'enquêtes cas-cohorte par imputation multiple : application en épidémiologie cardio-vasculaire

Author: Marti Soler Helena
Publication venue: HAL CCSD
Publication date: 04/05/2012

HAL UVSQ

Modeling of case-cohort data by multiple imputation : application to cardio-vascular epidemiology

Author: Marti Soler Helena
Publication date: 04/05/2012

Theses.fr

Biodegradation Prediction and Modelling for Decision Support

Author: Aliotta Laura
Coltelli Maria
Fernandez-Avila Cristina
Gigante Vito
Marti-Soler Helena
Nettleton David
Sánchez-Esteva Sara
Verstichel Steven
Publication date: 01/01/2022

Archivio della Ricerca - Università di Pisa

Population characteristics by adherence to dietary patterns.

Author: Ana-Lucia Mayén (813433)
Bharathi Viswanathan (603745)
Fred Paccaud (111527)
Helena Marti-Soler (665193)
Jude Gedeon (603746)
Pascal Bovet (58562)
Pedro Marques-Vidal (58558)
Silvia Stringhini (120881)

Population characteristics by adherence to dietary patterns.

The Francis Crick Institute

Adherence to the different patterns according to socioeconomic indicators assessed by Poisson regression (n = 2476).

Author: Ana-Lucia Mayén (813433)
Bharathi Viswanathan (603745)
Fred Paccaud (111527)
Helena Marti-Soler (665193)
Jude Gedeon (603746)
Pascal Bovet (58562)
Pedro Marques-Vidal (58558)
Silvia Stringhini (120881)

Model 1: adjusted for age, sex, year and education; Model 2: adjusted for age, sex, year and income; Model 3: adjusted for age, sex, year, education and income; Model 4: same as Model 3 and including an interaction term for education and income. Low education: secondary (obligatory), post secondary vocational or lower. High education: polytechnic and university. High income defined as income ≥3,001 Rupees in 2004 and ≥8,001 Rupees in 2013. Significant associations are indicated in bold.

The Francis Crick Institute

Factor loadings of the principal dietary patterns identified in 2004 and 2013 (n = 2476).

Author: Ana-Lucia Mayén (813433)
Bharathi Viswanathan (603745)
Fred Paccaud (111527)
Helena Marti-Soler (665193)
Jude Gedeon (603746)
Pascal Bovet (58562)
Pedro Marques-Vidal (58558)
Silvia Stringhini (120881)

Factor loadings of the principal dietary patterns identified in 2004 and 2013 (n = 2476).

The Francis Crick Institute