19 research outputs found
Modeling the Influence of Local Environmental Factors on Malaria Transmission in Benin and Its Implications for Cohort Study
Malaria remains endemic in tropical areas, especially in Africa. For the evaluation of new tools and to further our understanding of host-parasite interactions, knowing the environmental risk of transmission—even at a very local scale—is essential. The aim of this study was to assess how malaria transmission is influenced and can be predicted by local climatic and environmental factors
Anopheles number prediction on environmental and climate variables using Lasso and stratified two levels cross validation
This paper deals with the prediction of Anopheles numbers from environmental and climate variables. Variable selection is performed by an automatic machine learning method based on the Lasso and stratified two-level cross-validation. Selected variables are debiased, and predictions are generated by a simple GLM (generalized linear model). The results are qualitatively better in terms of selection, prediction, and CPU time than those obtained by the B-GLM method.
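The select-then-debias idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian response instead of a count GLM, uses a hand-picked penalty `lam` instead of one chosen by stratified two-level cross-validation, and runs on synthetic data. The function names (`lasso_cd`, `select_then_refit`) are hypothetical.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the partial residual (column j removed).
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta

def select_then_refit(X, y, lam):
    """Lasso picks the support; an unpenalized refit on it removes shrinkage bias."""
    beta_lasso = lasso_cd(X, y, lam)
    support = [j for j, b in enumerate(beta_lasso) if b != 0.0]
    X_sel = [[row[j] for j in support] for row in X]
    # With lam = 0 the same routine converges to the least-squares fit.
    beta_refit = lasso_cd(X_sel, y, 0.0) if support else []
    return support, beta_lasso, beta_refit

# Synthetic data (assumption): only columns 0 and 3 carry signal.
rng = random.Random(0)
n, p = 200, 6
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [3.0 * row[0] - 2.0 * row[3] + rng.gauss(0.0, 0.1) for row in X]

support, beta_lasso, beta_refit = select_then_refit(X, y, lam=0.5)
print("selected columns:", support)
print("lasso coefficients on support:", [round(beta_lasso[j], 2) for j in support])
print("refit coefficients:", [round(b, 2) for b in beta_refit])
```

The Lasso coefficients are shrunk toward zero by the penalty, and the unpenalized refit on the selected columns recovers values close to the true ones. That is the sense in which the selected variables are "debiased" here.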
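Contributions de l'apprentissage statistique aux méthodes GLMM et LASSO: Application à la modélisation statistique de la morbidité liée au paludisme à Tori-Bossito (Bénin) [Contributions of statistical learning to the GLMM and LASSO methods: application to the statistical modeling of malaria-related morbidity in Tori-Bossito (Benin)]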
The subject of this thesis is the identification of environmental factors that may explain the variability of anopheline density at the village and house scale, and the determination of malaria risk exposure in the Tori-Bossito study area. We treat these as variable selection and prediction problems in an epidemiological context. The main objective is therefore to select an optimal subset of variables for predicting malaria risk exposure in the study area and in another area where entomological data are not available.
In the first part of the thesis, we propose a method based on a GLMM algorithm combined with a backward variable selection procedure. Random effects are placed at each level of the data hierarchy to account for the correlation induced by the hierarchical structure of the data. This method provides an optimal subset of variables for predicting malaria risk, but the algorithm does not converge when some explanatory variables are too strongly correlated or when the data have a particular structure.
To overcome this, the second part proposes an automatic machine learning method that combines GLM, the Lasso, and stratified two-level cross-validation. Interactions between variables are generated automatically. Variable selection is performed by the combination of GLM, Lasso, and cross-validation; the selected variables are then debiased by a simple GLM (generalized linear model), which generates the predictions. The results show that the pre-processing of the data carried out by experts can be dispensed with. They also show improvements over the first part in selection, in the sparsity of the optimal subset, in prediction quality, and in CPU time. Finally, the best prediction subset contains: Season; the interaction between Mean rainfall and Openings; the interaction between Rainy days before the mission and Number of inhabitants; and the interaction between Rainy days during the mission and Vegetation.
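The abstract states that interactions between variables are generated automatically before selection. One simple way to do this, shown here only as an illustrative sketch, is to append the product of every pair of columns. The function name `add_pairwise_interactions`, the column names, and the data values are all hypothetical; the thesis may construct interactions differently.

```python
from itertools import combinations

def add_pairwise_interactions(rows, names):
    """Append the product of every pair of columns to each row."""
    pairs = list(combinations(range(len(names)), 2))
    new_names = names + [f"{names[i]}:{names[j]}" for i, j in pairs]
    new_rows = [row + [row[i] * row[j] for i, j in pairs] for row in rows]
    return new_rows, new_names

# Hypothetical covariates for two houses (values are made up).
names = ["mean_rainfall", "openings", "vegetation"]
rows = [[12.0, 3.0, 0.4], [5.5, 1.0, 0.7]]

expanded, new_names = add_pairwise_interactions(rows, names)
print(new_names)
print(expanded[0])
```

With p original variables this adds p(p-1)/2 columns, which is one reason a sparse selection method such as the Lasso is a natural companion to automatic interaction generation.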
Variables selection by the LASSO method. Application to malaria data of Tori-Bossito (Benin)
COPROMATH 2013, Cotonou, Bénin. This work deals with the prediction of Anopheles numbers from environmental and climate variables. Variable selection is performed by a GLMM (generalized linear mixed model) combined with the Lasso method and simple cross-validation. Selected variables are debiased, and predictions are generated by a simple GLMM. The results are qualitatively better in terms of selection and prediction than those obtained by the reference method.
Frequent variables.
The x-axis shows the variables, including the interactions, and the y-axis shows the percentage of presence of the variables. The left figure corresponds to the LDLM strategy and the right figure corresponds to the LDLS strategy. Each vertical band represents one variable.
Number of original stable covariables for the strategies LDLM and LDLS.
Comparison between observed and predicted number of anopheles in eight houses.
The line with “⋆” is for observed values, the line with “o” is for B-GLM, and the line with “+” is for LOLO-DCV.
Modeling the influence of local environmental factors on malaria transmission in Benin and its implications for cohort study
Malaria remains endemic in tropical areas, especially in Africa. For the evaluation of new tools and to further our understanding of host-parasite interactions, knowing the environmental risk of transmission, even at a very local scale, is essential. The aim of this study was to assess how malaria transmission is influenced and can be predicted by local climatic and environmental factors. As the entomological part of a cohort study of 650 newborn babies in nine villages in the Tori Bossito district of Southern Benin between June 2007 and February 2010, human landing catches were performed to assess the density of malaria vectors and transmission intensity. Climatic factors as well as household characteristics were recorded throughout the study. Statistical correlations between Anopheles density and environmental and climatic factors were tested using a three-level Poisson mixed regression model. The results showed both temporal variations in vector density (related to season and rainfall) and spatial variations at the level of both village and house. These spatial variations could be largely explained by factors associated with the house's immediate surroundings, namely soil type, vegetation index, and the proximity of a watercourse. Based on these results, a predictive regression model was developed using a leave-one-out method to predict the spatiotemporal variability of malaria transmission in the nine villages. This study highlights the importance of local environmental factors in malaria transmission and describes a model to predict the transmission risk of individual children, based on environmental and behavioral characteristics.
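The leave-one-out prediction of vector counts mentioned in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the study's model: it fits a plain Poisson regression with one covariate and no random effects (the study uses a three-level mixed model), holds out one village at a time, and uses simulated data. The names `fit_poisson`, `leave_one_group_out`, `rpois`, `villages`, `rainfall`, and `counts` are hypothetical.

```python
import math
import random

def fit_poisson(x, y, n_iter=25):
    """Newton-Raphson for log E[y] = a + b*x (Poisson GLM, log link)."""
    a, b = math.log(max(sum(y) / len(y), 1e-8)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(a + b * xi) for xi in x]
        # Score vector of the Poisson log-likelihood.
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information (2x2), inverted in closed form.
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

def leave_one_group_out(groups, x, y):
    """Predict each group's counts from a model fitted on all other groups."""
    preds = [0.0] * len(y)
    for g in sorted(set(groups)):
        train = [i for i, gi in enumerate(groups) if gi != g]
        a, b = fit_poisson([x[i] for i in train], [y[i] for i in train])
        for i, gi in enumerate(groups):
            if gi == g:
                preds[i] = math.exp(a + b * x[i])
    return preds

def rpois(rng, lam):
    """Knuth's Poisson sampler (adequate for small means)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

# Simulated data (assumption): 9 villages, 20 catches each, one rainfall index.
rng = random.Random(1)
villages = [v for v in range(9) for _ in range(20)]
rainfall = [rng.uniform(0.0, 2.0) for _ in villages]
counts = [rpois(rng, math.exp(0.5 + 0.8 * r)) for r in rainfall]

a_hat, b_hat = fit_poisson(rainfall, counts)
preds = leave_one_group_out(villages, rainfall, counts)
print("full-data fit:", round(a_hat, 2), round(b_hat, 2))
print("mean observed vs mean held-out prediction:",
      round(sum(counts) / len(counts), 2), round(sum(preds) / len(preds), 2))
```

Holding out whole villages rather than single observations is a design choice of this sketch: it mimics predicting for a site with no entomological data, at the cost of ignoring any village-level effect a mixed model would estimate.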
Summary of predictions for B-GLM, LDLM, and LDLS on original variables.