11 research outputs found

    Statistical models for decision support systems based on the reuse of Health Big Data : application to syndromic surveillance in public health

    No full text
    Over the past few years, the concept of Big Data has developed widely. Analysing and exploring these data has required new methods and technologies. Today, Big Data also exists in the health sector: hospitals in particular contribute to data production through the adoption of electronic health records. The objective of this thesis was to develop statistical methods that reuse these data to support syndromic surveillance and decision-making. The study has 4 main parts. First, we showed that hospital Big Data were highly correlated with signals from traditional surveillance networks. Second, we showed that hospital data yielded more accurate real-time estimates than web data, and that SVM and Elastic Net models had comparable performance. Third, we applied methods developed in the United States that reuse hospital data, web data (Google and Twitter) and climatic data to forecast influenza incidence rates for all French regions up to 2 weeks ahead. Finally, the methods developed were applied to 3-week forecasts of gastroenteritis cases at the national, regional and hospital levels.

    Modèles statistiques pour les systèmes d'aide à la décision basés sur la réutilisation des données massives en santé : application à la surveillance syndromique en santé publique

    No full text
    Over the past few years, the concept of Big Data has developed widely. Analysing and exploring these data has required new methods and technologies. Today, Big Data also exists in the health sector: hospitals in particular contribute to data production through the adoption of electronic health records. The objective of this thesis was to develop statistical methods that reuse these data to support syndromic surveillance and decision-making. The study has 4 main parts. First, we showed that hospital Big Data were highly correlated with signals from traditional surveillance networks. Second, we showed that hospital data yielded more accurate real-time estimates than web data, and that SVM and Elastic Net models had comparable performance. Third, we applied methods developed in the United States that reuse hospital data, web data (Google and Twitter) and climatic data to forecast influenza incidence rates for all French regions up to 2 weeks ahead. Finally, the methods developed were applied to 3-week forecasts of gastroenteritis cases at the national, regional and hospital levels.

    Gastroenteritis Forecasting Assessing the Use of Web and Electronic Health Record Data With a Linear and a Nonlinear Approach: Comparison Study

    No full text
    BACKGROUND: Disease surveillance systems capable of producing accurate real-time and short-term forecasts can help public health officials design timely public health interventions to mitigate the effects of disease outbreaks in affected populations. In France, existing clinic-based disease surveillance systems produce gastroenteritis activity information that lags real time by 1 to 3 weeks. This temporal data gap prevents public health officials from having a timely epidemiological characterization of this disease at any point in time and thus leads to the design of interventions that do not take into consideration the most recent changes in dynamics. OBJECTIVE: The goal of this study was to evaluate the feasibility of using internet search query trends and electronic health records to predict acute gastroenteritis (AG) incidence rates in near real time, at the national and regional scales, and for long-term forecasts (up to 10 weeks). METHODS: We present 2 different approaches (linear and nonlinear) that produce real-time estimates, short-term forecasts, and long-term forecasts of AG activity at 2 different spatial scales in France (national and regional). Both approaches leverage disparate data sources that include disease-related internet search activity, electronic health record data, and historical disease activity. RESULTS: Our results suggest that all data sources contribute to improving gastroenteritis surveillance for long-term forecasts, with historical data carrying the most predictive power owing to the strong seasonal dynamics of this disease. CONCLUSIONS: The methods we developed could help reduce the impact of the AG peak by making it possible to anticipate increased activity by up to 10 weeks.

    Real Time Influenza Monitoring Using Hospital Big Data in Combination with Machine Learning Methods Comparison Study

    No full text
    Background - Traditional surveillance systems produce estimates of influenza-like illness (ILI) incidence rates, but with a 1- to 3-week delay. Accurate real-time monitoring systems for influenza outbreaks could be useful for making public health decisions. Several studies have investigated the possibility of using internet users' activity data and different statistical models to predict influenza epidemics in near real time. However, very few studies have investigated hospital big data. Objective - Here, we compared internet and electronic health record (EHR) data and different statistical models to identify the best approach (data type and statistical model) for real-time ILI estimates. Methods - We used Google data for internet data and the clinical data warehouse eHOP, which included all EHRs from Rennes University Hospital (France), for hospital data. We compared 3 statistical models: random forest, elastic net, and support vector machine (SVM). Results - For the national ILI incidence rate, the best correlation was 0.98 and the mean squared error (MSE) was 866, obtained with hospital data and the SVM model. For the Brittany region, the best correlation was 0.923 and the MSE was 2364, obtained with hospital data and the SVM model. Conclusions - We found that EHR data together with historical epidemiological information (French Sentinelles network) allowed accurate prediction of ILI incidence rates for the whole of France as well as for the Brittany region, and outperformed internet data regardless of the statistical model used. Moreover, the performance of the two statistical models, elastic net and SVM, was comparable.

    Influenza forecasting for French regions combining EHR, web and climatic data sources with a machine learning ensemble approach

    No full text
    Effective and timely disease surveillance systems have the potential to help public health officials design interventions to mitigate the effects of disease outbreaks. Currently, healthcare-based disease monitoring systems in France offer influenza activity information that lags real time by one to three weeks. This temporal data gap introduces uncertainty that prevents public health officials from having a timely perspective on population-level disease activity. Here, we present a machine-learning modeling approach that produces real-time estimates and short-term forecasts of influenza activity for the twelve continental regions of France by leveraging multiple disparate data sources, including Google search activity, real-time and local weather information, flu-related Twitter micro-blogs, electronic health record data, and historical disease activity synchronicities across regions. Our results show that all data sources contribute to improving influenza surveillance and that machine-learning ensembles that combine all data sources lead to accurate and timely predictions.

    Screening and vaccination against COVID-19 to minimise school closure: a modelling study

    No full text
    Background: Schools were closed extensively in 2020–21 to counter SARS-CoV-2 spread, impacting students' education and wellbeing. With highly contagious variants expanding in Europe, safe options to maintain schools open are urgently needed. By estimating school-specific transmissibility, our study evaluates costs and benefits of different protocols for SARS-CoV-2 control at school. Methods: We developed an agent-based model of SARS-CoV-2 transmission in schools. We used empirical contact data in a primary and a secondary school and data from pilot screenings in 683 schools during the alpha variant (B.1.1.7) wave in March–June, 2021, in France. We fitted the model to observed school prevalence to estimate the school-specific effective reproductive number for the alpha (Ralpha) and delta (B.1.617.2; Rdelta) variants and performed a cost–benefit analysis examining different intervention protocols. Findings: We estimated Ralpha to be 1·40 (95% CI 1·35–1·45) in the primary school and 1·46 (1·41–1·51) in the secondary school during the spring wave, higher than the time-varying reproductive number estimated from community surveillance. Considering the delta variant and vaccination coverage in Europe as of mid-September, 2021, we estimated Rdelta to be 1·66 (1·60–1·71) in primary schools and 1·10 (1·06–1·14) in secondary schools. Under these conditions, weekly testing of 75% of unvaccinated students (PCR tests on saliva samples in primary schools and lateral flow tests in secondary schools), in addition to symptom-based testing, would reduce cases by 34% (95% CI 32–36) in primary schools and 36% (35–39) in secondary schools compared with symptom-based testing alone. Insufficient adherence was recorded in pilot screening (median ≤53%). Regular testing would also reduce student-days lost up to 80% compared with reactive class closures. Moderate vaccination coverage in students would still benefit from regular testing for additional control—ie, weekly testing 75% of unvaccinated students would reduce cases compared with symptom-based testing only, by 23% in primary schools when 50% of children are vaccinated. Interpretation: The COVID-19 pandemic will probably continue to pose a risk to the safe and normal functioning of schools. Extending vaccination coverage in students, complemented by regular testing with good adherence, are essential steps to keep schools open when highly transmissible variants are circulating. Funding: EU Framework Programme for Research and Innovation Horizon 2020, Horizon Europe Framework Programme, Agence Nationale de la Recherche, ANRS–Maladies Infectieuses Émergente