After the epidemic: Zika virus projections for Latin America and the Caribbean
Background: Zika is one of the most challenging emergent vector-borne diseases, yet its future public health impact remains unclear. Zika was of little public health concern until recent reports of its association with congenital syndromes. By 3 August 2017, ~217,000 Zika cases and ~3,400 cases of associated congenital syndrome had been reported in Latin America and the Caribbean. Some modelling exercises suggest that Zika virus infection could become endemic, in agreement with recent declarations from the World Health Organisation. Methodology/Principal findings: We produced high-resolution, spatially explicit projections of Zika cases, associated congenital syndromes, and monetary costs for Latin America and the Caribbean now that the epidemic phase of the disease appears to be over. In contrast to previous studies, which have adopted a modelling approach to map Zika potential, we project case numbers using a statistical approach based on reported dengue case data as a Zika surrogate. Our results indicate that ~12.3 (0.7–162.3) million Zika cases could be expected across Latin America and the Caribbean every year, leading to ~64.4 (0.2–5,159.3) thousand cases of Guillain-Barré syndrome and ~4.7 (0.0–116.3) thousand cases of microcephaly. The economic burden of these neurological sequelae is estimated to be ~USD 2.3 (0–159.3) billion per annum. Conclusions/Significance: Zika is likely to have significant public health consequences across Latin America and the Caribbean in years to come. Our projections inform regional and federal health authorities, offering an opportunity to adapt to this public health challenge.
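The abstract does not spell out the projection model, but the surrogate idea can be sketched in a few lines: scale reported dengue cases by an uncertain Zika-to-dengue ratio and propagate that uncertainty to the neurological sequelae. Everything below is illustrative; the regional dengue counts, the ratio distribution, and the per-case risk ranges are assumptions, not the paper's fitted values.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical annual dengue cases per region (the paper uses reported
# dengue data as a Zika surrogate; these numbers are made up).
dengue_cases = {"Region A": 1_200_000, "Region B": 450_000, "Region C": 90_000}

N = 10_000  # Monte Carlo draws

for region, dengue in dengue_cases.items():
    # Assumed, uncertain Zika-to-dengue ratio (lognormal around 1.0).
    ratio = rng.lognormal(mean=0.0, sigma=0.5, size=N)
    zika = dengue * ratio
    # Assumed per-case risks of neurological sequelae (illustrative only).
    gbs = zika * rng.uniform(2e-4, 6e-4, size=N)    # Guillain-Barré syndrome
    micro = zika * rng.uniform(1e-5, 5e-5, size=N)  # microcephaly

    lo, med, hi = np.percentile(zika, [2.5, 50, 97.5])
    print(f"{region}: Zika ~{med:,.0f} ({lo:,.0f}-{hi:,.0f}); "
          f"GBS ~{np.median(gbs):,.0f}; microcephaly ~{np.median(micro):,.0f}")
```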
Forecasting: theory and practice
Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is exhaustive. However, we hope that our encyclopedic presentation will serve as a point of reference for the rich work undertaken over the last decades and offer some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear: we offer cross-references that allow readers to navigate through the various topics. We complement the theoretical concepts and applications covered with extensive lists of free or open-source software implementations and publicly available databases.
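As a small taste of the kind of forecast production and evaluation such a review covers, here is a minimal sketch of a seasonal-naive baseline scored with MAE and MASE on a synthetic monthly series. The data and the 12-month seasonality are assumptions for illustration, not material from the article itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic monthly series with trend and yearly seasonality (illustrative).
t = np.arange(96)
y = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 3, t.size)

train, test = y[:84], y[84:]

# Seasonal-naive forecast: repeat the last observed year.
forecast = train[-12:]

mae = np.mean(np.abs(test - forecast))
# MASE scales MAE by the in-sample seasonal-naive error (period m = 12).
scale = np.mean(np.abs(train[12:] - train[:-12]))
print(f"MAE = {mae:.2f}, MASE = {mae / scale:.2f}")
```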
The importance of climatic factors and outliers in predicting regional monthly campylobacteriosis risk in Georgia, USA
Incidence of Campylobacter infection exhibits a strong seasonal component and regional variation in temperate climate zones. Forecasting the risk of infection regionally may provide clues to identifying sources of transmission affected by temperature and precipitation. The objectives of this study were to (1) assess temporal patterns and differences in campylobacteriosis risk among nine climatic divisions of Georgia, USA, (2) compare univariate forecasting models that analyse campylobacteriosis risk over time with models that incorporate temperature and/or precipitation, and (3) investigate alternatives to apparently random-walk series and non-random occurrences that could represent outliers. Temporal patterns of campylobacteriosis risk in Georgia were assessed visually and statistically. Univariate and multivariable forecasting models were used to predict the risk of campylobacteriosis, and the coefficient of determination (R²) was used to evaluate training (1999–2007) and holdout (2008) samples. Statistical control charting and rolling holdout periods were investigated to better understand the effect of outliers and to improve forecasts. State- and division-level campylobacteriosis risk exhibited seasonal patterns with peaks between June and August, and there were significant associations between campylobacteriosis risk, precipitation, and temperature. State and combined-division forecasts outperformed those for individual divisions, and models that included climate variables were comparable to univariate models. While rolling holdout techniques did not improve predictive ability, control charting identified high-risk time periods that require further investigation. These findings are important for (1) determining how climatic factors affect environmental sources and reservoirs of Campylobacter spp. and (2) identifying regional spikes in the risk of human Campylobacter infection and their underlying causes.
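To make the evaluation setup concrete, the sketch below mimics a rolling-holdout comparison between a univariate (seasonal-dummy) regression and one augmented with temperature and precipitation, scored with R². The monthly series, the climate relationships, and the window lengths are fabricated for illustration and do not come from the study's Georgia data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)

# Hypothetical monthly data for one climatic division, 10 years (120 months).
n = 120
month = np.arange(n) % 12
temp = 15 + 10 * np.sin(2 * np.pi * month / 12) + rng.normal(0, 2, n)
precip = rng.gamma(2.0, 40.0, n)
risk = 2 + 0.15 * temp + 0.002 * precip + rng.normal(0, 0.5, n)

# Seasonal dummies for the univariate (time-only) model.
season = np.eye(12)[month]
X_uni = season
X_clim = np.column_stack([season, temp, precip])

def rolling_r2(X, y, train_len=108, step=3):
    """Fit on a sliding window, then score the next `step` months."""
    preds, actual = [], []
    for start in range(0, len(y) - train_len - step + 1, step):
        tr = slice(start, start + train_len)
        te = slice(start + train_len, start + train_len + step)
        model = LinearRegression().fit(X[tr], y[tr])
        preds.extend(model.predict(X[te]))
        actual.extend(y[te])
    return r2_score(actual, preds)

print("univariate R^2:  ", round(rolling_r2(X_uni, risk), 3))
print("with climate R^2:", round(rolling_r2(X_clim, risk), 3))
```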
How to evaluate sentiment classifiers for Twitter time-ordered data?
Social media are becoming an increasingly important source of information about the public mood regarding issues such as elections, Brexit, and the stock market. In this paper we focus on sentiment classification of Twitter data. Construction of sentiment classifiers is a standard text-mining task, but here we address the question of how to properly evaluate them, as there is no settled way to do so. Sentiment classes are ordered and unbalanced, and Twitter produces a stream of time-ordered data. The problem we address concerns the procedures used to obtain reliable estimates of performance measures, and whether the temporal ordering of the training and test data matters. We collected a large set of 1.5 million tweets in 13 European languages. We created 138 sentiment models and out-of-sample datasets, which are used as a gold standard for evaluation. The corresponding 138 in-sample datasets are used to empirically compare six estimation procedures: three variants of cross-validation and three variants of sequential validation (where the test set always follows the training set). We find no significant difference between the best cross-validation and the best sequential validation. However, we observe that all cross-validation variants tend to overestimate performance, while the sequential methods tend to underestimate it. Standard cross-validation with random selection of examples is significantly worse than blocked cross-validation and should not be used to evaluate classifiers on time-ordered data.
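For readers who want to reproduce the contrast between these estimation procedures, a minimal sketch follows using scikit-learn: shuffled K-fold (standard cross-validation), unshuffled K-fold (a stand-in for blocked cross-validation), and TimeSeriesSplit (sequential validation, where the test set always follows the training set). The drifting synthetic dataset and the logistic-regression classifier are assumptions for illustration; the paper's 138 Twitter datasets and models are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(7)

# Synthetic time-ordered data with gradual concept drift (illustrative).
n = 2000
t = np.linspace(0, 1, n)
X = rng.normal(size=(n, 5)) + t[:, None]  # features drift over time
y = (X[:, 0] + 0.5 * t + rng.normal(0, 1, n) > 1.0).astype(int)

clf = LogisticRegression(max_iter=1000)

# Standard CV: shuffled folds leak future examples into training.
shuffled = cross_val_score(clf, X, y, cv=KFold(10, shuffle=True, random_state=0))
# Blocked CV: contiguous folds preserve temporal locality.
blocked = cross_val_score(clf, X, y, cv=KFold(10, shuffle=False))
# Sequential validation: the test set always follows the training set.
sequential = cross_val_score(clf, X, y, cv=TimeSeriesSplit(n_splits=9))

for name, scores in [("shuffled CV ", shuffled), ("blocked CV  ", blocked),
                     ("sequential  ", sequential)]:
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```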
Explainable Deep Learning for Fault Prognostics in Complex Systems: A Particle Accelerator Use-Case
Sophisticated infrastructures often exhibit misbehaviour and failures resulting from complex interactions among their constituent subsystems. Such infrastructures record alarm, event, and fault information to help operations experts diagnose and repair failure conditions. This data can be analysed using explainable artificial intelligence to reveal precursors and eventual root causes. The proposed method is first applied to synthetic data to verify its functionality; on synthetic data the framework makes extremely precise predictions, and root causes are identified correctly. Subsequently, the method is applied to real data from a complex particle accelerator system. In the real-data setting, deep learning models produce accurate predictions from fewer than ten error examples when precursors are captured. The approach described herein is a potentially valuable tool for helping operations experts identify failure precursors in complex infrastructures.
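The abstract does not detail the architecture, so the sketch below substitutes a simple stand-in with the same shape: windowed alarm counts, a classifier trained on a handful of failure examples, and permutation importance as a generic explainability step that surfaces the planted precursor. The alarm types, window features, and failure rate are synthetic assumptions, not the accelerator data, and the random forest is a deliberate simplification of the paper's deep learning models.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(3)

# Synthetic event-log windows: counts of 8 alarm types per time window.
# Alarm type 2 acts as the planted failure precursor (illustrative).
n_windows, n_alarms = 400, 8
X = rng.poisson(2.0, size=(n_windows, n_alarms)).astype(float)
failure = rng.random(n_windows) < 0.05            # rare failures (~5%)
X[failure, 2] += rng.poisson(6.0, failure.sum())  # precursor spike
y = failure.astype(int)

clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0).fit(X, y)

# Permutation importance: which alarm counts drive the failure prediction?
imp = permutation_importance(clf, X, y, n_repeats=20, random_state=0)
for i in np.argsort(imp.importances_mean)[::-1][:3]:
    print(f"alarm_{i}: importance {imp.importances_mean[i]:.3f}")
```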