76 research outputs found
Global short-term forecasting of COVID-19 cases
The continuously growing number of COVID-19 cases pressures healthcare services worldwide.
Accurate short-term forecasting is thus vital to support country-level policy making. The strategies
adopted by countries to combat the pandemic vary, generating different uncertainty levels about
the actual number of cases. Accounting for the hierarchical structure of the data and accommodating
extra-variability is therefore fundamental. We introduce a new modelling framework to describe the
pandemic’s course with great accuracy and provide short-term daily forecasts for every country in the
world. We show that our model generates highly accurate forecasts up to seven days ahead and use
estimated model components to cluster countries based on recent events. We introduce statistical
novelty in terms of modelling the autoregressive parameter as a function of time, increasing predictive
power and flexibility to adapt to each country. Our model can also be used to forecast the number of
deaths, study the effects of covariates (such as lockdown policies), and generate forecasts for smaller
regions within countries. Consequently, it has substantial implications for global planning and decision
making. We present forecasts and make all results freely available to any country in the world through
an online Shiny dashboard.
An Improved Method for the Estimation and Comparison of Mortality Rates in Fish from Catch‐Curve Data
Catch-curve analyses are routinely used to estimate instantaneous mortality (Z) in fish, and as the age-frequency data are often overdispersed, the application of a variance bias-correction factor has been recommended. The extensions of the Poisson generalized linear model (GLMPoisson) may, however, constitute a better alternative, as they model the variance (SE) in counts more adequately with their specific dispersion parameter for more accurate estimations and statistical comparisons. To test this idea, simulated age-frequency data generated under four dispersion scenarios were analyzed according to six currently available methods and compared with the results of a GLMPoisson and
five of its extensions to evaluate each method-specific bias in Z SE estimates. Empirical age-frequency data from
sampled Walleye Sander vitreus and Arctic Char Salvelinus alpinus populations in Quebec, Canada, were then used
to illustrate the applicability of our GLM-based method, which relies on the behavior of Pearson residuals to assess
model adequacy and an information-theoretic approach for model selection. All analyses revealed that Z-estimates
were generally accurate among the methods considered, except under the most likely situation of quadratic overdispersion met in ecological studies, for which only the negative binomial type 2 and the mean-parametrized Conway–Maxwell–Poisson (CMP) extensions were adequate to estimate both Z and its SE. Linearly overdispersed data were
best modeled by the negative binomial type 1 and generalized Poisson (GLMGP) extensions; the GLMCMP and
GLMGP were the most appropriate to model underdispersed data, whereas the GLM Poisson adequately modeled equidispersed data, similar to the Chapman and Robson (1960) method. Statistical comparisons of Z SE for grouping
factors, such as year or site, were correctly achieved when the most adequate and statistically supported GLM Poisson
extension was applied. Altogether, the proposed GLM-based method should help to circumvent the identified issues
related to SE estimation for statistical inferences about mortality rates for fisheries management decision making.
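As a rough illustration of the baseline model named in the abstract above, the sketch below fits a plain Poisson GLM to catch-curve data, with log expected count linear in age, so that Z is the negative of the age slope. It also reports the Pearson dispersion statistic as a simple adequacy check. The ages and counts are hypothetical, the function name `catch_curve_glm` is an illustrative choice, and none of the extensions discussed (negative binomial, generalized Poisson, CMP) are implemented here.

```python
import math

def catch_curve_glm(ages, counts, n_iter=50, tol=1e-10):
    """Fit log E[count] = b0 + b1 * age with a Poisson likelihood.

    Returns Z = -b1, its model-based SE, and the Pearson dispersion statistic.
    """
    n = len(ages)
    # Starting values: least squares on log(count + 0.5).
    logs = [math.log(c + 0.5) for c in counts]
    mean_a = sum(ages) / n
    mean_l = sum(logs) / n
    sxx = sum((a - mean_a) ** 2 for a in ages)
    sxy = sum((a - mean_a) * (l - mean_l) for a, l in zip(ages, logs))
    b1 = sxy / sxx
    b0 = mean_l - b1 * mean_a

    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * a) for a in ages]
        # Score vector of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum(a * (y - m) for a, y, m in zip(ages, counts, mu))
        # Fisher information matrix (2 x 2), inverted by hand.
        i00 = sum(mu)
        i01 = sum(a * m for a, m in zip(ages, mu))
        i11 = sum(a * a * m for a, m in zip(ages, mu))
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break

    # Recompute fitted means and information at the final estimates.
    mu = [math.exp(b0 + b1 * a) for a in ages]
    i00 = sum(mu)
    i01 = sum(a * m for a, m in zip(ages, mu))
    i11 = sum(a * a * m for a, m in zip(ages, mu))
    se_z = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    pearson = sum((y - m) ** 2 / m for y, m in zip(counts, mu))
    return {"z": -b1, "se_z": se_z, "dispersion": pearson / (n - 2)}

# Hypothetical, nearly noise-free age-frequency data (not from the paper).
ages = [2, 3, 4, 5, 6, 7, 8, 9, 10]
counts = [368, 223, 135, 82, 50, 30, 18, 11, 7]
fit = catch_curve_glm(ages, counts)
print(fit)
```

A dispersion statistic well above 1 would suggest overdispersion, which is the situation in which the abstract recommends moving to one of the GLM extensions rather than trusting the Poisson SE.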
The use of sheepdogs in sheep production in southeastern Brazil
This study assessed the economic value of using sheepdogs as livestock guardians in southeastern Brazil by implementing a semi-structured interview format divided into four main categories: maintenance costs of sheep production, selling prices of carcasses, annual rate of depredation, and sheepdog acquisition and maintenance costs. According to our results, producers perceive the “unproductive” costs of sheepdogs similarly to the way they view taxes. However, management using sheepdogs as herd guardians tends to be most profitable for herds above 483 head from the fourth year on, being possibly more stable and predictable over time. In contrast, management without sheepdogs shows stochastic dynamics with occasional, though unpredictable, episodes of sheep depredation. This means that sheep farmers follow a cyclical decision strategy, which basically depends on the purchase price of the sheepdog.
Understanding learning from EEG data: Combining machine learning and feature engineering based on hidden Markov models and mixed models
Theta oscillations, ranging from 4-8 Hz, play a significant role in spatial
learning and memory functions during navigation tasks. Frontal theta
oscillations are thought to play an important role in spatial navigation and
memory. Electroencephalography (EEG) datasets are very complex, making any
changes in the neural signal related to behaviour difficult to interpret.
However, multiple analytical methods are available to examine complex data
structure, especially machine learning based techniques. These methods have
shown high classification performance and the combination with feature
engineering enhances the capability of these methods. This paper proposes using
hidden Markov and linear mixed effects models to extract features from EEG
data. Based on the engineered features obtained from frontal theta EEG data
during a spatial navigation task in two key trials (first, last) and between
two conditions (learner and non-learner), we analysed the performance of six
machine learning methods (Polynomial Support Vector Machines, Non-linear
Support Vector Machines, Random Forests, K-Nearest Neighbours, Ridge, and Deep
Neural Networks) on classifying learner and non-learner participants. We also
analysed how different standardisation methods used to pre-process the EEG data
contribute to classification performance. We compared the classification
performance of each trial with data gathered from the same subjects, including
solely coordinate-based features, such as idle time and average speed. We found
that more machine learning methods perform better classification using
coordinate-based data. However, only deep neural networks achieved an area
under the ROC curve higher than 80% using the theta EEG data alone. Our
findings suggest that standardising the theta EEG data and using deep neural
networks enhances the classification of learner and non-learner subjects in a
spatial learning task.
Comment: 25 pages
Bayesian additive regression trees with model trees
Bayesian additive regression trees (BART) is a tree-based machine learning method that has been successfully applied to
regression and classification problems. BART assumes regularisation priors on a set of trees that work as weak learners
and is very flexible for predicting in the presence of nonlinearity and high-order interactions. In this paper, we introduce
an extension of BART, called model trees BART (MOTR-BART), that considers piecewise linear functions at node levels
instead of piecewise constants. In MOTR-BART, rather than having a unique value at node level for the prediction, a linear
predictor is estimated considering the covariates that have been used as the split variables in the corresponding tree. In our
approach, local linearities are captured more efficiently and fewer trees are required to achieve equal or better performance
than BART. Via simulation studies and real data applications, we compare MOTR-BART to its main competitors. R code for
MOTR-BART implementation is available at https://github.com/ebprado/MOTR-BART
A Mixed Model for Assessing the Effect of Numerous Plant Species Interactions on Grassland Biodiversity and Ecosystem Function Relationships
In grassland ecosystems, it is well known that increasing plant species diversity can improve ecosystem functions (i.e., ecosystem responses), for example, by increasing productivity and reducing weed invasion. Diversity-Interactions models use species proportions and their interactions as predictors in a regression framework to assess biodiversity and ecosystem function relationships. However, it can be difficult to model numerous interactions if there are many species, and interactions may be temporally variable or dependent on spatial planting patterns. We developed a new Diversity-Interactions mixed model for jointly assessing many species interactions and within-plot species planting pattern over multiple years. We model pairwise interactions using a small number of fixed parameters that incorporate spatial effects and supplement this by including all pairwise interaction variables as random effects, each constrained to have the same variance within each year. The random effects are indexed by pairs of species within plots rather than a plot-level factor as is typical in mixed models, and capture remaining variation due to pairwise species interactions parsimoniously. We apply our novel methodology to three years of weed invasion data from a 16-species grassland experiment that manipulated plant species diversity and spatial planting pattern and test its statistical properties in a simulation study. Supplementary materials accompanying this paper appear online.
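To make the predictor structure described above concrete, the following sketch builds the identity terms (species proportions) and all pairwise interaction terms (products of proportions) for a single plot. The function name `di_predictors` and the example proportions are hypothetical. The mixed-model part of the method (random effects, spatial terms, year-specific variances) is not attempted here.

```python
from itertools import combinations

def di_predictors(proportions):
    """Return identity terms (proportions) and all pairwise products."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("species proportions must sum to 1")
    pairwise = {
        (i, j): proportions[i] * proportions[j]
        for i, j in combinations(range(len(proportions)), 2)
    }
    return list(proportions), pairwise

# Hypothetical four-species plot: species 3 is absent.
identity, pairwise = di_predictors([0.5, 0.25, 0.25, 0.0])

# With 16 equally sown species, the number of pairwise terms grows quickly.
n_pairs_16 = len(di_predictors([1 / 16] * 16)[1])
print(pairwise, n_pairs_16)
```

The count for 16 species shows why estimating a separate fixed coefficient for every pair becomes impractical, which is the motivation the abstract gives for treating the pairwise terms as random effects.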
Multi-state models for double transitions associated with parasitism in biological control
Competition between parasitoids can reduce the success of pest control in
biological programs using two species as bio-control agents or when multiple
species exploit the same host crop. Parasitoid foraging behavior and the
ability to identify already parasitized hosts affect the efficacy of parasitoid
species as bio-agents to regulate pest insects. We evaluated the behavioural
changes of parasitoids according to the quality of hosts (i.e.,
previously parasitised or not), and the characterisation of these transitions
over time via multi-state models. We evaluated the effects of previous
parasitism of the brown stinkbug Euschistus heros eggs on the parasitism
rate of the species Trissolcus basalis and Telenomus podisi. We
successively modelled the choice of eggs (with three possibilities:
non-parasitised eggs, eggs previously parasitised by T. podisi, and eggs
previously parasitised by T. basalis) and the conditional behaviour given
the choice (walking, drumming, ovipositing or marking the chosen egg). We
consider multi-state models in two successive stages to calculate double
transition probabilities, and the statistical methodology is based on the
maximum likelihood procedure. Using the Cox model and assuming a stationary
process, we verified that the treatment effect was significant for the choice,
indicating that the two parasitoid species have different choice patterns. For
the second stage, i.e. behaviour given the choice, the results also showed the
influence of the species on the conditional behaviour, especially for
previously parasitised eggs. Specifically, T. podisi avoids intraspecific
competition and makes decisions faster than T. basalis. In this work, we
emphasise the methodological contribution with multi-state models, especially
in the context of double transitions.
Comment: 16 pages
Diagnostics for categorical response models based on quantile residuals and distance measures
Polytomous categorical data are frequent in studies and can be obtained
with an individual or grouped structure. In both structures, the generalized
logit model is commonly used to relate the covariates to the response variable.
After fitting a model, one of the challenges is the definition of an
appropriate residual and choosing diagnostic techniques. Since the polytomous
variable is multivariate, raw, Pearson, or deviance residuals are vectors and
their asymptotic distribution is generally unknown, which leads to difficulties
in graphical visualization and interpretation. Therefore, the definition of
appropriate residuals and the choice of the correct analysis in diagnostic
tools is important, especially for nominal data, where a restriction of methods
is observed. This paper proposes the use of randomized quantile residuals
associated with individual and grouped nominal data, as well as Euclidean and
Mahalanobis distance measures, as an alternative to reduce the dimension of the
residuals. We conducted simulation studies with both data
structures. Half-normal plots with simulation envelopes were used to assess
model performance. These studies demonstrated a good performance of the
quantile residuals, and the distance measurements allowed a better
interpretation of the graphical techniques. We illustrate the proposed
procedures with two applications to real data.
Comment: 20 pages
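The abstract above does not spell out how the residuals are constructed, so the sketch below is only one plausible reading. It computes a Dunn-Smyth style randomized quantile residual for each category indicator of a single nominal observation, then collapses the resulting vector to one number with the Euclidean norm. The fitted probabilities, the per-indicator construction, and all function names are assumptions for illustration. The Mahalanobis variant, which needs a covariance estimate, is omitted.

```python
import math
import random
from statistics import NormalDist

def randomized_quantile_residual(indicator, prob, rng, eps=1e-12):
    """Randomized quantile residual for a Bernoulli indicator with P(1) = prob."""
    # CDF just below and at the observed value: F(y-) and F(y).
    lower, upper = (1.0 - prob, 1.0) if indicator == 1 else (0.0, 1.0 - prob)
    u = rng.uniform(lower, upper)
    u = min(max(u, eps), 1.0 - eps)  # keep u strictly inside (0, 1)
    return NormalDist().inv_cdf(u)

def residual_vector(observed, probs, rng):
    """One residual per category, computed from the category indicators."""
    return [
        randomized_quantile_residual(1 if k == observed else 0, p, rng)
        for k, p in enumerate(probs)
    ]

def euclidean_norm(residuals):
    """Reduce the residual vector to a single non-negative distance."""
    return math.sqrt(sum(r * r for r in residuals))

rng = random.Random(1)     # fixed seed so the example is reproducible
probs = [0.2, 0.5, 0.3]    # hypothetical fitted category probabilities
res = residual_vector(0, probs, rng)  # the observed category is 0
dist = euclidean_norm(res)
print(res, dist)
```

Reducing each observation to one distance is what makes a standard half-normal plot with a simulation envelope applicable, since the plot needs a single value per observation rather than a vector.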
Development of a Decision Support System for the Management of Mummy Berry Disease in Northwestern Washington
Mummy berry, caused by Monilinia vaccinii-corymbosi, is the most important disease of the northern highbush blueberry (Vaccinium corymbosum L.) in North America and can cause up to 70% yield losses in affected fields. A key event in the mummy berry disease cycle is the primary infection phase where ascospores are released by apothecia that infect emerging floral and vegetative tissues. Current management of mummy berry disease in northwestern Washington is predominantly reliant on the prevention of primary infections through prophylactic, calendar-based fungicide spray applications early in the growing season. To improve the understanding of risk during these periods and to help tailor management strategies, we developed a decision support system (DSS) based on field records spanning over five seasons and four locations in northwestern Washington. Environmental conditions across the region were highly uniform but different dynamics of apothecial development were observed under high- and low-management regimes. Based on our analysis, we suggest basing the initial iteration of the DSS on two sub-models. The first sub-model predicts the onset of apothecia based on chill-unit accumulation under high- and low-management regimes, and the second predicts primary infection risk, which provides opportunities to improve the timing of fungicide applications. The synoptic DSS proposed here is based on the current biological knowledge of the pathosystem and available data for the northwestern Washington region. We provide the analysis and the DSS implementation and evaluation as an open-source repository, providing opportunities for further improvements. Finally, we provide suggestions for future research and the operational efforts needed for improving the utility and accuracy of the mummy berry DSS.
Discussion of: "Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects"
Contributed discussion included in P. Richard Hahn. Jared S. Murray. Carlos M. Carvalho. "Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion)." Bayesian Anal. 15 (3) 965 - 1056, September 2020. https://doi.org/10.1214/19-BA119