76 research outputs found

    Global short-term forecasting of COVID-19 cases

    The continuously growing number of COVID-19 cases pressures healthcare services worldwide. Accurate short-term forecasting is thus vital to support country-level policy making. The strategies adopted by countries to combat the pandemic vary, generating different uncertainty levels about the actual number of cases. Accounting for the hierarchical structure of the data and accommodating extra-variability is therefore fundamental. We introduce a new modelling framework to describe the pandemic’s course with great accuracy and provide short-term daily forecasts for every country in the world. We show that our model generates highly accurate forecasts up to seven days ahead and use estimated model components to cluster countries based on recent events. We introduce statistical novelty in terms of modelling the autoregressive parameter as a function of time, increasing predictive power and flexibility to adapt to each country. Our model can also be used to forecast the number of deaths, study the effects of covariates (such as lockdown policies), and generate forecasts for smaller regions within countries. Consequently, it has substantial implications for global planning and decision making. We present forecasts and make all results freely available to any country in the world through an online Shiny dashboard.

    An Improved Method for the Estimation and Comparison of Mortality Rates in Fish from Catch‐Curve Data

    Catch-curve analyses are routinely used to estimate instantaneous mortality (Z) in fish, and as the age-frequency data are often overdispersed, the application of a variance bias-correction factor has been recommended. The extensions of the Poisson generalized linear model (GLM Poisson) may, however, constitute a better alternative, as they model the variance (SE) in counts more adequately with their specific dispersion parameter for more accurate estimations and statistical comparisons. To test this idea, simulated age-frequency data generated under four dispersion scenarios were analyzed according to six currently available methods and compared with the results of a GLM Poisson and five of its extensions to evaluate each method-specific bias in estimates of Z and its SE. Empirical age-frequency data from sampled Walleye Sander vitreus and Arctic Char Salvelinus alpinus populations in Quebec, Canada, were then used to illustrate the applicability of our GLM-based method, which relies on the behavior of Pearson residuals to assess model adequacy and an information-theoretic approach for model selection. All analyses revealed that Z-estimates were generally accurate among the methods considered, except under the most likely situation of quadratic overdispersion met in ecological studies, for which only the negative binomial type 2 and the mean-parametrized Conway–Maxwell–Poisson (CMP) extensions were adequate to estimate both Z and its SE. Linearly overdispersed data were best modeled by the negative binomial type 1 and generalized Poisson (GLM GP) extensions; the GLM CMP and GLM GP were the most appropriate to model underdispersed data, whereas the GLM Poisson adequately modeled equidispersed data, similar to the Chapman and Robson (1960) method. Statistical comparisons of Z and its SE for grouping factors, such as year or site, were correctly achieved when the most adequate and statistically supported GLM Poisson extension was applied. Altogether, the proposed GLM-based method should help to circumvent the identified issues related to SE estimation for statistical inferences about mortality rates for fisheries management decision making.
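
As an illustration of the GLM view of a catch curve, the sketch below fits a Poisson log-linear model, log E[N_a] = b0 - Z*a, to age-frequency counts by Newton-Raphson. It is a minimal example under stated assumptions, not the authors' implementation: the function name `fit_poisson_catch_curve`, the toy counts, the coding of age as years since full recruitment, and the plain Newton steps are all hypothetical, and only the equidispersed Poisson case is covered (none of the negative binomial, generalized Poisson, or CMP extensions).

```python
import math

def fit_poisson_catch_curve(ages, counts, n_iter=50):
    """Fit log E[N_a] = b0 + b1 * a by maximum likelihood; return (Z, SE) with Z = -b1."""
    # Start from a flat curve at the mean count (an arbitrary starting value).
    b0, b1 = math.log(sum(counts) / len(counts)), 0.0
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * a) for a in ages]
        # Score vector: gradient of the Poisson log-likelihood.
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum(a * (y - m) for a, y, m in zip(ages, counts, mu))
        # Fisher information X' diag(mu) X, a symmetric 2x2 matrix.
        i00 = sum(mu)
        i01 = sum(a * m for a, m in zip(ages, mu))
        i11 = sum(a * a * m for a, m in zip(ages, mu))
        det = i00 * i11 - i01 * i01
        # Newton-Raphson update: beta <- beta + I^{-1} g.
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (-i01 * g0 + i00 * g1) / det
    # Model-based SE of the slope, from the information at the final iteration.
    se_z = math.sqrt(i00 / det)
    return -b1, se_z

# Toy age-frequency data (hypothetical): counts halve with each age class.
ages = [0, 1, 2, 3]
counts = [1000, 500, 250, 125]
z_hat, se_hat = fit_poisson_catch_curve(ages, counts)
print(f"Z = {z_hat:.3f}, SE = {se_hat:.3f}")
```

Because the toy counts halve exactly with each age class, the fitted Z equals ln 2 (about 0.693). With overdispersed counts this Poisson SE would typically be too small, which is the situation the dispersion-aware extensions are meant to address.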

    The use of sheepdogs in sheep production in southeastern Brazil

    This study assessed the economic value of using sheepdogs as livestock guardians in southeastern Brazil by implementing a semi-structured interview format divided into four main categories: maintenance costs of sheep production, selling prices of carcasses, annual rate of depredation, and sheepdog acquisition and maintenance costs. According to our results, producers perceive the “unproductive” costs of sheepdogs similarly to the way they view taxes. However, management using sheepdogs as herd guardians tends to be most profitable for herds above 483 head from the fourth year on, being possibly more stable and predictable over time. In contrast, management without sheepdogs shows stochastic dynamics with occasional, though unpredictable, episodes of sheep depredation. This means that sheep farmers follow a cyclical decision strategy, which basically depends on the purchase price of the sheepdog.

    Understanding learning from EEG data: Combining machine learning and feature engineering based on hidden Markov models and mixed models

    Theta oscillations, ranging from 4-8 Hz, play a significant role in spatial learning and memory functions during navigation tasks. Frontal theta oscillations are thought to play an important role in spatial navigation and memory. Electroencephalography (EEG) datasets are very complex, making any changes in the neural signal related to behaviour difficult to interpret. However, multiple analytical methods are available to examine complex data structure, especially machine learning based techniques. These methods have shown high classification performance and the combination with feature engineering enhances the capability of these methods. This paper proposes using hidden Markov and linear mixed effects models to extract features from EEG data. Based on the engineered features obtained from frontal theta EEG data during a spatial navigation task in two key trials (first, last) and between two conditions (learner and non-learner), we analysed the performance of six machine learning methods (Polynomial Support Vector Machines, Non-linear Support Vector Machines, Random Forests, K-Nearest Neighbours, Ridge, and Deep Neural Networks) on classifying learner and non-learner participants. We also analysed how different standardisation methods used to pre-process the EEG data contribute to classification performance. We compared the classification performance of each trial with data gathered from the same subjects, including solely coordinate-based features, such as idle time and average speed. We found that more machine learning methods perform better classification using coordinate-based data. However, only deep neural networks achieved an area under the ROC curve higher than 80% using the theta EEG data alone. Our findings suggest that standardising the theta EEG data and using deep neural networks enhances the classification of learner and non-learner subjects in a spatial learning task.

    Bayesian additive regression trees with model trees

    Bayesian additive regression trees (BART) is a tree-based machine learning method that has been successfully applied to regression and classification problems. BART assumes regularisation priors on a set of trees that work as weak learners and is very flexible for predicting in the presence of nonlinearity and high-order interactions. In this paper, we introduce an extension of BART, called model trees BART (MOTR-BART), that considers piecewise linear functions at node levels instead of piecewise constants. In MOTR-BART, rather than having a unique value at node level for the prediction, a linear predictor is estimated considering the covariates that have been used as the split variables in the corresponding tree. In our approach, local linearities are captured more efficiently and fewer trees are required to achieve equal or better performance than BART. Via simulation studies and real data applications, we compare MOTR-BART to its main competitors. R code for MOTR-BART implementation is available at https://github.com/ebprado/MOTR-BART
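
To make the contrast between piecewise-constant and piecewise-linear leaves concrete, the following sketch compares the two on a single fixed split. It is not the MOTR-BART sampler and does not use the R code in the repository above: the helper names, the split point, and the toy data are assumptions, and ordinary least squares stands in for the Bayesian leaf-level linear predictor.

```python
def fit_constant(xs, ys):
    """Leaf prediction as in a standard regression tree: the mean response."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y

def fit_linear(xs, ys):
    """Leaf prediction as in a model tree: a least-squares line in the split variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def stump_sse(xs, ys, split, fit_leaf):
    """Sum of squared errors of a one-split tree using the given leaf model."""
    sse = 0.0
    for in_leaf in (lambda x: x < split, lambda x: x >= split):
        leaf = [(x, y) for x, y in zip(xs, ys) if in_leaf(x)]
        leaf_x = [p[0] for p in leaf]
        leaf_y = [p[1] for p in leaf]
        predict = fit_leaf(leaf_x, leaf_y)
        sse += sum((y - predict(x)) ** 2 for x, y in leaf)
    return sse

# Toy piecewise-linear response with a kink at x = 5 (hypothetical data).
xs = [float(i) for i in range(10)]
ys = [2.0 * x if x < 5 else 10.0 - (x - 5) for x in xs]

sse_constant = stump_sse(xs, ys, 5.0, fit_constant)
sse_linear = stump_sse(xs, ys, 5.0, fit_linear)
print(sse_constant, sse_linear)
```

On this toy data a single split with linear leaves reproduces the response essentially exactly, while constant leaves leave a residual sum of squares of 50. That gap is the intuition behind a model-tree ensemble needing fewer trees to capture local linearity.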

    A Mixed Model for Assessing the Effect of Numerous Plant Species Interactions on Grassland Biodiversity and Ecosystem Function Relationships

    In grassland ecosystems, it is well known that increasing plant species diversity can improve ecosystem functions (i.e., ecosystem responses), for example, by increasing productivity and reducing weed invasion. Diversity-Interactions models use species proportions and their interactions as predictors in a regression framework to assess biodiversity and ecosystem function relationships. However, it can be difficult to model numerous interactions if there are many species, and interactions may be temporally variable or dependent on spatial planting patterns. We developed a new Diversity-Interactions mixed model for jointly assessing many species interactions and within-plot species planting pattern over multiple years. We model pairwise interactions using a small number of fixed parameters that incorporate spatial effects and supplement this by including all pairwise interaction variables as random effects, each constrained to have the same variance within each year. The random effects are indexed by pairs of species within plots rather than a plot-level factor as is typical in mixed models, and capture remaining variation due to pairwise species interactions parsimoniously. We apply our novel methodology to three years of weed invasion data from a 16-species grassland experiment that manipulated plant species diversity and spatial planting pattern and test its statistical properties in a simulation study. Supplementary materials accompanying this paper appear online.

    Multi-state models for double transitions associated with parasitism in biological control

    Competition between parasitoids can reduce the success of pest control in biological programs using two species as bio-control agents or when multiple species exploit the same host crop. Parasitoid foraging behavior and the ability to identify already parasitized hosts affect the efficacy of parasitoid species as bio-agents to regulate pest insects. We evaluated the behavioural changes of parasitoids according to the quality of hosts (i.e., previously parasitised or not), and the characterisation of these transitions over time via multi-state models. We evaluated the effects of previous parasitism of the brown stinkbug Euschistus heros eggs on the parasitism rate of the species Trissolcus basalis and Telenomus podisi. We successively modelled the choice of eggs (with three possibilities: non parasitised eggs, eggs previously parasitised by T. podisi, and eggs previously parasitised by T. basalis) and the conditional behaviour given the choice (walking, drumming, ovipositing or marking the chosen egg). We consider multi-state models in two successive stages to calculate double transition probabilities, and the statistical methodology is based on the maximum likelihood procedure. Using the Cox model and assuming a stationary process, we verified that the treatment effect was significant for the choice, indicating that the two parasitoid species have different choice patterns. For the second stage, i.e. behaviour given the choice, the results also showed the influence of the species on the conditional behaviour, especially for previously parasitised eggs. Specifically, T. podisi avoids intraspecific competition and makes decisions faster than T. basalis. In this work, we emphasise the methodological contribution with multi-state models, especially in the context of double transitions.

    Diagnostics for categorical response models based on quantile residuals and distance measures

    Polytomous categorical data are frequent in studies and can be obtained with an individual or grouped structure. In both structures, the generalized logit model is commonly used to relate the covariates to the response variable. After fitting a model, one of the challenges is defining an appropriate residual and choosing diagnostic techniques. Since the polytomous variable is multivariate, raw, Pearson, or deviance residuals are vectors and their asymptotic distribution is generally unknown, which leads to difficulties in graphical visualization and interpretation. Therefore, the definition of appropriate residuals and the choice of the correct analysis in diagnostic tools is important, especially for nominal data, where a restriction of methods is observed. This paper proposes the use of randomized quantile residuals associated with individual and grouped nominal data, as well as Euclidean and Mahalanobis distance measures, as an alternative to reduce the dimension of the residuals. We developed simulation studies with both data structures associated. The half-normal plots with simulation envelopes were used to assess model performance. These studies demonstrated a good performance of the quantile residuals, and the distance measurements allowed a better interpretation of the graphical techniques. We illustrate the proposed procedures with two applications to real data.

    Development of a Decision Support System for the Management of Mummy Berry Disease in Northwestern Washington

    Mummy berry, caused by Monilinia vaccinii-corymbosi, is the most important disease of the northern highbush blueberry (Vaccinium corymbosum L.) in North America and can cause up to 70% yield losses in affected fields. A key event in the mummy berry disease cycle is the primary infection phase where ascospores are released by apothecia that infect emerging floral and vegetative tissues. Current management of mummy berry disease in northwestern Washington is predominantly reliant on the prevention of primary infections through prophylactic, calendar-based fungicide spray applications early in the growing season. To improve the understanding of risk during these periods and to help tailor management strategies, we developed a decision support system (DSS) based on field records spanning over five seasons and four locations in northwestern Washington. Environmental conditions across the region were highly uniform but different dynamics of apothecial development were observed under high- and low-management regimes. Based on our analysis, we suggest basing the initial iteration of the DSS on two sub-models. The first sub-model predicts the onset of apothecia based on chill-unit accumulation under high- and low-management regimes, and the second predicts primary infection risk, which provides opportunities to improve the timing of fungicide applications. The synoptic DSS proposed here is based on the current biological knowledge of the pathosystem and available data for the northwestern Washington region. We provide the analysis and the DSS implementation and evaluation as an open-source repository, providing opportunities for further improvements. Finally, we provide suggestions for future research and the operational efforts needed for improving the utility and accuracy of the mummy berry DSS.

    Discussion of: "Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects"

    Contributed discussion included in P. Richard Hahn. Jared S. Murray. Carlos M. Carvalho. "Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects (with Discussion)." Bayesian Anal. 15 (3) 965 - 1056, September 2020. https://doi.org/10.1214/19-BA119