3 research outputs found
Mottle: Accurate pairwise substitution distance at high divergence through the exploitation of short-read mappers and gradient descent
Current tools for estimating the substitution distance between two related sequences struggle to remain accurate at a high divergence. Difficulties at distant homologies, such as false seeding and over-alignment, create a high barrier for the development of a stable estimator. This is especially true for viral genomes, which carry a high rate of mutation, small size, and sparse taxonomy. Developing an accurate substitution distance measure would help to elucidate the relationship between highly divergent sequences, interrogate their evolutionary history, and better facilitate the discovery of new viral genomes. To tackle these problems, we propose an approach that uses short-read mappers to create whole-genome maps, and gradient descent to isolate the homologous fraction and calculate the final distance value. We implement this approach as Mottle. With the use of simulated and biological sequences, Mottle was able to remain stable to 0.66–0.96 substitutions per base pair and identify viral outgroup genomes with 95% accuracy at the family-order level. Our results indicate that Mottle performs as well as existing programs in identifying taxonomic relationships, with more accurate numerical estimation of genomic distance over greater divergences. By contrast, one limitation is a reduced numerical accuracy at low divergences, and on genomes where insertions and deletions are uncommon, when compared to alternative approaches. We propose that Mottle may therefore be of particular interest in the study of viruses, viral relationships, and notably for viral discovery platforms, helping in benchmarking of homology search tools and defining the limits of taxonomic classification methods. The code for Mottle is available at https://github.com/tphoward/Mottle_Repo
Evaluation of the 2022 West Nile virus forecasting challenge, USA
© 2025. The Author(s). BACKGROUND: West Nile virus (WNV) is the most common cause of mosquito-borne disease in the continental USA, with an average of ~1200 severe, neuroinvasive cases reported annually from 2005 to 2021 (range 386-2873). Despite this burden, efforts to forecast WNV disease to inform public health measures to reduce disease incidence have had limited success. Here, we analyze forecasts submitted to the 2022 WNV Forecasting Challenge, a follow-up to the 2020 WNV Forecasting Challenge. METHODS: Forecasting teams submitted probabilistic forecasts of annual West Nile virus neuroinvasive disease (WNND) cases for each county in the continental USA for the 2022 WNV season. We assessed the skill of team-specific forecasts, baseline forecasts, and an ensemble created from team-specific forecasts. We then characterized the impact of model characteristics and county-specific contextual factors (e.g., population) on forecast skill. RESULTS: Ensemble forecasts for 2022 anticipated a season at or below median long-term WNND incidence for nearly all (> 99%) counties. More counties reported higher case numbers than anticipated by the ensemble forecast median, but national caseload (826) was well below the 10-year median (1386). Forecast skill was highest for the ensemble forecast, though the historical negative binomial baseline model and several team-submitted forecasts had similar forecast skill. Forecasts utilizing regression-based frameworks tended to have more skill than those that did not and models using climate, mosquito surveillance, demographic, or avian data had less skill than those that did not, potentially due to overfitting. County-contextual analysis showed strong relationships with the number of years that WNND had been reported and permutation entropy (historical variability). Evaluations based on weighted interval score and logarithmic scoring metrics produced similar results.
CONCLUSIONS: The relative success of the ensemble forecast, the best forecast for 2022, suggests potential gains in community ability to forecast WNV, an improvement from the 2020 Challenge. Similar to the previous challenge, however, our results indicate that skill was still limited, with general underprediction despite a relatively low-incidence year. Potential opportunities for improvement include refining mechanistic approaches, integrating additional data sources, and considering different approaches for areas with and without previous cases.
Evaluation of the 2022 West Nile virus forecasting challenge, USA
Background: West Nile virus (WNV) is the most common cause of mosquito-borne disease in the continental USA, with an average of ~1200 severe, neuroinvasive cases reported annually from 2005 to 2021 (range 386-2873). Despite this burden, efforts to forecast WNV disease to inform public health measures to reduce disease incidence have had limited success. Here, we analyze forecasts submitted to the 2022 WNV Forecasting Challenge, a follow-up to the 2020 WNV Forecasting Challenge. Methods: Forecasting teams submitted probabilistic forecasts of annual West Nile virus neuroinvasive disease (WNND) cases for each county in the continental USA for the 2022 WNV season. We assessed the skill of team-specific forecasts, baseline forecasts, and an ensemble created from team-specific forecasts. We then characterized the impact of model characteristics and county-specific contextual factors (e.g., population) on forecast skill. Results: Ensemble forecasts for 2022 anticipated a season at or below median long-term WNND incidence for nearly all (> 99%) counties. More counties reported higher case numbers than anticipated by the ensemble forecast median, but national caseload (826) was well below the 10-year median (1386). Forecast skill was highest for the ensemble forecast, though the historical negative binomial baseline model and several team-submitted forecasts had similar forecast skill. Forecasts utilizing regression-based frameworks tended to have more skill than those that did not and models using climate, mosquito surveillance, demographic, or avian data had less skill than those that did not, potentially due to overfitting. County-contextual analysis showed strong relationships with the number of years that WNND had been reported and permutation entropy (historical variability). Evaluations based on weighted interval score and logarithmic scoring metrics produced similar results. 
Conclusions: The relative success of the ensemble forecast, the best forecast for 2022, suggests potential gains in community ability to forecast WNV, an improvement from the 2020 Challenge. Similar to the previous challenge, however, our results indicate that skill was still limited, with general underprediction despite a relatively low-incidence year. Potential opportunities for improvement include refining mechanistic approaches, integrating additional data sources, and considering different approaches for areas with and without previous cases.
