20 research outputs found
Assessment of accuracy: systematic reduction of training points for maximum likelihood classification and mixture discriminant analysis (Gaussian and t-distribution)
Remote sensing provides a valuable tool for monitoring land cover across large areas. A simple yet popular method for land cover classification is Maximum Likelihood Classification (MLC), which assumes a single normal distribution of the samples per class in the feature space. Mixture Discriminant Analysis (MDA) is a natural extension of MLC which can be used with varying distributions and multiple distributions per class, which simplifies the classification process tremendously. We compare the accuracies of MLC and MDA (using a Gaussian and a t-distribution) as the number of training points is systematically reduced in order to simulate varying reference data availability conditions. The results show that the more robust t-distribution MDA performs comparably to the Gaussian MDA and that both outperform MLC when sufficient training points are available. As the number of training points increases, the MDA accuracies increase while the MLC accuracy stagnates. At very low numbers of training samples (ranging from 22 to 169, depending on the class), there is more variability in terms of which method performs best.
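The difference between one normal distribution per class (MLC) and several distributions per class (MDA) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single 1-D feature, made-up reflectance values, hypothetical class names ("water", "veg"), and mixture components written down by hand rather than fitted with an EM algorithm.

```python
import math
from statistics import mean, pvariance

def gaussian_logpdf(x, mu, var):
    """Log density of a 1-D normal distribution with mean mu and variance var."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def fit_mlc(samples_by_class):
    """MLC: fit exactly one Gaussian (mean, variance) per class."""
    return {c: (mean(xs), pvariance(xs)) for c, xs in samples_by_class.items()}

def classify_mlc(x, params):
    """Assign x to the class whose single Gaussian gives the highest likelihood."""
    return max(params, key=lambda c: gaussian_logpdf(x, *params[c]))

def mixture_logpdf(x, components):
    """Log density of a Gaussian mixture; components are (weight, mu, var) tuples."""
    logs = [math.log(w) + gaussian_logpdf(x, mu, var) for w, mu, var in components]
    top = max(logs)  # log-sum-exp keeps tiny densities from underflowing
    return top + math.log(sum(math.exp(v - top) for v in logs))

def classify_mda(x, mixtures):
    """Assign x to the class whose mixture density is highest."""
    return max(mixtures, key=lambda c: mixture_logpdf(x, mixtures[c]))

# Illustrative (invented) training data: "water" is bimodal, "veg" is unimodal.
training = {
    "water": [0.08, 0.10, 0.12, 0.88, 0.90, 0.92],
    "veg": [0.45, 0.48, 0.50, 0.52, 0.55],
}
params = fit_mlc(training)

# Hand-specified mixtures (assumption): two components for "water", one for "veg".
mixtures = {
    "water": [(0.5, 0.10, 0.0003), (0.5, 0.90, 0.0003)],
    "veg": [(1.0, 0.50, 0.0012)],
}

print(classify_mlc(0.6, params), classify_mda(0.6, mixtures))
```

In this toy setting the single Gaussian fitted to the bimodal class is very wide, so MLC assigns a point lying between the two modes to that class, whereas the two-component mixture does not. A t-distribution variant would swap the component density for a heavier-tailed one.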
Distinguishing tree species from in situ hyperspectral and temporal measurements through ensemble statistical learning
The data presented in this study may be obtained from the corresponding author upon request. Due to intellectual property and confidentiality concerns, the data is not publicly available.
Hyperspectral sensors capture and compute spectral reflectance of objects over many
wavelength bands, resulting in a high-dimensional space with enough information to differentiate
between spectrally similar objects. Due to the curse of dimensionality, high spectral dimensionality
can also be difficult to handle and analyse, demanding complex processing and the use of advanced
analytical techniques. Moreover, when hyperspectral measurements are taken at different temporal
frequencies, separation is likely to improve; however, additional complexities in modelling time
variability concurrently with this high spectral dimensionality may be created. As a result, the applicability of ensemble-based techniques suitable for high-dimensional data is examined in this research,
together with the statistical evaluation of time-induced variability, since spectral measurements of tree
species were taken at different time periods. Classification errors for the stochastic gradient boosting
(SGB) and random forest (RF) methods ranged between 5.6% and 13.5%, respectively. Differences in
classification accuracy or errors were also accounted for in the assessment of the models, with up
to 46% of variation in classification error due to the effect of time in the RF model, indicating that
measurement time is important in improving discrimination between tree species. This is because
optical leaf characteristics can vary during the course of the year due to seasonal effects, health status,
or the developmental stage of a tree. Different spectral properties (assumed from relevant wavelength
bands) were found to be key factors impacting the models’ discrimination performance at various
measurement times.
The Council for Scientific and Industrial Research (CSIR).
https://www.mdpi.com/journal/remotesensing
Plant Production and Soil Science
Plant Science
SDG-15:Life on lan
A Markov chain model for geographical accessibility
Accessibility analyses are conducted for a variety of applications,
including urban planning and public health studies. These
applications may aggregate data at the level of administrative
units, such as provinces or municipalities. Accessibility between
administrative units can be quantified by travel distance. However,
modelling the distances between all administrative units
in a region is computationally expensive if a large number of
administrative units is considered. We propose a methodology
to model accessibility between administrative units as a homogeneous
Markov chain, where the administrative units are
states and standardised inverse travel distances act as transition
probabilities. Single transitions are allowed only between
adjacent administrative units, resulting in a sparse one-step
transition probability matrix (TPM). Powers of the TPM are taken
to obtain transition probabilities between non-adjacent units.
The methodology assumes that the Markov property holds for
travel between units. We apply the methodology to administrative
units within Tshwane, South Africa, considering only major
roads for the sake of computation. The results are compared to
those obtained using Euclidean distance, and we show that using
network distance yields more reasonable results. The proposed
methodology is computationally efficient and can be used to
estimate accessibility between any set of administrative units
connected by a road network.
In part by the National Research Foundation of South Africa and the NRF-SASA Academic Statistics Grant.
http://www.elsevier.com/locate/spasta
am2024
Statistics
Non
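The construction described in this abstract can be sketched as follows. This is a hedged illustration, not the paper's code: the four units, their adjacency, and the travel distances are invented, and "standardised" is assumed here to mean dividing each row of inverse distances by its row sum. Every unit is assumed to have at least one neighbour.

```python
def build_tpm(adjacent_distances, n):
    """One-step TPM from travel distances between adjacent units.

    adjacent_distances maps an (i, j) pair of adjacent units to a distance.
    Non-adjacent pairs get probability zero, so the matrix is sparse.
    """
    weights = [[0.0] * n for _ in range(n)]
    for (i, j), dist in adjacent_distances.items():
        weights[i][j] = 1.0 / dist  # inverse travel distance
        weights[j][i] = 1.0 / dist  # travel is assumed symmetric
    tpm = []
    for row in weights:
        total = sum(row)  # assumes each unit has a neighbour
        tpm.append([w / total for w in row])
    return tpm

def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_power(p, steps):
    """Multi-step transition probabilities: P raised to the given power."""
    result = p
    for _ in range(steps - 1):
        result = matmul(result, p)
    return result

# Hypothetical region: four units in a chain, distances in kilometres.
distances = {(0, 1): 2.0, (1, 2): 4.0, (2, 3): 1.0}
tpm = build_tpm(distances, 4)
two_step = matrix_power(tpm, 2)

print(tpm[0][2])       # units 0 and 2 are not adjacent
print(two_step[0][2])  # reachable in two transitions via unit 1
```

Units 0 and 2 share no border, so their one-step probability is zero, while the second power of the TPM gives them a positive probability through unit 1. For a large region a sparse matrix library would be the more natural choice than nested lists.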
Short-term real-time prediction of total number of reported COVID-19 cases and deaths in South Africa : a data driven approach
BACKGROUND: The rising burden of the ongoing COVID-19 epidemic in South Africa has motivated the application
of modeling strategies to predict COVID-19 cases and deaths. Reliable and accurate short- and long-term
forecasts of COVID-19 cases and deaths, both at the national and provincial level, are a key aspect of the strategy to
handle the COVID-19 epidemic in the country.
METHODS: In this paper we apply the previously validated approach of phenomenological models, fitting several nonlinear growth curves (Richards, three- and four-parameter logistic, Weibull and Gompertz), to produce short-term forecasts of
COVID-19 cases and deaths at the national level as well as the provincial level. Using publicly available daily reported
cumulative case and death data up until 22 June 2020, we report 5, 10, 15, 20, 25 and 30-day ahead forecasts of
cumulative cases and deaths. All predictions are compared to the actual observed values in the forecasting period.
RESULTS: We observed that all models for cases provided accurate and similar short-term forecasts for a period of 5
days ahead at the national level, and that the three- and four-parameter logistic growth models provided more
accurate forecasts than those obtained from the Richards model 10 days ahead. However, beyond 10 days all models
underestimated the cumulative cases. Our forecasts across the models predict an additional 23,551–26,702 cases in 5
days and an additional 47,449–57,358 cases in 10 days. While the three-parameter logistic growth model provided the
most accurate forecasts of cumulative deaths within the 10-day period, the Gompertz model was able to better
capture the changes in cumulative deaths beyond this period. Our forecasts across the models predict an additional
145–437 COVID-19 deaths in 5 days and an additional 243–947 deaths in 10 days.
CONCLUSIONS: By comparing both the predictions of deaths and cases to the observed data in the forecasting period,
we found that this modeling approach provides reliable and accurate forecasts for a maximum period of 10 days
ahead.
http://www.biomedcentral.com/bmcmedresmethodol
pm2021
Statistic
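Three of the growth curves named in the methods can be written down in a few lines. The sketch below is an assumption-laden illustration: the abstract does not state which parametrisations were used, so common textbook forms are shown (K is the final size, r the growth rate, t0 the inflection time, nu the Richards shape parameter), and the parameter values are invented rather than fitted to any data.

```python
import math

def logistic3(t, K, r, t0):
    """Three-parameter logistic curve for cumulative counts."""
    return K / (1.0 + math.exp(-r * (t - t0)))

def gompertz(t, K, r, t0):
    """Gompertz curve: asymmetric growth towards the final size K."""
    return K * math.exp(-math.exp(-r * (t - t0)))

def richards(t, K, r, t0, nu):
    """Richards curve; reduces to the logistic curve when nu equals 1."""
    return K * (1.0 + nu * math.exp(-r * (t - t0))) ** (-1.0 / nu)

def additional_cases(curve, last_day, horizon, **params):
    """Forecast increase in cumulative cases 'horizon' days after last_day."""
    return curve(last_day + horizon, **params) - curve(last_day, **params)

# Invented parameter values, for illustration only (not fitted estimates).
example = {"K": 1000.0, "r": 0.1, "t0": 10.0}

for horizon in (5, 10):
    extra = additional_cases(logistic3, last_day=20, horizon=horizon, **example)
    print(horizon, round(extra, 1))
```

In practice the parameters would be estimated from the reported cumulative series by nonlinear least squares, and the h-day-ahead forecast would be read off the fitted curve in the same way as `additional_cases` does here.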
A spatial model with vaccinations for COVID-19 in South Africa
Since the emergence of the COVID-19 pandemic in December 2019, numerous
mathematical models have been published to assess the transmission dynamics of the disease, predict
its future course, and evaluate the impact of different control measures. The simplest models
make the basic assumptions that individuals are perfectly and evenly mixed and have the
same social structures. Such assumptions become problematic for large developing countries
that aggregate heterogeneous COVID-19 outbreaks in local areas. Thus, this paper proposes
a spatial SEIRDV model that includes spatial vaccination coverage, spatial vulnerability, and
level of mobility, to take into account the spatial–temporal clustering pattern of COVID-19
cases. The conclusion of this study is that immunity, government interventions, infectiousness
and virulence are the main drivers of the spread of COVID-19. These factors should be taken
into consideration when scientists, public policy makers and other stakeholders in the health
community analyse, create and project future disease prevention scenarios. Such a model has a
place for disease outbreaks that may occur in future, allowing for the inclusion of vaccination
rates in a spatial manner.
In part by the National Research Foundation of South Africa and also funded by Canada’s International Development Research Centre (IDRC).
http://www.elsevier.com/locate/spasta
am2024
Statistics
SDG-03:Good health and well-bein
Are earth sciences lagging behind in data integration methodologies?
This article reflects discussions that German and South
African Earth scientists, statisticians and risk analysts
had on the occasion of two bilateral workshops on Data
Integration Technologies for Earth System Modelling
and Resource Management. The workshops were
held in October 2012 in Leipzig, Germany, and April
2013 in Pretoria, South Africa, and were attended by
about 70 researchers, practitioners and data managers
from both countries. Both events were arranged as
part of the South African-German Year of Science
2012/2013. The South African National Research
Foundation (NRF, UID 81579) supported the two
workshops as part of the South African-German Year
of Science 2012/2013 activities established by the German Federal Ministry of Education and Research
and the South African Department of Science and
Technology.
http://link.springer.com/journal/12665
hb201