Predicting grade progression within the Limpopo Education System
One way to improve education in South Africa is to ensure that additional support and resources are provided to the schools and learners most in need of help. To this end, education officials need to understand the factors affecting learning and identify the schools most in need of appropriate interventions. Several theories, models and methods have been developed to address the challenges faced in the education sector, and Educational Data Mining (EDM) has gained prominence among them. EDM is a field of data mining that uses mathematical and machine learning models to improve learners’ performance, education administration and policy formulation. This study explored the literature and related methodologies used within the EDM context and constructed a solution to improve learner support and planning in the Limpopo primary and secondary school education system. The data used included socio-economic, demographic and learner performance information sourced from the Education Management Information Systems database of the Limpopo Department of Education (LDoE). Three feature selection methods, Information Gain, Correlation and Asymmetrical Uncertainty, were combined to determine the factors that affect learning. Three machine learning classifiers, AdaboostM1 (Decision Stump), HoeffdingTree and NaïveBayes, were used to predict learners’ grade progression. These were compared using several evaluation metrics, and HoeffdingTree outperformed AdaboostM1 (Decision Stump) and NaïveBayes. When the final HoeffdingTree model was applied to the test datasets, it performed exceptionally well. It is hoped that implementing this model will assist the LDoE in its role of supporting learning and planning resource allocation.
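To make the feature-ranking step concrete, here is a minimal sketch in plain Python that scores categorical features in two ways. The first is information gain. The second is symmetrical uncertainty, a normalised variant of information gain. Whether this matches the study's "Asymmetrical Uncertainty" measure is an assumption. The correlation ranking is omitted for brevity. The abstract does not say how the rankings were combined, so averaging ranks in `combined_ranking` is an assumption, and the field names in the usage example are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy, in bits, of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def conditional_entropy(target, feature):
    """H(target | feature): entropy of the target within each feature value, weighted by frequency."""
    n = len(target)
    groups = {}
    for f, t in zip(feature, target):
        groups.setdefault(f, []).append(t)
    return sum(len(g) / n * entropy(g) for g in groups.values())

def information_gain(target, feature):
    return entropy(target) - conditional_entropy(target, feature)

def symmetrical_uncertainty(target, feature):
    """Information gain normalised to [0, 1] by the two marginal entropies."""
    denom = entropy(target) + entropy(feature)
    return 0.0 if denom == 0 else 2.0 * information_gain(target, feature) / denom

def combined_ranking(target, features):
    """Average each feature's rank under both scores (an assumed combination rule)."""
    ranks = {name: 0.0 for name in features}
    for score in (information_gain, symmetrical_uncertainty):
        ordered = sorted(features, key=lambda nm: score(target, features[nm]), reverse=True)
        for position, nm in enumerate(ordered, start=1):
            ranks[nm] += position / 2
    return sorted(features, key=lambda nm: ranks[nm])
```

For example, with `target = ["promoted", "promoted", "repeated", "repeated"]`, a hypothetical `attendance` column `["high", "high", "low", "low"]` fully explains the outcome (1 bit of information gain). A `gender` column `["f", "m", "f", "m"]` explains nothing, so `attendance` is ranked first.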
Artificial neural network techniques to investigate potential interactions between biomarkers
High-throughput technologies in the biomedical sciences, including gene microarrays, were expected to revolutionise the post-genomic era but have so far fallen short of the high expectations they initially raised in the biomedical community. Current efforts are still focused on improving the technology, its reproducibility and its accuracy. Meanwhile, computational techniques for analysing the data these technologies produce have made considerable progress and show encouraging results, and new approaches have been developed to extract relevant information from them. However, further work is needed to extract even more meaningful and relevant information. These techniques offer great possibilities for exploring the overall dynamics of a living organism. Their output can provide important leads for deciphering the interconnections, interactions or regulatory influences that may exist between molecules. Given the scientific community's growing interest in exploring these dynamics, some groups have started to develop solutions based on different technologies to extract this interaction information. Here we present an Artificial Neural Network-based methodology for studying interactions in gene transcriptomic data. It will be applied and validated in a breast cancer context.
Virtual metrology for plasma etch processes.
Plasma processes can present difficult control challenges due to time-varying dynamics
and a lack of relevant and/or regular measurements. Virtual metrology (VM) is the
use of mathematical models with accessible measurements from an operating process to
estimate variables of interest. This thesis addresses the challenge of virtual metrology
for plasma processes, with a particular focus on semiconductor plasma etch.
Introductory material covering the essentials of plasma physics, plasma etching, plasma
measurement techniques, and black-box modelling techniques is first presented for readers
not familiar with these subjects. A comprehensive literature review then details
the state of the art in modelling and VM research for plasma etch processes.
To demonstrate the versatility of VM, a temperature monitoring system utilising a
state-space model and Luenberger observer is designed for the variable specific impulse
magnetoplasma rocket (VASIMR) engine, a plasma-based space propulsion system. The
temperature monitoring system uses optical emission spectroscopy (OES) measurements
from the VASIMR engine plasma to correct temperature estimates in the presence of
modelling error and inaccurate initial conditions. Temperature estimates within 2% of
the real values are achieved using this scheme.
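The observer idea can be illustrated with a minimal discrete-time Luenberger observer. The model runs alongside the process, and its prediction is corrected by the measured output error, so an estimate started from the wrong initial condition converges. Everything here is hypothetical: the two-state linear thermal model, the matrices `A`, `B`, `C`, the gain `L`, and the measurement `y` (standing in for an OES-derived quantity). None of it is the VASIMR model from the thesis.

```python
import numpy as np

# Hypothetical two-state discrete-time thermal model (illustration only):
# x[0] is the temperature of interest, x[1] a slower internal thermal state.
A = np.array([[0.90, 0.05],
              [0.00, 0.95]])
B = np.array([[0.10],
              [0.05]])
C = np.array([[1.0, 0.0]])   # assumed measurement of x[0] (e.g. derived from OES)
L = np.array([[0.75],
              [3.15]])       # observer gain placing the eigenvalues of A - L C at 0.5 and 0.6

def simulate(steps, x0_true, x0_est, u=1.0):
    """Run plant and observer side by side; return final states and error norms."""
    x = np.array(x0_true, dtype=float).reshape(2, 1)
    x_hat = np.array(x0_est, dtype=float).reshape(2, 1)
    errors = []
    for _ in range(steps):
        y = C @ x                                          # measurement from the plant
        # Luenberger update: model prediction plus output-error correction
        x_hat = A @ x_hat + B * u + L @ (y - C @ x_hat)
        x = A @ x + B * u                                  # true plant evolves
        errors.append(float(np.linalg.norm(x - x_hat)))
    return x, x_hat, errors
```

Because the estimation error evolves as `e[k+1] = (A - L C) e[k]`, it decays at the rate set by the chosen observer poles. Here it is effectively zero after 60 steps, even though the estimate started at 0 while the true state was near 300.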
An extensive examination of the implementation of a wafer-to-wafer VM scheme to estimate
plasma etch rate for an industrial plasma etch process is presented. The VM
models estimate etch rate using measurements from the processing tool and a plasma
impedance monitor (PIM). A selection of modelling techniques is considered for VM
modelling, and Gaussian process regression (GPR) is applied for the first time to VM
of plasma etch rate. Models with global and local scope are compared, and modelling
schemes that attempt to cater for the etch process dynamics are proposed. GPR-based
windowed models produce the most accurate estimates, achieving mean absolute percentage
errors (MAPEs) of approximately 1.15%. The consistency of the results presented
suggests that this level of accuracy represents the best accuracy achievable for
the plasma etch system at the current frequency of metrology.
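A windowed GPR model of the kind described can be sketched in NumPy. For each new wafer, a GP is fitted only to the most recent `window` wafers and then used to predict the next one; accuracy is scored with MAPE. This is a sketch under stated assumptions. The RBF kernel, the fixed (not optimised) hyperparameters, the window length and the input features are all illustrative choices, not the thesis's configuration.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of X1 and X2."""
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)

def gpr_predict(X_train, y_train, X_test, noise=1e-2, **kern):
    """GP posterior mean with a constant mean equal to the training average."""
    y_mean = y_train.mean()
    K = rbf_kernel(X_train, X_train, **kern) + noise * np.eye(len(X_train))
    chol = np.linalg.cholesky(K)                       # K = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    return y_mean + rbf_kernel(X_test, X_train, **kern) @ alpha

def windowed_gpr(X, y, window=20, **kw):
    """Predict wafer i from a GP trained only on wafers i-window .. i-1."""
    preds = [gpr_predict(X[i - window:i], y[i - window:i], X[i:i + 1], **kw)[0]
             for i in range(window, len(y))]
    return np.array(preds)

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))
```

Refitting on a sliding window lets the model track slow process drift at the cost of forgetting older wafers. The window length trades drift tracking against estimation variance.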
Finally, a real-time VM and model predictive control (MPC) scheme for control of
plasma electron density in an industrial etch chamber is designed and tested. The VM
scheme uses PIM measurements to estimate electron density in real time. A predictive
functional control (PFC) scheme is implemented to cater for a time delay in the VM
system. The controller achieves time constants of less than one second, no overshoot,
and excellent disturbance rejection properties. The PFC scheme is further expanded by
adapting the internal model in the controller in real time in response to changes in the
process operating point.
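The following sketches how PFC can compensate for a measurement delay. It uses a first-order internal model and a Smith-predictor-style correction, and chooses the control move so that the model reaches the reference trajectory at a single coincidence point `h` steps ahead. The plant gain `K`, time constant `tau`, delay (in samples, at least 1), horizon `h` and reference time constant `t_ref` are hypothetical, and the plant is assumed to match the model exactly. This is not the thesis's controller.

```python
import math
from collections import deque

def simulate_pfc(setpoint, steps, K=2.0, tau=5.0, delay=3, h=4, t_ref=3.0, dt=1.0):
    """First-order PFC with delay compensation; returns the plant output per step."""
    assert delay >= 1, "sketch assumes a delay of at least one sample"
    a = math.exp(-dt / tau)        # internal-model pole
    lam = math.exp(-dt / t_ref)    # reference-trajectory decay per sample
    u_hist = deque([0.0] * delay, maxlen=delay)   # inputs still inside the delay
    ym_hist = deque([0.0] * delay, maxlen=delay)  # undelayed model outputs, oldest first
    y_p = y_m = 0.0
    outputs = []
    for _ in range(steps):
        # Estimate of the undelayed process output: measured output plus model delay correction
        y_hat = y_p + y_m - ym_hist[0]
        # Coincidence condition: model increment over h steps equals the
        # reference-trajectory increment (1 - lam**h) * (setpoint - y_hat)
        u = ((1 - lam**h) * (setpoint - y_hat) + (1 - a**h) * y_m) / (K * (1 - a**h))
        y_p = a * y_p + K * (1 - a) * u_hist[0]   # plant sees the input from `delay` steps ago
        ym_hist.append(y_m)
        y_m = a * y_m + K * (1 - a) * u           # undelayed internal model
        u_hist.append(u)
        outputs.append(y_p)
    return outputs
```

With these illustrative values, the output stays at zero for the dead time. It then rises monotonically to the setpoint with no overshoot, because the model-based correction removes the delay from the feedback loop.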
Audition, learning and experience: expertise through development
Our experience with the auditory world can shape and modify perceptual,
cognitive and neural processes with respect to audition. Such experience can
occur over multiple timescales, and can vary in its specificity and intensity. In
order to understand how auditory perceptual, cognitive and neural processes
develop, it is important to explore the different means through which experience
can influence audition. This thesis aims to address these issues. Using an
expertise framework, we explore how the auditory environment and ontogenetic
factors can shape and guide perceptual, cognitive and neural processes
through long- and short-term profiles of experience. In early chapters, we use
expertly-trained musicians as a model for long-term experience accrued under
specific auditory constraints. We find that expertise on a particular instrument
(violin versus piano) yields training-specific auditory perceptual advantages in a
musical context, as well as improvements to ‘low-level’ auditory acuity (versus
non-musicians); yet we find limited generalisation of expertise to cognitive tasks
that require some of the skills that musicians hone. In a subsequent chapter, we
find that expert violinists (versus non-musicians) show subtle increases in
quantitative MR proxies for cortical myelin at left auditory core. In later
chapters, we explore short-term sound learning. We ask whether listeners can
learn combinations of auditory cues within an active visuo-spatial task, and
whether development can mediate learning of auditory cue combinations or
costs due to cue contingency violations. We show that auditory cue
combinations can be learned within periods of minutes. However, we find wide
variation in cue learning success across all experiments, with no differences in
overall cue combination learning between children and adults. These
experiments help to further understanding of auditory expertise, learning,
development and plasticity, within an experience-based framework.
GPU-accelerated 3D visualisation and analysis of migratory behaviour of long lived birds
As tagging technology improves and the amount of data we collect grows, the methods we previously applied take longer and longer to run. As we move forward, the methods we develop must evolve with the data we collect. Maritime visualisation has already begun to leverage parallel processing to accelerate visualisation. However, some of these techniques require distributed computing, which, while useful for datasets containing billions of points, is harder to implement because of its hardware requirements. Here we show that movement ecology can also benefit significantly from parallel processing, using GPGPU acceleration on a single workstation. With only minor adjustments, algorithms can be implemented in parallel, enabling computation to be completed in real time.
We show this first by implementing a GPGPU-accelerated visualisation of global environmental datasets. Using OpenGL and CUDA, it is possible to visualise a dataset containing over 25 million data points per timestamp and to swap between timestamps in 5 ms, allowing environmental context to be considered when visualising trajectories in real time. These visualisations can then be combined with other GPU-accelerated visualisation methods, such as aggregate flow diagrams, to explore large datasets in real time. We also apply GPGPU acceleration to the analysis of migratory data through parallel primitives. With these primitives, researchers can accelerate their workflow without needing to fully understand the complexities of GPU programming, achieving computation times orders of magnitude faster than sequential CPU methods.
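Parallel primitives turn per-element loops into a few bulk operations. The following is a CPU sketch in vectorised NumPy, standing in for GPU primitives of the map and reduce-by-key kind (as found in libraries such as Thrust). It computes the total track length per bird: a "map" step computes every haversine segment length at once, and a segmented reduction sums the segments per bird. The input layout (flat arrays sorted by bird, then by time) and the spherical-Earth radius are assumptions, not the thesis's implementation.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def segment_lengths_km(lat, lon):
    """'Map' step: haversine length of every consecutive pair of fixes."""
    phi, lam = np.radians(lat), np.radians(lon)
    dphi, dlam = np.diff(phi), np.diff(lam)
    h = np.sin(dphi / 2) ** 2 + np.cos(phi[:-1]) * np.cos(phi[1:]) * np.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))

def distance_per_bird(bird_id, lat, lon):
    """'Reduce-by-key' step: total track length per bird.

    Assumes fixes are sorted by bird and then by time, as a GPU
    reduce-by-key primitive would also require.
    """
    seg = segment_lengths_km(lat, lon)
    boundary = bird_id[1:] != bird_id[:-1]              # segment jumps to a new bird
    seg = np.append(np.where(boundary, 0.0, seg), 0.0)  # segment i is keyed by fix i
    starts = np.flatnonzero(np.r_[True, boundary])      # first fix of each bird
    totals = np.add.reduceat(seg, starts)
    return dict(zip(bird_id[starts].tolist(), totals.tolist()))
```

Neither step contains a per-fix Python loop, so the same structure maps directly onto GPU transform and segmented-reduction primitives.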
Multivariate Analysis of Tumour Gene Expression Profiles Applying Regularisation and Bayesian Variable Selection Techniques
High-throughput microarray technology is here to stay, e.g. in oncology for tumour classification
and gene expression profiling to predict cancer pathology and clinical outcome. The global
objective of this thesis is to investigate multivariate methods that are suitable for this task.
After introducing the problem and the biological background, an overview of multivariate
regularisation methods is given in Chapter 3 and the binary classification problem is outlined
(Chapter 4). The focus of applications presented in Chapters 5 to 7 is on sparse binary classifiers
that are both parsimonious and interpretable. Particular emphasis is on sparse penalised
likelihood and Bayesian variable selection models, all in the context of logistic regression. The
thesis concludes with a final discussion chapter.
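A sparse penalised-likelihood classifier of the kind mentioned can be sketched as L1-penalised (lasso) logistic regression fitted by proximal gradient descent (ISTA). The L1 proximal step sets many gene weights exactly to zero, which is what makes the classifier parsimonious. The choice of ISTA, the unpenalised intercept and the value of `lam` are assumptions for illustration. They are not necessarily the thesis's estimators.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_logistic(X, y, lam=0.1, iters=5000):
    """Lasso-penalised logistic regression fitted by proximal gradient (ISTA).

    Minimises mean log-loss + lam * ||w||_1 for labels y in {0, 1};
    the intercept is not penalised.
    """
    n, p = X.shape
    Xa = np.hstack([np.ones((n, 1)), X])          # column 0 carries the intercept
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2   # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p + 1)
    for _ in range(iters):
        prob = 1.0 / (1.0 + np.exp(-(Xa @ beta)))
        grad = Xa.T @ (prob - y) / n
        beta = beta - step * grad                         # gradient step on the log-loss
        beta[1:] = soft_threshold(beta[1:], step * lam)   # prox step yields exact zeros
    return beta[0], beta[1:]
```

Increasing `lam` shrinks more weights to exactly zero. In the p >> n setting described above, `lam` controls how many genes end up in the profile.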
The variable selection problem is particularly challenging here, since the number of variables
is much larger than the sample size, which results in an ill-conditioned problem with
many equally good solutions. Thus, one open problem is the stability of gene expression profiles.
In a resampling study, various characteristics including stability are compared between a
variety of classifiers applied to five gene expression data sets and validated on two independent
data sets.
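The stability of a gene signature under resampling can be quantified in several ways. One simple measure is the average pairwise Jaccard overlap of the gene sets selected on bootstrap resamples: 1.0 means the same genes are chosen every time. In this sketch the selector is a hypothetical stand-in (top-k genes by a t-like score), and the bootstrap scheme and the Jaccard measure are illustrative assumptions. They may differ from the resampling study in the thesis.

```python
import numpy as np
from itertools import combinations

def top_k_genes(X, y, k):
    """Stand-in selector: k genes with the largest t-like class separation score."""
    a, b = X[y == 1], X[y == 0]
    score = np.abs(a.mean(0) - b.mean(0)) / (np.sqrt(a.var(0) + b.var(0)) + 1e-12)
    return set(np.argsort(score)[-k:].tolist())

def jaccard(s, t):
    """Size of the intersection over size of the union of two sets."""
    return len(s & t) / len(s | t) if s | t else 1.0

def selection_stability(X, y, k=10, n_resamples=20, seed=0):
    """Mean pairwise Jaccard index of gene sets selected on bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    sets = []
    for _ in range(n_resamples):
        idx = rng.choice(n, size=n, replace=True)   # bootstrap resample of samples
        sets.append(top_k_genes(X[idx], y[idx], k))
    pairs = list(combinations(sets, 2))
    return sum(jaccard(s, t) for s, t in pairs) / len(pairs)
```

When the signal is strong, the same genes are selected on every resample. On pure noise, the selected sets change from resample to resample and the index drops sharply.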
Bayesian variable selection provides an alternative to resampling for estimating the uncertainty
in the selection of genes. MCMC methods are used for model space exploration, but
because of the high dimensionality standard algorithms are computationally expensive and/or
result in poor Markov chain mixing. A novel MCMC algorithm is presented that uses the
dependence structure between input variables for finding blocks of variables to be updated together.
This drastically improves mixing while keeping the computational burden acceptable.
Several algorithms are compared in a simulation study. In an ovarian cancer application in
Chapter 7, the best-performing MCMC algorithms are combined with parallel tempering and
compared with an alternative method.
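The abstract does not say how blocks of dependent variables are identified. One plausible rule, assumed here purely for illustration, is to link two genes whenever their absolute correlation exceeds a threshold. Each connected component of that graph then forms a block whose inclusion indicators would be proposed jointly in an MCMC step. The sketch finds the components with a small union-find. The threshold value is hypothetical.

```python
import numpy as np

def correlation_blocks(X, threshold=0.5):
    """Group the columns of X into blocks: connected components of the graph
    linking two variables when |corr| >= threshold (an assumed rule)."""
    corr = np.corrcoef(X, rowvar=False)
    p = corr.shape[0]
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(corr[i, j]) >= threshold:
                parent[find(i)] = find(j)   # merge the two components

    blocks = {}
    for i in range(p):
        blocks.setdefault(find(i), []).append(i)
    return sorted(blocks.values())
```

Updating strongly correlated variables together lets a sampler swap one correlated gene for another in a single move. Single-variable updates struggle with such swaps, which is the mixing problem the abstract describes.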
Multi-scale data storage schemes for spatial information systems
This thesis documents a research project that has led to the design and prototype
implementation of several data storage schemes suited to the efficient multi-scale
representation of integrated spatial data. Spatial information systems will benefit from
having data models which allow for data to be viewed and analysed at various levels
of detail, while the integration of data from different sources will lead to a more
accurate representation of reality.
The work has addressed two specific problems. The first concerns the design of an
integrated multi-scale data model suited for use within Geographical Information
Systems. This has led to the development of two data models, each of which allows for
the integration of terrain data and topographic data at multiple levels of detail. The
models are based on a combination of adapted versions of three previous data
structures, namely, the constrained Delaunay pyramid, the line generalisation tree and
the fixed grid.
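A line generalisation tree stores a polyline once and lets it be retrieved at any level of detail. The sketch below builds the tree with the Douglas-Peucker recursion. Each node holds the vertex farthest from the chord of its span together with that distance, and a query at tolerance `tol` keeps only nodes whose distance is at least `tol`, which reproduces Douglas-Peucker simplification at that tolerance. This follows the general binary line generalisation tree idea. How closely it matches the adapted structure used in the thesis is an assumption.

```python
import math
from dataclasses import dataclass
from typing import Optional

def perpendicular_distance(p, a, b):
    """Distance from point p to the line through points a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)

@dataclass
class Node:
    index: int                      # polyline vertex stored at this node
    error: float                    # its distance from the chord of the enclosing span
    left: Optional["Node"] = None   # subtree for the span before this vertex
    right: Optional["Node"] = None  # subtree for the span after this vertex

def build(points, i, j):
    """Build the tree for the interior vertices of span (i, j), Douglas-Peucker style."""
    if j - i < 2:
        return None
    k = max(range(i + 1, j),
            key=lambda m: perpendicular_distance(points[m], points[i], points[j]))
    err = perpendicular_distance(points[k], points[i], points[j])
    return Node(k, err, build(points, i, k), build(points, k, j))

def generalise(points, root, tolerance):
    """Indices of the vertices kept at a given tolerance (level of detail)."""
    kept = []
    def walk(node):
        if node is None or node.error < tolerance:
            return                  # drop this vertex and its whole subtree
        walk(node.left)
        kept.append(node.index)
        walk(node.right)
    walk(root)
    return [0] + kept + [len(points) - 1]
```

For example, with `points = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0)]` and `root = build(points, 0, 4)`, a tolerance of 0.5 keeps vertices `[0, 2, 3, 4]`. A tolerance of 5 keeps only the end points `[0, 4]`.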
The second specific problem addressed in this thesis has been the development of an
integrated multi-scale 3-D geological data model, for use within a Geoscientific
Information System. This has resulted in a data storage scheme which enables the
integration of terrain data, geological outcrop data and borehole data at various levels
of detail.
The thesis also presents details of prototype database implementations of each of the
new data storage schemes. These implementations have served to demonstrate the
feasibility and benefits of an integrated multi-scale approach.
The research has also brought to light some areas that will need further research before
fully functional systems are produced. The final chapter contains, in addition to
conclusions made as a result of the research to date, a summary of some of these areas
that require future work.