Ensuring Access to Safe and Nutritious Food for All Through the Transformation of Food Systems
Assessing performance of artificial neural networks and re-sampling techniques for healthcare datasets.
Re-sampling methods for class imbalance have been shown to improve classification accuracy by mitigating the bias introduced by differences in class size. However, a model that applies a specific re-sampling technique prior to artificial neural network (ANN) training may not be equally suitable for classifying varied datasets from the healthcare industry. Five healthcare-related datasets were used across three re-sampling conditions: under-sampling, over-sampling and combi-sampling. Within each condition, different algorithmic approaches were applied to each dataset and the results were statistically analysed for significant differences in ANN performance. In the combi-sampling condition, four of the five datasets showed no significant consistency in the optimal re-sampling technique between the f1-score and the Area Under the Receiver Operating Characteristic Curve (AUROC) performance evaluation methods. By contrast, in the over-sampling and under-sampling conditions all five datasets put forward the same optimal algorithmic approach across performance evaluation methods. Furthermore, the optimal combi-sampling technique (under-sampling, over-sampling and convergence point) was found to be consistent across evaluation measures in only two of the five datasets. This study exemplifies how discrete ANN performances on datasets from the same industry can arise in two ways: the same re-sampling technique can generate varying ANN performance on different datasets, and different re-sampling techniques can generate varying ANN performance on the same dataset.
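A minimal sketch of the kind of comparison described above, using the imbalanced-learn and scikit-learn libraries; the dataset, samplers, network size and settings below are illustrative assumptions, not those used in the study:

```python
# Compare re-sampling strategies for an ANN classifier on an imbalanced dataset.
# Illustrative sketch only: the data, samplers and ANN settings are assumptions.
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE
from imblearn.combine import SMOTEENN  # one example of a combi-sampling method
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score, roc_auc_score

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

samplers = {
    "under-sampling": RandomUnderSampler(random_state=0),
    "over-sampling": SMOTE(random_state=0),
    "combi-sampling": SMOTEENN(random_state=0),
}

for name, sampler in samplers.items():
    X_res, y_res = sampler.fit_resample(X_train, y_train)  # re-sample training data only
    ann = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
    ann.fit(X_res, y_res)
    proba = ann.predict_proba(X_test)[:, 1]
    print(name,
          "f1=%.3f" % f1_score(y_test, ann.predict(X_test)),
          "auroc=%.3f" % roc_auc_score(y_test, proba))
```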
Identifying and responding to people with mild learning disabilities in the probation service
It has long been recognised that, like many other individuals, people with learning disabilities find their way into the criminal justice system. This fact is not disputed. What has been disputed, however, is the extent to which those with learning disabilities are represented within the various agencies of the criminal justice system and the ways in which the criminal justice system (and society) should address this. Recently, social and legislative confusion over the best way to deal with offenders with learning disabilities and mental health problems has meant that the waters have become even more muddied. Despite current government uncertainty concerning the best way to support offenders with learning disabilities, the probation service is likely to continue to play a key role in the supervision of such offenders. The three studies contained herein aim to clarify the extent to which those with learning disabilities are represented in the probation service, to examine the effectiveness of probation for them and to explore some of the ways in which probation could be adapted to fit their needs.

Study 1 and Study 2 showed that around 10% of offenders on probation in Kent appeared to have an IQ below 75, putting them in the bottom 5% of the general population. Study 3 was designed to assess some of the support needs of those with learning disabilities in the probation service, finding that many of the materials used by the probation service are likely to be too complex for those with learning disabilities to use effectively. To address this, a model for service provision is tentatively suggested. This is based on the findings of the three studies and a pragmatic assessment of what the probation service is likely to be capable of achieving in the near future.
The Epidemiology and Genetic Architecture of Vitamin D Deficiency in African Children
Vitamin D deficiency is a common public health problem worldwide. However, little is known about the epidemiology of vitamin D deficiency in Africa. In this thesis, I aimed to determine: 1) the prevalence of and risk factors associated with vitamin D deficiency in studies conducted in Africa; 2) the prevalence and predictors of vitamin D deficiency in African children; 3) the association between vitamin D and iron deficiency in African children; and 4) genetic variants that influence vitamin D status in Africans.
In a systematic review and meta-analyses of previous vitamin D studies in Africa, the average prevalence of low vitamin D status was 18.5%, 34.2% and 59.5% using cut-offs of 25-hydroxyvitamin D (25(OH)D) levels of <30 nmol/L, <50 nmol/L and <75 nmol/L, respectively. Populations at risk of vitamin D deficiency included newborns, women, and people living in high latitudes or urban areas.
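As a side note on how such prevalence figures are derived, here is a minimal sketch of classifying 25(OH)D measurements against the three cut-offs; the measurement values below are made up for illustration:

```python
import numpy as np

# Hypothetical 25(OH)D measurements in nmol/L (illustrative values only).
vitd = np.array([22.0, 35.5, 48.0, 61.2, 74.9, 80.3, 28.7, 55.1])

for cutoff in (30, 50, 75):
    prevalence = np.mean(vitd < cutoff) * 100  # % of samples below the cut-off
    print(f"25(OH)D < {cutoff} nmol/L: {prevalence:.1f}%")
```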
In an epidemiological study of young children living in Africa, the prevalence of low vitamin D status was 0.6%, 7.8% and 44.5% using cut-offs of 25(OH)D levels of <30 nmol/L, <50 nmol/L and <75 nmol/L, respectively. Predictors of vitamin D status included the GC2 variant of the group-specific component (GC) gene, which encodes vitamin D binding protein.
Vitamin D deficiency was also associated with 80% higher odds of iron deficiency in these children. Adjusted regression models revealed that vitamin D deficiency was associated with higher ferritin and hepcidin levels suggesting lower iron status, and reduced sTfR and transferrin levels and increased TSAT and serum iron levels suggesting improved iron status.
A genome-wide association study (GWAS) in Africans revealed genetic variants influencing vitamin D status in the vitamin D metabolism genes DHCR7/NADSYN1, CYP2R1 and GC. However, the majority of SNPs from previous European GWASs did not replicate in the current GWAS.
Findings from this thesis indicate that vitamin D deficiency is prevalent in many African populations and should be considered in public health strategies in Africa.
Privacy-aware Smart Home Interface Framework
Smart home user interfaces are pervasive and shared by multiple users who occupy the space. Therefore, they pose a risk to interpersonal privacy of occupants because an individual’s sensitive information can be leaked to other co-occupants (information privacy), or they can be disturbed by intrusions into their personal space (physical privacy) when the co-occupant interacts with the smart home user interfaces. This thesis hypothesises that interpersonal privacy violations can be mitigated by adapting the user interface layer and presents insights into how to achieve usable user interface adaptation to mitigate or minimise interpersonal privacy violations in smart homes.
The thesis reports two case studies and two user studies. The first case study identifies the key characteristics needed to model the rich context of interpersonal privacy violation scenarios. It then presents the knowledge representation models required to represent the identified characteristics and evaluates them for adequacy in modelling the context information of interpersonal privacy violation scenarios. The second case study presents a software architecture and a set of algorithms that can detect interpersonal privacy violations and generate usable user interface adaptations, and evaluates the architecture and the algorithms for adequacy in generating usable privacy-aware user interface adaptations. The first user study (N=15) evaluates the usability of the adaptive user interfaces generated from the framework, using storyboards as the stimulus. Extending the findings from the usability study and expanding the coverage of example scenarios, the second user study (N=23) evaluates the overall user experience of the adaptive user interfaces, using video prototypes as the stimulus.
The research demonstrates that the characteristics identified and the respective knowledge representation models adequately captured the context of interpersonal privacy violation scenarios. Furthermore, the software architecture and the algorithms could detect possible interpersonal privacy violations and generate usable user interface adaptations to mitigate them. The two user studies demonstrate that the adaptive user interfaces, when used in appropriate situations, were a suitable solution for addressing interpersonal privacy violations while providing high usability and a positive user experience. The thesis concludes by providing recommendations for developing privacy-aware user interface adaptations and suggesting future work that can extend this research.
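As a purely illustrative sketch of the kind of user-interface adaptation logic described above (the data model, rule and policy below are assumptions for illustration, not the thesis's architecture or algorithms):

```python
from dataclasses import dataclass

# Illustrative data model: a notification destined for a shared smart-home display.
@dataclass
class Notification:
    owner: str          # occupant the information belongs to
    content: str
    sensitive: bool     # e.g. health or financial information

def adapt_presentation(note: Notification, occupants_present: set) -> str:
    """Decide how to render a notification on a shared display so that
    sensitive information is not leaked to co-occupants (information privacy)."""
    others_present = occupants_present - {note.owner}
    if note.sensitive and others_present:
        # Mask the content; the owner can still open it after authenticating.
        return f"New private notification for {note.owner}"
    return note.content

print(adapt_presentation(
    Notification(owner="alice", content="Clinic appointment at 3pm", sensitive=True),
    occupants_present={"alice", "bob"},
))
```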
Machine learning and large scale cancer omic data: decoding the biological mechanisms underpinning cancer
Many of the mechanisms underpinning cancer risk and tumorigenesis are still not fully understood. However, the next-generation sequencing revolution and the rapid advances in big data analytics allow us to study cells and complex phenotypes at unprecedented depth and breadth. While experimental and clinical data are still fundamental to validate findings and confirm hypotheses, computational biology is key for the analysis of system- and population-level data for detection of hidden patterns and the generation of testable hypotheses.
In this work, I tackle two main questions regarding cancer risk and tumorigenesis that require novel computational methods for the analysis of system-level omic data. First, I focused on how frequent, low-penetrance inherited variants modulate cancer risk in the broader population. Genome-Wide Association Studies (GWAS) have shown that Single Nucleotide Polymorphisms (SNPs) contribute to cancer risk with multiple subtle effects, but they still fail to give further insight into their synergistic effects. I developed a novel hierarchical Bayesian regression model, BAGHERA, to estimate heritability at the gene level from GWAS summary statistics. I then used BAGHERA to analyse data from 38 malignancies in the UK Biobank. I showed that genes with high heritable risk are involved in key processes associated with cancer and often overlap with somatically mutated driver genes.
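As a loose illustration of aggregating SNP-level signal to the gene level, here is a naive per-SNP variance-explained sum under strong simplifying assumptions (standardised phenotype, no LD, no shrinkage); it is not the BAGHERA hierarchical model, and the summary statistics are invented:

```python
import pandas as pd

# Hypothetical GWAS summary statistics: effect size (beta) on a standardised trait
# and effect-allele frequency (eaf) for SNPs already mapped to genes.
sumstats = pd.DataFrame({
    "gene": ["TP53", "TP53", "BRCA2", "BRCA2", "BRCA2"],
    "beta": [0.021, -0.015, 0.030, 0.012, -0.008],
    "eaf":  [0.32,   0.10,   0.25,  0.45,   0.05],
})

# Variance explained by one SNP on a standardised phenotype: 2 p (1 - p) beta^2.
sumstats["h2_snp"] = 2 * sumstats["eaf"] * (1 - sumstats["eaf"]) * sumstats["beta"] ** 2

# Naive gene-level heritability: sum of per-SNP contributions within each gene.
gene_h2 = sumstats.groupby("gene")["h2_snp"].sum().sort_values(ascending=False)
print(gene_h2)
```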
Heritability analysis, like many other omics analysis methods, studies the effects of DNA variants on single genes in isolation. However, we know that most biological processes require the interplay of multiple genes, and we often lack a broad perspective on them. For the second part of this thesis, I then worked on the integration of Protein-Protein Interaction (PPI) graphs and omics data, which bridges this gap and recapitulates these interactions at a system level. First, I developed a modular and scalable Python package, PyGNA, that enables robust statistical testing of genesets' topological properties. PyGNA complements the literature with a tool that can be routinely introduced into automated bioinformatics pipelines. With PyGNA I processed multiple genesets obtained from genomics and transcriptomics data. However, topological properties alone have proven to be insufficient to fully characterise complex phenotypes.
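A minimal sketch of one common geneset topology test of this kind: compare the largest connected component induced by a geneset against randomly drawn genesets of the same size on the same network. This is illustrative only, not PyGNA's API, and the toy network and geneset are assumptions:

```python
import random
import networkx as nx

def lcc_size(graph: nx.Graph, genes: set) -> int:
    """Size of the largest connected component of the subgraph induced by a geneset."""
    sub = graph.subgraph(genes)
    return max((len(c) for c in nx.connected_components(sub)), default=0)

def topology_test(graph: nx.Graph, geneset: set, n_perm: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: how often a random geneset of the same size is at least as connected."""
    rng = random.Random(seed)
    nodes = list(graph.nodes)
    observed = lcc_size(graph, geneset)
    null = [lcc_size(graph, set(rng.sample(nodes, len(geneset)))) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in null)) / (n_perm + 1)

# Toy "PPI" network and geneset, for illustration only.
ppi = nx.erdos_renyi_graph(200, 0.03, seed=1)
geneset = set(range(10))  # a hypothetical geneset mapped onto node identifiers
print("empirical p-value:", topology_test(ppi, geneset))
```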
Therefore, I focused on a model that combines topological and functional data to detect multiple communities associated with a phenotype. Detecting cancer-specific submodules is still an open problem, but it has the potential to elucidate mechanisms detectable only by integrating multi-omics data. Building on recent advances in Graph Neural Networks (GNNs), I present a supervised geometric deep learning model that combines GNNs and Stochastic Block Models (SBMs). The model learns multiple graph-aware representations of the attributed network, as multiple joint SBMs, accounting for nodes that participate in multiple processes. The simultaneous estimation of structure and function provides an interpretable picture of how genes interact in specific conditions and allows the detection of novel putative pathways associated with cancer.
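Purely as an illustration of the general ingredients (a graph convolution producing soft community memberships and an SBM-style block matrix used to reconstruct edges, trained jointly with a supervised node-label loss), here is a toy PyTorch sketch; it is not the model developed in the thesis, and every dimension and tensor below is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGraphSBM(nn.Module):
    """Toy sketch: graph convolutions produce soft community memberships Z,
    and an SBM-style block matrix B reconstructs the adjacency as sigmoid(Z B Z^T)."""
    def __init__(self, n_features, n_hidden, n_communities, n_classes):
        super().__init__()
        self.w1 = nn.Linear(n_features, n_hidden)
        self.w2 = nn.Linear(n_hidden, n_communities)
        self.block = nn.Parameter(torch.randn(n_communities, n_communities))
        self.classifier = nn.Linear(n_communities, n_classes)

    def forward(self, x, adj_norm):
        h = F.relu(adj_norm @ self.w1(x))             # simple dense graph convolution
        z = F.softmax(adj_norm @ self.w2(h), dim=1)   # soft community memberships
        adj_logits = z @ self.block @ z.t()           # SBM-style edge reconstruction
        return z, adj_logits, self.classifier(z)      # structure + supervised node labels

# Toy attributed graph: 30 nodes, 8 features, random symmetric adjacency (illustrative only).
n, d = 30, 8
adj = (torch.rand(n, n) < 0.1).float()
adj = ((adj + adj.t()) > 0).float()
adj.fill_diagonal_(1.0)                               # add self-loops
deg_inv_sqrt = adj.sum(1).pow(-0.5)
adj_norm = deg_inv_sqrt[:, None] * adj * deg_inv_sqrt[None, :]

x = torch.randn(n, d)
labels = torch.randint(0, 2, (n,))                    # hypothetical node labels

model = ToyGraphSBM(n_features=d, n_hidden=16, n_communities=4, n_classes=2)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(100):
    z, adj_logits, logits = model(x, adj_norm)
    loss = F.binary_cross_entropy_with_logits(adj_logits, adj) \
           + F.cross_entropy(logits, labels)          # structure + function jointly
    opt.zero_grad(); loss.backward(); opt.step()
```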
Scalable software and models for large-scale extracellular recordings
The brain represents information about the world through the electrical activity of populations of neurons. By placing an electrode near a neuron that is firing (spiking), it is possible to detect the resulting extracellular action potential (EAP) that is transmitted down an axon to other neurons. In this way, it is possible to monitor the communication of a group of neurons to uncover how they encode and transmit information. As the number of recorded neurons continues to increase, however, so do the data processing and analysis challenges. It is crucial that scalable software and analysis tools are developed and made available to the neuroscience community to keep up with the large amounts of data that are already being gathered.

This thesis is composed of three pieces of work which I develop in order to better process and analyze large-scale extracellular recordings. My work spans all stages of extracellular analysis, from the processing of raw electrical recordings to the development of statistical models to reveal underlying structure in neural population activity.
In the first work, I focus on developing software to improve the comparison and adoption of different computational approaches for spike sorting. When analyzing neural recordings, most researchers are interested in the spiking activity of individual neurons, which must be extracted from the raw electrical traces through a process called spike sorting. Much development has been directed towards improving the performance and automation of spike sorting. This continuous development, while essential, has contributed to an over-saturation of new, incompatible tools that hinders rigorous benchmarking and complicates reproducible analysis. To address these limitations, I develop SpikeInterface, an open-source Python framework designed to unify pre-existing spike sorting technologies into a single toolkit and to facilitate straightforward benchmarking of different approaches. With this framework, I demonstrate that modern, automated spike sorters have low agreement when analyzing the same dataset, i.e. they find different numbers of neurons with different activity profiles. This result holds true for a variety of simulated and real datasets. Also, I demonstrate that utilizing a consensus-based approach to spike sorting, where the outputs of multiple spike sorters are combined, can dramatically reduce the number of falsely detected neurons.
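A minimal numpy sketch of the kind of agreement measure that underlies such comparisons: two sorters' spike trains for a putative unit are matched within a coincidence window, and agreement is the fraction of matched spikes. The window and spike times below are illustrative assumptions, and this is not SpikeInterface's implementation:

```python
import numpy as np

def agreement_score(train_a, train_b, window=0.4e-3):
    """Fraction of spikes matched between two spike trains (times in seconds),
    counting a match when two spikes fall within the coincidence window."""
    train_a, train_b = np.sort(train_a), np.sort(train_b)
    matched = 0
    j = 0
    for t in train_a:
        # advance through train_b until we reach the window around t
        while j < len(train_b) and train_b[j] < t - window:
            j += 1
        if j < len(train_b) and abs(train_b[j] - t) <= window:
            matched += 1
            j += 1  # each spike in train_b is matched at most once
    return matched / (len(train_a) + len(train_b) - matched)

sorter1 = np.array([0.010, 0.052, 0.110, 0.300])
sorter2 = np.array([0.0102, 0.0521, 0.200, 0.2998])
print(f"agreement = {agreement_score(sorter1, sorter2):.2f}")
```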
In the second work, I focus on developing an unsupervised machine learning approach for determining the source location of individually detected spikes that are recorded by high-density microelectrode arrays. By localizing the source of individual spikes, my method is able to determine the approximate position of the recorded neurons in relation to the microelectrode array. To allow my model to work with large-scale datasets, I utilize deep neural networks, a family of machine learning algorithms that can be trained to approximate complicated functions in a scalable fashion. I evaluate my method on both simulated and real extracellular datasets, demonstrating that it is more accurate than other commonly used methods. Also, I show that location estimates for individual spikes can be utilized to improve the efficiency and accuracy of spike sorting. After training, my method allows for localization of one million spikes in approximately 37 seconds on a TITAN X GPU, enabling real-time analysis of massive extracellular datasets.
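For context, one commonly used baseline for spike localization is a center-of-mass estimate over per-channel peak amplitudes; a minimal sketch follows. The channel layout and amplitudes are made up, and this is not the deep-learning method described above:

```python
import numpy as np

def center_of_mass(channel_positions, peak_amplitudes):
    """Estimate a spike's source position as the amplitude-weighted average
    of the positions of the channels on which it was detected."""
    weights = np.abs(peak_amplitudes)
    return (channel_positions * weights[:, None]).sum(axis=0) / weights.sum()

# Hypothetical 4-channel patch of a high-density probe: (x, y) positions in micrometres
# and the peak amplitude of one spike on each channel (illustrative values).
positions = np.array([[0.0, 0.0], [20.0, 0.0], [0.0, 20.0], [20.0, 20.0]])
amplitudes = np.array([-80.0, -35.0, -30.0, -15.0])

print("estimated source position (um):", center_of_mass(positions, amplitudes))
```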
In my third and final presented work, I focus on developing an unsupervised machine learning model that can uncover patterns of activity from neural populations associated with a behaviour being performed. Specifically, I introduce Targeted Neural Dynamical Modelling (TNDM), a statistical model that jointly models the neural activity and any external behavioural variables. TNDM decomposes neural dynamics (i.e. temporal activity patterns) into behaviourally relevant and behaviourally irrelevant dynamics; the behaviourally relevant dynamics constitute all activity patterns required to generate the behaviour of interest, while behaviourally irrelevant dynamics may be completely unrelated (e.g. other behavioural or brain states), or even related to behaviour execution (e.g. dynamics that are associated with behaviour generally but are not task specific). Again, I implement TNDM using a deep neural network to improve its scalability and expressivity. On synthetic data and on real recordings from the premotor (PMd) and primary motor cortex (M1) of a monkey performing a center-out reaching task, I show that TNDM is able to extract low-dimensional neural dynamics that are highly predictive of behaviour without sacrificing its fit to the neural data.
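To make the decomposition idea concrete, here is a heavily simplified PyTorch sketch of a joint objective in which one latent block must both reconstruct the neural activity (together with an "irrelevant" block) and predict behaviour on its own. It is a toy illustration under assumed shapes and linear read-outs, not the TNDM architecture:

```python
import torch
import torch.nn as nn

# Assumed toy dimensions: trials x time x neurons, and a 2-D behavioural signal.
n_trials, n_time, n_neurons, n_beh = 64, 50, 30, 2
rel_dim, irr_dim = 3, 3  # behaviourally relevant / irrelevant latent dimensions

neural = torch.randn(n_trials, n_time, n_neurons)   # placeholder spike-rate data
behaviour = torch.randn(n_trials, n_time, n_beh)    # placeholder behaviour (e.g. hand velocity)

encoder = nn.Linear(n_neurons, rel_dim + irr_dim)         # toy encoder to the two latent blocks
neural_readout = nn.Linear(rel_dim + irr_dim, n_neurons)  # neural reconstruction uses both blocks
behaviour_readout = nn.Linear(rel_dim, n_beh)             # behaviour decoded from the relevant block only

params = list(encoder.parameters()) + list(neural_readout.parameters()) + list(behaviour_readout.parameters())
opt = torch.optim.Adam(params, lr=1e-2)
mse = nn.MSELoss()

for _ in range(200):
    z = encoder(neural)                  # (trials, time, rel + irr)
    z_rel = z[..., :rel_dim]
    neural_hat = neural_readout(z)
    behaviour_hat = behaviour_readout(z_rel)
    loss = mse(neural_hat, neural) + mse(behaviour_hat, behaviour)  # joint objective
    opt.zero_grad(); loss.backward(); opt.step()
```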
The effect of different scripting methods on the process and outcomes of game-based collaborative language learning
There has been growing interest in game-based language learning, but it has mainly targeted vocabulary and grammar, which are more superficial linguistic skills, rather than communicative skills. Collaboration, on the other hand, has also been seen as pedagogically beneficial, although it remains an open question to what extent teacher support in the form of scripting is optimal. Addressing these gaps, this mainly quasi-experimental study was implemented in an English as a second language lesson to examine whether role assignment (microscripting) in a game-based collaboration would yield superior results compared with a condition without such a method (macroscripting). Specifically, a narratively rich role-playing game (RPG) was used in the game-based learning phase because of its suitability for language learning, followed by literature circle collaboration, which is known for its capacity to foster not only reading skill but also the affective dimension of learning. Inferential statistics showed that groups treated with microscripting achieved superior reading comprehension, collaborative learning interest and empathy scores. Meanwhile, content analysis revealed that groups assisted by macroscripting reached higher levels of knowledge construction in their collaboration. The findings, discussion and conclusions of this study extend the field of game-based collaborative language learning and carry implications for similar future research.