
    Assessing performance of artificial neural networks and re-sampling techniques for healthcare datasets.

    Re-sampling methods for class imbalance have been shown to improve classification accuracy by mitigating the bias introduced by differences in class size. However, a model that applies a specific re-sampling technique before artificial neural network (ANN) training may not be equally suitable for the varied datasets found in the healthcare industry. Five healthcare-related datasets were used across three re-sampling conditions: under-sampling, over-sampling and combi-sampling. Within each condition, different algorithmic approaches were applied to each dataset and the results were statistically analysed for a significant difference in ANN performance. In the combi-sampling condition, four of the five datasets showed no significant consistency in the optimal re-sampling technique between the f1-score and the Area Under the Receiver Operating Characteristic Curve evaluation methods. In contrast, in the over-sampling and under-sampling conditions all five datasets put forward the same optimal algorithmic approach across both evaluation methods. Furthermore, the optimal combi-sampling technique (under-sampling, over-sampling and convergence point) was consistent across evaluation measures in only two of the five datasets. This study shows that ANN performance on datasets from the same industry can diverge in two ways: the same re-sampling technique can produce varying ANN performance on different datasets, and different re-sampling techniques can produce varying ANN performance on the same dataset.
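The under- and over-sampling conditions described above can be illustrated with a small, stdlib-only sketch (the function and variable names here are hypothetical; a real study would more likely use a dedicated library such as imbalanced-learn): random over-sampling duplicates minority-class rows until the classes balance, while random under-sampling discards majority-class rows instead.

```python
import random
from collections import Counter

def resample(X, y, mode="over", seed=0):
    """Randomly over- or under-sample a labelled dataset.

    mode="over"  duplicates minority-class rows until classes balance;
    mode="under" discards majority-class rows instead.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    by_class = {c: [x for x, lab in zip(X, y) if lab == c] for c in counts}
    target = max(counts.values()) if mode == "over" else min(counts.values())
    X_out, y_out = [], []
    for c, rows in by_class.items():
        if len(rows) < target:      # minority class: sample with replacement
            rows = rows + [rng.choice(rows) for _ in range(target - len(rows))]
        elif len(rows) > target:    # majority class: sample without replacement
            rows = rng.sample(rows, target)
        X_out.extend(rows)
        y_out.extend([c] * target)
    return X_out, y_out

# Imbalanced toy data: six negatives, two positives.
X = [[i] for i in range(8)]
y = [0, 0, 0, 0, 0, 0, 1, 1]
Xo, yo = resample(X, y, mode="over")   # both classes grow to 6 rows
Xu, yu = resample(X, y, mode="under")  # both classes shrink to 2 rows
```

Combi-sampling, as studied in the abstract, would apply both operations toward an intermediate class size (the "convergence point") rather than balancing at either extreme.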

    Identifying and responding to people with mild learning disabilities in the probation service

    It has long been recognised that, like many other individuals, people with learning disabilities find their way into the criminal justice system. This fact is not disputed. What has been disputed, however, is the extent to which those with learning disabilities are represented within the various agencies of the criminal justice system and the ways in which the criminal justice system (and society) should address this. Recently, social and legislative confusion over the best way to deal with offenders with learning disabilities and mental health problems has meant that the waters have become even more muddied. Despite current government uncertainty concerning the best way to support offenders with learning disabilities, the probation service is likely to continue to play a key role in the supervision of such offenders. The three studies contained herein aim to clarify the extent to which those with learning disabilities are represented in the probation service, to examine the effectiveness of probation for them and to explore some of the ways in which probation could be adapted to fit their needs. Study 1 and Study 2 showed that around 10% of offenders on probation in Kent appeared to have an IQ below 75, putting them in the bottom 5% of the general population. Study 3 was designed to assess some of the support needs of those with learning disabilities in the probation service, finding that many of the materials used by the probation service are likely to be too complex for those with learning disabilities to use effectively. To address this, a model for service provision is tentatively suggested. This is based on the findings of the three studies and a pragmatic assessment of what the probation service is likely to be capable of achieving in the near future.

    Machine learning and large scale cancer omic data: decoding the biological mechanisms underpinning cancer

    Many of the mechanisms underpinning cancer risk and tumorigenesis are still not fully understood. However, the next-generation sequencing revolution and the rapid advances in big data analytics allow us to study cells and complex phenotypes at unprecedented depth and breadth. While experimental and clinical data are still fundamental to validate findings and confirm hypotheses, computational biology is key for the analysis of system- and population-level data, for the detection of hidden patterns and for the generation of testable hypotheses. In this work, I tackle two main questions regarding cancer risk and tumorigenesis that require novel computational methods for the analysis of system-level omic data. First, I focused on how frequent, low-penetrance inherited variants modulate cancer risk in the broader population. Genome-Wide Association Studies (GWAS) have shown that Single Nucleotide Polymorphisms (SNPs) contribute to cancer risk with multiple subtle effects, but they are still failing to give further insight into their synergistic effects. I developed a novel hierarchical Bayesian regression model, BAGHERA, to estimate heritability at the gene level from GWAS summary statistics. I then used BAGHERA to analyse data from 38 malignancies in the UK Biobank. I showed that genes with high heritable risk are involved in key processes associated with cancer and often coincide with somatically mutated driver genes. Heritability analyses, like many other omics methods, study the effects of DNA variants on single genes in isolation. However, we know that most biological processes require the interplay of multiple genes, and we often lack a broad perspective on them. For the second part of this thesis, I therefore worked on the integration of Protein-Protein Interaction (PPI) graphs and omics data, which bridges this gap and recapitulates these interactions at a system level.
First, I developed a modular and scalable Python package, PyGNA, that enables robust statistical testing of genesets' topological properties. PyGNA complements the literature with a tool that can be routinely introduced in automated bioinformatics pipelines. With PyGNA I processed multiple genesets obtained from genomics and transcriptomics data. However, topological properties alone have proven insufficient to fully characterise complex phenotypes. Therefore, I focused on a model that combines topological and functional data to detect multiple communities associated with a phenotype. Detecting cancer-specific submodules is still an open problem, but it has the potential to elucidate mechanisms detectable only by integrating multi-omics data. Building on the recent advances in Graph Neural Networks (GNN), I present a supervised geometric deep learning model that combines GNNs and Stochastic Block Models (SBM). The model is able to learn multiple graph-aware representations, as multiple joint SBMs, of the attributed network, accounting for nodes participating in multiple processes. The simultaneous estimation of structure and function provides an interpretable picture of how genes interact in specific conditions and allows the detection of novel putative pathways associated with cancer.
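The kind of topological test described above can be sketched as a simple permutation test (this is an illustrative toy, not PyGNA's actual API; all names here are hypothetical): ask whether a geneset has more internal edges in a PPI network than same-sized gene sets drawn at random.

```python
import random
from itertools import combinations

def internal_edges(geneset, edges):
    """Count network edges with both endpoints inside the geneset."""
    s = set(geneset)
    return sum(1 for u, v in edges if u in s and v in s)

def connectivity_pvalue(geneset, nodes, edges, n_perm=1000, seed=0):
    """Empirical p-value: is the geneset more internally connected
    than random gene sets of the same size drawn from the network?"""
    rng = random.Random(seed)
    observed = internal_edges(geneset, edges)
    k = len(geneset)
    hits = sum(
        internal_edges(rng.sample(nodes, k), edges) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)

# Toy PPI network: a tight clique A-B-C-D plus a sparse periphery.
nodes = list("ABCDEFGHIJ")
edges = list(combinations("ABCD", 2)) + [("E", "F"), ("G", "H"), ("I", "J")]
p = connectivity_pvalue(list("ABCD"), nodes, edges)  # clique is significant
```

A geneset corresponding to a real functional module would, like the clique here, sit far in the tail of the null distribution of internal connectivity.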

    Scalable software and models for large-scale extracellular recordings

    The brain represents information about the world through the electrical activity of populations of neurons. By placing an electrode near a neuron that is firing (spiking), it is possible to detect the resulting extracellular action potential (EAP) that is transmitted down an axon to other neurons. In this way, it is possible to monitor the communication of a group of neurons to uncover how they encode and transmit information. As the number of recorded neurons continues to increase, however, so do the data processing and analysis challenges. It is crucial that scalable software and analysis tools are developed and made available to the neuroscience community to keep up with the large amounts of data that are already being gathered. This thesis is composed of three pieces of work which I develop in order to better process and analyze large-scale extracellular recordings. My work spans all stages of extracellular analysis, from the processing of raw electrical recordings to the development of statistical models to reveal underlying structure in neural population activity. In the first work, I focus on developing software to improve the comparison and adoption of different computational approaches for spike sorting. When analyzing neural recordings, most researchers are interested in the spiking activity of individual neurons, which must be extracted from the raw electrical traces through a process called spike sorting. Much development has been directed towards improving the performance and automation of spike sorting. This continuous development, while essential, has contributed to an over-saturation of new, incompatible tools that hinders rigorous benchmarking and complicates reproducible analysis. To address these limitations, I develop SpikeInterface, an open-source Python framework designed to unify preexisting spike sorting technologies into a single toolkit and to facilitate straightforward benchmarking of different approaches.
With this framework, I demonstrate that modern, automated spike sorters have low agreement when analyzing the same dataset, i.e. they find different numbers of neurons with different activity profiles. This result holds true for a variety of simulated and real datasets. Also, I demonstrate that utilizing a consensus-based approach to spike sorting, where the outputs of multiple spike sorters are combined, can dramatically reduce the number of falsely detected neurons. In the second work, I focus on developing an unsupervised machine learning approach for determining the source location of individually detected spikes that are recorded by high-density microelectrode arrays. By localizing the source of individual spikes, my method is able to determine the approximate position of the recorded neurons in relation to the microelectrode array. To allow my model to work with large-scale datasets, I utilize deep neural networks, a family of machine learning algorithms that can be trained to approximate complicated functions in a scalable fashion. I evaluate my method on both simulated and real extracellular datasets, demonstrating that it is more accurate than other commonly used methods. Also, I show that location estimates for individual spikes can be utilized to improve the efficiency and accuracy of spike sorting. After training, my method allows for localization of one million spikes in approximately 37 seconds on a TITAN X GPU, enabling real-time analysis of massive extracellular datasets. In my third and final presented work, I focus on developing an unsupervised machine learning model that can uncover patterns of activity from neural populations associated with a behaviour being performed. Specifically, I introduce Targeted Neural Dynamical Modelling (TNDM), a statistical model that jointly models the neural activity and any external behavioural variables. TNDM decomposes neural dynamics (i.e.
temporal activity patterns) into behaviourally relevant and behaviourally irrelevant dynamics; the behaviourally relevant dynamics constitute all activity patterns required to generate the behaviour of interest, while behaviourally irrelevant dynamics may be completely unrelated (e.g. other behavioural or brain states), or even related to behaviour execution (e.g. dynamics that are associated with behaviour generally but are not task specific). Again, I implement TNDM using a deep neural network to improve its scalability and expressivity. On synthetic data and on real recordings from the premotor (PMd) and primary motor cortex (M1) of a monkey performing a center-out reaching task, I show that TNDM is able to extract low-dimensional neural dynamics that are highly predictive of behaviour without sacrificing its fit to the neural data.
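The consensus idea described in this abstract, combining the outputs of multiple spike sorters and discarding units that only one sorter finds, can be sketched with hypothetical data (this is an illustrative toy, not SpikeInterface's implementation; all names are made up): two units are matched when a large fraction of their spike times agree within a small tolerance.

```python
def agreement(train_a, train_b, tol=3):
    """Fraction of matched spikes between two spike trains (sample
    indices), counting a match when times differ by at most tol."""
    matched, j = 0, 0
    b = sorted(train_b)
    for t in sorted(train_a):
        while j < len(b) and b[j] < t - tol:
            j += 1
        if j < len(b) and abs(b[j] - t) <= tol:
            matched += 1
            j += 1
    return matched / max(len(train_a), len(train_b))

def consensus_units(sorter_a, sorter_b, min_agreement=0.5, tol=3):
    """Keep unit pairs whose spike trains agree above a threshold;
    units found by only one sorter are treated as likely false positives."""
    kept = []
    for ua, ta in sorter_a.items():
        for ub, tb in sorter_b.items():
            if agreement(ta, tb, tol) >= min_agreement:
                kept.append((ua, ub))
    return kept

# Two hypothetical sorter outputs over the same recording:
# a1 and b1 describe the same neuron; a2 and b9 are unmatched.
sorter_a = {"a1": [100, 200, 300, 400], "a2": [150, 250]}
sorter_b = {"b1": [101, 199, 302, 401], "b9": [700, 800, 900]}
pairs = consensus_units(sorter_a, sorter_b)  # only (a1, b1) survives
```

Scaling this pairwise matching to many sorters, and resolving one-to-many matches, is what makes the full consensus problem non-trivial.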

    The effect of different scripting methods on the process and outcomes of game-based collaborative language learning

    Abstract. There has been growing interest in game-based language learning, but it has mainly targeted superficial linguistic skills such as vocabulary and grammar rather than communicative skills. Collaboration has likewise been seen as pedagogically beneficial, though it remains an open question to what extent teacher support in the form of scripting is optimal. Addressing these gaps, this mainly quasi-experimental study was implemented in an English as a second language lesson to examine whether role assignment (microscripting) in game-based collaboration would yield better results than a condition without such a method (macroscripting). Specifically, a narratively rich role-playing game (RPG) was used in the game-based learning phase because of its suitability for language learning, followed by literature circle collaboration, which is renowned for fostering not only reading skill but also the affective dimension of learning. Inferential statistics showed that groups treated with microscripting achieved superior reading comprehension, collaborative learning interest and empathy scores. Meanwhile, content analysis revealed that groups assisted by macroscripting reached higher levels of knowledge construction in their collaboration. The findings, discussion and conclusion of this study extend the field of game-based collaborative language learning and carry implications for similar future research.