Multi-view machine learning methods to uncover brain-behaviour associations
The heterogeneity of neurological and mental disorders has been a key confound in disease understanding and treatment outcome prediction, as the study of patient populations typically includes multiple subgroups that do not align with the diagnostic categories. The aim of this thesis is to investigate and extend classical multivariate methods, such as Canonical Correlation Analysis (CCA), and latent variable models, e.g., Group Factor Analysis (GFA), to uncover associations between brain and behaviour that may characterize patient populations and subgroups of patients. In the first contribution of this thesis, we applied CCA to investigate brain-behaviour associations in a sample of healthy and depressed adolescents and young adults. We found two positive-negative brain-behaviour modes of covariation, capturing externalisation/internalisation symptoms and well-being/distress. In the second contribution of the thesis, I applied sparse CCA to the same dataset to present a regularised approach to investigate brain-behaviour associations in high-dimensional datasets. Here, I compared two approaches to optimise the regularisation parameters of sparse CCA and showed that the choice of the optimisation strategy might have an impact on the results. In the third contribution, I extended the GFA model to mitigate some limitations of CCA, such as handling missing data. I applied the extended GFA model to investigate links between high-dimensional brain imaging and non-imaging data from the Human Connectome Project, and to predict non-imaging measures from brain functional connectivity. The results were consistent between complete and incomplete data, and replicated previously reported findings. In the final contribution of this thesis, I proposed two extensions of GFA to uncover brain-behaviour associations that characterize subgroups of subjects in an unsupervised and supervised way, as well as to explore within-group variability at the individual level. These extensions were demonstrated using a dataset of patients with genetic frontotemporal dementia. In summary, this thesis presents multi-view methods that can be used to deepen our understanding of the latent dimensions of disease in mental/neurological disorders and potentially enable patient stratification.
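As an illustration of the classical CCA that this abstract builds on, the following is a minimal sketch, not the thesis's implementation. It finds paired weight vectors for two views (here called `brain` and `behaviour`) that maximise the correlation between the projected views. The function name `cca`, the small `ridge` term added for numerical stability, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def cca(X, Y, n_modes=2, ridge=1e-6):
    """Classical CCA via whitening and SVD (illustrative sketch).

    Returns weights for each view and the canonical correlations.
    `ridge` is a small stabiliser added to the covariances (an assumption
    of this sketch, not part of textbook CCA).
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Cxx = X.T @ X / (n - 1) + ridge * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + ridge * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root of a symmetric positive-definite matrix.
        vals, vecs = np.linalg.eigh(C)
        return vecs @ np.diag(vals ** -0.5) @ vecs.T

    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Cxy @ Ky)
    Wx = Kx @ U[:, :n_modes]
    Wy = Ky @ Vt.T[:, :n_modes]
    return Wx, Wy, s[:n_modes]

# Synthetic two-view data sharing one latent factor (hypothetical example data).
rng = np.random.default_rng(0)
n = 500
latent = rng.standard_normal((n, 1))
brain = latent @ rng.standard_normal((1, 10)) + 0.5 * rng.standard_normal((n, 10))
behaviour = latent @ rng.standard_normal((1, 4)) + 0.5 * rng.standard_normal((n, 4))

Wx, Wy, corrs = cca(brain, behaviour)
# Empirical correlation of the first pair of canonical variates.
u = (brain - brain.mean(axis=0)) @ Wx[:, 0]
v = (behaviour - behaviour.mean(axis=0)) @ Wy[:, 0]
first_mode_corr = np.corrcoef(u, v)[0, 1]
print("canonical correlations:", corrs)
```

Because this sketch inverts the within-view covariances, it is only suitable when there are many more samples than features. High-dimensional settings, like those described in the abstract, call for regularised or sparse variants.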
Unsupervised discovery of temporal sequences in high-dimensional datasets, with applications to neuroscience.
Identifying low-dimensional features that describe large-scale neural recordings is a major challenge in neuroscience. Repeated temporal patterns (sequences) are thought to be a salient feature of neural dynamics, but are not succinctly captured by traditional dimensionality reduction techniques. Here, we describe a software toolbox, called seqNMF, with new methods for extracting informative, non-redundant sequences from high-dimensional neural data, testing the significance of these extracted patterns, and assessing the prevalence of sequential structure in data. We test these methods on simulated data under multiple noise conditions, and on several real neural and behavioral datasets. In hippocampal data, seqNMF identifies neural sequences that match those calculated manually by reference to behavioral events. In songbird data, seqNMF discovers neural sequences in untutored birds that lack stereotyped songs. Thus, by identifying temporal structure directly from neural data, seqNMF enables dissection of complex neural circuits without relying on temporal references from stimuli or behavioral outputs.
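To make concrete why ordinary dimensionality reduction misses sequences, the sketch below shows a convolutive factorization, in which each factor is a short spatiotemporal template placed at its event times. The abstract does not spell out the model, so the convolutive form, the function name `reconstruct`, and the toy arrays `W` and `H` are assumptions of this example and do not reproduce the toolbox's API or algorithm.

```python
import numpy as np

def reconstruct(W, H):
    """Convolutive reconstruction: X_hat[n, t] = sum_k sum_l W[n, k, l] * H[k, t - l].

    W: (neurons, factors, lags) templates; H: (factors, time) event times.
    """
    N, K, L = W.shape
    T = H.shape[1]
    X_hat = np.zeros((N, T))
    for lag in range(L):
        # Shift H right by `lag` time bins, padding with zeros on the left.
        shifted = np.zeros_like(H)
        shifted[:, lag:] = H[:, :T - lag]
        X_hat += W[:, :, lag] @ shifted
    return X_hat

# Toy example: one factor in which neurons 0, 1, 2 fire at lags 0, 1, 2.
W = np.zeros((3, 1, 3))
W[0, 0, 0] = W[1, 0, 1] = W[2, 0, 2] = 1.0

# The sequence occurs twice, starting at time bins 2 and 7.
H = np.zeros((1, 12))
H[0, 2] = H[0, 7] = 1.0

X_hat = reconstruct(W, H)
print(X_hat.astype(int))
```

A single template with two event times describes both occurrences of the sequence. A non-convolutive factorization would need a separate component for each time lag.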
A Quadratically Regularized Functional Canonical Correlation Analysis for Identifying the Global Structure of Pleiotropy with NGS Data
Investigating the pleiotropic effects of genetic variants can increase
statistical power, provide important information to achieve deep understanding
of the complex genetic structures of disease, and offer powerful tools for
designing effective treatments with fewer side effects. However, the current
multiple phenotype association analysis paradigm lacks breadth (number of
phenotypes and genetic variants jointly analyzed at the same time) and depth
(hierarchical structure of phenotype and genotypes). A key issue for high
dimensional pleiotropic analysis is to effectively extract informative internal
representation and features from high dimensional genotype and phenotype data.
To explore multiple levels of representations of genetic variants, learn their
internal patterns involved in the disease development, and overcome critical
barriers in advancing the development of novel statistical methods and
computational algorithms for genetic pleiotropic analysis, we proposed a new
framework referred to as a quadratically regularized functional CCA (QRFCCA)
for association analysis which combines three approaches: (1) quadratically
regularized matrix factorization, (2) functional data analysis and (3)
canonical correlation analysis (CCA). Large-scale simulations show that the
QRFCCA has a much higher power than that of the nine competing statistics while
retaining the appropriate type 1 errors. To further evaluate performance, the
QRFCCA and nine other statistics are applied to the whole genome sequencing
dataset from the TwinsUK study. We identify a total of 79 genes with rare
variants and 67 genes with common variants significantly associated with the 46
traits using QRFCCA. The results show that the QRFCCA substantially outperforms
the nine other statistics.
Comment: 64 pages including 12 figures
Unravelling the dynamics of learning design within and between disciplines in higher education using learning analytics
Designing effective learning experiences in virtual learning environments (VLEs) can be supported by learning analytics (LA) through explicit feedback on how learning design (LD) influences students’ engagement, satisfaction and performance. Marrying LA with LD not only puts existing pedagogical theories in instructional design to the test with actual learning data, but also provides the context of learning, which helps educators translate established LA findings into direct interventions. My dissertation aims to unpack the complexity of LD and its impact on students’ engagement, satisfaction and performance in VLEs using LA. The context of this study is 400+ online and blended learning modules at the Open University (OU) UK. This research combines multiple sources of data: the OU Learning Design Initiative (OULDI), system log data, self-reported surveys, and performance data. Given the scope of this study, a wide range of visualization techniques, social network analysis, multi-level modelling, and machine learning will be used.