Removing the influence of a group variable in high-dimensional predictive modelling
In many application areas, predictive models are used to support or make
important decisions. There is increasing awareness that these models may
contain spurious or otherwise undesirable correlations. Such correlations may
arise from a variety of sources, including batch effects, systematic
measurement errors, or sampling bias. Without explicit adjustment, machine
learning algorithms trained using these data can produce poor out-of-sample
predictions which propagate these undesirable correlations. We propose a method
to pre-process the training data, producing an adjusted dataset that is
statistically independent of the nuisance variables with minimum information
loss. We develop a conceptually simple approach for creating an adjusted
dataset in high-dimensional settings based on a constrained form of matrix
decomposition. The resulting dataset can then be used in any predictive
algorithm with the guarantee that predictions will be statistically independent
of the group variable. We develop a scalable algorithm for implementing the
method, along with theoretical support in the form of independence guarantees and
optimality. The method is illustrated on some simulation examples and applied
to two case studies: removing machine-specific correlations from brain scan
data, and removing race and ethnicity information from a dataset used to
predict recidivism. That the motivation for removing undesirable correlations
is quite different in the two applications illustrates the broad applicability
of our approach.
Comment: Update. 18 pages, 3 figures
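As a rough illustration of what "removing the influence of a group variable" means in the simplest case, the sketch below centres every feature within each group. This is not the constrained matrix decomposition proposed in the abstract; it is plain group-mean residualisation, which removes only mean differences between groups (not full statistical dependence). The function name, the data, and the group labels are hypothetical.

```python
def center_within_groups(rows, groups):
    """Subtract the per-group mean of each feature from every row.

    rows:   list of feature vectors (lists of floats)
    groups: list of group labels, one per row
    After adjustment, every feature has mean zero inside every group,
    so group membership no longer predicts the feature means.
    """
    n_features = len(rows[0])
    totals, counts = {}, {}
    for row, g in zip(rows, groups):
        acc = totals.setdefault(g, [0.0] * n_features)
        for j, value in enumerate(row):
            acc[j] += value
        counts[g] = counts.get(g, 0) + 1
    return [
        [value - totals[g][j] / counts[g] for j, value in enumerate(row)]
        for row, g in zip(rows, groups)
    ]


# Hypothetical data: two features measured on two scanners.
rows = [[1.0, 10.0], [3.0, 14.0], [6.0, 20.0], [8.0, 26.0]]
groups = ["scanner_a", "scanner_a", "scanner_b", "scanner_b"]
adjusted = center_within_groups(rows, groups)
```

In this toy example the scanner offset disappears from both features, while the within-scanner variation is kept.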
Robust correlation analyses: false positive and power validation using a new open source Matlab toolbox
Pearson’s correlation measures the strength of the association between two variables. The technique is, however, restricted to linear associations and is overly sensitive to outliers. Indeed, a single outlier can result in a highly inaccurate summary of the data. Yet, it remains the most commonly used measure of association in psychology research. Here we describe a free Matlab(R)-based toolbox (http://sourceforge.net/projects/robustcorrtool/) that computes robust measures of association between two or more random variables: the percentage-bend correlation and skipped-correlations. After illustrating how to use the toolbox, we show that robust methods, where outliers are down-weighted or removed and accounted for in significance testing, provide better estimates of the true association with accurate false positive control and without loss of power. The different correlation methods were tested with normal data and normal data contaminated with marginal or bivariate outliers. We report estimates of effect size, false positive rate and power, and advise on which technique to use depending on the data at hand.
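To make the down-weighting idea concrete, here is a minimal Python sketch of a percentage-bend correlation, following the definition commonly attributed to Wilcox. It is not code from the Matlab toolbox; the function names, the bending constant `beta=0.2`, and the data are assumptions chosen for illustration.

```python
import math
from statistics import median


def _omega(values, beta):
    # Scale estimate: the m-th smallest absolute deviation from the median.
    med = median(values)
    deviations = sorted(abs(v - med) for v in values)
    m = max(1, math.floor((1 - beta) * len(values)))
    return deviations[m - 1]


def _bend(values, beta):
    # Standardise around a robust location, then clip to [-1, 1].
    med, omega = median(values), _omega(values, beta)
    psi = [(v - med) / omega for v in values]
    low = sum(1 for p in psi if p < -1)
    high = sum(1 for p in psi if p > 1)
    kept = sum(v for v, p in zip(values, psi) if -1 <= p <= 1)
    location = (kept + omega * (high - low)) / (len(values) - low - high)
    return [max(-1.0, min(1.0, (v - location) / omega)) for v in values]


def percentage_bend_correlation(x, y, beta=0.2):
    a, b = _bend(x, beta), _bend(y, beta)
    num = sum(ai * bi for ai, bi in zip(a, b))
    return num / math.sqrt(sum(ai * ai for ai in a) * sum(bi * bi for bi in b))


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: a perfect positive trend with one gross outlier in y.
x = [float(v) for v in range(1, 11)]
y = x[:-1] + [-50.0]
r_pearson = pearson(x, y)
r_bend = percentage_bend_correlation(x, y)
```

With this single outlier, Pearson's coefficient turns negative while the percentage-bend estimate stays positive, which is the sensitivity the abstract describes.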
Permutation Inference for Canonical Correlation Analysis
Canonical correlation analysis (CCA) has become a key tool for population
neuroimaging, allowing investigation of associations between many imaging and
non-imaging measurements. As other variables are often a source of variability
not of direct interest, previous work has used CCA on residuals from a model
that removes these effects, then proceeded directly to permutation inference.
We show that such a simple permutation test leads to inflated error rates. The
reason is that residualisation introduces dependencies among the observations
that violate the exchangeability assumption. Even in the absence of nuisance
variables, however, a simple permutation test for CCA also leads to excess
error rates for all canonical correlations other than the first. The reason is
that a simple permutation scheme does not ignore the variability already
explained by previous canonical variables. Here we propose solutions for both
problems: in the case of nuisance variables, we show that transforming the
residuals to a lower dimensional basis where exchangeability holds results in a
valid permutation test; for more general cases, with or without nuisance
variables, we propose estimating the canonical correlations in a stepwise
manner, removing at each iteration the variance already explained, while
dealing with different numbers of variables on both sides. We also discuss how
to address the multiplicity of tests, proposing an admissible test that is not
conservative, and provide a complete algorithm for permutation inference for
CCA.
Comment: 49 pages, 2 figures, 10 tables, 3 algorithms, 119 references
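The exchangeability argument can be sketched in a few lines of linear algebra. The notation below ($Z$, $R_Z$, $Q$, $\varepsilon$) is assumed for illustration and is not taken from the abstract; it shows one standard way a lower-dimensional basis can restore exchangeability, under the assumption of independent errors with common variance.

```latex
% Z: N x r matrix of nuisance variables (assumed full column rank)
% R_Z: residual-forming matrix, symmetric and idempotent with rank N - r
R_Z = I_N - Z (Z^\top Z)^{-1} Z^\top

% If the errors satisfy Cov(\varepsilon) = \sigma^2 I_N, the residualised
% errors are no longer spherical, so rows are not exchangeable:
\operatorname{Cov}(R_Z \varepsilon) = \sigma^2 R_Z \neq \sigma^2 I_N

% Because R_Z is a projection, it factors through an N x (N - r) matrix Q
% with orthonormal columns:
R_Z = Q Q^\top, \qquad Q^\top Q = I_{N-r}

% Projecting onto this basis gives N - r observations with spherical errors:
\operatorname{Cov}(Q^\top \varepsilon) = \sigma^2 Q^\top Q = \sigma^2 I_{N-r}
```

Under these assumptions, permuting the $N - r$ rows of the transformed data is justified in a way that permuting the $N$ rows of the raw residuals is not.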
Identification of gene pathways implicated in Alzheimer's disease using longitudinal imaging phenotypes with sparse regression
We present a new method for the detection of gene pathways associated with a
multivariate quantitative trait, and use it to identify causal pathways
associated with an imaging endophenotype characteristic of longitudinal
structural change in the brains of patients with Alzheimer's disease (AD). Our
method, known as pathways sparse reduced-rank regression (PsRRR), uses group
lasso penalised regression to jointly model the effects of genome-wide single
nucleotide polymorphisms (SNPs), grouped into functional pathways using prior
knowledge of gene-gene interactions. Pathways are ranked in order of importance
using a resampling strategy that exploits finite sample variability. Our
application study uses whole genome scans and MR images from 464 subjects in
the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. 66,182 SNPs
are mapped to 185 gene pathways from the KEGG pathways database. Voxel-wise
imaging signatures characteristic of AD are obtained by analysing 3D patterns
of structural change at 6, 12 and 24 months relative to baseline. High-ranking,
AD endophenotype-associated pathways in our study include those describing
chemokine, Jak-stat and insulin signalling pathways, and tight junction
interactions. All of these have been previously implicated in AD biology. In a
secondary analysis, we investigate SNPs and genes that may be driving pathway
selection, and identify a number of previously validated AD genes including
CR1, APOE and TOMM40.
Comparison of MRI Spectroscopy software packages performance and application on HCV-infected patients’ real data
Final Degree Projects in Biomedical Engineering. Faculty of Medicine and Health Sciences. Universitat de Barcelona. Academic year: 2022-2023. Tutor/Director: Sala Llonch, Roser; Laredo Gregorio, Carlos.
1H MRS is conceived as a pioneer methodology for brain metabolism inspection and health status
appraisal. Post-processing interventions are required to obtain explicit metabolite quantification
values from which to derive diagnosis. On the grounds of addressing and covering such operation,
multiple software packages have been recently developed and launched leading to an amorphous
assortment of spectroscopic image processing tools, with lack of standardization and regulation.
The current study therefore assesses the coherence and consistency of compound estimation
outputs, in terms of result variability, through intercorrelation and intracorrelation analyses between
four selected programs: LCModel, Osprey, TARQUIN, and the spant toolbox. The examination is
performed on a collection of SVS short-TE 3T SIEMENS PRESS spectroscopic acquisitions from
83 subjects, including healthy controls and HCV-infected patients receiving DAA treatment. The
analytical core of the project assesses software performance through the creation of a Python script
in order to automatically compute and display the results sought. The statistical tests providing
enough information to draw substantial conclusions stem from extraction of coefficient of
determination (R²), Pearson’s coefficient (r), and intraclass correlation coefficient (ICC) together
with representation of boxplots, rainclouds, and scatter plots easing data visualization. A clinical
implementation is also entailed on the same basis, whose purpose is to reveal actual DAA
treatment effect on HCV-infected patients by means of metabolite concentration alteration and
hypothetical restoration. The results show evident and concerning variability among MRS
platforms, which compromises the rigor, precision and standardization demanded in this discipline
because quantification results are inconsistent, although the inconsistencies do not appear to affect
medical determinations in a way that would jeopardize patients’ health. However, it would be
valuable to extend the analysis to a larger cohort of subjects to reach more robust conclusions.
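The three agreement statistics named above can be sketched in plain Python. This is not the thesis script: the abstract does not say which ICC form was used, so the two-way random-effects, absolute-agreement, single-measure form ICC(2,1) is assumed here, and the function names and concentration values are hypothetical.

```python
import math


def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def icc_2_1(ratings):
    """ICC(2,1) from a two-way ANOVA; ratings[i][j] is subject i, package j."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Hypothetical metabolite estimates from two packages on four subjects;
# package_b reports the same values shifted by a constant offset.
package_a = [1.0, 2.0, 3.0, 4.0]
package_b = [3.0, 4.0, 5.0, 6.0]

r = pearson_r(package_a, package_b)
r_squared = r ** 2  # coefficient of determination for a simple linear fit
icc = icc_2_1(list(zip(package_a, package_b)))
```

The constant offset leaves r and R² at 1 but pulls ICC(2,1) well below 1, which is why an absolute-agreement measure is a useful complement to correlation when comparing quantification packages.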
SEARCHING NEUROIMAGING BIOMARKERS IN MENTAL DISORDERS WITH GRAPH AND MULTIMODAL FUSION ANALYSIS OF FUNCTIONAL CONNECTIVITY
Mental disorders such as schizophrenia (SZ), bipolar disorder (BD), and major depressive disorder (MDD) can cause severe symptoms and life disruption. They share some symptoms, which can pose a major clinical challenge to their differentiation. Objective biomarkers based on neuroimaging may help to improve diagnostic accuracy and facilitate optimal treatment for patients. Over the last decades, non-invasive in-vivo neuroimaging techniques such as magnetic resonance imaging (MRI) have been increasingly applied to measure structure and function in human brains. With functional MRI (fMRI) or structural MRI (sMRI), studies have identified neurophysiological deficits in patients’ brains from different perspectives. Functional connectivity (FC) analysis is an approach that measures functional integration in brains. By assessing the temporal coherence of the hemodynamic activity among brain regions, FC is considered capable of characterizing the large-scale integrity of neural activity.
In this work, we present two data analysis frameworks for biomarker detection on brain imaging with FC, 1) graph analysis of FC and 2) multimodal fusion analysis, to better understand the human brain. Graph analysis reveals the interaction among brain regions based on graph theory, while the multimodal fusion framework enables us to utilize the strength of different imaging modalities through joint analysis. Four applications related to FC using these frameworks were developed. First, FC was estimated using a model-based approach, which revealed an altered small-world network structure in SZ. Second, we applied graph analysis on functional network connectivity (FNC) to differentiate BD and MDD during resting state. Third, two functional measures, FNC and fractional amplitude of low frequency fluctuations (fALFF), were spatially overlaid to compare the FC and spatial alterations in SZ. Finally, we utilized a multimodal fusion analysis framework, multi-set canonical correlation analysis + joint independent component analysis (mCCA+jICA), to link functional and structural abnormalities in BD and MDD. We also evaluated the accuracy of predictive diagnosis through classifiers generated on the selected features. In summary, via the two frameworks, our work has made several contributions to advance FC analysis, which improves our understanding of underlying brain function and structure, and our findings may ultimately be useful for the development of biomarkers of mental disease.
Dynamic fluctuations coincide with periods of high and low modularity in resting-state functional brain networks
We investigate the relationship of resting-state fMRI functional connectivity
estimated over long periods of time with time-varying functional connectivity
estimated over shorter time intervals. We show that using Pearson's correlation
to estimate functional connectivity implies that the range of fluctuations of
functional connections over short time scales is subject to statistical
constraints imposed by their connectivity strength over longer scales. We
present a method for estimating time-varying functional connectivity that is
designed to mitigate this issue and allows us to identify episodes where
functional connections are unexpectedly strong or weak. We apply this method to
data recorded from participants, and show that the number of
unexpectedly strong/weak connections fluctuates over time, and that these
variations coincide with intermittent periods of high and low modularity in
time-varying functional connectivity. We also find that during periods of
relative quiescence, regions associated with the default mode network tend to join
communities with attentional, control, and primary sensory systems. In
contrast, during periods where many connections are unexpectedly strong/weak,
default mode regions dissociate and form distinct modules. Finally, we go on to
show that, while all functional connections can at times manifest stronger
(more positively correlated) or weaker (more negatively correlated) than
expected, a small number of connections, mostly within the visual and
somatomotor networks, do so a disproportional number of times. Our statistical
approach allows the detection of functional connections that fluctuate more or
less than expected based on their long-time averages and may be of use in
future studies characterizing the spatio-temporal patterns of time-varying
functional connectivity.
Comment: 47 pages, 8 figures, 4 supplementary figures
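For orientation, the conventional sliding-window estimate of time-varying functional connectivity, which the abstract takes as its starting point, can be sketched as follows. This is not the authors' proposed method; the function names, window settings, and the two toy time series are assumptions for illustration.

```python
import math


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def sliding_window_correlation(x, y, width, step=1):
    """Pearson correlation of x and y inside each window of `width` samples."""
    starts = range(0, len(x) - width + 1, step)
    return [pearson(x[s:s + width], y[s:s + width]) for s in starts]


# Hypothetical regional time series: coupled in the first half,
# anti-coupled in the second half.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y = [0.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0, 0.0]

static_fc = pearson(x, y)  # estimate over the whole recording
windowed_fc = sliding_window_correlation(x, y, width=4, step=4)
```

In this toy case the whole-recording correlation is zero while the two windows are perfectly correlated and perfectly anti-correlated, so the short-window values have to be interpreted relative to the long-run estimate.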