
    Removing the influence of a group variable in high-dimensional predictive modelling

    In many application areas, predictive models are used to support or make important decisions. There is increasing awareness that these models may contain spurious or otherwise undesirable correlations. Such correlations may arise from a variety of sources, including batch effects, systematic measurement errors, or sampling bias. Without explicit adjustment, machine learning algorithms trained on these data can produce poor out-of-sample predictions that propagate the undesirable correlations. We propose a method to pre-process the training data, producing an adjusted dataset that is statistically independent of the nuisance variables with minimum information loss. We develop a conceptually simple approach for creating an adjusted dataset in high-dimensional settings based on a constrained form of matrix decomposition. The resulting dataset can then be used in any predictive algorithm with the guarantee that predictions will be statistically independent of the group variable. We develop a scalable algorithm for implementing the method, along with theoretical support in the form of independence guarantees and optimality. The method is illustrated on simulation examples and applied to two case studies: removing machine-specific correlations from brain scan data, and removing race and ethnicity information from a dataset used to predict recidivism. That the motivation for removing undesirable correlations is quite different in the two applications illustrates the broad applicability of our approach. Comment: Update. 18 pages, 3 figures

    Robust correlation analyses: false positive and power validation using a new open source Matlab toolbox

    Pearson’s correlation measures the strength of the association between two variables. The technique is, however, restricted to linear associations and is overly sensitive to outliers. Indeed, a single outlier can result in a highly inaccurate summary of the data. Yet it remains the most commonly used measure of association in psychology research. Here we describe a free Matlab(R)-based toolbox (http://sourceforge.net/projects/robustcorrtool/) that computes robust measures of association between two or more random variables: the percentage-bend correlation and skipped correlations. After illustrating how to use the toolbox, we show that robust methods, where outliers are down-weighted or removed and accounted for in significance testing, provide better estimates of the true association with accurate false positive control and without loss of power. The different correlation methods were tested with normal data and with normal data contaminated with marginal or bivariate outliers. We report estimates of effect size, false positive rate and power, and advise on which technique to use depending on the data at hand.
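The claim that a single outlier can badly distort Pearson's correlation can be checked with a small sketch. This is not the toolbox described above (which is Matlab-based and implements percentage-bend and skipped correlations); it swaps in Spearman's rank correlation as a simpler, generic robust alternative. The function names `pearson_r` and `spearman_rho` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's product-moment correlation between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on ranks.

    Ranks come from a double argsort, which is only valid when there
    are no tied values (true for the toy data below).
    """
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson_r(rank_x, rank_y)

# Toy data (hypothetical): a perfect linear relationship ...
x = np.arange(20, dtype=float)
y_clean = x.copy()

# ... and the same data with one contaminated observation.
y_outlier = x.copy()
y_outlier[-1] = -100.0

print(pearson_r(x, y_clean))       # 1.0 up to rounding
print(pearson_r(x, y_outlier))     # about -0.17: the sign flips
print(spearman_rho(x, y_outlier))  # about 0.71: still clearly positive
```

Nineteen of the twenty points still lie exactly on a line, yet Pearson's coefficient turns negative, while the rank-based measure degrades far less.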

    Permutation Inference for Canonical Correlation Analysis

    Canonical correlation analysis (CCA) has become a key tool for population neuroimaging, allowing investigation of associations between many imaging and non-imaging measurements. As other variables are often a source of variability not of direct interest, previous work has used CCA on residuals from a model that removes these effects, then proceeded directly to permutation inference. We show that such a simple permutation test leads to inflated error rates. The reason is that residualisation introduces dependencies among the observations that violate the exchangeability assumption. Even in the absence of nuisance variables, however, a simple permutation test for CCA also leads to excess error rates for all canonical correlations other than the first. The reason is that a simple permutation scheme does not ignore the variability already explained by previous canonical variables. Here we propose solutions for both problems: in the case of nuisance variables, we show that transforming the residuals to a lower dimensional basis where exchangeability holds results in a valid permutation test; for more general cases, with or without nuisance variables, we propose estimating the canonical correlations in a stepwise manner, removing at each iteration the variance already explained, while dealing with different numbers of variables on both sides. We also discuss how to address the multiplicity of tests, proposing an admissible test that is not conservative, and provide a complete algorithm for permutation inference for CCA. Comment: 49 pages, 2 figures, 10 tables, 3 algorithms, 119 references
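For orientation, the "simple permutation test" that the abstract takes as its starting point can be sketched as follows. This is a minimal illustration for the first canonical correlation only, with no nuisance variables, which is the one case the abstract does not flag as problematic. It is not the stepwise algorithm the authors propose. The function names, the QR-plus-SVD route to the canonical correlations, and the simulated data are all assumptions of this sketch, and both data matrices are assumed to have full column rank.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Canonical correlations between the columns of X and Y.

    Uses orthonormal bases of the centred data (via QR); the singular
    values of Qx' Qy are the canonical correlations. Assumes both
    matrices have full column rank.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(X)
    Qy, _ = np.linalg.qr(Y)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)

def permutation_pvalue_first_cc(X, Y, n_perm=199, seed=0):
    """Simple permutation p-value for the FIRST canonical correlation.

    Rows of Y are shuffled relative to X to build the null distribution.
    Returns (observed first canonical correlation, p-value).
    """
    rng = np.random.default_rng(seed)
    observed = canonical_correlations(X, Y)[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(Y))
        if canonical_correlations(X, Y[perm])[0] >= observed:
            exceed += 1
    # Count the unpermuted data as one of the permutations.
    return observed, (1 + exceed) / (1 + n_perm)

# Simulated example (hypothetical data): Y depends strongly on X.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 3))
Y = X[:, :2] + 0.1 * rng.standard_normal((100, 2))
r1, p = permutation_pvalue_first_cc(X, Y)
print(r1, p)
```

Applying the same shuffling scheme to the second and later canonical correlations, or to residualised data, is exactly what the abstract reports as producing inflated error rates.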

    Identification of gene pathways implicated in Alzheimer's disease using longitudinal imaging phenotypes with sparse regression

    We present a new method for the detection of gene pathways associated with a multivariate quantitative trait, and use it to identify causal pathways associated with an imaging endophenotype characteristic of longitudinal structural change in the brains of patients with Alzheimer's disease (AD). Our method, known as pathways sparse reduced-rank regression (PsRRR), uses group lasso penalised regression to jointly model the effects of genome-wide single nucleotide polymorphisms (SNPs), grouped into functional pathways using prior knowledge of gene-gene interactions. Pathways are ranked in order of importance using a resampling strategy that exploits finite sample variability. Our application study uses whole genome scans and MR images from 464 subjects in the Alzheimer's Disease Neuroimaging Initiative (ADNI) database. 66,182 SNPs are mapped to 185 gene pathways from the KEGG pathways database. Voxel-wise imaging signatures characteristic of AD are obtained by analysing 3D patterns of structural change at 6, 12 and 24 months relative to baseline. High-ranking, AD endophenotype-associated pathways in our study include those describing chemokine, JAK-STAT and insulin signalling pathways, and tight junction interactions. All of these have been previously implicated in AD biology. In a secondary analysis, we investigate SNPs and genes that may be driving pathway selection, and identify a number of previously validated AD genes including CR1, APOE and TOMM40.

    Comparison of MRI Spectroscopy software packages performance and application on HCV-infected patients’ real data

    Final Degree Projects in Biomedical Engineering. Faculty of Medicine and Health Sciences, Universitat de Barcelona. Academic year: 2022-2023. Supervisors: Sala Llonch, Roser; Laredo Gregorio, Carlos. ¹H MRS is a pioneering methodology for inspecting brain metabolism and appraising health status. Post-processing is required to obtain explicit metabolite quantification values from which to derive a diagnosis. To cover this step, multiple software packages have recently been developed and released, leading to an assortment of spectroscopic image processing tools that lacks standardization and regulation. The current study assesses the coherence and consistency of metabolite estimates, in terms of result variability, through intercorrelation and intracorrelation analyses of four programs: LCModel, Osprey, TARQUIN, and the spant toolbox. The examination is performed on a collection of SVS, short-TE, 3T SIEMENS PRESS spectroscopic acquisitions from 83 subjects, including healthy controls and HCV-infected patients treated with DAA. The analytical core of the project is a Python script that automatically computes and displays the results used to assess software performance. The statistical analysis rests on the coefficient of determination (R²), Pearson’s correlation coefficient (r), and the intraclass correlation coefficient (ICC), complemented by boxplots, raincloud plots, and scatter plots for data visualization. A clinical application is built on the same basis, aiming to reveal the actual effect of DAA treatment on HCV-infected patients through alterations, and possible restoration, of metabolite concentrations.
    The results show evident and alarming variability among MRS platforms: quantification results are inconsistent, which compromises the rigor, precision and systematization this discipline demands, although the inconsistencies do not appear to alter medical determinations in a way that would jeopardize patients’ health. Extending the analysis to a larger cohort of subjects would help reach more solid conclusions.

    SEARCHING NEUROIMAGING BIOMARKERS IN MENTAL DISORDERS WITH GRAPH AND MULTIMODAL FUSION ANALYSIS OF FUNCTIONAL CONNECTIVITY

    Mental disorders such as schizophrenia (SZ), bipolar disorder (BD), and major depressive disorder (MDD) can cause severe symptoms and life disruption. They share some symptoms, which can pose a major clinical challenge to their differentiation. Objective biomarkers based on neuroimaging may help to improve diagnostic accuracy and facilitate optimal treatment for patients. Over the last decades, non-invasive in-vivo neuroimaging techniques such as magnetic resonance imaging (MRI) have been increasingly applied to measure structure and function in human brains. With functional MRI (fMRI) or structural MRI (sMRI), studies have identified neurophysiological deficits in patients’ brains from different perspectives. Functional connectivity (FC) analysis is an approach that measures functional integration in brains. By assessing the temporal coherence of the hemodynamic activity among brain regions, FC is considered capable of characterizing the large-scale integrity of neural activity. In this work, we present two data analysis frameworks for biomarker detection on brain imaging with FC, 1) graph analysis of FC and 2) multimodal fusion analysis, to better understand the human brain. Graph analysis reveals the interaction among brain regions based on graph theory, while the multimodal fusion framework enables us to utilize the strengths of different imaging modalities through joint analysis. Four applications related to FC using these frameworks were developed. First, FC was estimated using a model-based approach, which revealed altered small-world network structure in SZ. Secondly, we applied graph analysis to functional network connectivity (FNC) to differentiate BD and MDD during resting state. Thirdly, two functional measures, FNC and fractional amplitude of low frequency fluctuations (fALFF), were spatially overlaid to compare the FC and spatial alterations in SZ.
    Finally, we utilized a multimodal fusion analysis framework, multi-set canonical correlation analysis + joint independent component analysis (mCCA+jICA), to link functional and structural abnormalities in BD and MDD. We also evaluated the accuracy of predictive diagnosis through classifiers built on the selected features. In summary, via the two frameworks, our work has made several contributions to advance FC analysis, which improves our understanding of underlying brain function and structure, and our findings may ultimately be useful for the development of biomarkers of mental disease.

    Dynamic fluctuations coincide with periods of high and low modularity in resting-state functional brain networks

    We investigate the relationship of resting-state fMRI functional connectivity estimated over long periods of time with time-varying functional connectivity estimated over shorter time intervals. We show that using Pearson's correlation to estimate functional connectivity implies that the range of fluctuations of functional connections over short time scales is subject to statistical constraints imposed by their connectivity strength over longer scales. We present a method for estimating time-varying functional connectivity that is designed to mitigate this issue and allows us to identify episodes where functional connections are unexpectedly strong or weak. We apply this method to data recorded from N=80 participants, and show that the number of unexpectedly strong/weak connections fluctuates over time, and that these variations coincide with intermittent periods of high and low modularity in time-varying functional connectivity. We also find that during periods of relative quiescence, regions associated with the default mode network tend to join communities with attentional, control, and primary sensory systems. In contrast, during periods where many connections are unexpectedly strong/weak, default mode regions dissociate and form distinct modules. Finally, we show that, while all functional connections can at times manifest as stronger (more positively correlated) or weaker (more negatively correlated) than expected, a small number of connections, mostly within the visual and somatomotor networks, do so disproportionately often. Our statistical approach allows the detection of functional connections that fluctuate more or less than expected based on their long-time averages and may be of use in future studies characterizing the spatio-temporal patterns of time-varying functional connectivity. Comment: 47 pages, 8 figures, 4 supplementary figures
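The contrast between connectivity estimated over a long recording and connectivity estimated over short intervals can be illustrated with a conventional sliding-window Pearson correlation. This is a generic sketch, not the method the abstract proposes (which is designed to correct for the constraints that such estimates carry). The function name `sliding_window_correlation`, the window length, and the synthetic signals are assumptions made for illustration.

```python
import numpy as np

def sliding_window_correlation(a, b, window, step=1):
    """Pearson correlation of two 1-D series within each sliding window.

    Returns one correlation value per window start position.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("a and b must be 1-D arrays of equal length")
    if not 2 <= window <= len(a):
        raise ValueError("window must be between 2 and the series length")
    starts = range(0, len(a) - window + 1, step)
    return np.array(
        [np.corrcoef(a[s:s + window], b[s:s + window])[0, 1] for s in starts]
    )

# Synthetic signals (hypothetical): b follows a for the first half of
# the recording and mirrors it for the second half.
t = np.arange(200)
a = np.sin(2 * np.pi * t / 20)
b = np.where(t < 100, a, -a)

long_run = np.corrcoef(a, b)[0, 1]                    # close to zero
short_run = sliding_window_correlation(a, b, window=40)
print(long_run, short_run[0], short_run[-1])          # ~0, ~1, ~-1
```

The long-run estimate hides two episodes of strong but opposite coupling that the windowed estimates recover.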