
    Learning Signal Representations for EEG Cross-Subject Channel Selection and Trial Classification

    EEG technology finds applications in several domains. Currently, most EEG systems require subjects to wear several electrodes on the scalp to be effective. However, several channels might carry noisy or redundant information, lengthen preparation times, and increase the computational cost of any automated EEG decoding system. One way to improve the signal-to-noise ratio and the classification accuracy is to combine channel selection with feature extraction, but EEG signals are known to exhibit high inter-subject variability. In this work we introduce a novel algorithm for subject-independent channel selection of EEG recordings. Treating multi-channel trial recordings as statistical units and the EEG decoding task as the class of reference, the algorithm (i) exploits channel-specific 1D-Convolutional Neural Networks (1D-CNNs) as supervised feature extractors to maximize class separability; (ii) reduces the high-dimensional multi-channel trial representation to a single trial vector by concatenating the channels' embeddings; and (iii) recovers the complex inter-channel relationships during channel selection by exploiting an ensemble of AutoEncoders (AEs) to identify, from these vectors, the channels most relevant to classification. After training, the algorithm can be reused by transferring only the parametrized subset of selected channel-specific 1D-CNNs to new signals from new subjects, obtaining low-dimensional, highly informative trial vectors that can be fed to any classifier.
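    The three steps above can be sketched in a few lines of numpy. This is a toy illustration, not the paper's implementation: the trained channel-specific 1D-CNNs are replaced by fixed random projections, the autoencoder ensemble by a single linear autoencoder (truncated SVD), and all dimensions, the number of retained channels, and the "keep the best-reconstructed channels" selection rule are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (hypothetical, not from the paper):
n_trials, n_channels, n_samples, emb_dim = 40, 8, 128, 4

# Step (i): channel-specific feature extractors. A trained 1D-CNN per
# channel is approximated here by a fixed random projection; the real
# method learns these filters supervised, to maximize class separability.
trials = rng.normal(size=(n_trials, n_channels, n_samples))
extractors = [rng.normal(size=(n_samples, emb_dim)) for _ in range(n_channels)]

def embed(trial):
    # Step (ii): concatenate per-channel embeddings into one trial vector.
    return np.concatenate([trial[c] @ extractors[c] for c in range(n_channels)])

X = np.stack([embed(t) for t in trials])        # (n_trials, n_channels*emb_dim)

# Step (iii): a linear autoencoder (truncated SVD) reconstructs the trial
# vectors; reconstruction error is aggregated per channel block to rank
# channels. Which direction of the ranking the paper uses is an assumption.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 6                                            # latent size (assumed)
X_hat = (U[:, :k] * s[:k]) @ Vt[:k]
err = ((Xc - X_hat) ** 2).reshape(n_trials, n_channels, emb_dim)
channel_score = err.sum(axis=(0, 2))             # error per channel block
selected = np.argsort(channel_score)[:3]         # keep 3 channels (assumed)
```

    After selection, only the extractors of the `selected` channels would be transferred to new subjects' recordings to produce compact trial vectors for any downstream classifier.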

    A Deep Survival EWAS approach estimating risk profile based on pre-diagnostic DNA methylation: An application to breast cancer time to diagnosis

    Previous studies on cancer biomarker discovery from pre-diagnostic blood DNA methylation (DNAm) profiles either ignore the explicit modeling of the Time To Diagnosis (TTD) or provide inconsistent results. This lack of consistency is likely due to the limitations of standard EWAS approaches, which model the effect of DNAm at each CpG site on TTD independently. In this work we identify blood DNAm profiles associated with TTD, aiming to improve both the reliability and the biological meaningfulness of the results. We argue that a global approach to estimating the effect profile of CpG sites should capture the complex (potentially non-linear) relationships among sites. As a proof of concept, we develop a new Deep Learning-based approach that assesses the relevance of individual CpG islands (i.e., assigns a weight to each site) in determining TTD while modeling their combined effect in a survival analysis scenario. The algorithm combines a tailored sampling procedure with DNAm site agglomeration, deep non-linear survival modeling, and SHapley Additive exPlanations (SHAP) value estimation to improve the robustness of the derived effect profile. The proposed approach handles the complexities that commonly arise in epidemiological studies, such as small sample size, noise, and the low signal-to-noise ratio of blood-derived DNAm. We apply it to a prospective case-control study on breast cancer nested in the EPIC Italy cohort and perform weighted gene-set enrichment analyses to demonstrate the biological meaningfulness of the results. Comparing Deep Survival EWAS with a traditional EWAS approach, we show that our method identifies biologically relevant pathways more effectively than the standard approach.
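    The core idea, estimating site effects on time-to-diagnosis jointly rather than one CpG at a time, can be illustrated with a minimal survival model. This sketch substitutes the paper's deep non-linear model and SHAP estimation with a plain Cox partial-likelihood fit by gradient descent on synthetic data, and reads off a relevance profile from the fitted coefficients; the data, the two "informative" sites, and the optimizer settings are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for a methylation matrix: n subjects x p sites,
# with times-to-diagnosis generated so only sites 0 and 1 matter.
n, p = 200, 10
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[0], beta_true[1] = 1.0, -1.0
t = rng.exponential(scale=np.exp(-X @ beta_true))   # proportional hazards

# Sort by descending time: the risk set of subject i is then rows 0..i.
Xo = X[np.argsort(-t)]

# Joint fit of all sites via the Breslow partial likelihood (all events
# observed, a simplification), by plain gradient descent.
beta = np.zeros(p)
for _ in range(200):
    eta = Xo @ beta
    w = np.exp(eta - eta.max())                     # stabilized weights
    cumw = np.cumsum(w)
    cumwx = np.cumsum(w[:, None] * Xo, axis=0)
    grad = -(Xo - cumwx / cumw[:, None]).sum(axis=0)
    beta -= grad / n                                # approx. Newton scaling

# Relevance profile over sites; the paper uses SHAP values instead.
relevance = np.abs(beta)
```

    Because all sites enter one likelihood, the fitted profile reflects their combined effect, which is the property the abstract argues standard site-by-site EWAS modeling lacks.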

    Dual adversarial deconfounding autoencoder for joint batch-effects removal from multi-center and multi-scanner radiomics data

    Medical imaging is the primary tool for investigating and monitoring several diseases, including cancer. Advances in quantitative image analysis have moved towards the extraction of biomarkers able to support clinical decisions. To produce robust results, multi-center studies are often set up. However, the imaging information must be denoised from confounding factors, known as batch effects, such as scanner-specific and center-specific influences. Moreover, in non-solid cancers such as lymphomas, effective biomarkers require an imaging-based representation of the disease that accounts for its multi-site spreading over the patient's body. In this work we address the dual-factor deconfounding problem and propose an algorithm to harmonize the imaging information of patients affected by Hodgkin lymphoma in a multi-center setting. We show that the proposed model successfully denoises data from domain-specific variability (p-value < 0.001) while coherently preserving the spatial relationship between imaging descriptions of peer lesions (p-value = 0), a strong prognostic biomarker for tumor heterogeneity assessment. This harmonization step significantly improves the performance of prognostic models with respect to state-of-the-art methods, enabling exhaustive patient representations and more accurate analyses (p-values < 0.001 in training, p-values < 0.05 in testing). This work lays the groundwork for large-scale, reproducible analyses on multi-center data, which are urgently needed to translate imaging-based biomarkers into clinical practice as effective prognostic tools. The code is available at https://github.com/LaraCavinato/Dual-ADAE.
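    The dual-confounder setup can be pictured with a much simpler linear stand-in. Instead of the paper's adversarial autoencoder, this sketch removes additive center and scanner effects from synthetic radiomic features by projecting out the subspace spanned by the one-hot confounder design; the numbers of centers, scanners, features, and the additive-effect model are all assumptions for the illustration, and the real method handles far more general, non-linear batch effects.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic radiomic features: biological signal plus additive
# center- and scanner-specific offsets (the two confounders).
n, d = 300, 5
center = rng.integers(0, 3, size=n)      # 3 centers (assumed)
scanner = rng.integers(0, 2, size=n)     # 2 scanner models (assumed)
signal = rng.normal(size=(n, d))
offsets_c = rng.normal(scale=2.0, size=(3, d))
offsets_s = rng.normal(scale=2.0, size=(2, d))
X = signal + offsets_c[center] + offsets_s[scanner]

# Linear stand-in for the adversarial objective: fit both confounders
# jointly by least squares on the one-hot design, keep the residual.
C = np.column_stack([np.eye(3)[center], np.eye(2)[scanner]])
coef, *_ = np.linalg.lstsq(C, X, rcond=None)
X_harm = X - C @ coef
```

    After this step the per-center and per-scanner feature means collapse to zero while the patient-level signal is left essentially intact, which is the property the harmonization step needs before pooling multi-center data into one prognostic model.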

    A Deep Learning Approach Validates Genetic Risk Factors for Late Toxicity After Prostate Cancer Radiotherapy in a REQUITE Multi-National Cohort.

    Background: REQUITE (validating pREdictive models and biomarkers of radiotherapy toxicity to reduce side effects and improve QUalITy of lifE in cancer survivors) is an international prospective cohort study. The purpose of this project was to analyse a cohort of patients recruited into REQUITE using a deep learning algorithm to identify patient-specific features associated with the development of toxicity, and to test the approach by attempting to validate previously published genetic risk factors. Methods: The study involved REQUITE prostate cancer patients treated with external beam radiotherapy who had complete 2-year follow-up. We used five separate late toxicity endpoints: ≥grade 1 late rectal bleeding, ≥grade 2 urinary frequency, ≥grade 1 haematuria, ≥grade 2 nocturia, and ≥grade 1 decreased urinary stream. Forty-three single nucleotide polymorphisms (SNPs) already reported in the literature to be associated with these endpoints were included in the analysis; no SNP had previously been studied in the REQUITE cohort. Deep Sparse AutoEncoders (DSAE) were trained to recognize features (SNPs) identifying patients with no toxicity and tested on an independent mixed population of patients with and without toxicity. Results: One thousand four hundred and one patients were included, and toxicity rates were: rectal bleeding 11.7%, urinary frequency 4%, haematuria 5.5%, nocturia 7.8%, decreased urinary stream 17.1%. Twenty-four of the 43 SNPs were validated as identifying patients with toxicity. Twenty of the 24 were associated with the same toxicity endpoint as reported in the literature: 9 SNPs for urinary symptoms and 11 for overall toxicity; the remaining 4 were associated with a different endpoint. Conclusion: Deep learning algorithms can validate SNPs associated with toxicity after radiotherapy for prostate cancer. The method should be studied further to identify polygenic SNP risk signatures for radiotherapy toxicity. These signatures could then be included in integrated normal tissue complication probability models and tested for their ability to personalize radiotherapy treatment planning.
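    The train-on-no-toxicity design described in the Methods can be sketched as an anomaly-detection scheme. Here the DSAE is replaced by a linear autoencoder (PCA) fitted only on the no-toxicity group, and patients whose SNP profiles reconstruct poorly are flagged; the genotype matrix, allele frequencies, "risk SNPs", and latent size are synthetic assumptions, not REQUITE data.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic genotypes: 43 SNPs coded 0/1/2. Toxicity cases carry shifted
# allele frequencies at a handful of risk SNPs (an assumption).
n_ctrl, n_tox, n_snp = 400, 80, 43
ctrl = rng.binomial(2, 0.3, size=(n_ctrl, n_snp)).astype(float)
p_tox = np.full(n_snp, 0.3)
p_tox[:5] = 0.7                                   # 5 risk SNPs (assumed)
tox = rng.binomial(2, p_tox, size=(n_tox, n_snp)).astype(float)

# Autoencoder stand-in: PCA with k components, fitted on the
# no-toxicity group only, mirroring the paper's training scheme.
mu, sd = ctrl.mean(0), ctrl.std(0) + 1e-9
_, _, Vt = np.linalg.svd((ctrl - mu) / sd, full_matrices=False)
k = 10                                            # latent size (assumed)

def recon_err(A):
    # Per-patient mean squared reconstruction error under the
    # no-toxicity model; high error suggests an atypical SNP profile.
    A = (A - mu) / sd
    return ((A - A @ Vt[:k].T @ Vt[:k]) ** 2).mean(axis=1)
```

    Patients whose profiles deviate from the no-toxicity structure, here the synthetic toxicity group, reconstruct worse on average, which is the signal the study exploits to validate toxicity-associated SNPs.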

    Spatial communication systems across languages reflect universal action constraints

    The extent to which languages share properties reflecting the non-linguistic constraints of the speakers who speak them is key to the debate regarding the relationship between language and cognition. A critical case is spatial communication, where it has been argued that semantic universals should exist, if anywhere. Here, using an experimental paradigm able to separate variation within a language from variation between languages, we tested the use of spatial demonstratives, the most fundamental and frequent spatial terms across languages. In n = 874 speakers across 29 languages, we show that speakers of all tested languages use spatial demonstratives as a function of being able to reach or act on an object being referred to. In some languages, the position of the addressee is also relevant in selecting between demonstrative forms. Commonalities and differences across languages in spatial communication can be understood in terms of universal constraints on action shaping spatial language and cognition.


    Learning Signal Representations for EEG Cross-Subject Channel Selection and Trial Classification

    EEG is a powerful non-invasive technique with applications in several domains and research areas. Most EEG systems are multi-channel in nature, but multiple channels may carry noisy, redundant information and increase the computational cost of automated EEG decoding algorithms. To improve the signal-to-noise ratio and the accuracy while reducing computational time, channel selection can be combined with feature extraction and dimensionality reduction. However, as EEG signals present high inter-subject variability, we introduce a novel algorithm for subject-independent channel selection through representation learning of EEG recordings. The algorithm exploits channel-specific 1D-CNNs as supervised feature extractors to maximize class separability and reduces the high-dimensional multi-channel signal to a unique one-dimensional representation, from which it selects the most relevant channels for classification. The algorithm can be transferred to new signals from new subjects, yielding novel, highly informative trial vectors of controlled dimensionality that can be fed to any kind of classifier.

    Feature selection for imbalanced data with deep sparse autoencoders ensemble

    Class imbalance is a common issue in many application domains of learning algorithms, and oftentimes it is the minority class whose observations are most important to classify and profile correctly. This need can be addressed by feature selection (FS), which offers several further advantages, such as lower computational cost and easier inference and interpretability. However, traditional FS techniques may become suboptimal in the presence of strongly imbalanced data. To retain the advantages of FS in this setting, we propose a filtering FS algorithm that ranks feature importance on the basis of the reconstruction error of a deep sparse autoencoders ensemble (DSAEE). Each DSAE is trained only on the majority class and used to reconstruct both classes. From the aggregated reconstruction error we determine the features for which the minority class presents a different distribution of values with respect to the overrepresented one, thereby identifying the features most relevant for discriminating between the two classes. We empirically demonstrate the efficacy of our algorithm in several experiments, both simulated and on high-dimensional datasets of varying sample size, showing its capability to select relevant, generalizable features to profile and classify the minority class, outperforming other benchmark FS methods. We also briefly present a real application in radiogenomics, where the methodology was applied successfully.
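    The filtering idea, train on the majority class, reconstruct both classes, and rank features by the minority class's excess reconstruction error, can be sketched with a linear stand-in. Here each deep sparse autoencoder in the ensemble is replaced by a PCA model fitted on a bootstrap of the majority class; the data, the two truly discriminative features, and the latent size are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(4)

# Imbalanced toy data: the minority class differs from the majority
# only in features 0 and 1 (assumed ground truth for the sketch).
n_maj, n_min, p = 500, 40, 20
maj = rng.normal(size=(n_maj, p))
minc = rng.normal(size=(n_min, p))
minc[:, :2] += 2.5

# Ensemble of linear autoencoders (PCA stand-ins for the DSAEs), each
# fitted on a bootstrap of the majority class and used to reconstruct
# both classes. Feature importance = mean excess reconstruction error
# of the minority class at each feature, aggregated over the ensemble.
k, n_models = 5, 10
importance = np.zeros(p)
for _ in range(n_models):
    boot = maj[rng.integers(0, n_maj, size=n_maj)]
    mu = boot.mean(0)
    _, _, Vt = np.linalg.svd(boot - mu, full_matrices=False)
    P = Vt[:k].T @ Vt[:k]                 # projection onto k components

    def sq_err(A):
        # Per-feature squared residual under the majority-class model.
        return ((A - mu) - (A - mu) @ P) ** 2

    importance += sq_err(minc).mean(0) - sq_err(maj).mean(0)
importance /= n_models

top_features = np.argsort(-importance)    # filter-style feature ranking
```

    Features where the minority class deviates from the majority distribution accumulate excess error across the ensemble and rise to the top of the ranking, so a downstream classifier can be fed only the highest-ranked features.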