
    Uncovering the structure of clinical EEG signals with self-supervised learning

    Objective. Supervised learning paradigms are often limited by the amount of labeled data that is available. This problem is particularly acute for clinically relevant data, such as electroencephalography (EEG), where labeling can be costly in terms of specialized expertise and human processing time. Consequently, deep learning architectures designed to learn on EEG data have yielded relatively shallow models, with performance at best similar to that of traditional feature-based approaches. However, in most situations, unlabeled data is available in abundance. By extracting information from this unlabeled data, it might be possible to reach competitive performance with deep neural networks despite limited access to labels. Approach. We investigated self-supervised learning (SSL), a promising technique for discovering structure in unlabeled data, to learn representations of EEG signals. Specifically, we explored two tasks based on temporal context prediction, as well as contrastive predictive coding, on two clinically relevant problems: EEG-based sleep staging and pathology detection. We conducted experiments on two large public datasets with thousands of recordings and performed baseline comparisons with purely supervised and hand-engineered approaches. Main results. Linear classifiers trained on SSL-learned features consistently outperformed purely supervised deep neural networks in low-labeled-data regimes, while reaching competitive performance when all labels were available. Additionally, the embeddings learned with each method revealed clear latent structures related to physiological and clinical phenomena, such as age effects. Significance. We demonstrate the benefit of SSL approaches on EEG data. Our results suggest that self-supervision may pave the way to a wider use of deep learning models on EEG data. Peer reviewed.
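    One way to picture the temporal context prediction tasks mentioned in this abstract is as a pair-sampling routine: windows close together in time get a positive label, windows far apart get a negative one, and a standard classifier is then trained on the resulting pairs. The following is a minimal illustrative sketch, not code from the paper; the function name and the use of integer window indices are assumptions, and `tau_pos`/`tau_neg` only mirror the usual terminology for the positive and negative context thresholds.

    ```python
    import random

    def sample_context_pairs(n_windows, tau_pos, tau_neg, n_pairs, seed=0):
        """Sample (anchor, other, label) index pairs for a temporal
        context prediction pretext task: label 1 if the two windows lie
        within tau_pos of each other, label 0 if they are more than
        tau_neg apart. Pairs in the ambiguous middle zone are skipped."""
        rng = random.Random(seed)
        pairs = []
        while len(pairs) < n_pairs:
            anchor = rng.randrange(n_windows)
            if rng.random() < 0.5:
                # Positive pair: draw from the anchor's close neighborhood.
                lo = max(0, anchor - tau_pos)
                hi = min(n_windows - 1, anchor + tau_pos)
                pairs.append((anchor, rng.randint(lo, hi), 1))
            else:
                # Negative pair: draw from windows far from the anchor.
                far = [i for i in range(n_windows) if abs(i - anchor) > tau_neg]
                if far:
                    pairs.append((anchor, rng.choice(far), 0))
        return pairs
    ```

    A feature extractor trained to solve this binary task never sees manual labels, yet must encode temporal structure, which is what makes the learned features useful downstream.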

    Diversity of methicillin-resistant Staphylococcus aureus strains isolated from residents of 26 nursing homes in Orange County, California

    Nursing homes represent a unique and important methicillin-resistant Staphylococcus aureus (MRSA) reservoir. Not only are strains imported from hospitals and the community, but strains can also be transported back into these settings from nursing homes. Since MRSA is prevalent in nursing homes yet relatively poorly studied in this setting, a multicenter, regional assessment of the frequency and diversity of MRSA in the nursing home reservoir was carried out and compared with that of MRSA from hospitals in the same region. The prospective study collected MRSA from nasal swabbing of residents of 26 nursing homes in Orange County, California, and characterized each isolate by spa typing. A total of 837 MRSA isolates were collected from the nursing homes. Estimated admission prevalence and point prevalence of MRSA were 16% and 26%, respectively. The spa type genetic diversity was heterogeneous between nursing homes and significantly higher overall (77%) than the diversity in Orange County hospitals (72%). The MRSA burden in nursing homes appears largely due to importation from hospitals. As seen in Orange County hospitals, USA300 (sequence type 8 [ST8]/t008), USA100 (ST5/t002), and a USA100 variant (ST5/t242) were the dominant MRSA clones in Orange County nursing homes, representing 83% of all isolates, although the USA100 variant was predominant in nursing homes, whereas USA300 was predominant in hospitals. Control strategies tailored to the complex problem of MRSA transmission and infection in nursing homes are needed in order to minimize the impact of this unique reservoir on the overall regional MRSA burden. Copyright © 2013, American Society for Microbiology. All Rights Reserved.

    TRY plant trait database – enhanced coverage and open access

    Plant traits - the morphological, anatomical, physiological, biochemical and phenological characteristics of plants - determine how plants respond to environmental factors, affect other trophic levels, and influence ecosystem properties and their benefits and detriments to people. Plant trait data thus represent the basis for a vast area of research spanning from evolutionary biology, community and functional ecology, to biodiversity conservation, ecosystem and landscape management, restoration, biogeography and earth system modelling. Since its foundation in 2007, the TRY database of plant traits has grown continuously. It now provides unprecedented data coverage under an open access data policy and is the main plant trait database used by the research community worldwide. Increasingly, the TRY database also supports new frontiers of trait-based plant research, including the identification of data gaps and the subsequent mobilization or measurement of new data. To support this development, in this article we evaluate the extent of the trait data compiled in TRY and analyse emerging patterns of data coverage and representativeness. Best species coverage is achieved for categorical traits - almost complete coverage for 'plant growth form'. However, most traits relevant for ecology and vegetation modelling are characterized by continuous intraspecific variation and trait-environment relationships. These traits have to be measured on individual plants in their respective environment. Despite unprecedented data coverage, we observe a humbling lack of completeness and representativeness of these continuous traits in many aspects. We, therefore, conclude that reducing data gaps and biases in the TRY database remains a key challenge and requires a coordinated approach to data mobilization and trait measurements. This can only be achieved in collaboration with other initiatives.

    Robust learning from corrupted EEG with dynamic spatial filtering

    Building machine learning models using EEG recorded outside of the laboratory setting requires methods robust to noisy data and randomly missing channels. This need is particularly great when working with sparse EEG montages (1-6 channels), often encountered in consumer-grade or mobile EEG devices. Neither classical machine learning models nor deep neural networks trained end-to-end on EEG are typically designed or tested for robustness to corruption, and especially to randomly missing channels. While some studies have proposed strategies for using data with missing channels, these approaches are not practical when sparse montages are used and computing power is limited (e.g., wearables, cell phones). To tackle this problem, we propose dynamic spatial filtering (DSF), a multi-head attention module that can be plugged in before the first layer of a neural network to handle missing EEG channels by learning to focus on good channels and to ignore bad ones. We tested DSF on public EEG data encompassing ∌4,000 recordings with simulated channel corruption and on a private dataset of ∌100 at-home recordings of mobile EEG with natural corruption. Our proposed approach achieves the same performance as baseline models when no noise is applied, but outperforms baselines by as much as 29.4% accuracy when significant channel corruption is present. Moreover, DSF outputs are interpretable, making it possible to monitor channel importance in real time. This approach has the potential to enable the analysis of EEG in challenging settings where channel corruption hampers the reading of brain signals.
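    The core idea of attention-based channel reweighting described above can be sketched in a few lines. DSF itself learns a full spatial filter matrix via a multi-head attention module trained with the network; the toy single-head version below instead takes externally supplied per-channel quality scores and softmaxes them into weights, so the function name, signature, and the hand-provided scores are all illustrative assumptions, not the paper's implementation.

    ```python
    import math

    def softmax(xs):
        """Numerically stable softmax over a list of floats."""
        m = max(xs)
        exps = [math.exp(x - m) for x in xs]
        total = sum(exps)
        return [e / total for e in exps]

    def reweight_channels(window, quality_scores):
        """Toy attention-style channel reweighting for one EEG window
        (a list of channels, each a list of samples): softmax the
        per-channel scores into weights that sum to 1, scale each
        channel by its weight, and return both the reweighted window
        and the weights (interpretable as channel importance)."""
        weights = softmax(quality_scores)
        reweighted = [[w * x for x in ch] for w, ch in zip(weights, window)]
        return reweighted, weights
    ```

    Because the weights sum to 1, a channel flagged as corrupted (a very low score) is effectively zeroed out while the remaining channels are emphasized, which is the behavior the abstract attributes to DSF under channel corruption.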

    Apprentissage de représentations auto-supervisé à partir de signaux d'électroencéphalographie (Self-supervised representation learning from electroencephalography signals)

    The supervised learning paradigm is limited by the cost - and sometimes the impracticality - of data collection and labeling in multiple domains. Self-supervised learning, a paradigm which exploits the structure of unlabeled data to create learning problems that can be solved with standard supervised approaches, has shown great promise as a pretraining or feature-learning approach in fields like computer vision and time series processing. In this work, we present self-supervision strategies that can be used to learn informative representations from multivariate time series. One successful approach relies on predicting whether time windows are sampled from the same temporal context or not. As demonstrated on a clinically relevant task (sleep scoring) and with two electroencephalography datasets, our approach outperforms a purely supervised approach in low-data regimes, while capturing important physiological information without any access to labels.