12 research outputs found

    reval: A Python package to determine best clustering solutions with stability-based relative clustering validation.

    Determining the best partition for a dataset can be a challenging task because of the lack of a priori information within an unsupervised learning framework and the absence of a unique clustering validation approach to evaluate clustering solutions. Here we present reval: a Python package that leverages stability-based relative clustering validation methods to select the best clustering solutions as those that replicate, via supervised learning, on unseen subsets of data. The implementation of relative validation methods can contribute to the theory of clustering by fostering new approaches for the investigation of clustering results in different situations and for different data distributions. This work aims to contribute to this effort by implementing a package that works with multiple clustering and classification algorithms, hence allowing both the automation of the labeling process and the assessment of the stability of different clustering mechanisms.
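The replication idea in the abstract above can be illustrated with a small sketch. It assumes, as one plausible reading, that replication is scored by comparing the labels a classifier predicts on held-out data with the labels obtained by clustering that same held-out data, after matching the arbitrary cluster ids. The function name and the toy labels are hypothetical, and this is not reval's API.

```python
from itertools import permutations


def misclassification_after_matching(cluster_labels, predicted_labels):
    """Lowest fraction of disagreements over all relabelings of the clusters.

    Cluster ids are arbitrary, so the held-out clustering is relabeled in
    every possible way before it is compared with the classifier output.
    Brute force over permutations: only suitable for a handful of clusters.
    """
    if len(cluster_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    if not cluster_labels:
        raise ValueError("label sequences must not be empty")
    ids = sorted(set(cluster_labels) | set(predicted_labels))
    best = 1.0
    for perm in permutations(ids):
        mapping = dict(zip(ids, perm))
        errors = sum(
            mapping[c] != p for c, p in zip(cluster_labels, predicted_labels)
        )
        best = min(best, errors / len(cluster_labels))
    return best


# Hypothetical labels for six held-out samples.
clustered = [0, 0, 1, 1, 2, 2]  # from clustering the held-out subset directly
predicted = [1, 1, 0, 0, 2, 1]  # from a classifier fit on the training clustering
error = misclassification_after_matching(clustered, predicted)
```

A lower error means the partition found on the training data is reproduced more faithfully on unseen data. With many clusters, an assignment solver would be a better choice than enumerating permutations.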

    Keyword-optimized Template Insertion for Clinical Information Extraction via Prompt-based Learning

    Clinical note classification is a common clinical NLP task. However, annotated datasets are scarce. Prompt-based learning has recently emerged as an effective method to adapt pre-trained models for text classification using only a few training examples. A critical component of prompt design is the definition of the template (i.e., prompt text). The effect of template position, however, has been insufficiently investigated. This seems particularly important in the clinical setting, where task-relevant information is usually sparse in clinical notes. In this study we develop a keyword-optimized template insertion method (KOTI) and show how optimizing position can improve performance on several clinical tasks in zero-shot and few-shot training settings.

    Deep Representation Learning of Electronic Health Records to Unlock Patient Stratification at Scale

    Deriving disease subtypes from electronic health records (EHRs) can guide next-generation personalized medicine. However, challenges in summarizing and representing patient data prevent widespread practice of scalable EHR-based stratification analysis. Here we present an unsupervised framework based on deep learning to process heterogeneous EHRs and derive patient representations that can efficiently and effectively enable patient stratification at scale. We considered EHRs of 1,608,741 patients from a diverse hospital cohort comprising a total of 57,464 clinical concepts. We introduce a representation learning model based on word embeddings, convolutional neural networks, and autoencoders (i.e., ConvAE) to transform patient trajectories into low-dimensional latent vectors. We evaluated these representations as broadly enabling patient stratification by applying hierarchical clustering to different multi-disease and disease-specific patient cohorts. ConvAE significantly outperformed several baselines in a clustering task to identify patients with different complex conditions, with 2.61 entropy and 0.31 purity average scores. When applied to stratify patients within a certain condition, ConvAE led to various clinically relevant subtypes for different disorders, including type 2 diabetes, Parkinson's disease and Alzheimer's disease, largely related to comorbidities, disease progression, and symptom severity. With these results, we demonstrate that ConvAE can generate patient representations that lead to clinically meaningful insights. This scalable framework can help better understand varying etiologies in heterogeneous sub-populations and unlock patterns for EHR-based research in the realm of personalized medicine. Comment: C.F. and R.M. share senior authorshi
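The entropy and purity scores quoted above are external clustering measures. The sketch below uses the common textbook definitions (purity as the share of samples in their cluster's majority class, entropy as the size-weighted class entropy per cluster, in bits). The abstract does not state which variant or logarithm base the authors used, so treat these choices and the toy labels as assumptions.

```python
from collections import Counter, defaultdict
from math import log2


def purity_and_entropy(cluster_ids, true_classes):
    """Return (purity, entropy) of a clustering against reference classes."""
    if len(cluster_ids) != len(true_classes):
        raise ValueError("inputs must have the same length")
    # Group the reference classes by the cluster each sample was assigned to.
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, true_classes):
        members[cluster].append(label)

    n = len(cluster_ids)
    purity = 0.0
    entropy = 0.0
    for labels in members.values():
        counts = Counter(labels)
        size = len(labels)
        # Purity: contribution of the majority class in this cluster.
        purity += max(counts.values()) / n
        # Entropy: class entropy inside the cluster, weighted by cluster size.
        within = -sum((k / size) * log2(k / size) for k in counts.values())
        entropy += (size / n) * within
    return purity, entropy


# Hypothetical example: two clusters, two reference conditions.
clusters = [0, 0, 0, 1, 1, 1]
conditions = ["a", "a", "b", "b", "b", "b"]
purity, entropy = purity_and_entropy(clusters, conditions)
```

Higher purity and lower entropy both indicate clusters that align more closely with the reference conditions.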

    Pre-treatment clinical and gene expression patterns predict developmental change in early intervention in autism.

    Funder: U.S. Department of Health & Human Services | NIH | National Institute of Mental Health (NIMH)
    Early detection and intervention are believed to be key to facilitating better outcomes in children with autism, yet the impact of age at treatment start on the outcome is poorly understood. While clinical traits such as language ability have been shown to predict treatment outcome, whether or not and how information at the genomic level can predict treatment outcome is unknown. Leveraging a cohort of toddlers with autism who all received the same standardized intervention at a very young age and provided a blood sample, here we find that very early treatment engagement (i.e., <24 months) leads to greater gains while controlling for time in treatment. Pre-treatment clinical behavioral measures predict 21% of the variance in the rate of skill growth during early intervention. Pre-treatment blood leukocyte gene expression patterns also predict the rate of skill growth, accounting for 13% of the variance in treatment slopes. Results indicated that 295 genes can be prioritized as driving this effect. These treatment-relevant genes highly interact at the protein level, are enriched for differentially histone acetylated genes in autism postmortem cortical tissue, and are normatively highly expressed in a variety of subcortical and cortical areas important for social communication and language development. This work suggests that pre-treatment biological and clinical behavioral characteristics are important for predicting developmental change in the context of early intervention and that individualized pre-treatment biology related to histone acetylation may be key

    Therapeutic Factors in a Psychiatric Group Therapy: a Preliminary Validation of Therapeutic Factors Inventory-8, Italian Version

    Several studies support group therapy effectiveness due to the activation in patients of unique psychological mechanisms defined as non-specific therapeutic factors (Therapeutic Factors-TFs), which shape the setting and, at the same time, enhance the specific group therapeutic factors. The objectives of this study were to preliminarily validate the Italian version of the Therapeutic Factors Inventory-8 (TFI-8) and to identify group therapeutic factors. In a psychiatric residential facility, a weekly psychotherapeutic group was evaluated over 1 year. One scale on group process (TFI-8, Ferrara-Group Experience Scale) and three clinical scales (Brief Symptom Inventory-53, Sheehan Disability Scale, WHO Quality of Life-Bref) were administered to participating patients. Internal consistency, Exploratory Factor Analysis (EFA), and convergent validity of TFI-8 were assessed. Correlations between TFI-8 and other scale scores and selected variables were performed. Our sample consisted of 64 participants. TFI-8 showed good internal consistency (Cronbach’s alpha = 0.84) and concurrent validity with Fe-GES (Rho = 0.42, p = 0.0008). EFA highlighted a single factor, accounting for 92% of variance. TFI-8 was not significantly related to clinical scale scores. The TFI-8 Italian version proved to be a valid and reliable tool which allowed us to identify one therapeutic factor indicating relational attraction in group therapy, composed of three dimensions: infusion of hope, cohesion and social learning.
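The internal-consistency figure reported above is a Cronbach's alpha. As a minimal sketch of how such a value is computed, the function below applies the standard formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), using sample variances. The response matrix is invented for illustration and is not data from the study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every respondent must answer every item")
    # Sample variance of each item across respondents.
    item_variances = [
        variance([row[i] for row in responses]) for i in range(n_items)
    ]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers: four respondents, three items.
answers = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 2],
]
alpha = cronbach_alpha(answers)
```

Values closer to 1 indicate that the items vary together, which is what a scale measuring a single construct is expected to show.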

    Convolutional neural networks for structured omics: OmicsCNN and the OmicsConv layer

    Convolutional Neural Networks (CNNs) are a popular deep learning architecture widely applied in different domains, in particular in image classification, for which the concept of convolution with a filter comes naturally. Unfortunately, the requirement of a distance (or, at least, of a neighbourhood function) in the input feature space has so far prevented its direct use on data types such as omics data. However, a number of omics data are metrizable, i.e., they can be endowed with a metric structure, which enables the adoption of a convolution-based deep learning framework, e.g., for prediction. We propose a generalized solution for CNNs on omics data, implemented through a dedicated Keras layer. In particular, for metagenomics data, a metric can be derived from the patristic distance on the phylogenetic tree. For transcriptomics data, we combine Gene Ontology semantic similarity and gene co-expression to define a distance; the function is defined through a multilayer network where 3 layers are defined by the GO mutual semantic similarity while the fourth one by gene co-expression. As a general tool, feature distance on omics data is enabled by OmicsConv, a novel Keras layer, obtaining OmicsCNN, a dedicated deep learning framework. Here we demonstrate OmicsCNN on gut microbiota sequencing data, for Inflammatory Bowel Disease (IBD) 16S data, first on synthetic data and then on a metagenomics collection of gut microbiota of 222 IBD patients. Comment: 7 pages, 3 figures. arXiv admin note: text overlap with arXiv:1709.0226