12 research outputs found
reval: A Python package to determine best clustering solutions with stability-based relative clustering validation.
Determining the best partition for a dataset can be a challenging task because of the lack of a priori information within an unsupervised learning framework and the absence of a unique clustering validation approach to evaluate clustering solutions. Here we present reval: a Python package that leverages stability-based relative clustering validation methods to select the best clustering solutions as those that replicate, via supervised learning, on unseen subsets of data. The implementation of relative validation methods can contribute to the theory of clustering by fostering new approaches for the investigation of clustering results in different situations and for different data distributions. This work aims at contributing to this effort by implementing a package that works with multiple clustering and classification algorithms, hence allowing both the automation of the labeling process and the assessment of the stability of different clustering mechanisms.
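The idea of checking whether a clustering "replicates, via supervised learning, on unseen subsets of data" can be sketched as follows. This is a minimal illustration built on scikit-learn and SciPy, not reval's own API: the function names, the choice of k-means and k-nearest neighbours, the synthetic data, and the 70/30 split are all assumptions made for the example.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def misclassification(pred, clust, k):
    """Fraction of points on which two labelings disagree, after the best
    one-to-one relabeling of clusters (Hungarian algorithm)."""
    # contingency[i, j] counts points with predicted label i and cluster label j
    contingency = np.zeros((k, k), dtype=int)
    for p, c in zip(pred, clust):
        contingency[p, c] += 1
    rows, cols = linear_sum_assignment(-contingency)  # maximize agreement
    return 1.0 - contingency[rows, cols].sum() / len(pred)


def stability_error(X, k, seed=0):
    """Hypothetical helper: how badly a k-cluster solution fails to replicate."""
    X_tr, X_ts = train_test_split(X, test_size=0.3, random_state=seed)
    # 1. Cluster the training subset.
    tr_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_tr)
    # 2. Train a classifier to reproduce those cluster labels.
    clf = KNeighborsClassifier(n_neighbors=5).fit(X_tr, tr_labels)
    # 3. Cluster the held-out subset independently.
    ts_labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X_ts)
    # 4. Compare the classifier's predictions with the held-out clustering.
    return misclassification(clf.predict(X_ts), ts_labels, k)


# Synthetic data with three well-separated groups (illustrative only).
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [10, 0], [0, 10]],
                  cluster_std=0.5, random_state=0)
errors = {k: stability_error(X, k) for k in range(2, 6)}
for k, err in errors.items():
    print(k, round(err, 3))
```

A lower error means the solution reproduces better on unseen data. Raw errors are not directly comparable across values of k, because solutions with few clusters tend to be trivially stable, so this sketch reports the errors rather than picking a winner.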
Keyword-optimized Template Insertion for Clinical Information Extraction via Prompt-based Learning
Clinical note classification is a common clinical NLP task. However,
annotated datasets are scarce. Prompt-based learning has recently emerged as
an effective method to adapt pre-trained models for text classification using
only a few training examples. A critical component of prompt design is the
definition of the template (i.e. prompt text). The effect of template position,
however, has been insufficiently investigated. This seems particularly
important in the clinical setting, where task-relevant information is usually
sparse in clinical notes. In this study we develop a keyword-optimized template
insertion method (KOTI) and show how optimizing position can improve
performance on several clinical tasks in zero-shot and few-shot training
settings.
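The abstract does not spell out how KOTI chooses the insertion point. One plausible reading, sketched below purely as an illustration, is to place the prompt template next to the first sentence that mentions a task keyword instead of at the start or end of the note. The function name, the sentence splitter, the keyword list, and the template text are all hypothetical.

```python
import re


def insert_template(note, keywords, template):
    """Insert `template` right after the first sentence mentioning a keyword.

    Falls back to appending the template when no keyword is found.
    """
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", note.strip())
    for i, sentence in enumerate(sentences):
        if any(kw.lower() in sentence.lower() for kw in keywords):
            return " ".join(sentences[: i + 1] + [template] + sentences[i + 1:])
    return " ".join(sentences + [template])


note = "Patient seen in clinic. Reports daily alcohol use. Follow up in two weeks."
prompt = insert_template(note, ["alcohol"], "Alcohol use: [MASK].")
print(prompt)
```

Keeping the template adjacent to the relevant sentence is the point of the example: in a long note with sparse task-relevant information, a template appended at the end may sit far from the text it refers to.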
Deep Representation Learning of Electronic Health Records to Unlock Patient Stratification at Scale
Deriving disease subtypes from electronic health records (EHRs) can guide
next-generation personalized medicine. However, challenges in summarizing and
representing patient data prevent widespread practice of scalable EHR-based
stratification analysis. Here we present an unsupervised framework based on
deep learning to process heterogeneous EHRs and derive patient representations
that can efficiently and effectively enable patient stratification at scale. We
considered EHRs of 1,608,741 patients from a diverse hospital cohort comprising
a total of 57,464 clinical concepts. We introduce a representation learning
model based on word embeddings, convolutional neural networks, and autoencoders
(i.e., ConvAE) to transform patient trajectories into low-dimensional latent
vectors. We evaluated these representations as broadly enabling patient
stratification by applying hierarchical clustering to different multi-disease
and disease-specific patient cohorts. ConvAE significantly outperformed several
baselines in a clustering task to identify patients with different complex
conditions, with 2.61 entropy and 0.31 purity average scores. When applied to
stratify patients within a certain condition, ConvAE led to various clinically
relevant subtypes for different disorders, including type 2 diabetes,
Parkinson's disease and Alzheimer's disease, largely related to comorbidities,
disease progression, and symptom severity. With these results, we demonstrate
that ConvAE can generate patient representations that lead to clinically
meaningful insights. This scalable framework can help better understand varying
etiologies in heterogeneous sub-populations and unlock patterns for EHR-based
research in the realm of personalized medicine.
Comment: C.F. and R.M. share senior authorship
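For readers unfamiliar with the entropy and purity scores quoted in this abstract, the sketch below shows one common way to compute both for a clustering against reference labels. The exact definitions used in the paper (log base, weighting) are not given here, so the choices below (base-2 logarithm, size-weighted average) are assumptions, and the toy labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_ids, class_labels):
    """Return (purity, entropy) of a clustering against reference classes.

    purity  = (1/N) * sum over clusters of the majority-class count
    entropy = size-weighted mean of each cluster's class entropy (in bits)
    """
    clusters = defaultdict(list)
    for cid, label in zip(cluster_ids, class_labels):
        clusters[cid].append(label)

    n = len(class_labels)
    purity = 0.0
    entropy = 0.0
    for members in clusters.values():
        counts = Counter(members)
        size = len(members)
        purity += max(counts.values()) / n
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        entropy += (size / n) * h
    return purity, entropy


# Toy example: two clusters, three reference conditions.
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
conditions = ["a", "a", "a", "b", "b", "b", "c", "c"]
purity, entropy = purity_and_entropy(cluster_ids, conditions)
print(purity, entropy)
```

Higher purity and lower entropy both indicate clusters that line up more closely with the reference conditions.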
Pre-treatment clinical and gene expression patterns predict developmental change in early intervention in autism.
Funder: U.S. Department of Health & Human Services | NIH | National Institute of Mental Health (NIMH)
Early detection and intervention are believed to be key to facilitating better outcomes in children with autism, yet the impact of age at treatment start on the outcome is poorly understood. While clinical traits such as language ability have been shown to predict treatment outcome, whether or not and how information at the genomic level can predict treatment outcome is unknown. Leveraging a cohort of toddlers with autism who all received the same standardized intervention at a very young age and provided a blood sample, here we find that very early treatment engagement (i.e., <24 months) leads to greater gains while controlling for time in treatment. Pre-treatment clinical behavioral measures predict 21% of the variance in the rate of skill growth during early intervention. Pre-treatment blood leukocyte gene expression patterns also predict the rate of skill growth, accounting for 13% of the variance in treatment slopes. Results indicated that 295 genes can be prioritized as driving this effect. These treatment-relevant genes highly interact at the protein level, are enriched for differentially histone acetylated genes in autism postmortem cortical tissue, and are normatively highly expressed in a variety of subcortical and cortical areas important for social communication and language development. This work suggests that pre-treatment biological and clinical behavioral characteristics are important for predicting developmental change in the context of early intervention and that individualized pre-treatment biology related to histone acetylation may be key.
Imbalanced social-communicative and restricted repetitive behavior subtypes of autism spectrum disorder exhibit different neural circuitry
Social-communication (SC) and restricted repetitive behaviors (RRB) are autism diagnostic symptom domains. SC and RRB severity can markedly differ within and between individuals and may be underpinned by different neural circuitry and genetic mechanisms. Modeling SC-RRB balance could help identify how neural circuitry and genetic mechanisms map onto such phenotypic heterogeneity. Here, we developed a phenotypic stratification model that makes highly accurate (97–99%) out-of-sample SC = RRB, SC > RRB, and RRB > SC subtype predictions. Applying this model to resting state fMRI data from the EU-AIMS LEAP dataset (n = 509), we find that while the phenotypic subtypes share many commonalities in terms of intrinsic functional connectivity, they also show replicable differences within some networks compared to a typically-developing group (TD). Specifically, the somatomotor network is hypoconnected with perisylvian circuitry in SC > RRB and visual association circuitry in SC = RRB. The SC = RRB subtype shows hyperconnectivity between medial motor and anterior salience circuitry. Genes that are highly expressed within these networks show a differential enrichment pattern with known autism-associated genes, indicating that such circuits are affected by differing autism-associated genomic mechanisms. These results suggest that SC-RRB imbalance subtypes share many commonalities, but also express subtle differences in functional neural circuitry and the genomic underpinnings behind such circuitry.
Therapeutic Factors in a Psychiatric Group Therapy: a Preliminary Validation of Therapeutic Factors Inventory-8, Italian Version
Several studies support group therapy effectiveness due to the activation in patients of unique psychological mechanisms defined as non-specific therapeutic factors (Therapeutic Factors-TFs), which shape the setting and, at the same time, enhance the specific group therapeutic factors. The objectives of this study were to preliminarily validate the Italian version of the Therapeutic Factors Inventory-8 (TFI-8) and to identify group therapeutic factors. In a psychiatric residential facility, a weekly psychotherapeutic group was evaluated over 1 year. One scale on group process (TFI-8, Ferrara-Group Experience Scale) and three clinical scales (Brief Symptom Inventory-53, Sheehan Disability Scale, WHO Quality of Life-Bref) were administered to participating patients. Internal consistency, Exploratory Factor Analysis (EFA), and convergent validity of TFI-8 were assessed. Correlations between TFI-8 and other scale scores and selected variables were performed. Our sample consisted of 64 participants. TFI-8 showed good internal consistency (Cronbach’s alpha = 0.84) and concurrent validity with Fe-GES (Rho = 0.42, p = 0.0008). EFA highlighted a single Factor, accounting for 92% of variance. TFI-8 was not significantly related to clinical scale scores. The TFI-8 Italian version proved to be a valid and reliable tool which allowed us to identify one therapeutic factor indicating relational attraction in group therapy, composed of three dimensions: infusion of hope, cohesion and social learning.
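The internal-consistency figure reported above is Cronbach's alpha. As a reminder of how that statistic is obtained, here is a minimal sketch of the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The response matrix is made up for illustration (three items, five respondents) and has nothing to do with the study's TFI-8 data.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for `scores`: one row per respondent, one column per item."""
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses on a 1-5 scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 4],
    [3, 3, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

Alpha rises as items covary more strongly relative to their individual variances, which is why it is read as a measure of how consistently the items track a single construct.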
Convolutional neural networks for structured omics: OmicsCNN and the OmicsConv layer
Convolutional Neural Networks (CNNs) are a popular deep learning architecture
widely applied in different domains, in particular in image classification,
for which the concept of convolution with a filter comes naturally.
Unfortunately, the requirement of a distance (or, at least, of a neighbourhood
function) in the input feature space has so far prevented its direct use on
data types such as omics data. However, a number of omics data are metrizable,
i.e., they can be endowed with a metric structure, enabling the adoption of a
convolution-based deep learning framework, e.g., for prediction. We propose a
generalized solution for CNNs on omics data, implemented through a dedicated
Keras layer. In particular, for metagenomics data, a metric can be derived from
the patristic distance on the phylogenetic tree. For transcriptomics data, we
combine Gene Ontology semantic similarity and gene co-expression to define a
distance; the function is defined through a multilayer network where 3 layers
are defined by the GO mutual semantic similarity while the fourth one by gene
co-expression. As a general tool, feature distance on omics data is enabled by
OmicsConv, a novel Keras layer, obtaining OmicsCNN, a dedicated deep learning
framework. Here we demonstrate OmicsCNN on gut microbiota sequencing data, for
Inflammatory Bowel Disease (IBD) 16S data, first on synthetic data and then on a
metagenomics collection of gut microbiota of 222 IBD patients.
Comment: 7 pages, 3 figures. arXiv admin note: text overlap with
arXiv:1709.0226
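To make the role of the feature distance concrete, the sketch below convolves a feature vector over neighbourhoods defined by a distance matrix rather than by pixel adjacency. It is not the OmicsConv Keras layer: the k-nearest-neighbour rule, the function name, and the toy distance matrix (standing in for patristic distances between four taxa) are assumptions made for illustration.

```python
def neighbor_convolution(values, distances, weights):
    """Convolve a feature vector over a distance-defined neighbourhood.

    For each feature i, take its len(weights) nearest features under
    `distances` (feature i itself comes first, at distance 0) and return
    the weighted sum of their values.
    """
    k = len(weights)
    out = []
    for i in range(len(values)):
        # Indices of the k features closest to feature i.
        nearest = sorted(range(len(values)), key=lambda j: distances[i][j])[:k]
        out.append(sum(w * values[j] for w, j in zip(weights, nearest)))
    return out


# Hypothetical symmetric distances between four features.
D = [
    [0, 1, 4, 5],
    [1, 0, 3, 4],
    [4, 3, 0, 2],
    [5, 4, 2, 0],
]
abundances = [1.0, 2.0, 3.0, 4.0]
smoothed = neighbor_convolution(abundances, D, weights=[0.5, 0.5])
print(smoothed)
```

In a trainable layer the weights would be learned; here they are fixed so the effect of the neighbourhood structure is easy to see.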