Urine steroid metabolomics as a diagnostic tool in primary aldosteronism
Primary aldosteronism (PA) causes 5-10% of hypertension cases, but only a minority of patients are currently diagnosed and treated because of a complex, stepwise, and partly invasive workup. We tested the performance of urine steroid metabolomics, the computational analysis of 24-hour urine steroid metabolome data by machine learning, for the identification and subtyping of PA. Mass spectrometry-based multi-steroid profiling was used to quantify the excretion of 34 steroid metabolites in 24-hour urine samples from 158 adults with PA (88 with unilateral PA [UPA] due to aldosterone-producing adenomas [APAs]; 70 with bilateral PA [BPA]) and 65 sex- and age-matched healthy controls. All APAs were resected and underwent targeted gene sequencing to detect somatic mutations associated with UPA. Patients with PA had increased urinary metabolite excretion of mineralocorticoids, glucocorticoids, and glucocorticoid precursors. Urine steroid metabolomics identified patients with PA with high accuracy, whether applied to all 34 or only the three most discriminative steroid metabolites (average areas under the receiver operating characteristic curve [AUCs-ROC] 0.95-0.97). Whilst machine learning was suboptimal in differentiating UPA from BPA (average AUCs-ROC 0.65-0.73), it readily identified APA cases harbouring somatic KCNJ5 mutations (average AUCs-ROC 0.79-0.85). These patients showed a distinctly increased urine excretion of the hybrid steroid 18-hydroxycortisol and its metabolite 18-oxo-tetrahydrocortisol, the latter identified by machine learning as by far the most discriminative steroid. In conclusion, urine steroid metabolomics is a non-invasive candidate test for the accurate identification of PA cases and KCNJ5-mutated APAs.
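The AUC-ROC values reported above can be read as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control. The following is a minimal sketch of that rank-based definition, not the study's evaluation pipeline; the function name `auc_roc` and the toy scores are assumptions made for illustration only.

```python
def auc_roc(scores_pos, scores_neg):
    """Rank-based AUC: fraction of (positive, negative) pairs in which the
    positive scores higher; ties count as half a win."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical classifier scores, NOT data from the study.
case_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]
auc = auc_roc(case_scores, control_scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.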
Precision and Recall Reject Curves for Classification
For some classification scenarios, it is desirable to use only those
classification instances that a trained model associates with a high certainty.
To obtain such high-certainty instances, previous work has proposed
accuracy-reject curves. Reject curves make it possible to evaluate and compare
the performance of different certainty measures over a range of thresholds for
accepting or rejecting classifications. However, accuracy may not be the
most suitable evaluation metric for all applications, and instead precision or
recall may be preferable. This is the case, for example, for data with
imbalanced class distributions. We therefore propose reject curves that
evaluate precision and recall, the recall-reject curve and the precision-reject
curve. Using prototype-based classifiers from learning vector quantization, we
first validate the proposed curves on artificial benchmark data against the
accuracy-reject curve as a baseline. We then show on imbalanced benchmarks and
medical, real-world data that for these scenarios, the proposed precision- and
recall-reject curves yield more accurate insights into classifier performance
than accuracy-reject curves.
Comment: 11 pages, 3 figures. Updated figure label
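As a rough illustration of the idea, the sketch below computes precision and recall on the instances that remain after rejecting every classification whose certainty falls below a threshold. It assumes that both metrics are evaluated on the accepted instances only; the function name `reject_curves`, the tuple layout, and the toy data are hypothetical and are not taken from the paper.

```python
def reject_curves(y_true, y_pred, certainty, thresholds, positive=1):
    """For each threshold, keep instances with certainty >= threshold and
    return (threshold, reject_rate, precision, recall) on the kept set.
    Precision or recall is None when its denominator is zero."""
    curve = []
    for theta in thresholds:
        accepted = [(t, p) for t, p, c in zip(y_true, y_pred, certainty)
                    if c >= theta]
        tp = sum(1 for t, p in accepted if t == positive and p == positive)
        fp = sum(1 for t, p in accepted if t != positive and p == positive)
        fn = sum(1 for t, p in accepted if t == positive and p != positive)
        precision = tp / (tp + fp) if tp + fp else None
        recall = tp / (tp + fn) if tp + fn else None
        reject_rate = 1 - len(accepted) / len(y_true)
        curve.append((theta, reject_rate, precision, recall))
    return curve


# Toy labels, predictions, and certainties (illustrative only).
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
certainty = [0.9, 0.3, 0.8, 0.4, 0.7, 0.6]
curve = reject_curves(y_true, y_pred, certainty, thresholds=[0.0, 0.5])
for theta, rate, prec, rec in curve:
    print(theta, round(rate, 3), prec, rec)
```

Plotting precision or recall against the reject rate gives one curve per certainty measure, which is how different measures could be compared.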
Cardiometabolic burden and biomarkers of autonomous cortisol secretion
The overwhelming majority of incidentally discovered adrenal tumours are benign adrenocortical tumours. These can be non-functioning (NFAT) or associated with autonomous cortisol secretion on a spectrum ranging from rare clinically overt adrenal Cushing’s syndrome (CS) to much more prevalent mild autonomous cortisol secretion (MACS) without signs of CS. Here I present the characteristics of a large cohort of persons with newly diagnosed benign adrenocortical tumours.
In 1305 prospectively recruited cases, almost every second person with benign adrenocortical tumours was diagnosed with MACS. Persons with MACS had rates of cardiometabolic disease similar to CS, particularly increased prevalence and severity of hypertension and type 2 diabetes.
Urine multi-steroid profiling of these persons revealed a gradual increase in glucocorticoid excretion from NFAT over MACS to CS, whilst androgen excretion decreased. Increased glucocorticoid and 11-oxygenated androgen metabolite excretion predicted clinical outcomes including hypertension, type 2 diabetes, and the presence of bilateral adrenal tumours.
A representative group of 291 persons underwent untargeted serum metabolome profiling. Prototype-based supervised machine learning identified progressive metabolic changes in MACS and CS suggestive of lipotoxicity, dysfunctional lipid β-oxidation, and oxidative stress across the spectrum of autonomous cortisol secretion.
These results show that MACS is a prevalent cardiometabolic risk condition associated with distinct changes in the steroid and untargeted metabolome. Observed changes offer the prospect of risk stratification of affected individuals.
Complex-valued embeddings of generic proximity data
Proximities are at the heart of almost all machine learning methods. If the
input data are given as numerical vectors of equal length, the Euclidean
distance or a Hilbertian inner product is frequently used in modeling
algorithms. In a more generic view, objects are compared by a (symmetric)
similarity or dissimilarity measure, which may not obey particular mathematical
properties. This renders many machine learning methods invalid, leading to
convergence problems and the loss of guarantees, like generalization bounds. In
many cases, the preferred dissimilarity measure is not metric, like the earth
mover's distance, or the similarity measure may not be a simple inner product
in a Hilbert space but in its generalization, a Krein space. If the input data
are non-vectorial, like text sequences, proximity-based learning is used or n-gram
embedding techniques can be applied. Standard embeddings lead to the desired
fixed-length vector encoding, but are costly and have substantial limitations
in preserving the original data's full information. As an information
preserving alternative, we propose a complex-valued vector embedding of
proximity data. This allows suitable machine learning algorithms to use these
fixed-length, complex-valued vectors for further processing. The complex-valued
data can serve as an input to complex-valued machine learning algorithms. In
particular, we address supervised learning and use extensions of
prototype-based learning. The proposed approach is evaluated on a variety of
standard benchmarks and shows strong performance compared to traditional
techniques in processing non-metric or non-psd proximity data.
Comment: Proximity learning, embedding, complex values, complex-valued
embedding, learning vector quantizatio
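To make the idea of an information-preserving complex embedding concrete, the derivation below shows one standard construction for a symmetric, possibly indefinite similarity matrix $S \in \mathbb{R}^{n \times n}$. It is a sketch under the assumption that the embedding is obtained from an eigendecomposition; the abstract does not spell out the construction, and the symbols $U$, $\Lambda$, and $X$ are introduced here for illustration.

```latex
\begin{align*}
S &= U \Lambda U^{\top}, \qquad
  \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n), \quad
  \lambda_k \in \mathbb{R}, \\
X &= U \Lambda^{1/2}, \qquad
  \bigl(\Lambda^{1/2}\bigr)_{kk} =
  \begin{cases}
    \sqrt{\lambda_k}, & \lambda_k \ge 0, \\
    \mathrm{i}\,\sqrt{|\lambda_k|}, & \lambda_k < 0,
  \end{cases} \\
X X^{\top} &= U \Lambda^{1/2} \Lambda^{1/2} U^{\top}
  = U \Lambda U^{\top} = S .
\end{align*}
```

Each row of $X$ is a fixed-length complex vector, and negative eigenvalues contribute purely imaginary coordinates. Note that the reconstruction uses the plain transpose rather than the conjugate transpose, which is what lets the negative part of the spectrum be recovered instead of being clipped or flipped.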
Accurate non-invasive diagnosis and staging of non-alcoholic fatty liver disease using the urinary steroid metabolome
Background: The development of accurate, non-invasive markers to diagnose and stage non-alcoholic fatty liver disease (NAFLD) is critical to reduce the need for an invasive liver biopsy and to identify patients who are at the highest risk of hepatic and cardio-metabolic complications.
Aim(s): As the liver represents the main site of steroid hormone metabolism, and disruption of specific pathways has been described in patients with NAFLD, we hypothesised that assessment of the urinary steroid metabolome may provide a novel, non-invasive biomarker strategy to stage NAFLD.
Methods: We analysed the urinary steroid metabolome in 275 subjects (121 with biopsy-proven NAFLD, 48 with alcohol-related cirrhosis, 106 controls), using gas chromatography-mass spectrometry (GC-MS) coupled with machine learning-based generalised matrix learning vector quantisation (GMLVQ) analysis.
Results: GMLVQ analysis achieved excellent separation of early (F0-F2) from advanced (F3-F4) fibrosis (AUC ROC: 0.92 [0.91-0.94]). Furthermore, there was near perfect separation of controls from patients with advanced fibrotic NAFLD (AUC ROC=0.99 [0.98-0.99]) and from those with NAFLD cirrhosis (AUC ROC=1.0 [1.0-1.0]). This approach was also able to distinguish patients with NAFLD cirrhosis from those with alcohol-related cirrhosis (AUC ROC=0.83 [0.81-0.85]).
Conclusions: Unbiased GMLVQ analysis of the urinary steroid metabolome offers excellent potential as a non-invasive biomarker approach to stage NAFLD fibrosis as well as to screen for NAFLD. A highly sensitive and specific urinary biomarker is likely to have clinical utility both in secondary care and in the broader general population within primary care and could significantly decrease the need for liver biopsy.
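For readers unfamiliar with GMLVQ, the sketch below shows only its classification step: a sample is assigned the label of the nearest prototype under an adaptive distance d(x, w) = (x - w)^T Ω^T Ω (x - w), where the diagonal of Ω^T Ω is commonly read as per-feature relevance. Training of the prototypes and of Ω is omitted, and the two-feature prototypes, the labels `class_a` and `class_b`, and the matrix values are hypothetical; this is not the study's fitted model.

```python
def gmlvq_distance(x, w, omega):
    """Squared distance ||omega @ (x - w)||^2 used by GMLVQ."""
    diff = [xi - wi for xi, wi in zip(x, w)]
    projected = [sum(o * d for o, d in zip(row, diff)) for row in omega]
    return sum(p * p for p in projected)


def classify(x, prototypes, omega):
    """Return the label of the prototype closest to x."""
    label, _ = min(
        ((lab, gmlvq_distance(x, w, omega)) for lab, w in prototypes),
        key=lambda pair: pair[1],
    )
    return label


# Hypothetical values: omega down-weights the second feature.
omega = [[1.0, 0.0],
         [0.0, 0.5]]
prototypes = [("class_a", [0.0, 0.0]), ("class_b", [2.0, 0.0])]
sample = [1.5, 2.0]
dist_a = gmlvq_distance(sample, prototypes[0][1], omega)
dist_b = gmlvq_distance(sample, prototypes[1][1], omega)
predicted = classify(sample, prototypes, omega)
print(dist_a, dist_b, predicted)
```

Because Ω is learned together with the prototypes, features that do not help separate the classes end up with small relevance, which is one reason the method is used for biomarker discovery.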
Feature Relevance Bounds for Linear Classification
Göpfert C, Pfannschmidt L, Hammer B. Feature Relevance Bounds for Linear Classification. In: Verleysen M, ed. Proceedings of the ESANN, 24th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. Louvain-la-Neuve: Ciaco - i6doc.com; 2017: 187-192.
Biomedical applications often aim for an identification of relevant features for a given classification task, since these carry the promise of semantic insight into the underlying process.
For correlated input dimensions, feature relevances are not unique, and the identification of meaningful subtle biomarkers remains a challenge.
One approach is to identify intervals for the possible relevance of given features, a problem related to all relevant feature determination.
In this contribution, we address the important case of linear classifiers and cast the problem of inferring feature relevance bounds as a convex optimization problem.
We demonstrate the superiority of the resulting technique in comparison to popular feature-relevance determination methods in several benchmarks.