A model-agnostic algorithm for Bayes error determination in binary classification
This paper presents the intrinsic limit determination algorithm (ILD Algorithm), a novel technique to determine the best possible performance, measured in terms of the AUC (area under the ROC curve) and accuracy, that can be obtained from a specific dataset in a binary classification problem with categorical features, regardless of the model used. This limit, namely the Bayes error, is completely independent of any model used and describes an intrinsic property of the dataset. The ILD algorithm thus provides important information regarding the prediction limits of any binary classification algorithm when applied to the considered dataset. In this paper, the algorithm is described in detail, its entire mathematical framework is presented, and the pseudocode is given to facilitate its implementation. Finally, an example with a real dataset is given.
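As an illustration of why such a limit exists for categorical features, the following minimal Python sketch groups samples by identical feature combinations and assigns each group its majority class. It is not the ILD algorithm from the paper: it only estimates the accuracy limit on the given sample (not the AUC limit), and the function name `empirical_accuracy_limit` and the toy data are assumptions made for this example.

```python
from collections import Counter, defaultdict


def empirical_accuracy_limit(rows, labels):
    """Best accuracy any classifier can reach on this sample when it may
    only look at the (categorical) feature values: samples with identical
    features must get the same prediction, so at most the majority class
    of each feature combination can be classified correctly."""
    counts = defaultdict(Counter)
    for features, label in zip(rows, labels):
        counts[tuple(features)][label] += 1
    best_correct = sum(max(c.values()) for c in counts.values())
    return best_correct / len(labels)


# Hypothetical toy dataset: two categorical features, binary label.
rows = [("a", "x"), ("a", "x"), ("a", "x"), ("b", "y"), ("b", "y"), ("b", "x")]
labels = [1, 1, 0, 0, 0, 1]

limit = empirical_accuracy_limit(rows, labels)
print(f"accuracy limit: {limit:.3f}, estimated error floor: {1 - limit:.3f}")
```

In the toy data, the combination ("a", "x") occurs with both labels, so no model can classify all three of those samples correctly, which is what pushes the limit below 1.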
One-dimensional convolutional neural networks design for fluorescence spectroscopy with prior knowledge : explainability techniques applied to olive oil fluorescence spectra
Optical spectra, and particularly fluorescence spectra, contain a large quantity of information about the substances and their interaction with the environment. It is of great interest, therefore, to try to extract as much of this information as possible, as optical measurements can be easy, non-invasive, and can be performed in situ, making data collection a very appealing method of gathering knowledge. Artificial neural networks are known for their feature extraction capabilities and are therefore well suited for this challenge. In this work, inspired by convolutional neural network (CNN) architectures in 2D and their success with images, a novel approach using one-dimensional convolutional neural networks (1D-CNN) is used to extract information from the measured spectra by using explainability techniques. The 1D-CNN architecture takes the entire fluorescence spectrum as input and its design takes advantage of prior knowledge about the instrumentation and sample characteristics, for example the spectrometer resolution or the expected number of relevant features in the spectrum. Even if network performance is good, it remains an open question whether the features used for the predictions make sense from a physical and chemical point of view and whether they match what is known from existing studies. This work studies the output of the convolutional layers, known as feature maps, to understand which features the network has effectively used for the predictions, and thus which part of the measured spectra contains the relevant information about the phenomena underlying the quantity to be predicted. The proposed approach is demonstrated by applying it to the determination of the UV absorbance at 232 nm, K232, from fluorescence spectra using a dataset of 18 Spanish olive oils, which were chemically analyzed by certified laboratories.
The 1D-CNN successfully predicts the parameter K232 and enables, by studying feature maps, the clear identification of the relevant spectral features. This work makes two main contributions. Firstly, it describes how designing the neural network architecture with prior knowledge (spectrometer resolution, etc.) helps the network learn features that have a clear connection to the chemical composition of the substances, and thus are clearly explainable. Secondly, it shows how, in the case of olive oil, the identified features match the relevant features known from previous studies, thus confirming that the network is learning from the underlying chemical process.
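To make the notion of a feature map concrete, here is a minimal Python sketch of a single one-dimensional convolution followed by a ReLU activation. It is an illustration only, not the architecture used in the paper: the synthetic spectrum and the hand-picked kernel are assumptions, whereas in a trained 1D-CNN the kernel values are learned from data.

```python
def conv1d_valid(signal, kernel):
    """Slide the kernel over the signal without padding (the
    cross-correlation that CNN layers compute) and return the responses."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def relu(values):
    """Keep positive responses, set negative ones to zero."""
    return [max(0.0, v) for v in values]


# Synthetic "spectrum" with a single narrow peak at index 3 (assumption).
spectrum = [0.0, 0.1, 0.2, 1.0, 0.2, 0.1, 0.0, 0.0]
# Hand-picked kernel that responds to narrow peaks (assumption).
kernel = [-0.5, 1.0, -0.5]

feature_map = relu(conv1d_valid(spectrum, kernel))
# Map the strongest response back to a position in the input spectrum;
# the offset accounts for the kernel being centred on its middle element.
peak_position = feature_map.index(max(feature_map)) + len(kernel) // 2
print(feature_map, peak_position)
```

Inspecting where a feature map is strongly activated, as done here for one kernel, is the basic idea behind relating network features to spectral regions.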
Extraction of physicochemical properties from the fluorescence spectrum with 1D convolutional neural networks : application to olive oil
One of the main challenges for olive oil producers is the ability to assess oil quality regularly during the production cycle. The quality of olive oil is evaluated through a series of parameters that can be determined, up to now, only through multiple chemical analysis techniques. This requires samples to be sent to approved laboratories, making quality control an expensive, time-consuming process that cannot be performed regularly and cannot guarantee the quality of the oil up to the point it reaches the consumer. This work presents a new approach that is fast, based on low-cost instrumentation, and easily performed in the field. The proposed method is based on fluorescence spectroscopy and one-dimensional convolutional neural networks and allows the prediction of five chemical quality indicators of olive oil (acidity, peroxide value, UV spectroscopic parameters K270 and K232, and ethyl esters) from a single fluorescence spectrum obtained with a very fast measurement from a low-cost portable fluorescence sensor. The results indicate that the proposed approach gives exceptional results for quality determination through the extraction of the relevant physicochemical parameters. This would make the continuous quality control of olive oil during and after the entire production cycle a reality.
Exploration of Spanish olive oil quality with a miniaturized low-cost fluorescence sensor and machine learning techniques
This article belongs to the Special Issue Advanced Analysis Methods for Food Safety, Authenticity and Traceability Assessment. Extra virgin olive oil (EVOO) is the highest quality of olive oil and is characterized by highly beneficial nutritional properties. The large increase in both consumption and fraud, for example through adulteration, creates new challenges and an increasing demand for developing new quality assessment methodologies that are easier and cheaper to perform. As of today, the determination of olive oil quality is performed by producers through chemical analysis and organoleptic evaluation. The chemical analysis requires the advanced equipment and chemical knowledge of certified laboratories, and therefore has limited accessibility. In this work a minimalist, portable, and low-cost sensor is presented, which can perform olive oil quality assessment using fluorescence spectroscopy. The potential of the proposed technology is explored by analyzing several olive oils of different quality levels: EVOO, virgin olive oil (VOO), and lampante olive oil (LOO). The spectral data were analyzed using a large number of machine learning methods, including artificial neural networks. The analysis performed in this work demonstrates the possibility of classifying olive oil into the three mentioned classes with an accuracy of 100%. These results confirm that this minimalist low-cost sensor has the potential to substitute expensive and complex chemical analysis.
A Machine Learning Approach for Mortality Prediction in COVID-19 Pneumonia: Development and Evaluation of the Piacenza Score
Background: Several models have been developed to predict mortality in patients with COVID-19 pneumonia, but only a few have demonstrated enough discriminatory capacity. Machine learning algorithms represent a novel approach for the data-driven prediction of clinical outcomes with advantages over statistical modeling. Objective: We aimed to develop a machine learning-based score, the Piacenza score, for 30-day mortality prediction in patients with COVID-19 pneumonia. Methods: The study comprised 852 patients with COVID-19 pneumonia, admitted to the Guglielmo da Saliceto Hospital in Italy from February to November 2020. Patients' medical history, demographics, and clinical data were collected using an electronic health record. The overall patient data set was randomly split into derivation and test cohorts. The score was obtained through the naive Bayes classifier and externally validated on 86 patients admitted to Centro Cardiologico Monzino (Italy) in February 2020. Using a forward-search algorithm, 6 features were identified: age, mean corpuscular hemoglobin concentration, PaO2/FiO2 ratio, temperature, previous stroke, and gender. The Brier index was used to evaluate the ability of the machine learning model to stratify and predict the observed outcomes. A user-friendly website was designed and developed to enable fast and easy use of the tool by physicians. Regarding the customization properties of the Piacenza score, we added a tailored version of the algorithm to the website, which enables an optimized computation of the mortality risk score for a patient when some of the variables used by the Piacenza score are not available. In this case, the naive Bayes classifier is retrained over the same derivation cohort but using a different set of patient characteristics.
We also compared the Piacenza score with the 4C score and with a naive Bayes algorithm with 14 features chosen a priori. Results: The Piacenza score exhibited an area under the receiver operating characteristic curve (AUC) of 0.78 (95% CI 0.74-0.84, Brier score=0.19) in the internal validation cohort and 0.79 (95% CI 0.68-0.89, Brier score=0.16) in the external validation cohort, showing an accuracy comparable to that of the 4C score and of the naive Bayes model with a priori chosen features, which achieved AUCs of 0.78 (95% CI 0.73-0.83, Brier score=0.26) and 0.80 (95% CI 0.75-0.86, Brier score=0.17), respectively. Conclusions: Our findings demonstrated that a customizable machine learning-based score with a purely data-driven selection of features is feasible and effective for the prediction of mortality among patients with COVID-19 pneumonia.
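The Brier score reported alongside each AUC above is the mean squared difference between predicted probabilities and observed binary outcomes, with lower values being better. The following Python sketch shows the computation; the probabilities and outcomes are made-up illustrative values, not data from the study.

```python
def brier_score(predicted_probabilities, outcomes):
    """Mean squared difference between predicted probabilities (0..1)
    and observed binary outcomes (0 or 1). Lower is better."""
    if len(predicted_probabilities) != len(outcomes):
        raise ValueError("inputs must have the same length")
    squared_errors = [
        (p - y) ** 2 for p, y in zip(predicted_probabilities, outcomes)
    ]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical predicted mortality risks and observed outcomes (1 = died).
predicted = [0.9, 0.2, 0.6, 0.1]
observed = [1, 0, 0, 0]

score = brier_score(predicted, observed)
print(f"Brier score: {score:.3f}")
```

As a point of reference, a model that always predicts a probability of 0.5 obtains a Brier score of 0.25 regardless of the outcomes.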
Cardiovascular Risk Prediction in Ankylosing Spondylitis: From Traditional Scores to Machine Learning Assessment
Introduction: The performance of seven cardiovascular (CV) risk algorithms is evaluated in a multicentric cohort of ankylosing spondylitis (AS) patients. Performance and calibration of traditional CV predictors have been compared with the novel paradigm of machine learning (ML). Methods: A retrospective analysis of prospectively collected data from an AS cohort has been performed. The primary outcome was the first CV event. The discriminatory ability of the algorithms was evaluated using the area under the receiver operating characteristic (ROC) curve (AUC), which for a binary outcome corresponds to the concordance statistic (c-statistic). Three ML techniques were considered to calculate the CV risk: support vector machine (SVM), random forest (RF), and k-nearest neighbor (KNN). Results: Of 133 AS patients enrolled, 18 had a CV event. c-statistic scores of 0.71, 0.61, 0.66, 0.68, 0.66, 0.72, and 0.67 were found, respectively, for SCORE, CUORE, FRS, QRISK2, QRISK3, RRS, and ASSIGN. AUC values for the ML algorithms were: 0.70 for SVM, 0.73 for RF, and 0.64 for KNN. Feature analysis showed that C-reactive protein (CRP) has the highest importance, while SBP and hypertension treatment have lower importance. Conclusions: All of the evaluated CV risk algorithms exhibit a poor discriminative ability, except for RRS and SCORE, which showed a fair performance. For the first time, we demonstrated that AS patients do not show the traditional risk factors used by CV scores and that the most important variable is CRP. The present study contributes to a deeper understanding of CV risk in AS, allowing the development of innovative CV risk patient-specific models.
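The relation between the AUC and the c-statistic mentioned in the Methods can be made explicit: for a binary outcome, both equal the fraction of (event, non-event) patient pairs in which the patient with the event received the higher risk score, with ties counted as one half. The Python sketch below computes this directly; the risk scores and outcomes are invented for illustration and are not taken from the study.

```python
def c_statistic(risk_scores, events):
    """Fraction of (event, non-event) pairs in which the patient with the
    event has the higher risk score; ties count as half a concordant pair.
    For a binary outcome this equals the area under the ROC curve."""
    with_event = [s for s, e in zip(risk_scores, events) if e == 1]
    without_event = [s for s, e in zip(risk_scores, events) if e == 0]
    if not with_event or not without_event:
        raise ValueError("both outcome classes must be present")
    concordant = 0.0
    for s_event in with_event:
        for s_no_event in without_event:
            if s_event > s_no_event:
                concordant += 1.0
            elif s_event == s_no_event:
                concordant += 0.5
    return concordant / (len(with_event) * len(without_event))


# Hypothetical predicted risks and observed CV events (1 = event).
risk = [0.8, 0.4, 0.35, 0.6, 0.2]
event = [1, 0, 1, 0, 0]

auc = c_statistic(risk, event)
print(f"c-statistic / AUC: {auc:.3f}")
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the scores of 0.61 to 0.73 above are read.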
Chemical analysis of olive oils from fluorescence spectra thanks to one-dimensional convolutional neural networks
Optical Sensing and Detection VII: 12139-81. The chemical analysis of food is essential to monitor and guarantee its quality. The determination of the chemical parameters, like the concentration of particular substances, is performed by specialized laboratories and is a time-consuming and costly process. Therefore, alternative methods with easier handling are of great interest. Among these, fluorescence spectroscopy offers great opportunities. Fluorescence spectra are one-dimensional arrays of values already successfully employed together with artificial neural networks for classification problems in chemistry, physics, and other fields. However, the extraction of specific quantities from the spectra poses a much harder challenge. This work analyzes and compares the ability of feed-forward neural networks (FFNN) and one-dimensional convolutional neural networks (1D-CNN) to extract relevant features from fluorescence spectra of olive oils. The results indicate that 1D-CNN, contrary to FFNN, successfully predicts the chemical parameters with high accuracy. The great advantages of the proposed method are: 1) the possibility of using optical methods instead of time-consuming chemical ones, like chromatography, 2) the lack of any special sample handling, like dilution, and 3) the lack of any pre-processing of the data. The problem of small datasets, which may arise for novel techniques like the proposed one, is also addressed statistically by using the leave-one-out resampling technique.
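The leave-one-out resampling technique mentioned at the end of the abstract can be sketched as follows: each sample is held out once, the model is fitted on the remaining samples, and the held-out sample is used for evaluation. In this Python sketch a least-squares line stands in for the neural network purely to keep the example short; the helper names and the data are assumptions, not the authors' code.

```python
def leave_one_out_errors(xs, ys, fit, predict):
    """Hold out each sample once, fit on the rest, and record the
    absolute prediction error on the held-out sample."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        model = fit(train_x, train_y)
        errors.append(abs(predict(model, xs[i]) - ys[i]))
    return errors


def fit_line(xs, ys):
    """Ordinary least-squares line; a stand-in for the real model."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


# Hypothetical small dataset: one feature, one target per sample.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

errors = leave_one_out_errors(xs, ys, fit_line, predict_line)
mean_abs_error = sum(errors) / len(errors)
print(f"leave-one-out mean absolute error: {mean_abs_error:.3g}")
```

With N samples this requires N separate fits, which is affordable precisely in the small-dataset setting the abstract describes.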
Understanding the learning mechanism of convolutional neural networks applied to fluorescence spectra
AI and Optical Data Sciences IV; Paper 12438-50. The power of artificial neural networks to determine the quality and properties of olive oil has been proven by several studies in recent years. It is less clear, however, how the neural network is able to extract useful information from the input data. This work investigates the learning mechanism of one-dimensional convolutional neural networks (1D-CNNs) trained to predict the physicochemical properties of olive oil from single fluorescence spectra. Such a 1D-CNN can successfully predict the parameters relevant to the quality assessment: acidity, peroxide value, and UV absorbance. To go beyond a simple quality assessment algorithm, it is important to identify which spectral features in the measured spectra are correlated with each chemical parameter and therefore with the quality of olive oil. To obtain this information, explainability techniques can be used by studying the latent feature space generated by the intermediate layers of the trained one-dimensional convolutional neural network. This work analyses in detail the common features that are used by the 1D-CNN to predict the two physicochemical parameters acidity and K232.
One-dimensional convolutional neural networks design for fluorescence spectroscopy with prior knowledge : explainability techniques applied to olive oil fluorescence spectra
Invited oral presentation
Optical Sensing and Detection VII. Optical spectra, and particularly fluorescence spectra, contain a large quantity of information about the substances and their interaction with the environment. It is of great interest, therefore, to try to extract as much of this information as possible, as optical measurements can be easy, non-invasive, and can be performed in situ, making data collection a very appealing method of gathering knowledge. Artificial neural networks are known for their feature extraction capabilities and are therefore well suited for this challenge. In this work, inspired by convolutional neural network (CNN) architectures in 2D and their success with images, a novel approach using one-dimensional convolutional neural networks (1D-CNN) is used to extract information from the measured spectra by using explainability techniques. The 1D-CNN architecture takes the entire fluorescence spectrum as input and its design takes advantage of prior knowledge about the instrumentation and sample characteristics, for example the spectrometer resolution or the expected number of relevant features in the spectrum. Even if network performance is good, it remains an open question whether the features used for the predictions make sense from a physical and chemical point of view and whether they match what is known from existing studies. This work studies the output of the convolutional layers, known as feature maps, to understand which features the network has effectively used for the predictions, and thus which part of the measured spectra contains the relevant information about the phenomena underlying the quantity to be predicted. The proposed approach is demonstrated by applying it to the determination of the UV absorbance at 232 nm, K232, from fluorescence spectra using a dataset of 18 Spanish olive oils, which were chemically analysed by certified laboratories.
The 1D-CNN successfully predicts the parameter K232 and enables, by studying feature maps, the clear identification of the relevant spectral features. This work makes two main contributions. Firstly, it describes how designing the neural network architecture with prior knowledge (spectrometer resolution, etc.) helps the network learn features that have a clear connection to the chemical composition of the substances, and thus are clearly explainable. Secondly, it shows how, in the case of olive oil, the identified features match the relevant features known from previous studies, thus confirming that the network is learning from the underlying chemical process.