5 research outputs found

    Robust variable selection for model-based learning in presence of adulteration

    The problem of identifying the most discriminating features in supervised learning has been extensively investigated. In particular, several methods for variable selection in model-based classification have been proposed. Surprisingly, the impact of outliers and wrongly labeled units on the determination of relevant predictors has received far less attention, with almost no dedicated methodologies available in the literature. In the present paper, we introduce two robust variable selection approaches: one that embeds a robust classifier within a greedy-forward selection procedure, and another based on the theory of maximum likelihood estimation and irrelevance. The former recasts feature identification as a model selection problem, while the latter regards the relevant subset as a model parameter to be estimated. The benefits of the proposed methods, in contrast with non-robust solutions, are assessed via an experiment on synthetic data. An application to a high-dimensional classification problem involving contaminated spectroscopic data concludes the paper.

    Multivariate Statistical Models for the Authentication of Traditional Balsamic Vinegar of Modena and Balsamic Vinegar of Modena on 1H-NMR Data: Comparison of Targeted and Untargeted Approaches

    This work aimed to compare targeted and untargeted approaches based on NMR data for the construction of classification models for Traditional Balsamic Vinegar of Modena (TBVM) and Balsamic Vinegar of Modena (BVM). The compositional complexity of these products makes their authentication difficult and requires several time-consuming analytical methods. Here, 1H-NMR spectroscopy was selected as the analytical method for TBVM and BVM because of its rapidity and efficacy in food authentication. 1H-NMR spectra of old (>12 years) and extra-old (>25 years) TBVM, and of BVM (>60 days) and aged BVM (>3 years), were acquired, and targeted and untargeted approaches were used to build unsupervised and supervised multivariate statistical models. The targeted approach was based on quantitative results, obtained through qNMR, for characteristic compounds present in vinegar; the untargeted approach was based on all spectral variables. Several classification models were employed, and linear discriminant analysis (LDA) achieved sensitivity and specificity higher than 85% for both approaches. The most important discriminating variables were glucose, fructose, and 5-hydroxymethylfurfural. The untargeted approach proved to be the most promising strategy for constructing LDA authentication models for TBVM and BVM because of its easier applicability, rapidity, and slightly higher predictive performance. The proposed method for authenticating TBVM and BVM could be employed by Italian producers to safeguard their valuable products.

    PARAMETRIC STUDY OF A CNC TURNING PROCESS USING DISCRIMINANT ANALYSIS

    In the present-day manufacturing scenario, computer numerical control (CNC) technology has evolved into a cost-effective process for performing repetitive, difficult, and unsafe machining tasks while fulfilling the dynamic requirements of high dimensional accuracy and fine surface finish. Adopting CNC technology helps an organization achieve enhanced productivity, better product quality, and higher flexibility. In this paper, discriminant analysis is applied as a multivariate statistical tool to investigate the effects of speed, feed, depth of cut, nose radius, and type of machining environment of a CNC turning center on surface roughness, tool life, cutting force, and power consumption. Simultaneous discriminant analysis develops the corresponding discriminant function for each response, taking all the input parameters into account together. In contrast, step-wise discriminant analysis develops the same functions while considering only the input parameters that significantly influence the responses. High values of hit ratio and cross-validation percentage show that both discriminant functions are effective prediction tools for achieving enhanced performance of the considered CNC turning operation.

    Classification Models for Progression of Chronic Kidney Disease within a Secondary Prevention Program

    Loss of renal function has severe repercussions for patients' health and quality of life. Using scientific tools to improve knowledge of the disease and to prevent its progression in each patient could prevent terminal stages and even save lives. This work combines statistical techniques with medical knowledge in nephrology to obtain a tool that helps physicians treat their patients and make decisions about them. For a set of patients enrolled in a secondary prevention program, which aims to avoid reaching advanced stages of chronic kidney disease, we developed a complete statistical strategy. First, we described and prepared the data set. Then, we formed groups of patients and fitted several classification models to understand that partition. Finally, we developed an estimation of the patients' future trajectory. We found that the classification models performed well, with up to 90% correct classification, and that the estimation of the future trajectory appeared reliable, even for patients on whom the model was not trained. An interactive tool was also created to allow the results of this work to be used in daily clinical practice.
    Magíster en Estadística (Master's in Statistics)