3 research outputs found

    Classification of high dimensional data using LASSO ensembles

    Urda, D., Franco, L. and Jerez, J.M. (2017). Classification of high dimensional data using LASSO ensembles. Proceedings IEEE SSCI'17, Symposium Series on Computational Intelligence, Honolulu, Hawaii, U.S.A. ISBN: 978-1-5386-2726-6
    The estimation of multivariable predictors with good performance in high dimensional settings is a crucial task in biomedical contexts. Solutions are usually based on a single machine learning model, while ensemble methods are often overlooked in this area despite their well-known benefits for predictive performance. In this paper, four ensemble approaches are described that use LASSO base learners to predict the vital status of a patient from RNA-Seq gene expression data. The results of the analysis, carried out on a public breast invasive cancer (BRCA) dataset, show that the ensemble approaches outperform the standard LASSO model, taken as the baseline, by a statistically significant margin. We also analyze the computational costs of each approach and give usage recommendations according to the available computational power.
    Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec

    Sparse Modeling for Predicting Class Assignments for Trace Forensic Evidence

    Numerous disciplines in forensic science use various types of spectral data when analyzing evidence. Spectral techniques are particularly critical for the analysis of trace evidence, as these methods are normally non-destructive. Preserving evidence, especially trace evidence, is a high priority in a forensic laboratory setting: once a piece of evidence has been fully consumed, no more analyses can be performed. Typically, visual comparison, or spectral overlay, is performed to compare questioned samples (evidence) to standards or knowns. However, such an approach may not be optimal for distinguishing the subtle, yet highly important, discriminating characteristics present in the spectra. As statistical analysis becomes increasingly influential in the forensic science community, multivariate chemometric approaches may help overcome the main shortcoming of spectral overlay when classifying and identifying samples. More traditional approaches allow for dimension reduction and classification of samples. However, multivariate data sets can pose a problem when far fewer samples than variables are available to build the classification model. Sparse statistical approaches overcome this limitation by reducing the number of variables retained in the final model, so that only a few significant parameters remain. This reduces model complexity and can improve the prediction accuracy of the model. Here, logistic regression with Lasso regularization is the sparse approach that was compared to traditional classification techniques to group fiber and lipstick samples based on their spectral data.
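
The variable-selection effect described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain proximal-gradient (ISTA) solver for L1-penalized logistic regression, and the function names, the penalty and step values, and the synthetic data (20 samples, 50 variables, only the first one informative) are all hypothetical choices made for illustration.

```python
import math
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within [-threshold, threshold] becomes 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, penalty=0.2, step=0.1, n_iter=300):
    """Proximal gradient descent on mean logistic loss + penalty * sum(|w_j|).

    The bias term is left unpenalized. Returns (weights, bias).
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    bias = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for row, label in zip(X, y):
            score = bias + sum(w * x for w, x in zip(weights, row))
            err = sigmoid(score) - label
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * row[j] / n
        bias -= step * grad_b
        # Gradient step followed by soft-thresholding: this is what
        # sets the weights of uninformative variables exactly to zero.
        weights = [
            soft_threshold(w - step * g, step * penalty)
            for w, g in zip(weights, grad_w)
        ]
    return weights, bias


# Hypothetical data: far fewer samples (20) than variables (50).
random.seed(0)
n_samples, n_variables = 20, 50
y = [i % 2 for i in range(n_samples)]
X = [
    [(2 * label - 1) + random.gauss(0, 0.5)]          # informative variable
    + [random.gauss(0, 1) for _ in range(n_variables - 1)]  # pure noise
    for label in y
]

weights, bias = fit_lasso_logistic(X, y)
selected = [j for j, w in enumerate(weights) if w != 0.0]
print(f"{len(selected)} of {n_variables} variables retained:", selected)
```

Raising `penalty` retains fewer variables; a sufficiently large value drives every weight to exactly zero, which is the mechanism that lets a sparse model be fitted when variables outnumber samples.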

    Conservation of Polyester-urethane Magnetic Audio Tapes and Discrimination of Textile Fiber Dyes Using Spectroscopy and Chemometrics

    This dissertation explores the conservation of magnetic audio tapes and the discrimination of fiber dyes using spectroscopic techniques and chemometrics. Conservation of polyester-urethane magnetic audio tapes has become a major challenge because they are susceptible to degradation via hydrolysis. Much of the world's modern cultural history is recorded on this medium, and conservationists now need a non-destructive technique to determine the playability status of these tapes. Once degraded tapes have been identified, they can be subjected to a baking process. This process temporarily reverses hydrolysis and provides enough time to digitize the tapes, preserving the information for future generations. The first three chapters of this dissertation investigate the use of attenuated total reflectance Fourier transform infrared spectroscopy (ATR FT-IR) and chemometrics as a non-destructive technique to determine the degradation status of tapes under different circumstances. The first chapter focuses on determining how reliable it is to collect spectra at the beginning of the magnetic audio tapes. This is a main concern because the tapes used in this study are lengthy, and it is crucial to find the sample location that best represents the entire tape in order to minimize misclassifications. Using two different test sets and six different classification techniques, it was found, with above 90% prediction accuracy, that spectra taken from the beginning of the tape are probably representative of the degradation behavior of most tapes. The second chapter focuses on building a more robust model to determine the degradation status of problematic tapes. This was successfully achieved using neural networks and the least absolute shrinkage and selection operator (Lasso). The third part of this study explores the degradation taking place along the back-coat and compares the results with those for the magnetic layer of the same tape identities. The results show poor predictability in the back layer, indicating that the magnetic layer is the preferred side from which to obtain spectra to determine the degradation status. The final part of this dissertation discusses the application of chemometrics and microspectrophotometry to differentiate fiber dyes. This was demonstrated using the blue nylon 6 subgroup with seven different fiber identities. Applying three different machine learning approaches, it was determined that Lasso is probably the best technique to differentiate fiber dyes.