48 research outputs found

    Uncertainty in Neural Networks: Approximately Bayesian Ensembling

    Understanding the uncertainty of a neural network's (NN) predictions is essential for many purposes. The Bayesian framework provides a principled approach to this; however, applying it to NNs is challenging due to the large numbers of parameters and data. Ensembling NNs provides an easily implementable, scalable method for uncertainty quantification, but it has been criticised for not being Bayesian. This work proposes one modification to the usual process that we argue does result in approximate Bayesian inference: regularising parameters about values drawn from a distribution which can be set equal to the prior. A theoretical analysis of the procedure in a simplified setting suggests the recovered posterior is centred correctly but tends to have an underestimated marginal variance and overestimated correlation. However, two conditions can lead to exact recovery. We argue that these conditions are partially present in NNs. Empirical evaluations demonstrate it has an advantage over standard ensembling and is competitive with variational methods. The lead author was funded through EPSRC (EP/N509620/1) and partially accommodated by the Alan Turing Institute.
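
The regularisation idea in the abstract above can be illustrated in a linear-Gaussian setting, where each ensemble member has a closed-form solution. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `anchored_linear_ensemble`, the synthetic data, the noise variance and the standard normal prior are all hypothetical choices made for illustration, and the anchor distribution is set equal to the prior.

```python
import numpy as np

def anchored_linear_ensemble(X, y, noise_var, prior_mean, prior_cov, n_members, rng):
    """Fit n_members regularised linear models, each anchored to its own prior draw.

    Member k minimises
        ||y - X w||^2 / noise_var + (w - a_k)^T prior_cov^{-1} (w - a_k),
    where the anchor a_k is sampled from N(prior_mean, prior_cov).
    """
    prior_prec = np.linalg.inv(prior_cov)
    A = X.T @ X / noise_var + prior_prec              # shared normal-equation matrix
    anchors = rng.multivariate_normal(prior_mean, prior_cov, size=n_members)
    rhs = X.T @ y / noise_var + anchors @ prior_prec  # one right-hand side per member
    return np.linalg.solve(A, rhs.T).T                # shape (n_members, n_features)

# Synthetic regression problem (illustrative values only).
rng = np.random.default_rng(0)
n, d, noise_var = 20, 2, 0.25
X = rng.normal(size=(n, d))
true_w = np.array([1.0, -0.5])
y = X @ true_w + rng.normal(scale=np.sqrt(noise_var), size=n)

prior_mean, prior_cov = np.zeros(d), np.eye(d)
members = anchored_linear_ensemble(X, y, noise_var, prior_mean, prior_cov, 5000, rng)

# Analytic Bayesian posterior for the same model, for comparison.
prior_prec = np.linalg.inv(prior_cov)
post_cov = np.linalg.inv(X.T @ X / noise_var + prior_prec)
post_mean = post_cov @ (X.T @ y / noise_var + prior_prec @ prior_mean)

print("ensemble mean      ", members.mean(axis=0))
print("posterior mean     ", post_mean)
print("ensemble variance  ", members.var(axis=0))
print("posterior variance ", np.diag(post_cov))
```

In this simplified setting the ensemble mean agrees with the analytic posterior mean, while the ensemble variance falls below the posterior variance, which is consistent with the abstract's remark that the recovered posterior is centred correctly but tends to underestimate the marginal variance.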

    A Multivariate Framework for Variable Selection and Identification of Biomarkers in High-Dimensional Omics Data

    In this thesis, we address the identification of biomarkers in high-dimensional omics data. The identification of valid biomarkers is especially relevant for personalized medicine, which depends on accurate prediction rules. Moreover, biomarkers elucidate the provenance of disease, or molecular changes related to disease. From a statistical point of view, the identification of biomarkers is best cast as variable selection. In particular, we refer to variables as the molecular attributes under investigation, e.g. genes, genetic variation, or metabolites; and we refer to observations as the specific samples whose attributes we investigate, e.g. patients and controls. Variable selection in high-dimensional omics data is a complicated challenge due to the characteristic structure of omics data. For one, omics data is high-dimensional, comprising cellular information in unprecedented detail. Moreover, there is an intricate correlation structure among the variables due to, e.g., internal cellular regulation or external, latent factors. Variable selection for uncorrelated data is well established. In contrast, there is no consensus on how to approach variable selection under correlation. Here, we introduce a multivariate framework for variable selection that explicitly accounts for the correlation among markers. In particular, we present two novel quantities for variable importance: the correlation-adjusted t (CAT) score for classification, and the correlation-adjusted (marginal) correlation (CAR) score for regression. The CAT score is defined as the Mahalanobis-decorrelated t-score vector, and the CAR score as the Mahalanobis-decorrelated correlation between the predictor variables and the outcome. We derive the CAT and CAR scores from a predictive point of view in linear discriminant analysis and regression; both quantities assess the weight of a decorrelated and standardized variable on the prediction rule. Furthermore, we discuss properties of both scores and relations to established quantities. Above all, the CAT score decomposes Hotelling's T² and the CAR score decomposes the proportion of variance explained. Notably, the decomposition of total variance into explained and unexplained variance in the linear model can be rewritten in terms of CAR scores. To render our approach applicable to high-dimensional omics data, we devise an efficient algorithm for shrinkage estimates of the CAT and CAR scores. Subsequently, we conduct extensive simulation studies to investigate the performance of our novel approaches in ranking and prediction under correlation. Here, CAT and CAR scores consistently improve over marginal approaches, selecting more true positives and achieving a lower model error. Finally, we illustrate the application of CAT and CAR scores to real omics data. In particular, we analyze genomics, transcriptomics, and metabolomics data. We ascertain that CAT and CAR scores are competitive with or outperform state-of-the-art techniques in terms of true positives detected and prediction error.
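
The definition of the CAR score given above (the Mahalanobis-decorrelated correlation between predictors and outcome) and its link to the proportion of variance explained can be checked numerically. The sketch below is an illustration under assumptions: it uses plain empirical correlations rather than the shrinkage estimates the thesis describes, so it only applies when there are more observations than variables, and the helper names `inv_sqrt_sym` and `car_scores` and the synthetic data are hypothetical.

```python
import numpy as np

def inv_sqrt_sym(M):
    """Inverse matrix square root of a symmetric positive definite matrix."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs / np.sqrt(vals)) @ vecs.T

def car_scores(X, y):
    """Empirical CAR scores: P^{-1/2} P_xy, with P the predictor correlation matrix."""
    R = np.corrcoef(np.column_stack([X, y]), rowvar=False)
    P = R[:-1, :-1]     # correlations among predictors
    p_xy = R[:-1, -1]   # marginal correlations between each predictor and y
    return inv_sqrt_sym(P) @ p_xy

# Synthetic correlated predictors (illustrative values only).
rng = np.random.default_rng(0)
n, d = 200, 4
mixing = rng.normal(size=(d, d))
X = rng.normal(size=(n, d)) @ mixing
beta = np.array([1.5, 0.0, -1.0, 0.0])
y = X @ beta + rng.normal(size=n)

omega = car_scores(X, y)

# R^2 of ordinary least squares with an intercept, for comparison.
design = np.column_stack([np.ones(n), X])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
resid = y - design @ coef
r2 = 1.0 - resid.var() / y.var()

print("CAR scores            ", omega)
print("sum of squared scores ", np.sum(omega**2))
print("OLS R^2               ", r2)
```

The squared CAR scores sum to the R² of the linear model, which is the decomposition of explained variance the abstract refers to; ranking variables by squared score then gives a correlation-aware importance ordering.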

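    Development of methodologies for evaluating fish freshness through metabonomic applications (original title: Sviluppo di metodologie per la valutazione della freschezza del pesce mediante applicazioni metabonomiche)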

    This study focuses on metabonomic applications for measuring the freshness of fish products as a function of biological species and storage method. The metabonomic approach is innovative and is based on molecular profiling by nuclear magnetic resonance (NMR). Its aim is twofold: to ascertain whether a type of fish has maintained, within certain limits, the sensory and nutritional characteristics it had at the initial time, and to observe any alterations in the product's composition. The experimental spectroscopic data obtained by 1H-NMR on the molecular profiles of suitably prepared fish extracts were compared with those obtained on the same samples by the classical, conventional instrumental analytical methods to which the official methodologies refer. These conventional methods are used to obtain chemical indices of freshness that derive from the biochemical and microbial degradation of protein and non-protein nitrogen compounds (trimethylamine, N(CH3)3, nucleotides, amino acids, etc.). Subsequently, within the metabonomic approach, a principal components analysis (PCA) and a partial least squares discriminant analysis (PLS-DA) were performed to condense the temporal evolution of freshness into a single comprehensive parameter. In particular, under both storage conditions (4 °C and 0 °C), the first principal component (PC1) is the component along which the molecular composition of the samples, as described by the 1H-NMR spectrum, evolves during storage with a very high variance. The results of this study are intended to provide scientific support that supplies objective evaluation criteria, so that a fish product can bear the designation “fresh fish”.
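
The use of a first principal component to summarise how spectra evolve during storage can be sketched as follows. This is an illustration only: the spectra are synthetic (a random baseline plus a drift proportional to storage day plus noise), not NMR measurements from the study, and every variable name and numeric setting is an assumption chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for binned spectra: 10 storage days, 3 replicates per day.
days = np.repeat(np.arange(10), 3).astype(float)
n_bins = 50
baseline = rng.normal(size=n_bins)   # spectrum shape at day 0
drift = rng.normal(size=n_bins)      # per-bin change per day of storage
noise = rng.normal(scale=0.05, size=(days.size, n_bins))
spectra = baseline + np.outer(days, drift) + noise

# PCA via singular value decomposition of the mean-centred data matrix.
centred = spectra - spectra.mean(axis=0)
U, S, Vt = np.linalg.svd(centred, full_matrices=False)
scores = U[:, 0] * S[0]                # PC1 score of each sample
explained = S**2 / np.sum(S**2)        # fraction of variance per component
corr = np.corrcoef(scores, days)[0, 1]

print("variance explained by PC1:", explained[0])
print("|correlation| between PC1 score and storage day:", abs(corr))
```

When a single storage-driven change dominates the data, as it does by construction here, the PC1 score acts as a one-number summary of storage time. The sign of a principal component is arbitrary, so only the magnitude of the correlation is meaningful.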

    Parametric classification in domains of characters, numerals, punctuation, typefaces and image qualities

    This thesis contributes to the Optical Font Recognition (OFR) problem by developing a classifier system that differentiates ten typefaces using a single English character, ‘e’. First, the features used by the classifier system are carefully selected after a thorough typographical study of global font features and previous related experiments. These features are modeled by multivariate normal laws so that parameter estimation can be used in learning. Then, the classifier system is built on six independent schemes, each performing typeface classification using a different method. The results show remarkable performance in font recognition. Finally, the classifiers are applied to lowercase characters, uppercase characters, digits, punctuation and degraded images.
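
Modelling class features with multivariate normal laws and learning by parameter estimation can be sketched with a generic Gaussian classifier. This is not one of the thesis's six schemes, whose details the abstract does not give: the sketch assumes equal class priors, maximum-likelihood style estimates of each class mean and covariance, and synthetic two-dimensional features standing in for typeface measurements. All function names are hypothetical.

```python
import numpy as np

def fit_gaussians(X, labels):
    """Estimate a mean vector and covariance matrix for each class."""
    return {c: (X[labels == c].mean(axis=0), np.cov(X[labels == c], rowvar=False))
            for c in np.unique(labels)}

def log_density(X, mean, cov):
    """Log of the multivariate normal density, evaluated at each row of X."""
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    return -0.5 * (X.shape[1] * np.log(2 * np.pi) + logdet + maha)

def predict(X, params):
    """Assign each row to the class with the highest log-density (equal priors)."""
    classes = list(params)
    scores = np.column_stack([log_density(X, *params[c]) for c in classes])
    return np.array(classes)[scores.argmax(axis=1)]

# Synthetic features for three classes (illustrative values only).
rng = np.random.default_rng(2)
centres = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])

def sample(n_per_class):
    X = np.vstack([rng.normal(loc=c, scale=0.7, size=(n_per_class, 2)) for c in centres])
    labels = np.repeat(np.arange(len(centres)), n_per_class)
    return X, labels

X_train, y_train = sample(100)
X_test, y_test = sample(50)

params = fit_gaussians(X_train, y_train)
accuracy = np.mean(predict(X_test, params) == y_test)
print("test accuracy:", accuracy)
```

Because each class keeps its own covariance matrix, the decision boundaries are quadratic; sharing one pooled covariance across classes would instead give linear boundaries.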