
70 research outputs found

Harnessing function from form: towards bio-inspired artificial intelligence in neuronal substrates

Author: Dold Dominik
Publication date: 01/01/2020

Despite the recent success of deep learning, the mammalian brain is still unrivaled when it comes to interpreting complex, high-dimensional data streams such as visual, auditory and somatosensory stimuli. However, the underlying computational principles that allow the brain to deal with unreliable, high-dimensional and often incomplete data while consuming only a few watts of power are still mostly unknown. In this work, we investigate how specific functionalities emerge from simple structures observed in the mammalian cortex, and how these might be utilized in non-von Neumann devices such as “neuromorphic hardware”. First, we show that an ensemble of deterministic spiking neural networks can be shaped by a simple, local learning rule to perform sampling-based Bayesian inference. This suggests a coding scheme in which spikes (or “action potentials”) represent samples of a posterior distribution, constrained by sensory input, without the need for any source of stochasticity. Second, we introduce a top-down framework in which neuronal and synaptic dynamics are derived using a least action principle and gradient-based minimization. Combined, these neurosynaptic dynamics approximate real-time error backpropagation and are mappable to mechanistic components of cortical networks, whose dynamics can again be described within the proposed framework. The presented models narrow the gap between well-defined, functional algorithms and their biophysical implementation, improving our understanding of the computational principles the brain might employ. Furthermore, such models translate naturally to hardware that mimics the massively parallel neural structure of the brain, promising a strongly accelerated and energy-efficient implementation of powerful learning and inference algorithms, which we demonstrate on the physical model system “BrainScaleS–1”.
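For comparison with the first result, here is a minimal sketch of the classical stochastic baseline for "spikes as samples": Gibbs sampling in a small Boltzmann machine, where each binary unit (1 = spike) is resampled from a logistic function of its membrane potential, so that the sequence of network states is a sequence of samples from p(z). Unlike the deterministic networks described above, this sketch relies on an explicit pseudo-random source; the weights `W`, biases `b` and function names are hypothetical illustrations, not the thesis's model.

```python
import itertools
import math
import random


def gibbs_marginals(W, b, n_steps=50000, seed=0):
    """Estimate p(z_k = 1) under p(z) ∝ exp(0.5 * z^T W z + b^T z), z binary,
    by Gibbs sampling. W must be symmetric with a zero diagonal."""
    rng = random.Random(seed)
    n = len(b)
    z = [0] * n
    counts = [0] * n
    for _ in range(n_steps):
        for k in range(n):
            # "Membrane potential" u_k = b_k + sum_j W_kj z_j
            u = b[k] + sum(W[k][j] * z[j] for j in range(n))
            # The conditional spike probability is the logistic function of u_k
            z[k] = 1 if rng.random() < 1.0 / (1.0 + math.exp(-u)) else 0
        for k in range(n):
            counts[k] += z[k]
    return [c / n_steps for c in counts]


def exact_marginals(W, b):
    """Brute-force marginals by enumerating all 2^n states (small n only)."""
    n = len(b)
    weights = {}
    for z in itertools.product([0, 1], repeat=n):
        e = 0.5 * sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
        e += sum(b[i] * z[i] for i in range(n))
        weights[z] = math.exp(e)
    total = sum(weights.values())
    return [sum(w for z, w in weights.items() if z[k]) / total for k in range(n)]
```

Comparing `gibbs_marginals` with `exact_marginals` on a three-unit network shows the sampled spike frequencies approaching the exact marginals as `n_steps` grows.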

Heidelberger Dokumentenserver

Data analysis and interpretable machine learning for HVAC predictive control: A case-study based implementation

Author: Grammenos Ryan
Karagiannis Konstantinos
Mao Jianqiao
Publication date: 24/08/2023

University of Birmingham Research Portal

Enhancing associative memory recall and storage capacity using confocal cavity QED

Author: Ganguli Surya
Gopalakrishnan Sarang
Guo Yudan
Keeling Jonathan
Kroeze Ronen M.
Lev Benjamin L.
Marsh Brendan P.
Publication venue: 'American Physical Society (APS)'
Publication date: 02/09/2020

Funding: Y.G. and B.M. acknowledge funding from the Stanford Q-FARM Graduate Student Fellowship and the NSF Graduate Research Fellowship, respectively. J.K. acknowledges support from the Leverhulme Trust (IAF-2014-025), and S.G. acknowledges funding from the James S. McDonnell and Simons Foundations and an NSF Career Award.

We introduce a near-term experimental platform for realizing an associative memory. It can simultaneously store many memories by using spinful bosons coupled to a degenerate multimode optical cavity. The associative memory is realized by a confocal cavity QED neural network, with the modes serving as the synapses that connect a network of superradiant atomic spin ensembles, which serve as the neurons. Memories are encoded in the connectivity matrix between the spins and can be accessed through the input and output of patterns of light. Each aspect of the scheme is based on recently demonstrated technology using a confocal cavity and Bose-condensed atoms. Our scheme has two conceptually novel elements. First, it introduces a new form of random spin system that interpolates between a ferromagnetic and a spin glass regime as a physical parameter, the positions of the ensembles within the cavity, is tuned. Second, and more importantly, the spins relax via deterministic steepest-descent dynamics rather than Glauber dynamics. We show that this nonequilibrium quantum-optical scheme has significant advantages for associative memory over Glauber dynamics: these dynamics can enhance the network’s ability to store and recall memories beyond that of the standard Hopfield model. Surprisingly, the cavity QED dynamics can retrieve memories even when the system is in the spin glass phase. Thus, the experimental platform provides a novel physical instantiation of associative memories and spin glasses, as well as an unusual form of relaxational dynamics that is conducive to memory recall even in regimes where it was thought to be impossible.

Publisher PDF. Peer reviewed.
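The standard Hopfield model that the abstract uses as a reference point can be sketched in a few lines: memories are stored in a Hebbian connectivity matrix, and recall runs deterministic, zero-temperature asynchronous sign updates until a fixed point is reached. This is only the classical baseline, assuming ±1 spins and Hebbian storage; it does not model the confocal-cavity connectivity or its steepest-descent dynamics, and the function names are hypothetical.

```python
import numpy as np


def hebbian_weights(patterns):
    """Hebbian connectivity J = (1/N) * sum_mu xi^mu (xi^mu)^T with zero diagonal.
    `patterns` has shape (n_patterns, N) with entries ±1."""
    P = np.asarray(patterns, dtype=float)
    n = P.shape[1]
    J = P.T @ P / n
    np.fill_diagonal(J, 0.0)  # no self-coupling
    return J


def recall(J, state, max_sweeps=20):
    """Zero-temperature asynchronous updates s_i <- sign(sum_j J_ij s_j)."""
    s = np.array(state, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            new = 1.0 if J[i] @ s >= 0 else -1.0
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:  # fixed point: a local minimum of the Hopfield energy
            break
    return s
```

With few stored patterns relative to the number of spins (for example 3 patterns in 100 spins), a cue with 10% of its spins flipped is typically restored to the stored pattern; as the load grows, spurious and spin-glass states increasingly prevent recall, which is the regime the abstract says the cavity dynamics improve on.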

arXiv.org e-Print Archive

University of St. Andrews - Pure

St Andrews Research Repository

Models for time series prediction based on neural networks. Case study: GLP sales prediction from ANCAP.

Author: Paggi Straneo Horacio
Publication venue: Udelar.FI.
Publication date: 01/01/2013

A time series is a sequence of real values that can be considered as observations of a certain system. In this work, we are interested in time series coming from dynamical systems. Such systems can sometimes be described by a set of equations that model the underlying mechanism generating the samples. However, in many real systems those equations are unknown, and the only information available is a set of temporal measurements that constitute a time series. On the other hand, for practical reasons a prediction is usually required, e.g. the (approximate) value of the series at a future instant t. The goal of this thesis is to solve one such real-world prediction problem: given historical data on liquefied bottled propane gas sales, predict future gas sales as accurately as possible. This time series prediction problem is addressed by means of neural networks, used for both (dynamic) reconstruction and prediction. The problem of dynamically reconstructing the original system consists of building a model that captures certain characteristics of it, so that the long-term behavior of the model corresponds to that of the system. The network design process is guided by three main ingredients. The first, used to explore the dimensionality of the problem, is the Takens-Mañé theorem, by means of which the optimal dimension of the (neural) network input can be investigated. The second is a strong theorem: neural networks with a single hidden layer are universal approximators. As the third ingredient, we searched for the optimal size of the hidden layer by means of genetic algorithms, which suggest the number of hidden neurons that maximizes a target fitness function (related to prediction errors). In some cases these algorithms are also used to find the most influential network inputs.
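The delay-coordinate construction behind the first ingredient can be sketched as follows: each network input is a vector of lagged observations, and the target is the next value of the series. The embedding dimension `dim` and lag `tau` are assumed to come from a Takens-Mañé style analysis; the function name is hypothetical and this is not the thesis's code.

```python
import numpy as np


def delay_embed(series, dim, tau=1):
    """Build delay-coordinate inputs [x_t, x_{t-tau}, ..., x_{t-(dim-1)tau}]
    and one-step-ahead targets x_{t+1}, for every t where both exist."""
    x = np.asarray(series, dtype=float)
    start = (dim - 1) * tau  # first t with a complete lag vector
    # Column k holds x_{t - k*tau} for t = start, ..., len(x) - 2
    X = np.stack([x[start - k * tau: len(x) - 1 - k * tau] for k in range(dim)], axis=1)
    y = x[start + 1:]
    return X, y
```

For example, `delay_embed(range(6), dim=3)` yields the inputs `[2, 1, 0]`, `[3, 2, 1]`, `[4, 3, 2]` with targets `3`, `4`, `5`; the rows of `X` would then feed the input layer of an MLP.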
Determining the hidden layer size is a central (and hard) problem in determining the network topology. This thesis includes a state-of-the-art review of neural network design for time series prediction, covering related topics such as dynamical systems, universal approximators, gradient-descent searches and their variations, as well as meta-heuristics. The survey of the related literature is intended to be extensive, covering both printed and electronic material, in order to give an overview of the main aspects of the state of the art in time series prediction using neural networks. The material found was sometimes extremely redundant (as in the case of the back-propagation algorithm and its improvements) and scarce in other areas (memory structures, or estimation of the signal subspace dimension in the stochastic case). The surveyed literature includes classical research works ([27], [50], [52]) as well as more recent ones ([79], [16], [82]); this survey is intended as a further contribution of this thesis. Special attention is given to the available software tools for neural network design and time series processing. After a review of the available software packages, the most promising computational tools for both tasks are discussed. As a result, a complete framework based on mature software tools was set up and used. To work with such dynamical systems, software intended specifically for the analysis and processing of time series was employed, and chaotic series were therefore also part of our focus. Since not all randomness is attributable to chaos, characterizing the dynamical system that generates the time series requires an exploration of chaotic-stochastic systems, as well as of network models to predict a time series associated with one of them. Here we aim to show how domain knowledge, a topic extensively treated in the literature, can be somewhat sophisticated (for example, the Lyapunov spectrum of a series or the embedding dimension).
To model the dynamical system generating the time series we used the state-space model, so that time series prediction was translated into predicting the next system state. This state-space model, together with the delay method (delay coordinates), is of practical importance for this work, specifically for the design of the input layer in some networks (multi-layer perceptrons, MLPs) and of other parameters (the taps in the TLFNs). Additionally, the remaining network components were in many cases determined through a procedure traditionally used with neural networks: genetic algorithms. Model (network) selection criteria are discussed, and the trade-off between performance and network complexity is further explored, inspired by Rissanen's minimum description length and its estimation by the chosen software. Regarding the network models employed, the topologies suggested in the literature as adequate for prediction (TLFNs and recurrent networks) are used, together with MLPs (a classic of artificial neural networks) and network committees. The effectiveness of each method is confirmed on the proposed prediction problem. Network committees, in which the prediction is a naive convex combination of the predictions of individual networks, are also used extensively. The need for criteria to compare the long-run behavior of the model with that of the real system, for dynamic stochastic systems, is presented, and two alternatives are discussed. The results obtained prove the existence of a solution to the problem of learning the Input → Output dependence. We also conjecture that the system is dynamic-stochastic but not chaotic, because we only have one realization of the random process corresponding to the sales.
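A committee in the sense described above, where the final prediction is a convex combination of the individual networks' predictions, reduces to a weighted average with non-negative weights that sum to one. A minimal sketch with a hypothetical function name; leaving the weights unspecified gives the naive equal-weight average.

```python
import numpy as np


def committee_predict(predictions, weights=None):
    """Convex combination of member predictions (rows = networks, columns = time steps)."""
    P = np.asarray(predictions, dtype=float)
    if weights is None:
        # Naive committee: every network gets the same weight
        w = np.full(P.shape[0], 1.0 / P.shape[0])
    else:
        w = np.asarray(weights, dtype=float)
        if w.shape != (P.shape[0],):
            raise ValueError("need one weight per network")
        # Convexity: non-negative weights summing to one
        if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
            raise ValueError("weights must be non-negative and sum to 1")
    return w @ P
```

A two-network committee, like the best model reported below, is just `committee_predict([pred_a, pred_b])` with equal weights.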
As a non-chaotic system, the mean of the sales predictions should improve as the available data increase, although the probability of a prediction with a large error is always non-zero due to the randomness present. This solution is found in a constructive and exhaustive way. The exhaustiveness follows from five statements: (1) the design of a neural network requires knowing the input and output dimensions, the number of hidden layers and the number of neurons in each of them; (2) the Takens-Mañé theorem allows the dimension of the input data to be derived; (3) theorems such as Kolmogorov's and Cybenko's justify the use of multi-layer perceptrons with only one hidden layer, so several such models were tested; (4) the number of neurons in the hidden layer is in many cases determined heuristically using genetic algorithms; (5) a single output neuron gives the desired prediction. As stated, two tasks are carried out: the development of a time series prediction model and the analysis of a feasible model for the dynamic reconstruction of the system. With the best predictive model, a committee of two networks, an acceptable average error was obtained when the week to be predicted is not adjacent to the training set (7.04% for week 46/2011). We believe these results are acceptable given the amount of information available, and that they further validate the usefulness of neural networks for predicting time series from dynamical systems, whether stochastic or not.
Finally, the results confirmed several already known facts: adding noise to the inputs and outputs of the training values can improve the results; recurrent networks trained with the back-propagation algorithm do not suffer from vanishing gradients over short periods; and the use of committees (which can be seen as a very basic form of distributed artificial intelligence) significantly improves the predictions.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

A Probabilistic Perspective on Ensemble Diversity

Author: Zanda Manuela
Publication date: 01/08/2011

The University of Manchester - Institutional Repository

Otimização multi-objetivo em aprendizado de máquina [Multi-objective optimization in machine learning]

Author: Raimundo Marcos Medeiros, 1988-
Publication venue: [s.n.]
Publication date: 15/04/2019

Advisor: Fernando José Von Zuben. Thesis (doctorate), Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação.

Abstract: Regularized multinomial logistic regression, multi-label classification, and multi-task learning are examples of machine learning problems in which conflicting objectives, such as losses and regularization penalties, should be minimized simultaneously. Therefore, the narrow perspective of looking for the learning model with the best performance should be replaced by the proposition and further exploration of multiple efficient learning models, each characterized by a distinct trade-off among the conflicting objectives. Committee machines and a posteriori preferences of the decision-maker may be implemented to properly explore this diverse set of efficient learning models toward performance improvement.
The whole multi-objective framework for machine learning is supported by three stages: (1) the multi-objective modelling of each learning problem, explicitly highlighting the conflicting objectives involved; (2) given the multi-objective formulation of the learning problem, for instance considering loss functions and penalty terms as conflicting objective functions, efficient solutions well distributed along the Pareto front are obtained by a deterministic and exact solver named NISE (Non-Inferior Set Estimation); (3) those efficient learning models are then subjected to a posteriori model selection, or to ensemble filtering and aggregation. Given that NISE is restricted to two objective functions, an extension for many objectives, named MONISE (Many-Objective NISE), is also proposed here as an additional contribution that expands the applicability of the proposed framework. To properly assess the merit of our multi-objective approach, more specific investigations were conducted, restricted to regularized linear learning models: (1) What is the relative merit of the a posteriori selection of a single learning model, among the ones produced by our proposal, compared with other single-model approaches in the literature? (2) Is the diversity level of the learning models produced by our proposal higher than that achieved by alternative approaches devoted to generating multiple learning models? (3) What is the prediction quality of ensemble filtering and aggregation of the learning models produced by our proposal on (i) multi-class classification, (ii) unbalanced classification, (iii) multi-label classification, (iv) multi-task learning and (v) multi-view learning?
The deterministic nature of NISE and MONISE, their ability to properly deal with the shape of the Pareto front in each learning problem, and the guarantee of always obtaining efficient learning models are advocated here as being responsible for the promising results achieved in all three specific investigations.

Doctorate (Doutorado); Computer Engineering; Doctor of Electrical Engineering; 2014/13533-0; FAPES
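To make stage (2) concrete, here is a minimal sketch in the spirit of NISE for one assumed bi-objective problem: linear regression with the squared loss and the squared L2 norm of the coefficients as the two conflicting objectives. Starting from the two extreme solutions, each step solves a weighted-sum problem whose weights are normal to the segment joining two adjacent front points, and keeps the new point only if it lies noticeably below that segment. The problem choice, tolerance and function names are assumptions for illustration; this is not the thesis's solver, and MONISE is not covered.

```python
import numpy as np


def ridge_objectives(X, y, w_loss, w_pen):
    """Minimise w_loss*||X b - y||^2 + w_pen*||b||^2; return (loss, penalty) at the optimum."""
    d = X.shape[1]
    A = w_loss * X.T @ X + w_pen * np.eye(d)
    b = np.linalg.solve(A, w_loss * X.T @ y)
    return float(np.sum((X @ b - y) ** 2)), float(b @ b)


def nise(X, y, tol=1e-3):
    """Approximate the convex (loss, penalty) Pareto front by recursive weighted sums."""
    a = ridge_objectives(X, y, 1.0, 1e-9)  # near least squares: minimal loss
    z = ridge_objectives(X, y, 1e-9, 1.0)  # near zero coefficients: minimal penalty
    front, stack = [a, z], [(a, z)]
    while stack:
        p, q = stack.pop()  # p has lower loss, q has lower penalty
        # Weights normal to segment p-q, so p and q score equally under them
        w1, w2 = p[1] - q[1], q[0] - p[0]
        r = ridge_objectives(X, y, w1, w2)
        # Keep r only if it lies sufficiently below the segment p-q
        if (w1 * p[0] + w2 * p[1]) - (w1 * r[0] + w2 * r[1]) > tol * (w1 + w2):
            front.append(r)
            stack.extend([(p, r), (r, q)])
    return sorted(front)  # ascending loss, hence descending penalty
```

Every returned point minimizes a positively weighted sum of the two objectives, so each is Pareto-efficient; a weighted-sum scheme like this can only recover convex parts of a front, which suffices for this convex problem.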

Repositorio da Producao Cientifica e Intelectual da Unicamp

Inductive biases in deep learning models for weather prediction

Author: Butz Martin V.
Friedrich Ulrich
Goswami Bedartha
Karlbauer Matthias
Ludwig Nicole
Martius Georg
Otte Sebastian
Scholten Thomas
Thuemmel Jannik
Wulfmeyer Volker
Zarfl Christiane
Publication date: 06/04/2023

Deep learning has recently gained immense popularity in the Earth sciences, as it enables purely data-driven models of complex Earth system processes. Deep learning-based weather prediction (DLWP) models have made significant progress in the last few years, achieving forecast skill comparable to established numerical weather prediction (NWP) models at comparatively lower computational cost. To train accurate, reliable and tractable DLWP models with several million parameters, the model design needs to incorporate suitable inductive biases that encode structural assumptions about the data and the modelled processes. When chosen appropriately, these biases enable faster learning and better generalisation to unseen data. Although inductive biases play a crucial role in successful DLWP models, they are often not stated explicitly, and how they contribute to model performance remains unclear. Here, we review and analyse the inductive biases of six state-of-the-art DLWP models, taking a deeper look at five key design elements: input data, forecasting objective, loss components, layered design of the deep learning architectures, and optimisation methods. We show how the design choices made in each of the five design elements relate to structural assumptions. Given recent developments in the broader DL community, we anticipate that the future of DLWP will likely see wider use of foundation models (large models pre-trained on big databases with self-supervised learning) combined with explicit physics-informed inductive biases that allow the models to provide competitive forecasts even at the more challenging subseasonal-to-seasonal scales.
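As one illustration of a loss component that encodes a structural assumption (here, the spherical geometry of the Earth), a latitude-weighted RMSE on a regular latitude-longitude grid weights each grid row by the cosine of its latitude, so that the many small cells near the poles do not dominate the loss. This is offered only as an assumed example of the idea; it is not claimed to be the loss of any of the six reviewed models, and the function name is hypothetical.

```python
import numpy as np


def lat_weighted_rmse(forecast, target, lats_deg):
    """RMSE over a (n_lat, n_lon) grid, weighting each latitude row by cos(latitude),
    normalised to mean 1 so that a uniform error e gives an RMSE of exactly e."""
    w = np.cos(np.deg2rad(np.asarray(lats_deg, dtype=float)))
    w = w / w.mean()
    err2 = (np.asarray(forecast, dtype=float) - np.asarray(target, dtype=float)) ** 2
    # Broadcast the per-row weights across longitudes
    return float(np.sqrt(np.mean(err2 * w[:, None])))
```

The same squared error therefore costs more near the equator than near the poles, in proportion to the approximate cell area.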

arXiv.org e-Print Archive

Essays On Random Forest Ensembles

Author: Olson Matthew
Publication venue: ScholarlyCommons
Publication date: 01/01/2018

A random forest is a popular machine learning ensemble method that has proven successful in solving a wide range of classification problems. While other successful classifiers, such as boosting algorithms or neural networks, admit natural interpretations as maximum likelihood, a suitable statistical interpretation is much more elusive for a random forest. In the first part of this thesis, we demonstrate that a random forest is a fruitful framework in which to study AdaBoost and deep neural networks. We explore the concept and utility of interpolation, the ability of a classifier to fit its training data perfectly. In the second part of this thesis, we place random forests on a sounder statistical footing by framing them as kernel regression with the proximity kernel. We then analyze the parameters that control the bandwidth of this kernel and discuss useful generalizations.
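Framing a random forest as kernel regression can be sketched from leaf assignments alone: the proximity of two samples is the fraction of trees in which they share a leaf, and a prediction is the proximity-weighted average of the training targets (a Nadaraya-Watson estimate). Leaf indices of shape (n_samples, n_trees) could come, for example, from scikit-learn's `RandomForestRegressor.apply`; the function names here are hypothetical, and the forest's own averaging, which also weights by inverse leaf size, differs slightly from this estimator.

```python
import numpy as np


def proximity_kernel(leaves_a, leaves_b):
    """K[i, j] = fraction of trees in which sample i of A and sample j of B share a leaf.
    Inputs are integer leaf-index arrays of shape (n_samples, n_trees)."""
    A = np.asarray(leaves_a)
    B = np.asarray(leaves_b)
    return (A[:, None, :] == B[None, :, :]).mean(axis=2)


def proximity_kernel_regression(leaves_train, y_train, leaves_test):
    """Nadaraya-Watson estimate f(x) = sum_i K(x, x_i) y_i / sum_i K(x, x_i).
    Assumes every test point shares at least one leaf with some training point."""
    K = proximity_kernel(leaves_test, leaves_train)
    return (K @ np.asarray(y_train, dtype=float)) / K.sum(axis=1)
```

In this view, the number and depth of the trees control how many training points receive non-zero proximity, which is the kernel "bandwidth" the abstract refers to.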

ScholarlyCommons@Penn

Hybrid expert ensembles for identifying unreliable data in citizen science

Author: Johnston Ali
Moran Nick
Wang Wenjia
Wessels Pieter
Publication venue: 'Elsevier BV'
Publication date: 01/05/2019

Citizen science utilises public resources for scientific research. BirdTrack is one such project, established in 2004 by the British Trust for Ornithology (BTO) for the public to log their bird observations through its web or mobile applications. It has accumulated over 40 million observations. However, the veracity of these observations needs to be checked, and the current process involves time-consuming interventions by human experts. This research therefore aims to develop a more efficient system to automatically identify unreliable observations in a large volume of records. This paper presents a novel approach, a Hybrid Expert Ensemble System (HEES), that combines an Expert System (ES) and machine-induced models to perform this task. The ES is built on human expertise and used as a base member of the ensemble. The other members are decision trees induced from county-based data. The HEES uses accuracy and diversity as criteria to select its members, with the aim of improving its accuracy and reliability. The experiments were carried out using the county-based data, and the results indicate that (1) the performance of the expert system is reasonable for some counties but varies considerably for others; (2) an HEES is more accurate and reliable than the Expert System and the other individual models, with a sensitivity of 85% for correctly identifying unreliable observations and a specificity of 99% for reliable observations. These results demonstrate that the proposed approach can serve as an alternative or additional means of validating observations in a timely and cost-effective manner, and that it has the potential to be applied in other citizen science projects where large amounts of data need to be checked effectively and efficiently.
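The evaluation above rests on two standard quantities, and member selection on accuracy and diversity. A minimal sketch, assuming binary labels with 1 = unreliable observation: sensitivity and specificity as usually defined, a pairwise disagreement rate as one possible diversity measure, and majority voting as one possible combiner. The abstract does not specify HEES's exact diversity measure or combination rule, so these choices and the function names are hypothetical.

```python
import numpy as np


def sensitivity_specificity(y_true, y_pred, positive=1):
    """Sensitivity = TP/(TP+FN) on unreliable records (positive class);
    specificity = TN/(TN+FP) on reliable records (negative class)."""
    t = np.asarray(y_true) == positive
    p = np.asarray(y_pred) == positive
    tp, fn = np.sum(t & p), np.sum(t & ~p)
    tn, fp = np.sum(~t & ~p), np.sum(~t & p)
    return float(tp / (tp + fn)), float(tn / (tn + fp))


def disagreement(pred_a, pred_b):
    """Pairwise diversity: fraction of records on which two members disagree."""
    return float(np.mean(np.asarray(pred_a) != np.asarray(pred_b)))


def majority_vote(member_preds):
    """Combine binary (0/1) member predictions (rows = members) by strict majority."""
    P = np.asarray(member_preds)
    return (P.sum(axis=0) * 2 > P.shape[0]).astype(int)
```

A selection procedure in this spirit would prefer candidate trees that are individually accurate but have high `disagreement` with the members already chosen, such as the expert system.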

University of East Anglia digital repository

University of St. Andrews - Pure

Epistemic Uncertainty Quantification in Deep Learning by the Delta Method

Author: Nilsen Geir Kjetil
Publication venue: The University of Bergen
Publication date: 01/01/2022

This thesis explores the Delta method and its application to deep learning image classification. The Delta method is a classical procedure for quantifying uncertainty in statistical models, but its direct application to deep neural networks is prevented by the large number of parameters P. We recognize the Delta method as a measure of epistemic, as opposed to aleatoric, uncertainty and break it into two components: the eigenvalue spectrum of the inverse Fisher information (i.e. inverse Hessian) of the cost function and the per-example sensitivities (i.e. gradients) of the model function. We mainly focus on the computational aspects and show how to efficiently compute low- and full-rank approximations of the inverse Fisher information matrix, which in turn reduces the computational complexity of the naïve Delta method from O(P²) space and O(P³) time to O(P) space and time. We provide bounds on the approximation error using a novel error-propagation technique, and validate the developed methodology with a released TensorFlow implementation. By comparison with the classical bootstrap, we show that there is a strong linear relationship between the predictive epistemic uncertainty levels quantified by the two methods when applied to a few well-known architectures on the MNIST and CIFAR-10 datasets.

Doctoral thesis (doktorgradsavhandling).
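The two components named above can be seen in a model small enough to handle directly: for logistic regression, the Delta-method variance of a prediction is gᵀH⁻¹g, with H the Hessian (here equal to the Fisher information) of the summed cross-entropy and g the gradient of the predicted probability with respect to the parameters. Writing H⁻¹ through its eigendecomposition and keeping only its largest eigenvalues gives a low-rank approximation. This is an assumed toy setting computed with a dense O(P³) eigendecomposition, not the thesis's scalable method or its TensorFlow implementation; function names are hypothetical, and `theta` should be the fitted parameter estimate.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def delta_method_variance(X_train, theta, x, rank=None):
    """Epistemic variance of p(x) = sigmoid(theta @ x) via Var ≈ g^T H^{-1} g.
    H: Hessian of the summed cross-entropy at theta; g: d p(x) / d theta."""
    p_train = sigmoid(X_train @ theta)
    # H = sum_i p_i (1 - p_i) x_i x_i^T
    H = (X_train * (p_train * (1 - p_train))[:, None]).T @ X_train
    eigval, eigvec = np.linalg.eigh(H)  # eigenvalues of H in ascending order
    if rank is not None:
        # The largest eigenvalues of H^{-1} are the smallest eigenvalues of H
        eigval, eigvec = eigval[:rank], eigvec[:, :rank]
    px = sigmoid(x @ theta)
    g = px * (1 - px) * x  # per-example sensitivity of the model output
    proj = eigvec.T @ g
    return float(np.sum(proj ** 2 / eigval))
```

With `rank=None` the result equals gᵀH⁻¹g exactly; each dropped eigen-direction removes a non-negative term, so the low-rank value never exceeds the full-rank one.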

University of Bergen

NORA - Norwegian Open Research Archives