
    Adaptive Differential Privacy in Federated Learning: A Priority-Based Approach

    Federated learning (FL), one of the novel branches of distributed machine learning (ML), develops global models through a private procedure without direct access to local datasets. However, access to the model updates (e.g., gradient updates in deep neural networks) transferred between clients and servers can reveal sensitive information to adversaries. Differential privacy (DP) offers a framework that provides a privacy guarantee by adding certain amounts of noise to parameters. This approach, although effective in terms of privacy, adversely affects model performance due to the injected noise. Hence, a balance must always be struck between noise injection and the accuracy sacrificed. To address this challenge, we propose adaptive noise addition in FL, which decides the amount of injected noise based on features' relative importance. Here, we first propose two effective methods for prioritizing features in deep neural network models and then perturb the models' weights based on this information. Specifically, we investigate whether adding more noise to less important parameters and less noise to more important parameters can preserve model accuracy while maintaining privacy. Our experiments confirm this statement under some conditions. The amount of noise injected, the proportion of parameters involved, and the number of global iterations can significantly change the output. While a careful choice of parameters that considers the properties of the datasets can improve privacy without a severe loss of accuracy, a bad choice can make the model performance worse.
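As a rough illustration of the idea described above, the following sketch (plain NumPy with hypothetical names; not the authors' implementation) scales per-parameter Gaussian noise inversely with a normalized importance score, so less important weights receive larger perturbations:

```python
import numpy as np

def importance_scaled_noise(weights, importance, base_sigma=0.1, eps=1e-8):
    """Add Gaussian noise whose scale decreases with a parameter's
    normalized importance: important weights receive less noise."""
    imp = np.asarray(importance, dtype=float)
    imp = (imp - imp.min()) / (imp.max() - imp.min() + eps)  # map to [0, 1]
    sigma = base_sigma * (2.0 - imp)  # scale in [base_sigma, 2 * base_sigma]
    rng = np.random.default_rng(0)
    return weights + rng.normal(0.0, sigma)

w = np.array([1.0, -0.5, 2.0, 0.1])
imp = np.abs(w)  # weight magnitude as a stand-in importance score
noisy = importance_scaled_noise(w, imp)
```

The mapping from importance to noise scale, and magnitude as the importance proxy, are both assumptions; the paper proposes its own feature-prioritization methods and calibrates the noise to a DP budget.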

    Normalization Influence on ANN-Based Models Performance: A New Proposal for Features’ Contribution Analysis

    Artificial Neural Networks (ANNs) are weighted directed graphs of interconnected neurons widely employed to model complex problems. However, selecting the optimal ANN architecture and its training parameters is not enough to obtain reliable models. The data preprocessing stage is fundamental to improving the model’s performance. Specifically, Feature Normalisation (FN) is commonly utilised to remove the features’ magnitude, aiming to equalise the features’ contribution to the model training. Nevertheless, this work demonstrates that the choice of FN method affects the model performance. Also, it is well known that ANNs are commonly considered a “black box” due to their lack of interpretability. In this sense, several works aim to analyse the features’ contribution to the network for estimating the output. However, these methods, specifically those based on the network’s weights, like Garson’s or Yoon’s methods, do not consider preprocessing factors, such as dispersion factors, previously employed to transform the input data. This work proposes a new features’ relevance analysis method that includes the dispersion factors in the weight-matrix analysis methods to infer each feature’s actual contribution to the network output more precisely. Besides, in this work, the Proportional Dispersion Weights (PWD) are proposed as explanatory factors of similarity between models’ performance results. The conclusions of this work improve the understanding of the features’ contribution to the model, which enhances the feature selection strategy, fundamental for reliably modelling a given problem. This work was supported in part by DATA Inc. Fellowship under Grant 48-AF-W1-2019-00002, in part by Tecnalia Research and Innovation Ph.D. Scholarship, in part by the Spanish Centro para el Desarrollo Tecnológico Industrial (CDTI, Ministry of Science and Innovation) through the “Red Cervera” Programme (AI4ES Project) under Grant CER-20191029, and in part by the 3KIA Project funded by the ELKARTEK Program of the SPRI-Basque Government under Grant KK-2020/00049
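A minimal sketch of the kind of weight-matrix analysis being extended, assuming a single hidden layer: classic Garson relevance, with an optional per-feature dispersion factor folded back into the input weights. The function names and the exact way the factor enters are illustrative only, not the paper's PWD definition:

```python
import numpy as np

def garson_importance(W_in, w_out, dispersion=None):
    """Garson-style relevance for a one-hidden-layer network.
    W_in: (n_inputs, n_hidden) input weights; w_out: (n_hidden,) output
    weights; dispersion: optional per-feature scale (e.g. the std used
    during normalization), folded into the weights as an assumption."""
    W = np.abs(W_in)
    if dispersion is not None:
        W = W * np.abs(np.asarray(dispersion))[:, None]  # reintroduce scale
    v = np.abs(w_out)
    # Contribution of input i through hidden unit j, normalized per hidden unit.
    contrib = (W / W.sum(axis=0, keepdims=True)) * v
    rel = contrib.sum(axis=1)
    return rel / rel.sum()  # relevance scores summing to 1

W_in = np.array([[0.5, 1.0], [2.0, 0.2], [0.1, 0.1]])
w_out = np.array([1.0, 0.5])
imp = garson_importance(W_in, w_out)
```

Passing different `dispersion` vectors to the same trained weights shows how ignoring the normalization scale can rerank features, which is the effect the paper targets.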

    The Berkelmans-Pries Feature Importance Method: a generic measure of informativeness of features

    Over the past few years, the use of machine learning models has emerged as a generic and powerful means for prediction purposes. At the same time, there is a growing demand for interpretability of prediction models. To determine which features of a dataset are important for predicting a target variable Y, a Feature Importance (FI) method can be used. By quantifying how important each feature is for predicting Y, irrelevant features can be identified and removed, which could increase the speed and accuracy of a model; moreover, important features can be discovered, which could lead to valuable insights. A major problem with evaluating FI methods is that the ground-truth FI is often unknown. As a consequence, existing FI methods do not give the exactly correct FI values. This is one of the many reasons why it can be hard to properly interpret the results of an FI method. Motivated by this, we introduce a new global approach named the Berkelmans-Pries FI method, which is based on a combination of Shapley values and the Berkelmans-Pries dependency function. We prove that our method has many useful properties and accurately predicts the correct FI values for several cases where the ground-truth FI can be derived in an exact manner. We experimentally show, for a large collection of FI methods (468), that existing methods do not have the same useful properties. This shows that the Berkelmans-Pries FI method is a highly valuable tool for analyzing datasets with complex interdependencies.
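The Shapley-value ingredient of the method can be illustrated with an exact (exponential-time, toy-scale) computation over an arbitrary value function. The actual Berkelmans-Pries method pairs this with their dependency function as the value function; the one below is a toy stand-in:

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value):
    """Exact Shapley values for a list of features, given a value
    function value(S) -> float defined on frozensets of features.
    Exponential in len(features); for illustration only."""
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                S = frozenset(subset)
                # Standard Shapley coalition weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                total += weight * (value(S | {f}) - value(S))
        phi[f] = total
    return phi

# Toy value function: all predictive value comes from feature 'x1'.
v = lambda S: 1.0 if 'x1' in S else 0.0
phi = shapley_values(['x1', 'x2', 'x3'], v)
```

With this value function the full credit lands on `x1` and the other features get zero, matching the efficiency and null-player properties that make Shapley values attractive for FI.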

    Classification of Peruvian Territory According to Its Groundwater Potential Using Machine Learning Algorithms

    The worsening of water stress in both urban and rural areas increasingly motivates decision makers to promote the sustainable exploitation of this resource. This requires knowing with certainty which sites have the greatest exploitation potential. To address this problem without resorting to direct drilling, the main objective of this research is to explore Peru's groundwater potential in shallow aquifers by applying random forest and neural network classification models, two machine learning algorithms. This branch of artificial intelligence makes it possible to build multidimensional models with complex variables without statistical presuppositions. To explain groundwater potential, topographic, hydrological, geological, pedological, and environmental variables are used, which influence, to varying degrees, subsurface hydraulic conductivity and aquifer recharge rates. The results indicate that the best performance, comparable to the state of the art, is obtained by the random forest model (accuracy = 0.77, F1 score = 0.73, AUC = 0.88), and that building models specialized in a given region improves model capability by reducing data variance. The most important variables in the models were aspect, drainage density, elevation, NDWI, and precipitation. The main limitation identified in model performance is the scarcity and irregular distribution of wells with known yield in Peru, a factor that biases the model toward the coast, the best-documented region. This study serves as a reference framework for building future machine learning models once the public inventory of groundwater wells is expanded, or if private parties introduce their own inventories.
The code used for geospatial variable processing is available at https://code.earthengine.google.com/fe63cd6184b009824ed3c843fdc5544d. The code used for model construction is registered on GitHub at https://github.com/cesport/Tesis. Interactive applications for visualizing the results are available for desktop at https://cesarportocarrero.users.earthengine.app/view/gwp-peru and for mobile devices at https://cesarportocarrero.users.earthengine.app/view/gwp-peru-movil
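The three reported metrics (accuracy, F1 score, AUC) can be computed from model predictions without any ML library. A self-contained NumPy sketch, with simplified tie handling in the AUC:

```python
import numpy as np

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def f1_score(y_true, y_pred):
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def auc(y_true, scores):
    """Rank-based (Mann-Whitney) AUC; ties are broken arbitrarily."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = y_true == 1
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))

y_true = np.array([1, 1, 0, 0])          # e.g. 1 = high groundwater potential
scores = np.array([0.9, 0.4, 0.6, 0.1])  # hypothetical model probabilities
y_pred = (scores >= 0.5).astype(int)
```

The example labels and scores are invented for illustration; the thesis evaluates random forest and neural network classifiers on its well inventory with these same metrics.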

    Variance-Based Feature Importance in Neural Networks

    This paper proposes a new method to measure the relative importance of features in Artificial Neural Network (ANN) models. Its underlying principle assumes that the more important a feature is, the more the weights connected to the respective input neuron will change during the training of the model. To capture this behavior, a running variance of every weight connected to the input layer is measured during training. For that, an adaptation of Welford’s online algorithm for computing the online variance is proposed. When the training is finished, for each input, the variances of the weights are combined with the final weights to obtain the measure of relative importance for each feature. This method was tested with shallow and deep neural network architectures on several well-known classification and regression problems. The results obtained confirm that this approach makes meaningful measurements. Moreover, the results showed that the importance scores are highly correlated with the variable importance method from Random Forests (RF).
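The described procedure can be sketched as follows: Welford's online algorithm tracks the variance of each input-layer weight across training steps, and the variances are then combined with the final weights. The combination rule below (variance times absolute final weight, summed per feature and normalized) is an assumption, since the abstract does not specify the exact aggregation:

```python
import numpy as np

class RunningVariance:
    """Welford's online variance, applied elementwise to the input-layer
    weight matrix: call update() after every training step."""
    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)

    def update(self, w):
        self.n += 1
        delta = w - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (w - self.mean)

    def variance(self):
        return self.m2 / self.n if self.n else self.m2  # population variance

def feature_importance(var_w, final_w):
    """Hypothetical combination: per input feature, sum variance * |final
    weight| over hidden units, then normalize. Shapes: (n_inputs, n_hidden)."""
    score = (var_w * np.abs(final_w)).sum(axis=1)
    return score / score.sum()
```

In use, `update()` would be called on the input-layer weight matrix after each optimizer step, and `feature_importance()` once at the end of training.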