Influence in Classification via Cooperative Game Theory
A dataset has been classified by some unknown classifier into two types of
points. What were the most important factors in determining the classification
outcome? In this work, we employ an axiomatic approach in order to uniquely
characterize an influence measure: a function that, given a set of classified
points, outputs a value for each feature corresponding to its influence in
determining the classification outcome. We show that our influence measure
takes on an intuitive form when the unknown classifier is linear. Finally, we
employ our influence measure in order to analyze the effects of user profiling
on Google's online display advertising. Comment: accepted to IJCAI 201
Public spending impact on short term growth : a machine learning approach
The public spending multiplier has long been a subject of analysis, with discussion centered on how its size varies across economic contexts. The article that forms part of this dissertation introduces a causal machine learning technique as a tool to estimate the public spending multiplier and to make individual predictions based on each country's economic context. We propose to model the multiplier with a causal random forest, developed by Wager and Athey (2018), to uncover possible heterogeneous treatment effects. We apply this methodology to a dataset provided by the International Monetary Fund, covering 35 developed countries over the years 2000 to 2020. The multiplier estimates obtained with this methodology lie between 1.7 and 2.7. In addition, we use this methodology as a tool to uncover which features are important to the multiplier's heterogeneity.
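The central idea above is that the multiplier (the effect of a spending "treatment" on output) may differ across economic contexts. A causal random forest learns those contexts through data-driven splits; the sketch below is a deliberately simpler stand-in, not the authors' method. It assumes the contexts are already known discrete groups and that treatment is as good as randomly assigned within each group, and it estimates one effect per group as a difference in mean outcomes. `Observation`, `effects_by_group`, the group labels, and all numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Hashable, Iterable, NamedTuple


class Observation(NamedTuple):
    group: Hashable  # hypothetical context label, e.g. "low debt" / "high debt"
    treated: bool    # whether this unit received the (hypothetical) spending shock
    outcome: float   # observed outcome, e.g. output growth


def effects_by_group(data: Iterable[Observation]) -> dict[Hashable, float]:
    """Difference in mean outcomes (treated minus control) within each group.

    Only a valid causal estimate if treatment is as good as random within each
    group; groups lacking either treated or control units are skipped.
    """
    treated: dict[Hashable, list[float]] = defaultdict(list)
    control: dict[Hashable, list[float]] = defaultdict(list)
    for obs in data:
        (treated if obs.treated else control)[obs.group].append(obs.outcome)
    return {g: mean(treated[g]) - mean(control[g]) for g in treated if g in control}


# Made-up observations, for illustration only
sample = [
    Observation("low debt", True, 3.0),
    Observation("low debt", False, 1.0),
    Observation("high debt", True, 2.0),
    Observation("high debt", False, 1.5),
]
print(effects_by_group(sample))  # {'low debt': 2.0, 'high debt': 0.5}
```

Differing per-group estimates are what "heterogeneous treatment effects" means; a causal forest generalizes this by discovering the groups itself and averaging over many honest trees.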
Elucidating the Auxetic Behavior of Cementitious Cellular Composites Using Finite Element Analysis and Interpretable Machine Learning
With the advent of 3D printing, auxetic cellular cementitious composites (ACCCs) have recently garnered significant attention owing to their unique mechanical performance. Interpretable machine learning (ML)-based approaches can provide an efficient means of predicting the performance of ACCCs. However, predicting Poisson's ratio with such ML approaches requires large and consistent datasets, which are not readily available for ACCCs. To address this challenge, this paper synergistically integrates a finite element analysis (FEA)-based framework with ML to predict Poisson's ratios. In particular, the FEA-based approach is used to generate a dataset containing 850 combinations of different mesoscale architectural void features. The dataset is then used to develop an ML-based prediction tool built on a feed-forward multilayer perceptron neural network (NN), which shows excellent prediction efficacy. To shed light on the relative influence of the design parameters on the auxetic behavior of the ACCCs, Shapley additive explanations (SHAP) are employed, establishing the volume fraction of voids as the most influential parameter in inducing auxetic behavior. Overall, this paper develops an efficient approach to evaluating geometry-dependent auxetic behavior in cementitious materials, which can serve as a starting point toward the design and development of auxetic behavior in cementitious composites.
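SHAP attributes a model's prediction to its input features using Shapley values, and ranking parameters "by influence" usually means averaging the magnitude of those attributions over a dataset. With only a handful of features the values can be computed exactly by enumerating feature coalitions. The sketch below assumes an interventional definition with a single baseline input (features outside a coalition are set to baseline values), which is one common choice and not necessarily the one used in the paper. `toy_model` and its feature names are hypothetical, with made-up coefficients.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_attributions(model: Model, x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley value of each feature of x relative to one baseline input.

    Coalition value v(S) = model(z), where z copies x on the features in S and
    the baseline elsewhere. Needs O(n * 2^n) model calls, so only small n is practical.
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def mean_abs_importance(model: Model, rows: Sequence[Sequence[float]], baseline: Sequence[float]) -> list[float]:
    """Global importance: mean |attribution| of each feature over all rows."""
    totals = [0.0] * len(baseline)
    for row in rows:
        for j, a in enumerate(shapley_attributions(model, row, baseline)):
            totals[j] += abs(a)
    return [t / len(rows) for t in totals]


# Hypothetical surrogate; features = (void volume fraction, aspect ratio, wall thickness).
# Coefficients are invented for illustration, not taken from the paper.
def toy_model(z: Sequence[float]) -> float:
    void_fraction, aspect_ratio, thickness = z
    return 0.3 - 1.2 * void_fraction + 0.1 * aspect_ratio * void_fraction + 0.02 * thickness


print(shapley_attributions(toy_model, [0.5, 2.0, 10.0], [0.1, 1.0, 5.0]))
```

The attributions always sum to `model(x) - model(baseline)` (the efficiency property), which is a useful sanity check. Libraries such as SHAP approximate these quantities when enumerating all coalitions is too expensive.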
DU-Shapley: A Shapley Value Proxy for Efficient Dataset Valuation
Many machine learning problems require performing dataset valuation, i.e.,
quantifying the incremental gain, with respect to some relevant predefined
utility, of aggregating an individual dataset with others. As seminal examples,
dataset valuation has been leveraged in collaborative and federated learning to
create incentives for data sharing across several data owners. The Shapley value
has recently been proposed as a principled tool to achieve this goal owing to its
formal axiomatic justification. Since its computation often requires exponential time,
standard approximation strategies based on Monte Carlo integration have been
considered. Such generic approximation methods, however, remain expensive in
some cases. In this paper, we exploit the knowledge about the structure of the
dataset valuation problem to devise more efficient Shapley value estimators. We
propose a novel approximation of the Shapley value, referred to as discrete
uniform Shapley (DU-Shapley) which is expressed as an expectation under a
discrete uniform distribution with a support of reasonable size. We justify the
relevance of the proposed framework via asymptotic and non-asymptotic
theoretical guarantees and show that DU-Shapley tends towards the Shapley value
when the number of data owners is large. The benefits of the proposed framework
are finally illustrated on several dataset valuation benchmarks. DU-Shapley
outperforms other Shapley value approximations, even when the number of data
owners is small. Comment: 22 pages
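The abstract contrasts exact Shapley computation, which is exponential in the number of data owners, with generic Monte Carlo approximations. The sketch below shows both for dataset valuation under a hypothetical utility: the square root of the pooled sample count, standing in for the accuracy of a model trained on the pooled data. The estimator shown is the standard permutation-sampling Monte Carlo method, not DU-Shapley, whose exact form the abstract does not give. `sizes`, `pooled_utility`, and the function names are assumptions for illustration.

```python
import random
from itertools import combinations
from math import factorial, sqrt
from typing import Callable, FrozenSet

Utility = Callable[[FrozenSet[int]], float]


def exact_shapley(n: int, utility: Utility) -> list[float]:
    """Exact Shapley values by summing weighted marginal gains over all coalitions."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            w = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += w * (utility(s | {i}) - utility(s))
    return phi


def monte_carlo_shapley(n: int, utility: Utility, num_permutations: int, seed: int = 0) -> list[float]:
    """Average each owner's marginal gain over uniformly random arrival orders."""
    rng = random.Random(seed)
    phi = [0.0] * n
    owners = list(range(n))
    for _ in range(num_permutations):
        rng.shuffle(owners)
        coalition: FrozenSet[int] = frozenset()
        prev = utility(coalition)
        for i in owners:
            coalition = coalition | {i}  # owner i joins the coalition
            cur = utility(coalition)
            phi[i] += cur - prev
            prev = cur
    return [p / num_permutations for p in phi]


sizes = [100, 400, 900]  # hypothetical per-owner sample counts


def pooled_utility(coalition: FrozenSet[int]) -> float:
    # Hypothetical utility with diminishing returns in the amount of pooled data
    return sqrt(sum(sizes[i] for i in coalition))


print(exact_shapley(3, pooled_utility))
print(monte_carlo_shapley(3, pooled_utility, num_permutations=5000))
```

Each sampled permutation costs n utility evaluations, whereas the exact sum ranges over all 2^n coalitions. In dataset valuation every utility call typically means retraining a model, which is why cheaper estimators such as the one proposed above matter.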
Elucidating the Constitutive Relationship of Calcium-Silicate-Hydrate Gel Using High Throughput Reactive Molecular Simulations and Machine Learning
Predicting material behavior using machine learning (ML) requires consistent, accurate, and representative large datasets for training. However, such consistent and reliable experimental datasets are not always available for materials. To address this challenge, we synergistically integrate ML with high-throughput reactive molecular dynamics (MD) simulations to elucidate the constitutive relationship of calcium–silicate–hydrate (C–S–H) gel, the primary binding phase in concrete, formed via the hydration of ordinary Portland cement. Specifically, a highly consistent dataset on the nine elastic constants of more than 300 compositions of C–S–H gel is developed using high-throughput reactive simulations. From a comparative analysis of various ML algorithms, including neural networks (NN) and Gaussian processes (GP), we observe that NN provides excellent predictions. To interpret the NN predictions, we employ SHapley Additive exPlanations (SHAP), which reveal that the influence of the silicate network on all the elastic constants of C–S–H is significantly higher than that of the water and CaO content. Additionally, the water content is found to have a more prominent influence on the shear components than on the normal components along the direction of the interlayer spaces within C–S–H. This result suggests that the in-plane elastic response is controlled by water molecules, whereas the transverse response is mainly governed by the silicate network. Overall, by integrating MD simulations with ML, this paper can serve as a starting point toward accelerated optimization of C–S–H nanostructures to design efficient cementitious binders with targeted properties.