10 research outputs found
Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence
Aesthetic quality prediction is a challenging task in the computer vision
community because of the complex interplay with semantic contents and
photographic technologies. Recent studies on deep-learning-based aesthetic
quality assessment usually represent aesthetic quality with a binary high/low
label or a single numerical score. However, such a scalar representation cannot
adequately capture the underlying variability of human aesthetic perception. In
this work, we propose to predict the aesthetic score
distribution (i.e., a score distribution vector of the ordinal basic human
ratings) using Deep Convolutional Neural Network (DCNN). Conventional DCNNs
which aim to minimize the difference between the predicted scalar numbers or
vectors and the ground truth cannot be directly used for the ordinal basic
rating distribution. Thus, a novel CNN based on the Cumulative distribution
with Jensen-Shannon divergence (CJS-CNN) is presented to predict the aesthetic
score distribution of human ratings, with a new reliability-sensitive learning
method based on the kurtosis of the score distribution, which eliminates the
requirement of the original full data of human ratings (without normalization).
Experimental results on a large-scale aesthetic dataset demonstrate the
effectiveness of the proposed CJS-CNN on this task.
Comment: AAAI Conference on Artificial Intelligence (AAAI), New Orleans,
Louisiana, USA. 2-7 Feb. 201
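As a rough illustration of the idea behind the abstract above (not the paper's exact CJS-CNN loss), a divergence between two discrete score distributions can be computed on their cumulative distributions rather than their probability masses. A minimal sketch applying the symmetric Jensen-Shannon form to CDFs:

```python
import numpy as np

def cumulative_js(p, q):
    """Jensen-Shannon-style divergence applied to cumulative distributions.

    p, q: discrete score distributions (e.g. normalised histograms of
    ordinal ratings). Simplified sketch, not the paper's exact CJS loss.
    """
    P, Q = np.cumsum(p), np.cumsum(q)
    M = 0.5 * (P + Q)

    def term(A):
        mask = A > 0          # where A > 0, M >= A/2 > 0, so the log is safe
        return np.sum(A[mask] * np.log2(A[mask] / M[mask]))

    return 0.5 * term(P) + 0.5 * term(Q)

# identical distributions have zero divergence
print(cumulative_js([0.1, 0.2, 0.7], [0.1, 0.2, 0.7]))  # 0.0
```

Working on cumulative distributions is what makes the measure sensitive to the ordering of the rating bins, which a plain histogram divergence ignores.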
Towards a Multi-Objective Optimization of Subgroups for the Discovery of Materials with Exceptional Performance
Artificial intelligence (AI) can accelerate the design of materials by
identifying correlations and complex patterns in data. However, AI methods
commonly attempt to describe the entire, immense materials space with a single
model, while it is typical that different mechanisms govern the materials
behaviors across the materials space. The subgroup-discovery (SGD) approach
identifies local rules describing exceptional subsets of data with respect to a
given target. Thus, SGD can focus on mechanisms leading to exceptional
performance. However, the identification of appropriate SG rules requires a
careful consideration of the generality-exceptionality tradeoff. Here, we
discuss challenges to advance the SGD approach in materials science and analyse
the tradeoff between exceptionality and generality based on a Pareto front of
SGD solutions.
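The generality-exceptionality tradeoff described above can be made concrete with a Pareto filter: given candidate subgroups scored on coverage (generality) and target deviation (exceptionality), keep only the non-dominated ones. A minimal sketch, with both objectives maximised and all numbers illustrative:

```python
def pareto_front(subgroups):
    """Keep subgroups not dominated in (generality, exceptionality).

    subgroups: list of (generality, exceptionality) pairs, both to be
    maximised. Illustrative only; SGD tools use richer quality measures.
    """
    front = []
    for i, (g, e) in enumerate(subgroups):
        dominated = any(
            g2 >= g and e2 >= e and (g2, e2) != (g, e)
            for j, (g2, e2) in enumerate(subgroups) if j != i
        )
        if not dominated:
            front.append((g, e))
    return front

candidates = [(0.9, 0.1), (0.5, 0.5), (0.1, 0.9), (0.4, 0.4)]
print(pareto_front(candidates))  # (0.4, 0.4) is dominated by (0.5, 0.5)
```

Each point on the resulting front represents a different, defensible compromise between a rule that covers many materials and one that isolates the truly exceptional ones.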
Empirical Survival Jensen-Shannon Divergence as a Goodness-of-Fit Measure for Maximum Likelihood Estimation and Curve Fitting
The coefficient of determination, known as R2, is commonly used as a goodness-of-fit
criterion for fitting linear models. R2 is somewhat controversial when fitting nonlinear
models, although it may be generalised on a case-by-case basis to deal with specific models
such as the logistic model. Assume we are fitting a parametric distribution to a data set
using, say, the maximum likelihood estimation method. A general approach to measure
the goodness-of-fit of the fitted parameters, which is advocated herein, is to use a
nonparametric measure for comparison between the empirical distribution, comprising
the raw data, and the fitted model. In particular, for this purpose we put forward
the Survival Jensen-Shannon divergence (SJS) and its empirical counterpart (ESJS) as
a metric which is bounded, and is a natural generalisation of the Jensen-Shannon
divergence. We
demonstrate, via a straightforward procedure making use of the ESJS, that it can be used
as part of maximum likelihood estimation or curve fitting as a measure of goodness-of-fit,
including the construction of a confidence interval for the fitted parametric distribution.
Furthermore, we show the validity of the proposed method with simulated data, and three
empirical data sets.
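A discretised sketch of the ESJS between two samples, assuming the divergence is taken between empirical survival functions and integrated over the pooled support (the paper's exact normalisation may differ):

```python
import numpy as np

def esjs(x, y):
    """Empirical survival Jensen-Shannon divergence (discretised sketch).

    Builds empirical survival functions S(t) = P(X > t) on the pooled
    sample points and integrates a JS-style integrand with a left
    Riemann sum. Simplified; the paper's normalisation may differ.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    grid = np.sort(np.unique(np.concatenate([x, y])))
    widths = np.diff(grid)

    def surv(d):
        return np.array([(d > t).mean() for t in grid])

    S1, S2 = surv(x), surv(y)
    M = 0.5 * (S1 + S2)

    def term(A):
        out = np.zeros_like(A)
        mask = A > 0          # where A > 0, M >= A/2 > 0
        out[mask] = A[mask] * np.log2(A[mask] / M[mask])
        return out

    integrand = 0.5 * term(S1) + 0.5 * term(S2)
    return float(np.sum(integrand[:-1] * widths))

sample = np.array([1.0, 2.0, 3.0, 4.0])
print(esjs(sample, sample))  # 0.0 for identical samples
```

Because the measure is symmetric and vanishes only when the two empirical survival functions coincide, it can serve directly as the goodness-of-fit score the abstract describes.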
Identifying outstanding transition-metal-alloy heterogeneous catalysts for the oxygen reduction and evolution reactions via subgroup discovery
In order to estimate the reactivity of a large number of potentially complex
heterogeneous catalysts while searching for novel and more efficient materials,
physical as well as data-centric models have been developed for a faster
evaluation of adsorption energies compared to first-principles calculations.
However, global models designed to describe as many materials as possible might
overlook the very few compounds that have the appropriate adsorption properties
to be suitable for a given catalytic process. Here, the subgroup-discovery
(SGD) local artificial-intelligence approach is used to identify the key
descriptive parameters and constraints on their values, the so-called SG rules,
which particularly describe transition-metal surfaces with outstanding
adsorption properties for the oxygen reduction and evolution reactions. We
start from a data set of 95 oxygen adsorption energy values evaluated by
density-functional-theory calculations for several monometallic surfaces along
with 16 atomic, bulk and surface properties as candidate descriptive
parameters. From this data set, SGD identifies constraints on the most relevant
parameters describing materials and adsorption sites that (i) result in O
adsorption energies within the Sabatier-optimal range required for the oxygen
reduction reaction and (ii) present the largest deviations from the linear
scaling relations between O and OH adsorption energies, which limit the
performance in the oxygen evolution reaction. The SG rules not only reflect the
local underlying physicochemical phenomena that result in the desired
adsorption properties but also guide the challenging design of alloy catalysts.
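To illustrate how SG rules of this kind are consumed downstream (all parameter names, bounds, and values below are hypothetical, not the rules found in the paper): a rule is a conjunction of interval constraints on descriptive parameters, and applying it yields the subgroup plus simple quality statistics such as coverage:

```python
def apply_sg_rule(rows, rule):
    """Select the rows satisfying every interval constraint in the rule.

    rows: list of dicts mapping parameter names to values.
    rule: dict mapping parameter name -> (low, high), inclusive bounds.
    """
    return [r for r in rows
            if all(lo <= r[p] <= hi for p, (lo, hi) in rule.items())]

# hypothetical candidate surfaces (names and numbers illustrative only)
surfaces = [
    {"name": "A", "d_band_center": -1.8, "work_function": 5.1},
    {"name": "B", "d_band_center": -2.6, "work_function": 4.4},
    {"name": "C", "d_band_center": -1.5, "work_function": 5.4},
]
rule = {"d_band_center": (-2.0, -1.0), "work_function": (5.0, 6.0)}

subgroup = apply_sg_rule(surfaces, rule)
coverage = len(subgroup) / len(surfaces)
print([s["name"] for s in subgroup])  # surfaces 'A' and 'C' satisfy the rule
```

The attraction of this representation is that each selected subgroup comes with explicit, human-readable bounds on physical descriptors rather than an opaque model prediction.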
Detecting and diagnosing prior and likelihood sensitivity with power-scaling
Determining the sensitivity of the posterior to perturbations of the prior
and likelihood is an important part of the Bayesian workflow. We introduce a
practical and computationally efficient sensitivity analysis approach using
importance sampling to estimate properties of posteriors resulting from
power-scaling the prior or likelihood. On this basis, we suggest a diagnostic
that can indicate the presence of prior-data conflict or likelihood
noninformativity and discuss limitations to the power-scaling approach. The
approach can be easily included in Bayesian workflows with minimal effort by
the model builder and we present an implementation in our new R package
\texttt{priorsense}. We further demonstrate the workflow on case studies of
real data using models varying in complexity from simple linear models to
Gaussian process models.
Comment: 26 pages, 14 figures
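The core of the power-scaling approach above can be sketched with plain importance sampling: raising the prior to a power alpha multiplies each posterior draw's weight by prior(theta)^(alpha - 1). (priorsense additionally Pareto-smooths the weights; this bare sketch omits that, and all numbers are illustrative.)

```python
import numpy as np

def power_scaling_weights(log_prior, alpha):
    """Importance weights for power-scaling the prior by alpha.

    log_prior: log prior density evaluated at each posterior draw.
    alpha = 1 recovers the original posterior (uniform weights).
    Plain importance sampling; no Pareto smoothing as in priorsense.
    """
    lw = (alpha - 1.0) * np.asarray(log_prior, float)
    lw -= lw.max()                      # numerical stabilisation
    w = np.exp(lw)
    return w / w.sum()

# toy check: four posterior draws under a standard-normal log prior
draws = np.array([-1.0, 0.0, 1.0, 2.0])
log_prior = -0.5 * draws**2
w = power_scaling_weights(log_prior, alpha=2.0)
shifted_mean = float(np.sum(w * draws))   # pulled toward the prior mean 0
print(shifted_mean)
```

Comparing such reweighted posterior summaries across a range of alpha values is what reveals prior-data conflict (strong sensitivity to the prior) or likelihood noninformativity.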
Stability metrics for multi-source biomedical data based on simplicial projections from probability distribution distances
Biomedical data may be composed of individuals generated from distinct, meaningful
sources. Due to possible contextual biases in the processes that generate data,
there may exist an undesirable and unexpected variability among the probability
distribution functions (PDFs) of the source subsamples, which, when uncontrolled,
may lead to inaccurate or unreproducible research results. Classical statistical
methods may have difficulties uncovering such variability when dealing with
multi-modal, multi-type, multi-variate data. This work proposes two metrics for
the analysis of stability among multiple data sources, robust to the aforementioned
conditions, and defined in the context of data quality assessment. Specifically, a
global probabilistic deviation (GPD) metric and a source probabilistic outlyingness
(SPO) metric are proposed. The first provides a bounded degree of the global multi-source
variability, designed as an estimator equivalent to the notion of normalized standard
deviation of PDFs. The second provides a bounded degree of the dissimilarity of
each source to a latent central distribution. The metrics are based on the projection
of a simplex geometrical structure constructed from the Jensen-Shannon distances
among the sources' PDFs. The metrics have been evaluated and have demonstrated
correct behaviour on a simulated benchmark and on real multi-source biomedical
data from the UCI Heart Disease dataset. Biomedical data quality assessment
based on the proposed stability metrics may improve the efficiency and effectiveness
of biomedical data exploitation and research.
The author(s) disclosed receipt of the following financial support for the research,
authorship, and/or publication of this article: This work was supported by own IBIME
funds under the UPV project Servicio de evaluacion y rating de la calidad de
repositorios de datos biomedicos [UPV-2014-872] and the EU FP7 Project Help4Mood - A
Computational Distributed System to Support the Treatment of Patients with Major
Depression [ICT-248765].
Sáez Silvestre, C.; Robles Viejo, M.; García Gómez, J.M. (2014). Stability metrics
for multi-source biomedical data based on simplicial projections from probability
distribution distances. Statistical Methods in Medical Research. 1-25.
https://doi.org/10.1177/0962280214545122
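A stripped-down sketch of the ingredients used above (not the paper's full simplex projection): the Jensen-Shannon distance between source PDFs, a per-source distance to a plain average as an SPO-like proxy, and a mean pairwise distance as a GPD-like proxy:

```python
import numpy as np

def js_distance(p, q):
    """Jensen-Shannon distance (square root of the JS divergence, base 2).

    p, q: discrete PDFs on a common support; the distance lies in [0, 1].
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return float(np.sqrt(0.5 * kl(p, m) + 0.5 * kl(q, m)))

def stability_summary(pdfs):
    """SPO-like and GPD-like proxies (the paper projects a simplex instead)."""
    pdfs = [np.asarray(p, float) for p in pdfs]
    central = np.mean(pdfs, axis=0)                 # stand-in central distribution
    spo = [js_distance(p, central) for p in pdfs]   # per-source outlyingness proxy
    pairs = [js_distance(pdfs[i], pdfs[j])
             for i in range(len(pdfs)) for j in range(i + 1, len(pdfs))]
    gpd = float(np.mean(pairs))                     # global variability proxy
    return gpd, spo

print(js_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0: maximally dissimilar sources
```

Boundedness in [0, 1] is what lets such quantities be read as calibrated data-quality scores across datasets of very different size and type.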
A skew logistic distribution for modelling COVID-19 waves and its evaluation using the empirical survival Jensen-Shannon divergence
A novel yet simple extension of the symmetric logistic distribution is proposed by introducing a skewness parameter. It is shown how the three parameters of the ensuing skew logistic distribution may be estimated using maximum likelihood. The skew logistic distribution is then extended to the skew bi-logistic distribution to allow the modelling of multiple waves in epidemic time series data. The proposed skew logistic model is validated on COVID-19 data from the UK, and is evaluated for goodness-of-fit against the logistic and normal distributions using the recently formulated empirical survival Jensen–Shannon divergence (ESJS) and the Kolmogorov–Smirnov two-sample test statistic (KS2). We employ 95% bootstrap confidence intervals to assess the improvement in goodness-of-fit of the skew logistic distribution over the other distributions. The obtained confidence intervals for the ESJS are narrower than those for the KS2 on using this dataset, implying that the ESJS is more powerful than the KS2.
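The paper defines its own three-parameter skew logistic distribution; as a stand-in with the same flavour, SciPy's genlogistic (the Type I generalized logistic, which adds a skewness shape c to the symmetric logistic and recovers it at c = 1) can be fitted by maximum likelihood and compared against the symmetric fit:

```python
import numpy as np
from scipy import stats

# simulate skewed data (genlogistic stands in for the paper's skew logistic)
data = stats.genlogistic.rvs(c=2.0, loc=0.0, scale=1.0, size=500, random_state=0)

# maximum likelihood fits: skewed family vs symmetric logistic (its c = 1 case)
c, loc, scale = stats.genlogistic.fit(data)
ll_skew = np.sum(stats.genlogistic.logpdf(data, c, loc, scale))

l_loc, l_scale = stats.logistic.fit(data)
ll_sym = np.sum(stats.logistic.logpdf(data, l_loc, l_scale))

# the skewed family nests the symmetric one, so on skewed data its
# maximum-likelihood fit should score at least as well
print(ll_skew >= ll_sym)
```

In the paper's workflow the competing fits would then be compared via the ESJS (with bootstrap confidence intervals) rather than by raw log-likelihood as done here.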