10 research outputs found

    Predicting Aesthetic Score Distribution through Cumulative Jensen-Shannon Divergence

    Aesthetic quality prediction is a challenging task in the computer vision community because of the complex interplay between semantic contents and photographic technologies. Recent studies on powerful deep-learning-based aesthetic quality assessment usually use a binary high-low label or a numerical score to represent aesthetic quality. However, a scalar representation cannot adequately describe the underlying variety of human aesthetic perception. In this work, we propose to predict the aesthetic score distribution (i.e., a score distribution vector of the ordinal basic human ratings) using a deep convolutional neural network (DCNN). Conventional DCNNs, which aim to minimize the difference between predicted scalar numbers or vectors and the ground truth, cannot be directly used for the ordinal basic rating distribution. Thus, a novel CNN based on the cumulative distribution with Jensen-Shannon divergence (CJS-CNN) is presented to predict the aesthetic score distribution of human ratings, together with a new reliability-sensitive learning method based on the kurtosis of the score distribution, which eliminates the requirement for the original full data of human ratings (without normalization). Experimental results on a large-scale aesthetic dataset demonstrate the effectiveness of the introduced CJS-CNN in this task.
    Comment: AAAI Conference on Artificial Intelligence (AAAI), New Orleans, Louisiana, USA, 2-7 Feb. 2018
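
    For intuition, the following is a minimal NumPy sketch of a cumulative Jensen-Shannon style divergence between two discrete rating histograms, comparing cumulative distributions rather than raw bin probabilities. The symmetrization via the mid-CDF and the example rating counts are illustrative assumptions; the exact CJS loss used in the paper may differ in its normalization.

```python
import numpy as np

def cumulative_kl(F, G, eps=1e-12):
    """Cumulative KL divergence between two discrete CDFs F and G."""
    F, G = np.asarray(F, float), np.asarray(G, float)
    return np.sum(F * np.log((F + eps) / (G + eps))) + np.sum(G - F)

def cjs_divergence(p, q):
    """Symmetrized cumulative Jensen-Shannon style divergence between
    two score histograms p and q (e.g. counts of ordinal 1-10 ratings)."""
    p = np.asarray(p, float) / np.sum(p)
    q = np.asarray(q, float) / np.sum(q)
    F, G = np.cumsum(p), np.cumsum(q)   # compare CDFs, not raw bins
    M = 0.5 * (F + G)
    return 0.5 * cumulative_kl(F, M) + 0.5 * cumulative_kl(G, M)

# Two made-up rating histograms over a 1-10 scale
p = [0, 1, 2, 5, 9, 12, 9, 5, 2, 1]
q = [1, 2, 4, 8, 10, 8, 5, 3, 2, 1]
print(cjs_divergence(p, q))
```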

    Towards a Multi-Objective Optimization of Subgroups for the Discovery of Materials with Exceptional Performance

    Artificial intelligence (AI) can accelerate the design of materials by identifying correlations and complex patterns in data. However, AI methods commonly attempt to describe the entire, immense materials space with a single model, even though different mechanisms typically govern materials behavior in different regions of that space. The subgroup-discovery (SGD) approach identifies local rules describing exceptional subsets of data with respect to a given target. Thus, SGD can focus on the mechanisms leading to exceptional performance. However, the identification of appropriate SG rules requires careful consideration of the generality-exceptionality tradeoff. Here, we discuss challenges in advancing the SGD approach in materials science and analyse the tradeoff between exceptionality and generality based on a Pareto front of SGD solutions.
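
    As a toy illustration of that tradeoff, the sketch below extracts the Pareto front from hypothetical candidate subgroups scored by coverage (generality) and target quality (exceptionality). The scores and the simple quadratic dominance check are assumptions for illustration, not the paper's procedure.

```python
import numpy as np

def pareto_front(points):
    """Indices of non-dominated points when maximizing both coordinates."""
    pts = np.asarray(points, float)
    keep = []
    for i, p in enumerate(pts):
        # p is dominated if some other point is >= in both coordinates
        # and strictly better in at least one
        dominated = np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))
        if not dominated:
            keep.append(i)
    return keep

# Hypothetical (coverage, exceptionality) scores of candidate SG rules
subgroups = [(0.60, 0.20), (0.40, 0.35), (0.25, 0.50), (0.30, 0.30), (0.10, 0.55)]
print(pareto_front(subgroups))  # -> [0, 1, 2, 4]
```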

    Empirical Survival Jensen-Shannon Divergence as a Goodness-of-Fit Measure for Maximum Likelihood Estimation and Curve Fitting

    The coefficient of determination, known as R², is commonly used as a goodness-of-fit criterion for fitting linear models. R² is somewhat controversial when fitting nonlinear models, although it may be generalised on a case-by-case basis to deal with specific models such as the logistic model. Assume we are fitting a parametric distribution to a data set using, say, the maximum likelihood estimation method. A general approach to measuring the goodness-of-fit of the fitted parameters, which is advocated herein, is to use a non-parametric measure for comparison between the empirical distribution, comprising the raw data, and the fitted model. In particular, for this purpose we put forward the Survival Jensen-Shannon divergence (SJS) and its empirical counterpart (ESJS) as a metric which is bounded and is a natural generalisation of the Jensen-Shannon divergence. We demonstrate, via a straightforward procedure making use of the ESJS, that it can be used as part of maximum likelihood estimation or curve fitting as a measure of goodness-of-fit, including the construction of a confidence interval for the fitted parametric distribution. Furthermore, we show the validity of the proposed method with simulated data and three empirical data sets.
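
    A rough discrete sketch of an ESJS-style statistic is given below: it compares the empirical survival function of the data with the survival function of a maximum-likelihood fit. The evaluation grid, the epsilon guard, and the averaging over grid points are assumptions for illustration; the paper's exact ESJS definition and normalization may differ.

```python
import numpy as np
from scipy import stats

def esjs(data, fitted_sf, eps=1e-12):
    """Discrete approximation to a survival Jensen-Shannon style
    divergence between the empirical survival function of `data`
    and a fitted survival function `fitted_sf`."""
    x = np.sort(np.asarray(data, float))
    # empirical survival function S(t) = P(X > t), evaluated at the data
    S_emp = 1.0 - np.searchsorted(x, x, side="right") / x.size
    S_fit = fitted_sf(x)
    M = 0.5 * (S_emp + S_fit)

    def term(S):
        return np.sum(S * np.log((S + eps) / (M + eps)))

    return 0.5 * (term(S_emp) + term(S_fit)) / x.size

# Fit a normal by maximum likelihood, then score the fit
sample = np.random.default_rng(0).normal(loc=2.0, scale=1.5, size=500)
mu, sigma = stats.norm.fit(sample)
print(esjs(sample, lambda t: stats.norm.sf(t, mu, sigma)))
```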

    Identifying outstanding transition-metal-alloy heterogeneous catalysts for the oxygen reduction and evolution reactions via subgroup discovery

    In order to estimate the reactivity of a large number of potentially complex heterogeneous catalysts while searching for novel and more efficient materials, physical as well as data-centric models have been developed for a faster evaluation of adsorption energies compared to first-principles calculations. However, global models designed to describe as many materials as possible might overlook the very few compounds that have the appropriate adsorption properties to be suitable for a given catalytic process. Here, the subgroup-discovery (SGD) local artificial-intelligence approach is used to identify the key descriptive parameters and constraints on their values, the so-called SG rules, which specifically describe transition-metal surfaces with outstanding adsorption properties for the oxygen reduction and evolution reactions. We start from a data set of 95 oxygen adsorption energy values evaluated by density-functional-theory calculations for several monometallic surfaces, along with 16 atomic, bulk, and surface properties as candidate descriptive parameters. From this data set, SGD identifies constraints on the most relevant parameters describing materials and adsorption sites that (i) result in O adsorption energies within the Sabatier-optimal range required for the oxygen reduction reaction and (ii) present the largest deviations from the linear scaling relations between O and OH adsorption energies, which limit the performance in the oxygen evolution reaction. The SG rules not only reflect the local underlying physicochemical phenomena that result in the desired adsorption properties but also guide the challenging design of alloy catalysts.
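
    A purely illustrative sketch of the mechanics: SG rules are conjunctions of inequality constraints on descriptive parameters, applied as boolean masks over a descriptor table. The descriptors, thresholds, energy window, and synthetic values below are all made up; the actual rules come from the paper's DFT data set.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 95  # same size as the paper's data set, but synthetic values

# Hypothetical candidate descriptive parameters per adsorption site
d_band_center = rng.uniform(-4.0, 0.0, n)  # eV
work_function = rng.uniform(4.0, 6.0, n)   # eV
E_O = rng.uniform(-2.0, 2.0, n)            # O adsorption energy, eV

# An SG rule as a conjunction of inequality constraints (illustrative)
in_subgroup = (d_band_center > -2.5) & (work_function < 5.5)

# How selective is the rule with respect to a hypothetical
# Sabatier-optimal adsorption-energy window?
sel = E_O[in_subgroup]
print("subgroup size:", int(in_subgroup.sum()))
print("fraction in optimal window:", ((sel > -0.5) & (sel < 0.5)).mean())
```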

    Detecting and diagnosing prior and likelihood sensitivity with power-scaling

    Determining the sensitivity of the posterior to perturbations of the prior and likelihood is an important part of the Bayesian workflow. We introduce a practical and computationally efficient sensitivity analysis approach that uses importance sampling to estimate properties of posteriors resulting from power-scaling the prior or likelihood. On this basis, we suggest a diagnostic that can indicate the presence of prior-data conflict or likelihood noninformativity, and we discuss limitations of the power-scaling approach. The approach can be included in Bayesian workflows with minimal effort by the model builder, and we present an implementation in our new R package priorsense. We further demonstrate the workflow on case studies of real data using models varying in complexity from simple linear models to Gaussian process models.
    Comment: 26 pages, 14 figures
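
    The core importance-sampling idea can be sketched in a few lines: power-scaling the prior by alpha multiplies the unnormalized posterior by prior(theta)^(alpha - 1), so draws from the base posterior can be reweighted accordingly. The toy conjugate model below is an assumption for illustration; priorsense implements the approach robustly in R, and this sketch is not its API.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Toy conjugate model: theta ~ N(0, 2^2), y_i ~ N(theta, 1)
y = rng.normal(1.0, 1.0, size=20)
prior_sd, like_sd = 2.0, 1.0
post_var = 1.0 / (1.0 / prior_sd**2 + y.size / like_sd**2)
post_mean = post_var * y.sum() / like_sd**2
draws = rng.normal(post_mean, np.sqrt(post_var), size=4000)

def power_scaled_mean(draws, alpha):
    """Posterior mean under a prior raised to `alpha`, estimated via
    importance weights proportional to prior(theta)^(alpha - 1)."""
    logw = (alpha - 1.0) * stats.norm.logpdf(draws, 0.0, prior_sd)
    w = np.exp(logw - logw.max())
    return np.sum(w * draws) / np.sum(w)

for alpha in (0.5, 1.0, 2.0):
    # little movement across alpha indicates low prior sensitivity
    print(alpha, power_scaled_mean(draws, alpha))
```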

    Stability metrics for multi-source biomedical data based on simplicial projections from probability distribution distances

    Biomedical data may be composed of individuals generated from distinct, meaningful sources. Due to possible contextual biases in the processes that generate data, there may exist an undesirable and unexpected variability among the probability distribution functions (PDFs) of the source subsamples, which, when uncontrolled, may lead to inaccurate or unreproducible research results. Classical statistical methods may have difficulties uncovering such variabilities when dealing with multi-modal, multi-type, multi-variate data. This work proposes two metrics for the analysis of stability among multiple data sources, robust to the aforementioned conditions and defined in the context of data quality assessment: a global probabilistic deviation (GPD) and a source probabilistic outlyingness (SPO). The first provides a bounded degree of the global multi-source variability, designed as an estimator equivalent to the notion of the normalized standard deviation of PDFs. The second provides a bounded degree of the dissimilarity of each source to a latent central distribution. The metrics are based on the projection of a simplex geometrical structure constructed from the Jensen-Shannon distances among the sources' PDFs. The metrics have been evaluated on a simulated benchmark and on real multi-source biomedical data from the UCI Heart Disease dataset, where they demonstrated correct behaviour. Biomedical data quality assessment based on the proposed stability metrics may improve the efficiency and effectiveness of biomedical data exploitation and research.
    Funding: This work was supported by own IBIME funds under the UPV project Servicio de evaluacion y rating de la calidad de repositorios de datos biomedicos [UPV-2014-872] and the EU FP7 project Help4Mood - A Computational Distributed System to Support the Treatment of Patients with Major Depression [ICT-248765].
    Citation: Sáez Silvestre, C.; Robles Viejo, M.; García Gómez, J.M. (2014). Stability metrics for multi-source biomedical data based on simplicial projections from probability distribution distances. Statistical Methods in Medical Research, 1-25. https://doi.org/10.1177/0962280214545122
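
    The building block of both metrics, the pairwise Jensen-Shannon distance matrix among source PDFs, can be sketched as follows. The per-source histograms are made up, and the two summary scores are crude proxies for GPD and SPO; the paper derives the actual metrics from a simplex projection of this distance matrix rather than from simple averages.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

# Hypothetical per-source histograms of one variable (rows sum to 1)
sources = np.array([
    [0.10, 0.20, 0.40, 0.20, 0.10],
    [0.12, 0.22, 0.36, 0.20, 0.10],
    [0.30, 0.30, 0.20, 0.10, 0.10],  # a deviating source
])

n = len(sources)
D = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        D[i, j] = jensenshannon(sources[i], sources[j], base=2)

# Crude proxies: mean pairwise distance as a global-variability score,
# and each source's mean distance to the others as an outlyingness score
gpd_proxy = D[np.triu_indices(n, 1)].mean()
spo_proxy = D.sum(axis=1) / (n - 1)
print(gpd_proxy, spo_proxy)
```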

    A skew logistic distribution for modelling COVID-19 waves and its evaluation using the empirical survival Jensen-Shannon divergence

    A novel yet simple extension of the symmetric logistic distribution is proposed by introducing a skewness parameter. It is shown how the three parameters of the ensuing skew logistic distribution may be estimated using maximum likelihood. The skew logistic distribution is then extended to the skew bi-logistic distribution to allow the modelling of multiple waves in epidemic time series data. The proposed skew logistic model is validated on COVID-19 data from the UK and is evaluated for goodness-of-fit against the logistic and normal distributions using the recently formulated empirical survival Jensen-Shannon divergence (ESJS) and the Kolmogorov-Smirnov two-sample test statistic (KS2). We employ 95% bootstrap confidence intervals to assess the improvement in goodness-of-fit of the skew logistic distribution over the other distributions. The confidence intervals obtained for the ESJS are narrower than those for the KS2 on this dataset, implying that the ESJS is more powerful than the KS2.
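
    The bootstrap evaluation step can be sketched generically: resample the data, refit, recompute the goodness-of-fit statistic, and take percentile intervals. Since the skew logistic distribution and the ESJS are specific to the paper, the sketch below substitutes a plain logistic fit and the one-sample Kolmogorov-Smirnov statistic as stand-ins; only the bootstrap mechanics carry over.

```python
import numpy as np
from scipy import stats

def bootstrap_ci(data, statistic, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for a goodness-of-fit
    statistic computed on resamples of `data`."""
    rng = np.random.default_rng(seed)
    vals = np.array([
        statistic(rng.choice(data, size=data.size, replace=True))
        for _ in range(n_boot)
    ])
    lo, hi = np.percentile(vals, [100 * (1 - level) / 2,
                                  100 * (1 + level) / 2])
    return lo, hi

def ks_of_logistic_fit(x):
    # refit on each resample, then score the fit
    loc, scale = stats.logistic.fit(x)
    return stats.kstest(x, "logistic", args=(loc, scale)).statistic

sample = np.random.default_rng(1).logistic(loc=0.0, scale=1.0, size=300)
print(bootstrap_ci(sample, ks_of_logistic_fit))  # (lower, upper)
```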