An Epistemic Approach to the Formal Specification of Statistical Machine Learning
We propose an epistemic approach to formalizing statistical properties of
machine learning. Specifically, we introduce a formal model for supervised
learning based on a Kripke model where each possible world corresponds to a
possible dataset and modal operators are interpreted as transformation and
testing on datasets. Then we formalize various notions of the classification
performance, robustness, and fairness of statistical classifiers by using our
extension of statistical epistemic logic (StatEL). In this formalization, we
show relationships among properties of classifiers, and a relationship between
classification performance and robustness. As far as we know, this is the first
work that uses epistemic models and logical formulas to express statistical
properties of machine learning, and would be a starting point to develop
theories of formal specification of machine learning.
Comment: Accepted in Software and Systems Modeling (https://rdcu.be/b7ssR). This paper is the journal version of the SEFM'19 conference paper arXiv:1907.1032
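The abstract's core construction, a Kripke model whose possible worlds are datasets, can be illustrated with a minimal sketch. This is not the paper's StatEL semantics; the resampling relation, the toy classifier, and the accuracy property are all illustrative assumptions, chosen only to show how a modal "necessity" operator can quantify a statistical property over accessible worlds.

```python
import random

# Illustrative sketch (not the paper's StatEL semantics): possible worlds are
# datasets, the accessibility relation links a world to datasets obtainable by
# a transformation (here, hypothetically, resampling), and a "box" operator
# checks that a property holds in every accessible world.

def resample(dataset, rng):
    """Transformation on datasets: sample with replacement (assumed choice)."""
    return [rng.choice(dataset) for _ in dataset]

def accessible_worlds(world, n_worlds, seed=0):
    rng = random.Random(seed)
    return [resample(world, rng) for _ in range(n_worlds)]

def box(phi, world, n_worlds=20):
    """Necessity: the property phi holds in all worlds accessible from `world`."""
    return all(phi(w) for w in accessible_worlds(world, n_worlds))

# A statistical property of a classifier: accuracy of at least 0.9 on a dataset.
def classifier(x):
    return x >= 0            # toy threshold classifier

def accurate(dataset, threshold=0.9):
    correct = sum(classifier(x) == label for x, label in dataset)
    return correct / len(dataset) >= threshold

# A dataset on which the classifier is perfect, so accuracy survives resampling.
data = [(x, x >= 0) for x in range(-10, 10)]
print(box(accurate, data))   # True: accuracy is 1.0 in every resampled world
```

The point of the sketch is the separation of concerns the paper formalizes: the dataset transformation defines accessibility, while the statistical property is an ordinary predicate on a single world.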
Robust detection and attribution of climate change under interventions
Fingerprints are key tools in climate change detection and attribution (D&A)
that are used to determine whether changes in observations are different from
internal climate variability (detection), and whether observed changes can be
assigned to specific external drivers (attribution). We propose a direct D&A
approach based on supervised learning to extract fingerprints that lead to
robust predictions under relevant interventions on exogenous variables, i.e.,
climate drivers other than the target. We employ anchor regression, a
distributionally-robust statistical learning method inspired by causal
inference that extrapolates well to perturbed data under the interventions
considered. The residuals from the prediction achieve either uncorrelatedness
or mean independence with the exogenous variables, thus guaranteeing
robustness. We define D&A as a unified hypothesis testing framework that relies
on the same statistical model but uses different targets and test statistics.
In the experiments, we first show that the CO2 forcing can be robustly
predicted from temperature spatial patterns under strong interventions on the
solar forcing. Second, we illustrate attribution to the greenhouse gases and
aerosols while protecting against interventions on the aerosols and CO2
forcing, respectively. Our study shows that incorporating robustness
constraints against relevant interventions may significantly benefit detection
and attribution of climate change.
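The statistical core of the abstract, anchor regression, admits a compact sketch: it minimizes the usual squared error plus a penalty γ on the part of the residual explained by the anchor (the exogenous variable), which is equivalent to ordinary least squares on data transformed by I + (√γ − 1)P_A. The one-dimensional setup, the toy data, and the function names below are illustrative assumptions, not the paper's climate pipeline.

```python
import math

def project_on_anchor(v, a):
    """Projection of v onto span(a): P_A v = a * (a.v) / (a.a)."""
    scale = sum(ai * vi for ai, vi in zip(a, v)) / sum(ai * ai for ai in a)
    return [ai * scale for ai in a]

def anchor_transform(v, a, gamma):
    """Apply W = I + (sqrt(gamma) - 1) P_A to the vector v."""
    p = project_on_anchor(v, a)
    return [vi + (math.sqrt(gamma) - 1.0) * pi for vi, pi in zip(v, p)]

def anchor_regression_1d(x, y, a, gamma):
    """Anchor regression with one predictor and no intercept (toy setting):
    ordinary least squares on the anchor-transformed data."""
    xt = anchor_transform(x, a, gamma)
    yt = anchor_transform(y, a, gamma)
    return sum(xi * yi for xi, yi in zip(xt, yt)) / sum(xi * xi for xi in xt)

# Toy data: the anchor a shifts both x and y (an exogenous driver).
a = [1.0, 1.0, -1.0, -1.0, 1.0, -1.0]
x = [2.1, 1.9, -1.0, -0.9, 2.0, -1.1]
y = [3.0, 2.8, -0.5, -0.4, 2.9, -0.6]

b_ols = anchor_regression_1d(x, y, a, gamma=1.0)     # gamma = 1 recovers OLS
b_rob = anchor_regression_1d(x, y, a, gamma=100.0)   # robust to anchor shifts

# As gamma grows, the residuals decorrelate from the anchor, which is the
# robustness guarantee the abstract refers to.
res = [yi - b_rob * xi for xi, yi in zip(x, y)]
print(abs(sum(ai * ri for ai, ri in zip(a, res))))   # close to 0
```

Setting γ = 1 leaves the data untouched and recovers plain least squares; letting γ grow drives the anchor–residual correlation toward zero, trading in-sample fit for robustness under interventions on the anchor.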
Model checking the evolution of gene regulatory networks
The behaviour of gene regulatory networks (GRNs) is typically analysed using simulation-based statistical testing. In this paper, we demonstrate that this approach can be replaced by a formal-verification-style method that offers higher assurance and scalability. We focus on Wagner's weighted GRN model with varying weights, which is used in evolutionary biology. In the model, weight parameters represent gene interaction strengths that may change due to genetic mutations. For a property of interest, we synthesise the constraints over the parameter space that represent the set of GRNs satisfying the property. We specify the property in linear temporal logic and employ symbolic bounded model checking and SMT solving to compute the space of GRNs that satisfy it, which amounts to synthesising a set of linear constraints on the weights. We experimentally show that our parameter-synthesis procedure computes the mutational robustness of GRNs, an important problem in evolutionary biology, more efficiently than the classical simulation method.
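Wagner's weighted model and the mutational-robustness quantity the abstract mentions can be sketched directly. This is the classical simulation baseline the paper improves upon, not its SMT-based synthesis; the 2-gene network, the perturbation magnitudes, and the sign-threshold update below are illustrative assumptions.

```python
import itertools

def step(state, w):
    """One synchronous update of Wagner's model: s_i' = sign(sum_j w[i][j]*s_j),
    with gene expression states in {-1, +1} and sign(0) taken as +1."""
    return tuple(1 if sum(wij * sj for wij, sj in zip(row, state)) >= 0 else -1
                 for row in w)

def steady_state(state, w, max_steps=50):
    """Iterate until a fixed point; return None if the dynamics cycle."""
    for _ in range(max_steps):
        nxt = step(state, w)
        if nxt == state:
            return state
        state = nxt
    return None

# Toy 2-gene network: gene 0 activates itself and represses gene 1.
w = [[1.0, 0.0],
     [-1.0, 0.5]]
init = (1, -1)
target = steady_state(init, w)

def robustness(w, init, target, deltas=(-1.2, 1.2)):
    """Mutational robustness by brute force: the fraction of single-weight
    perturbations that preserve the steady state. The paper computes this
    quantity symbolically instead of by enumeration."""
    total = hits = 0
    for i, j in itertools.product(range(len(w)), repeat=2):
        for d in deltas:
            mutated = [row[:] for row in w]
            mutated[i][j] += d
            total += 1
            if steady_state(init, mutated) == target:
                hits += 1
    return hits / total

print(target)                        # (1, -1)
print(robustness(w, init, target))   # 0.75
```

For realistic networks this enumeration explodes combinatorially, which is exactly why synthesising linear constraints on the weights and counting the satisfying region with an SMT solver, as the abstract describes, pays off.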
Sickle cell disease classification using deep learning
This paper presents a transfer- and deep-learning-based approach to the classification of Sickle Cell Disease (SCD). Five transfer learning models (ResNet-50, AlexNet, MobileNet, VGG-16 and VGG-19) and a sequential convolutional neural network (CNN) have been implemented for SCD classification. The erythrocytesIDB dataset has been used for training and testing the models. To make up for the dataset's insufficient size, advanced image augmentation techniques are employed to ensure the robustness of the dataset, enhance its diversity and improve the accuracy of the models. An ablation experiment using Random Forest and Support Vector Machine (SVM) classifiers, along with various hyperparameter tweaks, was carried out to determine the contribution of different model elements to predictive accuracy. A rigorous statistical analysis was carried out for evaluation, and an adversarial attack test was conducted to further assess the models' robustness. The experimental results demonstrate compelling performance across all models. The statistical tests showed that MobileNet achieved a significant improvement (p = 0.0229), while the other models (ResNet-50, AlexNet, VGG-16, VGG-19) did not (p > 0.05). Notably, the ResNet-50 model achieves remarkable precision, recall, and F1-score values of 100 % for circular, elongated, and other cell shapes when experimented with a smaller dataset. The AlexNet model achieves balanced precision (98 %) and recall (99 %) for circular and elongated shapes, while the other models showcase competitive performance. [Abstract copyright: © 2023 The Authors. Published by Elsevier Ltd.]
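The abstract reports per-model p-values but does not name the test used. One common choice for comparing classifiers is a paired permutation test on per-fold accuracies, sketched below; the fold scores are made-up illustrative numbers, not results from the paper.

```python
import itertools

def paired_permutation_p(scores_a, scores_b):
    """Exact two-sided paired permutation (sign-flip) test on per-fold
    accuracy differences. With small n we can enumerate all 2^n sign flips."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    n = len(diffs)
    extreme = total = 0
    for signs in itertools.product((1, -1), repeat=n):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        extreme += stat >= observed
        total += 1
    return extreme / total

# Hypothetical per-fold accuracies (illustrative only, not from the paper).
baseline  = [0.90, 0.91, 0.89, 0.92, 0.90]
mobilenet = [0.95, 0.96, 0.94, 0.97, 0.95]
print(paired_permutation_p(mobilenet, baseline))   # 0.0625
```

A permutation test makes no normality assumption on the fold scores, which is why it is a reasonable default when only a handful of cross-validation folds are available.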