
    Interpreting a Classical Geometric Proof with Interactive Realizability.

    We show how to extract a monotonic learning algorithm from a classical proof of a geometric statement by interpreting the proof by means of interactive realizability, a realizability semantics for classical logic. The statement concerns the existence of a convex angle including a finite collection of points in the real plane and is related to the existence of a convex hull. We define real numbers as Cauchy sequences of rational numbers, so equality and ordering are not decidable. While the proof looks superficially constructive, it employs classical reasoning to handle undecidable comparisons between real numbers, making the underlying algorithm non-effective. The interactive realizability interpretation transforms the non-effective linear algorithm described by the proof into an effective one that uses backtracking to learn from its mistakes. The effective algorithm exhibits a "smart" behavior, performing comparisons only up to the precision required to prove the final statement. This behavior is not explicitly planned but arises from the interactive interpretation of comparisons between Cauchy sequences.
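    To make the undecidability of such comparisons concrete, here is a minimal Python sketch (an illustration under assumed names, not the paper's construction): a real is modelled as a function returning a rational approximation within 2**-n, and a comparison queries both reals at increasing precision, returning a provisional guess when the precision budget is exhausted, the kind of answer an interactive realizer would later retract by backtracking.

```python
from fractions import Fraction

# A "Cauchy real" is modelled as a function approx(n) returning a rational
# within 2**-n of the real it represents.  Names and the budget-based
# comparison are illustrative assumptions, not the paper's construction.

def cauchy_const(q):
    """Constant real from a rational."""
    return lambda n: Fraction(q)

def cauchy_sqrt2():
    """Example real: sqrt(2) via interval bisection, within 2**-n."""
    def approx(n):
        lo, hi = Fraction(1), Fraction(2)
        while hi - lo > Fraction(1, 2 ** n):
            mid = (lo + hi) / 2
            if mid * mid <= 2:
                lo = mid
            else:
                hi = mid
        return lo
    return approx

def less_than(x, y, max_precision=50):
    """Try to decide x < y by querying both reals at increasing precision.

    Returns (answer, precision_used).  If the approximation intervals never
    separate within the budget, the answer False is only a provisional guess;
    a realizer would retract it (backtrack) if it later causes a contradiction.
    """
    for n in range(1, max_precision + 1):
        eps = Fraction(1, 2 ** n)
        if x(n) + eps < y(n) - eps:   # intervals separated: x < y for sure
            return True, n
        if y(n) + eps < x(n) - eps:   # intervals separated: y < x for sure
            return False, n
    return False, max_precision       # undecided within budget

print(less_than(cauchy_const(Fraction(7, 5)), cauchy_sqrt2()))   # decided at low precision
print(less_than(cauchy_sqrt2(), cauchy_const(Fraction(3, 2))))   # decided at low precision
```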

    Challenges in predicting stabilizing variations: An exploration

    An open challenge of computational and experimental biology is understanding the impact of non-synonymous DNA variations on protein function and, subsequently, human health. The effects of these variants on protein stability can be measured as the difference in the free energy of unfolding (ΔΔG) between the mutated structure of the protein and its wild-type form. Over the years, bioinformaticians have developed a wide variety of tools and approaches to predict the ΔΔG. Although the performance of these tools is highly variable, overall they are less accurate at predicting stabilizing variations than destabilizing ones. Here, we analyze the possible reasons for this difference by focusing on the relationship between experimentally measured ΔΔG and seven protein properties on three widely used datasets (S2648, VariBench, Ssym) and a recently introduced one (S669). These properties include protein structural information, different physical properties and statistical potentials. We found that two widely used input features, i.e., hydrophobicity and the Blosum62 substitution matrix, perform close to random choice when trying to separate stabilizing variants from either neutral or destabilizing ones. We then speculate that, since destabilizing variations are the most abundant class in the available datasets, the overall performance of the methods is higher when including features that improve the prediction of destabilizing variants at the expense of the stabilizing ones. These findings highlight the need to design predictive methods that can also exploit input features highly correlated with stabilizing variants. New tools should also be tested on a dataset that is not artificially balanced, reporting the performance on all three classes (i.e., stabilizing, neutral and destabilizing variants) and not only the overall results.
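    The recommendation to report per-class rather than overall performance can be illustrated with a short Python sketch; the ±0.5 kcal/mol neutrality band, the sign convention (positive ΔΔG = destabilizing) and the function names are assumptions for illustration, not the datasets' definitions.

```python
import numpy as np

def ddg_class(ddg, threshold=0.5):
    """Map a ΔΔG value (kcal/mol) to a class label.

    Sign convention and the ±0.5 kcal/mol neutrality band are illustrative
    assumptions; datasets such as S2648 or S669 may use different conventions.
    """
    if ddg <= -threshold:
        return "stabilizing"
    if ddg >= threshold:
        return "destabilizing"
    return "neutral"

def per_class_recall(ddg_true, ddg_pred, threshold=0.5):
    """Fraction of variants of each true class predicted in the same class.

    Reporting this per class, rather than one overall score, exposes methods
    that do well on the abundant destabilizing variants while failing on the
    rarer stabilizing ones."""
    true_labels = [ddg_class(v, threshold) for v in ddg_true]
    pred_labels = [ddg_class(v, threshold) for v in ddg_pred]
    recall = {}
    for cls in ("stabilizing", "neutral", "destabilizing"):
        idx = [i for i, t in enumerate(true_labels) if t == cls]
        if idx:
            recall[cls] = sum(pred_labels[i] == cls for i in idx) / len(idx)
    return recall

# Toy example: a predictor with a systematic bias towards destabilizing values
# looks fine overall but has poor recall on the stabilizing class.
rng = np.random.default_rng(0)
true = rng.normal(loc=1.0, scale=1.5, size=200)          # mostly destabilizing
pred = true + rng.normal(loc=0.8, scale=0.7, size=200)   # positive (destabilizing) bias
print(per_class_recall(true, pred))
```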

    Protein Stability Perturbation Contributes to the Loss of Function in Haploinsufficient Genes

    Missense variants are among the most studied genome modifications as disease biomarkers. It has been shown that the “perturbation” of protein stability upon a missense variant (in terms of absolute ΔΔG value, i.e., |ΔΔG|) has a significant, but not predictive, correlation with the pathogenicity of that variant. However, here we show that this correlation becomes significantly amplified in haploinsufficient genes. Moreover, the enrichment of pathogenic variants increases with increasing protein stability perturbation. These findings suggest that protein stability perturbation might be considered a potential cofactor in diseases associated with haploinsufficient genes carrying missense variants.
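    The enrichment statistic mentioned above can be computed as the pathogenic fraction in a |ΔΔG| bin divided by the overall pathogenic fraction; the following Python sketch uses illustrative bin edges and toy data, not the study's actual analysis.

```python
import numpy as np

def pathogenic_enrichment(abs_ddg, is_pathogenic, bin_edges=(0, 1, 2, 3, np.inf)):
    """Enrichment of pathogenic variants per |ΔΔG| bin.

    Enrichment = pathogenic fraction within the bin / overall pathogenic
    fraction; values above 1 mean the bin is enriched.  The bin edges
    (kcal/mol) are illustrative assumptions, not the paper's choices.
    """
    abs_ddg = np.asarray(abs_ddg, dtype=float)
    is_pathogenic = np.asarray(is_pathogenic, dtype=bool)
    overall = is_pathogenic.mean()
    result = {}
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        mask = (abs_ddg >= lo) & (abs_ddg < hi)
        if mask.any():
            result[f"[{lo}, {hi})"] = is_pathogenic[mask].mean() / overall
    return result

# Toy data in which pathogenicity probability grows with |ΔΔG| (illustration only).
rng = np.random.default_rng(1)
ddg = np.abs(rng.normal(0, 2, size=1000))
patho = rng.random(1000) < np.clip(0.2 + 0.2 * ddg, 0, 0.95)
print(pathogenic_enrichment(ddg, patho))
```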

    How good Neural Networks interpretation methods really are? A quantitative benchmark

    Saliency Maps (SMs) have been extensively used to interpret deep learning models' decisions by highlighting the features deemed relevant by the model. They are used on highly nonlinear problems, where linear feature selection (FS) methods fail at highlighting relevant explanatory variables. However, the reliability of gradient-based feature attribution methods such as SMs has mostly been assessed only qualitatively (visually), and quantitative benchmarks are currently missing, partially due to the lack of a definite ground truth on image data. Concerned about the apophenic biases introduced by visual assessment of these methods, in this paper we propose a synthetic quantitative benchmark for Neural Network (NN) interpretation methods. For this purpose, we built synthetic datasets with nonlinearly separable classes and an increasing number of decoy (random) features, illustrating the challenge of FS in high-dimensional settings. We also compare these methods to conventional approaches such as mRMR or Random Forests. Our results show that our simple synthetic datasets are sufficient to challenge most of the benchmarked methods. TreeShap, mRMR and LassoNet are the best performing FS methods. We also show that, when quantifying the relevance of a few non-linearly entangled predictive features diluted in a large number of irrelevant noisy variables, neural network-based FS and interpretation methods are still far from being reliable.
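    A dataset of the kind the abstract describes can be sketched in a few lines of Python; the XOR-style interaction, the sizes and the function name below are assumptions for illustration, not the authors' actual generator.

```python
import numpy as np

def make_decoy_dataset(n_samples=1000, n_decoys=100, seed=0):
    """Two informative features whose XOR-like interaction defines the class,
    diluted in many random decoy features.  Neither informative feature is
    predictive on its own, which is what defeats linear/univariate selectors."""
    rng = np.random.default_rng(seed)
    x_inf = rng.uniform(-1, 1, size=(n_samples, 2))
    # Class depends only on the sign product of the two informative features
    # (non-linearly separable).
    y = (x_inf[:, 0] * x_inf[:, 1] > 0).astype(int)
    x_decoy = rng.normal(size=(n_samples, n_decoys))   # irrelevant noise features
    X = np.hstack([x_inf, x_decoy])
    return X, y

X, y = make_decoy_dataset()
# Sanity check: the marginal correlation of each informative feature with the
# label is near zero, so it cannot be ranked above the decoys by a linear filter.
print(np.corrcoef(X[:, 0], y)[0, 1], np.corrcoef(X[:, 1], y)[0, 1])
```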