2,542 research outputs found

    Interpretable statistics for complex modelling: quantile and topological learning

    As the complexity of our data has increased exponentially over the last decades, so has our need for interpretable features. This thesis revolves around two paradigms for approaching this quest for insights. In the first part we focus on parametric models, where the problem of interpretability can be seen as one of "parametrization selection". We introduce a quantile-centric parametrization and show the advantages of our proposal in the context of regression, where it helps bridge the gap between classical generalized linear (mixed) models and increasingly popular quantile methods. The second part of the thesis, concerned with topological learning, tackles the problem from a non-parametric perspective. As topology can be thought of as a way of characterizing data in terms of their connectivity structure, it allows complex and possibly high-dimensional data to be represented through a few features, such as the number of connected components, loops and voids. We illustrate how the emerging branch of statistics devoted to recovering topological structures in the data, Topological Data Analysis, can be exploited for both exploratory and inferential purposes, with a special emphasis on kernels that preserve the topological information in the data. Finally, we show with an application how these two approaches can borrow strength from one another in the identification and description of brain activity through fMRI data from the ABIDE project.
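The simplest of the features named above, the number of connected components, can be illustrated with a short sketch. This is not taken from the thesis: the function name `count_components`, the `radius` parameter, and the rule of linking two points whenever their Euclidean distance is at most `radius` are assumptions chosen for illustration.

```python
import math


def count_components(points, radius):
    """Count connected components of the graph linking points within `radius`.

    Hypothetical helper: two points are joined when their Euclidean distance
    is at most `radius`; components are tracked with a union-find structure.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= radius:
                parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(points))})


# Two well-separated pairs of points.
cloud = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
components_at_scale = {r: count_components(cloud, r) for r in (0.5, 1.5, 10.0)}
```

Tracking how this count changes as `radius` grows is the intuition behind the degree-zero part of persistent homology; loops and voids need a full simplicial-complex computation that this sketch does not attempt.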

    Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys in the DUD database are performed to test the scoring power and the virtual screening power of the proposed topological approaches, respectively. It is demonstrated that the present approaches outperform modern machine learning based methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.

    On the stability of persistent entropy and new summary functions for Topological Data Analysis

    Persistent entropy of persistence barcodes, which is based on the Shannon entropy, has recently been defined and successfully applied in different scenarios: characterization of the idiotypic immune network, detection of the transition between the preictal and ictal states in EEG signals, and classification of long, noisy real-world signals from DC electrical motors, to name a few. In this paper, we study properties of persistent entropy and prove its stability under small perturbations of the input data. From this concept, we define three summary functions and show how to use them to detect patterns and topological features.
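As a minimal sketch of the quantity discussed above, persistent entropy can be computed as the Shannon entropy of the bar lengths of a barcode, each normalized by the total length. The function name `persistent_entropy`, the representation of a barcode as a list of finite `(birth, death)` pairs, and the choice to skip zero-length bars are assumptions made for this illustration; the paper's three summary functions are not reproduced here.

```python
import math


def persistent_entropy(barcode):
    """Shannon entropy of the normalized bar lengths of a finite barcode.

    `barcode` is assumed to be an iterable of (birth, death) pairs with
    finite death values; zero-length bars are skipped (an assumption here).
    """
    lengths = [death - birth for birth, death in barcode if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0  # an empty barcode carries no information
    # p_i = l_i / L, and the entropy is -sum(p_i * log(p_i)).
    return -sum((l / total) * math.log(l / total) for l in lengths)


# Four bars of equal length give the maximal entropy log(4);
# a barcode dominated by one long bar gives a value closer to 0.
uniform = persistent_entropy([(0, 1), (0, 1), (0, 1), (0, 1)])
skewed = persistent_entropy([(0, 10), (0, 0.1), (0, 0.1)])
```

The entropy is largest when all bars have the same length and shrinks as a few long bars dominate, which is why it can serve as a one-number summary of a barcode.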

    Mathematics in Medical Diagnostics - 2022 Proceedings of the 4th International Conference on Trauma Surgery Technology

    The 4th event of the Giessen International Conference Series on Trauma Surgery Technology took place on April 23, 2022 in Warsaw, Poland. It aimed to bring together practical application research, with a focus on medical imaging, and the TDA experts from Warsaw. This publication contains details of our presentations and discussions.

    Topological Learning for Brain Networks

    This paper proposes a novel topological learning framework that can integrate networks of different sizes and topology through persistent homology. This is made possible by a new topological loss function designed for this challenging task. Using the proposed loss function bypasses the intrinsic computational bottleneck associated with matching networks. We validate the method in extensive statistical simulations with ground truth to assess the effectiveness of the topological loss in discriminating networks with different topology. The method is further applied to a twin brain imaging study to determine whether the brain network is genetically heritable. The challenge is in overlaying the topologically different functional brain networks obtained from resting-state functional MRI (fMRI) onto the template structural brain network obtained through diffusion MRI (dMRI).

    A topological approach for protein classification

    Protein function and dynamics are closely related to a protein's sequence and structure. However, predicting protein function and dynamics from sequence and structure is still a fundamental challenge in molecular biology. Protein classification, which is typically done by measuring the similarity between proteins based on protein sequence or physical information, serves as a crucial step toward the understanding of protein function and dynamics. Persistent homology is a new branch of algebraic topology that has found success in topological data analysis across a variety of disciplines, including molecular biology. The present work explores the potential of using persistent homology as an independent tool for protein classification. To this end, we propose a molecular topological fingerprint based support vector machine (MTF-SVM) classifier. Specifically, we construct machine learning feature vectors solely from protein topological fingerprints, which are topological invariants generated during the filtration process. To validate the present MTF-SVM approach, we consider four types of problems. First, we study protein-drug binding using the M2 channel protein of influenza A virus. We achieve 96% accuracy in discriminating drug-bound and unbound M2 channels. Additionally, we examine the use of MTF-SVM for the classification of hemoglobin molecules in their relaxed and taut forms and obtain about 80% accuracy. The identification of all-alpha, all-beta, and alpha-beta protein domains is carried out in our next study using 900 proteins. We find an 85% success rate in this identification. Finally, we apply the present technique to 55 classification tasks of protein superfamilies over 1357 samples. An average accuracy of 82% is attained. The present study establishes computational topology as an independent and effective alternative for protein classification.