Quantifying Homology Classes
We develop a method for measuring homology classes. This involves three
problems. First, we define the size of a homology class, using ideas from
relative homology. Second, we define an optimal basis of a homology group to be
the basis whose elements' sizes have the minimal sum. We provide a greedy
algorithm to compute the optimal basis and measure classes in it. The running
time of the algorithm depends on the size of the simplicial complex and on the
Betti number of the homology group. Third, we discuss different ways of
localizing homology classes and prove some hardness results.
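The greedy step can be illustrated with a generic matroid-greedy sketch. It assumes, hypothetically, that a finite list of candidate classes is already available, each given by its size and by its coordinates over GF(2) in some fixed basis of the homology group (encoded as a bitmask). Because linearly independent sets form a matroid, scanning candidates by increasing size and keeping each one that is independent of those already kept yields a basis of minimal total size. This is not the authors' algorithm, which must also find the smallest classes using relative homology; the names `greedy_min_basis` and `_reduce` are illustrative.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[float, int]  # (size, GF(2) coordinates as a bitmask); hypothetical encoding


def _reduce(vec: int, pivots: Dict[int, int]) -> int:
    """Eliminate vec against stored pivot vectors keyed by their leading bit."""
    while vec:
        lead = vec.bit_length() - 1
        if lead not in pivots:
            break
        vec ^= pivots[lead]
    return vec


def greedy_min_basis(candidates: List[Candidate], rank: int) -> List[Candidate]:
    """Return `rank` independent classes with minimal total size (matroid greedy)."""
    pivots: Dict[int, int] = {}
    basis: List[Candidate] = []
    for size, coords in sorted(candidates, key=lambda c: c[0]):
        residue = _reduce(coords, pivots)
        if residue:  # not in the span of the classes already chosen
            pivots[residue.bit_length() - 1] = residue
            basis.append((size, coords))
            if len(basis) == rank:  # rank = Betti number of the homology group
                break
    return basis
```

With classes a = 0b01 (size 3), b = 0b10 (size 5) and a + b = 0b11 (size 4) in a group of rank 2, the sketch keeps a and a + b, for a total size of 7.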
A topological approach for protein classification
Protein function and dynamics are closely related to protein sequence and
structure. However, predicting protein function and dynamics from sequence and
structure remains a fundamental challenge in molecular biology. Protein
classification, which is typically done by measuring the similarity between
proteins based on sequence or physical information, serves as a crucial step
toward understanding protein function and dynamics. Persistent homology is a
new branch of algebraic topology that has found success in topological data
analysis in a variety of disciplines, including molecular biology. The present
work explores the potential of using persistent homology as an independent tool
for protein
classification. To this end, we propose a molecular topological fingerprint
based support vector machine (MTF-SVM) classifier. Specifically, we construct
machine learning feature vectors solely from protein topological fingerprints,
which are topological invariants generated during the filtration process. To
validate the present MTF-SVM approach, we consider four types of problems.
First, we study protein-drug binding by using the M2 channel protein of
influenza A virus. We achieve 96% accuracy in discriminating drug bound and
unbound M2 channels. Additionally, we examine the use of MTF-SVM for the
classification of hemoglobin molecules in their relaxed and taut forms and
obtain about 80% accuracy. The identification of all alpha, all beta, and
alpha-beta protein domains is carried out in our next study using 900 proteins.
We achieve an 85% success rate in this identification. Finally, we apply the
present technique to 55 classification tasks of protein superfamilies over 1357
samples. An average accuracy of 82% is attained. The present study establishes
computational topology as an independent and effective alternative for protein
classification.
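The topological-fingerprint side of MTF-SVM can be sketched as follows, restricted to dimension 0 and under stated assumptions: atoms are treated as a 3D point cloud, the filtration is a Vietoris-Rips filtration on Euclidean distances, and the four summary features (bar count, total, maximum, and mean bar length) are hypothetical choices, not the features used in the paper. Dimension-0 bars are computed with union-find, which amounts to Kruskal's minimum spanning tree. Vectors from `feature_vector` could then be passed to an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def h0_barcode(points: Sequence[Point]) -> List[float]:
    """Death times of the finite H0 bars of a Vietoris-Rips filtration.

    Every point is born at 0; each edge that merges two components kills one bar
    at the edge length (Kruskal's algorithm / single-linkage clustering).
    """
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths: List[float] = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths  # n - 1 finite bars; the one infinite bar is omitted


def feature_vector(points: Sequence[Point]) -> List[float]:
    """Hypothetical fixed-length summary of the H0 barcode."""
    deaths = h0_barcode(points)
    if not deaths:
        return [0.0, 0.0, 0.0, 0.0]
    return [float(len(deaths)), sum(deaths), max(deaths), sum(deaths) / len(deaths)]
```

For three collinear atoms at x = 0, 1 and 3, `h0_barcode` returns `[1.0, 2.0]`: the first two atoms merge at distance 1 and the third joins at distance 2.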
Parametrized Homology via Zigzag Persistence
This paper develops the idea of homology for 1-parameter families of
topological spaces. We express parametrized homology as a collection of real
intervals with each corresponding to a homological feature supported over that
interval or, equivalently, as a persistence diagram. By defining persistence in
terms of finite rectangle measures, we classify barcode intervals into four
classes. Each of these conveys how the homological features perish at both ends
of the interval over which they are defined.
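One way to represent such a barcode in code is sketched below. It assumes that the four classes are the four combinations of open and closed endpoints, [b, d], [b, d), (b, d] and (b, d), where a closed end means the feature is still present at that parameter value. The names `Interval` and `classify` are hypothetical, and the sketch only stores and groups intervals; it does not derive them from the rectangle measures.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Interval:
    """A barcode interval on the real parameter line with open or closed ends."""
    birth: float
    death: float
    closed_left: bool
    closed_right: bool

    def kind(self) -> str:
        """One of the four endpoint types: "[]", "[)", "(]" or "()"."""
        return ("[" if self.closed_left else "(") + ("]" if self.closed_right else ")")

    def contains(self, t: float) -> bool:
        """True if the homological feature is present at parameter value t."""
        after_birth = t >= self.birth if self.closed_left else t > self.birth
        before_death = t <= self.death if self.closed_right else t < self.death
        return after_birth and before_death


def classify(barcode: List[Interval]) -> Dict[str, List[Tuple[float, float]]]:
    """Group intervals by endpoint type; each becomes a (birth, death) diagram point."""
    groups: Dict[str, List[Tuple[float, float]]] = {"[]": [], "[)": [], "(]": [], "()": []}
    for iv in barcode:
        groups[iv.kind()].append((iv.birth, iv.death))
    return groups
```

Keeping the endpoint type next to the (birth, death) pair lets the same data be read either as a barcode or as a persistence diagram split into four parts.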
Topological Machine Learning with Persistence Indicator Functions
Techniques from computational topology, in particular persistent homology,
are becoming increasingly relevant for data analysis. Their stable metrics
permit the use of many distance-based data analysis methods, such as
multidimensional scaling, while providing a firm theoretical ground. Many
modern machine learning algorithms, however, are based on kernels. This paper
presents persistence indicator functions (PIFs), which summarize persistence
diagrams, i.e., feature descriptors in topological data analysis. PIFs can be
calculated and compared in linear time and have many beneficial properties,
such as the availability of a kernel-based similarity measure. We demonstrate
their usage in common data analysis scenarios, such as confidence set
estimation and classification of complex structured data.
Comment: Topology-based Methods in Visualization 201
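A minimal sketch of a PIF and of one way to compare two of them follows. It assumes the PIF of a diagram maps a scale ε to the number of points (c, d) with c ≤ ε ≤ d, and compares two PIFs by the L1 norm of their difference, computed with a single sweep over the interval endpoints. Sorting the endpoints makes this O(n log n); with pre-sorted endpoints the sweep itself is linear. The kernel-based similarity measure is not reproduced here, and the function names are illustrative.

```python
from typing import List, Sequence, Tuple

Diagram = Sequence[Tuple[float, float]]  # finite (creation, destruction) pairs


def pif(diagram: Diagram, eps: float) -> int:
    """Number of diagram points (c, d) with c <= eps <= d."""
    return sum(1 for c, d in diagram if c <= eps <= d)


def pif_l1_distance(a: Diagram, b: Diagram) -> float:
    """Integral over the real line of |PIF_a - PIF_b|, via one sweep over endpoints."""
    # Each event changes the running difference PIF_a - PIF_b by +1 or -1.
    events: List[Tuple[float, int]] = []
    events += [(c, +1) for c, _ in a] + [(d, -1) for _, d in a]
    events += [(c, -1) for c, _ in b] + [(d, +1) for _, d in b]
    events.sort()
    total, diff, prev = 0.0, 0, None
    for x, delta in events:
        if prev is not None:
            total += abs(diff) * (x - prev)  # difference is constant between events
        diff += delta
        prev = x
    return total
```

For a = [(0, 2)] and b = [(1, 3)], the two PIFs differ only on [0, 1) and (2, 3], so their L1 distance is 2.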