6,810 research outputs found
Uncertainty in phylogenetic tree estimates
Estimating phylogenetic trees is an important problem in evolutionary
biology, environmental policy and medicine. Although trees are estimated, their
uncertainties are discarded by mathematicians working in tree space. Here we
explicitly model the multivariate uncertainty of tree estimates. We consider
both the cases where uncertainty information arises extrinsically (through
covariate information) and intrinsically (through the tree estimates
themselves). The importance of accounting for tree uncertainty in tree space is
demonstrated in two case studies. In the first instance, differences between
gene trees are small relative to their uncertainties, while in the second, the
differences are relatively large. Our main goal is visualization of tree
uncertainty, and we demonstrate advantages of our method with respect to
reproducibility, speed and preservation of topological differences compared to
visualization based on multidimensional scaling. The proposal highlights that
phylogenetic trees are estimated in an extremely high-dimensional space,
resulting in uncertainty information that cannot be discarded. Most
importantly, it is a method that allows biologists to diagnose whether
differences between gene trees are biologically meaningful, or due to
uncertainty in estimation.Comment: Final version accepted to Journal of Computational and Graphical
Statistic
Persistent topology for natural data analysis - A survey
Natural data offer a hard challenge to data analysis. One set of tools is
being developed by several teams to face this difficult task: Persistent
topology. After a brief introduction to this theory, some applications to the
analysis and classification of cells, lesions, music pieces, gait, oil and gas
reservoirs, cyclones, galaxies, bones, brain connections, languages,
handwritten and gestured letters are shown
Coordinating views for data visualisation and algorithmic profiling
A number of researchers have designed visualisation systems that consist of multiple components, through which data and interaction commands flow. Such multistage (hybrid) models can be used to reduce algorithmic complexity, and to open up intermediate stages of algorithms for inspection and steering. In this paper, we present work on aiding the developer and the user of such algorithms through the application of interactive visualisation techniques. We present a set of tools designed to profile the performance of other visualisation components, and provide further functionality for the exploration of high dimensional data sets. Case studies are provided, illustrating the application of the profiling modules to a number of data sets. Through this work we are exploring ways in which techniques traditionally used to prepare for visualisation runs, and to retrospectively analyse them, can find new uses within the context of a multi-component visualisation system
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and near 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination
- …