Multivariate Topology Simplification
Topological simplification of scalar and vector fields is well-established as an effective method for analysing and visualising complex data sets. For multivariate (alternatively, multi-field) data, topological analysis requires simultaneous advances both mathematically and computationally. We propose a robust multivariate topology simplification method based on “lip”-pruning from the Reeb space. Mathematically, we show that the projection of the Jacobi set of multivariate data into the Reeb space produces a Jacobi structure that separates the Reeb space into simple components. We also show that the dual graph of these components gives rise to a Reeb skeleton that has properties similar to the scalar contour tree and Reeb graph, for topologically simple domains. We then introduce a range measure to give a scaling-invariant total ordering of the components or features that can be used for simplification. Computationally, we show how to compute the Jacobi structure, Reeb skeleton, range and geometric measures in the Joint Contour Net (an approximation of the Reeb space) and that these can be used for visualisation similar to the contour tree or Reeb graph.
From Data Topology to a Modular Classifier
This article describes an approach to designing a distributed and modular
neural classifier. This approach introduces a new hierarchical clustering that
enables one to determine reliable regions in the representation space by
exploiting supervised information. A multilayer perceptron is then associated
with each of these detected clusters and charged with recognizing elements of
the associated cluster while rejecting all others. The obtained global
classifier is comprised of a set of cooperating neural networks and completed
by a K-nearest neighbor classifier charged with treating elements rejected by
all the neural networks. Experimental results for the handwritten digit
recognition problem and comparison with neural and statistical nonmodular
classifiers are given.
Centrality anomalies in complex networks as a result of model over-simplification
Tremendous advances have been made in our understanding of the properties and
evolution of complex networks. These advances were initially driven by
information-poor empirical networks and theoretical analysis of unweighted and
undirected graphs. Recently, information-rich empirical data on complex networks
supported the development of more sophisticated models that include edge
directionality and weight properties, and multiple layers. Many studies still
focus on unweighted undirected description of networks, prompting an essential
question: how to identify when a model is simpler than it must be? Here, we
argue that the presence of centrality anomalies in complex networks is a result
of model over-simplification. Specifically, we investigate the well-known
anomaly in betweenness centrality for transportation networks, according to
which highly connected nodes are not necessarily the most central. Using a
broad class of network models with weights and spatial constraints and four
large data sets of transportation networks, we show that the unweighted
projection of the structure of these networks can exhibit a significant
fraction of anomalous nodes compared to a random null model. However, the
weighted projection of these networks, compared with an appropriate null
model, significantly reduces the fraction of anomalies observed, suggesting
that centrality anomalies are a symptom of model over-simplification. Because
lack of information-rich data is a common challenge when dealing with complex
networks and can cause anomalies that misestimate the role of nodes in the
system, we argue that sufficiently sophisticated models be used when anomalies
are detected. Comment: 14 pages, including 9 figures. APS style. Accepted for publication in
New Journal of Physics
Exploring Causal Influences
Recent data mining techniques exploit patterns of statistical independence in multivariate data to make conjectures about cause/effect relationships. These relationships can be used to construct causal graphs, which are sometimes represented by weighted node-link diagrams, with nodes representing variables and combinations of weighted links and/or nodes showing the strength of causal relationships. We present an interactive visualization for causal graphs (ICGs), inspired in part by the Influence Explorer. The key principles of this visualization are as follows: Variables are represented with vertical bars attached to nodes in a graph. Direct manipulation of variables is achieved by sliding a variable value up and down, which reveals causality by producing instantaneous change in causally and/or probabilistically linked variables. This direct manipulation technique gives users the impression they are causally influencing the variables linked to the one they are manipulating. In this context, we demonstrate the subtle distinction between seeing and setting of variable values, and in an extended example, show how this visualization can help a user understand the relationships in a large variable set, and with some intuitions about the domain and a few basic concepts, quickly detect bugs in causal models constructed from these data mining techniques.
Joint Contour Net Analysis for Feature Detection in Lattice Quantum Chromodynamics Data
In this paper we demonstrate the use of multivariate topological algorithms
to analyse and interpret Lattice Quantum Chromodynamics (QCD) data. Lattice QCD
is a long established field of theoretical physics research in the pursuit of
understanding the strong nuclear force. Complex computer simulations model
interactions between quarks and gluons to test theories regarding the behaviour
of matter in a range of extreme environments. Data sets are typically generated
using Monte Carlo methods, providing an ensemble of configurations, from which
observable averages must be computed. This presents issues with regard to
visualisation and analysis of the data as a typical ensemble study can generate
hundreds or thousands of unique configurations. We show how multivariate
topological methods, such as the Joint Contour Net, can assist physicists in
the detection and tracking of important features within their data in a
temporal setting. This enables them to focus upon the structure and
distribution of the core observables by identifying them within the surrounding
data. These techniques also demonstrate how quantitative approaches can help
understand the lifetime of objects in a dynamic system. Comment: 30 pages, 19 figures, 4 tables
Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening
This work introduces a number of algebraic topology approaches, such as
multicomponent persistent homology, multi-level persistent homology and
electrostatic persistence for the representation, characterization, and
description of small molecules and biomolecular complexes. Multicomponent
persistent homology retains critical chemical and biological information during
the topological simplification of biomolecular geometric complexity.
Multi-level persistent homology enables a tailored topological description of
inter- and/or intra-molecular interactions of interest. Electrostatic
persistence incorporates partial charge information into topological
invariants. These topological methods are paired with Wasserstein distance to
characterize similarities between molecules and are further integrated with a
variety of machine learning algorithms, including k-nearest neighbors, ensemble
of trees, and deep convolutional neural networks, to manifest their descriptive
and predictive powers for chemical and biological problems. Extensive numerical
experiments involving more than 4,000 protein-ligand complexes from the PDBBind
database and nearly 100,000 ligands and decoys in the DUD database are performed
to test respectively the scoring power and the virtual screening power of the
proposed topological approaches. It is demonstrated that the present approaches
outperform the modern machine learning based methods in protein-ligand binding
affinity predictions and ligand-decoy discrimination.
Topological summaries for Time-Varying Data
Topology has proven to be a useful tool in the current quest for "insights on the data", since it characterises objects through their connectivity structure, in an easy and interpretable way. More specifically, the new, but growing, field of TDA (Topological Data Analysis) deals with Persistent Homology, a multiscale version of Homology Groups summarized by the Persistence Diagram and its functional representations (Persistence Landscapes, Silhouettes, etc.). All of these objects, however, are designed and work only for static point clouds. We define a new topological summary, the Landscape Surface, that takes into account the changes in the topology of a dynamical point cloud such as a (possibly very high dimensional) time series. We prove its continuity and its stability and, finally, we sketch a simple example.
Rank-based inference for bivariate extreme-value copulas
Consider a continuous random pair (X, Y) whose dependence is characterized
by an extreme-value copula with Pickands dependence function A. When the
marginal distributions of X and Y are known, several consistent estimators
of A are available. Most of them are variants of the estimators due to
Pickands [Bull. Inst. Internat. Statist. 49 (1981) 859–878] and
Capéraà, Fougères and Genest [Biometrika 84 (1997) 567–577]. In
this paper, rank-based versions of these estimators are proposed for the more
common case where the margins of X and Y are unknown. Results on the limit
behavior of a class of weighted bivariate empirical processes are used to show
the consistency and asymptotic normality of these rank-based estimators. Their
finite- and large-sample performance is then compared to that of their
known-margin analogues, as well as with endpoint-corrected versions thereof.
Explicit formulas and consistent estimates for their asymptotic variances are
also given. Comment: Published at http://dx.doi.org/10.1214/08-AOS672 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
Scaling of transmission capacities in coarse-grained renewable electricity networks
Network models of large-scale electricity systems feature only a limited
spatial resolution, either due to lack of data or in order to reduce the
complexity of the problem with respect to numerical calculations. In such
cases, the network topology, the load, and the generation patterns below a
given spatial scale are aggregated into representative nodes. This
coarse-graining affects power flows and thus the resulting transmission needs
of the system. We derive analytical scaling laws for measures of network
transmission capacity and cost in coarse-grained renewable electricity
networks. For the cost measure only a very weak scaling with the spatial
resolution of the system is found. The analytical results are shown to describe
the scaling of the transmission infrastructure measures for a simplified, but
data-driven and spatially detailed model of the European electricity system
with a high share of fluctuating renewable generation. Comment: to be published in EP