
    Multivariate Topology Simplification

    Topological simplification of scalar and vector fields is well established as an effective method for analysing and visualising complex data sets. For multivariate (alternatively, multi-field) data, topological analysis requires simultaneous advances both mathematically and computationally. We propose a robust multivariate topology simplification method based on “lip”-pruning from the Reeb space. Mathematically, we show that the projection of the Jacobi set of multivariate data into the Reeb space produces a Jacobi structure that separates the Reeb space into simple components. We also show that the dual graph of these components gives rise to a Reeb skeleton that, for topologically simple domains, has properties similar to the scalar contour tree and Reeb graph. We then introduce a range measure that gives a scaling-invariant total ordering of the components or features, which can be used for simplification. Computationally, we show how to compute the Jacobi structure, Reeb skeleton, and range and geometric measures in the Joint Contour Net (an approximation of the Reeb space), and that these can be used for visualisation in the same way as the contour tree or Reeb graph.
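    As a concrete illustration of how such a pruning pass might look, the following Python sketch removes low-importance leaf components ("lips") from a graph standing in for the Reeb skeleton, smallest range measure first. The networkx representation, the function names, and the greedy ordering are assumptions of this sketch, not the authors' implementation.

```python
import networkx as nx

def lip_prune(skeleton: nx.Graph, range_measure: dict, threshold: float) -> nx.Graph:
    """Greedily remove leaf components ("lips") of a Reeb-skeleton-like graph
    whose range measure falls below a threshold, smallest first.

    `skeleton` has one node per simple component of the Reeb space;
    `range_measure` maps each node to its scaling-invariant size (hypothetical).
    """
    g = skeleton.copy()
    # Initial candidates: degree-1 nodes whose measure is below the threshold.
    heap = sorted(((m, n) for n, m in range_measure.items()
                   if g.degree(n) == 1 and m < threshold),
                  key=lambda t: t[0])
    while heap:
        measure, node = heap.pop(0)
        if node not in g or g.degree(node) != 1:
            continue                       # already pruned or no longer a leaf
        (parent,) = g.neighbors(node)
        g.remove_node(node)                # prune the lip
        # The parent may itself have become a prunable lip.
        if parent in g and g.degree(parent) == 1 and range_measure[parent] < threshold:
            heap.append((range_measure[parent], parent))
            heap.sort(key=lambda t: t[0])
    return g
```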

    From Data Topology to a Modular Classifier

    This article describes an approach to designing a distributed and modular neural classifier. The approach introduces a new hierarchical clustering that exploits supervised information to identify reliable regions in the representation space. A multilayer perceptron is then associated with each detected cluster and charged with recognizing elements of that cluster while rejecting all others. The resulting global classifier comprises a set of cooperating neural networks, completed by a K-nearest neighbor classifier that handles elements rejected by all of the neural networks. Experimental results for the handwritten digit recognition problem and comparisons with nonmodular neural and statistical classifiers are given.
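    A rough scikit-learn sketch of the cluster-then-specialise architecture described above. KMeans stands in for the paper's supervised hierarchical clustering, and the probability-threshold rejection rule is an assumption of this sketch rather than the paper's criterion.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

class ModularClassifier:
    """One expert MLP per cluster, with a KNN fallback for rejected samples."""

    def __init__(self, n_clusters=5, reject_threshold=0.7):
        self.n_clusters = n_clusters
        self.reject_threshold = reject_threshold

    def fit(self, X, y):
        # Partition the representation space (stand-in for the supervised
        # hierarchical clustering of the article).
        self.clusterer = KMeans(n_clusters=self.n_clusters, n_init=10).fit(X)
        labels = self.clusterer.labels_
        # Train one expert network per cluster, on that cluster's samples only.
        self.experts = {}
        for c in range(self.n_clusters):
            mask = labels == c
            if mask.sum() > 1 and len(np.unique(y[mask])) > 1:
                self.experts[c] = MLPClassifier(hidden_layer_sizes=(64,),
                                                max_iter=500).fit(X[mask], y[mask])
        # KNN handles samples rejected by every expert.
        self.fallback = KNeighborsClassifier(n_neighbors=3).fit(X, y)
        return self

    def predict(self, X):
        clusters = self.clusterer.predict(X)
        out = np.empty(len(X), dtype=object)
        for i, (x, c) in enumerate(zip(X, clusters)):
            expert = self.experts.get(c)
            if expert is not None:
                proba = expert.predict_proba([x])[0]
                if proba.max() >= self.reject_threshold:
                    out[i] = expert.classes_[proba.argmax()]
                    continue
            out[i] = self.fallback.predict([x])[0]   # rejected: fall back to KNN
        return out
```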

    Centrality anomalies in complex networks as a result of model over-simplification

    Tremendous advances have been made in our understanding of the properties and evolution of complex networks. These advances were initially driven by information-poor empirical networks and by theoretical analysis of unweighted, undirected graphs. More recently, information-rich empirical data on complex networks have supported the development of more sophisticated models that include edge directionality, edge weights, and multiple layers. Many studies nevertheless still rely on an unweighted, undirected description of networks, prompting an essential question: how can we identify when a model is simpler than it needs to be? Here, we argue that the presence of centrality anomalies in complex networks is a result of model over-simplification. Specifically, we investigate the well-known anomaly in betweenness centrality for transportation networks, according to which highly connected nodes are not necessarily the most central. Using a broad class of network models with weights and spatial constraints, together with four large data sets of transportation networks, we show that the unweighted projection of these networks can exhibit a significant fraction of anomalous nodes compared with a random null model. In contrast, the weighted projection of these networks, compared with an appropriate null model, shows a significantly reduced fraction of anomalies, suggesting that centrality anomalies are a symptom of model over-simplification. Because a lack of information-rich data is a common challenge when dealing with complex networks, and can lead to anomalies that misestimate the role of nodes in the system, we argue that sufficiently sophisticated models should be used when anomalies are detected.
    Comment: 14 pages, including 9 figures. APS style. Accepted for publication in New Journal of Physics.
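    A toy experiment in the spirit of the comparison described above: count nodes that are highly central yet weakly connected in an unweighted versus a weighted (spatial-distance) treatment of the same network. The anomaly rule, the graph model, and the thresholds are illustrative choices, not the paper's procedure.

```python
import networkx as nx
import numpy as np

def anomalous_fraction(g: nx.Graph, weight=None, top_q=0.9, low_q=0.5) -> float:
    """Fraction of nodes with high betweenness but low degree (toy anomaly rule)."""
    bc = nx.betweenness_centrality(g, weight=weight)
    deg = dict(g.degree())
    bc_cut = np.quantile(list(bc.values()), top_q)    # "highly central"
    deg_cut = np.quantile(list(deg.values()), low_q)  # "weakly connected"
    anomalous = [n for n in g if bc[n] >= bc_cut and deg[n] <= deg_cut]
    return len(anomalous) / g.number_of_nodes()

# Toy spatial network; Euclidean edge lengths stand in for the weighted model.
g = nx.random_geometric_graph(300, radius=0.12, seed=1)
pos = nx.get_node_attributes(g, "pos")
for u, v in g.edges:
    g[u][v]["weight"] = float(np.linalg.norm(np.array(pos[u]) - np.array(pos[v])))

print("unweighted anomaly fraction:", anomalous_fraction(g))
print("weighted anomaly fraction:  ", anomalous_fraction(g, weight="weight"))
```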

    Exploring Causal Influences

    Recent data mining techniques exploit patterns of statistical independence in multivariate data to make conjectures about cause/effect relationships. These relationships can be used to construct causal graphs, which are sometimes represented by weighted node-link diagrams, with nodes representing variables and combinations of weighted links and/or nodes showing the strength of causal relationships. We present an interactive visualization for causal graphs (ICGs), inspired in part by the Influence Explorer. The key principles of this visualization are as follows. Variables are represented as vertical bars attached to nodes in a graph. Direct manipulation of a variable, by sliding its value up and down, reveals causality through instantaneous change in the causally and/or probabilistically linked variables. This direct manipulation technique gives users the impression that they are causally influencing the variables linked to the one they are manipulating. In this context, we demonstrate the subtle distinction between seeing and setting variable values and, in an extended example, show how this visualization can help a user understand the relationships in a large variable set and, with some intuition about the domain and a few basic concepts, quickly detect bugs in causal models constructed by these data mining techniques.
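    The seeing/setting distinction can be made concrete with a toy linear structural causal model: setting (intervening on) a variable changes only its causal descendants. The graph, the weights, and the function below are hypothetical and are not the ICG implementation.

```python
import networkx as nx

# Hypothetical causal graph: edge weights are linear effect strengths.
g = nx.DiGraph()
g.add_weighted_edges_from([("smoking", "tar", 0.9),
                           ("tar", "cancer", 0.7),
                           ("genetics", "cancer", 0.4)])

def set_variable(graph, values, var, new_value):
    """Intervene on `var` (the "setting" operation) and push the change
    downstream in topological order; upstream variables are untouched."""
    values = dict(values)
    values[var] = new_value
    for node in nx.topological_sort(graph):
        parents = list(graph.predecessors(node))
        if node != var and parents:
            values[node] = sum(graph[p][node]["weight"] * values[p] for p in parents)
    return values

baseline = {"smoking": 0.0, "genetics": 0.0, "tar": 0.0, "cancer": 0.0}
print(set_variable(g, baseline, "smoking", 1.0))   # tar and cancer respond
print(set_variable(g, baseline, "cancer", 1.0))    # setting an effect leaves its causes alone
```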

    Joint Contour Net Analysis for Feature Detection in Lattice Quantum Chromodynamics Data

    In this paper we demonstrate the use of multivariate topological algorithms to analyse and interpret Lattice Quantum Chromodynamics (QCD) data. Lattice QCD is a long-established field of theoretical physics research in the pursuit of understanding the strong nuclear force. Complex computer simulations model interactions between quarks and gluons to test theories regarding the behaviour of matter in a range of extreme environments. Data sets are typically generated using Monte Carlo methods, providing an ensemble of configurations from which observable averages must be computed. This presents issues for the visualisation and analysis of the data, as a typical ensemble study can generate hundreds or thousands of unique configurations. We show how multivariate topological methods, such as the Joint Contour Net, can assist physicists in the detection and tracking of important features within their data in a temporal setting. This enables them to focus on the structure and distribution of the core observables by identifying them within the surrounding data. These techniques also demonstrate how quantitative approaches can help in understanding the lifetime of objects in a dynamic system.
    Comment: 30 pages, 19 figures, 4 tables.
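    One simple way to track features over time, sketched below, is to match labelled regions between consecutive time steps by overlap. This is a generic stand-in for the Joint Contour Net-based tracking described in the paper; the function name and the toy label fields are assumptions of this sketch.

```python
import numpy as np

def match_features(labels_t, labels_t1):
    """Return {feature id at time t: best-overlapping feature id at time t+1}."""
    matches = {}
    for f in np.unique(labels_t):
        if f == 0:                        # 0 = background
            continue
        overlap = labels_t1[labels_t == f]
        overlap = overlap[overlap != 0]
        matches[int(f)] = int(np.bincount(overlap).argmax()) if overlap.size else None
    return matches

# Two toy 2-D label fields: feature 1 drifts one cell to the right between steps.
t0 = np.zeros((6, 6), dtype=int); t0[2:4, 1:3] = 1
t1 = np.zeros((6, 6), dtype=int); t1[2:4, 2:4] = 1
print(match_features(t0, t1))             # {1: 1}
```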

    Representability of algebraic topology for biomolecules in machine learning based scoring and virtual screening

    This work introduces a number of algebraic topology approaches, such as multicomponent persistent homology, multi-level persistent homology, and electrostatic persistence, for the representation, characterization, and description of small molecules and biomolecular complexes. Multicomponent persistent homology retains critical chemical and biological information during the topological simplification of biomolecular geometric complexity. Multi-level persistent homology enables a tailored topological description of inter- and/or intra-molecular interactions of interest. Electrostatic persistence incorporates partial charge information into topological invariants. These topological methods are paired with the Wasserstein distance to characterize similarities between molecules and are further integrated with a variety of machine learning algorithms, including k-nearest neighbors, ensembles of trees, and deep convolutional neural networks, to demonstrate their descriptive and predictive power for chemical and biological problems. Extensive numerical experiments involving more than 4,000 protein-ligand complexes from the PDBBind database and nearly 100,000 ligands and decoys from the DUD database are performed to test, respectively, the scoring power and the virtual screening power of the proposed topological approaches. It is demonstrated that the present approaches outperform modern machine learning based methods in protein-ligand binding affinity prediction and ligand-decoy discrimination.
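    A minimal sketch of the diagram-plus-distance part of such a pipeline, assuming the ripser and persim Python packages: it computes persistence diagrams for two toy point clouds and a Wasserstein distance between their 1-dimensional diagrams, which could then feed a learner. The paper's multicomponent, multi-level, and electrostatic constructions are not reproduced here.

```python
import numpy as np
from ripser import ripser
import persim

def diagram(points, maxdim=1):
    """Persistence diagrams (H0, H1) of a point cloud, e.g. the coordinates of
    atoms of one element type -- a crude stand-in for one "component"."""
    return ripser(points, maxdim=maxdim)["dgms"]

# Two toy "molecules": random 3-D atom positions (purely illustrative data).
rng = np.random.default_rng(0)
mol_a, mol_b = rng.normal(size=(40, 3)), rng.normal(size=(40, 3))

dgm_a, dgm_b = diagram(mol_a), diagram(mol_b)
# The distance between H1 diagrams is one similarity feature that could be fed
# to k-nearest neighbors, tree ensembles, or a convolutional network.
print(persim.wasserstein(dgm_a[1], dgm_b[1]))
```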

    Topological summaries for Time-Varying Data

    Topology has proven to be a useful tool in the current quest for “insights on the data”, since it characterises objects through their connectivity structure in an easy and interpretable way. More specifically, the new but growing field of TDA (Topological Data Analysis) deals with Persistent Homology, a multiscale version of homology groups summarized by the Persistence Diagram and its functional representations (Persistence Landscapes, Silhouettes, etc.). All of these objects, however, are designed for, and work only on, static point clouds. We define a new topological summary, the Landscape Surface, that takes into account the changes in the topology of a dynamical point cloud, such as a (possibly very high dimensional) time series. We prove its continuity and stability and, finally, we sketch a simple example.
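    A simplified sketch of the stack-landscapes-over-time idea, assuming sliding delay-embedded windows of a univariate series and the ripser package for diagrams; the Landscape Surface itself is defined in the paper, and the discretisation below is only an illustration.

```python
import numpy as np
from ripser import ripser

def landscape(dgm, k, grid):
    """First k discretised persistence landscape functions of a diagram."""
    tents = np.array([np.maximum(np.minimum(grid - b, d - grid), 0.0)
                      for b, d in dgm if np.isfinite(d)])
    out = np.zeros((k, len(grid)))
    if len(tents) == 0:
        return out
    tents = -np.sort(-tents, axis=0)       # k-th largest tent value per grid point
    out[:min(k, len(tents))] = tents[:k]
    return out

# Sliding-window point clouds from a noisy periodic signal (toy data).
t = np.linspace(0, 20, 2000)
signal = np.sin(t) + 0.1 * np.random.default_rng(1).normal(size=t.size)
grid = np.linspace(0, 2, 100)
surface = []
for start in range(0, 1800, 200):
    window = signal[start:start + 200]
    cloud = np.column_stack([window[:-3], window[1:-2], window[2:-1]])  # delay embedding
    dgm1 = ripser(cloud, maxdim=1)["dgms"][1]
    surface.append(landscape(dgm1, k=1, grid=grid)[0])
surface = np.array(surface)                 # time x filtration grid: a "surface"
print(surface.shape)
```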

    Rank-based inference for bivariate extreme-value copulas

    Consider a continuous random pair (X, Y) whose dependence is characterized by an extreme-value copula with Pickands dependence function A. When the marginal distributions of X and Y are known, several consistent estimators of A are available. Most of them are variants of the estimators due to Pickands [Bull. Inst. Internat. Statist. 49 (1981) 859-878] and Capéraà, Fougères and Genest [Biometrika 84 (1997) 567-577]. In this paper, rank-based versions of these estimators are proposed for the more common case where the margins of X and Y are unknown. Results on the limit behavior of a class of weighted bivariate empirical processes are used to show the consistency and asymptotic normality of these rank-based estimators. Their finite- and large-sample performance is then compared to that of their known-margin analogues, as well as with endpoint-corrected versions thereof. Explicit formulas and consistent estimates for their asymptotic variances are also given.
    Comment: Published in at http://dx.doi.org/10.1214/08-AOS672 the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
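    For concreteness, here is a minimal numerical sketch of the rank-based Pickands-type and CFG-type estimates of A(t), without the paper's endpoint corrections; the variable names and the toy comonotone data are this sketch's own.

```python
import numpy as np

def _log_pseudo_obs(x, y):
    """Ranks transformed to -log of pseudo-observations R/(n+1)."""
    n = len(x)
    u = np.argsort(np.argsort(x)) + 1.0
    v = np.argsort(np.argsort(y)) + 1.0
    return -np.log(u / (n + 1.0)), -np.log(v / (n + 1.0))

def pickands_rank(x, y, t):
    """Rank-based Pickands-type estimate of A(t), t in (0, 1)."""
    s, w = _log_pseudo_obs(x, y)
    xi = np.minimum(s / (1.0 - t), w / t)
    return len(x) / xi.sum()

def cfg_rank(x, y, t, euler_gamma=0.5772156649):
    """Rank-based Caperaa-Fougeres-Genest-type estimate of A(t)."""
    s, w = _log_pseudo_obs(x, y)
    xi = np.minimum(s / (1.0 - t), w / t)
    return np.exp(-euler_gamma - np.log(xi).mean())

# Toy data: a nearly comonotone pair, for which A(t) = max(t, 1 - t).
rng = np.random.default_rng(0)
x = rng.normal(size=2000); y = x + 0.01 * rng.normal(size=x.size)
print(pickands_rank(x, y, 0.5), cfg_rank(x, y, 0.5))   # both should be near 0.5
```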

    Scaling of transmission capacities in coarse-grained renewable electricity networks

    Network models of large-scale electricity systems feature only a limited spatial resolution, either due to a lack of data or in order to reduce the complexity of the problem with respect to numerical calculations. In such cases, the network topology as well as the load and generation patterns below a given spatial scale are aggregated into representative nodes. This coarse-graining affects power flows and thus the resulting transmission needs of the system. We derive analytical scaling laws for measures of network transmission capacity and cost in coarse-grained renewable electricity networks. For the cost measure, only a very weak scaling with the spatial resolution of the system is found. The analytical results are shown to describe the scaling of the transmission infrastructure measures for a simplified, but data-driven and spatially detailed, model of the European electricity system with a high share of fluctuating renewable generation.
    Comment: to be published in EP
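    A toy numerical illustration of the effect being studied: compute DC power flows on a fine grid and on a coarse-grained version of it, then compare a simple capacity measure (sum of absolute link flows). The grid, the aggregation rule, and the capacity measure are assumptions of this sketch, not the paper's model.

```python
import numpy as np
import networkx as nx

def dc_flows(g, injection):
    """DC approximation with unit susceptances: phase angles from the Laplacian
    pseudo-inverse, flows as angle differences along edges."""
    nodes = list(g)
    L = nx.laplacian_matrix(g, nodelist=nodes).toarray().astype(float)
    theta = np.linalg.pinv(L) @ np.array([injection[n] for n in nodes])
    idx = {n: i for i, n in enumerate(nodes)}
    return {(u, v): theta[idx[u]] - theta[idx[v]] for u, v in g.edges}

# Fine model: a 6x6 grid with random injections that sum to zero.
fine = nx.grid_2d_graph(6, 6)
rng = np.random.default_rng(2)
p = rng.normal(size=fine.number_of_nodes()); p -= p.mean()
inj = dict(zip(fine, p))

# Coarse model: aggregate each 2x2 block of nodes into one representative region.
coarse = nx.Graph()
region = {n: (n[0] // 2, n[1] // 2) for n in fine}
for u, v in fine.edges:
    if region[u] != region[v]:
        coarse.add_edge(region[u], region[v])
inj_coarse = {}
for n, r in region.items():
    inj_coarse[r] = inj_coarse.get(r, 0.0) + inj[n]

cap = lambda flows: sum(abs(f) for f in flows.values())
print("fine capacity:  ", cap(dc_flows(fine, inj)))
print("coarse capacity:", cap(dc_flows(coarse, inj_coarse)))
```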