    Correlation, hierarchies, and networks in financial markets

    We discuss some methods to quantitatively investigate the properties of correlation matrices. Correlation matrices play an important role in portfolio optimization and in several other quantitative descriptions of asset price dynamics in financial markets. Specifically, we discuss how to define and obtain hierarchical trees, correlation-based trees and networks from a correlation matrix. The hierarchical clustering and other procedures performed on the correlation matrix to detect its statistically reliable aspects are seen as filtering procedures of the correlation matrix. We also discuss a method to associate a hierarchically nested factor model with a hierarchical tree obtained from a correlation matrix. The information retained in filtering procedures and its stability with respect to statistical fluctuations are quantified by using the Kullback-Leibler distance. Comment: 37 pages, 9 figures, 3 tables
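The step from a correlation matrix to a correlation-based tree can be illustrated with a minimal sketch. It assumes one common construction that the abstract does not spell out: each correlation rho is mapped to the distance sqrt(2 * (1 - rho)), and the minimum spanning tree is extracted with Kruskal's algorithm. The function names and the toy 4x4 matrix are hypothetical.

```python
import math


def correlation_to_distance(rho):
    """Map a correlation in [-1, 1] to the distance sqrt(2 * (1 - rho)).

    Assumption: this particular mapping is a common choice, not one
    stated in the abstract.
    """
    return math.sqrt(2.0 * (1.0 - rho))


def minimum_spanning_tree(corr):
    """Return the MST of the complete graph weighted by correlation distance.

    corr is a symmetric list-of-lists correlation matrix. The result is a
    list of (i, j, distance) edges with i < j, found by Kruskal's algorithm.
    """
    n = len(corr)
    edges = sorted(
        (correlation_to_distance(corr[i][j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))  # union-find forest over the n assets

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # edge joins two components, so keep it
            parent[root_i] = root_j
            tree.append((i, j, dist))
    return tree


# Hypothetical toy correlation matrix for four assets.
toy_corr = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.7],
    [0.2, 0.1, 0.7, 1.0],
]
toy_tree = minimum_spanning_tree(toy_corr)
```

The tree keeps n - 1 of the n(n - 1)/2 pairwise correlations, which is the sense in which such a construction acts as a filter on the matrix.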

    Incomplete graphical model inference via latent tree aggregation

    Graphical network inference is used in many fields such as genomics or ecology to infer the conditional independence structure between variables, from measurements of gene expression or species abundances for instance. In many practical cases, not all variables involved in the network have been observed, and the samples are actually drawn from a distribution where some variables have been marginalized out. This challenges the sparsity assumption commonly made in graphical model inference, since marginalization yields locally dense structures, even when the original network is sparse. We present a procedure for inferring Gaussian graphical models when some variables are unobserved, which accounts for both the influence of missing variables and the low density of the original network. Our model is based on the aggregation of spanning trees, and the estimation procedure on the Expectation-Maximization algorithm. We treat the graph structure and the unobserved nodes as missing variables and compute posterior probabilities of edge appearance. To provide a complete methodology, we also propose several model selection criteria to estimate the number of missing nodes. A simulation study and an illustration on flow cytometry data reveal that our method has favorable edge detection properties compared to existing graph inference techniques. The methods are implemented in an R package.

    Estimating the inverse trace using random forests on graphs

    Some data analysis problems require the computation of (regularized) inverse traces, i.e. quantities of the form $\mathrm{Tr}\,(q\mathbf{I} + \mathbf{L})^{-1}$. For large matrices, direct methods are infeasible and one must resort to approximations, for example using a conjugate gradient solver combined with Girard's trace estimator (also known as Hutchinson's trace estimator). Here we describe an unbiased estimator of the regularized inverse trace, based on Wilson's algorithm, an algorithm that was initially designed to draw uniform spanning trees in graphs. Our method is fast, easy to implement, and scales to very large matrices. Its main drawback is that it is limited to diagonally dominant matrices $\mathbf{L}$. Comment: Submitted to GRETSI conference
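A forest-based estimator of this kind can be sketched as follows. This is not the authors' code. It assumes that L is the Laplacian of an unweighted, undirected graph given as adjacency lists, and it relies on the known identity that a random rooted spanning forest, sampled by Wilson's algorithm with killing rate q, has an expected number of roots equal to q Tr((qI + L)^{-1}). All function and variable names are hypothetical.

```python
import random


def sample_forest_root_count(adj, q, rng):
    """Sample one random rooted spanning forest and return its root count.

    Wilson's algorithm with killing: a random walk at node u stops (u becomes
    a root) with probability q / (q + deg(u)); otherwise it moves to a
    uniformly chosen neighbour. Loops are erased implicitly by overwriting
    the 'next' pointer each time a node is revisited.
    """
    n = len(adj)
    in_forest = [False] * n
    next_node = [None] * n
    roots = 0
    for start in range(n):
        u = start
        while not in_forest[u]:
            degree = len(adj[u])
            if rng.random() < q / (q + degree):
                in_forest[u] = True  # walk is killed here: u is a root
                next_node[u] = None
                roots += 1
            else:
                next_node[u] = rng.choice(adj[u])
                u = next_node[u]
        # Retrace the loop-erased path from start and attach it to the forest.
        u = start
        while not in_forest[u]:
            in_forest[u] = True
            u = next_node[u]
    return roots


def estimate_inverse_trace(adj, q, num_samples, seed=0):
    """Monte Carlo estimate of Tr((q I + L)^{-1}) as mean(root count) / q."""
    rng = random.Random(seed)
    total = sum(sample_forest_root_count(adj, q, rng) for _ in range(num_samples))
    return total / (num_samples * q)


# Hypothetical example: a cycle on four nodes, whose Laplacian has
# eigenvalues 0, 2, 2 and 4.
cycle4 = [[1, 3], [0, 2], [1, 3], [2, 0]]
estimate = estimate_inverse_trace(cycle4, q=1.0, num_samples=20000)
exact = 1 / 1 + 1 / 3 + 1 / 3 + 1 / 5  # sum of 1 / (q + eigenvalue) for q = 1
```

Each sample costs only a few random-walk steps per node and needs no linear solve, which is why an approach of this type can scale to large sparse graphs.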

    Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes

    We investigate approximating joint distributions of random processes with causal dependence tree distributions. Such distributions are particularly useful in providing parsimonious representations when there exist causal dynamics among processes. By extending the results of Chow and Liu on dependence tree approximations, we show that the best causal dependence tree approximation is the one that maximizes the sum of directed informations on its edges, where best is defined in terms of minimizing the KL-divergence between the original and the approximate distribution. Moreover, we describe a low-complexity algorithm to efficiently pick this approximate distribution. Comment: 9 pages, 15 figures
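For background, the classical Chow and Liu construction that this abstract extends can be sketched in a few lines. The sketch below covers only the original, non-causal case: it builds a maximum-weight spanning tree over empirical pairwise mutual information for discrete samples. It does not implement the directed-information variant described above, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(samples)
    joint = Counter((s[i], s[j]) for s in samples)
    marg_i = Counter(s[i] for s in samples)
    marg_j = Counter(s[j] for s in samples)
    mi = 0.0
    for (a, b), count in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) ) with empirical frequencies
        mi += (count / n) * math.log(count * n / (marg_i[a] * marg_j[b]))
    return mi


def chow_liu_tree(samples):
    """Return the edges (i, j), i < j, of a maximum mutual information tree."""
    num_vars = len(samples[0])
    edges = sorted(
        ((mutual_information(samples, i, j), i, j)
         for i, j in combinations(range(num_vars), 2)),
        reverse=True,  # heaviest (most informative) edges first
    )
    parent = list(range(num_vars))  # union-find for Kruskal's algorithm

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j))
    return tree


# Hypothetical toy data: three binary variables forming a chain 0 - 1 - 2.
toy_samples = [
    (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1),
    (1, 1, 1), (1, 1, 1), (1, 1, 1), (1, 1, 1),
]
toy_edges = chow_liu_tree(toy_samples)
```

In the causal setting of the abstract, the edge weights would be directed informations between processes and the tree would be directed, so the spanning-tree step would differ from the undirected Kruskal pass used here.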

    Globally and Locally Minimal Weight Spanning Tree Networks

    The competition between local and global driving forces is significant in a wide variety of naturally occurring branched networks. We have investigated the impact of a global minimization criterion versus a local one on the structure of spanning trees. To do so, we consider two spanning tree structures: the generalized minimal spanning tree (GMST) defined by Dror et al. [1] and an analogous structure based on the invasion percolation network, which we term the generalized invasive spanning tree or GIST. In general, these two structures represent extremes of global and local optimality, respectively. Structural characteristics are compared between the GMST and GIST for a fixed lattice. In addition, we demonstrate a method for creating a series of structures that enables one to span the range between these two extremes. Two structural characterizations, the occupied edge density (i.e., the fraction of edges in the graph that are included in the tree) and the tortuosity of the arcs in the trees, are shown to correlate well with the degree to which an intermediate structure resembles the GMST or GIST. Both characterizations are straightforward to determine from an image and are potentially useful tools in the analysis of the formation of network structures. Comment: 23 pages, 5 figures, 2 tables, typographical error corrected

    Estimating Infection Sources in Networks Using Partial Timestamps

    We study the problem of identifying infection sources in a network based on the network topology and a subset of infection timestamps. In the case of a single infection source in a tree network, we derive the maximum likelihood estimator of the source and the unknown diffusion parameters. We then introduce a new heuristic involving an optimization over a parametrized family of Gromov matrices to develop a single source estimation algorithm for general graphs. Simulations demonstrate that, compared with the breadth-first search tree heuristic commonly adopted in the literature, our approach achieves better estimation accuracy than several other benchmark algorithms, even though these require more information, such as the diffusion parameters. We next develop a multiple sources estimation algorithm for general graphs, which first partitions the graph into source candidate clusters, and then applies our single source estimation algorithm to each cluster. We show that if the graph is a tree, then each source candidate cluster contains at least one source. Simulations using synthetic and real networks, and experiments using real-world data, suggest that our proposed algorithms are able to estimate the true infection source(s) to within a small number of hops when only a small portion of the infection timestamps is observed. Comment: 15 pages, 15 figures, accepted by IEEE Transactions on Information Forensics and Security