Correlation, hierarchies, and networks in financial markets
We discuss some methods to quantitatively investigate the properties of
correlation matrices. Correlation matrices play an important role in portfolio
optimization and in several other quantitative descriptions of asset price
dynamics in financial markets. Specifically, we discuss how to define and
obtain hierarchical trees, correlation based trees and networks from a
correlation matrix. The hierarchical clustering and other procedures performed
on the correlation matrix to detect statistically reliable aspects of the
correlation matrix are seen as filtering procedures of the correlation matrix.
We also discuss a method to associate a hierarchically nested factor model to a
hierarchical tree obtained from a correlation matrix. The information retained
in filtering procedures and its stability with respect to statistical
fluctuations are quantified by using the Kullback-Leibler distance. Comment: 37 pages, 9 figures, 3 tables
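As a rough illustration of how a correlation-based tree can be obtained from a correlation matrix, the sketch below maps each correlation to a distance and extracts a minimum spanning tree with Kruskal's algorithm. The distance d_ij = sqrt(2 (1 - rho_ij)), the function name `correlation_mst`, and the toy matrix are assumptions made for this example; the abstract does not specify them. The order in which Kruskal's algorithm accepts edges is also the merge order of single-linkage hierarchical clustering.

```python
import math


def correlation_mst(rho):
    """Return MST edges (i, j, distance) for a correlation matrix `rho`.

    Assumes the distance d_ij = sqrt(2 * (1 - rho_ij)); this choice is an
    assumption of the sketch, not something stated in the abstract.
    """
    n = len(rho)
    # All candidate edges, sorted from most to least correlated pair.
    pairs = sorted(
        (math.sqrt(max(0.0, 2.0 * (1.0 - rho[i][j]))), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    parent = list(range(n))  # union-find forest over the assets

    def find(x):
        # Follow parent pointers to the component representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for d, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # the edge joins two different clusters: keep it
            parent[ri] = rj
            edges.append((i, j, d))
    return edges


# Toy 4-asset correlation matrix with two tightly correlated pairs.
rho = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
mst_edges = correlation_mst(rho)
```

Here the two strongly correlated pairs are joined first, and the weaker link between assets 1 and 2 connects the two clusters last.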
Incomplete graphical model inference via latent tree aggregation
Graphical network inference is used in many fields such as genomics or
ecology to infer the conditional independence structure between variables, from
measurements of gene expression or species abundances for instance. In many
practical cases, not all variables involved in the network have been observed,
and the samples are actually drawn from a distribution where some variables
have been marginalized out. This challenges the sparsity assumption commonly
made in graphical model inference, since marginalization yields locally dense
structures, even when the original network is sparse. We present a procedure
for inferring Gaussian graphical models when some variables are unobserved,
that accounts both for the influence of missing variables and the low density
of the original network. Our model is based on the aggregation of spanning
trees, and the estimation procedure on the Expectation-Maximization algorithm.
We treat the graph structure and the unobserved nodes as missing variables and
compute posterior probabilities of edge appearance. To provide a complete
methodology, we also propose several model selection criteria to estimate the
number of missing nodes. A simulation study and an illustration on flow cytometry
data reveal that our method has favorable edge detection properties compared to
existing graph inference techniques. The methods are implemented in an R
package.
Estimating the inverse trace using random forests on graphs
Some data analysis problems require the computation of (regularised) inverse
traces, i.e. quantities of the form $\operatorname{Tr}\,(q\mathbf{I} + \mathbf{L})^{-1}$. For large
matrices, direct methods are unfeasible and one must resort to approximations,
for example using a conjugate gradient solver combined with Girard's trace
estimator (also known as Hutchinson's trace estimator). Here we describe an
unbiased estimator of the regularized inverse trace, based on Wilson's
algorithm, an algorithm that was initially designed to draw uniform spanning
trees in graphs. Our method is fast, easy to implement, and scales to very
large matrices. Its main drawback is that it is limited to diagonally dominant
matrices $\mathbf{L}$. Comment: Submitted to GRETSI conference
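A minimal sketch of a forest-based estimator of this kind is given below, under stated assumptions: the matrix is taken to be the Laplacian of an unweighted, undirected graph given as adjacency lists, and the estimator relies on the identity that the expected number of roots of a random spanning forest with killing rate q equals q times the regularized inverse trace. The function names `sample_root_count` and `estimate_inverse_trace` are hypothetical, and this is not necessarily the authors' implementation.

```python
import random

ROOT = -1  # marker meaning "the walk was killed here, so this node is a root"


def sample_root_count(adj, q, rng):
    """Sample a random spanning forest with Wilson's algorithm; count its roots.

    `adj[u]` lists the neighbours of node u (unweighted graph, an assumption).
    At node u the walk is killed with probability q / (q + deg(u)).
    """
    n = len(adj)
    in_forest = [False] * n
    nxt = [None] * n
    roots = 0
    for start in range(n):
        # Random walk from `start`; overwriting nxt[u] erases loops implicitly.
        u = start
        while not in_forest[u]:
            if rng.random() < q / (q + len(adj[u])):
                nxt[u] = ROOT
                break
            nxt[u] = rng.choice(adj[u])
            u = nxt[u]
        # Retrace the loop-erased path and attach it to the forest.
        u = start
        while not in_forest[u]:
            in_forest[u] = True
            if nxt[u] == ROOT:
                roots += 1
                break
            u = nxt[u]
    return roots


def estimate_inverse_trace(adj, q, n_samples=1000, seed=0):
    """Monte Carlo estimate of Tr (q I + L)^{-1} as mean(#roots) / q."""
    rng = random.Random(seed)
    total = sum(sample_root_count(adj, q, rng) for _ in range(n_samples))
    return total / (n_samples * q)


# Two nodes joined by one edge: L has eigenvalues 0 and 2, so for q = 1
# the exact value is 1/1 + 1/3.
estimate = estimate_inverse_trace([[1], [0]], q=1.0, n_samples=20000)
```

Each sample costs one run of Wilson's algorithm and needs no linear solve, which is what makes the approach attractive for very large sparse graphs.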
Causal Dependence Tree Approximations of Joint Distributions for Multiple Random Processes
We investigate approximating joint distributions of random processes with
causal dependence tree distributions. Such distributions are particularly
useful in providing parsimonious representation when there exists causal
dynamics among processes. By extending the results by Chow and Liu on
dependence tree approximations, we show that the best causal dependence tree
approximation is the one which maximizes the sum of directed informations on
its edges, where best is defined in terms of minimizing the KL-divergence
between the original and the approximate distribution. Moreover, we describe a
low-complexity algorithm to efficiently pick this approximate distribution. Comment: 9 pages, 15 figures
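For orientation, the classical Chow and Liu step that this work extends can be sketched as follows: estimate pairwise mutual information from discrete samples, then take a maximum-weight spanning tree. This sketch covers only the undirected, non-causal baseline; the causal variant described in the abstract would use directed information as edge weights and search over directed trees, which is not shown here. The function names and the toy data are assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed pairs of p(x, y) * log( p(x, y) / (p(x) p(y)) )
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def chow_liu_tree(columns):
    """Maximum-weight spanning tree over pairwise mutual information (Prim)."""
    m = len(columns)
    weights = {
        (i, j): mutual_information(columns[i], columns[j])
        for i in range(m)
        for j in range(i + 1, m)
    }
    in_tree = {0}
    edges = []
    while len(in_tree) < m:
        # Heaviest edge with exactly one endpoint already in the tree.
        i, j = max(
            (e for e in weights if (e[0] in in_tree) != (e[1] in in_tree)),
            key=weights.get,
        )
        edges.append((i, j))
        in_tree.update((i, j))
    return edges


# Toy data: variable 1 copies variable 0, and variable 3 nearly copies variable 2.
columns = [
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 1, 0, 1, 0, 0],
]
tree_edges = chow_liu_tree(columns)
```

The resulting tree keeps the two strongly dependent pairs as edges and links them through whichever remaining pair carries the most mutual information.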
Globally and Locally Minimal Weight Spanning Tree Networks
The competition between local and global driving forces is significant in a
wide variety of naturally occurring branched networks. We have investigated the
impact of a global minimization criterion versus a local one on the structure
of spanning trees. To do so, we consider two spanning tree structures - the
generalized minimal spanning tree (GMST) defined by Dror et al. [1] and an
analogous structure based on the invasion percolation network, which we term
the generalized invasive spanning tree or GIST. In general, these two
structures represent extremes of global and local optimality, respectively.
Structural characteristics are compared between the GMST and GIST for a fixed
lattice. In addition, we demonstrate a method for creating a series of
structures which enable one to span the range between these two extremes. Two
structural characterizations, the occupied edge density (i.e., the fraction of
edges in the graph that are included in the tree) and the tortuosity of the
arcs in the trees, are shown to correlate well with the degree to which an
intermediate structure resembles the GMST or GIST. Both characterizations are
straightforward to determine from an image and are potentially useful tools in
the analysis of the formation of network structures. Comment: 23 pages, 5 figures, 2 tables, typographical error corrected
Estimating Infection Sources in Networks Using Partial Timestamps
We study the problem of identifying infection sources in a network based on
the network topology, and a subset of infection timestamps. In the case of a
single infection source in a tree network, we derive the maximum likelihood
estimator of the source and the unknown diffusion parameters. We then introduce
a new heuristic involving an optimization over a parametrized family of Gromov
matrices to develop a single source estimation algorithm for general graphs.
Simulations demonstrate that, compared with the breadth-first search tree
heuristic commonly adopted in the literature, our approach achieves better
estimation accuracy than several other benchmark algorithms, even though these
require more information like the diffusion parameters. We next develop a
multiple sources estimation algorithm for general graphs, which first
partitions the graph into source candidate clusters, and then applies our
single source estimation algorithm to each cluster. We show that if the graph
is a tree, then each source candidate cluster contains at least one source.
Simulations using synthetic and real networks, and experiments using real-world
data suggest that our proposed algorithms are able to estimate the true
infection source(s) to within a small number of hops with a small portion of
the infection timestamps being observed. Comment: 15 pages, 15 figures, accepted by IEEE Transactions on Information
Forensics and Security