Sparse random graphs: regularization and concentration of the Laplacian
We study random graphs with possibly different edge probabilities in the
challenging sparse regime of bounded expected degrees. Unlike in the dense
case, neither the graph adjacency matrix nor its Laplacian concentrate around
their expectations due to the highly irregular distribution of node degrees. It
has been empirically observed that simply adding a constant of order 1/n to
each entry of the adjacency matrix substantially improves the behavior of the
Laplacian. Here we prove that this regularization indeed forces the Laplacian to
concentrate even in sparse graphs. As an immediate consequence in network
analysis, we establish the validity of one of the simplest and fastest
approaches to community detection -- regularized spectral clustering, under the
stochastic block model. Our proof of concentration of the regularized Laplacian is
based on Grothendieck's inequality and factorization, combined with paving
arguments.
Comment: Added reference
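The regularization described in this abstract can be illustrated with a minimal sketch. It assumes the symmetric normalized Laplacian I - D^{-1/2} A D^{-1/2} as the Laplacian convention and a hypothetical tuning parameter tau for the constant; the function name regularized_laplacian is illustrative and not taken from the paper.

```python
import math


def regularized_laplacian(adj, tau=1.0):
    """Add tau/n to every entry of adj, then form I - D^{-1/2} A D^{-1/2}.

    adj is a symmetric 0/1 adjacency matrix given as a list of lists.
    tau is an assumed tuning parameter, not a value from the paper.
    """
    if tau <= 0:
        raise ValueError("tau must be positive so that every degree is positive")
    n = len(adj)
    shift = tau / n
    # Regularized adjacency matrix: every entry is shifted by tau/n.
    a_reg = [[adj[i][j] + shift for j in range(n)] for i in range(n)]
    # Regularized degrees: each row sum grows by about tau, so none is zero.
    deg = [sum(row) for row in a_reg]
    return [
        [
            (1.0 if i == j else 0.0) - a_reg[i][j] / math.sqrt(deg[i] * deg[j])
            for j in range(n)
        ]
        for i in range(n)
    ]
```

Because every regularized degree is at least tau, isolated or very low-degree nodes no longer cause a division by zero or dominate the normalization, which is the irregularity the abstract points to.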
Accounting for the Role of Long Walks on Networks via a New Matrix Function
We introduce a new matrix function for studying graphs and real-world
networks based on a double-factorial penalization of walks between nodes in a
graph. This new matrix function is based on the matrix error function. We find
a very good approximation of this function using a matrix hyperbolic tangent
function. We derive a communicability function, a subgraph centrality and a
double-factorial Estrada index based on this new matrix function. We obtain
upper and lower bounds for the double-factorial Estrada index of graphs,
showing that they are similar to those of the single-factorial Estrada index.
We then compare these indices with the single-factorial one for simple graphs
and real-world networks. We conclude that for networks containing chordless
cycles---holes---the two penalization schemes produce significantly different
results. In particular, we study two series of real-world networks representing
urban street networks and protein residue networks. We observe that the
subgraph centralities based on the two indices produce significantly different
rankings of the nodes. The use of the double-factorial penalization of walks
opens new possibilities for studying important structural properties of
real-world networks where long walks play a fundamental role, such as
networks containing chordless cycles.
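As a rough illustration of the double-factorial penalization, the sketch below sums the truncated series of A^k / k!! over k = 0, ..., max_k, with 0!! = 1!! = 1. The truncation level max_k and the function names are assumptions made for this example; the abstract itself works with a closed form based on the matrix error function, which is not reproduced here.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of lists."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def double_factorial_series(adj, max_k=40):
    """Truncated series sum_{k=0}^{max_k} A^k / k!! (max_k is an assumed cutoff)."""
    n = len(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    total = [[0.0] * n for _ in range(n)]
    dfact = [1.0, 1.0]  # 0!! and 1!!
    for k in range(max_k + 1):
        if k >= 2:
            dfact.append(k * dfact[k - 2])  # k!! = k * (k-2)!!
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j] / dfact[k]
        power = mat_mul(power, adj)  # advance to A^(k+1)
    return total


def subgraph_centrality(adj, max_k=40):
    """Diagonal of the series: weighted count of closed walks at each node."""
    g = double_factorial_series(adj, max_k)
    return [g[i][i] for i in range(len(adj))]


def estrada_index(adj, max_k=40):
    """Trace of the series, i.e. the sum of the subgraph centralities."""
    return sum(subgraph_centrality(adj, max_k))
```

Since k!! grows more slowly than k!, a walk of length k is penalized less here than under the single-factorial (exponential) scheme, so long walks contribute more to the resulting indices.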
Penalized Likelihood Methods for Estimation of Sparse High Dimensional Directed Acyclic Graphs
Directed acyclic graphs (DAGs) are commonly used to represent causal
relationships among random variables in graphical models. Applications of these
models arise in the study of physical, as well as biological systems, where
directed edges between nodes represent the influence of components of the
system on each other. The general problem of estimating DAGs from observed data
is computationally NP-hard. Moreover, two directed graphs may be observationally
equivalent. When the nodes exhibit a natural ordering, the problem of
estimating directed graphs reduces to the problem of estimating the structure
of the network. In this paper, we propose a penalized likelihood approach that
directly estimates the adjacency matrix of DAGs. Both lasso and adaptive lasso
penalties are considered and an efficient algorithm is proposed for estimation
of high dimensional DAGs. We study variable selection consistency of the two
penalties when the number of variables grows to infinity with the sample size.
We show that although lasso can only consistently estimate the true network
under stringent assumptions, adaptive lasso achieves this task under mild
regularity conditions. The performance of the proposed methods is compared to
alternative methods in simulated, as well as real, data examples.
Comment: 19 pages, 8 figures
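One way to picture the ordered case is sketched below. It assumes linear relationships and that, once the node ordering is known, the fit can be carried out as one lasso regression of each node on its predecessors. The coordinate-descent solver, the penalty level lam, and all function names are illustrative assumptions rather than the paper's algorithm, and the data are assumed to be centered already.

```python
def soft_threshold(value, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_cd(columns, y, lam, sweeps=200):
    """Minimize (1/2n)||y - X b||^2 + lam * ||b||_1 by coordinate descent.

    columns holds the predictor columns of X; sweeps is an assumed fixed
    number of passes rather than a convergence test.
    """
    n = len(y)
    beta = [0.0] * len(columns)
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant-zero predictor carries no information
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                resid = [resid[i] - col[i] * delta for i in range(n)]
            beta[j] = new
    return beta


def estimate_ordered_dag(data, lam):
    """Return weights[j][i], the estimated effect of node j on node i (j < i).

    data is a list of samples (rows); columns follow the assumed node ordering.
    """
    p = len(data[0])
    cols = [[row[j] for row in data] for j in range(p)]
    weights = [[0.0] * p for _ in range(p)]
    for i in range(1, p):
        beta = lasso_cd(cols[:i], cols[i], lam)  # regress node i on predecessors
        for j in range(i):
            weights[j][i] = beta[j]
    return weights
```

The returned matrix is strictly upper triangular, so the estimate is acyclic by construction. An adaptive lasso variant would, under the same assumptions, reweight lam for each coefficient using an initial estimate; that step is not shown here.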