Bayesian Fused Lasso regression for dynamic binary networks
We propose a multinomial logistic regression model for link prediction in a
time series of directed binary networks. To account for the dynamic nature of
the data we employ a dynamic model for the model parameters that is strongly
connected with the fused lasso penalty. In addition to promoting sparseness,
this prior allows us to explore the presence of change points in the structure
of the network. We introduce fast computational algorithms for estimation and
prediction using both optimization and Bayesian approaches. The performance of
the model is illustrated using simulated data and data from a financial trading
network in the NYMEX natural gas futures market. Supplementary material
containing the trading network data set and code to implement the algorithms is
available online.
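To make the fused lasso idea concrete, here is a minimal sketch of the penalty on a time-indexed coefficient path: an L1 term on the coefficients (sparseness) plus an L1 term on successive differences, which favors piecewise-constant paths and so exposes candidate change points. The function names, the weights `lam1` and `lam2`, and the toy path are assumptions for illustration; the abstract does not specify the authors' parameterization or algorithms.

```python
import numpy as np

def fused_lasso_penalty(theta, lam1, lam2):
    """Fused lasso penalty for a coefficient path.

    theta: array of shape (T,) or (T, p), with time along axis 0.
    lam1:  weight on the L1 norm of the coefficients (sparseness).
    lam2:  weight on the L1 norm of successive differences (fusion).
    """
    theta = np.asarray(theta, dtype=float)
    sparsity = np.abs(theta).sum()
    fusion = np.abs(np.diff(theta, axis=0)).sum()
    return lam1 * sparsity + lam2 * fusion

def change_points(theta, tol=1e-8):
    """Return the time indices t at which theta[t] differs from theta[t - 1]."""
    theta = np.asarray(theta, dtype=float)
    if theta.ndim == 1:
        theta = theta[:, None]
    # Largest absolute jump across coefficients at each time step.
    jumps = np.abs(np.diff(theta, axis=0)).max(axis=1)
    return [int(t) + 1 for t in np.flatnonzero(jumps > tol)]
```

For a path such as `[0, 0, 0, 2, 2, 2]`, the fusion term charges only for the single jump, so a piecewise-constant path is cheap while a path that drifts at every step is not.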
Large scale bias and the inaccuracy of the peak-background split
The peak-background split argument is commonly used to relate the abundance
of dark matter halos to their spatial clustering. Testing this argument
requires an accurate determination of the halo mass function. We present a
Maximum Likelihood method for fitting parametric functional forms to halo
abundances which differs from previous work because it does not require binned
counts. Our conclusions do not depend on whether we use our method or more
conventional ones. In addition, halo abundances depend on how halos are
defined. Our conclusions do not depend on the choice of link length associated
with the friends-of-friends halo-finder, nor do they change if we identify
halos using a spherical overdensity algorithm instead. The large scale halo
bias measured from the matter-halo cross spectrum b_x and the halo
autocorrelation function b_xi (on scales k~0.03h/Mpc and r ~50 Mpc/h) can
differ by as much as 5% for halos that are significantly more massive than the
characteristic mass M*. At these large masses, the peak-background split
estimate of the linear bias factor b1 is 3-5% smaller than b_xi, which is 5%
smaller than b_x. We discuss the origin of these discrepancies: deterministic
nonlinear local bias, with parameters determined by the peak-background split
argument, is unable to account for the discrepancies we see. A simple linear
but nonlocal bias model, motivated by peaks theory, may also be difficult to
reconcile with our measurements. More work on such nonlocal bias models may be
needed to understand the nature of halo bias at this level of precision.
Comment: MNRAS accepted. New section with Spherical Overdensity identified
halos included. Appendix enlarged.
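The idea of fitting a parametric form without binned counts can be sketched with a toy example. The code below is not the authors' halo mass function fit: it swaps in a simple power-law density above a threshold mass `m_min`, for which the unbinned maximum likelihood estimate has a closed form. The density, the function names, and the sampling helper are all assumptions chosen for illustration.

```python
import math
import random

def log_likelihood(alpha, masses, m_min):
    """Unbinned log-likelihood for the toy density
    p(m) = (alpha - 1) / m_min * (m / m_min) ** (-alpha), m >= m_min.
    Every object contributes individually; no histogram is built."""
    n = len(masses)
    log_ratio_sum = sum(math.log(m / m_min) for m in masses)
    return n * math.log((alpha - 1.0) / m_min) - alpha * log_ratio_sum

def fit_alpha(masses, m_min):
    """Closed-form maximizer of log_likelihood with respect to alpha."""
    n = len(masses)
    return 1.0 + n / sum(math.log(m / m_min) for m in masses)

def sample_masses(n, alpha, m_min, seed=0):
    """Draw n masses from the toy density by inverting its CDF."""
    rng = random.Random(seed)
    return [m_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0))
            for _ in range(n)]
```

Because the likelihood is a sum over individual objects, the estimate does not depend on any choice of bin edges, which is the property the abstract highlights.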
Representation Learning for Attributed Multiplex Heterogeneous Network
Network embedding (or graph embedding) has been widely used in many
real-world applications. However, existing methods mainly focus on networks
with single-typed nodes/edges and cannot scale well to handle large networks.
Many real-world networks consist of billions of nodes and edges of multiple
types, and each node is associated with different attributes. In this paper, we
formalize the problem of embedding learning for the Attributed Multiplex
Heterogeneous Network and propose a unified framework to address this problem.
The framework supports both transductive and inductive learning. We also give
a theoretical analysis of the proposed framework, showing its connection with
previous works and proving its better expressiveness. We conduct systematic
evaluations for the proposed framework on four different genres of challenging
datasets: Amazon, YouTube, Twitter, and Alibaba. Experimental results
demonstrate that with the learned embeddings from the proposed framework, we
can achieve statistically significant improvements (e.g., 5.99-28.23% lift by
F1 scores; p<<0.01, t-test) over previous state-of-the-art methods for link
prediction. The framework has also been successfully deployed on the
recommendation system of a worldwide leading e-commerce company, Alibaba Group.
Results of the offline A/B tests on product recommendation further confirm the
effectiveness and efficiency of the framework in practice.
Comment: Accepted to KDD 2019. Website: https://sites.google.com/view/gatn
Network Kriging
Network service providers and customers are often concerned with aggregate
performance measures that span multiple network paths. Unfortunately, forming
such network-wide measures can be difficult, due to the issues of scale
involved. In particular, the number of paths grows too rapidly with the number
of endpoints to make exhaustive measurement practical. As a result, it is of
interest to explore the feasibility of methods that dramatically reduce the
number of paths measured in such situations while maintaining acceptable
accuracy.
We cast the problem as one of statistical prediction--in the spirit of the
so-called `kriging' problem in spatial statistics--and show that end-to-end
network properties may be accurately predicted in many cases using a
surprisingly small set of carefully chosen paths. More precisely, we formulate
a general framework for the prediction problem, propose a class of linear
predictors for standard quantities of interest (e.g., averages, totals,
differences) and show that linear algebraic methods of subset selection may be
used to effectively choose which paths to measure. We characterize the
performance of the resulting methods, both analytically and numerically. The
success of our methods derives from the low effective rank of routing matrices
as encountered in practice, which appears to be a new observation in its own
right with potentially broad implications on network measurement generally.
Comment: 16 pages, 9 figures, single-spaced
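A small sketch may help show why low rank matters here. Assume a routing matrix G (paths by links, entry 1 if the path uses the link) and an additive link metric x, so that path metrics are y = G x. If G has rank r, then r well-chosen paths determine every other path. The code below uses a greedy pivoted Gram-Schmidt selection and a pseudoinverse-based predictor; both are simplified stand-ins chosen for illustration, not necessarily the subset selection method or predictor of the paper, and the line-topology helper is hypothetical.

```python
import numpy as np

def line_routing_matrix(n_nodes):
    """Hypothetical toy network: nodes on a line, one path per node pair.
    Row for path (i, j) has ones on links i .. j-1."""
    rows = []
    for i in range(n_nodes):
        for j in range(i + 1, n_nodes):
            row = np.zeros(n_nodes - 1)
            row[i:j] = 1.0
            rows.append(row)
    return np.array(rows)

def select_paths(G, k):
    """Greedily pick up to k rows of G: at each step take the path whose
    row has the largest norm after projecting out the rows already chosen."""
    R = np.array(G, dtype=float)  # residual rows
    chosen = []
    for _ in range(k):
        norms = np.linalg.norm(R, axis=1)
        i = int(np.argmax(norms))
        if norms[i] < 1e-10:  # remaining rows already lie in the span
            break
        chosen.append(i)
        q = R[i] / norms[i]
        R = R - np.outer(R @ q, q)
    return chosen

def predict_all_paths(G, chosen, y_measured):
    """Linear predictor: minimum-norm link values consistent with the
    measured paths, pushed back through the full routing matrix."""
    G = np.asarray(G, dtype=float)
    x_hat = np.linalg.pinv(G[chosen]) @ np.asarray(y_measured, dtype=float)
    return G @ x_hat
```

On the toy line network with 5 nodes there are 10 paths but only 4 links, so measuring 4 selected paths recovers all 10 path metrics exactly in the noise-free case. With noise or with fewer measured paths than the rank, the prediction is only approximate.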