Different approaches to community detection
A precise definition of what constitutes a community in networks has remained
elusive. Consequently, network scientists have compared community detection
algorithms on benchmark networks with a particular form of community structure
and classified them based on the mathematical techniques they employ. However,
this comparison can be misleading because apparent similarities in their
mathematical machinery can disguise different reasons for why we would want to
employ community detection in the first place. Here we provide a focused review
of these different motivations that underpin community detection. This
problem-driven classification is useful in applied network science, where it is
important to select an appropriate algorithm for the given purpose. Moreover,
highlighting the different approaches to community detection also delineates
the many lines of research and points out open directions and avenues for
future research.
Comment: 14 pages, 2 figures. Written as a chapter for forthcoming Advances in
network clustering and blockmodeling, and based on an extended version of The
many facets of community detection in complex networks, Appl. Netw. Sci. 2: 4
(2017) by the same author
Multidimensional approximation of nonlinear dynamical systems
A key task in the field of modeling and analyzing nonlinear dynamical systems is the recovery of unknown governing equations from measurement data only. There is a wide range of application areas for this important instance of system identification, ranging from industrial engineering and acoustic signal processing to stock market models. In order to find appropriate representations of underlying dynamical systems, various data-driven methods have been proposed by different communities. However, if the given data sets are high-dimensional, then these methods typically suffer from the curse of dimensionality. To significantly reduce the computational costs and storage consumption, we propose multidimensional approximation of nonlinear dynamical systems (MANDy), a method that combines data-driven methods with tensor network decompositions. The efficiency of the introduced approach will be illustrated with the aid of several high-dimensional nonlinear dynamical systems.
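To make the idea of recovering governing equations from measurement data concrete, here is a minimal sketch of the plain least-squares version of the problem. It is not MANDy: there are no tensor network decompositions, and it handles only a one-dimensional state. The monomial basis, the sample system dx/dt = -2x + 0.5x^3, and the helper names `solve_linear` and `fit_coefficients` are all assumptions made for illustration.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def fit_coefficients(states, derivatives, basis):
    """Least-squares fit of derivative ~ sum_k coeff[k] * basis[k](state)."""
    features = [[f(x) for f in basis] for x in states]
    k = len(basis)
    # Normal equations: (F^T F) coeff = F^T d.
    gram = [[sum(row[i] * row[j] for row in features) for j in range(k)]
            for i in range(k)]
    rhs = [sum(row[i] * d for row, d in zip(features, derivatives))
           for i in range(k)]
    return solve_linear(gram, rhs)


# Hypothetical candidate functions: 1, x, x^2, x^3.
basis = [lambda x: 1.0, lambda x: x, lambda x: x ** 2, lambda x: x ** 3]

# Synthetic, noise-free "measurements" of the assumed system dx/dt = -2x + 0.5x^3.
states = [-1.0 + 0.1 * i for i in range(21)]
derivatives = [-2.0 * x + 0.5 * x ** 3 for x in states]

coefficients = fit_coefficients(states, derivatives, basis)
```

With noise-free data, the fitted coefficients should match the assumed system up to rounding error. The number of candidate functions grows quickly with the state dimension, which is the curse of dimensionality that the abstract says the tensor-based method is meant to address.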
Simplified Energy Landscape for Modularity Using Total Variation
Networks capture pairwise interactions between entities and are frequently
used in applications such as social networks, food networks, and protein
interaction networks, to name a few. Communities, cohesive groups of nodes,
often form in these applications, and identifying them gives insight into the
overall organization of the network. One common quality function used to
identify community structure is modularity. In Hu et al. [SIAM J. App. Math.,
73(6), 2013], it was shown that modularity optimization is equivalent to
minimizing a particular nonconvex total variation (TV) based functional over a
discrete domain. They solve this problem, assuming the number of communities is
known, using a Merriman, Bence, Osher (MBO) scheme.
We show that modularity optimization is equivalent to minimizing a convex
TV-based functional over a discrete domain, again, assuming the number of
communities is known. Furthermore, we show that modularity has no convex
relaxation satisfying certain natural conditions. We therefore find a
manageable nonconvex approximation using a Ginzburg Landau functional, which
provably converges to the correct energy in the limit of a certain parameter.
We then derive an MBO algorithm with fewer hand-tuned parameters than in Hu et
al. and which is 7 times faster at solving the associated diffusion equation
due to the fact that the underlying discretization is unconditionally stable.
Our numerical tests include a hyperspectral video whose associated graph has
2.9x10^7 edges, which is roughly 37 times larger than was handled in the paper
of Hu et al.
Comment: 25 pages, 3 figures, 3 tables, submitted to SIAM J. App. Math
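For readers unfamiliar with the modularity quality function mentioned above, the following is a minimal sketch that evaluates the standard Newman-Girvan modularity of a given partition on an undirected, unweighted graph. It does not implement the TV formulation, the Ginzburg Landau approximation, or the MBO scheme from the abstract. The function name `modularity` and the toy graph are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    internal = defaultdict(int)    # number of edges with both ends in the community
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    # Q = sum over communities c of [ e_c / m - (d_c / (2 m))^2 ]
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Toy graph: two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
merged = {node: "a" for node in range(6)}

q_split = modularity(edges, split)    # 5/14, about 0.357
q_merged = modularity(edges, merged)  # 0.0
```

Splitting the graph along the bridge scores higher than lumping all nodes together, which is the behavior a modularity optimizer exploits when searching over partitions.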
Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs
Laplacian mixture models identify overlapping regions of influence in
unlabeled graph and network data in a scalable and computationally efficient
way, yielding useful low-dimensional representations. By combining Laplacian
eigenspace and finite mixture modeling methods, they provide probabilistic or
fuzzy dimensionality reductions or domain decompositions for a variety of input
data types, including mixture distributions, feature vectors, and graphs or
networks. Provable optimal recovery using the algorithm is analytically shown
for a nontrivial class of cluster graphs. Heuristic approximations for scalable
high-performance implementations are described and empirically tested.
Connections to PageRank and community detection in network analysis demonstrate
the wide applicability of this approach. The origins of fuzzy spectral methods,
beginning with generalized heat or diffusion equations in physics, are reviewed
and summarized. Comparisons to other dimensionality reduction and clustering
methods for challenging unsupervised machine learning problems are also
discussed.
Comment: 13 figures, 35 references
Data based identification and prediction of nonlinear and complex dynamical systems
We thank Dr. R. Yang (formerly at ASU), Dr. R.-Q. Su (formerly at ASU), and Mr. Zhesi Shen for their contributions to a number of original papers on which this Review is partly based. This work was supported by ARO under Grant No. W911NF-14-1-0504. W.-X. Wang was also supported by NSFC under Grants No. 61573064 and No. 61074116, as well as by the Fundamental Research Funds for the Central Universities, Beijing Nova Programme.
Peer reviewed. Postprint
Impact of Community Structure on Cascades
The threshold model is widely used to study the propagation of opinions and
technologies in social networks. In this model, individuals adopt the new
behavior based on how many neighbors have already chosen it. Specifically, we
consider the permanent adoption model where individuals that have adopted the
new behavior cannot change their state. We study cascades under the threshold
model on sparse random graphs with community structure to see whether the
existence of communities affects the number of individuals who finally adopt
the new behavior.
When seeding a small number of agents with the new behavior, the community
structure has little effect on the final proportion of people that adopt it,
i.e., the contagion threshold is the same as if there were just one community.
On the other hand, seeding a fraction of the population with the new behavior
has a significant impact on the cascade with the optimal seeding strategy
depending on how strongly the communities are connected. In particular, when
the communities are strongly connected, seeding in one community outperforms
the symmetric seeding strategy that seeds equally in all communities. We also
investigate the problem of optimum seeding given a budget constraint, and
propose a gradient-based heuristic seeding strategy. Numerical results for our
algorithm dispel the commonly held belief in the literature that the
best seeding strategy is to seed the nodes with the highest number of
neighbors.
Comment: Version to be published to EC 201
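The permanent-adoption threshold model described above can be sketched in a few lines. This is a toy illustration only: it uses two small cliques instead of sparse random graphs, an absolute threshold of two adopted neighbors, and a fixed seed set, all of which are assumptions. The helper names `build_graph`, `clique`, and `threshold_cascade` are hypothetical.

```python
def build_graph(edges):
    """Build an undirected adjacency list from a list of (u, v) edges."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, []).append(v)
        neighbors.setdefault(v, []).append(u)
    return neighbors


def clique(nodes):
    """All edges of the complete graph on the given list of nodes."""
    return [(u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]]


def threshold_cascade(neighbors, seeds, threshold):
    """Run the cascade to completion and return the set of adopters.

    A node adopts once at least `threshold` of its neighbors have adopted,
    and adopters never revert (permanent adoption).
    """
    adopted = set(seeds)
    frontier = list(adopted)
    adopted_neighbors = {node: 0 for node in neighbors}
    while frontier:
        node = frontier.pop()
        for nb in neighbors[node]:
            if nb in adopted:
                continue
            adopted_neighbors[nb] += 1
            if adopted_neighbors[nb] >= threshold:
                adopted.add(nb)
                frontier.append(nb)
    return adopted


community_a = clique([0, 1, 2, 3])
community_b = clique([4, 5, 6, 7])

# Weak coupling: a single edge between the two communities.
weak = build_graph(community_a + community_b + [(3, 4)])
# Strong coupling: four edges between the two communities.
strong = build_graph(community_a + community_b + [(2, 4), (3, 4), (2, 5), (3, 5)])

weak_result = threshold_cascade(weak, {0, 1}, threshold=2)
strong_result = threshold_cascade(strong, {0, 1}, threshold=2)
```

In this toy setting, seeding two nodes in one community saturates only that community under weak coupling, while under strong coupling the cascade crosses over and reaches every node. The example only shows the mechanics of the model and does not reproduce the paper's results on seeding strategies.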
Bayesian nonparametric sparse VAR models
High dimensional vector autoregressive (VAR) models require a large number of
parameters to be estimated and may suffer from inferential problems. We propose a
new Bayesian nonparametric (BNP) Lasso prior (BNP-Lasso) for high-dimensional
VAR models that can improve estimation efficiency and prediction accuracy. Our
hierarchical prior overcomes overparametrization and overfitting issues by
clustering the VAR coefficients into groups and by shrinking the coefficients
of each group toward a common location. Clustering and shrinking effects
induced by the BNP-Lasso prior are well suited for the extraction of causal
networks from time series, since they account for some stylized facts in
real-world networks, namely sparsity, community structure, and
heterogeneity in edge intensity. In order to fully capture the richness of
the data and to achieve a better understanding of financial and macroeconomic
risk, it is therefore crucial that the model used to extract the network accounts
for these stylized facts.
Comment: Forthcoming in "Journal of Econometrics". Revised version of the
paper "Bayesian nonparametric Seemingly Unrelated Regression Models".
Supplementary material available on request