84,335 research outputs found
Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of the
aforementioned cases graphs are directed - in the sense that there is
directionality on the edges, making the semantics of the edges non symmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence here is that nodes
of the same community are highly similar while on the contrary, nodes across
communities present low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. Therefore, naturally
there is a recent wealth of research production in the area of mining directed
graphs - with clustering being the primary method and tool for community
detection and evaluation. The goal of this paper is to offer an in-depth review
of the methods presented so far for clustering directed networks along with the
relevant necessary methodological background and also related applications. The
survey commences by offering a concise review of the fundamental concepts and
methodological base on which graph clustering algorithms capitalize on. Then we
present the relevant work along two orthogonal classifications. The first one
is mostly concerned with the methodological principles of the clustering
algorithms, while the second one approaches the methods from the viewpoint
regarding the properties of a good cluster in a directed network. Further, we
present methods and metrics for evaluating graph clustering results,
demonstrate interesting application domains and provide promising future
research directions.Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear
Strongly Hierarchical Factorization Machines and ANOVA Kernel Regression
High-order parametric models that include terms for feature interactions are
applied to various data mining tasks, where ground truth depends on
interactions of features. However, with sparse data, the high- dimensional
parameters for feature interactions often face three issues: expensive
computation, difficulty in parameter estimation and lack of structure. Previous
work has proposed approaches which can partially re- solve the three issues. In
particular, models with factorized parameters (e.g. Factorization Machines) and
sparse learning algorithms (e.g. FTRL-Proximal) can tackle the first two issues
but fail to address the third. Regarding to unstructured parameters,
constraints or complicated regularization terms are applied such that
hierarchical structures can be imposed. However, these methods make the
optimization problem more challenging. In this work, we propose Strongly
Hierarchical Factorization Machines and ANOVA kernel regression where all the
three issues can be addressed without making the optimization problem more
difficult. Experimental results show the proposed models significantly
outperform the state-of-the-art in two data mining tasks: cold-start user
response time prediction and stock volatility prediction.Comment: 9 pages, to appear in SDM'1
Outlier Detection from Network Data with Subnetwork Interpretation
Detecting a small number of outliers from a set of data observations is
always challenging. This problem is more difficult in the setting of multiple
network samples, where computing the anomalous degree of a network sample is
generally not sufficient. In fact, explaining why the network is exceptional,
expressed in the form of subnetwork, is also equally important. In this paper,
we develop a novel algorithm to address these two key problems. We treat each
network sample as a potential outlier and identify subnetworks that mostly
discriminate it from nearby regular samples. The algorithm is developed in the
framework of network regression combined with the constraints on both network
topology and L1-norm shrinkage to perform subnetwork discovery. Our method thus
goes beyond subspace/subgraph discovery and we show that it converges to a
global optimum. Evaluation on various real-world network datasets demonstrates
that our algorithm not only outperforms baselines in both network and high
dimensional setting, but also discovers highly relevant and interpretable local
subnetworks, further enhancing our understanding of anomalous networks
- …