Considerations about multistep community detection
Community detection in networks and its implications have attracted
considerable attention because of important applications in both natural and
social sciences. A number of algorithms have been developed to solve this problem,
addressing either speed optimization or the quality of the partitions
calculated. In this paper we propose a multi-step procedure bridging the
fastest but less accurate algorithms (coarse clustering) with the slowest but
most effective ones (refinement). By adopting a heuristic ranking of the nodes
and classifying a fraction of them as "critical", a refinement step can be
restricted to this subset of the network, thus saving computational time.
Preliminary numerical results are discussed, showing improvement of the final
partition.
Comment: 12 page
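The coarse-then-refine idea in this abstract can be sketched roughly as follows. The abstract does not say which ranking heuristic or refinement algorithm is used, so everything here is an assumption made for illustration: nodes are ranked by the share of their neighbours lying outside their own community, the top `fraction` of them are treated as critical, and refinement greedily moves each critical node to the neighbouring community that most increases Newman modularity. The names `modularity`, `critical_nodes`, and `refine` are hypothetical.

```python
import math
from collections import defaultdict


def modularity(adj, part):
    """Newman modularity of partition `part` on an undirected, unweighted graph."""
    m2 = sum(len(nbrs) for nbrs in adj.values())  # twice the number of edges
    inside = defaultdict(int)  # internal edge endpoints per community
    degree = defaultdict(int)  # total degree per community
    for u, nbrs in adj.items():
        degree[part[u]] += len(nbrs)
        inside[part[u]] += sum(1 for v in nbrs if part[v] == part[u])
    return sum(inside[c] / m2 - (degree[c] / m2) ** 2 for c in degree)


def critical_nodes(adj, part, fraction):
    """Assumed heuristic: rank nodes by the share of neighbours in other communities."""
    def score(u):
        nbrs = adj[u]
        if not nbrs:
            return 0.0
        return sum(1 for v in nbrs if part[v] != part[u]) / len(nbrs)

    ranked = sorted(adj, key=lambda u: (-score(u), u))
    return ranked[:max(1, math.ceil(fraction * len(ranked)))]


def refine(adj, part, fraction=0.25):
    """Slow, accurate step applied only to the critical subset of nodes."""
    part = dict(part)
    for u in critical_nodes(adj, part, fraction):
        best_c, best_q = part[u], modularity(adj, part)
        for c in sorted({part[v] for v in adj[u]}):
            trial = dict(part)
            trial[u] = c
            q = modularity(adj, trial)
            if q > best_q + 1e-12:
                best_c, best_q = c, q
        part[u] = best_c
    return part


# Two triangles joined by the edge 2-3; the coarse pass misplaces node 2.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
coarse = {0: 0, 1: 0, 2: 1, 3: 1, 4: 1, 5: 1}
refined = refine(adj, coarse, fraction=0.25)
```

In this toy graph only two of the six nodes are examined during refinement, which is where the time saving would come from on a larger network.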
Embedding Graphs under Centrality Constraints for Network Visualization
Visual rendering of graphs is a key task in the mapping of complex network
data. Although most graph drawing algorithms emphasize aesthetic appeal,
certain applications such as travel-time maps place more importance on
visualization of structural network properties. The present paper advocates two
graph embedding approaches with centrality considerations to comply with node
hierarchy. The problem is formulated first as one of constrained
multi-dimensional scaling (MDS), and it is solved via block coordinate descent
iterations with successive approximations and guaranteed convergence to a KKT
point. In addition, a regularization term enforcing graph smoothness is
incorporated with the goal of reducing edge crossings. A second approach
leverages the locally linear embedding (LLE) algorithm, which assumes that the
graph encodes data sampled from a low-dimensional manifold. Closed-form
solutions to the resulting centrality-constrained optimization problems are
determined, yielding meaningful embeddings. Experimental results demonstrate the
efficacy of both approaches, especially for visualizing large networks on the
order of thousands of nodes.
Comment: Submitted to IEEE Transactions on Visualization and Computer Graphic
Complex Networks and Symmetry I: A Review
In this review we establish various connections between complex networks and
symmetry. While special types of symmetries (e.g., automorphisms) are studied
in detail within discrete mathematics for particular classes of deterministic
graphs, the analysis of more general symmetries in real complex networks is far
less developed. We argue that real networks, like any entity characterized by
imperfections or errors, necessarily require a stochastic notion of invariance.
We therefore propose a definition of stochastic symmetry based on graph
ensembles and use it to review the main results of network theory from an
unusual perspective. The results discussed here and in a companion paper show
that stochastic symmetry highlights the most informative topological properties
of real networks, even in noisy situations inaccessible to exact techniques.
Comment: Final accepted versio
Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of
these cases the graphs are directed, in the sense that the edges carry a
direction, which makes the semantics of the edges asymmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence is that nodes
of the same community are highly similar, whereas nodes in different
communities show low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. Naturally, a wealth
of recent research addresses the mining of directed graphs, with clustering
being the primary method and tool for community detection and evaluation.
The goal of this paper is to offer an in-depth review
of the methods presented so far for clustering directed networks along with the
relevant necessary methodological background and also related applications. The
survey commences by offering a concise review of the fundamental concepts and
the methodological base on which graph clustering algorithms capitalize. Then we
present the relevant work along two orthogonal classifications. The first one
is mostly concerned with the methodological principles of the clustering
algorithms, while the second one approaches the methods from the viewpoint
of the properties of a good cluster in a directed network. Further, we
present methods and metrics for evaluating graph clustering results,
demonstrate interesting application domains and provide promising future
research directions.
Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear
Finding Streams in Knowledge Graphs to Support Fact Checking
The volume and velocity of information generated online outpace the rate at
which current journalistic practices can fact-check claims.
Computational approaches to fact checking may be key to mitigating the
risks of massive misinformation spread. Such approaches can be designed to not
only be scalable and effective at assessing veracity of dubious claims, but
also to boost a human fact checker's productivity by surfacing relevant facts
and patterns to aid their analysis. To this end, we present a novel,
unsupervised network-flow based approach to determine the truthfulness of a
statement of fact expressed in the form of a (subject, predicate, object)
triple. We view a knowledge graph of background information about real-world
entities as a flow network, and knowledge as a fluid, abstract commodity. We
show that computational fact checking of such a triple then amounts to finding
a "knowledge stream" that emanates from the subject node and flows toward the
object node through paths connecting them. Evaluation on a range of real-world
and hand-crafted datasets of facts related to entertainment, business, sports,
geography and more reveals that this network-flow model can be very effective
in discerning true statements from false ones, outperforming existing
algorithms on many test cases. Moreover, the model is expressive in its ability
to automatically discover several useful path patterns and surface relevant
facts that may help a human fact checker corroborate or refute a claim.
Comment: Extended version of the paper in proceedings of ICDM 201
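To make the flow-network view concrete, here is a minimal sketch that computes the maximum flow between a subject node and an object node with the textbook Edmonds-Karp algorithm. It is not the paper's model: the abstract does not say how edge capacities are derived or which flow formulation is used, so the integer capacities, the node names, and the function `max_flow` are all hypothetical.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a directed graph with integer capacities."""
    if source == sink:
        raise ValueError("source and sink must differ")
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse edge exists
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck = None
        v = sink
        while parent[v] is not None:
            u = parent[v]
            c = residual[u][v]
            bottleneck = c if bottleneck is None else min(bottleneck, c)
            v = u
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


# Hypothetical toy graph: two intermediate entities link subject and object.
graph = {
    "subject": {"a": 3, "b": 2},
    "a": {"object": 2, "b": 1},
    "b": {"object": 3},
}
support = max_flow(graph, "subject", "object")
no_support = max_flow(graph, "subject", "unrelated")
```

Under this reading, a triple whose subject and object are well connected receives a large flow value, while a pair with no connecting path receives zero.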
LINE: Large-scale Information Network Embedding
This paper studies the problem of embedding very large information networks
into low-dimensional vector spaces, which is useful in many tasks such as
visualization, node classification, and link prediction. Most existing graph
embedding methods do not scale to real-world information networks, which
usually contain millions of nodes. In this paper, we propose a novel network
embedding method called "LINE," which is suitable for arbitrary types of
information networks: undirected, directed, and/or weighted. The method
optimizes a carefully designed objective function that preserves both the local
and global network structures. An edge-sampling algorithm is proposed that
addresses the limitation of classical stochastic gradient descent and
improves both the effectiveness and the efficiency of the inference. Empirical
experiments demonstrate the effectiveness of LINE on a variety of real-world
information networks, including language networks, social networks, and
citation networks. The algorithm is very efficient: it can learn the
embedding of a network with millions of vertices and billions of edges in a few
hours on a typical single machine. The source code of LINE is available
online.
Comment: WWW 201
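One way to read the edge-sampling idea is to draw edges with probability proportional to their weights and then treat each drawn edge as a unit-weight edge in the gradient step, so that large weights no longer scale the gradient directly. The sketch below covers only the sampling part, using the standard alias method for constant-time draws. The abstract does not name the sampler, so the alias table and the helper names `build_alias` and `sample_index` are assumptions made for illustration.

```python
import random


def build_alias(weights):
    """Build alias tables so index i is drawn with probability weights[i] / sum(weights)."""
    n = len(weights)
    total = sum(weights)
    prob = [w * n / total for w in weights]
    alias = [0] * n
    small = [i for i, p in enumerate(prob) if p < 1.0]
    large = [i for i, p in enumerate(prob) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        alias[s] = l
        prob[l] -= 1.0 - prob[s]  # l donates the mass that fills s's bucket
        (small if prob[l] < 1.0 else large).append(l)
    for i in small + large:  # leftovers are full buckets up to rounding
        prob[i] = 1.0
    return prob, alias


def sample_index(prob, alias, rng):
    """Draw one index in O(1): pick a bucket, then keep it or take its alias."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


# Hypothetical weighted edges: (source, target, weight).
edges = [("u", "v", 1.0), ("u", "w", 3.0)]
prob, alias = build_alias([w for _, _, w in edges])
rng = random.Random(0)
draws = [sample_index(prob, alias, rng) for _ in range(20000)]
heavy_share = draws.count(1) / len(draws)  # should be close to 3 / 4
```

Each sampled index would then select one edge for a unit-weight update of the two endpoint embeddings.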
Are there any good digraph width measures?
Several different measures for digraph width have appeared in the last few
years. However, none of them shares all the "nice" properties of treewidth:
first, being algorithmically useful, i.e., admitting polynomial-time
algorithms for all MS1-definable problems on digraphs of bounded width; and
second, having nice structural properties, i.e., being monotone under
taking subdigraphs and some form of arc contractions. As for the former,
(undirected) MS1 seems to be the least common denominator of all reasonably
expressive logical languages on digraphs that can speak about the edge/arc
relation on the vertex set. The latter property is a necessary condition for a
width measure to be characterizable by some version of the cops-and-robber game
characterizing the ordinary treewidth. Our main result is that any
reasonable algorithmically useful and structurally nice digraph measure cannot
be substantially different from the treewidth of the underlying undirected
graph. Moreover, we introduce directed topological minors and argue that
they are the weakest useful notion of minors for digraphs.
Large Graph Analysis in the GMine System
Current applications have produced graphs on the order of hundreds of
thousands of nodes and millions of edges. To take advantage of such graphs, one
must be able to find patterns, outliers and communities. These tasks are better
performed in an interactive environment, where human expertise can guide the
process. For large graphs, though, there are some challenges: the excessive
processing requirements are prohibitive, and drawing hundreds of thousands of
nodes results in cluttered images that are hard to comprehend. To cope with
these problems, we propose an innovative framework suited for any kind of
tree-like graph visual design. GMine integrates (a) a representation for graphs organized as
hierarchies of partitions - the concepts of SuperGraph and Graph-Tree; and (b)
a graph summarization methodology - CEPS. Our graph representation deals with
the problem of tracing the connection aspects of a graph hierarchy with
sublinear complexity, allowing one to grasp the neighborhood of a single node or
of a group of nodes in a single click. As a proof of concept, the visual
environment of GMine is instantiated as a system in which large graphs can be
investigated globally and locally.