Efficiently Clustering Very Large Attributed Graphs
Attributed graphs model real networks by enriching their nodes with
attributes accounting for properties. Several techniques have been proposed for
partitioning these graphs into clusters that are homogeneous with respect to
both semantic attributes and the structure of the graph. However, the time and
space complexities of state-of-the-art algorithms limit their scalability to
medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a
fast and scalable algorithm for partitioning large attributed graphs. The
approach is robust, being compatible both with categorical and with
quantitative attributes, and it is tailorable, allowing the user to weight the
semantic and topological components. Further, the approach does not require the
user to guess in advance the number of clusters. SToC relies on well known
approximation techniques such as bottom-k sketches, traditional graph-theoretic
concepts, and a new perspective on the composition of heterogeneous distance
measures. Experimental results demonstrate its ability to efficiently compute
high-quality partitions of large-scale attributed graphs.
Comment: This work has been published in ASONAM 2017. This version includes an
appendix with validation of our attribute model and distance function,
omitted in the conference version for lack of space. Please refer to the
published versio
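The abstract above names bottom-k sketches as one of the approximation techniques SToC relies on, but does not say how they are used. As a rough illustration of the general technique only (not SToC's implementation), the sketch below keeps the k smallest hash values of a set and estimates the Jaccard similarity of two sets from their sketches. The function names, the choice of BLAKE2b as the hash, and the example sets are all assumptions made for this illustration.

```python
import hashlib


def stable_hash(item):
    """Map an item to a deterministic 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k_sketch(items, k):
    """Return the k smallest distinct hash values of the items, in ascending order."""
    return sorted({stable_hash(x) for x in items})[:k]


def estimate_jaccard(sketch_a, sketch_b, k):
    """Estimate |A ∩ B| / |A ∪ B| from two bottom-k sketches.

    The k smallest hashes of the union are taken from the merged sketches;
    the estimate is the fraction of those hashes present in both sketches.
    """
    set_a, set_b = set(sketch_a), set(sketch_b)
    union_bottom = sorted(set_a | set_b)[:k]
    if not union_bottom:
        return 0.0
    shared = sum(1 for value in union_bottom if value in set_a and value in set_b)
    return shared / len(union_bottom)


if __name__ == "__main__":
    # Hypothetical neighbourhoods of two nodes, used only as example input.
    a = set(range(0, 1000))
    b = set(range(500, 1500))
    k = 128
    print(estimate_jaccard(bottom_k_sketch(a, k), bottom_k_sketch(b, k), k))
```

When k is at least the size of the union, the sketches hold every hash and the estimate equals the exact Jaccard similarity; for smaller k it is an approximation whose accuracy improves as k grows.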
Context Selection on Attributed Graphs for Outlier and Community Detection
Today's applications store large amounts of complex data that combine information of different types. Attributed graphs are an example of such a complex database, where each object is characterized by its relationships to other objects and its individual properties. Specifically, each node in an attributed graph may be characterized by a large number of attributes. In this thesis, we present different approaches for mining such high dimensional attributed graphs
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. The focus of our chapter is to pinpoint the main graph
summarization methods, with particular attention to the most recent approaches
and novel research trends on this topic, not yet covered by previous surveys.
Comment: To appear in the Encyclopedia of Big Data Technologie
A Fast and Efficient Incremental Approach toward Dynamic Community Detection
Community detection is a discovery tool used by network scientists to analyze
the structure of real-world networks. It seeks to identify natural divisions
that may exist in the input networks that partition the vertices into coherent
modules (or communities). While this problem space is rich with efficient
algorithms and software, most of this literature caters to the static use-case
where the underlying network does not change. However, many emerging real-world
use-cases give rise to a need to incorporate dynamic graphs as inputs.
In this paper, we present a fast and efficient incremental approach toward
dynamic community detection. The key contribution is a generic technique called
, which examines the most recent batch of changes made to an
input graph and selects a subset of vertices to reevaluate for potential
community (re)assignment. This technique can be incorporated into any
community detection method that uses modularity as its objective function for
clustering. For demonstration purposes, we incorporated the technique into two
well-known community detection tools. Our experiments demonstrate that our new
incremental approach is able to generate performance speedups without
compromising on the output quality (despite its heuristic nature). For
instance, on a real-world network with 63M temporal edges (over 12 time steps),
our approach was able to complete in 1056 seconds, yielding a 3x speedup over a
baseline implementation. In addition to demonstrating the performance benefits,
we also show how to use our approach to delineate appropriate intervals of
temporal resolutions at which to analyze an input network
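The abstract above refers to modularity as the objective function that the targeted community detection methods optimize. As a minimal sketch of that quantity (the standard Newman-Girvan modularity for an undirected, unweighted graph, not the paper's incremental technique), the function below scores a given partition. The function name, the edge-list input format, and the example graph are assumptions made for this illustration.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, unweighted graph.

    edges:     list of (u, v) pairs, each undirected edge listed once
    community: dict mapping every node to its community label

    Q = sum over communities c of  L_c / m - (d_c / (2 m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra_edges = defaultdict(int)   # L_c
    total_degree = defaultdict(int)  # d_c
    for u, v in edges:
        cu, cv = community[u], community[v]
        total_degree[cu] += 1
        total_degree[cv] += 1
        if cu == cv:
            intra_edges[cu] += 1
    return sum(
        intra_edges[c] / m - (total_degree[c] / (2 * m)) ** 2
        for c in total_degree
    )


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (hypothetical example graph).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(edges, split))
```

Placing all nodes in a single community gives Q = 0, so a positive score indicates more intra-community edges than expected under the degree-preserving null model.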
The Dynamics of Vehicular Networks in Urban Environments
Vehicular Ad hoc NETworks (VANETs) have emerged as a platform to support
intelligent inter-vehicle communication and improve traffic safety and
performance. The road-constrained, high mobility of vehicles, their unbounded
power source, and the emergence of roadside wireless infrastructures make
VANETs a challenging research topic. A key to the development of protocols for
inter-vehicle communication and services lies in the knowledge of the
topological characteristics of the VANET communication graph. This paper
explores the dynamics of VANETs in urban environments and investigates the
impact of these findings in the design of VANET routing protocols. Using both
real and realistic mobility traces, we study the networking shape of VANETs
under different transmission and market penetration ranges. Given that a number
of RSUs have to be deployed for disseminating information to vehicles in an
urban area, we also study their impact on vehicular connectivity. Through
extensive simulations we investigate the performance of VANET routing protocols
by exploiting the knowledge of VANET graph analysis.
Comment: Revised our testbed with even more realistic mobility traces. Used
the location of real Wi-Fi hotspots to simulate RSUs in our study. Used a
larger, real mobility trace set, from taxis in Shanghai. Examine the
implications of our findings in the design of VANET routing protocols by
implementing in ns-3 two routing protocols (GPCR & VADD). Updated the
bibliography section with new research work
Multi-objective community detection applied to social and COVID-19 constructed networks
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. Community Detection plays an integral part in network analysis, as it facilitates understanding the structure and functional characteristics of a network. Communities organize real-world networks into densely connected groups of nodes. This thesis provides a critical analysis of Community Detection and highlights the main areas, including algorithms, evaluation metrics, applications, and datasets in social networks.
After defining the research gap, this thesis proposes two Attribute-Based Label Propagation algorithms that maximize both Modularity and homogeneity. Homogeneity is treated as an objective function in one algorithm and as a constraint in the other. To better capture the homogeneity of real-world networks, a new Penalized Homogeneity degree (PHd) is proposed, which can be easily personalized based on the network characteristics.
For the first time, COVID-19 tracing data are used to form two network datasets: the first is based on the virus transition between the countries of the world, while the second is an attributed network based on the virus transition among contact-tracing cases in the Kingdom of Bahrain. This type of network, which is concerned with tracking a disease, had not previously been formed from COVID-19 data and has never been studied as a community detection problem. The proposed datasets are validated and tested in several experiments. The proposed Penalized Homogeneity measure is personalized and used to evaluate the proposed attributed network.
Extensive experiments and analysis are carried out to evaluate the proposed methods and benchmark the results against other well-known algorithms. The results are compared in terms of Modularity, the proposed PHd, and accuracy measures. The proposed methods achieved the best performance among the compared methods, with 26.6% better performance in Modularity and 33.96% in PHd on the proposed dataset, as well as noteworthy results on benchmarking datasets, with improvements in Modularity of 7.24% and 4.96%, respectively, and proposed PHd values of 27% and 81.9%
A novel clustering methodology based on modularity optimisation for detecting authorship affinities in Shakespearean era plays
© 2016 Naeni et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. In this study we propose a novel, unsupervised clustering methodology for analyzing large datasets. This new, efficient methodology converts the general clustering problem into the community detection problem in graphs by using the Jensen-Shannon distance, a dissimilarity measure originating in Information Theory. Moreover, we use graph-theoretic concepts for the generation and analysis of proximity graphs. Our methodology is based on a newly proposed memetic algorithm (iMA-Net) for discovering clusters of data elements by maximizing the modularity function in proximity graphs of literary works. To test the effectiveness of this general methodology, we apply it to a text corpus dataset, which contains frequencies of approximately 55,114 unique words across all 168 plays written in the Shakespearean era (16th and 17th centuries), to analyze and detect clusters of similar plays. Experimental results and comparison with state-of-the-art clustering methods demonstrate the remarkable performance of our new method for identifying high quality clusters which reflect the commonalities in the literary style of the plays
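The abstract above uses the Jensen-Shannon distance to turn word-frequency data into a proximity graph. As a minimal sketch of that dissimilarity measure alone (the square root of the Jensen-Shannon divergence with base-2 logarithms, which lies in [0, 1]), the function below compares two frequency vectors. The function name, the normalization of raw counts to probabilities, and the example vectors are assumptions made for this illustration; the abstract does not state how the paper preprocesses its frequencies.

```python
import math


def jensen_shannon_distance(p, q):
    """Jensen-Shannon distance between two non-negative frequency vectors.

    Both vectors are first normalized to sum to 1. The result is
    sqrt(0.5 * KL(p || m) + 0.5 * KL(q || m)) with m = (p + q) / 2,
    using base-2 logarithms so that the distance lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p <= 0 or sum_q <= 0:
        raise ValueError("vectors must have a positive total")
    p = [x / sum_p for x in p]
    q = [x / sum_q for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(0.0, divergence))  # guard against tiny negative rounding


if __name__ == "__main__":
    # Hypothetical word counts for two texts over a shared four-word vocabulary.
    text_a = [10, 5, 0, 1]
    text_b = [2, 8, 6, 0]
    print(jensen_shannon_distance(text_a, text_b))
```

Pairwise distances computed this way could then serve as edge weights when building a proximity graph, which is the step the abstract describes before community detection is applied.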