Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. This chapter identifies the main graph
summarization methods, with particular emphasis on the most recent approaches
and novel research trends not yet covered by previous surveys. Comment: To appear in the Encyclopedia of Big Data Technologies.
Efficiently Clustering Very Large Attributed Graphs
Attributed graphs model real networks by enriching their nodes with
attributes accounting for properties. Several techniques have been proposed for
partitioning these graphs into clusters that are homogeneous with respect to
both semantic attributes and the structure of the graph. However, the time and
space complexities of state-of-the-art algorithms limit their scalability to
medium-sized graphs. We propose SToC (for Semantic-Topological Clustering), a
fast and scalable algorithm for partitioning large attributed graphs. The
approach is robust, being compatible both with categorical and with
quantitative attributes, and it is tailorable, allowing the user to weight the
semantic and topological components. Further, the approach does not require the
user to guess in advance the number of clusters. SToC relies on well-known
approximation techniques such as bottom-k sketches, traditional graph-theoretic
concepts, and a new perspective on the composition of heterogeneous distance
measures. Experimental results demonstrate its ability to efficiently compute
high-quality partitions of large-scale attributed graphs. Comment: This work has been published in ASONAM 2017. This version includes an
appendix with validation of our attribute model and distance function,
omitted in the conference version for lack of space. Please refer to the
published version.
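The abstract above names bottom-k sketches as one of the approximation techniques SToC relies on, without detailing how they are used. As a generic illustration only (not SToC's actual procedure), the following minimal sketch shows how a bottom-k sketch can estimate the Jaccard similarity of two sets. The function names, the choice of SHA-1 as the hash, and the example sets are all assumptions made for this illustration.

```python
import hashlib


def bottom_k_sketch(items, k):
    """Keep the k smallest hash values of a set (hypothetical helper)."""
    hashes = {
        int.from_bytes(hashlib.sha1(str(x).encode()).digest()[:8], "big")
        for x in items
    }
    return sorted(hashes)[:k]


def estimate_jaccard(sketch_a, sketch_b, k):
    """Estimate Jaccard similarity from two bottom-k sketches.

    The k smallest hashes of the union are taken, and the fraction of
    them present in both sketches is returned.
    """
    a, b = set(sketch_a), set(sketch_b)
    merged = sorted(a | b)[:k]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in a and h in b)
    return shared / len(merged)


# Illustrative neighbourhoods of two nodes (made-up data).
neighbours_u = {1, 2, 3, 4}
neighbours_v = {3, 4, 5, 6}
k = 8
estimate = estimate_jaccard(
    bottom_k_sketch(neighbours_u, k), bottom_k_sketch(neighbours_v, k), k
)
```

When both sets are smaller than k, as here, the sketches hold every hash and the estimate coincides with the exact Jaccard similarity; for larger sets the sketch trades accuracy for a fixed memory footprint.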
Community detection of political blogs network based on structure-attribute graph clustering model
Complex networks provide a means of representing different kinds of networks with multiple features. Most biological, sensor, and social networks can be represented as graphs according to the pattern of connections among their elements. The goal of graph clustering is to divide a large graph into many clusters based on various similarity criteria. The political blogs network is a standard social network dataset that can be viewed as a set of blog-to-blog connections, in which each node has a political leaning in addition to other attributes. The main objective of this work is to introduce a graph clustering method for social network analysis. The proposed Structure-Attribute Similarity method (SAS-Cluster) is able to detect community structures based on node similarities. The method combines topological structure with multiple node characteristics to obtain the final similarity. The proposed method is evaluated using two well-known measures, density and entropy. Finally, the presented method is compared with a state-of-the-art method, and the results show that the proposed method is superior according to these evaluation measures.
Algorithms to measure diversity and clustering in social networks through dot product graphs.
Social networks are often analyzed through a graph model of the network. The dot product model assumes that two individuals are connected in the social network if their attributes or opinions are similar. In the model, a d-dimensional vector a_v represents the extent to which individual v has each of a set of d attributes or opinions. Then two individuals u and v are assumed to be friends, that is, they are connected in the graph model, if and only if a_u · a_v ≥ t, for some fixed, positive threshold t. The resulting graph is called a d-dot product graph.
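The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming each individual is given as a tuple of d numbers; the function name `dot_product_graph` and the example vectors are hypothetical and not taken from the paper.

```python
from itertools import combinations


def dot_product_graph(vectors, t):
    """Return the edge set of the dot product graph with threshold t.

    vectors maps each individual to its d-dimensional attribute vector;
    u and v are joined if and only if a_u . a_v >= t.
    """
    edges = set()
    for u, v in combinations(sorted(vectors), 2):
        dot = sum(x * y for x, y in zip(vectors[u], vectors[v]))
        if dot >= t:
            edges.add((u, v))
    return edges


# Made-up 2-dimensional opinion vectors for three individuals.
vectors = {
    "alice": (1.0, 0.0),
    "bob": (0.8, 0.6),
    "carol": (0.0, 1.0),
}
edges = dot_product_graph(vectors, t=0.5)
```

With these vectors, alice and bob are connected (dot product 0.8), as are bob and carol (0.6), while alice and carol are not (0.0).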
We consider two measures for diversity and clustering in social networks by using a d-dot product graph model for the network. Diversity is measured through the size of the largest independent set of the graph, and clustering is measured through the size of the largest clique. We obtain a tight result for the diversity problem, namely that it is polynomial-time solvable for d = 2, but NP-complete for d ≥ 3. We show that the clustering problem is polynomial-time solvable for d = 2. To our knowledge, these results are also the first on the computational complexity of combinatorial optimization problems on dot product graphs.
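To make the two measures concrete, the following brute-force sketch computes the size of the largest clique and of the largest independent set on a tiny graph. It runs in exponential time and is not the polynomial-time algorithm the authors obtain for d = 2; the helper name `largest_subset_size` and the example graph are assumptions made for illustration.

```python
from itertools import combinations


def largest_subset_size(nodes, edges, want_adjacent):
    """Size of the largest subset whose pairs are all adjacent
    (want_adjacent=True, a clique) or all non-adjacent
    (want_adjacent=False, an independent set). Brute force."""
    adjacent = {frozenset(e) for e in edges}
    for k in range(len(nodes), 0, -1):
        for subset in combinations(nodes, k):
            pairs = combinations(subset, 2)
            if all((frozenset(p) in adjacent) == want_adjacent for p in pairs):
                return k
    return 0


# A made-up three-node path: alice - bob - carol.
nodes = ["alice", "bob", "carol"]
edges = [("alice", "bob"), ("bob", "carol")]
clustering = largest_subset_size(nodes, edges, want_adjacent=True)
diversity = largest_subset_size(nodes, edges, want_adjacent=False)
```

On this path, both measures equal 2: any edge is a largest clique, and the two endpoints form a largest independent set.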
We also consider the situation when two individuals are connected if their preferences are not opposite. This leads to a variant of the standard dot product graph model by taking the threshold t to be zero. We prove in this case that the diversity problem is polynomial-time solvable for any fixed d.
A survey of outlier detection methodologies
Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can also identify errors and remove their contaminating effect on the data set, thereby purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of Computer Science and Statistics. In this paper, we present a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review.