Clones in Graphs
Finding structural similarities in graph data, such as social networks, is a
far-ranging task in data mining and knowledge discovery. A (conceptually)
simple reduction would be to compute the automorphism group of a graph.
However, this approach is ineffective in data mining because real-world data
does not exhibit enough structural regularity. Here we step in with a novel
approach based on mappings that preserve the maximal cliques. For this we
exploit the well-known correspondence between bipartite graphs and formal
contexts, the basic data structure of Formal Concept Analysis. From there we
utilize the notion of clone items. Their investigation is still an open
problem, to which this work adds new insights. Furthermore, we present a
substantial experimental investigation of real-world data. We conclude by
demonstrating the generalization of clone items to permutations.
Comment: 11 pages, 2 figures, 1 table
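As we understand the notion from the FCA literature, two attributes (items) a and b of a formal context are clones if exchanging a and b in every concept intent maps the family of intents onto itself. The following is a minimal brute-force sketch of that definition on a small hypothetical context. It enumerates intents as intersections of object rows and tests every attribute pair. It is not the paper's graph construction or procedure, and the function names are hypothetical.

```python
from itertools import combinations


def intents(context, attributes):
    """Return every concept intent of a formal context.

    context maps each object to the frozenset of attributes it has; the intents
    are the full attribute set plus all intersections of object rows.
    """
    closed = {frozenset(attributes)}  # intent of the empty extent
    for row in context.values():
        # Intersecting the new row with every intent found so far keeps the
        # family closed under intersection.
        closed |= {row & c for c in closed}
    return closed


def are_clones(closed, a, b):
    """True if swapping attributes a and b maps the family of intents onto itself."""
    swap = {a: b, b: a}
    image = {frozenset(swap.get(x, x) for x in c) for c in closed}
    return image == closed


def clone_pairs(context, attributes):
    """Brute-force test of every attribute pair (suitable for small contexts only)."""
    closed = intents(context, attributes)
    return [(a, b) for a, b in combinations(sorted(attributes), 2)
            if are_clones(closed, a, b)]


# Hypothetical context: objects g1 and g2 differ only in attribute a versus b.
ctx = {"g1": frozenset({"a", "c"}), "g2": frozenset({"b", "c"})}
print(clone_pairs(ctx, {"a", "b", "c"}))  # [('a', 'b')]
```

The enumeration is exponential in the worst case. It is meant only to make the definition concrete, not to scale to the real-world data the abstract mentions.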
GCG: Mining Maximal Complete Graph Patterns from Large Spatial Data
Recent research on pattern discovery has progressed from mining frequent
patterns and sequences to mining structured patterns, such as trees and graphs.
Graphs, as a general data structure, can model complex relations among data,
with wide applications in web exploration and social networks. However, mining
large graph patterns is challenging because of the large number of subgraphs.
In this paper, we aim to mine only frequent complete graph patterns. A graph g
in a database is complete if every pair of distinct vertices is connected by a
unique edge. Grid Complete Graph (GCG) is a mining algorithm that explores
pruning techniques for extracting maximal complete graphs from the large
spatial dataset of the Sloan Digital Sky Survey (SDSS). Using a
divide-and-conquer strategy, GCG is highly efficient, especially when the
number of patterns is large. In this paper, we describe how GCG mines not only
simple co-location spatial patterns but also complex ones. To the best of our
knowledge, this is the first algorithm that exploits the extraction of maximal
complete graphs to mine complex co-location patterns in large spatial datasets.
Comment: 1
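The ingredients named in this abstract can be illustrated with a short sketch:

- a grid that splits the plane into cells, so neighbours are sought only in adjacent cells (a simple divide-and-conquer step);
- the complete-graph test;
- enumeration of maximal complete subgraphs.

The sketch rests on stated assumptions. Two objects are treated as neighbours when they lie within a hypothetical distance threshold r, since the abstract does not define the neighbour relation. Maximal complete sets are enumerated with the standard Bron-Kerbosch algorithm. None of this reproduces GCG's own pruning techniques, and all names are illustrative.

```python
import math
from collections import defaultdict
from itertools import combinations


def neighbour_graph(points, r):
    """Link points within distance r, comparing only points in adjacent grid cells.

    points maps an id to (x, y). With cells of side r, any pair within distance r
    falls in the same cell or in one of the eight surrounding cells.
    """
    grid = defaultdict(list)
    for pid, (x, y) in points.items():
        grid[(math.floor(x / r), math.floor(y / r))].append(pid)
    adj = {pid: set() for pid in points}
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids inserting empty cells while iterating over the grid
                for q in grid.get((cx + dx, cy + dy), ()):
                    for p in members:
                        if p != q and math.dist(points[p], points[q]) <= r:
                            adj[p].add(q)
    return adj


def is_complete(adj, vertices):
    """A vertex set induces a complete graph if every distinct pair is adjacent."""
    return all(v in adj[u] for u, v in combinations(vertices, 2))


def maximal_complete_sets(adj):
    """Bron-Kerbosch enumeration of maximal complete subgraphs (maximal cliques)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return found


pts = {"A": (0.0, 0.0), "B": (0.5, 0.0), "C": (0.2, 0.4), "D": (5.0, 5.0)}
adj = neighbour_graph(pts, r=1.0)
print(is_complete(adj, ["A", "B", "C"]))  # True
print(is_complete(adj, ["A", "B", "D"]))  # False
```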
A Regularized Graph Layout Framework for Dynamic Network Visualization
Many real-world networks, including social and information networks, are
dynamic structures that evolve over time. Such dynamic networks are typically
visualized using a sequence of static graph layouts. In addition to providing a
visual representation of the network structure at each time step, the sequence
should preserve the mental map between layouts of consecutive time steps to
allow a human to interpret the temporal evolution of the network. In this
paper, we propose a framework for dynamic network visualization in the on-line
setting where only present and past graph snapshots are available to create the
present layout. The proposed framework creates regularized graph layouts by
augmenting the cost function of a static graph layout algorithm with a grouping
penalty, which discourages nodes from deviating too far from other nodes
belonging to the same group, and a temporal penalty, which discourages large
node movements between consecutive time steps. The penalties increase the
stability of the layout sequence, thus preserving the mental map. We introduce
two dynamic layout algorithms within the proposed framework, namely dynamic
multidimensional scaling (DMDS) and dynamic graph Laplacian layout (DGLL). We
apply these algorithms to several data sets to illustrate the importance of
both grouping and temporal regularization for producing interpretable
visualizations of dynamic networks.
Comment: To appear in Data Mining and Knowledge Discovery, supporting material
(animations and MATLAB toolbox) available at
http://tbayes.eecs.umich.edu/xukevin/visualization_dmkd_201
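The regularized cost described above can be sketched as a static layout cost plus two penalties. The sketch assumes the following, none of which is the paper's exact formulation:

- a 2-D layout, with raw stress against target distances D as the static cost;
- a grouping penalty equal to the squared distance of each node from its group's centroid;
- a temporal penalty equal to the squared displacement from the previous layout;
- weights alpha and beta.

The paper's DMDS and DGLL algorithms use their own optimization procedures. Plain gradient descent is swapped in here only to show how the penalties pull the layout.

```python
import math


def sq_dist(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def centroids(X, groups):
    """Mean position of each group."""
    sums = {}
    for (x, y), g in zip(X, groups):
        sx, sy, k = sums.get(g, (0.0, 0.0, 0))
        sums[g] = (sx + x, sy + y, k + 1)
    return {g: (sx / k, sy / k) for g, (sx, sy, k) in sums.items()}


def layout_cost(X, D, groups, prev, alpha, beta):
    """Stress + alpha * grouping penalty + beta * temporal penalty (2-D layout)."""
    n = len(X)
    stress = sum((math.dist(X[i], X[j]) - D[i][j]) ** 2
                 for i in range(n) for j in range(i + 1, n))
    cent = centroids(X, groups)
    grouping = sum(sq_dist(X[i], cent[groups[i]]) for i in range(n))
    temporal = sum(sq_dist(X[i], prev[i]) for i in range(n))
    return stress + alpha * grouping + beta * temporal


def descend(X, D, groups, prev, alpha, beta, step=0.01, iters=200):
    """Plain gradient descent on layout_cost, starting from X."""
    X = [list(p) for p in X]
    n = len(X)
    for _ in range(iters):
        cent = centroids(X, groups)
        grad = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                dist = math.dist(X[i], X[j])
                if i == j or dist == 0:
                    continue  # skip self-pairs and coincident points
                coef = 2 * (dist - D[i][j]) / dist
                for k in range(2):
                    grad[i][k] += coef * (X[i][k] - X[j][k])
            # Group offsets sum to zero, so the grouping gradient is 2*alpha*(x_i - c).
            for k in range(2):
                grad[i][k] += 2 * alpha * (X[i][k] - cent[groups[i]][k])
                grad[i][k] += 2 * beta * (X[i][k] - prev[i][k])
        for i in range(n):  # simultaneous update after the full gradient is known
            for k in range(2):
                X[i][k] -= step * grad[i][k]
    return [tuple(p) for p in X]


X0 = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]  # layout at t-1, also the starting point
D = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]       # hypothetical target distances at time t
groups = [0, 0, 1]
X1 = descend(X0, D, groups, prev=X0, alpha=1.0, beta=1.0)
print(layout_cost(X0, D, groups, X0, 1.0, 1.0), layout_cost(X1, D, groups, X0, 1.0, 1.0))
```

Raising beta keeps nodes closer to their previous positions, which preserves the mental map. Raising alpha draws members of a group together.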
Edge-based mining of frequent subgraphs from graph streams
In the current era of Big data, high volumes of valuable data can be generated at high velocity from a wide variety of data sources in real-life applications ranging from sensor networks to social networks and from bioinformatics to chemical informatics. Big data are also available in business, education, engineering, finance, healthcare, scientific, telecommunication, and transportation domains. A collection of these data can be viewed as a big dynamic graph structure, in which implicit, previously unknown, and potentially useful knowledge is embedded. Consequently, efficient knowledge discovery algorithms for mining frequent subgraphs from such dynamic, streaming, graph-structured data are in demand. On the one hand, some existing algorithms discover collections of frequently co-occurring edges, which may be disjoint. On the other hand, other existing algorithms discover frequent subgraphs but require very large memory space. With high volumes of Big data, available memory space may be limited. To discover collections of frequently co-occurring connected edges, we present in this paper two efficient algorithms that require small memory space. Evaluation results show the efficiency of our edge-based algorithms in mining frequent subgraphs from graph streams.
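The goal of finding frequently co-occurring connected edges in a stream can be illustrated with a small sliding-window sketch. It assumes the stream arrives as a sequence of graphs (edge lists) and that a pattern is frequent if it occurs in at least minsup of the last `window` graphs. The sketch counts only single edges and pairs of edges that share a vertex, so every counted pattern is connected. It is not either of the paper's algorithms and makes no attempt at their small memory footprint. The class and parameter names are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def norm(u, v):
    """Undirected edge as a sorted tuple."""
    return (u, v) if u <= v else (v, u)


class WindowMiner:
    """Support counts of single edges and connected edge pairs over the last
    `window` graphs of a stream (patterns larger than two edges are not mined)."""

    def __init__(self, window, minsup):
        self.window = window
        self.minsup = minsup
        self.batches = deque()   # patterns contributed by each graph in the window
        self.counts = Counter()  # pattern -> number of graphs in the window containing it

    @staticmethod
    def patterns(edges):
        edges = sorted({norm(u, v) for u, v in edges})
        pats = [frozenset([e]) for e in edges]
        # Keep only pairs that share a vertex, so every pattern is connected.
        pats += [frozenset([e, f]) for e, f in combinations(edges, 2) if set(e) & set(f)]
        return pats

    def add(self, edges):
        pats = self.patterns(edges)
        self.batches.append(pats)
        self.counts.update(pats)
        if len(self.batches) > self.window:
            self.counts.subtract(self.batches.popleft())  # expire the oldest graph

    def frequent(self):
        return {p for p, c in self.counts.items() if c >= self.minsup}


miner = WindowMiner(window=2, minsup=2)
miner.add([("a", "b"), ("b", "c")])
miner.add([("a", "b"), ("c", "b"), ("c", "d")])
miner.add([("d", "c"), ("a", "b")])
# After the first graph expires, only the single edges (a, b) and (c, d) remain frequent.
print(miner.frequent())
```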
Graph search and beyond: SIGIR 2015 workshop summary
Modern Web data is highly structured in terms of entities and relations from large knowledge resources, geo-temporal references and social network structure, resulting in a massive multidimensional graph. This graph essentially unifies both the searcher and the information resources, which played fundamentally different roles in traditional IR, and "Graph Search" offers major new ways to access relevant information. Graph search affects both query formulation (complex queries about entities and relations building on the searcher's context) and result exploration and discovery (slicing and dicing the information using the graph structure) in a completely personalized way. This new graph-based approach introduces great opportunities, but also great challenges, in terms of data quality and data integration, user interface design, and privacy. We view the notion of "graph search" as searching information from your personal point of view (you are the query) over a highly structured and curated information space. This goes beyond the traditional two-term queries and "ten blue links" results that users are familiar with, requiring a highly interactive session covering both query formulation and result exploration. The workshop attracted a range of researchers working on this and related topics, and made concrete progress working together on one of the greatest challenges in the years to come.
A Visual Query Language for Relational Knowledge Discovery
QGRAPH is a visual query language for knowledge discovery in relational data. Using QGRAPH, a user can query and update relational data in ways that support data exploration, data transformation, and sampling. When combined with modeling algorithms, such as those developed in inductive logic programming and relational learning, the language assists the analysis of relational data, such as data drawn from the Web, chemical structure-activity relationships, and social networks. Several features distinguish QGRAPH from other query languages such as SQL and Datalog. It is a visual language, so its queries are annotated graphs that reflect potential structures within a database. QGRAPH treats objects, links, and attributes as first-class entities, so its queries can dynamically alter a data schema by adding and deleting those entities. Finally, the language provides grouping and counting constructs that facilitate the calculation of attributes that can capture features of local graph structure. We describe the language in detail, discuss key aspects of the underlying data model and implementation, and discuss several uses of QGRAPH for knowledge discovery.
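The counting construct mentioned above computes, for each object, a new attribute summarizing its local graph structure. The following plain-Python analog shows the effect of such a construct. It is not QGRAPH syntax or its implementation. The data layout is hypothetical: objects as attribute dictionaries with a "type" key, and links as undirected id pairs. Each object gets a new attribute counting its linked objects of a given type.

```python
from collections import defaultdict


def add_count_attribute(objects, links, target_type, name):
    """Store, for every object, how many linked objects have type target_type.

    objects: id -> attribute dict (each with a "type" key), updated in place.
    links: iterable of (id, id) pairs, treated as undirected.
    """
    counts = defaultdict(int)
    for u, v in links:
        if objects[v]["type"] == target_type:
            counts[u] += 1
        if objects[u]["type"] == target_type:
            counts[v] += 1
    for oid, attrs in objects.items():
        attrs[name] = counts[oid]  # the new attribute extends the data schema
    return objects


objs = {"a1": {"type": "author"}, "p1": {"type": "paper"}, "p2": {"type": "paper"}}
links = [("a1", "p1"), ("a1", "p2"), ("p1", "p2")]
add_count_attribute(objs, links, "paper", "n_papers")
print(objs["a1"]["n_papers"])  # 2
```

A derived attribute like this can then be used as a feature by a relational learning algorithm, which is the kind of combination the abstract describes.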
Estimating Properties of Social Networks via Random Walk considering Private Nodes
Accurately analyzing graph properties of social networks is a challenging
task because of access limitations to the graph data. To address this
challenge, several algorithms to obtain unbiased estimates of properties from
few samples via a random walk have been studied. However, existing algorithms
do not consider private nodes that hide their neighbors in real social
networks, which leads to practical problems. Here we design random walk-based
algorithms that accurately estimate properties without the problems caused by
private nodes. First, we design a random walk-based sampling algorithm that
combines neighbor selection, which yields samples having the Markov property,
with the calculation of a weight for each sample to correct the sampling bias.
Further, for two graph property estimators, we propose weighting methods that
reduce not only the sampling bias but also the estimation errors due to private
nodes. The proposed algorithms improve the estimation accuracy of the existing
algorithms by up to 92.6% on real-world datasets.
Comment: 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
(KDD 2020)
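The bias correction mentioned above can be illustrated with the standard reweighted random-walk estimator. A simple random walk on a connected, undirected graph visits nodes roughly in proportion to their degree, so weighting each sample by 1/degree recovers node averages. This sketch covers only that textbook estimator on a fully visible graph. It does not reproduce the paper's neighbor selection or weighting methods for private nodes. The star graph and the function names are hypothetical.

```python
import random


def random_walk(adj, start, steps, rng):
    """Simple random walk; in the long run it visits each node with probability
    proportional to its degree (connected, undirected graph)."""
    walk, v = [], start
    for _ in range(steps):
        v = rng.choice(adj[v])  # move to a uniformly random neighbour
        walk.append(v)
    return walk


def estimate_mean(adj, walk, f):
    """Ratio estimator of the node average of f: weighting each sample by
    1/degree cancels the walk's bias toward high-degree nodes."""
    num = sum(f(v) / len(adj[v]) for v in walk)
    den = sum(1 / len(adj[v]) for v in walk)
    return num / den


def estimate_average_degree(adj, walk):
    # With f = degree, the ratio simplifies to len(walk) / sum(1 / degree).
    return len(walk) / sum(1 / len(adj[v]) for v in walk)


# Hypothetical star graph: centre 0 with four leaves; true average degree 8/5.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
walk = random_walk(star, start=0, steps=1000, rng=random.Random(7))
print(estimate_average_degree(star, walk))  # 1.6
```

Without the 1/degree weights, the sample mean of degrees on this walk would be 2.5, because the walk spends half its time at the centre.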