Spectral Detection on Sparse Hypergraphs
We consider the problem of assigning nodes to communities from a
set of hyperedges, where every hyperedge is a noisy observation of the
community assignment of the adjacent nodes. We focus in particular on the
sparse regime where the number of edges is of the same order as the number of
vertices. We propose a spectral method based on a generalization of the
non-backtracking Hashimoto matrix to hypergraphs. We analyze its performance
on a planted generative model and compare it with other spectral methods and
with Bayesian belief propagation (which was conjectured to be asymptotically
optimal for this model). We conclude that the proposed spectral method detects
communities whenever belief propagation does, while having the important
advantages of being simpler, entirely nonparametric, and able to learn the
rule according to which the hyperedges were generated without prior
information. Comment: 8 pages, 5 figures
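To make the underlying matrix concrete, here is a minimal Python sketch of the standard non-backtracking (Hashimoto) matrix for an ordinary graph. It is not the hypergraph generalization proposed in this work, whose definition the abstract does not give. B is indexed by directed edges, with B[(u→v),(w→x)] = 1 when v = w and x ≠ u. The function names `non_backtracking_matrix` and `leading_eigenvalue` (plain power iteration) are illustrative assumptions.

```python
from itertools import combinations


def non_backtracking_matrix(edges):
    """Build the Hashimoto matrix of an undirected graph.

    Rows/columns are indexed by directed edges (both orientations of each
    undirected edge). B[(u->v), (w->x)] = 1 iff v == w and x != u, i.e. the
    walk continues from v without immediately stepping back to u.
    """
    darcs = []
    for u, v in edges:
        darcs.extend([(u, v), (v, u)])
    m = len(darcs)
    B = [[0] * m for _ in range(m)]
    for i, (u, v) in enumerate(darcs):
        for j, (w, x) in enumerate(darcs):
            if v == w and x != u:
                B[i][j] = 1
    return darcs, B


def leading_eigenvalue(B, iters=200):
    """Rough power-iteration estimate of the spectral radius of a nonnegative B.

    The iterate is rescaled to max-norm 1 each step, so the new max-norm is the
    growth factor. Returns 0.0 if the iterate vanishes (B nilpotent, e.g. for a
    tree). The estimate may oscillate instead of converging when B is periodic.
    """
    m = len(B)
    vec = [1.0] * m
    lam = 0.0
    for _ in range(iters):
        nxt = [sum(B[i][j] * vec[j] for j in range(m)) for i in range(m)]
        lam = max((abs(x) for x in nxt), default=0.0)
        if lam == 0.0:
            return 0.0
        vec = [x / lam for x in nxt]
    return lam


if __name__ == "__main__":
    k4 = list(combinations(range(4), 2))  # complete graph on 4 nodes, 3-regular
    _, B = non_backtracking_matrix(k4)
    print(leading_eigenvalue(B))  # every row sums to d - 1 = 2, so prints 2.0
```

For a d-regular graph every row of B sums to d - 1, so the leading eigenvalue is d - 1; for a tree, non-backtracking walks die out, B is nilpotent, and the estimate drops to 0.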
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled) (directed) graph, or a generalization
of one, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give a historical overview of their main development, and study the main current
systems that implement them.
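As a toy illustration of data modeled as a labeled directed graph and queried by graph-oriented operations, the Python sketch below stores edges as (source, label, target) triples and answers a path query by following a sequence of edge labels. The class `LabeledGraph` and its methods are hypothetical and do not correspond to any particular graph database system or query language.

```python
from collections import defaultdict


class LabeledGraph:
    """Toy in-memory labeled, directed graph (hypothetical, not a real system)."""

    def __init__(self):
        self.out = defaultdict(list)  # node -> list of (label, target) pairs

    def add_edge(self, src, label, dst):
        self.out[src].append((label, dst))

    def neighbors(self, node, label):
        """Targets reachable from `node` over a single edge carrying `label`."""
        return [dst for lab, dst in self.out.get(node, []) if lab == label]

    def path_query(self, start, labels):
        """Nodes reachable from `start` by following the label sequence `labels`."""
        frontier = {start}
        for label in labels:
            frontier = {dst for node in frontier for dst in self.neighbors(node, label)}
        return frontier


if __name__ == "__main__":
    g = LabeledGraph()
    g.add_edge("alice", "knows", "bob")
    g.add_edge("bob", "knows", "carol")
    g.add_edge("carol", "worksAt", "acme")
    print(g.path_query("alice", ["knows", "knows", "worksAt"]))  # {'acme'}
```

The path query is a simple breadth-wise traversal over a label sequence; it only hints at the kind of operation a graph query language expresses declaratively.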
Concept Based Document Clustering using a Simplicial Complex, a Hypergraph
This thesis evaluates the effectiveness of using a combinatorial topology structure (a simplicial complex) for document clustering. It is believed that a simplicial complex identifies the latent concept space defined by a collection of documents better than hypergraphs or human categorization do. The complex is constructed from groups of co-occurring words (term associations) identified using traditional data mining methods. Disjoint subsections of the complex (connected components) represent general concepts within the documents' concept space, and documents clustered to these connected components would produce meaningful groupings. Instead, the most specific concepts (maximal simplices) are used as representative connected components to demonstrate this technique's effectiveness. Each document in a cluster is compared against its human-assigned category to determine the cluster's precision. It is shown that this technique clusters documents better than human classifiers do.
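The steps described above (term associations as simplices, connected components as general concepts, maximal simplices as the specific concepts documents are clustered to) can be sketched in Python. This is a minimal sketch under stated assumptions: the term associations are given as input rather than mined, and the rule that assigns each document to the cluster sharing the most terms with it (`assign_document`) is a hypothetical choice, not necessarily the one used in the thesis.

```python
def maximal_simplices(simplices):
    """Keep the term sets not strictly contained in any other term set."""
    sets = {frozenset(s) for s in simplices}
    return [s for s in sets if not any(s < t for t in sets)]


def connected_components(simplices):
    """Group terms into components: simplices sharing a term are connected."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s in simplices:
        if not s:
            continue  # ignore empty term sets
        first, *rest = list(s)
        root = find(first)
        for t in rest:
            parent[find(t)] = root  # union every term of s with the first one

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return [frozenset(g) for g in groups.values()]


def assign_document(doc_terms, clusters):
    """Hypothetical rule: the cluster sharing the most terms with the document."""
    doc = set(doc_terms)
    return max(clusters, key=lambda c: len(c & doc))


if __name__ == "__main__":
    associations = [{"graph", "node"}, {"graph", "node", "edge"}, {"stock", "price"}]
    clusters = maximal_simplices(associations)
    print(assign_document(["graph", "edge", "market"], clusters))
```

Here the maximal simplices play the role of the representative clusters; swapping in the connected components instead would give the coarser, general-concept grouping.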
Risk-Averse Matchings over Uncertain Graph Databases
A large number of applications, such as querying sensor networks and
analyzing protein-protein interaction (PPI) networks, rely on mining uncertain
graph and hypergraph databases. In this work we study the following problem:
given an uncertain, weighted (hyper)graph, how can we efficiently find a
(hyper)matching with high expected reward and low risk?
This problem naturally arises in the context of several important
applications, such as online dating, kidney exchanges, and team formation. We
introduce a novel formulation for finding matchings with maximum expected
reward and bounded risk under a general model of uncertain weighted
(hyper)graphs that we introduce in this work. Our model generalizes
probabilistic models used in prior work and captures both continuous and
discrete probability distributions, thus allowing us to handle privacy-related
applications that inject appropriately distributed noise into (hyper)edge
weights. Given that our optimization problem is NP-hard, we turn our attention
to designing efficient approximation algorithms. For the case of uncertain
weighted graphs, we provide a -approximation algorithm, and a
-approximation algorithm with near optimal run time. For the case
of uncertain weighted hypergraphs, we provide a
-approximation algorithm, where is the rank of the
hypergraph (i.e., any hyperedge includes at most nodes), that runs in
almost (modulo log factors) linear time.
We complement our theoretical results by testing our approximation algorithms
on a wide variety of synthetic experiments, which reveal, in a controlled
setting, interesting findings on the trade-off between reward and risk. We also
apply our formulation to recommending teams that are likely to collaborate and
have high impact. Comment: 25 pages
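The abstract does not spell out its approximation algorithms, so the Python sketch below swaps in a different, simple greedy heuristic purely to make the reward/risk trade-off tangible. It assumes each (hyper)edge carries an independent reward with a known mean and standard deviation, discards (hyper)edges whose standard deviation exceeds `risk_bound` (a hypothetical per-edge risk cap), and then greedily picks node-disjoint (hyper)edges by decreasing mean reward. The function names and the risk model are illustrative assumptions, not the paper's formulation.

```python
import math


def risk_filtered_greedy_matching(hyperedges, risk_bound):
    """Greedy (hyper)matching with a per-edge risk cap (illustrative heuristic).

    hyperedges: iterable of (nodes, mean_reward, std_dev) tuples.
    """
    candidates = [h for h in hyperedges if h[2] <= risk_bound]
    candidates.sort(key=lambda h: h[1], reverse=True)  # highest mean first
    used, matching = set(), []
    for nodes, mean, std in candidates:
        if used.isdisjoint(nodes):  # a matching may not reuse any node
            matching.append((tuple(sorted(nodes)), mean, std))
            used.update(nodes)
    return matching


def matching_stats(matching):
    """Expected total reward and its std dev, assuming independent edge rewards."""
    mean = sum(m for _, m, _ in matching)
    std = math.sqrt(sum(s * s for _, _, s in matching))
    return mean, std


if __name__ == "__main__":
    edges = [
        ({"a", "b"}, 10.0, 5.0),
        ({"b", "c"}, 8.0, 1.0),
        ({"c", "d", "e"}, 6.0, 0.5),
        ({"a", "d"}, 3.0, 0.2),
    ]
    for bound in (2.0, 10.0):
        m = risk_filtered_greedy_matching(edges, bound)
        print(bound, m, matching_stats(m))
```

On the sample data, raising `risk_bound` from 2.0 to 10.0 admits the high-variance edge {a, b}, which raises the expected reward from 11.0 to 16.0 while the standard deviation grows from about 1.02 to about 5.02.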