On Graph Stream Clustering with Side Information
Graph clustering has become an important problem due to emerging applications
involving the web, social networks and bio-informatics. Recently, many such
applications generate data in the form of streams. Clustering massive, dynamic
graph streams is significantly challenging because of the complex structures of
graphs and computational difficulties of continuous data. Meanwhile, a large
volume of side information is associated with graphs, which can be of various
types. The examples include the properties of users in social network
activities, the meta attributes associated with web click graph streams and the
location information in mobile communication networks. Such attributes contain
extremely useful information and have the potential to improve the clustering
process, but are neglected by most recent graph stream mining techniques. In
this paper, we define a unified distance measure on both link structures and
side attributes for clustering. In addition, we propose a novel optimization
framework DMO, which can dynamically optimize the distance metric and make it
adapt to newly received stream data. We further introduce carefully
designed statistics, SGS(C), which consume constant storage space as the
stream progresses. We demonstrate that the statistics maintained are
sufficient for both the clustering process and the distance optimization, and
that they scale to massive graphs with side attributes. We present
experimental results showing the advantages of the approach over the baselines
in graph stream clustering with both links and side information.
Comment: Full version of SIAM SDM 2013 paper
Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution
The degree distribution is one of the most fundamental graph properties of
interest for real-world graphs. It has been widely observed in numerous domains
that graphs typically have a tailed or scale-free degree distribution. While
the average degree is usually quite small, the variance is quite high and there
are vertices with degrees at all scales. We focus on the problem of
approximating the degree distribution of a large streaming graph, with small
storage. We design an algorithm headtail, whose main novelty is a new estimator
of infrequent degrees using truncated geometric random variables. We give a
mathematical analysis of headtail and show that it has excellent behavior in
practice. We can process streams with millions of edges using storage of less than
1% and get extremely accurate approximations for all scales in the degree
distribution.
We also introduce a new notion of Relative Hausdorff distance between tailed
histograms. Existing notions of distances between distributions are not
suitable, since they ignore infrequent degrees in the tail. The Relative
Hausdorff distance measures deviations at all scales, and is a more suitable
distance for comparing degree distributions. By tracking this new measure, we
are able to give strong empirical evidence of the convergence of headtail
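As a rough illustration of a distance that checks deviations at every scale, the sketch below compares two complementary cumulative degree histograms. The abstract does not state the definition of the Relative Hausdorff distance, so the condition used here is an assumption: the smallest eps such that every degree d in one histogram has a degree d' within a factor (1 ± eps) of d whose count differs by at most eps times the count at d, and vice versa. The function names `ccdh`, `rh_within` and `rh_distance` are hypothetical.

```python
def ccdh(degrees):
    """Complementary cumulative degree histogram: F[d] = #vertices with degree >= d."""
    max_d = max(degrees)
    counts = [0] * (max_d + 1)
    for d in degrees:
        counts[d] += 1
    hist, running = {}, 0
    for d in range(max_d, 0, -1):
        running += counts[d]
        hist[d] = running
    return hist


def _one_sided(F, G, eps):
    # Every (d, F[d]) needs a nearby degree d2 in G with a relatively close count.
    for d, fd in F.items():
        lo, hi = (1 - eps) * d, (1 + eps) * d
        if not any(lo <= d2 <= hi and abs(fd - gd) <= eps * fd
                   for d2, gd in G.items()):
            return False
    return True


def rh_within(F, G, eps):
    """True if F and G are eps-close under the assumed two-sided condition."""
    return _one_sided(F, G, eps) and _one_sided(G, F, eps)


def rh_distance(F, G, tol=1e-6):
    """Smallest eps (up to tol) with rh_within(F, G, eps), found by bisection."""
    if rh_within(F, G, 0.0):
        return 0.0
    hi = 1.0
    while not rh_within(F, G, hi):  # the condition is monotone in eps
        hi *= 2
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rh_within(F, G, mid):
            hi = mid
        else:
            lo = mid
    return hi
```

Because both the degree window and the count tolerance are relative, a mismatch at a rare high degree weighs as much as one at a common low degree, which is the property the abstract asks of such a measure.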
A Simple Attack on Some Clock-Controlled Generators
We present a new approach to edit distance attacks on certain
clock-controlled generators, which applies basic concepts of Graph Theory to
simplify the search trees of the original attacks in such a way that only the
most promising branches are analyzed. In particular, the proposed improvement
is based on cut sets defined on some graphs so that certain shortest paths
provide the edit distances. The strongest aspects of the proposal are that the
obtained results from the attack are absolutely deterministic, and that many
inconsistent initial states of the target registers are recognized beforehand
and avoided during search
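The idea that a shortest path can provide an edit distance can be sketched with the standard Levenshtein distance, which is a stand-in here and not the constrained edit distance or the cut-set construction of the paper. Nodes of the assumed graph are index pairs (i, j); deletions and insertions cost 1, and diagonal moves cost 0 on a match and 1 on a substitution. The function name is hypothetical.

```python
import heapq


def edit_distance_shortest_path(a, b):
    """Levenshtein distance computed as a shortest path with Dijkstra's algorithm."""
    target = (len(a), len(b))
    dist = {(0, 0): 0}
    heap = [(0, 0, 0)]  # entries are (cost so far, i, j)
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == target:
            return d
        if d > dist.get((i, j), float("inf")):
            continue  # stale heap entry
        moves = []
        if i < len(a):
            moves.append((i + 1, j, 1))  # delete a[i]
        if j < len(b):
            moves.append((i, j + 1, 1))  # insert b[j]
        if i < len(a) and j < len(b):
            moves.append((i + 1, j + 1, 0 if a[i] == b[j] else 1))  # match or substitute
        for ni, nj, w in moves:
            nd = d + w
            if nd < dist.get((ni, nj), float("inf")):
                dist[(ni, nj)] = nd
                heapq.heappush(heap, (nd, ni, nj))
    return dist[target]
```

Dijkstra expands cheap prefixes first, so high-cost branches of the search are never explored once the target is reached, which loosely mirrors the goal of analyzing only the most promising branches.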
THE POLITICAL ROBUSTNESS IN INDONESIA
The results of the 2004 Indonesian legislative election are analyzed and compared with those of the previous election (1999). The analysis uses graph theory: Euclidean distances among political parties are computed and then treated in ultrametric spaces by means of the minimum spanning tree algorithm. From the resulting hierarchical taxonomy of Indonesian political parties, we show that some patterns emerge and that they agree with the classical anthropological analysis of the socio-political system in Indonesia. This accentuates the robustness of Indonesian political society as a self-organized system evolving toward a critical state. Small perturbations, such as a different voting process, result statistically in the same patterns, which emerge from a social structure based upon political streams: Islamic, secular, traditional, and some complements of all
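The pipeline described above (Euclidean distances, then a minimum spanning tree, then an ultrametric) can be sketched as follows. This is a minimal sketch assuming each party is represented by a numeric vector; the vectors used in the usage note are made-up toy points, not election data, and the function name is hypothetical. The ultrametric distance between two items is taken to be the largest edge on the tree path joining them, which Kruskal's algorithm yields directly at each merge.

```python
import math
from itertools import combinations


def mst_and_ultrametric(points):
    """Return (MST edges, ultrametric distances) for a dict of name -> vector."""
    names = list(points)
    edges = sorted(
        (math.dist(points[a], points[b]), a, b) for a, b in combinations(names, 2)
    )
    component = {n: {n} for n in names}  # name -> set of names in its cluster
    mst, ultra = [], {}
    for w, a, b in edges:  # Kruskal: scan edges from shortest to longest
        ca, cb = component[a], component[b]
        if ca is cb:
            continue  # edge would close a cycle
        mst.append((a, b, w))
        # This edge is the heaviest on the tree path between the two clusters.
        for x in ca:
            for y in cb:
                ultra[frozenset((x, y))] = w
        merged = ca | cb
        for n in merged:
            component[n] = merged
    return mst, ultra
```

For example, with the toy points `{"A": (0, 0), "B": (1, 0), "C": (5, 0)}` the tree links A to B and B to C, and the ultrametric distance between A and C is the length of the B to C edge instead of the direct Euclidean distance. Reading the merges in order gives the hierarchical taxonomy.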
Anomaly and Change Detection in Graph Streams through Constant-Curvature Manifold Embeddings
Mapping complex input data into suitable lower dimensional manifolds is a
common procedure in machine learning. This step is beneficial mainly for two
reasons: (1) it reduces the data dimensionality and (2) it provides a new data
representation possibly characterised by convenient geometric properties.
Euclidean spaces are by far the most widely used embedding spaces, thanks to
their well-understood structure and large availability of consolidated
inference methods. However, recent research demonstrated that many types of
complex data (e.g., those represented as graphs) are actually better described
by non-Euclidean geometries. Here, we investigate how embedding graphs on
constant-curvature manifolds (hyper-spherical and hyperbolic manifolds) affects
the ability to detect changes in sequences of attributed graphs. The
proposed methodology consists of embedding graphs into a geometric space and
performing change detection there by means of conventional methods for numerical
streams. The curvature of the space is a parameter that we learn in order to reproduce
the geometry of the original application-dependent graph space. Preliminary
experimental results show the potential capability of representing graphs by
means of curved manifolds, in particular for change and anomaly detection
problems.
Comment: To be published in IEEE IJCNN 201
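Once each graph has been mapped to a number (for instance, the distance of its embedding from a reference point, which is an assumption made here), change detection reduces to monitoring a numerical stream. The abstract does not name the detector, so the one-sided CUSUM test below is only one example of a conventional method; its parameters `target_mean`, `slack` and `threshold` are hypothetical and would need tuning.

```python
def cusum_alarm(stream, target_mean, slack, threshold):
    """Return the index of the first alarm for an upward mean shift, or None."""
    s = 0.0
    for t, x in enumerate(stream):
        # Accumulate only deviations that exceed the tolerated slack.
        s = max(0.0, s + (x - target_mean - slack))
        if s > threshold:
            return t
    return None
```

With `stream = [0, 0, 0, 0, 2, 2, 2]`, `target_mean=0`, `slack=0.5` and `threshold=2`, the statistic stays at zero for the first four values and crosses the threshold at the second shifted sample.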
Finding Streams in Knowledge Graphs to Support Fact Checking
The volume and velocity of information generated online make it difficult for
current journalistic practices to fact-check claims at the same rate.
Computational approaches for fact checking may be the key to help mitigate the
risks of massive misinformation spread. Such approaches can be designed to not
only be scalable and effective at assessing veracity of dubious claims, but
also to boost a human fact checker's productivity by surfacing relevant facts
and patterns to aid their analysis. To this end, we present a novel,
unsupervised network-flow based approach to determine the truthfulness of a
statement of fact expressed in the form of a (subject, predicate, object)
triple. We view a knowledge graph of background information about real-world
entities as a flow network, and knowledge as a fluid, abstract commodity. We
show that computational fact checking of such a triple then amounts to finding
a "knowledge stream" that emanates from the subject node and flows toward the
object node through paths connecting them. Evaluation on a range of real-world
and hand-crafted datasets of facts related to entertainment, business, sports,
geography and more reveals that this network-flow model can be very effective
in discerning true statements from false ones, outperforming existing
algorithms on many test cases. Moreover, the model is expressive in its ability
to automatically discover several useful path patterns and surface relevant
facts that may help a human fact checker corroborate or refute a claim.
Comment: Extended version of the paper in proceedings of ICDM 201
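Treating a knowledge graph as a flow network can be illustrated with a plain maximum-flow computation from the subject node to the object node. This is a minimal sketch using the Edmonds-Karp algorithm; the paper's actual flow formulation and the way it assigns capacities are not given in the abstract, so the capacities and node names in the usage note are hypothetical.

```python
from collections import defaultdict, deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow; capacity is a dict {u: {v: capacity}}."""
    if source == sink:
        raise ValueError("source and sink must differ")
    residual = defaultdict(lambda: defaultdict(int))
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            residual[u][v] += c
            residual[v][u] += 0  # make sure the reverse edge exists
    flow = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Find the bottleneck along the path, then push that much flow.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck
```

For instance, `max_flow({"subject": {"a": 3, "b": 2}, "a": {"object": 2}, "b": {"object": 2}}, "subject", "object")` routes flow along both paths. A triple whose subject and object are disconnected receives zero flow, which under this toy reading would give it no support.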