
80,537 research outputs found

On Graph Stream Clustering with Side Information

Author: Yu Philip S.
Zhao Yuchen
Publication date: 28/01/2013

Graph clustering has become an important problem due to emerging applications involving the web, social networks and bio-informatics. Recently, many such applications generate data in the form of streams. Clustering massive, dynamic graph streams is significantly challenging because of the complex structures of graphs and the computational difficulties of continuous data. Meanwhile, a large volume of side information is associated with graphs, which can be of various types. Examples include the properties of users in social network activities, the meta attributes associated with web click graph streams and the location information in mobile communication networks. Such attributes contain extremely useful information and have the potential to improve the clustering process, but are neglected by most recent graph stream mining techniques. In this paper, we define a unified distance measure on both link structures and side attributes for clustering. In addition, we propose a novel optimization framework, DMO, which can dynamically optimize the distance metric and make it adapt to newly received stream data. We further introduce a carefully designed statistic, SGS(C), which consumes constant storage space as the stream progresses. We demonstrate that the statistics maintained are sufficient for the clustering process as well as the distance optimization, and can scale to massive graphs with side attributes. We present experimental results to show the advantages of the approach over the baselines in graph stream clustering with both links and side information. Comment: Full version of SIAM SDM 2013 paper

arXiv.org e-Print Archive

Crossref

Catching the head, tail, and everything in between: a streaming algorithm for the degree distribution

Author: McGregor Andrew
Seshadhri C.
Simpson Olivia
Publication date: 25/11/2015

The degree distribution is one of the most fundamental graph properties of interest for real-world graphs. It has been widely observed in numerous domains that graphs typically have a tailed or scale-free degree distribution. While the average degree is usually quite small, the variance is quite high and there are vertices with degrees at all scales. We focus on the problem of approximating the degree distribution of a large streaming graph, with small storage. We design an algorithm, headtail, whose main novelty is a new estimator of infrequent degrees using truncated geometric random variables. We give a mathematical analysis of headtail and show that it has excellent behavior in practice. We can process streams with millions of edges with storage less than 1% and get extremely accurate approximations for all scales in the degree distribution. We also introduce a new notion of Relative Hausdorff distance between tailed histograms. Existing notions of distances between distributions are not suitable, since they ignore infrequent degrees in the tail. The Relative Hausdorff distance measures deviations at all scales, and is a more suitable distance for comparing degree distributions. By tracking this new measure, we are able to give strong empirical evidence of the convergence of headtail.

arXiv.org e-Print Archive

Crossref

A Simple Attack on Some Clock-Controlled Generators

Author: A. Fúster-Sabater
P. Caballero-Gil
Publication venue: 'Elsevier BV'
Publication date: 01/07/2009

We present a new approach to edit distance attacks on certain clock-controlled generators, which applies basic concepts of graph theory to simplify the search trees of the original attacks so that only the most promising branches are analyzed. In particular, the proposed improvement is based on cut sets defined on some graphs so that certain shortest paths provide the edit distances. The strongest aspects of the proposal are that the results obtained from the attack are fully deterministic, and that many inconsistent initial states of the target registers are recognized beforehand and avoided during the search.

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

Digital.CSIC

THE POLITICAL ROBUSTNESS IN INDONESIA

Author: Situngkir Mr. Hokky
Publication date: 01/04/2004

The result of the 2004 Indonesian legislative election is analyzed in comparison with that of the previous election (1999). The analysis uses graph-theoretical methods, computing the Euclidean distances among political parties. The distances are then treated in ultrametric spaces by using the minimum spanning tree algorithm. From the resulting hierarchical taxonomy of Indonesian political parties, we show that some patterns emerge and that they agree with the classical anthropological analysis of the socio-political system in Indonesia. This accentuates the robustness of Indonesian political society as a self-organized system evolving toward a critical state. Under small perturbations, such as a different voting process, the same pattern emerges statistically from a social structure based upon political streams: Islamic, secular, traditional, and some complements of all.

CogPrints Cognitive Sciences Eprint Archive

Anomaly and Change Detection in Graph Streams through Constant-Curvature Manifold Embeddings

Author: Alippi Cesare
Livi Lorenzo
Zambon Daniele
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Mapping complex input data into suitable lower dimensional manifolds is a common procedure in machine learning. This step is beneficial mainly for two reasons: (1) it reduces the data dimensionality and (2) it provides a new data representation possibly characterised by convenient geometric properties. Euclidean spaces are by far the most widely used embedding spaces, thanks to their well-understood structure and the large availability of consolidated inference methods. However, recent research demonstrated that many types of complex data (e.g., those represented as graphs) are actually better described by non-Euclidean geometries. Here, we investigate how embedding graphs on constant-curvature manifolds (hyper-spherical and hyperbolic manifolds) affects the ability to detect changes in sequences of attributed graphs. The proposed methodology consists in embedding graphs into a geometric space and performing change detection there by means of conventional methods for numerical streams. The curvature of the space is a parameter that we learn to reproduce the geometry of the original application-dependent graph space. Preliminary experimental results show the potential capability of representing graphs by means of curved manifolds, in particular for change and anomaly detection problems. Comment: To be published in IEEE IJCNN 201

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Politecnico di Milano

Crossref

Finding Streams in Knowledge Graphs to Support Fact Checking

Author: Ciampaglia Giovanni Luca
Flammini Alessandro
Menczer Filippo
Shiralkar Prashant
Publication date: 23/08/2017

The volume and velocity of information generated online prevent current journalistic practices from fact-checking claims at the same rate. Computational approaches for fact checking may be the key to help mitigate the risks of massive misinformation spread. Such approaches can be designed not only to be scalable and effective at assessing the veracity of dubious claims, but also to boost a human fact checker's productivity by surfacing relevant facts and patterns to aid their analysis. To this end, we present a novel, unsupervised network-flow based approach to determine the truthfulness of a statement of fact expressed in the form of a (subject, predicate, object) triple. We view a knowledge graph of background information about real-world entities as a flow network, and knowledge as a fluid, abstract commodity. We show that computational fact checking of such a triple then amounts to finding a "knowledge stream" that emanates from the subject node and flows toward the object node through paths connecting them. Evaluation on a range of real-world and hand-crafted datasets of facts related to entertainment, business, sports, geography and more reveals that this network-flow model can be very effective in discerning true statements from false ones, outperforming existing algorithms on many test cases. Moreover, the model is expressive in its ability to automatically discover several useful path patterns and surface relevant facts that may help a human fact checker corroborate or refute a claim. Comment: Extended version of the paper in proceedings of ICDM 201

arXiv.org e-Print Archive

Crossref