9 research outputs found

    Community Detection on Evolving Graphs

    Clustering is a fundamental step in many information-retrieval and data-mining applications. Detecting clusters in graphs is also a key tool for finding the community structure in social and behavioral networks. In many of these applications, the input graph evolves over time in a continual and decentralized manner, and, to maintain a good clustering, the clustering algorithm needs to repeatedly probe the graph. Furthermore, there are often limitations on the frequency of such probes, either imposed explicitly by the online platform (e.g., in the case of crawling proprietary social networks like Twitter) or implicitly because of resource limitations (e.g., in the case of crawling the web). In this paper, we study a model of clustering on evolving graphs that captures this aspect of the problem. Our model is based on the classical stochastic block model, which has been used to rigorously assess the quality of various static clustering methods. In our model, the algorithm is supposed to reconstruct the planted clustering, given the ability to query for small pieces of local information about the graph at a limited rate. We design and analyze clustering algorithms that work in this model, and show asymptotically tight upper and lower bounds on their accuracy. Finally, we perform simulations, which demonstrate that our main asymptotic results also hold in practice.
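As a rough illustration of the setting this abstract describes, the sketch below samples a static stochastic block model graph and wraps it in a probe interface with a fixed budget. It is not the paper's algorithm or its evolving-graph model: the names `sample_sbm` and `ProbeOracle`, the choice of "neighbour set of one node" as the unit of local information, and the hard budget are all assumptions made for illustration.

```python
import random
from itertools import combinations


def sample_sbm(sizes, p_in, p_out, seed=0):
    """Sample an undirected stochastic block model graph.

    sizes: list of cluster sizes; p_in / p_out: edge probabilities inside
    and across clusters. Returns (labels, adjacency), where adjacency maps
    each node to the set of its neighbours.
    """
    rng = random.Random(seed)
    labels = [c for c, size in enumerate(sizes) for _ in range(size)]
    adjacency = {v: set() for v in range(len(labels))}
    for u, v in combinations(range(len(labels)), 2):
        p = p_in if labels[u] == labels[v] else p_out
        if rng.random() < p:
            adjacency[u].add(v)
            adjacency[v].add(u)
    return labels, adjacency


class ProbeOracle:
    """Hypothetical query interface allowing at most `budget` probes."""

    def __init__(self, adjacency, budget):
        self._adjacency = adjacency
        self.remaining = budget

    def neighbours(self, node):
        # Each call consumes one unit of the probe budget.
        if self.remaining <= 0:
            raise RuntimeError("probe budget exhausted")
        self.remaining -= 1
        return frozenset(self._adjacency[node])
```

A clustering routine written against `ProbeOracle` can only see the graph through `neighbours`, which is the constraint the abstract emphasizes; modelling graph evolution over time would need an additional update step that this sketch omits.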

    Random Subgraph Detection Using Queries

    The planted densest subgraph detection problem refers to the task of testing whether in a given (random) graph there is a subgraph that is unusually dense. Specifically, we observe an undirected and unweighted graph on $n$ nodes. Under the null hypothesis, the graph is a realization of an Erdős-Rényi graph with edge probability (or density) $q$. Under the alternative, there is a subgraph on $k$ vertices with edge probability $p > q$. The statistical as well as the computational barriers of this problem are well understood for a wide range of the edge parameters $p$ and $q$. In this paper, we consider a natural variant of the above problem, where one can only observe a small part of the graph using adaptive edge queries. For this model, we determine the number of queries necessary and sufficient for detecting the presence of the planted subgraph. Specifically, we show that any (possibly randomized) algorithm must make $\mathsf{Q} = \Omega\left(\frac{n^2}{k^2\chi^4(p\|q)}\log^2 n\right)$ adaptive queries (in expectation) to the adjacency matrix of the graph to detect the planted subgraph with probability more than $1/2$, where $\chi^2(p\|q)$ is the chi-square distance. On the other hand, we devise a quasi-polynomial-time algorithm that detects the planted subgraph with high probability by making $\mathsf{Q} = O\left(\frac{n^2}{k^2\chi^4(p\|q)}\log^2 n\right)$ non-adaptive queries. We then propose a polynomial-time algorithm which is able to detect the planted subgraph using $\mathsf{Q} = O\left(\frac{n^3}{k^3\chi^2(p\|q)}\log^3 n\right)$ queries. We conjecture that in the leftover regime, where $\frac{n^2}{k^2} \ll \mathsf{Q} \ll \frac{n^3}{k^3}$, no polynomial-time algorithms exist. Our results resolve two questions posed in \cite{racz2020finding}, where the special case of adaptive detection and recovery of a planted clique was considered.
    Comment: 29 pages
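To get a feel for the two query scales quoted in this abstract, the sketch below evaluates them numerically. It assumes that $\chi^2(p\|q)$ is the chi-square divergence between Bernoulli($p$) and Bernoulli($q$), which equals $(p-q)^2 / (q(1-q))$, that logarithms are natural, and that all hidden constants are dropped. The function names are hypothetical, and the outputs are orders of magnitude only, not actual query counts from the paper.

```python
import math


def chi_square(p, q):
    """Chi-square divergence between Bernoulli(p) and Bernoulli(q)."""
    return (p - q) ** 2 / (q * (1 - q))


def quasi_poly_scale(n, k, p, q):
    """n^2 / (k^2 * chi^4) * log^2 n, with chi^4 = (chi^2)^2, constants dropped."""
    return n ** 2 / (k ** 2 * chi_square(p, q) ** 2) * math.log(n) ** 2


def poly_scale(n, k, p, q):
    """n^3 / (k^3 * chi^2) * log^3 n, constants dropped."""
    return n ** 3 / (k ** 3 * chi_square(p, q)) * math.log(n) ** 3


if __name__ == "__main__":
    n, k, p, q = 10_000, 500, 0.5, 0.25
    print(f"chi^2(p||q)       = {chi_square(p, q):.4f}")
    print(f"quasi-poly scale  = {quasi_poly_scale(n, k, p, q):.3e}")
    print(f"polynomial scale  = {poly_scale(n, k, p, q):.3e}")
```

Comparing the two outputs for a given `n` and `k` shows the width of the gap regime in which the abstract conjectures that no polynomial-time algorithm exists.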

    Graph Learning and Its Applications: A Holistic Survey

    Graph learning is a prevalent domain that endeavors to learn the intricate relationships among nodes and the topological structure of graphs. These relationships endow graphs with uniqueness compared to conventional tabular data, as nodes rely on non-Euclidean space and encompass rich information to exploit. Over the years, graph learning has transcended from graph theory to graph data mining. With the advent of representation learning, it has attained remarkable performance in diverse scenarios, including text, image, chemistry, and biology. Owing to its extensive application prospects, graph learning attracts copious attention from the academic community. Despite numerous works proposed to tackle different problems in graph learning, there is a demand to survey previous valuable works. While some researchers have perceived this phenomenon and accomplished impressive surveys on graph learning, they failed to connect related objectives, methods, and applications in a more coherent way. As a result, they did not encompass current ample scenarios and challenging problems due to the rapid expansion of graph learning. Different from previous surveys on graph learning, we provide a holistic review that analyzes current works from the perspective of graph structure, and discusses the latest applications, trends, and challenges in graph learning. Specifically, we commence by proposing a taxonomy from the perspective of the composition of graph data and then summarize the methods employed in graph learning. We then provide a detailed elucidation of mainstream applications. Finally, based on the current trend of techniques, we propose future directions.
    Comment: 20 pages, 7 figures, 3 tables

    Análisis de redes sociales y visualización yuxtapuesta de las dinámicas de opinión en Twitter a la llegada del Papa Francisco a Colombia / Social network analysis and juxtaposed visualization of opinion dynamics on Twitter at the arrival of Pope Francis in Colombia

    This study proposes an analytical model for characterizing the observable movement of agents in dynamic visualizations of the Twitter social network from the physical perspective of magnetism. Given that the opinion-dynamics models proposed by sociophysics rely fundamentally on the ferromagnetic Ising model, and that force-directed dynamic visualization algorithms, which translate networks into movement, are also based on magnetic principles, an analytical model was built on those same magnetic principles. The model has three magnetic descriptors (sense, direction, and speed), to which the variables of magnetic field, temperature, and entropy are added. The model was tested alongside the classical methodology of juxtaposed images and social network analysis to analyze the behavior of advertising communication agents in the Twitter conversation generated by the Pope's arrival in Colombia. The results were contrasted and cross-checked to highlight their usefulness in the advertising field.

    A Scalable Clustering Algorithm for High-dimensional Data Streams over Sliding Windows

    Thesis (Ph.D.) -- Seoul National University Graduate School, College of Engineering, Department of Electrical and Computer Engineering, August 2017. 이상구 (Sang-goo Lee).
    Data stream clustering over sliding windows generates clustering results whenever a window moves. However, iterative clustering using all data in a window is highly inefficient in terms of memory and computation time. In this thesis, we address the problem of data stream clustering over sliding windows using sliding window aggregation and nearest neighbor search techniques. Our algorithm constructs and maintains temporal group features as a summary of the window using the sliding window aggregation technique. The technique divides a window into disjoint chunks, computes partial aggregates over each chunk, and merges the partial aggregates to compute overall aggregates. To keep the summary at a constant size, the algorithm shrinks it by joining nearest neighbors. We exploit Locality-Sensitive Hashing for fast nearest neighbor search. We show that Locality-Sensitive Hashing can serve as an effective method for reducing synopses while minimizing the impact on quality. In addition, we suggest a re-clustering policy, which decides whether to append a new summary to pre-existing clusters or to perform clustering on the whole summary. Our experiments on real-world and synthetic datasets demonstrate that our algorithm can achieve a significant improvement when performing continuous clustering on data streams with sliding windows.
    Contents:
    1. Introduction
    2. Preliminaries and Related Work: 2.1 Data Streams; 2.2 Window Models; 2.3 k-Means Clustering; 2.4 Coreset; 2.5 Group Features; 2.6 Related Work; 2.7 Problem Statement
    3. GFCS: Group Feature-based Data Stream Clustering with Sliding Windows: 3.1 2-Level Coresets Construction; 3.2 2-Level Coresets Maintenance; 3.3 Clustering on 2-Level Coresets
    4. CSCS: Coreset-based Data Stream Clustering with Sliding Windows: 4.1 Coreset Construction based on Nearest Neighbor Search; 4.2 Coreset Construction based on Locality-Sensitive Hashing; 4.3 Re-clustering Policy
    5. Empirical Evaluation of Data Stream Clustering with Sliding Windows: 5.1 Experimental Setup; 5.2 Experimental Results
    6. Application: Documents Clustering: 6.1 Vector Representation of Documents; 6.2 Extension to Other Clustering Algorithms; 6.3 Evaluation
    7. Conclusion
    A. Appendix: A.1 Experimental Results of GFCS and CSCS; A.2 Experimental Results of Document Clustering
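The chunk-based sliding window aggregation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's GFCS or CSCS algorithm: the class name `ChunkedWindowAggregator` is hypothetical, and the partial aggregate kept per chunk is assumed to be a simple (count, per-dimension sum) pair, which is enough to recover the window centroid.

```python
from collections import deque


class ChunkedWindowAggregator:
    """Sliding window summarised as a fixed number of per-chunk aggregates."""

    def __init__(self, chunk_size, num_chunks):
        self.chunk_size = chunk_size
        # The deque keeps only the newest `num_chunks` partial aggregates;
        # the oldest chunk is evicted automatically when a new one arrives.
        self.partials = deque(maxlen=num_chunks)
        self._count = 0
        self._sum = None

    def add(self, point):
        """Fold one point into the open chunk; close the chunk when full."""
        if self._sum is None:
            self._sum = [0.0] * len(point)
        self._sum = [s + x for s, x in zip(self._sum, point)]
        self._count += 1
        if self._count == self.chunk_size:
            self.partials.append((self._count, self._sum))
            self._count, self._sum = 0, None

    def centroid(self):
        """Merge the partial aggregates of all closed chunks in the window."""
        total = sum(count for count, _ in self.partials)
        if total == 0:
            return None
        dim = len(self.partials[0][1])
        merged = [sum(s[i] for _, s in self.partials) for i in range(dim)]
        return [m / total for m in merged]
```

Because each window move touches only one chunk's aggregate, the cost per update is independent of the window length, which is the efficiency argument the abstract makes against re-clustering all data in the window.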
