36 research outputs found

    Dense subgraph maintenance under streaming edge weight updates for real-time story identification

    Recent years have witnessed an unprecedented proliferation of social media. Every day, people around the globe author millions of blog posts, social network status updates, etc. This rich stream of information can be used to identify, on an ongoing basis, emerging stories and events that capture popular attention. Stories can be identified via groups of tightly coupled real-world entities, namely the people, locations, products, etc., that are involved in the story. The sheer scale and rapid evolution of the data involved necessitate highly efficient techniques for identifying important stories at every point in time. The main challenge in real-time story identification is the maintenance of dense subgraphs (corresponding to groups of tightly coupled entities) under streaming edge weight updates (resulting from a stream of user-generated content). This is the first work to study the efficient maintenance of dense subgraphs under such streaming edge weight updates. For a wide range of definitions of density, we derive theoretical results regarding the magnitude of change that a single edge weight update can cause. Based on these, we propose a novel algorithm, DynDens, which outperforms adaptations of existing techniques to this setting and yields meaningful, intuitive results. Our approach is validated by a thorough experimental evaluation on large-scale real and synthetic datasets.

    Composite Events: A Fact-based Representation

    For any given newsworthy event, thousands of articles, blog posts, micro-blog posts and social network status updates are often published about it. This overwhelms a reader who wants to quickly grasp the key aspects of an event. Addressing this requires a representation that characterizes events in a succinct yet descriptive manner. This poster proposes a fact-based event representation. Methods from prior work mostly generate events as clusters of related entities, with no explicit semantic relations between the entities. Thus the entity-based representation is succinct but lacks descriptiveness. Preliminary experiments show the potential of a fact-based event representation.

    EviDense: a Graph-based Method for Finding Unique High-impact Events with Succinct Keyword-based Descriptions

    Despite the significant efforts made by the research community in recent years, automatically acquiring valuable information about high-impact events from social media remains challenging. We present EVIDENSE, a graph-based approach for finding high-impact events (such as disaster events) in social media. Our evaluation shows that our method outperforms state-of-the-art approaches for the same problem, with higher precision and fewer duplicates, while providing a keyword-based description that is succinct and informative.

    Robust Densest Subgraph Discovery

    Dense subgraph discovery is an important primitive in graph mining, which has a wide variety of applications in diverse domains. In the densest subgraph problem, given an undirected graph $G=(V,E)$ with an edge-weight vector $w=(w_e)_{e\in E}$, we aim to find $S\subseteq V$ that maximizes the density, i.e., $w(S)/|S|$, where $w(S)$ is the sum of the weights of the edges in the subgraph induced by $S$. Although the densest subgraph problem is one of the most well-studied optimization problems for dense subgraph discovery, there is an implicit strong assumption; it is assumed that the weights of all the edges are known exactly as input. In real-world applications, there are often cases where we have only uncertain information of the edge weights. In this study, we provide a framework for dense subgraph discovery under the uncertainty of edge weights. Specifically, we address such an uncertainty issue using the theory of robust optimization. First, we formulate our fundamental problem, the robust densest subgraph problem, and present a simple algorithm. We then formulate the robust densest subgraph problem with sampling oracle that models dense subgraph discovery using an edge-weight sampling oracle, and present an algorithm with a strong theoretical performance guarantee. Computational experiments using both synthetic graphs and popular real-world graphs demonstrate the effectiveness of our proposed algorithms. Comment: 10 pages; Accepted to ICDM 201
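
The density objective in the abstract above, w(S)/|S|, can be made concrete with a small sketch. This is only an illustration of the definition with exactly known weights, not the robust algorithms the paper proposes: it enumerates every vertex subset, so it is practical only for tiny graphs. The function names, the dictionary-based edge representation, and the example weights are all assumptions made for this illustration.

```python
from itertools import combinations


def density(weights, nodes):
    """Return w(S)/|S|: total weight of edges with both endpoints in S, over |S|.

    `weights` maps an undirected edge (u, v) to its weight (hypothetical format).
    """
    s = set(nodes)
    if not s:
        return 0.0
    total = sum(w for (u, v), w in weights.items() if u in s and v in s)
    return total / len(s)


def densest_subgraph_bruteforce(weights):
    """Exhaustively search all non-empty vertex subsets for the maximum density.

    Exponential in the number of vertices; for illustration only.
    """
    vertices = sorted({v for edge in weights for v in edge})
    best_set, best_density = (), 0.0
    for k in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, k):
            d = density(weights, candidate)
            if d > best_density:
                best_set, best_density = candidate, d
    return set(best_set), best_density


# Illustrative weights: a heavy triangle a-b-c with a light pendant edge c-d.
example_weights = {
    ("a", "b"): 3.0,
    ("b", "c"): 3.0,
    ("a", "c"): 3.0,
    ("c", "d"): 1.0,
}

if __name__ == "__main__":
    print(densest_subgraph_bruteforce(example_weights))
```

In this example the triangle {a, b, c} has density 9/3 = 3, while adding d lowers it to 10/4 = 2.5, so the search returns the triangle. If the weights were only known approximately, the best subset could change, which is the situation the robust formulation in the abstract addresses.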