36 research outputs found
Dense subgraph maintenance under streaming edge weight updates for real-time story identification
Recent years have witnessed an unprecedented proliferation of social media. Every day, people around the globe author millions of blog posts, social network status updates, etc. This rich stream of information can be used to identify, on an ongoing basis, emerging stories and events that capture popular attention. Stories can be identified via groups of tightly coupled real-world entities, namely the people, locations, products, etc., that are involved in the story. The sheer scale and rapid evolution of the data involved necessitate highly efficient techniques for identifying important stories at every point in time. The main challenge in real-time story identification is the maintenance of dense subgraphs (corresponding to groups of tightly coupled entities) under streaming edge weight updates (resulting from a stream of user-generated content). This is the first work to study the efficient maintenance of dense subgraphs under such streaming edge weight updates. For a wide range of definitions of density, we derive theoretical results regarding the magnitude of change that a single edge weight update can cause. Based on these, we propose a novel algorithm, DynDens, which outperforms adaptations of existing techniques to this setting and yields meaningful, intuitive results. Our approach is validated by a thorough experimental evaluation on large-scale real and synthetic datasets.
Composite Events: A Fact-based Representation
For any given newsworthy event, thousands of articles, blog posts, micro-blog posts and social network
status updates are often published about it. This is an overload to a reader who wants to quickly grasp the
key aspects of an event. Addressing this requires a representation that characterizes events in a succinct
yet descriptive manner. This poster proposes a fact-based event representation. Methods from prior work
mostly generate events as clusters of related entities. No explicit semantic relations between the entities are
given. Thus the entity-based representation exhibits succinctness but lacks descriptiveness. Preliminary
experiments show the potential of a fact-based event representation.
EviDense: a Graph-based Method for Finding Unique High-impact Events with Succinct Keyword-based Descriptions
Despite the significant efforts made by the research community in recent years, automatically acquiring valuable information about high-impact events from social media remains challenging. We present EVIDENSE, a graph-based approach for finding high-impact events (such as disaster events) in social media. Our evaluation shows that our method outperforms state-of-the-art approaches for the same problem, with higher precision and fewer duplicates, while providing a keyword-based description that is succinct and informative.
Robust Densest Subgraph Discovery
Dense subgraph discovery is an important primitive in graph mining, which has
a wide variety of applications in diverse domains. In the densest subgraph
problem, given an undirected graph $G = (V, E)$ with an edge-weight vector
$w = (w_e)_{e \in E}$, we aim to find $S \subseteq V$ that maximizes the density,
i.e., $w(S)/|S|$, where $w(S)$ is the sum of the weights of the edges in the
subgraph induced by $S$. Although the densest subgraph problem is one of the
most well-studied optimization problems for dense subgraph discovery, there is
an implicit strong assumption; it is assumed that the weights of all the edges
are known exactly as input. In real-world applications, there are often cases
where we have only uncertain information of the edge weights. In this study, we
provide a framework for dense subgraph discovery under the uncertainty of edge
weights. Specifically, we address such an uncertainty issue using the theory of
robust optimization. First, we formulate our fundamental problem, the robust
densest subgraph problem, and present a simple algorithm. We then formulate the
robust densest subgraph problem with sampling oracle that models dense subgraph
discovery using an edge-weight sampling oracle, and present an algorithm with a
strong theoretical performance guarantee. Computational experiments using both
synthetic graphs and popular real-world graphs demonstrate the effectiveness of
our proposed algorithms.
Comment: 10 pages; Accepted to ICDM 201
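
The density objective in the abstract above, the total weight of the edges induced by a vertex set divided by the size of that set, can be illustrated with a small sketch. This is not the paper's robust algorithm: it is a plain exhaustive search over vertex subsets, usable only on tiny graphs, and it assumes the edge weights are known exactly. The names `density`, `densest_subgraph_bruteforce`, `vertices`, and `weights`, as well as the toy graph, are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def density(weights, subset):
    """Return w(S) / |S|, where w(S) sums the weights of edges inside S.

    `weights` maps frozenset({u, v}) -> edge weight (illustrative format).
    """
    s = set(subset)
    if not s:
        raise ValueError("density is undefined for the empty vertex set")
    # An edge is induced by S when both of its endpoints lie in S.
    induced_weight = sum(w for edge, w in weights.items() if edge <= s)
    return induced_weight / len(s)


def densest_subgraph_bruteforce(vertices, weights):
    """Try every non-empty vertex subset and keep the densest one.

    Exponential in the number of vertices; a toy baseline only.
    """
    best_set, best_value = None, float("-inf")
    for size in range(1, len(vertices) + 1):
        for candidate in combinations(vertices, size):
            value = density(weights, candidate)
            if value > best_value:
                best_set, best_value = frozenset(candidate), value
    return best_set, best_value


# Toy graph (assumed data): a heavy triangle a-b-c plus a light edge c-d.
vertices = ["a", "b", "c", "d"]
weights = {
    frozenset({"a", "b"}): 3.0,
    frozenset({"b", "c"}): 3.0,
    frozenset({"a", "c"}): 3.0,
    frozenset({"c", "d"}): 1.0,
}

best_set, best_value = densest_subgraph_bruteforce(vertices, weights)
print(sorted(best_set), best_value)  # ['a', 'b', 'c'] 3.0
```

On this toy graph the triangle has density 9/3 = 3.0, while the whole graph has density 10/4 = 2.5, so adding the weakly connected vertex lowers the objective.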