Incremental Discovery of Prominent Situational Facts
We study the novel problem of finding new, prominent situational facts, which
are emerging statements about objects that stand out within certain contexts.
Many such facts are newsworthy---e.g., an athlete's outstanding performance in
a game, or a viral video's impressive popularity. Effective and efficient
identification of these facts assists journalists in reporting, one of the main
goals of computational journalism. Technically, we consider an ever-growing
table of objects with dimension and measure attributes. A situational fact is a
"contextual" skyline tuple that stands out against historical tuples in a
context, specified by a conjunctive constraint involving dimension attributes,
when a set of measure attributes are compared. New tuples are constantly added
to the table, reflecting events happening in the real world. Our goal is to
discover constraint-measure pairs that qualify a new tuple as a contextual
skyline tuple, and discover them quickly before the event becomes yesterday's
news. A brute-force approach requires exhaustive comparison with every tuple,
under every constraint, and in every measure subspace. We design algorithms in
response to these challenges using three corresponding ideas---tuple reduction,
constraint pruning, and sharing computation across measure subspaces. We also
adopt a simple prominence measure to rank the discovered facts when they are
numerous. Experiments over two real datasets validate the effectiveness and
efficiency of our techniques.
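
To make the brute-force baseline described in this abstract concrete, the following is a minimal sketch, not the authors' algorithm. It assumes tuples are Python dictionaries, that larger measure values are better, and that a constraint binds a subset of the dimension attributes to the new tuple's own values. The names `dominates`, `situational_facts`, and the helper functions are hypothetical and chosen only for illustration.

```python
from itertools import chain, combinations


def dominates(a, b, measures):
    """True if `a` is at least as good as `b` on every measure and strictly
    better on at least one (assumption: larger values are better)."""
    return (all(a[m] >= b[m] for m in measures)
            and any(a[m] > b[m] for m in measures))


def subsets(items, min_size):
    """All subsets of `items` with at least `min_size` elements, as tuples."""
    items = list(items)
    return chain.from_iterable(
        combinations(items, r) for r in range(min_size, len(items) + 1))


def situational_facts(history, new, dimensions, measures):
    """Brute force: try every constraint and every measure subspace, and keep
    the (constraint, subspace) pairs under which `new` is not dominated by any
    historical tuple that satisfies the constraint."""
    facts = []
    for bound in subsets(dimensions, 0):  # the empty constraint is allowed
        constraint = {d: new[d] for d in bound}
        context = [t for t in history
                   if all(t[d] == v for d, v in constraint.items())]
        for subspace in subsets(measures, 1):
            if not any(dominates(t, new, subspace) for t in context):
                facts.append((constraint, subspace))
    return facts
```

The nested loops over tuples, constraints, and measure subspaces are exactly the three sources of cost the abstract names; the tuple reduction, constraint pruning, and shared computation it mentions would replace these loops and are not shown here.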
Interesting event detection through hall of fame rankings
Everything is relative. Cars are compared by gas per mile, websites by page rank, students based on GPA, scientists by number of publications, and celebrities by beauty or wealth. In this paper, we study the characteristics of such entity rankings based on a set of rankings obtained from a popular Web portal. The obtained insights are integrated in our approach, coined Pantheon. Pantheon maintains sets of top-k rankings and reports identified changes in a way that appeals to users, using a novel combination of different characteristics like competitiveness, information entropy, and scale of change. Entity rankings are assembled by combining entity type attributes with data-driven categorical constraints and sorting criteria on numeric attributes. We report on the results of an experimental evaluation using real-world data obtained from a basketball statistics website.
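
As an illustration of how a ranking of the kind described in this abstract might be assembled and monitored, here is a minimal sketch. It is not Pantheon itself: it assumes entities are dictionaries with a `name` field, that a categorical constraint is a set of attribute-value pairs, and that a higher numeric value ranks higher. The functions `top_k` and `ranking_change` are hypothetical, and the scoring by competitiveness, information entropy, and scale of change is omitted because the abstract does not specify it.

```python
def top_k(rows, constraint, sort_key, k):
    """Names of the k best entities that satisfy a categorical constraint,
    ordered by a numeric attribute (assumption: higher is better)."""
    matching = [r for r in rows
                if all(r[a] == v for a, v in constraint.items())]
    ranked = sorted(matching, key=lambda r: r[sort_key], reverse=True)
    return [r["name"] for r in ranked[:k]]


def ranking_change(rows, new_row, constraint, sort_key, k):
    """Compare the top-k ranking before and after a new row arrives and
    report which entities entered or dropped out."""
    before = top_k(rows, constraint, sort_key, k)
    after = top_k(rows + [new_row], constraint, sort_key, k)
    entered = [n for n in after if n not in before]
    dropped = [n for n in before if n not in after]
    return before, after, entered, dropped
```

A reporting layer would then decide which of these changes are interesting enough to show to users; that decision is where the characteristics listed in the abstract would come in.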