Search CORE

5 research outputs found

Cross Document Event Clustering Using Knowledge Mining from Co-Reference Chains

Author: Le Guern Anne Laure
Themines Jean-François
Publication date: 07/09/2010

International audience

HAL - Normandie Université

HAL Descartes

National Taiwan University Repository

Hal-Diderot

Cross Document Event Clustering Using Knowledge Mining from Co-Reference Chains

Author: June-Jei Kuo and Hsin-Hsi Chen
Publication date: 20/01/2011

National Taiwan University Repository

Cross Document Event Clustering Using Knowledge Mining from Co-Reference Chains

Author: Kuo June Jei
Publication date: 07/09/2010

National Taiwan University Repository

Identifying chronological and coherent information threads using 5W1H questions and temporal relationships

Author: McDonald Graham
Narvala Hitarth
Ounis Iadh
Publication venue: Elsevier BV
Publication date: 01/05/2023

Due to the massive volume of articles produced online every day, it is challenging for online platforms (e.g., news agencies) to present the information about an event, activity or discussion to their users in an easily digestible format. Therefore, there is a need for automatic methods to extract related and time-ordered information about events (i.e., information threads) from large unstructured collections of documents. In this work, we propose a novel unsupervised hierarchical agglomerative clustering (HAC) based information threading approach to generate chronological and coherent threads of information in a collection. Unlike the well-known tasks of topic detection and tracking or event threading, which focus on grouping information by important keywords and/or entities, our proposed approach identifies threads based on temporal relations and diverse information about an event, i.e., who did what, why, where, when and how (also known as the 5W1H questions). In particular, our proposed approach deploys a tailored similarity function for HAC by leveraging extracted answers to 5W1H questions along with time decay between documents. We evaluate our proposed HAC 5W1H information threading approach on two large expert-annotated collections of news articles, i.e., NewSHead and Multi-News (over 112k and 32k articles, respectively). Our experiments show that HAC 5W1H markedly improves the number and quality of the threads that are generated compared to existing state-of-the-art approaches from the literature, e.g., 100.98% more threads and a 213.39% improvement in Normalised Mutual Information compared to the best evaluated baseline on the larger NewSHead collection. We also conducted a user study that shows that our proposed HAC 5W1H information threading approach is significantly (p < 0.05) preferred by users in terms of coherence, diversity and chronological correctness compared to the existing state-of-the-art approaches.
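
The abstract above mentions a similarity function that combines extracted 5W1H answers with a time decay between documents, but does not give its form. The following is only a minimal sketch of how such a pairwise similarity might look, not the authors' function. The Jaccard overlap, the exponential half-life decay, the `half_life_days` parameter and the document layout (`answers`, `day`) are all assumptions made for illustration.

```python
# Hypothetical sketch: pairwise similarity from 5W1H answer overlap and time decay.
# None of these names or formulas come from the paper; they are illustrative assumptions.

W5H1_KEYS = ("who", "what", "why", "where", "when", "how")


def jaccard(a, b):
    """Jaccard overlap of two collections of answer strings (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def thread_similarity(doc_a, doc_b, half_life_days=7.0):
    """Mean 5W1H answer overlap, scaled down as the publication dates drift apart."""
    overlap = sum(
        jaccard(doc_a["answers"].get(k, ()), doc_b["answers"].get(k, ()))
        for k in W5H1_KEYS
    ) / len(W5H1_KEYS)
    gap_days = abs(doc_a["day"] - doc_b["day"])
    decay = 0.5 ** (gap_days / half_life_days)  # assumed exponential decay
    return overlap * decay


doc_1 = {"day": 0, "answers": {"who": ["mayor"], "what": ["opens bridge"]}}
doc_2 = {"day": 7, "answers": {"who": ["mayor"], "what": ["opens bridge"]}}
print(thread_similarity(doc_1, doc_2))
```

A score of this kind could be converted to a distance (for example, one minus the similarity) and passed to any standard agglomerative clustering routine. How the paper does this step is not stated in the abstract.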

Enlighten

Cross Document Event Clustering Using Knowledge Mining from Co-Reference Chains

Author: Hsin-hsi Chen
June-jei Kuo

Abstract. Unifying terminology usage so that it captures more term semantics is useful for event clustering. This paper proposes a metric, normalized chain edit distance, to incrementally mine a controlled vocabulary from cross-document coreference chains. A novel threshold model that incorporates a time decay function and a spanning window uses the controlled vocabulary for event clustering on streaming news. The experimental results show that the proposed system achieves a 16% performance increase over the baseline system and a 6% performance increase over the system without the controlled vocabulary.
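
The abstract names a normalized chain edit distance but does not define it. As a rough illustration only, one plausible reading treats each coreference chain as a sequence of mention strings, computes the standard Levenshtein distance over whole mentions, and divides by the length of the longer chain. This definition and the function names below are assumptions, not the paper's metric.

```python
# Hypothetical sketch: Levenshtein distance over mention sequences, normalized
# by the longer chain. The paper's actual definition may differ.


def chain_edit_distance(chain_a, chain_b):
    """Minimum number of mention insertions, deletions and substitutions."""
    prev = list(range(len(chain_b) + 1))
    for i, mention_a in enumerate(chain_a, 1):
        curr = [i]
        for j, mention_b in enumerate(chain_b, 1):
            cost = 0 if mention_a == mention_b else 1
            curr.append(min(prev[j] + 1,          # delete from chain_a
                            curr[j - 1] + 1,      # insert into chain_a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_chain_edit_distance(chain_a, chain_b):
    """Edit distance scaled to [0, 1]; two empty chains are treated as identical."""
    longest = max(len(chain_a), len(chain_b))
    return chain_edit_distance(chain_a, chain_b) / longest if longest else 0.0


chain_1 = ["President Obama", "Obama", "he"]
chain_2 = ["Barack Obama", "Obama", "he"]
print(normalized_chain_edit_distance(chain_1, chain_2))
```

Under this reading, chains whose distance falls below some threshold could be merged into one controlled-vocabulary entry. That threshold and the merging rule are likewise not specified in the abstract.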

CiteSeerX