
Sentence-Level Heterogeneous Graph-Based News Deduplication

Author: 현일성
Publication venue: Seoul National University Graduate School
Publication date: 01/02/2015

Master's thesis -- Seoul National University Graduate School: Department of Electrical and Computer Engineering, February 2015. 이상구.

With the rapid growth of online media, handling abusive news has become an essential requirement for portal news websites. However, previous research has focused only on improving the efficiency or accuracy of near-duplicate news detection, and has rarely considered which articles should be deleted and which retained. We therefore propose a heterogeneous graph-based news filtering framework that uses a novel sentence-level graph model for a new generation of duplicate news filtering. The framework is composed of two basic algorithms. First, a sentence-level near-duplicate news detection algorithm extracts and identifies more duplicate news pairs. Second, a graph-ranking-based representative news selection algorithm calculates an accurate representative score. The proposed framework has been tested on a real-world dataset, and the experimental results show that the proposed algorithms effectively improve the accuracy of descriptive news selection.

Chapter 1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Outline
Chapter 2 Related Work
  2.1 Near-duplicate detection
  2.2 Graph-based representative selection
    2.2.1 TextRank
    2.2.2 CoRank
    2.2.3 FutureRank
    2.2.4 MutualRank
    2.2.5 Other Approach
Chapter 3 Preliminaries
  3.1 Problem Definition
Chapter 4 Framework
  4.1 Near-Duplicate Detection
  4.2 Representative Selection
    4.2.1 Graph Model
    4.2.2 Algorithm
Chapter 5 Experiment
  5.1 Data Preparation
  5.2 Evaluation
    5.2.1 Near-duplicate detection
    5.2.2 Representative Selection
Chapter 6 Conclusion
Bibliography
Abstract (Korean)
Acknowledgements
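
The two-step framework summarized in the abstract (sentence-level near-duplicate detection, then graph-based representative selection) can be illustrated with a rough sketch. This record does not give the thesis's similarity measure, graph model, or ranking formula, so everything below is an assumption made for illustration: word-level Jaccard similarity stands in for the sentence comparison, a HITS-style mutual reinforcement over an article-sentence bipartite graph stands in for the graph ranking, and the function names, thresholds, and the naive sentence splitter are all hypothetical.

```python
import re
from itertools import combinations


def split_sentences(text):
    # Naive splitter on ., !, ? (assumption: real news text needs a proper tokenizer).
    return [s.strip().lower() for s in re.split(r"[.!?]+", text) if s.strip()]


def jaccard(a, b):
    # Jaccard similarity between the word sets of two sentences.
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


def near_duplicate_pairs(articles, sent_threshold=0.8, min_shared=2):
    # Step 1 (illustrative): two articles form a near-duplicate pair when at
    # least `min_shared` sentences of the first have a close match in the second.
    sents = {aid: split_sentences(text) for aid, text in articles.items()}
    pairs = []
    for a, b in combinations(sorted(sents), 2):
        shared = sum(
            1
            for sa in sents[a]
            if any(jaccard(sa, sb) >= sent_threshold for sb in sents[b])
        )
        if shared >= min_shared:
            pairs.append((a, b))
    return pairs


def rank_articles(articles, iterations=50):
    # Step 2 (illustrative): articles and sentences are the two node types of a
    # bipartite graph. Scores flow article -> sentence -> article and are
    # renormalized each round, in the style of HITS.
    sents = {aid: set(split_sentences(text)) for aid, text in articles.items()}
    art_score = {aid: 1.0 for aid in sents}
    for _ in range(iterations):
        sent_score = {}
        for aid, ss in sents.items():
            for s in ss:
                sent_score[s] = sent_score.get(s, 0.0) + art_score[aid]
        art_score = {aid: sum(sent_score[s] for s in ss) for aid, ss in sents.items()}
        total = sum(art_score.values()) or 1.0
        art_score = {aid: v / total for aid, v in art_score.items()}
    return art_score
```

In this sketch, the article with the highest score inside a near-duplicate group would be the one retained as the representative, and the others would be candidates for removal. Matching sentences exactly in the ranking step is a simplification; a fuller version would merge near-identical sentences into one node.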

SNU Open Repository and Archive