Search CORE

20 research outputs found

A Continuously Growing Dataset of Sentential Paraphrases

Author: He Hua
Lan Wuwei
Qiu Siyu
Xu Wei
Publication venue
Publication date: 01/01/2017
Field of study

A major challenge in paraphrase research is the lack of parallel corpora. In this paper, we present a new method to collect large-scale sentential paraphrases from Twitter by linking tweets through shared URLs. The main advantage of our method is its simplicity, as it gets rid of the classifier or human in the loop needed to select data before annotation and subsequent application of paraphrase identification algorithms in the previous work. We present the largest human-labeled paraphrase corpus to date of 51,524 sentence pairs and the first cross-domain benchmarking for automatic paraphrase identification. In addition, we show that more than 30,000 new sentential paraphrases can be easily and continuously captured every month at ~70% precision, and demonstrate their utility for downstream NLP tasks through phrasal paraphrase extraction. We make our code and data freely available.Comment: 11 pages, accepted to EMNLP 201

arXiv.org e-Print Archive

Crossref

Sentential Paraphrase Generation for Agglutinative Languages Using SVM with a String Kernel

Author: Choi Ho-Jin
Gweon Gahgene
Heo Jeong
Park Hancheol
Ryu Pum-Mo
Publication venue: Department of Linguistics, Faculty of Arts, Chulalongkorn University
Publication date: 01/01/2014
Field of study

Waseda University Repository

Domain-Independent Novel Event Discovery and Semi-Automatic Event Annotation

Author: Ji Heng
Li Hao
Li Xiang
Marton Yuval
Publication venue: Institute of Digital Enhancement of Cognitive Processing, Waseda University
Publication date: 01/01/2011
Field of study

Waseda University Repository

Comparing Phrase-based and Syntax-based Paraphrase Generation

Author: Krahmer E.J.
Marsi E.C.
van den Bosch A.
Wubben S.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2011
Field of study

Tilburg University Repository

Extracting News Events from Microblogs

Author: Ramampiaro Heri
Repp Øystein
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2018
Field of study

Twitter stream has become a large source of information for many people, but the magnitude of tweets and the noisy nature of its content have made harvesting the knowledge from Twitter a challenging task for researchers for a long time. Aiming at overcoming some of the main challenges of extracting the hidden information from tweet streams, this work proposes a new approach for real-time detection of news events from the Twitter stream. We divide our approach into three steps. The first step is to use a neural network or deep learning to detect news-relevant tweets from the stream. The second step is to apply a novel streaming data clustering algorithm to the detected news tweets to form news events. The third and final step is to rank the detected events based on the size of the event clusters and growth speed of the tweet frequencies. We evaluate the proposed system on a large, publicly available corpus of annotated news events from Twitter. As part of the evaluation, we compare our approach with a related state-of-the-art solution. Overall, our experiments and user-based evaluation show that our approach on detecting current (real) news events delivers a state-of-the-art performance

arXiv.org e-Print Archive

NORA - Norwegian Open Research Archives

Neural-Driven Multi-criteria Tree Search for Paraphrase Generation

Author: Chevelu Jonathan
Fabre Betty
Lolive Damien
Urvoy Tanguy
Publication venue: HAL CCSD
Publication date: 12/12/2020
Field of study

International audienceA good paraphrase is semantically similar to the original sentence but it must be also well formed, and syntactically different to ensure diversity. To deal with this tradeoff, we propose to cast the paraphrase generation task as a multi-objectives search problem on the lattice of text transformations. We use BERT and GPT2 to measure respectively the semantic distance and the correctness of the candidates. We study two search algorithms: Monte-Carlo Tree Search For Paraphrase Generation (MCPG) and Pareto Tree Search (PTS) that we use to explore the huge sets of candidates generated by applying the PPDB-2.0 edition rules. We evaluate this approach on 5 datasets and show that it performs reasonably well and that it outperforms a state-of-the-art edition-based text generation method

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Generating Syntactic Paraphrases

Author: Colin Émilie
Gardent Claire
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2018
Field of study

International audienceWe study the automatic generation of syntactic paraphrases using four different models for generation: data-to-text generation, textto-text generation, text reduction and text expansion, We derive training data for each of these tasks from the WebNLG dataset and we show (i) that conditioning generation on syntactic constraints effectively permits the generation of syntactically distinct paraphrases for the same input and (ii) that exploiting different types of input (data, text or data+text) further increases the number of distinct paraphrases that can be generated for a given input

Crossref

INRIA a CCSD electronic archive server