Search CORE

3 research outputs found

ASE@DPIL-FIRE2016: Hindi Paraphrase Detection using Natural Language Processing Techniques & Semantic Similarity Computations

Author: Deepa Gupta
Vani K
Publication venue
Publication date: 23/04/2020
Field of study

ABSTRACT The paper reports the approaches utilized and results achieved for our system in the shared task (in FIRE-2016) for paraphrase identification in Indian languages (DPIL). Since Indian languages have a complex inherent nature, paraphrase identification in these languages becomes a challenging task. In the DPIL task, the challenge is to detect and identify whether a given sentence pairs paraphrased or not. In the proposed work, natural language processing with semantic concept extractions is explored for paraphrase detection in Hindi. Stop word removal, stemming and part of speech tagging are employed. Further similarity computations between the sentence pairs are done by extracting semantic concepts using WordNet lexical database. Initially, the proposed approach is evaluated over the given training sets using different machine learning classifiers. Then testing phase is used to predict the classes using the proposed features. The results are found to be promising, which shows the potency of natural language processing techniques and semantic concept extractions in detecting paraphrases. CCS Concepts Computing methodologies-Natural language processing Information systems -Document analysis and feature selection; Near-duplicate and paraphrase detectio

CiteSeerX

Neural Machine Translation for Malayalam Paraphrase Generation

Author: Koshelev Sergey
Varghese Christeena
Yamshchikov Ivan P.
Publication venue
Publication date: 31/01/2024
Field of study

This study explores four methods of generating paraphrases in Malayalam, utilizing resources available for English paraphrasing and pre-trained Neural Machine Translation (NMT) models. We evaluate the resulting paraphrases using both automated metrics, such as BLEU, METEOR, and cosine similarity, as well as human annotation. Our findings suggest that automated evaluation measures may not be fully appropriate for Malayalam, as they do not consistently align with human judgment. This discrepancy underscores the need for more nuanced paraphrase evaluation approaches especially for highly agglutinative languages

arXiv.org e-Print Archive