
3 research outputs found

ASE@DPIL-FIRE2016: Hindi Paraphrase Detection using Natural Language Processing Techniques & Semantic Similarity Computations

Authors: Deepa Gupta; Vani K
Publication date: 23/04/2020

ABSTRACT: The paper reports the approaches used and the results achieved by our system in the FIRE-2016 shared task on paraphrase identification in Indian languages (DPIL). Since Indian languages have a complex inherent nature, paraphrase identification in these languages is a challenging task. In the DPIL task, the challenge is to identify whether a given sentence pair is paraphrased or not. In the proposed work, natural language processing with semantic concept extraction is explored for paraphrase detection in Hindi. Stop word removal, stemming and part-of-speech tagging are employed. Similarity between the sentence pairs is then computed by extracting semantic concepts using the WordNet lexical database. Initially, the proposed approach is evaluated on the given training sets using different machine learning classifiers. The testing phase is then used to predict the classes using the proposed features. The results are promising, which shows the potency of natural language processing techniques and semantic concept extraction in detecting paraphrases.

CCS Concepts: Computing methodologies - Natural language processing; Information systems - Document analysis and feature selection; Near-duplicate and paraphrase detectio
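The preprocessing-then-similarity pipeline described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: it substitutes plain Jaccard token overlap for the WordNet-based semantic similarity, skips part-of-speech tagging and the machine learning classifiers, and uses English toy input rather than Hindi. The stop list, the suffix list, the 0.5 threshold, and all function names are hypothetical.

```python
import re

# Hypothetical toy stop list; a real system would use a Hindi stop-word resource.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "of", "in", "on", "to"}

# Hypothetical suffix list standing in for a real stemmer.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(sentence):
    """Lowercase, tokenize, drop stop words, and stem; returns a set of stems."""
    tokens = re.findall(r"\w+", sentence.lower())
    return {crude_stem(t) for t in tokens if t not in STOP_WORDS}


def jaccard_similarity(sent_a, sent_b):
    """Token-overlap similarity in [0, 1]; a stand-in for semantic similarity."""
    a, b = preprocess(sent_a), preprocess(sent_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def is_paraphrase(sent_a, sent_b, threshold=0.5):
    """Label a pair as paraphrased when similarity reaches the assumed threshold."""
    return jaccard_similarity(sent_a, sent_b) >= threshold
```

For example, `is_paraphrase("The cat chased the mice", "A cat is chasing a mouse")` compares the stem sets `{cat, chas, mice}` and `{cat, chas, mouse}`. A fixed threshold is the simplest possible decision rule; the abstract instead feeds its features to trained classifiers.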

CiteSeerX

SemPCA-Summarizer: Exploiting Semantic Principal Component Analysis for Automatic Summary Generation

Authors: Alcón Óscar; Lloret Elena
Publication venue: Institute of Informatics, Slovak Academy of Sciences
Publication date: 21/11/2018

Text summarization is the task of condensing a document while keeping the relevant information. Integrated into wider information systems, it can help users access key information without having to read everything, which improves efficiency. In this research work, we have developed and evaluated a single-document extractive summarization approach, named SemPCA-Summarizer, which reduces the dimension of a document using the Principal Component Analysis (PCA) technique enriched with semantic information. A concept-sentence matrix is built from the input document, and PCA is then used to identify and rank the relevant concepts, which are used to select the most important sentences through different heuristics, leading to various types of summaries. The results show that the generated summaries are very competitive, from both a quantitative and a qualitative viewpoint, indicating that the proposed approach is appropriate for briefly providing key information and thus helps users cope with the huge amount of available information in a quicker and more efficient manner.
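The matrix-then-PCA idea in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not SemPCA-Summarizer itself: plain lowercase words stand in for semantic concepts, only the first principal component is used, it is found by power iteration rather than a full decomposition, and the sentence-scoring heuristic (sum of absolute loadings) is hypothetical, as are all function names.

```python
import math
import random
import re


def build_matrix(sentences):
    """Rows are sentences, columns are terms (a stand-in for semantic concepts)."""
    tokenized = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    matrix = [[tokens.count(term) for term in vocab] for tokens in tokenized]
    return vocab, matrix


def first_principal_component(matrix, iterations=200, seed=0):
    """Approximate the leading principal direction by power iteration."""
    n = len(matrix)
    d = len(matrix[0]) if matrix else 0
    if d == 0:
        return []
    means = [sum(row[j] for row in matrix) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in matrix]
    rng = random.Random(seed)  # fixed seed keeps the sketch deterministic
    vec = [rng.random() + 0.1 for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in vec))
    vec = [v / norm for v in vec]
    for _ in range(iterations):
        # Apply X^T X to vec without forming the d-by-d covariance matrix.
        proj = [sum(r[j] * vec[j] for j in range(d)) for r in centered]
        new = [sum(centered[i][j] * proj[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0.0:  # no variance left in the data; keep the current vector
            break
        vec = [v / norm for v in new]
    return vec


def summarize(sentences, k=2):
    """Pick the k sentences with the highest weighted term mass, in source order."""
    vocab, matrix = build_matrix(sentences)
    weights = [abs(w) for w in first_principal_component(matrix)]
    scores = [sum(c * w for c, w in zip(row, weights)) for row in matrix]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(list_of_sentences, k=2)` returns two sentences in their original order. Taking absolute loadings sidesteps the sign ambiguity of principal components; the abstract mentions several selection heuristics, of which this is only one plausible reading.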

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)

SemPCA-Summarizer: Exploiting Semantic Principal Component Analysis for Automatic Summary Generation

Authors: Alcón Óscar; Lloret Elena
Publication venue: AEPress, s.r.o.
Publication date: 01/01/2018

Repositorio Institucional de la Universidad de Alicante

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Computing and Informatics (E-Journal - Institute of Informatics, SAS, Bratislava)