192 research outputs found

A Deep Network Model for Paraphrase Detection in Short Text Messages

Authors: Basant Agarwal, Helge Langseth, Heri Ramampiaro, Massimiliano Ruocco
Publication venue: 'Elsevier BV'
Publication date: 07/12/2017

This paper is concerned with paraphrase detection. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication and question answering. Given two sentences, the objective is to detect whether they are semantically identical. An important insight from this work is that existing paraphrase systems perform well when applied to clean texts, but they do not necessarily deliver good performance on noisy texts. Challenges with paraphrase detection on user-generated short texts, such as Twitter, include language irregularity and noise. To cope with these challenges, we propose a novel deep neural network-based approach that relies on coarse-grained sentence modeling using a convolutional neural network and a long short-term memory model, combined with a specific fine-grained word-level similarity matching model. Our experimental results show that the proposed approach outperforms existing state-of-the-art approaches on user-generated noisy social media data, such as Twitter texts, and achieves highly competitive performance on a cleaner corpus.

arXiv.org e-Print Archive

NORA - Norwegian Open Research Archives

Conception: Multilingually-Enhanced, Human-Readable Concept Vector Representations

Authors: Simone Conia, Roberto Navigli
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020

To date, the most successful word, word sense, and concept modelling techniques have used large corpora and knowledge resources to produce dense vector representations that capture semantic similarities in a relatively low-dimensional space. Most current approaches, however, suffer from a monolingual bias, with their strength depending on the amount of data available across languages. In this paper we address this issue and propose Conception, a novel technique for building language-independent vector representations of concepts which places multilinguality at its core while retaining explicit relationships between concepts. Our approach results in high-coverage representations that outperform the state of the art in multilingual and cross-lingual Semantic Word Similarity and Word Sense Disambiguation, proving particularly robust on low-resource languages. Conception – its software and the complete set of representations – is available at https://github.com/SapienzaNLP/conception

Crossref

Archivio della ricerca- Università di Roma La Sapienza

SemEval-2015 Task 3: Answer Selection in Community Question Answering

Authors: Jim Glass, Walid Magdy, Alessandro Moschitti, Lluís Màrquez, Preslav Nakov, Bilal Randeree
Publication date: 01/06/2015

Edinburgh Research Explorer

QCRI: Answer Selection for Community Question Answering - Experiments for Arabic and English

Authors: Alberto Barrón-Cedeño, Kareem Darwish, Simone Filice, Wei Gao, Shafiq R. Joty, Walid Magdy, Giovanni Da San Martino, Alessandro Moschitti, Hamdy Mubarak, Lluís Màrquez, Preslav Nakov, Massimo Nicosia, Iman Saleh
Publication date: 01/01/2015

Crossref

Institutional Knowledge at Singapore Management University

Edinburgh Research Explorer

Computing Semantic Text Similarity Using Rich Features

Authors: Lei Lin, Yang Liu, Chengjie Sun, Xiaolong Wang, Yuming Zhao
Publication date: 01/01/2015

Waseda University Repository

Story Cloze Ending Selection Baselines and Data Examination

Authors: Anette Frank, Todor Mihaylov
Publication date: 01/01/2017

This paper describes two supervised baseline systems for the Story Cloze Test Shared Task (Mostafazadeh et al., 2016a). We first build a classifier using features based on word embeddings and semantic similarity computation. We further implement a neural LSTM system with different encoding strategies that try to model the relation between the story and the provided endings. Our experiments show that a model using representation features based on average word embedding vectors over the given story words and the candidate ending sentence words, combined with similarity features between the story and candidate ending representations, performed better than the neural models. Our best model achieves an accuracy of 72.42, ranking 3rd in the official evaluation. Comment: Submission for the LSDSem 2017 - Linking Models of Lexical, Sentential and Discourse-level Semantics - Shared Task

arXiv.org e-Print Archive

TUbiblio

Crossref