5 research outputs found

Learning Embeddings to lexicalise RDF Properties

Author: Gardent Claire
Perez-Beltrachini Laura
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2016

A difficult task when generating text from knowledge bases (KB) is finding appropriate lexicalisations for KB symbols. We present an approach for lexicalising knowledge base relations and apply it to DBPedia data. Our model learns low-dimensional embeddings of words and RDF resources and uses these representations to score RDF properties against candidate lexicalisations. Training our model using (i) pairs of RDF triples and automatically generated verbalisations of these triples and (ii) pairs of paraphrases extracted from various resources yields competitive results on DBPedia data.
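The scoring step described in this abstract can be illustrated with a minimal sketch. The abstract does not state which scoring function the authors use, so cosine similarity is an assumption here, and the vectors, the `birth_place` property, and the candidate phrases are hypothetical toy values rather than learned embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_lexicalisations(property_vec, candidates):
    """Return (phrase, score) pairs sorted from best to worst match."""
    scored = [(phrase, cosine(property_vec, vec)) for phrase, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Hand-made toy vectors (assumption: stand-ins for learned embeddings).
birth_place = [0.9, 0.1, 0.0]
candidates = {
    "was born in": [0.8, 0.2, 0.1],
    "works for": [0.1, 0.9, 0.2],
    "died in": [0.5, 0.1, 0.8],
}

ranking = rank_lexicalisations(birth_place, candidates)
for phrase, score in ranking:
    print(f"{phrase}: {score:.3f}")
```

With these toy values the phrase closest to the property vector is ranked first; a trained model would supply the embeddings and possibly a different scoring function.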

Crossref

INRIA a CCSD electronic archive server

Edinburgh Research Explorer

Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM

Author: Agarwal Apoorv
Alshenqeeti Hamza
Baccianella Stefano
Clayton
Dimson Thomas
Esuli Andrea
González-Ibánez Roberto
Hu Tianran
Kouloumpis Efthymios
Li Weijian
Li Xiang
Liu Kun-Lin
Novak Petra Kralj
Pak Alexander
van der Maaten Laurens
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/08/2018

Sentiment analysis on large-scale social media data is important for bridging the gap between social media content and real-world activities, including political election prediction and the monitoring and analysis of individual and public emotional status. Although textual sentiment analysis has been well studied on platforms such as Twitter and Instagram, analysis of the role of extensive emoji use in sentiment analysis remains limited. In this paper, we propose a novel scheme for Twitter sentiment analysis with extra attention on emojis. We first learn bi-sense emoji embeddings under positive and negative sentimental tweets individually, and then train a sentiment classifier by attending on these bi-sense emoji embeddings with an attention-based long short-term memory network (LSTM). Our experiments show that the bi-sense embedding is effective for extracting sentiment-aware embeddings of emojis and outperforms the state-of-the-art models. We also visualize the attentions to show that the bi-sense emoji embedding provides better guidance on the attention mechanism to obtain a more robust understanding of the semantics and sentiments.
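The idea of attending over two sense embeddings of one emoji can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's architecture: a fixed `context` vector stands in for an LSTM hidden state, dot-product scoring is assumed, and all vectors are hypothetical toy values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(context, sense_vectors):
    """Weight each sense vector by its dot product with the context.

    Returns the attention weights and the weighted sum of the sense vectors.
    """
    scores = [sum(c * s for c, s in zip(context, vec)) for vec in sense_vectors]
    weights = softmax(scores)
    dim = len(sense_vectors[0])
    mixed = [sum(w * vec[i] for w, vec in zip(weights, sense_vectors)) for i in range(dim)]
    return weights, mixed

# Toy vectors (assumption): one emoji with a positive and a negative sense.
positive_sense = [1.0, 0.0]
negative_sense = [0.0, 1.0]
context = [2.0, 0.5]  # stand-in for the text representation of a tweet

weights, mixed = attend(context, [positive_sense, negative_sense])
print("attention weights:", weights)
print("mixed emoji vector:", mixed)
```

Because the toy context aligns more with the positive sense, that sense receives the larger weight; in a trained model the weights would depend on the learned text representation.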

arXiv.org e-Print Archive

Crossref

I run as fast as a rabbit, can you? A Multilingual Simile Dialogue Dataset

Author: Ke Changxin
Liu Ting
Ma Longxuan
Sun Churui
Zhang Weinan
Zhou Shuhan
Publication date: 09/06/2023

A simile is a figure of speech that compares two different things (called the tenor and the vehicle) via shared properties. The tenor and the vehicle are usually connected with comparator words such as "like" or "as". The simile phenomena are unique and complex in a real-life dialogue scene where the tenor and the vehicle can be verbal phrases or sentences, mentioned by different speakers, exist in different sentences, or occur in reversed order. However, the current simile research usually focuses on similes in a triplet tuple (tenor, property, vehicle) or a single sentence where the tenor and vehicle are usually entities or noun phrases, which cannot reflect complex simile phenomena in real scenarios. In this paper, we propose a novel and high-quality multilingual simile dialogue (MSD) dataset to facilitate the study of complex simile phenomena. The MSD is the largest manually annotated simile dataset (~20K) and it contains both English and Chinese data. Meanwhile, the MSD data can also be used on dialogue tasks to test the ability of dialogue systems when using similes. We design 3 simile tasks (recognition, interpretation, and generation) and 2 dialogue tasks (retrieval and generation) with MSD. For each task, we provide experimental results from strong pre-trained or state-of-the-art models. The experiments demonstrate the challenge of MSD and we have released the data/code on GitHub.

Comment: 13 pages, 1 figure, 12 tables, ACL 2023 Findings

arXiv.org e-Print Archive

A Multilingual Test Collection for the Semantic Search of Entity Categories

Author: Barzegar Siamak
Bermeitinger Bernhard
Cunha Tiago
Davis Brian
Franco Wellington
Freitas Andre
Handschuh Siegfried
Sales Juliano Efson
Publication venue: LREC: Language Resources and Evaluation Conference
Publication date: 01/01/2018

Humans naturally organise and classify the world into sets and categories. These categories, expressed in natural language, are present in all data artefacts from structured to unstructured data and play a fundamental role as tags, dataset predicates or ontology attributes. A better understanding of the syntactic structure of categories and how to match them semantically is a fundamental problem in the computational linguistics domain. Despite the high popularity of entity search, entity categories have not received equivalent attention. This paper presents the task of semantic search of entity categories by defining, developing and making publicly available a multilingual test collection covering English, Portuguese and German. The test collections were designed to meet the demands of the entity search community in providing more representative and semantically complex query sets. In addition, we also provide comparative baselines and a brief analysis of the results.

MURAL - Maynooth University Research Archive Library