5 research outputs found
Learning Embeddings to lexicalise RDF Properties
A difficult task when generating text from knowledge bases (KB) consists in finding appropriate lexicalisations for KB symbols. We present an approach for lexicalising knowledge base relations and apply it to DBPedia data. Our model learns low-dimensional embeddings of words and RDF resources and uses these representations to score RDF properties against candidate lexicalisations. Training our model using (i) pairs of RDF triples and automatically generated verbalisations of these triples and (ii) pairs of paraphrases extracted from various resources yields competitive results on DBPedia data.
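As a rough illustration of scoring an RDF property against candidate lexicalisations in a shared embedding space, here is a minimal sketch. It assumes cosine similarity as the scoring function and uses hand-made 3-dimensional toy vectors; the function names, the vectors, and the `birth_place` example are hypothetical and are not taken from the paper, whose actual model and training objective may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_lexicalisations(property_vec, candidates):
    """Return (phrase, score) pairs sorted from best to worst match."""
    scored = [(phrase, cosine(property_vec, vec))
              for phrase, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# Toy embeddings (assumed values, for illustration only).
birth_place = [0.9, 0.1, 0.0]          # hypothetical embedding of an RDF property
candidates = {
    "was born in": [0.8, 0.2, 0.1],
    "died in":     [0.1, 0.9, 0.0],
    "wrote":       [0.0, 0.1, 0.9],
}

ranking = rank_lexicalisations(birth_place, candidates)
best_phrase = ranking[0][0]
print(ranking)
```

In a learned model the vectors would come from training on triple/verbalisation pairs and paraphrase pairs rather than being set by hand.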
Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM
Sentiment analysis on large-scale social media data is important to bridge
the gaps between social media contents and real world activities including
political election prediction, individual and public emotional status
monitoring and analysis, and so on. Although textual sentiment analysis has
been well studied based on platforms such as Twitter and Instagram, analysis of
the role of extensive emoji use in sentiment analysis remains limited. In this
paper, we propose a novel scheme for Twitter sentiment analysis with extra
attention on emojis. We first learn bi-sense emoji embeddings from positive
and negative tweets separately, and then train a sentiment
classifier by attending on these bi-sense emoji embeddings with an
attention-based long short-term memory network (LSTM). Our experiments show
that the bi-sense embedding is effective for extracting sentiment-aware
embeddings of emojis and outperforms the state-of-the-art models. We also
visualize the attentions to show that the bi-sense emoji embedding provides
better guidance on the attention mechanism to obtain a more robust
understanding of the semantics and sentiments.
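The idea of attending over two sense embeddings of one emoji can be sketched as follows. This is a simplified illustration that assumes plain dot-product attention against a single context vector; the paper combines attention with an LSTM, and the function names and toy vectors here are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend_bi_sense(context_vec, positive_sense, negative_sense):
    """Weight the two sense embeddings of an emoji by their match to the context."""
    senses = [positive_sense, negative_sense]
    # Dot-product relevance of each sense to the context (assumed scoring function).
    scores = [sum(c * s for c, s in zip(context_vec, sense)) for sense in senses]
    weights = softmax(scores)
    # Attention-weighted mixture of the two sense embeddings.
    mixed = [sum(w * sense[i] for w, sense in zip(weights, senses))
             for i in range(len(context_vec))]
    return weights, mixed

# Toy 2-dimensional vectors (assumed values, for illustration only).
context = [1.0, 0.0]   # stands in for a text encoding of the tweet
pos = [2.0, 0.0]       # emoji embedding learned from positive tweets
neg = [0.0, 2.0]       # emoji embedding learned from negative tweets

weights, mixed = attend_bi_sense(context, pos, neg)
print(weights, mixed)
```

Here the context aligns with the positive sense, so that sense receives the larger attention weight and dominates the mixed emoji representation.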
I run as fast as a rabbit, can you? A Multilingual Simile Dialogue Dataset
A simile is a figure of speech that compares two different things (called the
tenor and the vehicle) via shared properties. The tenor and the vehicle are
usually connected with comparator words such as "like" or "as". The simile
phenomena are unique and complex in a real-life dialogue scene where the tenor
and the vehicle can be verbal phrases or sentences, mentioned by different
speakers, exist in different sentences, or occur in reversed order. However,
current simile research usually focuses on similes in a triplet
(tenor, property, vehicle) or a single sentence where the tenor and vehicle are
usually entities or noun phrases, which cannot reflect complex simile
phenomena in real scenarios. In this paper, we propose a novel and high-quality
multilingual simile dialogue (MSD) dataset to facilitate the study of complex
simile phenomena. The MSD is the largest manually annotated simile data
(20K) and it contains both English and Chinese data. Meanwhile, the MSD
data can also be used on dialogue tasks to test the ability of dialogue systems
when using similes. We design 3 simile tasks (recognition, interpretation, and
generation) and 2 dialogue tasks (retrieval and generation) with MSD. For each
task, we provide experimental results from strong pre-trained or
state-of-the-art models. The experiments demonstrate the challenge of MSD and
we have released the data/code on GitHub.
Comment: 13 pages, 1 figure, 12 tables, ACL 2023 Findings
A Multilingual Test Collection for the Semantic Search of Entity Categories
Humans naturally organise and classify the world into sets and categories. These categories, expressed in natural language, are present
in all data artefacts, from structured to unstructured data, and play a fundamental role as tags, dataset predicates or ontology attributes.
A better understanding of the syntactic structure of categories and of how to match them semantically is a fundamental problem in the
computational linguistics domain. Despite the high popularity of entity search, entity categories have not received equivalent
attention. This paper presents the task of semantic search of entity categories by defining, developing and making publicly
available a multilingual test collection covering English, Portuguese and German. The test collection was designed to meet the
demands of the entity search community by providing more representative and semantically complex query sets. In addition, we
provide comparative baselines and a brief analysis of the results.