900 research outputs found
Insights into Analogy Completion from the Biomedical Domain
Analogy completion has been a popular task in recent years for evaluating the
semantic properties of word embeddings, but the standard methodology makes a
number of assumptions about analogies that do not always hold, either in recent
benchmark datasets or when expanding into other domains. Through an analysis of
analogies in the biomedical domain, we identify three assumptions: that of a
Single Answer for any given analogy, that the pairs involved describe the Same
Relationship, and that each pair is Informative with respect to the other. We
propose modifying the standard methodology to relax these assumptions by
allowing for multiple correct answers, reporting MAP and MRR in addition to
accuracy, and using multiple example pairs. We further present BMASS, a novel
dataset for evaluating linguistic regularities in biomedical embeddings, and
demonstrate that the relationships described in the dataset pose significant
semantic challenges to current word embedding methods.Comment: Accepted to BioNLP 2017. (10 pages
A context based model for sentiment analysis in twitter for the italian language
Studi recenti per la Sentiment
Analysis in Twitter hanno tentato di creare
modelli per caratterizzare la polarit´a di
un tweet osservando ciascun messaggio
in isolamento. In realt`a, i tweet fanno
parte di conversazioni, la cui natura pu`o
essere sfruttata per migliorare la qualit`a
dell’analisi da parte di sistemi automatici.
In (Vanzo et al., 2014) `e stato proposto un
modello basato sulla classificazione di sequenze
per la caratterizzazione della polarit`
a dei tweet, che sfrutta il contesto in
cui il messaggio `e immerso. In questo lavoro,
si vuole verificare l’applicabilit`a di
tale metodologia anche per la lingua Italiana.Recent works on Sentiment
Analysis over Twitter leverage the idea
that the sentiment depends on a single
incoming tweet. However, tweets are
plunged into streams of posts, thus making
available a wider context. The contribution
of this information has been recently
investigated for the English language by
modeling the polarity detection as a sequential
classification task over streams of
tweets (Vanzo et al., 2014). Here, we want
to verify the applicability of this method
even for a morphological richer language,
i.e. Italian
Structured lexical similarity via convolution Kernels on dependency trees
A central topic in natural language process-ing is the design of lexical and syntactic fea-tures suitable for the target application. In this paper, we study convolution dependency tree kernels for automatic engineering of syntactic and semantic patterns exploiting lexical simi-larities. We define efficient and powerful ker-nels for measuring the similarity between de-pendency structures, whose surface forms of the lexical nodes are in part or completely dif-ferent. The experiments with such kernels for question classification show an unprecedented results, e.g. 41 % of error reduction of the for-mer state-of-the-art. Additionally, semantic role classification confirms the benefit of se-mantic smoothing for dependency kernels.
From Word to Sense Embeddings: A Survey on Vector Representations of Meaning
Over the past years, distributed semantic representations have proved to be
effective and flexible keepers of prior knowledge to be integrated into
downstream applications. This survey focuses on the representation of meaning.
We start from the theoretical background behind word vector space models and
highlight one of their major limitations: the meaning conflation deficiency,
which arises from representing a word with all its possible meanings as a
single vector. Then, we explain how this deficiency can be addressed through a
transition from the word level to the more fine-grained level of word senses
(in its broader acceptation) as a method for modelling unambiguous lexical
meaning. We present a comprehensive overview of the wide range of techniques in
the two main branches of sense representation, i.e., unsupervised and
knowledge-based. Finally, this survey covers the main evaluation procedures and
applications for this type of representation, and provides an analysis of four
of its important aspects: interpretability, sense granularity, adaptability to
different domains and compositionality.Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence
Researc
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges.Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
- …