900 research outputs found

    Insights into Analogy Completion from the Biomedical Domain

    Analogy completion has been a popular task in recent years for evaluating the semantic properties of word embeddings, but the standard methodology makes a number of assumptions about analogies that do not always hold, either in recent benchmark datasets or when expanding into other domains. Through an analysis of analogies in the biomedical domain, we identify three assumptions: that of a Single Answer for any given analogy, that the pairs involved describe the Same Relationship, and that each pair is Informative with respect to the other. We propose modifying the standard methodology to relax these assumptions by allowing for multiple correct answers, reporting MAP and MRR in addition to accuracy, and using multiple example pairs. We further present BMASS, a novel dataset for evaluating linguistic regularities in biomedical embeddings, and demonstrate that the relationships described in the dataset pose significant semantic challenges to current word embedding methods. Comment: Accepted to BioNLP 2017. (10 pages

    A context based model for sentiment analysis in twitter for the italian language

    Recent work on sentiment analysis over Twitter has modeled the polarity of a tweet by looking at each message in isolation. However, tweets are part of conversations and streams of posts, which makes a wider context available that automatic systems can exploit to improve the quality of their analysis. The contribution of this context was recently investigated for English by modeling polarity detection as a sequential classification task over streams of tweets (Vanzo et al., 2014). Here, we verify whether this method also applies to a morphologically richer language, namely Italian.

    Structured lexical similarity via convolution Kernels on dependency trees

    A central topic in natural language processing is the design of lexical and syntactic features suitable for the target application. In this paper, we study convolution dependency tree kernels for the automatic engineering of syntactic and semantic patterns that exploit lexical similarities. We define efficient and powerful kernels for measuring the similarity between dependency structures whose lexical nodes have partly or completely different surface forms. Experiments with these kernels on question classification show unprecedented results, e.g. a 41% error reduction over the former state of the art. Additionally, semantic role classification confirms the benefit of semantic smoothing for dependency kernels.

    From Word to Sense Embeddings: A Survey on Vector Representations of Meaning

    Over the past years, distributed semantic representations have proved to be effective and flexible keepers of prior knowledge to be integrated into downstream applications. This survey focuses on the representation of meaning. We start from the theoretical background behind word vector space models and highlight one of their major limitations: the meaning conflation deficiency, which arises from representing a word with all its possible meanings as a single vector. Then, we explain how this deficiency can be addressed through a transition from the word level to the more fine-grained level of word senses (in its broader acceptation) as a method for modelling unambiguous lexical meaning. We present a comprehensive overview of the wide range of techniques in the two main branches of sense representation, i.e., unsupervised and knowledge-based. Finally, this survey covers the main evaluation procedures and applications for this type of representation, and provides an analysis of four of its important aspects: interpretability, sense granularity, adaptability to different domains and compositionality. Comment: 46 pages, 8 figures. Published in Journal of Artificial Intelligence Researc

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved, by showing how the large-scale data-driven methods that are widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of a closer collaboration between the two communities and conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201