1,505 research outputs found

Tibetan Microblog Emotional Analysis Based on Sequential Model in Online Social Platforms

Authors: Huili Zhang, Lirong Qiu, Qiumei Pu, Zhen Zhang
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

Comparing Fifty Natural Languages and Twelve Genetic Languages Using Word Embedding Language Divergence (WELD) as a Quantitative Measure of Language Distance

Authors: Asgari Ehsaneddin, Mofrad Mohammad R. K.
Publication date: 28/04/2016

We introduce a new measure of distance between languages based on word embedding, called word embedding language divergence (WELD). WELD is defined as the divergence between the unified similarity distributions of words across languages. Using this measure, we perform language comparison for fifty natural languages and twelve genetic languages. Our natural language dataset is a collection of sentence-aligned parallel corpora from Bible translations for fifty languages spanning a variety of language families. Although we use parallel corpora, which guarantee the same content in all languages, in many cases languages within the same family cluster together. In addition to natural languages, we perform language comparison for the coding regions in the genomes of 12 different organisms (4 plants, 6 animals, and 2 human subjects). Our results confirm a significant high-level difference in the genetic language model of humans/animals versus plants. The proposed method is a step toward defining a quantitative measure of similarity between languages, with applications in language classification, genre identification, dialect identification, and evaluation of translations.
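The abstract does not spell out the computation, so the following is only a rough sketch of the general idea, not the authors' method. It assumes that words are aligned across languages by a shared concept key, that the "similarity distribution" is the normalized set of pairwise cosine similarities (rescaled to be non-negative), and that the divergence is Jensen-Shannon. The function names `similarity_distribution`, `js_divergence`, and `weld_sketch` are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_distribution(embeddings, concepts):
    """Pairwise similarities over aligned concepts, normalized to sum to 1.

    Assumption: cosine values are rescaled from [-1, 1] to [0, 1] so that
    they can be treated as an unnormalized probability distribution.
    """
    sims = [(1.0 + cosine(embeddings[a], embeddings[b])) / 2.0
            for a, b in combinations(concepts, 2)]
    total = sum(sims)
    return [s / total for s in sims]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def weld_sketch(emb_lang_a, emb_lang_b, concepts):
    """Divergence between the similarity distributions of two languages."""
    p = similarity_distribution(emb_lang_a, concepts)
    q = similarity_distribution(emb_lang_b, concepts)
    return js_divergence(p, q)


# Toy embeddings keyed by shared concept ids (made-up illustrative values).
lang_a = {"c1": [1.0, 0.0], "c2": [0.0, 1.0], "c3": [0.9, 0.1]}
lang_b = {"c1": [1.0, 0.0], "c2": [0.8, 0.6], "c3": [0.0, 1.0]}
print(weld_sketch(lang_a, lang_b, ["c1", "c2", "c3"]))
```

Under these assumptions, two languages whose embeddings place aligned words in the same relative arrangement get a divergence near zero, and the value grows as their similarity structures differ.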

arXiv.org e-Print Archive

eScholarship - University of California

Zero-Shot Learning by Convex Combination of Semantic Embeddings

Authors: Bengio Samy, Corrado Greg S., Dean Jeffrey, Frome Andrea, Mikolov Tomas, Norouzi Mohammad, Shlens Jonathon, Singer Yoram
Publication date: 21/03/2014

Several recent publications have proposed methods for mapping images into continuous semantic embedding spaces. In some cases the embedding space is trained jointly with the image transformation. In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage. Proponents of these image embedding systems have stressed their advantages over the traditional n-way classification framing of image understanding, particularly in terms of the promise for zero-shot learning -- the ability to correctly annotate images of previously unseen object categories. In this paper, we propose a simple method for constructing an image embedding system from any existing n-way image classifier and a semantic word embedding model, which contains the n class labels in its vocabulary. Our method maps images into the semantic embedding space via convex combination of the class label embedding vectors, and requires no additional training. We show that this simple and direct method confers many of the advantages associated with more complex image embedding schemes, and indeed outperforms state-of-the-art methods on the ImageNet zero-shot learning task.
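The convex-combination step described in the abstract can be illustrated with a minimal sketch. It assumes the classifier's output is available as a dictionary of class probabilities and the word embeddings as plain lists. The `top_t` cutoff, the cosine nearest-neighbour lookup for unseen labels, and all function names are assumptions made for illustration, not details taken from the abstract.

```python
import math


def conse_embed(class_probs, label_embeddings, top_t=2):
    """Embed an image as a convex combination of training-label embeddings.

    class_probs: {label: probability} from an existing n-way classifier.
    label_embeddings: {label: vector} from a word embedding model.
    top_t: hypothetical cutoff on how many top labels to combine.
    """
    top = sorted(class_probs.items(), key=lambda kv: kv[1], reverse=True)[:top_t]
    total = sum(p for _, p in top)
    dim = len(next(iter(label_embeddings.values())))
    vec = [0.0] * dim
    for label, p in top:
        weight = p / total  # weights are non-negative and sum to 1
        for i, x in enumerate(label_embeddings[label]):
            vec[i] += weight * x
    return vec


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def predict_unseen(image_vec, unseen_embeddings):
    """Pick the unseen label whose embedding is closest to the image vector."""
    return max(unseen_embeddings, key=lambda lbl: cosine(image_vec, unseen_embeddings[lbl]))


# Toy data (made-up values): three training classes, two unseen classes.
train_emb = {"lion": [1.0, 0.0], "tiger": [0.0, 1.0], "car": [-1.0, -1.0]}
unseen_emb = {"liger": [0.7, 0.7], "truck": [-0.9, -0.8]}
probs = {"lion": 0.6, "tiger": 0.3, "car": 0.1}

image_vec = conse_embed(probs, train_emb, top_t=2)
print(image_vec, predict_unseen(image_vec, unseen_emb))
```

No parameters are fitted here: the sketch only reuses the classifier's probabilities and the fixed word vectors, which is consistent with the abstract's claim that the method requires no additional training.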

arXiv.org e-Print Archive

CiteSeerX

Paradigm Completion for Derivational Morphology

Authors: Cotterell Ryan, Khayrallah Huda, Kirov Christo, Vylomova Ekaterina, Yarowsky David
Publication date: 01/01/2017

The generation of complex derived word forms has been an overlooked problem in NLP; we fill this gap by applying neural sequence-to-sequence models to the task. We give an overview of the theoretical motivation for a paradigmatic treatment of derivational morphology, and introduce the task of derivational paradigm completion as a parallel to inflectional paradigm completion. State-of-the-art neural models, adapted from the inflection task, are able to learn a range of derivation patterns, and outperform a non-neural baseline by 16.4%. However, due to semantic, historical, and lexical considerations involved in derivational morphology, future work will be needed to achieve performance parity with inflection-generating systems. Comment: EMNLP 201

arXiv.org e-Print Archive

Crossref

The Today Tendency of Sentiment Classification

Authors: Phu Vo Ngoc, Tran Vo Thi Ngoc
Publication venue: 'IntechOpen'
Publication date: 27/06/2018

Sentiment classification has been studied for many years because it has made important contributions to many fields of everyday life, such as political activities, commodity production, and commercial activities. Many kinds of sentiment analysis have been developed over the years, including machine learning approaches and lexicon-based approaches. The current trends in sentiment classification are as follows: (1) processing large data sets with shorter execution times; (2) achieving high accuracy; (3) integrating flexibly and easily into small machines or other approaches. We present each category in more detail.

IntechOpen

Crossref