8,791 research outputs found

    Beyond Sentiment: The Manifold of Human Emotions

    Get PDF
    Sentiment analysis predicts the presence of positive or negative emotions in a text document. In this paper we consider higher dimensional extensions of the sentiment concept, which represent a richer set of human emotions. Our approach goes beyond previous work in that our model contains a continuous manifold rather than a finite set of human emotions. We investigate the resulting model, compare it to psychological observations, and explore its predictive capabilities. Besides obtaining significant improvements over a baseline without manifold, we are also able to visualize different notions of positive sentiment in different domains.Comment: 15 pages, 7 figure

    DocTag2Vec: An Embedding Based Multi-label Learning Approach for Document Tagging

    Full text link
    Tagging news articles or blog posts with relevant tags from a collection of predefined ones is coined as document tagging in this work. Accurate tagging of articles can benefit several downstream applications such as recommendation and search. In this work, we propose a novel yet simple approach called DocTag2Vec to accomplish this task. We substantially extend Word2Vec and Doc2Vec---two popular models for learning distributed representation of words and documents. In DocTag2Vec, we simultaneously learn the representation of words, documents, and tags in a joint vector space during training, and employ the simple kk-nearest neighbor search to predict tags for unseen documents. In contrast to previous multi-label learning methods, DocTag2Vec directly deals with raw text instead of provided feature vector, and in addition, enjoys advantages like the learning of tag representation, and the ability of handling newly created tags. To demonstrate the effectiveness of our approach, we conduct experiments on several datasets and show promising results against state-of-the-art methods.Comment: 10 page

    Language Without Words: A Pointillist Model for Natural Language Processing

    Full text link
    This paper explores two separate questions: Can we perform natural language processing tasks without a lexicon?; and, Should we? Existing natural language processing techniques are either based on words as units or use units such as grams only for basic classification tasks. How close can a machine come to reasoning about the meanings of words and phrases in a corpus without using any lexicon, based only on grams? Our own motivation for posing this question is based on our efforts to find popular trends in words and phrases from online Chinese social media. This form of written Chinese uses so many neologisms, creative character placements, and combinations of writing systems that it has been dubbed the "Martian Language." Readers must often use visual queues, audible queues from reading out loud, and their knowledge and understanding of current events to understand a post. For analysis of popular trends, the specific problem is that it is difficult to build a lexicon when the invention of new ways to refer to a word or concept is easy and common. For natural language processing in general, we argue in this paper that new uses of language in social media will challenge machines' abilities to operate with words as the basic unit of understanding, not only in Chinese but potentially in other languages.Comment: 5 pages, 2 figure

    Crowdsourced real-world sensing: sentiment analysis and the real-time web

    Get PDF
    The advent of the real-time web is proving both challeng- ing and at the same time disruptive for a number of areas of research, notably information retrieval and web data mining. As an area of research reaching maturity, sentiment analysis oers a promising direction for modelling the text content available in real-time streams. This paper reviews the real-time web as a new area of focus for sentiment analysis and discusses the motivations and challenges behind such a direction
    corecore