Language Without Words: A Pointillist Model for Natural Language Processing
This paper explores two separate questions: can we perform natural language processing tasks without a lexicon, and should we? Existing natural language processing techniques are either based on words as units or use units such as grams only for basic classification tasks. How close can a machine come to reasoning about the meanings of words and phrases in a corpus without using any lexicon, based only on grams?
Our own motivation for posing this question comes from our efforts to find popular trends in words and phrases in online Chinese social media. This form of written Chinese uses so many neologisms, creative character placements, and combinations of writing systems that it has been dubbed the "Martian Language." Readers must often rely on visual cues, audible cues from reading out loud, and their knowledge of current events to understand a post. For the analysis of popular trends, the specific problem is that it is difficult to build a lexicon when inventing new ways to refer to a word or concept is easy and common. For natural language processing in general, we argue in this paper that new uses of language in social media will challenge machines' ability to operate with words as the basic unit of understanding, not only in Chinese but potentially in other languages as well.
Comment: 5 pages, 2 figures
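The abstract's core idea, working from grams rather than from lexicon entries, can be illustrated with a minimal sketch. It assumes that "grams" means overlapping character n-grams counted directly over raw posts with no word segmentation; the function names `char_grams` and `gram_counts` and the sample posts are hypothetical, and this is not the authors' pointillist model.

```python
from collections import Counter


def char_grams(text: str, n: int) -> list[str]:
    """All overlapping runs of n characters in text; no segmentation needed."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def gram_counts(posts: list[str], n: int = 2) -> Counter:
    """Count character n-grams across posts without consulting any lexicon."""
    counts: Counter = Counter()
    for post in posts:
        counts.update(char_grams(post, n))
    return counts


# Hypothetical sample posts; 给力 is internet slang for "awesome".
posts = ["给力啊", "太给力了"]
print(gram_counts(posts).most_common(3))
```

A gram that recurs across posts, such as 给力 here, surfaces without ever having been entered in a dictionary; how to choose n, or how to combine several values of n, is left open in this sketch.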
Clustering-based analysis of semantic concept models for video shots
In this paper we present a clustering-based method for representing semantic concepts on multimodal low-level feature spaces, and we evaluate the goodness of such models with entropy-based methods. Because different semantic concepts in video are most accurately represented with different features and modalities, we use the relative model-wise confidence values of the feature extraction techniques to weight them automatically. The method also provides a natural way of measuring the similarity of different concepts in a multimedia lexicon. The experiments are conducted on the development set of the TRECVID 2005 corpus together with a common annotation for 39 semantic concepts.
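One simple entropy-based goodness measure for such a model is the conditional entropy of a concept's labels given the cluster assignments: 0 bits means every cluster is pure with respect to the concept. The sketch below assumes that measure plus a hypothetical weighting rule (weight each feature by 1 minus its entropy, then normalise); the paper's exact measures and confidence weighting may differ, and all names and data are illustrative.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(clusters: list[int], labels: list[bool]) -> float:
    """H(label | cluster) in bits; 0.0 means every cluster is pure for the concept."""
    n = len(clusters)
    per_cluster = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        per_cluster[cluster][label] += 1
    h = 0.0
    for counts in per_cluster.values():
        size = sum(counts.values())
        for k in counts.values():
            p = k / size
            # Weight each cluster's label entropy by the cluster's share of shots.
            h -= (size / n) * p * math.log2(p)
    return h


def feature_weights(entropies: list[float]) -> list[float]:
    """Hypothetical rule: weight each feature by 1 - H, normalised to sum to 1."""
    scores = [max(0.0, 1.0 - h) for h in entropies]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(scores)] * len(scores)  # no feature is informative
    return [s / total for s in scores]


labels = [True, True, False, False]  # does each shot show the concept?
colour = [0, 0, 1, 1]                # cluster ids from a hypothetical colour feature
audio = [0, 1, 0, 1]                 # cluster ids from a hypothetical audio feature
entropies = [conditional_entropy(colour, labels), conditional_entropy(audio, labels)]
print(entropies, feature_weights(entropies))
```

Here the colour clustering separates the concept perfectly and the audio clustering not at all, so the hypothetical rule puts all the weight on colour.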
Generating a Malay sentiment lexicon based on WordNet
A sentiment lexicon is a list of words labeled as positive or negative. In opinion mining, a sentiment lexicon is one of the important resources for the text polarity classification task in a sentiment analysis model. Studies in Malay sentiment analysis are increasing as the volume of sentiment data on social media grows, so the demand for a Malay sentiment lexicon is high. However, developing a Malay sentiment lexicon is difficult because Malay language resources are scarce, and various approaches and techniques have therefore been used to generate sentiment lexicons. The objective of this paper is to develop a Malay sentiment lexicon generation algorithm based on WordNet. In this study, WordNet Bahasa is mapped to the English WordNet to obtain the offset values of a seed set of sentiment words. The seed set is then used to generate synonym and antonym semantic relations in the English WordNet. The best result achieves 86.58% agreement with human annotators and a 91.31% F1-measure in word polarity classification. These results show the effectiveness of the proposed algorithm for generating a Malay sentiment lexicon based on WordNet.
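The seed-expansion step can be pictured as a graph traversal in which synonyms keep a word's polarity and antonyms flip it. This is a minimal sketch under stated assumptions: the relation tables `SYNONYMS` and `ANTONYMS` are hypothetical toy stand-ins for the English WordNet (no WordNet Bahasa mapping or synset offsets are modelled), `expand_seeds` is an illustrative name, and conflicting labels are resolved by keeping the first one found.

```python
from collections import deque

# Hypothetical toy relation tables standing in for English WordNet relations.
SYNONYMS = {
    "good": ["nice", "fine"],
    "nice": ["pleasant"],
    "bad": ["poor"],
}
ANTONYMS = {
    "good": ["bad"],
    "pleasant": ["unpleasant"],
}


def expand_seeds(seeds: dict[str, int], max_depth: int = 2) -> dict[str, int]:
    """Breadth-first +1/-1 propagation: synonyms keep the sign, antonyms flip it."""
    lexicon = dict(seeds)
    queue = deque((word, 0) for word in seeds)
    while queue:
        word, depth = queue.popleft()
        if depth == max_depth:
            continue  # stop expanding beyond the depth limit
        neighbours = [(s, lexicon[word]) for s in SYNONYMS.get(word, [])]
        neighbours += [(a, -lexicon[word]) for a in ANTONYMS.get(word, [])]
        for other, polarity in neighbours:
            if other not in lexicon:  # first label wins; later conflicts are ignored
                lexicon[other] = polarity
                queue.append((other, depth + 1))
    return lexicon


print(expand_seeds({"good": 1}))
```

The depth limit matters because polarity tends to drift the further a word is from its seed; with `max_depth=3`, "unpleasant" is also reached and labeled negative.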
Modelling SO-CAL in an Inheritance-based Sentiment Analysis Framework
Sentiment analysis is the computational study of people's opinions as expressed in text. It is an active area of research in Natural Language Processing with many applications in social media. There are two main approaches to sentiment analysis: machine learning and lexicon-based. The machine learning approach uses statistical modelling techniques, whereas the lexicon-based approach uses 'sentiment lexicons' containing explicit sentiment values for individual words to calculate sentiment scores for documents. In this paper we present a novel method for modelling lexicon-based sentiment analysis using a lexical inheritance network. Further, we present a case study of applying inheritance-based modelling to an existing sentiment analysis system as a proof of concept, before developing the ideas further in future work.
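To make the lexicon-based approach and the inheritance idea concrete, here is a toy sketch in which each word node inherits a sentiment value from its parent unless it overrides it, and a document's score is simply the sum of its word values. The network, the values, and the names `Node`, `LEXICON` and `score` are hypothetical; SO-CAL also adjusts scores for context such as negation and intensifiers, which this sketch omits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy inheritance-network node: an unset value is inherited from the parent."""
    parent: Optional["Node"] = None
    value: Optional[float] = None

    def lookup(self) -> float:
        # Walk up the parent chain until some ancestor defines a value.
        node: Optional[Node] = self
        while node is not None:
            if node.value is not None:
                return node.value
            node = node.parent
        return 0.0  # nothing defined anywhere: treat the word as neutral


# Hypothetical network: generic word -> polarity class -> individual words.
WORD = Node(value=0.0)
POSITIVE_ADJ = Node(parent=WORD, value=2.0)
NEGATIVE_ADJ = Node(parent=WORD, value=-2.0)
LEXICON = {
    "good": Node(parent=POSITIVE_ADJ),                  # inherits 2.0
    "excellent": Node(parent=POSITIVE_ADJ, value=5.0),  # overrides the default
    "bad": Node(parent=NEGATIVE_ADJ),                   # inherits -2.0
    "the": Node(parent=WORD),                           # inherits neutral 0.0
}


def score(tokens: list[str]) -> float:
    """Sum inherited sentiment values; tokens missing from LEXICON count as 0."""
    return sum(LEXICON[t].lookup() for t in tokens if t in LEXICON)


print(score("the food was excellent but the service was bad".split()))
```

The appeal of the inheritance design is that a default is stated once per class and only exceptions, like "excellent" here, need their own entries.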
Using a Probabilistic Class-Based Lexicon for Lexical Ambiguity Resolution
This paper presents the use of probabilistic class-based lexica for disambiguation in target-word selection. Our method employs minimal but precise contextual information for disambiguation: only the information provided by the target verb, enriched by the condensed information of a probabilistic class-based lexicon, is used. Induction of classes and fine-tuning to verbal arguments are performed in an unsupervised manner by EM-based clustering techniques. The method shows promising results in an evaluation on real-world translations.
Comment: 7 pages, uses colacl.st
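EM-based induction of a probabilistic class-based lexicon is often formulated as a latent-class model over verb-argument pairs, p(v, n) = Σ_c p(c) p(v|c) p(n|c). The sketch below assumes that formulation (the abstract does not spell out the model) and fits it with plain EM; `em_latent_classes`, `best_candidate`, and the toy pairs are hypothetical. Target-word selection is then illustrated by picking the candidate noun with the highest joint probability with the verb.

```python
import random
from collections import Counter


def em_latent_classes(pairs, n_classes=2, iters=50, seed=0):
    """Fit p(v, n) = sum_c p(c) p(v|c) p(n|c) to (verb, noun) pairs by EM."""
    rng = random.Random(seed)
    counts = Counter(pairs)
    verbs = sorted({v for v, _ in counts})
    nouns = sorted({n for _, n in counts})

    def random_dist(keys):
        # Random positive start so the classes can break symmetry.
        weights = {k: rng.random() + 0.5 for k in keys}
        total = sum(weights.values())
        return {k: w / total for k, w in weights.items()}

    p_c = [1.0 / n_classes] * n_classes
    p_v = [random_dist(verbs) for _ in range(n_classes)]
    p_n = [random_dist(nouns) for _ in range(n_classes)]

    for _ in range(iters):
        # E-step: class posteriors p(c | v, n), accumulated as expected counts.
        exp_c = [0.0] * n_classes
        exp_v = [dict.fromkeys(verbs, 0.0) for _ in range(n_classes)]
        exp_n = [dict.fromkeys(nouns, 0.0) for _ in range(n_classes)]
        for (v, n), freq in counts.items():
            joint = [p_c[c] * p_v[c][v] * p_n[c][n] for c in range(n_classes)]
            z = sum(joint)
            for c in range(n_classes):
                r = freq * joint[c] / z
                exp_c[c] += r
                exp_v[c][v] += r
                exp_n[c][n] += r
        # M-step: relative-frequency re-estimates from the expected counts.
        total = sum(exp_c)
        for c in range(n_classes):
            p_c[c] = exp_c[c] / total
            p_v[c] = {v: x / exp_c[c] for v, x in exp_v[c].items()}
            p_n[c] = {n: x / exp_c[c] for n, x in exp_n[c].items()}
    return p_c, p_v, p_n


def best_candidate(verb, candidates, model):
    """Pick the candidate noun with the highest joint probability p(verb, noun)."""
    p_c, p_v, p_n = model
    return max(
        candidates,
        key=lambda noun: sum(
            p_c[c] * p_v[c].get(verb, 0.0) * p_n[c].get(noun, 0.0)
            for c in range(len(p_c))
        ),
    )


pairs = (
    [("drink", "water")] * 5 + [("drink", "coffee")] * 5
    + [("read", "book")] * 5 + [("read", "paper")] * 5
)
model = em_latent_classes(pairs)
print(best_candidate("drink", ["book", "water"], model))
```

Because EM only finds a local optimum from a random start, a practical system would typically run several restarts and keep the model with the highest likelihood.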
