Semantic Text Mining using Domain Ontology
Abstract— Customer Relationship Management currently needs to achieve greater customer centricity, which requires a deeper understanding of customer needs. At the same time, the volume of textual data generated by social networking sites has greatly increased in recent years, providing a basis for the analysis needed to reach that understanding. One of the issues that arise when analyzing these texts to retrieve non-trivial patterns (text mining) is text representation, which this research aims to address. In particular, this paper focuses on using a domain ontology for text pre-processing in order to improve the quality of the textual corpus being mined. The methodology is based on developing a domain ontology for pre-processing the experimental data and for sentiment analysis of social media data. The inferences drawn from the research show that a domain ontology can improve the results of sentiment analysis. It was also found that, owing to the nature of social media data, a deeper level of semantic analysis is needed to fully exploit its richness.
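The abstract does not describe the ontology or the pre-processing steps in detail, so the following is only a minimal sketch of the general idea under stated assumptions: a toy domain ontology maps surface forms (synonyms, slang, misspellings) to concept labels before a lexicon-based sentiment score is attributed to each concept. The names `ONTOLOGY`, `LEXICON`, `normalize` and `concept_sentiment`, and all the terms they contain, are hypothetical.

```python
import re

# Hypothetical toy domain ontology: concept label -> surface forms seen in
# social media text. Illustrative only; not the ontology from the paper.
ONTOLOGY = {
    "customer_service": ["customer care", "cust service", "support team"],
    "network": ["signal", "network coverage", "netwrk"],
    "price": ["tariff", "price", "cost"],
}

# Hypothetical sentiment lexicon: word -> polarity score.
LEXICON = {"great": 1, "good": 1, "love": 1, "bad": -1, "terrible": -1, "poor": -1}


def normalize(text: str) -> str:
    """Lowercase the text and replace ontology surface forms with concept labels."""
    text = text.lower()
    for concept, terms in ONTOLOGY.items():
        # Try longer phrases first so multi-word terms win over shorter ones.
        for term in sorted(terms, key=len, reverse=True):
            text = re.sub(r"\b" + re.escape(term) + r"\b", concept, text)
    return text


def concept_sentiment(text: str) -> dict:
    """Score each sentence with the lexicon and credit the score to the concepts it mentions."""
    scores = {}
    for sentence in re.split(r"[.!?]+", normalize(text)):
        tokens = re.findall(r"[a-z_]+", sentence)
        polarity = sum(LEXICON.get(tok, 0) for tok in tokens)
        for concept in ONTOLOGY:
            if concept in tokens:
                scores[concept] = scores.get(concept, 0) + polarity
    return scores


print(concept_sentiment("Cust service was great! Netwrk signal is terrible."))
```

Without the normalization step, "Cust service", "Netwrk" and "signal" would not be recognized as mentions of the same concepts, which is the kind of representation problem the abstract attributes to raw social media text.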
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine learning has been a major success story of the AI resurgence. One particular standout success relates to learning from massive amounts of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of the value of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability to understand and exploit multimodal data more deeply, and in the continued incorporation of knowledge in learning techniques.
Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770
The Development of a Temporal Information Dictionary for Social Media Analytics
Dictionaries were used to analyze text even before the emergence of social media and their subsequent use for sentiment analysis of social media content. While dictionaries have been used to capture the tonality of text, so far it has not been possible to automatically detect whether the tonality refers to the present, past, or future. In this research, we develop a dictionary of time-indicating words (T-wordlist). To test how the dictionary performs, we apply our T-wordlist to different disaster-related social media datasets. We will subsequently validate the wordlist and the results through a manual content analysis. So far, in this research-in-progress, we have developed a first dictionary, and we also provide some initial insight into the performance of our wordlist.
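Applying a dictionary like the T-wordlist typically amounts to counting how many tokens of a message fall into each category. The sketch below assumes a tiny hypothetical `T_WORDLIST` with three categories (past, present, future); the real wordlist is not given in the abstract, and `temporal_profile` and `dominant_tense` are illustrative names.

```python
import re
from collections import Counter

# Hypothetical T-wordlist: a few time-indicating words per category.
# The actual wordlist developed in the paper is not reproduced here.
T_WORDLIST = {
    "past": {"yesterday", "was", "were", "ago", "last"},
    "present": {"now", "today", "currently", "is", "are"},
    "future": {"tomorrow", "will", "soon", "next", "upcoming"},
}


def temporal_profile(text: str) -> Counter:
    """Count how many tokens of `text` fall into each temporal category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in T_WORDLIST.items():
            if token in words:
                counts[category] += 1
    return counts


def dominant_tense(text: str) -> str:
    """Return the category with the most hits, or 'unknown' if no word matches."""
    counts = temporal_profile(text)
    if not counts:
        return "unknown"
    return counts.most_common(1)[0][0]


print(dominant_tense("Flood warning: the river will rise tomorrow, evacuate soon"))
```

Combined with a sentiment dictionary, such a profile would let an analyst ask whether a negative post refers to damage that already happened or to a threat that is still expected, which is the gap the abstract identifies.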
Enhancing Twitter Data Analysis with Simple Semantic Filtering: Example in Tracking Influenza-Like Illnesses
Systems that exploit publicly available user-generated content such as Twitter messages have been successful in tracking seasonal influenza. We developed a novel method for filtering influenza-like illness (ILI)-related messages, using 587 million messages from Twitter micro-blogs. We first filtered messages based on syndrome keywords from the BioCaster Ontology, an extant knowledge model of laymen's terms. We then filtered the messages according to semantic features such as negation, hashtags, emoticons, humor and geography. The data covered 36 weeks of the 2009 US influenza season, from 30 August 2009 to 8 May 2010. Results showed that our system achieved the highest Pearson correlation coefficient, 98.46% (p-value < 2.2e-16), an improvement of 3.98% over the previous state-of-the-art method. The results indicate that simple NLP-based enhancements to existing approaches for mining Twitter data can increase the value of this inexpensive resource.
Comment: 10 pages, 5 figures, IEEE HISB 2012 conference, Sept 27-28, 2012, La Jolla, California, U
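The two-stage pipeline described above (keyword filtering, then semantic filtering, then correlation of the filtered signal with official ILI rates) can be sketched as follows. This is a deliberately crude illustration, not the authors' system: `ILI_KEYWORDS` is a hypothetical stand-in for the BioCaster Ontology terms, only the negation and humor filters are modeled (hashtag, emoticon and geography features are omitted), and the weekly series in the usage line are made-up toy numbers.

```python
import math
import re

# Hypothetical stand-ins for BioCaster syndrome keywords and filter cues.
ILI_KEYWORDS = {"flu", "fever", "cough", "sore throat", "influenza"}
NEGATIONS = {"no", "not", "don't", "never", "without"}
HUMOR_MARKERS = {"lol", "haha", "lmao"}


def keyword_match(msg: str) -> bool:
    """Stage 1: keep messages that mention at least one syndrome keyword."""
    text = msg.lower()
    return any(re.search(r"\b" + re.escape(k) + r"\b", text) for k in ILI_KEYWORDS)


def semantic_filter(msg: str) -> bool:
    """Stage 2 (crude): drop keyword matches that contain a negation or humor cue."""
    if not keyword_match(msg):
        return False
    tokens = re.findall(r"[a-z']+", msg.lower())
    if any(t in NEGATIONS for t in tokens):
        return False
    return not any(t in HUMOR_MARKERS for t in tokens)


def pearson(xs, ys):
    """Pearson correlation between weekly message counts and reference ILI rates."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(sxx * syy)


# Toy weekly series (hypothetical numbers, not data from the paper).
print(pearson([120, 340, 560, 410], [1.1, 2.9, 4.8, 3.6]))
```

A negation rule this blunt would also discard genuine reports such as "no energy, fever all night"; the point is only to show where semantic filters sit between keyword matching and correlation analysis.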
Combination of Domain Knowledge and Deep Learning for Sentiment Analysis of Short and Informal Messages on Social Media
Sentiment analysis has recently emerged as one of the major natural language processing (NLP) tasks in many applications. As social media channels (e.g., social networks or forums) have become significant sources for brands to observe user opinions about their products, this task is increasingly crucial. However, when working with real data obtained from social media, we observe a high volume of short and informal messages posted by users on these channels. Existing approaches, especially those based on deep learning, have many difficulties handling this kind of data. In this paper, we propose an approach to address this problem. This work extends our previous work, in which we proposed combining Convolutional Neural Networks, a typical deep learning technique, with domain knowledge. The combination is used to acquire additional training data through augmentation and to obtain a more reasonable loss function. In this work, we further improve our architecture through several substantial enhancements, including negation-based data augmentation, transfer learning for word embeddings, the combination of word-level and character-level embeddings, and a multitask learning technique for incorporating domain knowledge rules into the learning process. These enhancements, aimed specifically at short and informal messages, yield significant performance improvements in experiments on real datasets.
Comment: A preprint of an article accepted for publication by Inderscience in IJCVR on September 201
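The abstract names negation-based data augmentation without specifying the rule, so the following is only a minimal sketch of one plausible form: insert "not" after the first copula of a labeled sentence and flip its polarity label. The copula list, the `negate` and `augment` helpers, and the decision to skip already-negated or neutral examples are all assumptions made for illustration.

```python
# Hypothetical negation rule; the paper's actual augmentation may differ.
COPULAS = ("is", "are", "was", "were")
FLIP = {"positive": "negative", "negative": "positive"}


def negate(sentence: str):
    """Insert 'not' after the first copula; return None if none is found or the text is already negated."""
    tokens = sentence.split()
    lowered = [t.lower() for t in tokens]
    if "not" in lowered or "n't" in sentence.lower():
        return None
    for i, tok in enumerate(lowered):
        if tok in COPULAS:
            return " ".join(tokens[: i + 1] + ["not"] + tokens[i + 1 :])
    return None


def augment(dataset):
    """Return the original (text, label) pairs plus negated copies with flipped labels."""
    augmented = list(dataset)
    for text, label in dataset:
        if label not in FLIP:
            continue  # neutral or unknown labels are not flipped
        negated = negate(text)
        if negated is not None:
            augmented.append((negated, FLIP[label]))
    return augmented


data = [("The battery is great", "positive"), ("Shipping was slow", "negative"), ("Love it", "positive")]
for text, label in augment(data):
    print(label, "|", text)
```

Negation does not always invert sentiment exactly ("not slow" is weaker praise than "fast"), so a rule like this trades some label noise for more training examples, which matters most when messages are short and labeled data is scarce.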