Slang feature extraction by analysing topic change on social media
The authors observe that words such as youth slang, neologisms and Internet slang, which are not registered in dictionaries, have recently become common on social networking sites (SNSs). Since documents posted to SNSs contain a lot of fresh information, they are considered useful for information collection. To improve the accuracy of automatic information collection, it is important to analyse these words (hereinafter referred to as ‘slang’) and capture their features. This study aims to analyse which features can be observed in slang by focusing on topics. The authors construct topic models by latent Dirichlet allocation (LDA) from groups of Twitter documents that include the target slang. Using these models, they chronologically analyse how topics change over a given period to find differences in features between slang and general words. They then propose a slang classification method based on the change in these features.
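The abstract does not say how topic change is measured. As one possible illustration, assume that for each time period the documents containing a target word have already been summarised as a topic-probability vector (for example, by an LDA model, which is not shown here). The sketch below then scores change as the Jensen-Shannon divergence between consecutive periods; the divergence measure, the toy numbers and the function names are assumptions, not the authors' method.

```python
import math
from typing import List, Sequence


def jensen_shannon(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits (bounded by [0, 1]) between two distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: Sequence[float], b: Sequence[float]) -> float:
        # Zero-probability terms contribute nothing; b_i > 0 whenever a_i > 0 because b is the mixture m.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_change_series(period_topics: List[Sequence[float]]) -> List[float]:
    """Divergence between the topic distributions of each pair of consecutive periods."""
    return [jensen_shannon(a, b) for a, b in zip(period_topics, period_topics[1:])]


def mean_topic_change(period_topics: List[Sequence[float]]) -> float:
    """Hypothetical scalar feature: average period-to-period topic change."""
    series = topic_change_series(period_topics)
    return sum(series) / len(series) if series else 0.0


# Toy per-period topic distributions over 3 topics; values are illustrative only.
stable_word = [[0.7, 0.2, 0.1], [0.7, 0.2, 0.1], [0.7, 0.2, 0.1]]
drifting_word = [[0.9, 0.05, 0.05], [0.2, 0.7, 0.1], [0.1, 0.1, 0.8]]
print(mean_topic_change(stable_word))        # 0.0
print(mean_topic_change(drifting_word) > 0)  # True
```

A word whose topical context drifts from period to period gets a higher mean score than one whose context stays put; whether such a score actually separates slang from general words is what the study sets out to test, and is not established by this sketch.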
Semantic Variation in Online Communities of Practice
We introduce a framework for quantifying semantic variation of common words
in Communities of Practice and in sets of topic-related communities. We show
that while some meaning shifts are shared across related communities, others
are community-specific and therefore independent of the topic under discussion. We
propose such findings as evidence in favour of sociolinguistic theories of
socially-driven semantic variation. Results are evaluated using an independent
language modelling task. Furthermore, we investigate extralinguistic features
and show that factors such as prominence and dissemination of words are related
to semantic variation. Comment: 13 pages, Proceedings of the 12th International Conference on
Computational Semantics (IWCS 2017)
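The abstract does not spell out the measure of semantic variation. A common approach, assumed here for illustration, represents each word by a vector in a general-purpose embedding model and in a community-specific one, and scores variation as the cosine distance between the two. The sketch assumes both models share a coordinate space; the function names and toy vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """1 - cosine similarity: 0 means same direction, 1 means orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def semantic_shift(general: Dict[str, List[float]],
                   community: Dict[str, List[float]]) -> Dict[str, float]:
    """Score each word present in both models by how far its community vector moved.

    Assumes both embedding spaces share coordinates (e.g. the community model was
    initialised from the general model before further training); otherwise the
    spaces must be aligned first.
    """
    shared = general.keys() & community.keys()
    return {w: cosine_distance(general[w], community[w]) for w in sorted(shared)}


# Toy 2-d vectors, illustrative only.
general = {"kill": [1.0, 0.0], "game": [0.5, 0.5]}
gaming = {"kill": [0.0, 1.0], "game": [0.5, 0.5], "gg": [1.0, 1.0]}
print(semantic_shift(general, gaming))
```

Running the same comparison for one word across several topic-related communities is one way to tell shifts that are shared across those communities from shifts specific to a single one, although the paper's exact procedure may differ.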
The Creation of an Arabic Emotion Ontology Based on E-Motive
© 2017 The Authors. Published by Elsevier B.V. There is increasing interest in social media monitoring to analyse massive volumes of free-form, short, user-generated text from multiple social media sites such as Facebook, WhatsApp and Twitter. Companies are interested in sentiment analysis to understand customers' opinions about their products and services. Governments and law enforcement agencies are interested in identifying threats to safeguard national security. They are actively seeking ways to monitor and analyse the public's responses to various services, activities and events, especially since social media has become a valuable real-time source of information. This study builds on prior work that focused on sentiment classification (i.e., positive or negative). Its primary aim is to design and develop a social sentiment-parsing algorithm for capturing and monitoring an extensive and comprehensive range of emotions in Arabic social media text. The study contributes to the field of sentiment analysis (opinion mining) and can subsequently be used for web mining, cleansing and analytics.
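The abstract does not describe the E-Motive ontology or the parsing algorithm in detail. As a hedged illustration of how an emotion ontology can extend plain positive/negative classification, the sketch below looks up tokens in a tiny hypothetical lexicon of Arabic words and rolls each fine-grained emotion up to a parent class. The lexicon, the emotion labels and the two-level hierarchy are assumptions made for this example.

```python
from collections import Counter
from typing import Dict, Tuple

# Hypothetical toy ontology: each fine-grained emotion points to its parent class.
PARENT: Dict[str, str] = {
    "joy": "positive",
    "sadness": "negative",
    "anger": "negative",
    "fear": "negative",
}

# Hypothetical toy lexicon: Arabic surface forms mapped to fine-grained emotions.
LEXICON: Dict[str, str] = {
    "سعيد": "joy",      # "happy"
    "حزين": "sadness",  # "sad"
    "غاضب": "anger",    # "angry"
    "خائف": "fear",     # "afraid"
}


def tag_emotions(text: str) -> Tuple[Counter, Counter]:
    """Count fine-grained emotions, then roll each hit up to its parent class."""
    fine: Counter = Counter()
    coarse: Counter = Counter()
    for token in text.split():  # naive whitespace tokenisation
        emotion = LEXICON.get(token)
        if emotion is not None:
            fine[emotion] += 1
            coarse[PARENT[emotion]] += 1
    return fine, coarse


# "I am very happy but my brother is sad"
fine, coarse = tag_emotions("أنا سعيد جدا لكن أخي حزين")
print(fine)    # Counter({'joy': 1, 'sadness': 1})
print(coarse)  # Counter({'positive': 1, 'negative': 1})
```

Real Arabic social media text would need normalisation (for example of diacritics and letter variants) and clitic segmentation before lookup; a form such as "وسعيد" ("and happy") is missed by this exact-match lookup.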
Weblog and short text feature extraction and impact on categorisation
The characterisation and categorisation of weblogs and other short texts has become an important research theme in
areas such as topic/trend detection and pattern recognition. The value of analysing and characterising short text lies in
identifying the features that distinguish different kinds of short text, thereby improving the input to the classification process.
In this research work, we analyse a large number of text features and establish which combinations are useful to discriminate
between the different genres of short text. Having identified the most promising features, we then confirm our findings by
performing the categorisation task using three approaches: the Gaussian and SVM classifiers and the K-means clustering algorithm.
Several hundred combinations of features were analysed in order to identify the best combinations and the results confirmed the
observations made. The novel aspect of our work is the detection of the best combination of individual metrics, identified
as potential features for the categorisation process.
The research work of the third author is partially funded by the WIQ-EI (IRSES grant n. 269180) and DIANA APPLICATIONS (TIN2012-38603-C02-01), and was carried out within the framework of the VLC/Campus Microcluster on Multimodal Interaction in Intelligent Systems.
Perez Tellez, F.; Cardiff, J.; Rosso, P.; Pinto Avendaño, DE. (2014). Weblog and short text feature extraction and impact on categorisation. Journal of Intelligent and Fuzzy Systems. 27(5):2529-2544. https://doi.org/10.3233/IFS-141227
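The abstract does not list the features studied. The sketch below, assuming a handful of simple surface features (all names hypothetical), shows the general shape of an exhaustive feature-combination search. A nearest-centroid classifier stands in for the Gaussian, SVM and K-means approaches used in the paper, and accuracy is measured on the training data itself, so the scores are optimistic.

```python
import itertools
import statistics
from typing import Dict, Iterable, List, Sequence, Tuple

Features = Dict[str, float]


def extract_features(text: str) -> Features:
    """A few hypothetical surface features of a short text."""
    words = text.split()
    n = len(words) or 1  # avoid division by zero on empty input
    return {
        "avg_word_len": sum(len(w) for w in words) / n,
        "type_token_ratio": len({w.lower() for w in words}) / n,
        "url_rate": sum(w.startswith("http") for w in words) / n,
        "mention_rate": sum(w.startswith("@") for w in words) / n,
    }


def feature_subsets(names: Iterable[str], max_size: int) -> List[Tuple[str, ...]]:
    """All non-empty feature combinations up to max_size, in a deterministic order."""
    ordered = sorted(names)
    return [c for k in range(1, max_size + 1) for c in itertools.combinations(ordered, k)]


def nearest_centroid_accuracy(rows: Sequence[Features], labels: Sequence[str],
                              subset: Tuple[str, ...]) -> float:
    """Training-set accuracy of a nearest-centroid classifier on one feature subset."""
    vecs = [[r[f] for f in subset] for r in rows]
    centroids = {}
    for lab in sorted(set(labels)):
        members = [v for v, y in zip(vecs, labels) if y == lab]
        centroids[lab] = [statistics.fmean(col) for col in zip(*members)]

    def predict(v: List[float]) -> str:
        # Squared Euclidean distance to each class centroid; features are not scaled.
        return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, centroids[lab])))

    correct = sum(predict(v) == y for v, y in zip(vecs, labels))
    return correct / len(labels)


def rank_subsets(rows: Sequence[Features], labels: Sequence[str],
                 max_size: int = 2) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score every subset and sort best-first (ties broken alphabetically)."""
    scored = [(nearest_centroid_accuracy(rows, labels, s), s)
              for s in feature_subsets(rows[0].keys(), max_size)]
    return sorted(scored, key=lambda t: (-t[0], t[1]))


texts = [
    "@bob check this http://x.co",
    "@amy lol http://y.co",
    "The methodology section describes an extensive experimental evaluation",
    "A comprehensive discussion of distributed architectures follows",
]
labels = ["tweet", "tweet", "blog", "blog"]
rows = [extract_features(t) for t in texts]
for score, subset in rank_subsets(rows, labels)[:3]:
    print(f"{score:.2f}", subset)
```

The number of subsets grows combinatorially with the feature count, which is why a study examining several hundred combinations needs a cheap evaluation step; in practice the features should also be standardised, since unscaled features such as average word length otherwise dominate the distance.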
Extracting Personal Behavioral Patterns from Geo-Referenced Tweets
This paper presents an exploratory study of the potential of geo-referenced Twitter data for extracting knowledge about people's significant personal places, behaviors and potential interests. The study analysed two months' worth of tweets from residents of the greater Seattle area.
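The abstract does not describe how significant places are extracted. One simple heuristic, assumed here purely for illustration, snaps each geo-tagged tweet to a coarse latitude/longitude grid, treats frequently visited cells as a user's significant places, and guesses a home cell from night-time activity. The grid size, the thresholds, the night window and the names (`GeoTweet`, `significant_places`, `likely_home`) are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple

Cell = Tuple[float, float]


class GeoTweet(NamedTuple):
    user: str
    lat: float
    lon: float
    hour: int  # local hour of day, 0-23


def to_cell(lat: float, lon: float, precision: int = 3) -> Cell:
    # 3 decimal places is about 110 m north-south; east-west cells are narrower away from the equator.
    return (round(lat, precision), round(lon, precision))


def significant_places(tweets: Iterable[GeoTweet], min_count: int = 3) -> Dict[str, List[Cell]]:
    """Per user, grid cells tweeted from at least min_count times, most frequent first."""
    per_user: Dict[str, Counter] = defaultdict(Counter)
    for t in tweets:
        per_user[t.user][to_cell(t.lat, t.lon)] += 1
    return {u: [c for c, n in cnt.most_common() if n >= min_count]
            for u, cnt in per_user.items()}


def likely_home(tweets: Iterable[GeoTweet], night_start: int = 0, night_end: int = 6) -> Dict[str, Cell]:
    """Heuristic: the cell a user tweets from most often during night hours."""
    night: Dict[str, Counter] = defaultdict(Counter)
    for t in tweets:
        if night_start <= t.hour < night_end:
            night[t.user][to_cell(t.lat, t.lon)] += 1
    return {u: cnt.most_common(1)[0][0] for u, cnt in night.items()}


# Synthetic tweets for one hypothetical user in Seattle.
tweets = [
    GeoTweet("a", 47.6062, -122.3321, 1),
    GeoTweet("a", 47.6064, -122.3319, 2),
    GeoTweet("a", 47.6061, -122.3322, 3),
    GeoTweet("a", 47.6204, -122.3493, 10),
    GeoTweet("a", 47.6204, -122.3493, 11),
    GeoTweet("a", 47.6204, -122.3493, 14),
    GeoTweet("a", 47.6500, -122.3000, 20),
]
print(significant_places(tweets))
print(likely_home(tweets))  # {'a': (47.606, -122.332)}
```

Fixed grid cells can split one real place across a boundary; density-based clustering of the raw coordinates is a common alternative when that matters.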
Sentiment analysis: the case of Twitch chat - Mining user feedback from livestream chats
Project Work presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Information Systems and Technologies Management.
In a world where users often share their thoughts and opinions through online communication
channels, applications that can tap into these channels to extract consumer feedback have
become increasingly valuable. Traditional marketing research techniques such as interviews or
surveys offer results that pale in comparison to sentiment analysis applications, which can extract
organic feedback from an extremely large sample, with very few resources and in real time.
This thesis proposes and develops one such tool targeting livestreams,
which have, over the years, seen a massive increase in popularity in terms of both user base
and brand involvement. We chose the livestreaming platform “Twitch” as the
target of research and developed a sentiment analysis model, using rule-based approaches,
capable of interpreting user chat messages and identifying whether those messages are negative,
positive or neutral. Additionally, an application was developed to better view and analyze the
results of the model. By segmenting our results by product reveal, we also show how the
application enables the extraction of various insights into the public’s opinion of each revealed
product.
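The thesis summary does not list its rules. A minimal rule-based sketch, assuming a small hypothetical lexicon in which some Twitch emote names (such as PogChamp and ResidentSleeper) carry sentiment, and a single rule that flips polarity after a negator, might look like this; the weights, word lists and function names are illustrative only.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon; Twitch emote names are treated as ordinary tokens.
LEXICON: Dict[str, int] = {
    "love": 2, "great": 2, "hype": 1, "pogchamp": 2,
    "bad": -2, "trash": -2, "boring": -1, "residentsleeper": -1,
}
NEGATORS = {"not", "no", "never", "dont", "don't"}


def tokenize(message: str) -> List[str]:
    # Lowercase so that emote names such as "PogChamp" match the lexicon.
    return re.findall(r"[a-z']+", message.lower())


def score_message(message: str) -> int:
    """Sum lexicon scores, flipping a word's polarity when the previous token is a negator."""
    tokens = tokenize(message)
    score = 0
    for i, tok in enumerate(tokens):
        value = LEXICON.get(tok, 0)
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return score


def classify(message: str) -> str:
    """Map the summed score to one of the three labels."""
    s = score_message(message)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


for msg in ["this reveal is great PogChamp", "not great",
            "so boring ResidentSleeper", "what time is it"]:
    print(msg, "->", classify(msg))
```

Aggregating `classify` over the chat messages posted around each product reveal would give the kind of per-product breakdown described above, though the thesis's actual rules are likely more extensive.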
Race, Religion and the City: Twitter Word Frequency Patterns Reveal Dominant Demographic Dimensions in the United States
Recently, numerous approaches have emerged in the social sciences to exploit
the opportunities made possible by the vast amounts of data generated by online
social networks (OSNs). Having access to information about users on such a
scale opens up a range of possibilities, all without the limitations associated
with often slow and expensive paper-based polls. A question that remains to be
satisfactorily addressed, however, is how demography is represented in OSN
content. Here, we study language use in the US using a corpus of text compiled
from over half a billion geo-tagged messages from the online microblogging
platform Twitter. Our intention is to reveal the most important spatial
patterns in language use in an unsupervised manner and relate them to
demographics. Our approach is based on Latent Semantic Analysis (LSA) augmented
with the Robust Principal Component Analysis (RPCA) methodology. We find
spatially correlated patterns that can be interpreted based on the words
associated with them. The main language features can be related to slang use,
urbanization, travel, religion and ethnicity, the patterns of which are shown
to correlate plausibly with traditional census data. Our findings thus validate
the concept of demography being represented in OSN language use and show that
the traits observed are inherently present in the word frequencies without any
previous assumptions about the dataset. Thus, they could form the basis of
further research focusing on the evaluation of demographic data estimation from
other big data sources, or on the dynamical processes that result in the
patterns found here.
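The LSA step at the core of this approach amounts to a truncated singular value decomposition of a region-by-word frequency matrix: each retained right singular vector is a pattern of word loadings, and projecting each region onto it gives a spatial score that can be mapped. The sketch below computes the top components with power iteration in plain Python. It omits the RPCA step the authors add, and the toy matrix, the region/word layout and the function names are assumptions.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def top_components(x: Matrix, k: int, iters: int = 200, seed: int = 0) -> Tuple[List[float], Matrix]:
    """Top-k singular values and right singular vectors (word loadings) of x.

    Uses power iteration on x^T x, re-orthogonalising against components already
    found (deflation). Assumes x has rank >= k. Signs of the vectors are arbitrary.
    """
    rng = random.Random(seed)
    xt = transpose(x)
    sigmas: List[float] = []
    comps: Matrix = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(xt))]
        for _ in range(iters):
            w = matvec(xt, matvec(x, v))  # w = x^T x v
            for c in comps:  # remove directions already extracted
                d = sum(a * b for a, b in zip(w, c))
                w = [a - d * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        comps.append(v)
        sigmas.append(math.sqrt(sum(a * a for a in matvec(x, v))))  # sigma = ||x v||
    return sigmas, comps


# Rows: hypothetical regions; columns: relative frequencies of four hypothetical words.
x = [
    [0.9, 0.8, 0.1, 0.0],
    [0.8, 0.9, 0.0, 0.1],
    [0.1, 0.0, 0.7, 0.9],
]
sigmas, comps = top_components(x, k=2)
region_scores = matvec(x, comps[0])  # each region's position along the first dimension
print([round(s, 3) for s in sigmas])
```

Power iteration converges at a rate set by the ratio of successive singular values, so closely spaced components need more iterations; a library SVD routine would normally replace this loop on real data.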