Corpus Creation for Sentiment Analysis in Code-Mixed Tamil-English Text
Understanding the sentiment of a comment from a video or an image is an
essential task in many applications. Sentiment analysis of a text can be useful
for various decision-making processes. One such application is to analyse the
popular sentiments of videos on social media based on viewer comments. However,
comments from social media do not follow strict rules of grammar, and they
often mix more than one language, frequently written in non-native scripts.
The lack of annotated code-mixed data for a low-resourced language such as
Tamil makes the problem harder still. To overcome this, we created a gold
standard Tamil-English code-switched, sentiment-annotated corpus containing
15,744 comment posts from YouTube. In this paper, we describe the process of
creating the corpus and assigning polarities. We present inter-annotator
agreement and report the results of sentiment analysis models trained on this
corpus as a benchmark.
Code-switching in Irish tweets: a preliminary analysis
As is the case with many languages, research into code-switching in Modern Irish
has, until recently, mainly focused on the spoken language. Online user-generated
content (UGC) is less restrictive than traditional written text and so allows
for code-switching; as such, it provides a new platform for text-based research
in this field of study. This paper reports on the annotation of (English)
code-switching in a corpus of 1,496 Irish tweets and provides a computational
analysis of the nature of code-switching amongst Irish-speaking Twitter users,
with a view to providing a basis for future linguistic and socio-linguistic
studies.
Natural language processing for similar languages, varieties, and dialects: A survey
There has been much recent interest in the natural language processing (NLP) community in the computational processing of language varieties and dialects, with the aim of improving the performance of applications such as machine translation, speech recognition, and dialogue systems. Here, we survey this growing field of research, with a focus on computational methods for processing similar languages, varieties, and dialects. In particular, we discuss the most important challenges in dealing with diatopic language variation, and we present some of the available datasets, the process of data collection, and the most common strategies used to compile datasets for similar languages, varieties, and dialects. We further present a number of studies on computational methods developed and/or adapted for preprocessing, normalization, part-of-speech tagging, and parsing of similar languages, language varieties, and dialects. Finally, we discuss relevant applications such as language and dialect identification and machine translation for closely related languages, language varieties, and dialects.
Non peer reviewed
Deep Learning for Text Style Transfer: A Survey
Text style transfer is an important task in natural language generation,
which aims to control certain attributes in the generated text, such as
politeness, emotion, humor, and many others. It has a long history in the field
of natural language processing, and has recently regained significant
attention thanks to the promising performance brought by deep neural models. In
this paper, we present a systematic survey of the research on neural text style
transfer, spanning more than 100 representative articles since the first neural text
style transfer work in 2017. We discuss the task formulation, existing datasets
and subtasks, and evaluation, as well as the rich methodologies developed for
both parallel and non-parallel data. We also discuss a variety of
important topics regarding the future development of this task. Our curated
paper list is at https://github.com/zhijing-jin/Text_Style_Transfer_Survey.
Comment: Computational Linguistics Journal 202
- …