343 research outputs found
Adapting Deep Learning for Sentiment Classification of Code-Switched Informal Short Text
Nowadays, an abundance of short text is being generated that uses nonstandard
writing styles influenced by regional languages. Such informal and
code-switched content is under-resourced in terms of labeled datasets and
language models even for popular tasks like sentiment classification. In this
work, we (1) present a labeled dataset called MultiSenti for sentiment
classification of code-switched informal short text, (2) explore the
feasibility of adapting resources from a resource-rich language for an informal
one, and (3) propose a deep learning-based model for sentiment classification
of code-switched informal short text. We aim to achieve this without any
lexical normalization, language translation, or code-switching indication. The
performance of the proposed models is compared with three existing multilingual
sentiment classification models. The results show that the proposed model
performs better in general, and that adapting character-based embeddings yields
equivalent performance while being computationally more efficient than training
word-based domain-specific embeddings.
Deep learning for religious and continent-based toxic content detection and classification
Over time, numerous online communication platforms have emerged that allow people to express themselves, increasing the dissemination of toxic language, such as racism, sexual harassment, and other negative behaviors that are not accepted in polite society. As a result, toxic language identification in online communication has emerged as a critical application of natural language processing. Numerous academic and industrial researchers have recently studied toxic language identification using machine learning algorithms. However, in several machine learning models, nontoxic comments that include particular identity descriptors, such as Muslim, Jewish, White, and Black, were assigned unrealistically high toxicity ratings. This research analyzes and compares modern deep learning algorithms for multilabel toxic comment classification. We explore two scenarios: the first is a multilabel classification of religious toxic comments, and the second is a multilabel classification of toxic comments about race or ethnicity. Both are run with various word embeddings (GloVe, Word2vec, and FastText) and without pretrained word embeddings, using an ordinary embedding layer. Experiments show that the CNN model produced the best results for classifying multilabel toxic comments in both scenarios. We compared the performance of these modern deep learning models in terms of multilabel evaluation metrics.
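The abstract does not say which multilabel evaluation metrics were used. As a minimal sketch, assuming three common choices (Hamming loss, subset accuracy, and micro-averaged F1), the functions below show how such metrics can be computed from binary label vectors. The function names and the toy labels in the usage note are illustrative assumptions, not taken from the paper.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted wrongly."""
    wrong = sum(t != p for ts, ps in zip(y_true, y_pred) for t, p in zip(ts, ps))
    total = sum(len(ts) for ts in y_true)
    return wrong / total


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose whole label vector is predicted exactly."""
    exact = sum(ts == ps for ts, ps in zip(y_true, y_pred))
    return exact / len(y_true)


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives pooled over all labels."""
    tp = fp = fn = 0
    for ts, ps in zip(y_true, y_pred):
        for t, p in zip(ts, ps):
            tp += int(t == 1 and p == 1)
            fp += int(t == 0 and p == 1)
            fn += int(t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For example, with the hypothetical labels `y_true = [[1, 0, 1], [0, 1, 0]]` and `y_pred = [[1, 0, 0], [0, 1, 0]]`, one of six label slots is wrong and one of two samples matches exactly.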
Compositional language processing for multilingual sentiment analysis
Programa Oficial de Doutoramento en Computación. 5009V01
[Abstract] This dissertation presents new approaches in the field of sentiment
analysis and polarity classification, oriented towards obtaining the sentiment
of a phrase, sentence or document from a natural language
processing point of view. It places special emphasis on methods
to handle semantic compositionality, i.e. the ability to compose the
sentiment of multiword phrases, where the global sentiment might
be different or even opposite to the one coming from each of
their individual components; and the application of these methods to
multilingual scenarios.
On the one hand, we introduce knowledge-based approaches to calculate
the semantic orientation at the sentence level, that can handle
different phenomena for the purpose at hand (e.g. negation, intensification
or adversative subordinate clauses).
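As a rough illustration of how a knowledge-based scorer can treat negation, intensification and adversative clauses, the sketch below sums word polarities from a toy lexicon. It is not the thesis's method: the lexicon, the modifier lists, the `but_weight` parameter and both function names are hypothetical and chosen only for this example.

```python
# Hypothetical toy resources; none of these values come from the thesis.
LEXICON = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
NEGATORS = {"not", "never", "no"}
INTENSIFIERS = {"very": 1.5, "really": 1.5, "slightly": 0.5}
ADVERSATIVES = {"but"}


def clause_orientation(tokens):
    """Sum word polarities, letting modifiers act on the next sentiment word."""
    total = 0.0
    scale = 1.0      # pending intensification factor
    negate = False   # pending negation
    for tok in tokens:
        if tok in NEGATORS:
            negate = True
        elif tok in INTENSIFIERS:
            scale *= INTENSIFIERS[tok]
        elif tok in LEXICON:
            score = LEXICON[tok] * scale
            total += -score if negate else score
            scale, negate = 1.0, False  # modifiers are consumed
    return total


def semantic_orientation(sentence, but_weight=2.0):
    """Score a sentence; the clause after an adversative counts more."""
    tokens = sentence.lower().split()
    for i, tok in enumerate(tokens):
        if tok in ADVERSATIVES:
            before = clause_orientation(tokens[:i])
            after = clause_orientation(tokens[i + 1:])
            return before + but_weight * after
    return clause_orientation(tokens)
```

With this toy lexicon, "not good" comes out negative even though "good" alone is positive, and in "good but terrible" the clause after "but" dominates the overall score.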
On the other hand, we describe how to build machine learning
models to perform polarity classification from a different perspective,
combining linguistic (lexical, syntactic and semantic) knowledge,
with an emphasis on noisy and micro-texts.
Experiments on standard corpora and international evaluation campaigns
show the competitiveness of the methods here proposed, in
monolingual, multilingual and code-switching scenarios.
The contributions presented in the thesis have potential applications
in the era of the Web 2.0 and social media, such as being able to
determine what is the view of society about products, celebrities or
events, identify their strengths and weaknesses or monitor how these
opinions evolve over time. We also show how some of the proposed
models can be useful for other data analysis tasks.
[Resumen] Esta tesis presenta nuevas técnicas en el ámbito del análisis del sentimiento
y la clasificación de polaridad, centradas en obtener el sentimiento
de una frase, oración o documento siguiendo enfoques basados en
procesamiento del lenguaje natural. En concreto, nos centramos en
desarrollar métodos capaces de manejar la semántica composicional,
es decir, con la capacidad de componer el sentimiento de oraciones
donde la polaridad global puede ser distinta, o incluso opuesta, de la
que se obtendría individualmente para cada uno de sus términos; y
cómo dichos métodos pueden ser aplicados en entornos multilingües.
En la primera parte de este trabajo, introducimos aproximaciones
basadas en conocimiento para calcular la orientación semántica a nivel
de oración, teniendo en cuenta construcciones lingüísticas relevantes
en el ámbito que nos ocupa (por ejemplo, la negación, intensificación,
o las oraciones subordinadas adversativas).
En la segunda parte, describimos cómo construir clasificadores de
polaridad basados en aprendizaje automático que combinan información
léxica, sintáctica y semántica; centrándonos en su aplicación sobre
textos cortos y de pobre calidad gramatical.
Los experimentos realizados sobre colecciones estándar y competiciones
de evaluación internacionales muestran la efectividad de los
métodos aquí propuestos en entornos monolingües, multilingües y
de code-switching.
Las contribuciones presentadas en esta tesis tienen diversas aplicaciones
en la era de la Web 2.0 y las redes sociales, como determinar la
opinión que la sociedad tiene sobre un producto, celebridad o evento;
identificar sus puntos fuertes y débiles o monitorizar cómo estas opiniones
evolucionan a lo largo del tiempo. Por último, también mostramos
cómo algunos de los modelos propuestos pueden ser útiles
para otras tareas de análisis de datos.
[Resumo] Esta tese presenta novas técnicas no ámbito da análise do sentimento
e da clasificación da polaridade, orientadas a obter o sentimento dunha
frase, oración ou documento seguindo aproximacións baseadas
no procesamento da linguaxe natural. En particular, centrámosnos
en métodos capaces de manexar a semántica composicional: métodos
coa habilidade para compor o sentimento de oracións onde o sentimento
global pode ser distinto, ou incluso oposto, do que se obtería
individualmente para cada un dos seus términos; e como ditos métodos
poden ser aplicados en entornos multilingües.
Na primeira parte da tese, introducimos aproximacións baseadas
en coñecemento; para calcular a orientación semántica a nivel de oración,
tendo en conta construccións lingüísticas importantes no ámbito
que nos ocupa (por exemplo, a negación, a intensificación ou as oracións
subordinadas adversativas).
Na segunda parte, describimos como podemos construir clasificadores
de polaridade baseados en aprendizaxe automática e que combinan
información léxica, sintáctica e semántica, centrándonos en textos
curtos e de pobre calidade gramatical.
Os experimentos levados a cabo sobre coleccións estándar e competicións
de avaliación internacionais mostran a efectividade dos métodos
aquí propostos, en entornos monolingües, multilingües e de
code-switching.
As contribucións presentadas nesta tese teñen diversas aplicacións
na era da Web 2.0 e das redes sociais, como determinar a opinión que
a sociedade ten sobre un produto, celebridade ou evento; identificar
os seus puntos fortes e febles ou monitorizar como esas opinións
evolucionan o largo do tempo. Como punto final, tamén amosamos
como algúns dos modelos aquí propostos poden ser útiles para outras
tarefas de análise de datos
Automatic stance detection on political discourse in Twitter
The majority of opinion mining tasks in natural language processing (NLP) have focused on sentiment analysis of texts about products and services, while there is comparatively less research on automatic detection of political opinion. Almost all previous research has been done for English, while this thesis focuses on the automatic detection of stance (whether the author is favorable or not towards an important political topic) from Twitter posts in Catalan, Spanish and English. The main objective of this work is to build and compare automatic stance detection systems using both classic supervised machine learning and deep learning techniques. We also study the influence of text normalization and perform experiments with different methods for word representations such as TF-IDF measures for unigrams, word embeddings, tweet embeddings, and contextual character-based embeddings. We obtain state-of-the-art results in the stance detection task on the IberEval 2018 dataset. Our research shows that text normalization and feature selection are important for the systems with unigram features, and do not affect the performance when working with word vector representations. Classic methods such as unigrams with an SVM classifier still outperform deep learning techniques, but seem to be prone to overfitting. The classifiers trained using word vector representations and the neural network models encoded with contextual character-based vectors show greater robustness.
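To make the "TF-IDF measures for unigrams" representation concrete, here is a minimal sketch that assumes whitespace tokenization, relative term frequency, and an unsmoothed logarithmic inverse document frequency. The abstract does not state which TF-IDF variant or tokenizer the thesis used, so the weighting and the function name `tfidf_unigrams` are assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_unigrams(docs):
    """Return one {term: weight} dict per document.

    Assumed weighting: tf = count / document length,
    idf = log(N / document frequency), with no smoothing.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

Under this weighting, a unigram that appears in every tweet gets weight zero, so only terms that discriminate between tweets contribute to the feature vector passed to a classifier such as an SVM.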
Multilingual sentiment analysis in social media.
252 p.
This thesis addresses the task of analysing sentiment in messages coming from social media. The ultimate goal was to develop a Sentiment Analysis system for Basque. However, because of the socio-linguistic reality of the Basque language a tool providing only analysis for Basque would not be enough for a real world application. Thus, we set out to develop a multilingual system, including Basque, English, French and Spanish.
The thesis addresses the following challenges to build such a system:
- Analysing methods for creating Sentiment lexicons, suitable for less resourced languages.
- Analysis of social media (specifically Twitter): Tweets pose several challenges in order to understand and extract opinions from such messages. Language identification and microtext normalization are addressed.
- Research the state of the art in polarity classification, and develop a supervised classifier that is tested against well known social media benchmarks.
- Develop a social media monitor capable of analysing sentiment with respect to specific events, products or organizations
- …