Gender prediction from tweets: Improving neural representations with hand-crafted features
Author profiling is the characterization of an author through key attributes
such as gender, age, and language. In this paper, an RNN model with attention
(RNNwA) is proposed to predict the gender of a Twitter user from their tweets.
Both word-level and tweet-level attention are used to learn 'where to look'.
This model is improved by concatenating LSA-reduced n-gram features with the
learned neural representation of a user. Both models are tested on three languages:
English, Spanish, and Arabic. The improved version of the proposed model (RNNwA
+ n-gram) achieves state-of-the-art performance on English and competitive
results on Spanish and Arabic.
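The abstract combines two ingredients: attention pooling at the word and tweet levels, and LSA-reduced n-gram features concatenated with the neural user vector. The following is a minimal NumPy sketch of that data flow, not the authors' implementation. It assumes the RNN hidden states are already computed (one array of word states per tweet), uses a simple dot-product attention score against a context vector (learned in practice; passed in here), and implements LSA as a truncated SVD of a users-by-n-grams count matrix. All function names are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def attention_pool(h: np.ndarray, context: np.ndarray) -> np.ndarray:
    """Weight the rows of h (n_items x d) by softmax(h @ context) and sum them."""
    weights = softmax(h @ context)  # one scalar score per item
    return weights @ h              # weighted average, shape (d,)


def lsa_reduce(counts: np.ndarray, k: int) -> np.ndarray:
    """Project a (users x n-grams) count matrix onto its top-k singular directions."""
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    return u[:, :k] * s[:k]  # equals counts @ V_k, where V_k holds the top-k right singular vectors


def user_vector(tweets_hidden, word_ctx, tweet_ctx, ngram_lsa_row):
    """Hypothetical user representation: hierarchical attention + LSA n-gram features."""
    # Word-level attention: each tweet's word states -> one tweet vector.
    tweet_vecs = np.stack([attention_pool(h, word_ctx) for h in tweets_hidden])
    # Tweet-level attention: the user's tweet vectors -> one neural user vector.
    neural = attention_pool(tweet_vecs, tweet_ctx)
    # Concatenate with the same user's LSA-reduced n-gram features.
    return np.concatenate([neural, ngram_lsa_row])
```

With hidden size `d` and `k` LSA components, `user_vector` returns a vector of length `d + k`, which a downstream classifier (not shown) would map to a gender label.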
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article,
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Automatic Detection of Online Jihadist Hate Speech
We have developed a system that automatically detects online jihadist hate
speech with over 80% accuracy, by using techniques from Natural Language
Processing and Machine Learning. The system is trained on a corpus of 45,000
subversive Twitter messages collected from October 2014 to December 2016. We
present a qualitative and quantitative analysis of the jihadist rhetoric in the
corpus, examine the network of Twitter users, outline the technical procedure
used to train the system, and discuss examples of use. Comment: 31 pages
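The abstract does not say which classifier the system uses. To illustrate what "trained on a corpus" and "accuracy" mean for a text classifier, here is a minimal sketch of a generic bag-of-words multinomial Naive Bayes with add-one smoothing, plus an accuracy helper. This is a swapped-in textbook technique, not the authors' system; whitespace tokenization and the class names are assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            score = math.log(n_c / n_docs)  # log prior of class c
            for w in tokenize(text):
                if w not in self.vocab:
                    continue  # ignore words never seen in training
                # Smoothed log likelihood of word w given class c.
                score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best


def accuracy(model, texts, labels):
    """Fraction of texts whose predicted label matches the gold label."""
    correct = sum(model.predict(t) == y for t, y in zip(texts, labels))
    return correct / len(labels)
```

An accuracy figure such as the reported "over 80%" would come from running `accuracy` on held-out messages, not on the training set.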
Personality Analysis Using Classification on Turkish Tweets
According to the psychology literature, there is a strong correlation between personality traits and people's linguistic behavior. With the growth of computer-based communication, individuals express their personalities in written form on social media. Hence, social media has become a convenient resource for analyzing the relationship between personality traits and linguistic behavior. Although there are many studies on social media, only a small number of them focus on personality prediction. In this work, the authors aim to model the relationship between individuals' social media messages and the Big Five personality traits as a supervised learning problem. They use Twitter posts and user statistics for the analysis. They investigate various approaches to user profile representation, explore several supervised learning techniques, and present comparative results. The results confirm the findings of the psychology literature and show that computational analysis of tweets using supervised learning methods can be used to determine the personality of individuals.
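The abstract frames Big Five prediction as supervised learning over a user profile built from tweets and user statistics. Below is a minimal pure-Python sketch of one such pipeline under explicit assumptions: each tweet is already a numeric feature vector, a profile is the mean tweet vector with user statistics appended, each trait is treated as a separate high/low classification, and a nearest-centroid classifier stands in for the techniques the authors compare. Feature scaling, which matters when raw counts such as followers are mixed with tweet features, is omitted. All names are hypothetical.

```python
from statistics import mean

TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")


def user_profile(tweet_vectors, stats):
    """Average the per-tweet feature vectors and append user-level statistics."""
    dims = len(tweet_vectors[0])
    avg = [mean(v[i] for v in tweet_vectors) for i in range(dims)]
    return avg + list(stats)


def fit_centroids(profiles, labels):
    """Compute one centroid per label ('high'/'low') for a single trait."""
    groups = {}
    for p, y in zip(profiles, labels):
        groups.setdefault(y, []).append(p)
    return {y: [mean(col) for col in zip(*ps)] for y, ps in groups.items()}


def predict(centroids, profile):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda y: sq_dist(centroids[y], profile))


def fit_all_traits(profiles, trait_labels):
    """trait_labels maps each trait name to one 'high'/'low' label per profile."""
    return {t: fit_centroids(profiles, trait_labels[t]) for t in TRAITS}
```

Training one independent model per trait keeps the five predictions decoupled, which mirrors treating each Big Five dimension as its own supervised target.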
Relationship Between Personality Patterns and Harmfulness: Analysis and Prediction Based on Sentence Embedding
This paper hypothesizes that harmful utterances need to be judged in the context of whole sentences, and the authors extract features of harmful expressions using a general-purpose language model. Based on the extracted features, the authors propose a method to predict the presence or absence of harmful categories. In addition, the authors believe that users who incite others can be analyzed by combining this method with research on inferring a speaker's personality from their statements on social networking sites. The results confirm that the proposed method identifies potentially harmful comments more accurately than simple dictionary-based models or models using distributed word representations. An analysis based on a harmfulness judgment model also confirmed a relationship between personality patterns and harmful expressions.
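To make the comparison in the abstract concrete, the sketch below contrasts a dictionary-based baseline (flag a sentence if any token is in a lexicon) with per-category binary classifiers over sentence embeddings. It assumes the embeddings have already been produced by some sentence encoder (toy 2-D vectors stand in for them), uses plain logistic regression trained by batch gradient descent as the classifier, and uses hypothetical category names. It is not the authors' model.

```python
import math


def dictionary_baseline(text, lexicon):
    """Flag a sentence if any whitespace token appears in a harmful-word lexicon."""
    return any(tok in lexicon for tok in text.lower().split())


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logreg(X, y, lr=0.5, epochs=500):
    """Batch gradient descent for binary logistic regression on embeddings X."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            for i, xi in enumerate(x):
                grad_w[i] += err * xi / n
            grad_b += err / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_categories(models, embedding, threshold=0.5):
    """models maps category -> (w, b); return every category predicted present."""
    return {
        c
        for c, (w, b) in models.items()
        if sigmoid(sum(wi * xi for wi, xi in zip(w, embedding)) + b) >= threshold
    }
```

One binary model per category lets a single sentence be assigned several harmful categories at once, or none, matching the "presence or absence" framing.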
Predicting & Optimizing Airlines Customer Satisfaction Using Classification
This research is a machine learning project that aims to study the factors that may shape customer satisfaction and to identify which attributes, or combinations of attributes, drive positive satisfaction. The research will initially use a dataset from Kaggle (described in the data source section) to run machine learning algorithms and build a predictor that helps airlines identify which customers are satisfied, so that they can react proactively to negative feedback and win back dissatisfied customers. The research will examine several classification algorithms and tune them to obtain the best results, then run experiments on the resulting models to find the optimal one.
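The tuning step described above ("examine several classification algorithms and tune them") is commonly done with k-fold cross-validation. The sketch below shows that idea for a single hyperparameter, the neighbor count of a k-nearest-neighbors classifier, in pure Python. k-NN is chosen only for illustration; the abstract does not name the algorithms, and the Kaggle dataset's columns are not given, so features are assumed to be numeric and already scaled. The interleaved fold split and all function names are hypothetical.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(X, y, k, folds=5):
    """Mean k-NN accuracy over `folds` interleaved train/validation splits.

    Assumes len(X) >= folds so that every validation fold is non-empty.
    """
    scores = []
    for f in range(folds):
        val = [i for i in range(len(X)) if i % folds == f]
        train = [i for i in range(len(X)) if i % folds != f]
        tX, ty = [X[i] for i in train], [y[i] for i in train]
        correct = sum(knn_predict(tX, ty, X[i], k) == y[i] for i in val)
        scores.append(correct / len(val))
    return sum(scores) / len(scores)


def tune_k(X, y, candidates=(1, 3, 5, 7), folds=5):
    """Return the candidate k with the highest cross-validated accuracy (first on ties)."""
    return max(candidates, key=lambda k: cross_val_accuracy(X, y, k, folds))
```

The same loop generalizes to comparing different algorithms: score each candidate model with `cross_val_accuracy`-style validation and keep the best, rather than judging models on the data they were trained on.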
- …