
    Gender prediction from tweets: Improving neural representations with hand-crafted features

    Author profiling is the characterization of an author through key attributes such as gender, age, and language. In this paper, an RNN model with Attention (RNNwA) is proposed to predict the gender of a Twitter user from their tweets. Both word-level and tweet-level attention are used to learn 'where to look'. The model is then improved by concatenating LSA-reduced n-gram features with the learned neural representation of a user. Both models are tested on three languages: English, Spanish, and Arabic. The improved version of the proposed model (RNNwA + n-gram) achieves state-of-the-art performance on English and competitive results on Spanish and Arabic.
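
    The abstract combines two user representations: an attention-pooled RNN encoding of the user's tweets and LSA-reduced n-gram counts. The Python/NumPy sketch below illustrates only how such features could be built and concatenated, under loudly stated assumptions: the RNN hidden states are random stand-ins (a real model would learn them), tokenization is plain whitespace splitting, LSA is a truncated SVD of a raw count matrix, and `word_ngrams`, `ngram_matrix`, `lsa_reduce`, `attention_pool`, and `user_vector` are hypothetical names, not the authors' code.

```python
import numpy as np
from collections import Counter

def word_ngrams(text, n_max=2):
    """All word n-grams of length 1..n_max; whitespace tokenization is an assumption."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def ngram_matrix(docs, n_max=2):
    """Users x n-grams raw count matrix (one concatenated document per user)."""
    counts = [Counter(word_ngrams(d, n_max)) for d in docs]
    vocab = sorted(set().union(*counts))
    index = {g: j for j, g in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, c in enumerate(counts):
        for g, freq in c.items():
            X[i, index[g]] = freq
    return X

def lsa_reduce(X, k):
    """LSA as truncated SVD: coordinates of each row in the top-k latent dimensions."""
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * S[:k]

def attention_pool(H, query):
    """Softmax attention over the rows of H (T x d); returns (pooled vector, weights)."""
    scores = H @ query
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ H, weights

def user_vector(tweet_states, w_word, w_tweet):
    """Word-level attention inside each tweet, then tweet-level attention across tweets."""
    tweet_vecs = np.stack([attention_pool(H, w_word)[0] for H in tweet_states])
    return attention_pool(tweet_vecs, w_tweet)[0]

rng = np.random.default_rng(0)
d, k = 4, 2
# Hypothetical attention queries; in a trained model these are learned parameters.
w_word, w_tweet = rng.normal(size=d), rng.normal(size=d)

# Hypothetical stand-ins for RNN hidden states: per user, one (num_words x d) array per tweet.
users_states = [[rng.normal(size=(3, d)), rng.normal(size=(5, d))],
                [rng.normal(size=(2, d))],
                [rng.normal(size=(4, d)), rng.normal(size=(1, d))]]
user_docs = ["i love this game", "this game is great great", "hola amigo que tal"]

neural = np.stack([user_vector(s, w_word, w_tweet) for s in users_states])  # (3, d)
lsa = lsa_reduce(ngram_matrix(user_docs), k)                                 # (3, k)
features = np.hstack([neural, lsa])  # input to a gender classifier (not shown)
print(features.shape)  # (3, 6)
```

    Concatenation keeps the two views side by side so a downstream classifier can weight them independently; the abstract does not specify that classifier, so none is shown.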

    Computational Sociolinguistics: A Survey

    Language is a social phenomenon, and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article, we present a survey of the emerging field of "Computational Sociolinguistics" that reflects this increased interest. We aim to provide a comprehensive overview of CL research on sociolinguistic themes, featuring topics such as the relation between language and social identity, language use in social interaction, and multilingual communication. Moreover, we demonstrate the potential for synergy between the research communities involved by showing how the large-scale data-driven methods widely used in CL can complement existing sociolinguistic studies, and how sociolinguistics can inform and challenge the methods and assumptions employed in CL studies. We hope to convey the possible benefits of closer collaboration between the two communities, and we conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication: 18th February, 201

    Automatic Detection of Online Jihadist Hate Speech

    We have developed a system that automatically detects online jihadist hate speech with over 80% accuracy, using techniques from Natural Language Processing and Machine Learning. The system is trained on a corpus of 45,000 subversive Twitter messages collected from October 2014 to December 2016. We present a qualitative and quantitative analysis of the jihadist rhetoric in the corpus, examine the network of Twitter users, outline the technical procedure used to train the system, and discuss examples of use. Comment: 31 pages

    Personality Analysis Using Classification on Turkish Tweets

    According to the psychology literature, there is a strong correlation between personality traits and people's linguistic behavior. With the growth of computer-based communication, individuals express their personalities in written form on social media. Hence, social media has become a convenient resource for analyzing the relationship between personality traits and linguistic behavior. Although there is a large body of research on social media, only a small number of studies focus on personality prediction. In this work, the authors aim to model the relationship between individuals' social media messages and the Big Five personality traits as a supervised learning problem. They use Twitter posts and user statistics for the analysis. They investigate various approaches to user profile representation, explore several supervised learning techniques, and present comparative results. The results confirm the findings of the psychology literature and show that computational analysis of tweets with supervised learning methods can be used to determine the personality of individuals.

    Relationship Between Personality Patterns and Harmfulness: Analysis and Prediction Based on Sentence Embedding

    This paper hypothesizes that harmful utterances need to be judged in the context of whole sentences, so the authors extract features of harmful expressions using a general-purpose language model. Based on the extracted features, they propose a method to predict the presence or absence of harmful categories. In addition, the authors believe it is possible to analyze users who incite others by combining this method with research on inferring a speaker's personality from their statements on social networking sites. The results confirmed that the proposed method judges the likelihood of harmful comments with higher accuracy than simple dictionary-based models or models using distributed word representations. An analysis based on the harmfulness judgment model also confirmed a relationship between personality patterns and harmful expressions.
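
    The abstract does not say which classifier sits on top of the language-model features. As one hedged illustration of predicting the presence or absence of harmful categories, the sketch below trains an independent logistic-regression head per category on fixed sentence embeddings. Assumptions: the embeddings `E` would come from some general-purpose language model (random toy vectors here), the labels are synthetic, and `train_multilabel` and `predict` are hypothetical names.

```python
import numpy as np

def sigmoid(z):
    # Clip to keep np.exp from overflowing on very confident scores.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def train_multilabel(E, Y, lr=0.5, epochs=500):
    """Fit one logistic-regression head per harm category by gradient descent.

    E: (n, d) sentence embeddings, assumed to come from a pretrained
       general-purpose language model (toy vectors below).
    Y: (n, c) binary labels, 1 = category present in the sentence.
    """
    n, d = E.shape
    W = np.zeros((d, Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(epochs):
        P = sigmoid(E @ W + b)   # (n, c) predicted probabilities
        G = (P - Y) / n          # gradient of mean binary cross-entropy w.r.t. logits
        W -= lr * (E.T @ G)
        b -= lr * G.sum(axis=0)
    return W, b

def predict(E, W, b, threshold=0.5):
    """Return a 0/1 matrix: one presence/absence decision per category."""
    return (sigmoid(E @ W + b) >= threshold).astype(int)

# Toy data: 200 "sentences", 8-dim embeddings, 3 hypothetical harm categories
# whose labels depend linearly on the embedding.
rng = np.random.default_rng(1)
E = rng.normal(size=(200, 8))
Y = (E @ rng.normal(size=(8, 3)) > 0).astype(int)

W, b = train_multilabel(E, Y)
Y_hat = predict(E, W, b)
print("per-category training accuracy:", (Y_hat == Y).mean(axis=0))
```

    Because each category has its own sigmoid rather than a shared softmax, a single sentence can be flagged for several harmful categories at once, which fits the presence/absence framing.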

    Predicting & Optimizing Airlines Customer Satisfaction Using Classification

    This research is a machine learning project that aims to study the factors that may shape customer satisfaction and to determine which attributes, or combinations of attributes, drive positive satisfaction. It will initially use a Kaggle dataset (described in the data source section) to run machine learning algorithms and build a predictor that helps airlines identify which customers are satisfied and react proactively to negative feedback, so that dissatisfied customers can be won back. The research will examine several classification algorithms and tune them to obtain the best result, then run experiments on the resulting models to find the optimal one.
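
    The proposal mentions examining several classification algorithms and tuning them. A minimal sketch of such a tuning loop might look like the following, assuming a k-nearest-neighbours classifier, 5-fold cross-validation, and synthetic two-class data in place of the Kaggle survey features (the proposal specifies none of these); `knn_predict`, `cv_accuracy`, and `tune_k` are hypothetical names.

```python
import numpy as np
from functools import partial

def knn_predict(X_train, y_train, X_test, k):
    """Majority vote of the k nearest training points (Euclidean distance, 0/1 labels)."""
    dist = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    votes = y_train[nearest]                 # (n_test, k) neighbour labels
    return (votes.mean(axis=1) > 0.5).astype(int)  # odd k avoids ties

def cv_accuracy(fit_predict, X, y, folds=5, seed=0):
    """Mean held-out accuracy of fit_predict(X_tr, y_tr, X_te) over k folds."""
    idx = np.random.default_rng(seed).permutation(len(y))
    parts = np.array_split(idx, folds)
    scores = []
    for i in range(folds):
        test = parts[i]
        train = np.concatenate([p for j, p in enumerate(parts) if j != i])
        pred = fit_predict(X[train], y[train], X[test])
        scores.append((pred == y[test]).mean())
    return float(np.mean(scores))

def tune_k(X, y, candidates=(1, 3, 5, 7, 9)):
    """Pick the number of neighbours with the best cross-validated accuracy."""
    scores = {k: cv_accuracy(partial(knn_predict, k=k), X, y) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores

# Synthetic stand-in for "satisfied" (1) vs "not satisfied" (0) customers.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 3)),
               rng.normal(2.0, 1.0, size=(100, 3))])
y = np.array([0] * 100 + [1] * 100)

best_k, scores = tune_k(X, y)
print(best_k, scores)
```

    Because `cv_accuracy` takes any `fit_predict` callable, the same held-out estimate could score other candidate classifiers, so comparing algorithms and tuning one of them rely on the same procedure.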